Work with XML files using a simple, consistent interface. Built on top of the 'libxml2' C library.
You can install xml2 from CRAN,
or you can install the development version from GitHub.
library("xml2")
x <- read_xml("<foo> <bar> text <baz/> </bar> </foo>")
x
xml_name(x)
xml_children(x)
xml_text(x)
xml_find_all(x, ".//baz")
h <- read_html("<html><p>Hi <b>!")
h
xml_name(h)
xml_text(h)
There are three key classes:
xml_node: a single node in a document.
xml_doc: the complete document. Acting on a document is usually the same
as acting on the root node of the document.
xml_nodeset: a set of nodes within the document. Operations on
xml_nodesets are vectorised: they apply the operation to each node in the set.
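The three classes can be seen side by side in a minimal sketch like the one below. The sample document and variable names are made up for illustration, and the class names R prints may be longer than the shorthand used above.

```r
library(xml2)

# Illustrative document with two sibling nodes.
doc <- read_xml("<foo><bar>a</bar><bar>b</bar></foo>")

# The parsed document; acting on it is like acting on its root node.
class(doc)
root_name <- xml_name(doc)

# xml_find_all() returns a nodeset; xml_text() is applied to every node in it.
bars <- xml_find_all(doc, ".//bar")
class(bars)
bar_text <- xml_text(bars)

# Extracting one element of a nodeset with [[ gives a single node.
first_bar <- bars[[1]]
class(first_bar)
```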
xml2 has similar goals to the XML package. The main differences are:
xml2 takes care of memory management for you. It will automatically free the memory used by an XML document as soon as the last reference to it goes away.
xml2 has a very simple class hierarchy, so you don't need to think about exactly what type of object you have; xml2 will just do the right thing.
More convenient handling of namespaces in XPath expressions - see
xml_ns_strip() to get started.
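As a rough illustration of the namespace handling, the sketch below parses a document with a default namespace, queries it with an explicit prefix, and then strips the namespace. The sample document and its URI are invented; the d1 prefix is the name xml_ns() is assumed to give the first default namespace.

```r
library(xml2)

# Illustrative document with a default namespace (the URI is made up).
doc <- read_xml('<root xmlns="http://example.com/ns"><item>1</item></root>')

# xml_ns() lists the namespaces; the default one is assumed to be named d1.
ns <- xml_ns(doc)
items_ns <- xml_find_all(doc, ".//d1:item", ns)

# xml_ns_strip() removes default namespaces from the document in place,
# after which an unprefixed XPath expression matches.
xml_ns_strip(doc)
items_plain <- xml_find_all(doc, ".//item")
```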
xml_double() functions to make it easy to extract
integer and double text from nodes (@jimhester, #97, #99).
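A small sketch of numeric extraction follows. The document is invented, and xml_integer() is assumed here to be the integer counterpart of xml_double().

```r
library(xml2)

# Illustrative document holding one integer and one double value.
doc <- read_xml("<m><count>42</count><price>3.5</price></m>")

# Convert the text of each node to a number.
count <- xml_integer(xml_find_first(doc, ".//count"))
price <- xml_double(xml_find_first(doc, ".//price"))
```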
xml2 now supports modification and creation of XML nodes. New functions
and replacement methods for
xml_text() (@jimhester, #9, #76)
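For instance, the replacement method for xml_text() might be used as in this sketch (the document and variable names are illustrative only):

```r
library(xml2)

doc <- read_xml("<foo><bar>old</bar></foo>")
bar <- xml_find_first(doc, ".//bar")

# Replace the node's text; the change is made to the document itself.
xml_text(bar) <- "new"

new_text <- xml_text(doc)
```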
xml_ns() now keeps namespace prefixes that point to the same URI
(@jimhester, #35, #95).
read_html() methods added for
(@jimhester, #63, #93)
xml_child() function to make selecting children a little easier
(@jimhester, #23, #94)
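A hedged sketch of xml_child(), assuming its second argument accepts either a position or an element name (the document is made up):

```r
library(xml2)

doc <- read_xml("<foo><a/><b/><c/></foo>")

# Select a child by position.
second <- xml_child(doc, 2)

# Select a child by element name.
by_name <- xml_child(doc, "c")
```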
xml_find_one() has been deprecated in favor of
(@jimhester, #58, #92)
xml_read() functions now default to passing the document's namespace
object. Namespace definitions can now be removed as well as added and
xml_ns_strip() added to remove all default namespaces from a document.
(@jimhester, #28, #89)
xml_read() gains an
options argument to control all available parsing options, including
HUGE to turn off limits for parsing very large
documents, and now drops blank text nodes by default, mimicking the default
behavior of the XML package. (@jimhester, #49, #62, #85, #88)
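The sketch below shows how parser options might be passed. It assumes the reader is called as read_xml(), as in the usage example, and that "NOBLANKS" and "HUGE" are accepted option names; the input document is illustrative.

```r
library(xml2)

# Drop blank text nodes and lift the parser's size limits.
doc <- read_xml(
  "<foo> <bar/> </foo>",
  options = c("NOBLANKS", "HUGE")
)

root_name <- xml_name(doc)
```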
xml_write() expands the path on filenames, so directories can be specified
with '~/' (@jimhester, #86, #80)
xml_find_one() now returns an 'xml_missing' node object if there are 0
matches (@jimhester, #55, #53, hadley/rvest#82).
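A minimal sketch of the missing-node behaviour. It uses xml_find_first() rather than xml_find_one(), on the assumption that it is the current single-match finder; the document and XPath are invented.

```r
library(xml2)

doc <- read_xml("<foo><bar/></foo>")

# No element matches this expression, so a missing node is returned
# instead of an error.
missing_node <- xml_find_first(doc, ".//nope")
class(missing_node)

# Accessors on a missing node return NA.
missing_text <- xml_text(missing_node)
```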
xml_find_lgl() functions added to
return numeric, character and logical results from XPath expressions. (@jimhester, #55)
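The typed finders could be used roughly as follows. xml_find_num() and xml_find_chr() are assumed to be the numeric and character counterparts of xml_find_lgl(), and the XPath expressions are illustrative.

```r
library(xml2)

doc <- read_xml("<foo><bar>1</bar><bar>2</bar></foo>")

# Each XPath expression must evaluate to the matching result type.
n_bars   <- xml_find_num(doc, "count(.//bar)")
first    <- xml_find_chr(doc, "string(.//bar[1])")
has_bars <- xml_find_lgl(doc, "boolean(.//bar)")
```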
xml_text() always correctly encode returned value as
Improved configure script - now works again on R-devel on windows.
Compiles with older versions of libxml2.
Make configure script more cross platform.
xml_length() to count the number of children (#32).
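For example, counting the children of a small, made-up document might look like this:

```r
library(xml2)

doc <- read_xml("<foo><a/><b/><c/></foo>")

# Number of child elements of the root node.
n_children <- xml_length(doc)
```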