Extract data from a simple XML document
This function extracts data from an XML document (or sub-document) that has a simple, shallow structure, a pattern that appears reasonably often in practice. The idea is that there is a collection of nodes that share the same fields (or a subset of common fields), each containing a primitive value such as a number or a string. Each node corresponds to an "observation" and each of its sub-elements corresponds to a variable. The function builds the corresponding data frame using the union of the variables found in the different observation nodes, so it can handle the case where not every node has every variable.
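As a minimal sketch of the node-to-row mapping described above, the following builds a small document inline and converts it. The XML text and the element names (data, obs, a, b) are hypothetical and chosen only for illustration; the second observation deliberately omits one variable, and its missing value is expected to appear as NA.

```r
library(XML)

# Hypothetical document: two observation nodes, the second lacks <b>.
txt <- "<data><obs><a>1</a><b>x</b></obs><obs><a>2</a></obs></data>"
doc <- xmlParse(txt, asText = TRUE)

# Select the observation nodes explicitly and pass them via 'nodes'.
obs <- getNodeSet(doc, "//obs")
df <- xmlToDataFrame(nodes = obs, stringsAsFactors = FALSE)

# One row per <obs> node, one column per variable seen in any node.
print(df)
```

Selecting the nodes with getNodeSet() keeps the example independent of how the surrounding document is laid out.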
xmlToDataFrame(doc, colClasses = NULL, homogeneous = NA, collectNames = TRUE, nodes = list(), stringsAsFactors = FALSE)
doc |
the XML content. This can be either the name of a file containing
the XML or the parsed XML document. If one wants to work on a subset
of nodes, specify these via the nodes parameter. |
colClasses |
a list or vector giving the names of the R types of the
corresponding variables; each resulting column in the data frame
is coerced to the named type. The elements can be named. This is similar to
the |
homogeneous |
a logical value that indicates whether each of the
nodes contains all of the variables ( |
collectNames |
a logical value indicating whether we compute the
names by explicitly computing the union of all variable names
or, if |
nodes |
a list of XML nodes which are to be processed |
stringsAsFactors |
a logical value that controls whether character vectors are converted to factor objects in the resulting data frame. |
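To illustrate the colClasses argument, here is a small sketch under stated assumptions: the inline XML and the element names (data, obs, n, w) are hypothetical, and every observation has the same two fields. Without colClasses the values arrive as text; with it, each column should be coerced to the requested type.

```r
library(XML)

# Hypothetical document in which every <obs> node has the same two fields.
txt <- "<data><obs><n>1</n><w>2.5</w></obs><obs><n>2</n><w>3.75</w></obs></data>"
doc <- xmlParse(txt, asText = TRUE)

# Ask for the first column as integer and the second as numeric.
typed <- xmlToDataFrame(doc, colClasses = c("integer", "numeric"),
                        stringsAsFactors = FALSE)

# Inspect the resulting column types.
print(sapply(typed, class))
```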
A data frame.
Duncan Temple Lang
f = system.file("exampleData", "size.xml", package = "XML")
xmlToDataFrame(f, c("integer", "integer", "numeric"))

# Drop the middle variable.
z = xmlToDataFrame(f, colClasses = list("integer", NULL, "numeric"))

# This illustrates how we can get a subset of nodes and process
# those as the "data nodes", ignoring the others.
f = system.file("exampleData", "tides.xml", package = "XML")
doc = xmlParse(f)
xmlToDataFrame(nodes = xmlChildren(xmlRoot(doc)[["data"]]))

# or, alternatively
xmlToDataFrame(nodes = getNodeSet(doc, "//data/item"))

f = system.file("exampleData", "kiva_lender.xml", package = "XML")
doc = xmlParse(f)
dd = xmlToDataFrame(getNodeSet(doc, "//lender"))