Import Data Directly From Excel
The package RDCOMClient
is used to open an Excel workbook and return the content (value) of one (or several) given range(s)
in a specified sheet. This is helpful, whenever pathologically scattered data on an Excel sheet, which can't simply be saved as CSV-file, has to be imported in R.XLGetWorkbook()
does the same for all the sheets in an Excel workbook.
XLGetRange(file = NULL, sheet = NULL, range = NULL, as.data.frame = TRUE, header = FALSE, stringsAsFactors = FALSE, echo = FALSE, na.strings = NULL, skip = 0) XLGetWorkbook(file, compactareas = TRUE) XLCurrReg(cell) XLNamedReg(x)
file |
the fully specified path and filename of the workbook. If it is left as |
sheet |
the name of the sheet containing the range(s) of interest. |
range |
a scalar or a vector with the address(es) of the range(s) to be returned (characters).
Use "A1"-address mode to specify the ranges, for example |
as.data.frame |
logical. Determines if the cellranges should be coerced into data.frames. Defaults to |
header |
a logical value indicating whether the range contains the names of the variables as its first line. Default is |
stringsAsFactors |
logical. Should character columns be coerced to factors? The default is |
echo |
logical. If set to |
na.strings |
a character vector of strings which are to be interpreted as |
compactareas |
logical, defining if areas should be returned by |
cell |
range of the left uppe cell, when current region should be used. |
x |
the name or the index of the XL-name to be used. |
skip |
the number of lines of the data file to skip before beginning to read data. |
The result consists of a list of lists, if as.data.frame
is set to FALSE
.
Be then prepared to encounter NULL
values. Those will prevent from easily being able to coerce
the square data structure to a data.frame.
The following code will replace the NULL
values by NA
and coerce the data to a data.frame.
# get the range D1:J69 from an excel file xlrng <- XLGetRange(file="myfile.xlsx", sheet="Tabelle1", range="D1:J69", as.data.frame=FALSE) # replace NULL values by NA xlrng[unlist(lapply(xlrng, is.null))] <- NA # coerce the square data structure to a data.frame d.lka <- data.frame(lapply(data.frame(xlrng), unlist))
This of course can be avoided by setting as.data.frame
= TRUE
.
The function will return dates as integers, because MS-Excel stores them internally as integers.
Such a date can subsequently be converted with the (unusual) origin of
as.Date(myDate, origin="1899-12-30")
. See also XLDateToPOSIXct
, which does the job. The conversion can directly be performed by XLGetRange()
if datecols
is used and contains the date columns in the sheet data.
If as.data.frame
is set to TRUE
, a single data.frame or a list of data.frames will be returned.
If set to FALSE
a list of the cell values in the specified Excel range, resp. a list of lists will be returned.
XLGetWorkbook()
returns a list of lists of the values in the given workbook.
Andri Signorell <andri@signorell.net>
## Not run: # Windows-specific example XLGetRange(file="C:/My Documents/data.xls", sheet="Sheet1", range=c("A2:B5","M6:X23","C4:D40")) # if the current region has to be read (incl. a header), place the cursor in the interesting region # and run: d.set <- XLGetRange(header=TRUE) # Get XL nameslist nm <- xl$ActiveWorkbook()$names() lst <- list() for(i in 1:nm$count()) lst[[i]] <- c(name=nm[[i]]$name(), address=nm[[i]]$refersToRange()$Address()) # the defined names as.data.frame(do.call(rbind, lst), stringsAsFactors = FALSE) ## End(Not run)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.