haven: read_dta – R documentation

read_dta

Read and write Stata DTA files

Description

Currently haven can read and write logical, integer, numeric, character and factor variables. See labelled() for how labelled variables in Stata are handled in R.

Usage

read_dta(
  file,
  encoding = NULL,
  col_select = NULL,
  skip = 0,
  n_max = Inf,
  .name_repair = "unique"
)

read_stata(
  file,
  encoding = NULL,
  col_select = NULL,
  skip = 0,
  n_max = Inf,
  .name_repair = "unique"
)

write_dta(data, path, version = 14, label = attr(data, "label"))

Arguments

`file`	Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in `.gz`, `.bz2`, `.xz`, or `.zip` will be automatically uncompressed. Files starting with `http://`, `https://`, `ftp://`, or `ftps://` will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed. Literal data is most useful for examples and tests. It must contain at least one new line to be recognised as data (instead of a path) or be a vector of greater than length 1. Using a value of `clipboard()` will read from the system clipboard.
`encoding`	The character encoding used for the file. Generally, only needed for Stata 13 files and earlier. See Encoding section for details.
`col_select`	One or more selection expressions, like in `dplyr::select()`. Use `c()` or `list()` to use more than one expression. See `?dplyr::select` for details on available selection options. Only the specified columns will be read from `file`.
`skip`	Number of rows to skip before reading data.
`n_max`	Maximum number of rows to read.
`.name_repair`	Treatment of problematic column names. `"minimal"`: no name repair or checks beyond basic existence. `"unique"` (the default here): make sure names are unique and not empty. `"check_unique"`: no name repair, but check that names are unique. `"universal"`: make the names unique and syntactic. A function: apply custom name repair (e.g., `.name_repair = make.names` for names in the style of base R). A purrr-style anonymous function: see `rlang::as_function()`. This argument is passed on as `repair` to `vctrs::vec_as_names()`; see there for more details on these terms and the strategies used to enforce them.
`data`	Data frame to write.
`path`	Path to a file where the data will be written.
`version`	File version to use. Supports versions 8-15.
`label`	Dataset label to use, or `NULL`. Defaults to the value stored in the "label" attribute of `data`. Must be <= 80 characters.
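
As a rough illustration of how `col_select`, `skip` and `n_max` combine, the sketch below writes `mtcars` to a temporary file and reads back a slice of it. The temporary file and the object name `part` are placeholders chosen for this example, not part of haven.

```r
library(haven)

# Write a small example file to a temporary location.
tmp <- tempfile(fileext = ".dta")
write_dta(mtcars, tmp)

# Read only two columns, skipping the first 5 rows and
# keeping at most the next 10.
part <- read_dta(tmp, col_select = c(mpg, cyl), skip = 5, n_max = 10)

dim(part)
names(part)
```

Selecting columns at read time can be useful for wide files, since the selection is applied as part of the read rather than afterwards.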

Value

A tibble, a data frame variant with nice defaults.

Variable labels are stored in the "label" attribute of each variable. They are not printed on the console, but the RStudio viewer will show them.

If a dataset label is defined in Stata, it will be stored in the "label" attribute of the tibble.

write_dta() returns the input data invisibly.
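
The following sketch shows one way to inspect both kinds of label after a round trip. The data frame, the label texts and the object names (`df`, `out`) are made up for illustration.

```r
library(haven)

# A tiny data frame with a variable label attached to one column.
df <- data.frame(x = c(1, 2, 3))
attr(df$x, "label") <- "An example variable"

# Write it with a dataset label, then read it back.
tmp <- tempfile(fileext = ".dta")
write_dta(df, tmp, label = "Example dataset")
out <- read_dta(tmp)

# Variable label: stored on the column itself.
attr(out$x, "label")

# Dataset label: stored on the tibble.
attr(out, "label")
```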

Character encoding

Prior to Stata 14, files did not declare a text encoding, and the default encoding differed across platforms. If encoding = NULL, haven assumes the encoding is windows-1252, the text encoding used by Stata on Windows. Unfortunately, Stata on Mac and Linux uses a different default encoding, "latin1". If you encounter an error such as "Unable to convert string to the requested encoding", try encoding = "latin1".

For Stata 14 and later, you should not need to specify the encoding manually unless it was incorrectly recorded in the source file.
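
One possible way to apply this advice is to try the default first and retry with "latin1" only when the encoding error appears. The helper name `read_old_dta` is hypothetical, and matching on the error message text is an assumption based on the message quoted above; treat this as a sketch rather than a recommended pattern.

```r
library(haven)

# Hypothetical helper: try the default encoding, then fall back to latin1
# if the error message mentions the requested encoding.
read_old_dta <- function(path) {
  tryCatch(
    read_dta(path),
    error = function(e) {
      if (grepl("requested encoding", conditionMessage(e), fixed = TRUE)) {
        read_dta(path, encoding = "latin1")
      } else {
        stop(e)  # unrelated error: pass it on unchanged
      }
    }
  )
}

# Usage with a file written in a pre-14 format (ASCII-only strings here).
old <- tempfile(fileext = ".dta")
write_dta(
  data.frame(name = c("alpha", "beta"), stringsAsFactors = FALSE),
  old,
  version = 13
)
res <- read_old_dta(old)
```

If the source platform is known in advance, passing `encoding` directly is simpler than relying on a fallback.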

Examples

path <- system.file("examples", "iris.dta", package = "haven")
read_dta(path)

tmp <- tempfile(fileext = ".dta")
write_dta(mtcars, tmp)
read_dta(tmp)
read_stata(tmp)

haven

Import and Export 'SPSS', 'Stata' and 'SAS' Files

v2.4.1

MIT + file LICENSE

Authors

Hadley Wickham [aut, cre], Evan Miller [aut, cph] (Author of included ReadStat code), RStudio [cph, fnd]
