Remove rows with missing values on columns specified
This is a data.table method for the S3 generic stats::na.omit. The internals are written in C for speed. See examples for benchmark timings.
The bit64::integer64 type is also supported.
## S3 method for class 'data.table'
na.omit(object, cols=seq_along(object), invert=FALSE, ...)
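object | A data.table. |
cols | A vector of column names (or numbers) on which to check for missing values. Default is all the columns. |
invert | logical. If FALSE, omits all rows with any missing values (default). If TRUE, returns just those rows with missing values instead. |
... | Further arguments special methods could require. |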
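As a minimal sketch of how the cols and invert arguments interact, the snippet below builds a small table and filters it three ways. The table DT and its column names are illustrative only, and the invert behaviour shown (returning the rows that contain missing values) reflects the argument as commonly documented for this method.

```r
library(data.table)

# Illustrative data: row 2 has NA in "x", row 3 has NA in "y"
DT <- data.table(id = 1:4,
                 x  = c(1, NA, 3, 4),
                 y  = c("a", "b", NA, "d"))

# cols by name: only "x" is checked, so the row with NA in "y" survives
by_name <- na.omit(DT, cols = "x")

# cols by position: column 3 is "y"
by_pos <- na.omit(DT, cols = 3L)

# invert = TRUE returns the rows that would otherwise be dropped
dropped <- na.omit(DT, invert = TRUE)
```

Checking the id column of each result shows which rows were kept in each case.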
The data.table method takes an additional argument, cols, which restricts the check for missing values to just the columns specified. The default value for cols is all the columns, to be consistent with the default behaviour of stats::na.omit.
Unlike stats::na.omit, it does not add the attribute na.action to the result.
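The difference in attributes can be checked directly. The following is a small sketch, with DF and DT as made-up example objects, comparing the result of na.omit on a data.frame with the result on the equivalent data.table.

```r
library(data.table)

# Illustrative data: row 2 has NA in "x", row 3 has NA in "y"
DF <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))
DT <- as.data.table(DF)

# The data.frame method records the dropped rows in "na.action"
df_clean <- na.omit(DF)
has_attr_df <- !is.null(attr(df_clean, "na.action"))

# The data.table method returns the filtered table without that attribute
dt_clean <- na.omit(DT)
has_attr_dt <- !is.null(attr(dt_clean, "na.action"))
```

Code that relies on attr(result, "na.action") to recover the dropped row indices therefore needs another approach when the input is a data.table.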
A data.table with just the rows where the specified columns have no missing value in any of them.
DT = data.table(x=c(1,NaN,NA,3), y=c(NA_integer_, 1:3), z=c("a", NA_character_, "b", "c"))
# default behaviour
na.omit(DT)
# omit rows where 'x' has a missing value
na.omit(DT, cols="x")
# omit rows where either 'x' or 'y' have missing values
na.omit(DT, cols=c("x", "y"))

## Not run: 
# Timings on relatively large data
set.seed(1L)
DT = data.table(x = sample(c(1:100, NA_integer_), 5e7L, TRUE),
                y = sample(c(rnorm(100), NA), 5e7L, TRUE))
system.time(ans1 <- na.omit(DT)) ## 2.6 seconds
system.time(ans2 <- stats:::na.omit.data.frame(DT)) ## 29 seconds
# identical? check each column separately, as ans2 will have additional attribute
all(sapply(1:2, function(i) identical(ans1[[i]], ans2[[i]]))) ## TRUE

## End(Not run)