recode

Recode a variable


Description

Recodes a vector (numeric, character or factor) according to a set of rules. It is similar to the function recode() from package car, but more flexible. It also has similarities with the function findInterval() from package base.

Usage

recode(x, rules, cuts, values, ...)

Arguments

x

A vector of mode numeric, character or factor.

rules

Character string or a vector of character strings for recoding specifications.

cuts

A vector of one or more unique cut points.

values

A vector of output values.

...

Other parameters, for compatibility with other functions such as recode() in package car, but also factor() in package base.

Details

Similar to the recode() function in package car, the recoding rules are separated by semicolons, of the form input = output, and allow for:

a single value: 1 = 0
a range of values: 2:5 = 1
a set of values: c(6,7,10) = 2
else: everything that is not covered by the previously specified rules

Contrary to the recode() function in package car, this function allows the : sequence operator (even for factors), so that a rule such as c(1,3,5:7) or c(a,d,f:h) would be valid.

Since all rules are specified in a string, it does not matter whether the c() function is used or not. It is accepted for compatibility reasons, but a simpler way to specify a set of rules is "1,3,5:7=A; else=B".
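
As an illustration of the rule syntax only, the following base R sketch splits a rule string into its input and output parts. The helper split_rules() is hypothetical and is not part of admisc; it does not reflect how the package parses rules internally.

```r
# Hypothetical helper (not part of admisc): split a rule string into
# input/output pairs, following the "input = output; ..." syntax.
split_rules <- function(rules) {
  # rules are separated by semicolons
  parts <- trimws(strsplit(rules, ";", fixed = TRUE)[[1]])
  # each rule has the form input = output
  sides <- strsplit(parts, "=", fixed = TRUE)
  data.frame(
    input  = trimws(vapply(sides, `[`, character(1), 1)),
    output = trimws(vapply(sides, `[`, character(1), 2)),
    stringsAsFactors = FALSE
  )
}

parsed <- split_rules("1,3,5:7 = A; else = B")
parsed
```

The result has one row per rule, with the set "1,3,5:7" mapped to A and "else" mapped to B.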

Special values lo and hi may also appear in the range of values, while else can be used with else=copy to copy all values which were not specified in the recoding rules.
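
The sketch below emulates, in base R, how a range over factor levels and the special value hi can be read. It assumes that ranges follow the order of the factor levels and that hi stands for the highest level, which is what the factor examples on this page suggest. It does not call recode() itself.

```r
# Base R emulation of the rules "b:d = 1; g:hi = 2" (everything else NA).
# Assumption: ranges follow the order of the levels, and "hi" is the last level.
lv <- letters[1:10]
x <- factor(c("b", "g", "i", "a", "j"), levels = lv)

# position of each value in the level order
pos <- match(as.character(x), lv)

# "b:d" covers the levels from b to d
in_b_d <- pos >= match("b", lv) & pos <= match("d", lv)
# "g:hi" covers the levels from g up to the highest level
in_g_hi <- pos >= match("g", lv) & pos <= length(lv)

res <- rep(NA_real_, length(x))
res[in_b_d] <- 1
res[in_g_hi] <- 2
res
```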

In the package car, a character output would have to be quoted, like "1:2='A'", but that is not mandatory in this function: "1:2=A" works just as well. Output values such as "NA" or "missing" are converted to NA.

Another difference from the car package: the output is not automatically converted to a factor, even if the original variable is a factor. That choice is left to the user, through the argument as.factor.result, which defaults to FALSE.

A major difference is the treatment of values not present in the recoding rules. By default, package car copies all those values into the new object, whereas in this package the default value is NA, and new values are added only if they are found in the rules. Users can choose to copy all values not covered by the recoding rules by explicitly adding else=copy to the rules.
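
The two behaviours for uncovered values can be emulated in base R. This is only a sketch of the semantics for the rule "8:15=1", with and without else=copy; it is not a call to either package.

```r
# Base R emulation of how values outside the rules are treated.
x <- 10:20
covered <- x %in% 8:15          # values matched by the rule "8:15=1"

# default described for this package: uncovered values become NA
res_na <- ifelse(covered, 1, NA)

# with else=copy: uncovered values are copied from the input
res_copy <- ifelse(covered, 1, x)

res_na
res_copy
```

In the first result the values 16 to 20 are NA, while in the second they are kept unchanged.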

Since the two functions have the same name, users who load both packages may end up calling one instead of the other, depending on the order in which the packages are loaded. To preserve functionality and minimize namespace collisions with package car, special effort has been invested to ensure perfect compatibility with the other recode() function (plus more).

The argument ... allows for more arguments specific to the car package, such as as.factor.result and as.numeric.result. In addition, it accepts levels, labels and ordered, which are specific to the function factor() in package base. When using the arguments levels and/or labels, the output is automatically coerced to a factor.
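
For reference, this is what levels, labels and ordered do in base factor() itself, applied to a vector of already recoded values. The vector codes is made up for illustration, and the sketch makes no claim about how recode() handles these arguments internally.

```r
# Base R only: the effect of levels, labels and ordered in factor().
codes <- c(3, 2, 2, 1, 3)   # illustrative recoded values

f <- factor(codes,
            levels  = 1:3,
            labels  = c("one", "two", "three"),
            ordered = TRUE)
f
levels(f)
is.ordered(f)
```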

Blank spaces outside category labels are ignored; see the example with the categories "An", "example", "that has" and "spaces".

It is possible to use recode() in a similar way to the function cut(), by specifying a vector of cuts, which works for both numeric and character/factor objects. For c cuts there should be c + 1 values; if not otherwise specified, the argument values is automatically constructed as the sequence of numbers from 1 to c + 1.

Unlike the function cut(), arguments such as include.lowest or right are not necessary because the final outcome can be changed by tweaking the cut values.
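
The relation between c cuts and c + 1 values can be sketched with findInterval() from package base. The sketch assumes that a value equal to a cut point falls in the lower group, as the factor example with cuts = "that has" suggests; whether numeric input behaves the same way is an assumption here. The names grp and labelled are illustrative.

```r
# Base R emulation of cut points: c cuts give c + 1 groups.
# Assumption: a value equal to a cut point belongs to the lower group.
x <- 1:6
cuts <- c(2, 4)

# group index from 1 to length(cuts) + 1
grp <- findInterval(x, cuts, left.open = TRUE) + 1L
grp

# custom output values, one per group
values <- c("low", "mid", "high")
labelled <- values[grp]
labelled
```

Moving a cut point slightly, for example to 1.5 instead of 2, changes which group a boundary value falls into, which is the kind of tweak that replaces include.lowest and right.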

Author(s)

Adrian Dusa

Examples

x <- rep(1:3, 3)
x
#  [1] 1 2 3 1 2 3 1 2 3

recode(x, "1:2 = A; else = B")
#  [1] "A" "A" "B" "A" "A" "B" "A" "A" "B"

recode(x, "1:2 = 0; else = copy")
#  [1] 0 0 3 0 0 3 0 0 3

set.seed(1234)
x <- factor(sample(letters[1:10], 20, replace = TRUE),
          levels = letters[1:10])
x
#  [1] b g g g i g a c g f g f c j c i c c b c
# Levels: a b c d e f g h i j

recode(x, "b:d = 1; g:hi = 2; else = NA") # note the "hi" special value
#  [1]  1  2  2  2  2  2 NA  1  2 NA  2 NA  1  2  1  2  1  1  1  1

recode(x, "a, c:f = A; g:hi = B; else = C", as.factor.result = TRUE)
#  [1] C B B B B B A A B A B A A B A B A A C A
# Levels: A B C

recode(x, "a, c:f = 1; g:hi = 2; else = 3", as.factor.result = TRUE,
       labels = c("one", "two", "three"), ordered = TRUE)
#  [1] three two   two   two   two   two   one   one   two   one  
# [11] two   one   one   two   one   two   one   one   three one  
# Levels: one < two < three  

set.seed(1234)
categories <- c("An", "example", "that has", "spaces")
x <- factor(sample(categories, 20, replace = TRUE),
            levels = categories, ordered = TRUE)
sort(x)
#  [1] An       An       An       An       An       example 
#  [7] example  example  example  that has that has that has
# [13] that has that has that has that has that has spaces  
# [19] spaces   spaces  
# Levels: An < example < that has < spaces

recode(sort(x), "An : that has = 1; spaces = 2")
#  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2

# single quotes work, but are not necessary
recode(sort(x), "An : 'that has' = 1; spaces = 2")
#  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2

# same using cut values
recode(sort(x), cuts = "that has")
#  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2

# modifying the output values
recode(sort(x), cuts = "that has", values = 0:1)
#  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1

# more treatment of "else" values
x <- 10:20

# the recoding rules do not cover all existing values; the rest become NA
recode(x, "8:15=1")

# all other values are copied
recode(x, "8:15=1; else=copy")

admisc

Adrian Dusa's Miscellaneous

v0.12
GPL (>= 3)
Authors
Adrian Dusa [aut, cre, cph] (<https://orcid.org/0000-0002-3525-9253>)
Initial release
2021-03-16
