Utility functions for creating new variables from logicals describing the levels

derivedVariable(
  ...,
  .ordered = FALSE,
  .method = c("unique", "first", "last"),
  .debug = c("default", "always", "never"),
  .sort = c("given", "alpha"),
  .default = NULL,
  .asFactor = FALSE
)

derivedFactor(..., .asFactor = TRUE)

Arguments

...

named logical "rules" defining the levels.

.ordered

a logical indicating whether the resulting factor should be ordered. Ignored if .asFactor is FALSE.

.method

one of "unique", "first", and "last". If "unique", exactly one rule must be TRUE for each position. If "first", the first TRUE rule defines the level. If "last", the last TRUE rule defines the level.

.debug

one of "default", "always", and "never", indicating whether debugging information should be printed. If "default", debugging information is printed only when multiple rules give conflicting definitions for some positions.

.sort

one of "given" (the default) or "alpha", or a vector of integers, the same length as the number of levels, indicating the order in which the levels should appear in the resulting factor. Ignored if .asFactor is FALSE.

.default

a character vector of length 1 giving the name of the default level, or NULL for no default.

.asFactor

A logical indicating whether the returned value should be a factor.
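The effect of the level ordering controlled by .sort can be sketched in base R. This is an illustration of the idea only, not the package's code: the vectors values and given are made up for the example, and the assumption that an integer vector simply indexes the levels in their given order is ours, not something stated above.

```r
# Made-up values; 'given' is the order in which the rules were written.
values <- c("moderate", "abstinent", "highrisk", "moderate")
given  <- c("abstinent", "moderate", "highrisk")

# "given": levels appear in the order the rules were supplied.
f_given <- factor(values, levels = given)

# "alpha": levels appear in alphabetical order.
f_alpha <- factor(values, levels = sort(given))

# Integer vector (ASSUMPTION: it indexes the given levels).
f_custom <- factor(values, levels = given[c(3, 1, 2)])

levels(f_given)   # "abstinent" "moderate"  "highrisk"
levels(f_alpha)   # "abstinent" "highrisk"  "moderate"
levels(f_custom)  # "highrisk"  "abstinent" "moderate"
```

The values themselves are unchanged in all three cases; only the order of the levels differs, which matters for ordered factors and for the order of rows and columns in tables.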

Details

Each logical "rule" corresponds to a level in the resulting variable. If .default is defined, an implicit rule is added that is TRUE whenever all other rules are FALSE. When more than one rule is TRUE for a position, either the first or the last such rule is used, or an error is generated, depending on the value of .method.
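The resolution of rules described above can be sketched in base R. The helper resolve_rules() below is hypothetical and written only for illustration; it is not the package's implementation, it assumes the rules contain no missing values, and returning NA when no rule matches and no default is given is a choice made for this sketch.

```r
# Hypothetical helper for illustration; not part of the package.
resolve_rules <- function(rules, method = c("unique", "first", "last"),
                          default = NULL) {
  method <- match.arg(method)
  # Numeric 0/1 matrix: one column per rule, one row per position.
  m <- do.call(cbind, rules) + 0
  if (!is.null(default)) {
    # Implicit rule: TRUE wherever every explicit rule is FALSE.
    m <- cbind(m, as.numeric(rowSums(m) == 0))
    colnames(m)[ncol(m)] <- default
  }
  hits <- rowSums(m)  # number of TRUE rules at each position
  if (method == "unique" && any(hits > 1)) {
    stop("more than one rule is TRUE for some positions")
  }
  # Pick the column of the first or last TRUE rule in each row.
  ties <- if (method == "last") "last" else "first"
  idx <- max.col(m, ties.method = ties)
  idx[hits == 0] <- NA  # no rule matched (sketch-only behaviour)
  colnames(m)[idx]
}

x <- c(0, 1, 5)
# These rules overlap at x == 0, where both are TRUE.
overlapping <- list(zero = x == 0, small = x <= 1)

resolve_rules(overlapping, "first", default = "big")  # "zero"  "small" "big"
resolve_rules(overlapping, "last",  default = "big")  # "small" "small" "big"
resolve_rules(overlapping, "first")                   # "zero"  "small" NA
```

With method "unique" the same rules stop with an error, because two rules are TRUE at the first position.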

derivedVariable() is designed to be used with transform() or dplyr::mutate() to add new variables to a data frame. derivedFactor() is the same except that the default value for .asFactor is TRUE. See the examples.

Examples

Kf <- mutate(KidsFeet, biggerfoot2 = derivedFactor(
                   dom = biggerfoot == domhand,
                   nondom = biggerfoot != domhand)
                   )
tally( ~ biggerfoot + biggerfoot2, data = Kf)
#>           biggerfoot2
#> biggerfoot dom nondom
#>          L   2     20
#>          R  11      6
tally( ~ biggerfoot + domhand, data = Kf)
#>           domhand
#> biggerfoot  L  R
#>          L  2 20
#>          R  6 11

# Three equivalent ways to define a new variable
# Method 1: explicitly define all levels
modHELP <- mutate(HELPrct, drink_status = derivedFactor( 
  abstinent = i1 == 0,
  moderate = (i1>0 & i1<=1 & i2<=3 & sex=='female') |
     (i1>0 & i1<=2 & i2<=4 & sex=='male'),
  highrisk = ((i1>1 | i2>3) & sex=='female') | 
      ((i1>2 | i2>4) & sex=='male'),
  .ordered = TRUE)
)
tally( ~ drink_status, data = modHELP)
#> drink_status
#> abstinent  moderate  highrisk 
#>        68        28       357 

# Method 2: Use .default for last level
modHELP <- mutate(HELPrct, drink_status = derivedFactor( 
  abstinent = i1 == 0,
  moderate = (i1<=1 & i2<=3 & sex=='female') |
     (i1<=2 & i2<=4 & sex=='male'),
  .ordered = TRUE,
  .method = "first",
  .default = "highrisk")
)
tally( ~ drink_status, data = modHELP)
#> drink_status
#> abstinent  moderate  highrisk 
#>        68        28       357 

# Method 3: use TRUE to catch any positions that fall through the earlier rules
modHELP <- mutate(HELPrct, drink_status = derivedFactor( 
  abstinent = i1 == 0,
  moderate = (i1<=1 & i2<=3 & sex=='female') |
     (i1<=2 & i2<=4 & sex=='male'),
  highrisk=TRUE,
  .ordered = TRUE,
  .method = "first"
  )
)
tally( ~ drink_status, data = modHELP)
#> drink_status
#> abstinent  moderate  highrisk 
#>        68        28       357 
is.factor(modHELP$drink_status)
#> [1] TRUE

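# Same rules with derivedVariable(): .asFactor defaults to FALSE, so the
# result is not a factor and .ordered is ignored
modHELP <- mutate(HELPrct, drink_status = derivedVariable( 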
  abstinent = i1 == 0,
  moderate = (i1<=1 & i2<=3 & sex=='female') |
     (i1<=2 & i2<=4 & sex=='male'),
  highrisk=TRUE,
  .ordered = TRUE,
  .method = "first"
  )
)
is.factor(modHELP$drink_status)
#> [1] FALSE