
Subset a data frame, selecting columns by matching or partially matching a vector of character strings. A convenience function that reduces the width of a MIDFIELD data table at the start of a session by keeping only the columns typically required by other midfieldr functions. It is particularly useful in interactive sessions for viewing the data tables at various stages of an analysis.

Usage

select_required(midfield_x, ..., select_add = NULL)

Arguments

midfield_x

Data frame from which columns are selected, typically student, term, or degree, or a subset of one of these.

...

Not used; forces later arguments to be supplied by name.

select_add

Optional character vector of search terms to add to the default vector given by c("mcid", "institution", "race", "sex", "^term", "cip6", "level").
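Because select_add follows ..., it must be supplied by name. The sketch below uses a hypothetical function f() with the same argument layout as select_required() (it is not midfieldr code) to show what plain R does when the name is omitted or abbreviated: the value is captured by ... and select_add keeps its default.

```r
# Hypothetical function with the same argument layout as select_required();
# it only reports what it received.
f <- function(x, ..., select_add = NULL) {
  list(n_dots = length(list(...)), select_add = select_add)
}

# Supplied by position, "grade" is captured by `...`
f(1, "grade")$select_add
#> NULL

# Arguments after `...` are never partially matched, so an abbreviation fails too
f(1, select = "grade")$select_add
#> NULL

# Supplied by full name, "grade" reaches select_add
f(1, select_add = "grade")$select_add
#> [1] "grade"
```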

Value

A data.table with the following properties:

  • Rows are not modified.

  • Columns with names that match or partially match the search terms, that is, the default vector plus any values in select_add.

  • Grouping structures are not preserved.

Details

Several midfieldr functions are designed to operate on one or more of the MIDFIELD data tables, usually student, term, or degree. This family of functions requires only a small subset of the available variables, e.g., mcid, cip6, or term. The required columns are built into the function, and the select_add argument is used to add search strings to the default vector.
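A minimal sketch of how the full search vector might be assembled, assuming the select_add terms are simply appended to the default vector and the result is collapsed into a single OR pattern (the same collapsing step the Examples use). The select_add values are illustrative, borrowed from the toy_course example.

```r
# Default search terms, as documented for select_required()
default_cols <- c("mcid", "institution", "race", "sex", "^term", "cip6", "level")

# Illustrative additions
select_add <- c("abbrev", "number", "grade")

# Assumption: added terms are appended to the defaults
search_terms <- c(default_cols, select_add)

# One regular expression that matches any of the terms
search_pattern <- paste(search_terms, collapse = "|")
search_pattern
#> [1] "mcid|institution|race|sex|^term|cip6|level|abbrev|number|grade"
```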

The column names of midfield_x are searched for matches or partial matches using grep(), so search terms can include regular expressions. Columns whose names match or partially match any search term are returned; all other columns are dropped. Rows are unaffected. Search terms that match no column name are silently ignored.
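The selection rule described above can be sketched in base R. This is an illustration under the stated behavior, not the midfieldr source: select_by_pattern() and the data frame are hypothetical, and the sketch returns a data.frame rather than a data.table. It shows why the anchored term "^term" keeps term_degree but not a column that merely ends in "term", and that unmatched terms cause no error.

```r
# Sketch only: not the midfieldr implementation. `select_by_pattern()`
# and the data frame below are hypothetical.
select_by_pattern <- function(x, search_terms) {
  # Join the terms into a single regular expression: term1|term2|...
  pattern <- paste(search_terms, collapse = "|")
  # Indices of column names that match any term; a term that matches
  # nothing contributes no indices and raises no error
  keep <- grep(pattern, names(x))
  # All rows are kept; unmatched columns are dropped
  x[, keep, drop = FALSE]
}

df <- data.frame(
  mcid        = c("A1", "B2"),
  institution = c("Inst X", "Inst Y"),
  term_degree = c("19911", "19953"),
  hours_term  = c(15, 12),
  cip6        = c("140801", "521401"),
  gpa         = c(3.1, 2.8)
)

default_cols <- c("mcid", "institution", "race", "sex", "^term", "cip6", "level")
x <- select_by_pattern(df, default_cols)

# "^term" matches term_degree but not hours_term; race, sex, and level
# match nothing and are silently ignored
names(x)
#> [1] "mcid"        "institution" "term_degree" "cip6"       
nrow(x)
#> [1] 2
```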

One could use this function to select columns from a non-MIDFIELD data frame, but with no benefit to the user; conventional column selection syntax is better suited to that task. Here, we specialize the column selection to serve midfieldr functions.
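A small contrast, using a hypothetical data frame, of why exact-name selection suits general data better: a pattern such as "id" also catches any column whose name merely contains "id", while conventional selection returns exactly the columns named.

```r
# Hypothetical non-MIDFIELD data frame
df <- data.frame(
  id     = 1:3,
  paid   = c(TRUE, FALSE, TRUE),
  region = c("N", "S", "E")
)

# Pattern matching: "id" also matches "paid"
names(df)[grep("id", names(df))]
#> [1] "id"   "paid"

# Conventional selection by exact name returns only the intended columns
y <- df[, c("id", "region")]
names(y)
#> [1] "id"     "region"
```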

Examples

# Load midfieldr, which provides select_required() and the toy data sets
library(midfieldr)

# Default character vector for selecting columns
default_cols <- c("mcid", "institution", "race", "sex", "^term", "cip6", "level")

# Create one string separated by OR
search_pattern <- paste(default_cols, collapse = "|")

# Find names of columns matching or partially matching the search pattern
x <- select_required(toy_student)
names(x)
#> [1] "mcid"        "institution" "race"        "sex"        
grepl(search_pattern, names(x))
#> [1] TRUE TRUE TRUE TRUE

x <- select_required(toy_term) 
names(x)
#> [1] "mcid"        "institution" "term"        "cip6"        "level"      
grepl(search_pattern, names(x))
#> [1] TRUE TRUE TRUE TRUE TRUE

x <- select_required(toy_degree) 
names(x)
#> [1] "mcid"        "institution" "term_degree" "cip6"       
grepl(search_pattern, names(x))
#> [1] TRUE TRUE TRUE TRUE

x <- select_required(toy_course) 
names(x)
#> [1] "mcid"        "institution" "term"       
grepl(search_pattern, names(x))
#> [1] TRUE TRUE TRUE

# Adding search terms
x <- select_required(toy_course, select_add = c("abbrev", "number", "grade")) 
names(x)
#> [1] "mcid"        "institution" "term"        "abbrev"      "number"     
#> [6] "grade"