Subset a data frame, selecting columns by matching or partially matching a vector of character strings. A convenience function to reduce the dimensions of a MIDFIELD data table at the start of a session by selecting only those columns typically required by other midfieldr functions. Particularly useful in interactive sessions when viewing the data tables at various stages of an analysis.
Arguments
- midfield_x
Data frame from which columns are selected, typically student, term, degree, or their subsets.
- ...
Not used; forces later arguments to be used by name.
- select_add
Optional character vector of search terms to add to the default vector given by c("mcid", "institution", "race", "sex", "^term", "cip6", "level").
Value
A data.table with the following properties:
- Rows are not modified.
- Columns are those with names that match or partially match the default search terms or the values in select_add.
- Grouping structures are not preserved.
Details
Several midfieldr functions are designed to operate on one or more of the MIDFIELD data tables, usually student, term, or degree. This family of functions requires only a small subset of available variables, e.g., mcid, cip6, or term. The required columns are built in to the function. The select_add argument is used to add search strings to the default vector.
The column names of midfield_x are searched for matches or partial matches using grep(), thus search terms can include regular expressions. Variables with names that match or partially match the search terms are returned; all other columns are dropped. Rows are unaffected. Search terms not present are silently ignored.
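The matching behavior described above can be illustrated with a minimal base R sketch. This is not the package's internal code; the data frame df and its columns (including midterm) are hypothetical stand-ins chosen to show how an anchored term such as "^term" differs from an unanchored one, and how a search term with no matching column is simply skipped.

```r
# Hypothetical stand-in for a MIDFIELD-style table (not real data)
df <- data.frame(
  mcid = c("A1", "A2"),
  institution = c("Inst A", "Inst B"),
  term = c("19911", "19913"),
  midterm = c(1, 2),
  grade = c("A", "B"),
  stringsAsFactors = FALSE
)

# Search terms: "^term" is anchored to the start of the name;
# "cip6" has no matching column in df
search_terms <- c("mcid", "institution", "^term", "cip6")

# Collapse the terms into one regular expression separated by OR
pattern <- paste(search_terms, collapse = "|")

# Keep only the column names that match or partially match the pattern
keep <- grep(pattern, names(df), value = TRUE)
subset_df <- df[, keep, drop = FALSE]

names(subset_df)
#> [1] "mcid"        "institution" "term"
```

Because "^term" is anchored, the hypothetical midterm column is dropped even though its name contains "term". The unmatched "cip6" term produces no error, and the number of rows is unchanged.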
One could use this function to select columns from a non-MIDFIELD data frame, but with no benefit to the user; conventional column selection syntax is better suited to that task. Here, we specialize the column selection to serve midfieldr functions.
Examples
# Default character vector for selecting columns
default_cols <- c("mcid", "institution", "race", "sex", "^term", "cip6", "level")
# Create one string separated by OR
search_pattern <- paste(default_cols, collapse = "|")
# Find names of columns matching or partially matching
x <- select_required(toy_student)
names(x)
#> [1] "mcid" "institution" "race" "sex"
grepl(search_pattern, names(x))
#> [1] TRUE TRUE TRUE TRUE
x <- select_required(toy_term)
names(x)
#> [1] "mcid" "institution" "term" "cip6" "level"
grepl(search_pattern, names(x))
#> [1] TRUE TRUE TRUE TRUE TRUE
x <- select_required(toy_degree)
names(x)
#> [1] "mcid" "institution" "term_degree" "cip6"
grepl(search_pattern, names(x))
#> [1] TRUE TRUE TRUE TRUE
x <- select_required(toy_course)
names(x)
#> [1] "mcid" "institution" "term"
grepl(search_pattern, names(x))
#> [1] TRUE TRUE TRUE
# Adding search terms
x <- select_required(toy_course, select_add = c("abbrev", "number", "grade"))
names(x)
#> [1] "mcid" "institution" "term" "abbrev" "number"
#> [6] "grade"