
Subset a data frame, selecting columns by matching or partially matching a vector of character strings. A convenience function that reduces the number of columns in a MIDFIELD data table by keeping only those columns required by other midfieldr functions or needed to form a composite key. It is particularly useful in interactive sessions for viewing the data tables at various stages of an analysis.

Usage

select_required(midfield_x, ..., select_add = NULL)

Arguments

midfield_x

Data frame from which columns are selected.

...

Not used for passing values; forces subsequent arguments to be referable only by name.

select_add

Character vector of additional column names, partial names, or regular expressions; matching columns are returned in addition to the default set.

Value

A data frame that is a subset of the input with the following properties: rows are preserved; columns are preserved if their names match or partially match search terms; grouping structures are not preserved. An attempt is made to return a data frame of the same class as that of the input, e.g., a base R data.frame, a tibble-style enhanced data frame, or the midfieldr default data.table enhanced data frame.

Details

Several midfieldr functions require that their input data frames contain specific variables (column names) such as mcid or cip6. In addition, the MIDFIELD data tables have specific variables that act as keys or composite keys to the information in each table. All such variables are assembled in a character vector that forms the default set of column names returned by select_required(). The default column set is c("mcid", "institution", "race", "sex", "term", "term_course", "term_degree", "cip6", "level", "abbrev", "number").
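To illustrate how the default column set acts on a data frame, here is a minimal base-R sketch. It is not the midfieldr source code: it assumes the default terms are joined with "|" into one regular expression and matched against the column names with grep(), and the toy data frame below is hypothetical, not real MIDFIELD data.

```r
# Default search terms as listed in the documentation above
default_cols <- c("mcid", "institution", "race", "sex", "term",
                  "term_course", "term_degree", "cip6", "level",
                  "abbrev", "number")

# Hypothetical toy data frame (not real MIDFIELD data)
student <- data.frame(
  mcid       = c("MCID001", "MCID002"),
  race       = c("White", "Asian"),
  sex        = c("Female", "Male"),
  age_desc   = c("Under 25", "Under 25"),
  us_citizen = c("Yes", "No"),
  term       = c("20181", "20183"),
  cip6       = c("140801", "141901")
)

# Assumption: terms are combined into a single pattern joined by "|"
pattern <- paste(default_cols, collapse = "|")
keep <- grep(pattern, names(student), ignore.case = TRUE, value = TRUE)

# Subset columns only; every row is kept
subset_student <- student[, keep, drop = FALSE]

# Keeps mcid, race, sex, term, cip6; drops age_desc and us_citizen
names(subset_student)
```

Rows and the data.frame class are unchanged; only columns whose names match a default term survive.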

Additional column names or partial names can be included by using the select_add argument.
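The effect of select_add can be sketched with a small hypothetical helper, select_columns_sketch(), that mimics the documented behavior in base R. It is an illustration under the assumption that the extra terms are simply appended to the default set before matching; it is not the midfieldr implementation.

```r
# Hypothetical helper mimicking the documented behavior of select_required();
# a sketch only, not the midfieldr source code.
select_columns_sketch <- function(x, select_add = NULL) {
  default_cols <- c("mcid", "institution", "race", "sex", "term",
                    "term_course", "term_degree", "cip6", "level",
                    "abbrev", "number")
  # Append any additional search terms supplied by the user
  search_terms <- c(default_cols, select_add)
  pattern <- paste(search_terms, collapse = "|")
  keep <- grep(pattern, names(x), ignore.case = TRUE, value = TRUE)
  x[, keep, drop = FALSE]
}

# Hypothetical toy data frame
student <- data.frame(
  mcid       = c("MCID001", "MCID002"),
  sex        = c("Female", "Male"),
  age_desc   = c("Under 25", "Under 25"),
  us_citizen = c("Yes", "No")
)

# Default set only: age_desc and us_citizen are dropped
default_only <- select_columns_sketch(student)

# The partial name "citizen" additionally keeps us_citizen
with_citizen <- select_columns_sketch(student, select_add = "citizen")
```

Because select_add is matched like any other search term, a partial name such as "citizen" is enough to retain us_citizen.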

The column names of midfield_x are searched for matches or partial matches using grep() with ignore.case = TRUE and value = TRUE, so matching is case-insensitive. Names that match or partially match a search term are returned; all other columns are dropped. Regular expressions can be used. Search terms not found are silently ignored.
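The consequences of grep()-based matching can be seen directly on a vector of column names. The names below are hypothetical, and joining several terms with "|" is an assumption about how multiple search terms are combined; the grep() calls themselves are standard base R.

```r
# Hypothetical column names, including one in upper case
cols <- c("mcid", "CIP6", "term", "term_degree", "hours_term_attempt", "level")

# A plain term matches anywhere in a name, so "term" also catches "hours_term_attempt"
any_term <- grep("term", cols, ignore.case = TRUE, value = TRUE)

# An anchored regular expression keeps only names that start with "term"
start_term <- grep("^term", cols, ignore.case = TRUE, value = TRUE)

# ignore.case = TRUE lets "cip6" find "CIP6"
cip <- grep("cip6", cols, ignore.case = TRUE, value = TRUE)

# A term with no match ("gpa") contributes nothing and raises no error
combined <- grep(paste(c("cip6", "gpa"), collapse = "|"), cols,
                 ignore.case = TRUE, value = TRUE)
```

Partial matching is convenient but can retain more columns than intended; anchoring a term with ^ or $ narrows it.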