Subset a data frame, selecting columns whose names match or partially match a vector of character strings. A convenience function to reduce the dimensions of a MIDFIELD data table by selecting only those columns that are required by other midfieldr functions or that are needed to form a composite key. Particularly useful in interactive sessions when viewing the data tables at various stages of an analysis.
Value
A data frame that is a subset of the input with the following properties: rows are preserved; columns are preserved if their names match or partially match search terms; grouping structures are not preserved. An attempt is made to return a data frame of the same class as that of the input, e.g., a base R data.frame, a tibble-style enhanced data frame, or the midfieldr default data.table enhanced data frame.
Details
Several midfieldr functions require that their input data frames contain
specific variables (column names) such as mcid or cip6. In addition,
the MIDFIELD data tables have specific variables that act as keys
or composite keys to the information in that table. All such are assembled
in a character vector that comprises the default set of column names
returned by select_required(). The default column set is
c("mcid", "institution", "race", "sex", "term", "term_course", "term_degree", "cip6", "level", "abbrev", "number").
Additional column names or partial names can be included by using the
select_add argument.
The column names of midfield_x are searched for matches or partial matches
using grep(colnames, ignore.case = TRUE, value = TRUE). Columns whose names
match or partially match a search term are returned; all other columns are
dropped. Search terms can be regular expressions. Search terms not found
are silently ignored.
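The matching behavior described above can be illustrated with a minimal base R sketch. This is not the midfieldr implementation: the data frame toy, its values, the columns gpa_cumul and hours, and the "^gpa" search term are all hypothetical, and joining the terms into a single alternation pattern is an assumption made here for brevity.

```r
# Toy data frame standing in for a MIDFIELD data table (hypothetical values)
toy <- data.frame(
  mcid        = c("MID001", "MID002"),
  institution = c("Institution A", "Institution B"),
  term_degree = c("20101", "20103"),
  cip6        = c("140801", "141001"),
  gpa_cumul   = c(3.1, 3.6),
  hours       = c(15, 12),
  stringsAsFactors = FALSE
)

# Default search terms as listed in Details, plus a hypothetical addition
default_cols <- c("mcid", "institution", "race", "sex", "term", "term_course",
                  "term_degree", "cip6", "level", "abbrev", "number")
select_add <- "^gpa"  # regular expression: names that start with "gpa"

# Assumption: combine all terms into one alternation pattern, then search
# the column names case-insensitively and return the matching names
pattern <- paste(c(default_cols, select_add), collapse = "|")
keep <- grep(pattern, names(toy), ignore.case = TRUE, value = TRUE)

# Keep the matching columns; all rows are preserved
toy_subset <- toy[, keep, drop = FALSE]
names(toy_subset)
```

In this sketch "term" partially matches term_degree, "^gpa" matches gpa_cumul, terms such as "race" and "sex" find no column and are ignored without a message, and hours is dropped because no term matches it.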