Undergoing major revision

Based on feedback from workshop attendees, the package is undergoing major revision to the vignettes and the underlying functionality.

Until the revision is complete, the package should be treated as experimental. We hope to complete the update by the end of September 2020.

Tools for student records research

The Multiple-Institution Database for Investigating Engineering Longitudinal Development (MIDFIELD) is a partnership of US higher education institutions with engineering programs. MIDFIELD contains registrar’s data for 1.7M undergraduates in all majors at 19 institutions from 1987–2019. The data are organized in four related tables: students, courses, terms, and degrees. A MIDFIELD sample is provided in the midfielddata package.

midfielddata (link) A stratified sample of MIDFIELD data. Contains data for 97,640 undergraduates at 12 institutions from 1987–2016 in four data sets: midfieldstudents, midfieldcourses, midfieldterms, and midfielddegrees.

midfieldr Tools for studying student records from midfielddata or the larger MIDFIELD database. Enables research in the intersectionality of race/ethnicity, sex, and discipline with metrics such as stickiness (retention by a discipline), migrator graduation rate, and migration yield (attraction of a discipline).
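As a rough illustration of the first of these metrics, the sketch below computes stickiness on toy data in base R. It assumes stickiness means the number of students graduating from a program divided by the number ever enrolled in that program. The data frames `ever` and `grad`, their columns, and all values are hypothetical, and this is not the midfieldr implementation.

```r
# Hypothetical toy data: one row per student per program ever enrolled in
ever <- data.frame(
  id      = c("S01", "S02", "S02", "S03", "S04", "S04"),
  program = c("Civil", "Civil", "Mechanical",
              "Mechanical", "Civil", "Mechanical"),
  stringsAsFactors = FALSE
)

# Hypothetical toy data: program each student graduated from
# (NA means the student did not graduate)
grad <- data.frame(
  id      = c("S01", "S02", "S03", "S04"),
  program = c("Civil", "Mechanical", NA, "Mechanical"),
  stringsAsFactors = FALSE
)

# Count students ever enrolled and students graduating, by program
n_ever <- table(ever$program)
n_grad <- table(factor(grad$program, levels = names(n_ever)))

# Assumed definition: graduates of a program / ever enrolled in it
stickiness <- as.numeric(n_grad) / as.numeric(n_ever)
names(stickiness) <- names(n_ever)
print(round(stickiness, 3))
```

In this toy example, a student who starts in Civil and graduates in Mechanical counts as ever enrolled in both programs but as a graduate of Mechanical only.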

For MIDFIELD partner researchers: In making the midfielddata package public, confidentiality required some MIDFIELD variables to be anonymized and others to be omitted. Thus the midfielddata data dictionary is a subset of the MIDFIELD data dictionary.


Install midfielddata first.

Because of its size, the data package is stored in a drat repository. Installation takes time; please be patient and wait for the Console prompt “>” to reappear.

# install midfielddata first
install.packages("midfielddata",
                 repos = "https://MIDFIELDR.github.io/drat/",
                 type = "source")

To confirm a successful installation, run the following to view the package help page.

? midfielddata

If the installation is successful, the code chunk above should produce a view of the help page as shown here.

[Figure: midfielddata help page]
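A non-interactive check can complement the help page, for example in a script. The sketch below relies only on base R's `requireNamespace()` and `utils::packageVersion()`. The helper name `is_installed` is hypothetical and is not part of midfielddata or midfieldr.

```r
# Hypothetical helper: TRUE if the package's namespace can be loaded.
# The package is not attached to the search path.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

if (is_installed("midfielddata")) {
  message("midfielddata ",
          as.character(utils::packageVersion("midfielddata")),
          " is installed")
} else {
  message("midfielddata not found; repeat the installation step")
}
```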

Once you have confirmed that midfielddata is installed, install midfieldr. The package is currently available from GitHub and is expected to be submitted to CRAN by September 2020.

# install from CRAN (not yet available)
# install.packages("midfieldr")

# or install the development version from GitHub (available now)
# install.packages("devtools")
devtools::install_github("MIDFIELDR/midfieldr")


The midfieldr package includes:

  • cip Data frame with 1584 observations and 6 CIP variables of program codes and names at the 2, 4, and 6-digit levels. Each observation is a unique program keyed by a 6-digit CIP code. Occupies 380 kB of memory. Data dictionary (link).

The midfielddata package contains four data sets that constitute a stratified sample of the MIDFIELD database.

  • midfieldstudents Data frame with 97,640 observations and 15 demographic variables. Each observation is a unique student keyed by student ID. Occupies 19 MB of memory. Data dictionary (link).

  • midfieldcourses Data frame with 3.5 M observations and 12 academic course variables keyed by student ID, term, and course. Each observation is one course in one term for one student. Occupies 349 MB of memory. Data dictionary (link).

  • midfieldterms Data frame with 727,369 observations and 13 academic term variables keyed by student ID and term. Each observation is one term for one student. Occupies 82 MB of memory. Data dictionary (link).

  • midfielddegrees Data frame with 97,640 observations and 5 graduation variables keyed by student ID. Each observation is a unique student. Occupies 10.2 MB of memory. Data dictionary (link).
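Because the data sets share a student ID key, they can be summarized and joined. The sketch below shows the idea with toy stand-ins in base R. The column names (`id`, `sex`, `term`, `degree`) and all values are hypothetical and are not taken from the midfielddata data dictionaries.

```r
# Toy stand-ins for three of the tables, keyed by a student ID
students <- data.frame(
  id  = c("S01", "S02", "S03"),
  sex = c("Female", "Male", "Female"),
  stringsAsFactors = FALSE
)
terms <- data.frame(
  id   = c("S01", "S01", "S02", "S03", "S03", "S03"),
  term = c(19911, 19913, 19911, 19921, 19923, 19931),
  stringsAsFactors = FALSE
)
degrees <- data.frame(
  id     = c("S01", "S02", "S03"),
  degree = c("BS Civil Engineering", NA, "BA History"),
  stringsAsFactors = FALSE
)

# Reduce the term records to one row per student
n_terms <- aggregate(term ~ id, data = terms, FUN = length)
names(n_terms)[2] <- "n_terms"

# Join on the shared student ID key
records <- merge(students, n_terms, by = "id")
records <- merge(records, degrees, by = "id")

# In this toy layout, a missing degree marks a non-graduate
records$graduated <- !is.na(records$degree)
print(records)
```

The same joins could be written with data.table or dplyr. Base R is used here only to keep the sketch free of dependencies.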


midfieldr functions work with MIDFIELD-structured data to access and manipulate student records. A typical workflow might include:

R ecosystem. midfieldr uses data.table functions and syntax. midfielddata data sets are class data.table and data.frame. However, midfieldr functions attempt to preserve data frame extensions assigned by the user (tbl for example). Thus users who prefer a different “dialect” such as base R or dplyr should find that the package is compatible with their preference.
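The point about dialects can be seen in a small sketch, assuming the data.table package is installed. The toy table `terms_dt` and its columns are hypothetical. Only data.table behavior is shown here, not any midfieldr function.

```r
if (requireNamespace("data.table", quietly = TRUE)) {
  # Toy table; column names and values are hypothetical
  terms_dt <- data.table::data.table(
    id   = c("S01", "S01", "S02"),
    term = c(19911, 19913, 19911)
  )

  # A data.table is also a data.frame
  print(class(terms_dt))

  # data.table syntax: filter rows with an expression in i
  later_dt <- terms_dt[term > 19911]

  # Base R style subsetting on the same object
  later_base <- terms_dt[terms_dt$term > 19911, ]

  # Both approaches select the same rows
  print(identical(later_dt$id, later_base$id))
}
```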

In general the midfieldr vignettes use the following packages:

  • midfieldr
  • midfielddata
  • data.table (Dowle and Srinivasan, 2020)
  • ggplot2 (Wickham, 2016)

Get started vignette (link) introduces some of the basic midfieldr functions and the midfielddata data sets. Additional vignettes develop the material in more detail.


  • Data provided by MIDFIELD (link)
  • Get citation information with citation("midfieldr")
  • This project is released with a Code of Conduct (link). If you contribute to this project you agree to abide by its terms.


Dowle, Matt and Srinivasan, Arun (2020) data.table: Extension of data.frame. R package version 1.13.0. Available at: https://CRAN.R-project.org/package=data.table.

Wickham, Hadley (2016) ggplot2: Elegant Graphics for Data Analysis. ISBN 978-3-319-24277-4; Springer-Verlag New York. Available at: https://ggplot2.tidyverse.org.