
A degree-seeking student enrolled in their first degree-granting program is a starter in that program. Identifying starters is typically performed as part of a graduation rate calculation, though it can also be a useful measure in its own right.

This vignette covers the Starters bloc, part of the Blocs step in the MIDFIELD workflow.

  1. Planning
  2. Initial processing
  3. Blocs
    • Ever-enrolled
    • FYE proxies
    • Starters
    • Graduates
  4. Groupings
  5. Metrics
  6. Displays

Special cases

In two special cases, an entering student’s CIP code does not correspond to a degree-granting program. Our procedure for identifying starters accommodates both special cases.

The first case includes records for which a CIP is unspecified or reported as “undecided”. In MIDFIELD data, both conditions are encoded as CIP 999999. Students may enter with this CIP, but we do not consider them starters unless and until they enroll in a degree-granting program. (The midfielddata practice datasets contain no undecided CIP codes.)

The second case is more nuanced. At some US institutions, engineering students are required to complete a First-Year Engineering (FYE) program as a prerequisite for declaring an engineering major. These students are admitted as Engineering majors but we don’t know to which degree-granting program they intended to transition. At the 2-digit CIP level, FYE students are starters in Engineering (CIP 14). If we do not restrict a study to 2-digit CIPs, however, we use FYE proxies—our estimates of the degree-granting engineering programs (6-digit CIP level) that FYE students would have declared had they not been required to enroll in FYE.

Definitions

bloc

A grouping of student-level data dealt with as a unit, for example, starters, students ever-enrolled, graduates, transfer students, traditional and non-traditional students, migrators, etc.

starters

Bloc of degree-seeking students in their initial terms enrolled in degree-granting programs.

entry term

A student’s first term in the database.

start term

The first term in which a student can be considered a starter. Identical to the entry term unless the student enters as undecided/unspecified.

undecided/unspecified

The MIDFIELD taxonomy includes the non-IPEDS code (CIP 999999) for Undecided or Unspecified, indicating instances in which a student has not declared a major or an institution has not recorded a program.

FYE

First-Year Engineering program, a common-first-year curriculum that is a prerequisite for declaring an engineering major at some US institutions. Denoted by its own CIP code, FYE is not a degree-granting program.

FYE proxy

Our estimate of the degree-granting engineering program in which an FYE student would have enrolled had they not been required to enroll in FYE. The proxy, a 6-digit CIP code, denotes the program of which the FYE student can be considered a starter.

Method

We use student and term to identify starters.

  1. Filter the source student-level records for data sufficiency and degree-seeking.

  2. Filter for a student’s first term not assigned an undecided/unspecified CIP code.

  3. Identify the program(s) of which a student can be considered a starter. Substitute an FYE proxy when a starting program is FYE.

  4. Filter by program.
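To make the four steps concrete, here is a minimal base R sketch run on tiny invented data. The data frames `term_toy`, `proxy_toy`, and `programs_toy` and the function `find_starters_toy()` are hypothetical stand-ins for `term`, `fye_proxy`, and `study_programs`; they are not part of midfieldr. Step 1 is reduced to a membership test against a list of baseline IDs, which in this vignette comes ready-made as `baseline_mcid`.

```r
# Hypothetical toy data standing in for term, fye_proxy, and study_programs.
# Column names mirror the vignette; all values are invented for illustration.
term_toy <- data.frame(
  mcid = c("A", "A", "B", "B", "C", "C", "D"),
  term = c(19883, 19891, 19883, 19891, 19883, 19891, 19883),
  cip6 = c("999999", "141001", "140102", "140102", "141901", "141901", "270101"),
  stringsAsFactors = FALSE
)
proxy_toy <- data.frame(mcid = "B", proxy = "141001", stringsAsFactors = FALSE)
programs_toy <- data.frame(
  program = c("EE", "ME"),
  cip6 = c("141001", "141901"),
  stringsAsFactors = FALSE
)

# Hypothetical helper sketching the four method steps in base R
find_starters_toy <- function(term, proxies, programs, baseline_ids) {
  # Step 1: keep only baseline (data-sufficient, degree-seeking) IDs
  dt <- term[term$mcid %in% baseline_ids, c("mcid", "term", "cip6")]
  # Step 2: drop undecided/unspecified, then keep each ID's earliest remaining term
  dt <- dt[dt$cip6 != "999999", ]
  first_term <- ave(dt$term, dt$mcid, FUN = min)
  dt <- unique(dt[dt$term == first_term, c("mcid", "cip6")])
  # Step 3: substitute the FYE proxy when the starting program is FYE (140102)
  dt <- merge(dt, proxies, by = "mcid", all.x = TRUE)
  dt$start <- ifelse(dt$cip6 == "140102", dt$proxy, dt$cip6)
  # Step 4: inner join on the study programs, then keep unique ID-program pairs
  labels <- data.frame(program = programs$program, start = programs$cip6,
                       stringsAsFactors = FALSE)
  dt <- merge(dt, labels, by = "start")
  unique(dt[, c("mcid", "program")])
}

starters <- find_starters_toy(term_toy, proxy_toy, programs_toy,
                              baseline_ids = c("A", "B", "C", "D"))
starters
```

In this toy case, student A starts in EE one term after entering undecided, B starts in EE through an FYE proxy, C starts in ME, and D is dropped because Mathematics is not a study program.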

Reminder.   midfielddata datasets are for practice, not research.

Load data

Start.   If you are writing your own script to follow along, we use these packages in this article:

library("midfieldr")
library("midfielddata")
library("data.table")

Load.   Practice datasets. View data dictionaries via ?student, ?term.

# Load practice data
data(student, term)

Loads with midfieldr.   Prepared data. View data dictionaries via ?study_programs, ?baseline_mcid, ?fye_proxy.

Initial processing

Select (optional).   Reduce the number of columns. Code reproduced from Getting started.

# Optional. Copy of source files with all variables
source_student <- copy(student)
source_term <- copy(term)

# Optional. Select variables required by midfieldr functions
student <- select_required(source_student)
term <- select_required(source_term)

Initialize.   Start from baseline_mcid, the student IDs from student and term already filtered for data sufficiency and degree seeking as described in Blocs.

# Working data frame
DT <- copy(baseline_mcid)

Isolate the start term

The start term is the first term in which a student can be considered a starter, that is, they are degree-seeking and not recorded as undecided/unspecified.

Add variables.   Left join to add terms and CIPs for these students.

# Term into DT left join
DT <- term[DT, .(mcid, term, cip6), on = c("mcid")]
DT
#>                   mcid  term   cip6
#>      1: MCID3111142689 19883 090401
#>      2: MCID3111142782 19883 260101
#>      3: MCID3111142782 19885 260101
#>     ---                            
#> 531417: MCID3112870009 19953 240102
#> 531418: MCID3112870009 19954 240102
#> 531419: MCID3112870009 19983 240102

Filter.   We remove observations of undecided/unspecified (CIP 999999). Any rows remaining for the same IDs will have CIPs of degree-granting programs (or FYE), allowing us to infer their preferred starting programs. (This step is required in general but has no effect on the practice data, which contain no undecided CIPs.)

# Remove undecided/unspecified
DT <- DT[!cip6 %like% "999999"]
DT
#>                   mcid  term   cip6
#>      1: MCID3111142689 19883 090401
#>      2: MCID3111142782 19883 260101
#>      3: MCID3111142782 19885 260101
#>     ---                            
#> 531417: MCID3112870009 19953 240102
#> 531418: MCID3112870009 19954 240102
#> 531419: MCID3112870009 19983 240102
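Because the practice data contain no undecided CIPs, this filter changes nothing above. The base R sketch below, using a single hypothetical student "U" who enters undecided and later declares Civil Engineering (CIP 140801), shows how removing CIP 999999 separates the entry term from the start term. We compare with `!=` rather than `%like%`; for 6-character CIP strings the two give the same result.

```r
# Hypothetical toy records: student "U" enters undecided (CIP 999999) and
# declares Civil Engineering (CIP 140801) two terms later.
records <- data.frame(
  mcid = c("U", "U", "U", "U"),
  term = c(19883, 19891, 19893, 19901),
  cip6 = c("999999", "999999", "140801", "140801"),
  stringsAsFactors = FALSE
)

# Entry term: the first term in the database, regardless of CIP
entry_term <- min(records$term)

# Start term: the first term remaining after undecided/unspecified rows are removed
decided <- records[records$cip6 != "999999", ]
start_term <- min(decided$term)
start_cip <- decided$cip6[which.min(decided$term)]

c(entry = entry_term, start = start_term)
start_cip
```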

Filter.   Order rows by ID and term, then filter to retain the start term observation. If your data contain students enrolled in more than one major in their first term, replace .SD[1] with the (slower) .SD[term == min(term)] to retain all of those majors.

# Retain observations of the earliest remaining terms by ID
setorderv(DT, cols = c("mcid", "term"))
DT <- DT[, .SD[1], by = "mcid"]
DT
#>                  mcid  term   cip6
#>     1: MCID3111142689 19883 090401
#>     2: MCID3111142782 19883 260101
#>     3: MCID3111142881 19893 450601
#>    ---                            
#> 76873: MCID3112785480 20071 240102
#> 76874: MCID3112800920 20101 240102
#> 76875: MCID3112870009 19951 240102
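The choice of row filter matters only for students with more than one major in their first term. In the base R sketch below (hypothetical students "M", with two first-term majors, and "S", with one), keeping the first row per ID mirrors .SD[1], while comparing each term against the per-ID minimum mirrors .SD[term == min(term)]. Note that which.min() returns only the index of the first minimum, so .SD[which.min(term)] also keeps a single row per ID.

```r
# Hypothetical toy records: "M" has two majors in the first term, "S" has one
records <- data.frame(
  mcid = c("M", "M", "M", "S", "S"),
  term = c(19883, 19883, 19891, 19883, 19891),
  cip6 = c("141001", "270101", "141001", "141901", "141901"),
  stringsAsFactors = FALSE
)
records <- records[order(records$mcid, records$term), ]

# Analogue of .SD[1]: exactly one row per ID, so M's second major is lost
first_row <- records[!duplicated(records$mcid), ]

# Analogue of .SD[term == min(term)]: every row in each ID's earliest term
first_term <- ave(records$term, records$mcid, FUN = min)
all_first <- records[records$term == first_term, ]

nrow(first_row)  # 2
nrow(all_first)  # 3
```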

Filter.   Remove unnecessary variables and filter for unique observations.

# Unique combinations of ID and CIP
DT <- DT[, .(mcid, cip6)]
DT <- unique(DT)
DT
#>                  mcid   cip6
#>     1: MCID3111142689 090401
#>     2: MCID3111142782 260101
#>     3: MCID3111142881 450601
#>    ---                      
#> 76873: MCID3112785480 240102
#> 76874: MCID3112800920 240102
#> 76875: MCID3112870009 240102

Starters without FYE

If and only if our study excluded Engineering programs that require FYE, the data frame just derived would be the desired bloc of starters.

In such a case, we would rename cip6 to start to make explicit that the CIP codes in this column represent programs of which the students can be considered starters. The code would have the form

# Not run
DT <- DT[, .(mcid, start = cip6)]

which retains the ID variable and changes the name of the CIP variable.

Starters with FYE

Add a variable.   Merge fye_proxy with the working data frame. The left join introduces NA in the proxy column for students not assigned an FYE proxy.

# Join the proxies to the working data frame
DT <- fye_proxy[DT, on = c("mcid")]
DT
#>                  mcid proxy   cip6
#>     1: MCID3111142689  <NA> 090401
#>     2: MCID3111142782  <NA> 260101
#>     3: MCID3111142881  <NA> 450601
#>    ---                            
#> 76873: MCID3112785480  <NA> 240102
#> 76874: MCID3112800920  <NA> 240102
#> 76875: MCID3112870009  <NA> 240102

Create a variable.   Estimated starting programs for FYE students are in the proxy column. Actual, recorded starting programs for non-FYE students are in the cip6 column. Create the start column to combine the two.

# Combine all starting CIPs
DT[, start := fcase(
  cip6 == "140102", proxy,
  cip6 != "140102", cip6
)]
DT
#>                  mcid proxy   cip6  start
#>     1: MCID3111142689  <NA> 090401 090401
#>     2: MCID3111142782  <NA> 260101 260101
#>     3: MCID3111142881  <NA> 450601 450601
#>    ---                                   
#> 76873: MCID3112785480  <NA> 240102 240102
#> 76874: MCID3112800920  <NA> 240102 240102
#> 76875: MCID3112870009  <NA> 240102 240102
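The fcase() rule can be restated as a single ifelse() in base R. The toy rows below are hypothetical: "F1" is an FYE student with a proxy, "N1" is a non-FYE student, and "F2" is an FYE student with no proxy. F2 is included only to show the edge case. Assuming fye_proxy covers every FYE starter, such a row should not arise, but if it did, its start would be NA and the later inner join by program would drop it.

```r
# Hypothetical toy rows after the left join: FYE students (CIP 140102) carry
# a proxy; non-FYE students have NA in the proxy column.
joined <- data.frame(
  mcid  = c("F1", "F2", "N1"),
  proxy = c("143501", NA, NA),
  cip6  = c("140102", "140102", "090401"),
  stringsAsFactors = FALSE
)

# Same rule as the fcase() call: FYE rows take the proxy, others keep cip6
joined$start <- ifelse(joined$cip6 == "140102", joined$proxy, joined$cip6)
joined
```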

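Select.   Preview the result without the unnecessary columns. Because the result is printed rather than assigned, DT retains proxy and cip6 for the closer look below.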

# Omit unnecessary columns.
DT[, .(mcid, start)]
#>                  mcid  start
#>     1: MCID3111142689 090401
#>     2: MCID3111142782 260101
#>     3: MCID3111142881 450601
#>    ---                      
#> 76873: MCID3112785480 240102
#> 76874: MCID3112800920 240102
#> 76875: MCID3112870009 240102

Closer look

Here we examine the records of selected students in detail.

Example 1.   In our results, this student is a starter in CIP 143501 (Industrial Engineering).

# Analysis result
DT[mcid == "MCID3111150194"]
#>              mcid  proxy   cip6  start
#> 1: MCID3111150194 143501 140102 143501

An excerpt from their record in term shows them enrolled in CIP 140102 (FYE) for three terms followed by CIP 143501. They transitioned post-FYE to Industrial Engineering, and we consider them a starter in that program.

# Sequence of term records
term[mcid == "MCID3111150194"]
#>              mcid   institution  term   cip6              level
#> 1: MCID3111150194 Institution J 19883 140102      01 First-year
#> 2: MCID3111150194 Institution J 19891 140102     02 Second-year
#> 3: MCID3111150194 Institution J 19893 140102     02 Second-year
#> 4: MCID3111150194 Institution J 19903 143501      03 Third-year
#> 5: MCID3111150194 Institution J 19911 143501     04 Fourth-year
#> 6: MCID3111150194 Institution J 19913 143501     04 Fourth-year
#> 7: MCID3111150194 Institution J 19921 143501 05 Fifth-year Plus
#> 8: MCID3111150194 Institution J 19923 143501 05 Fifth-year Plus

Example 2.   In our results, this student is a starter in CIP 141801 (Materials Engineering).

# Analysis result
DT[mcid == "MCID3111161837"]
#>              mcid  proxy   cip6  start
#> 1: MCID3111161837 141801 140102 141801

An excerpt from their record in term shows them enrolled in CIP 140102 (FYE) for three terms followed by CIP 270101 (Mathematics); they transitioned from FYE to a non-engineering major. Thus we consider them a starter in their proxy program, Materials Engineering.

# Sequence of term records
term[mcid == "MCID3111161837"]
#>               mcid   institution  term   cip6              level
#>  1: MCID3111161837 Institution J 19883 140102      01 First-year
#>  2: MCID3111161837 Institution J 19891 140102     02 Second-year
#>  3: MCID3111161837 Institution J 19893 140102     02 Second-year
#>  4: MCID3111161837 Institution J 19905 270101     02 Second-year
#>  5: MCID3111161837 Institution J 19906 270101     02 Second-year
#>  6: MCID3111161837 Institution J 19913 270101      03 Third-year
#>  7: MCID3111161837 Institution J 19921 270101      03 Third-year
#>  8: MCID3111161837 Institution J 19923 270101     04 Fourth-year
#>  9: MCID3111161837 Institution J 19931 270101     04 Fourth-year
#> 10: MCID3111161837 Institution J 19933 270101     04 Fourth-year
#> 11: MCID3111161837 Institution J 19935 270101 05 Fifth-year Plus

Example 3.   In our results, this student is a starter in CIP 140701 (Chemical Engineering).

# Analysis result
DT[mcid == "MCID3111303095"]
#>              mcid  proxy   cip6  start
#> 1: MCID3111303095 140701 140102 140701

An excerpt from their record in term shows them enrolled in CIP 140102 (FYE) for two terms and then leaving the database. Again, we consider them a starter in their proxy program, Chemical Engineering.

term[mcid == "MCID3111303095"]
#>              mcid   institution  term   cip6         level
#> 1: MCID3111303095 Institution J 19911 140102 01 First-year
#> 2: MCID3111303095 Institution J 19913 140102 01 First-year

Filter by program

Filter.   Because “starter” usually means “starter in specific programs,” this bloc concludes with a filter by program.

# Rename cip6 as start
join_labels <- copy(study_programs)
join_labels <- join_labels[, .(program, start = cip6)]

# Filter by program
DT <- join_labels[DT, on = c("start"), nomatch = NULL]
DT
#>       program  start           mcid  proxy   cip6
#>    1:      EE 141001 MCID3111142965 141001 140102
#>    2:      EE 141001 MCID3111145102 141001 140102
#>    3:     ISE 143501 MCID3111150194 143501 140102
#>   ---                                            
#> 4051:      EE 141001 MCID3112619118   <NA> 141001
#> 4052:      EE 141001 MCID3112619484   <NA> 141001
#> 4053:      ME 141901 MCID3112619666   <NA> 141901
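The nomatch = NULL argument makes the data.table join an inner join: starters whose start CIP is not among the study programs are dropped. A base R analogue uses merge(), whose default all = FALSE is also an inner join. The program labels and CIPs below are hypothetical and stand in for study_programs.

```r
# Hypothetical study programs (labels and CIPs chosen for illustration)
join_labels <- data.frame(
  program = c("EE", "ISE", "ME"),
  start   = c("141001", "143501", "141901"),
  stringsAsFactors = FALSE
)
starters <- data.frame(
  mcid  = c("S1", "S2", "S3"),
  start = c("141001", "270101", "141901"),
  stringsAsFactors = FALSE
)

# Inner join (merge's default all = FALSE), analogous to nomatch = NULL:
# S2 starts in a program outside the study and is dropped
in_study <- merge(join_labels, starters, by = "start")
in_study
```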

Select.   Omit unnecessary variables.

DT <- DT[, .(mcid, program)]
DT <- unique(DT)
DT
#>                 mcid program
#>    1: MCID3111142965      EE
#>    2: MCID3111145102      EE
#>    3: MCID3111150194     ISE
#>   ---                       
#> 4051: MCID3112619118      EE
#> 4052: MCID3112619484      EE
#> 4053: MCID3112619666      ME
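The unique() call guards against duplicates that can appear once the CIP columns are dropped. For example, if a student has two starting CIPs that map to the same program label, selecting only mcid and program yields two identical rows. The sketch below assumes, hypothetically, that a program "ISE" is defined by two CIPs; the actual mapping depends on study_programs.

```r
# Hypothetical: student "D" has two starting CIPs that both map to "ISE",
# so keeping only (mcid, program) produces a duplicate row
labeled <- data.frame(
  program = c("ISE", "ISE", "EE"),
  start   = c("143501", "143701", "141001"),
  mcid    = c("D", "D", "E"),
  stringsAsFactors = FALSE
)
selected <- labeled[, c("mcid", "program")]
nrow(selected)          # 3
nrow(unique(selected))  # 2
```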

Reusable code

Preparation.   The data frame of baseline IDs is the intake for this section.

DT <- copy(baseline_mcid)

Starters.   Summary code chunks for ready reference.

# Isolate starting term
DT <- term[DT, .(mcid, term, cip6), on = c("mcid")]
DT <- DT[!cip6 %like% "999999"]
setorderv(DT, cols = c("mcid", "term"))
DT <- DT[, .SD[1], by = "mcid"]
# Alternatively, to retain multiple first-term majors (slower)
# DT <- DT[, .SD[term == min(term)], by = "mcid"]
DT <- DT[, .(mcid, cip6)]
DT <- unique(DT)

For starters without FYE, finish by renaming cip6 to start.

# Not run
DT <- DT[, .(mcid, start = cip6)]

For starters with FYE, continue with FYE proxies.

DT <- fye_proxy[DT, .(mcid, cip6, proxy), on = c("mcid")]
DT[, start := fcase(
  cip6 == "140102", proxy,
  cip6 != "140102", cip6
)]
DT <- DT[, .(mcid, start)]

# Filter by program on start
join_labels <- copy(study_programs)
join_labels <- join_labels[, .(program, start = cip6)]
DT <- join_labels[DT, on = c("start"), nomatch = NULL]
DT <- DT[, .(mcid, program)]
DT <- unique(DT)
