4 Work with R
R by Monceau and by Duncan C are licensed under CC BY-NC 2.0; R by Marco Isler under CC BY-ND 2.0; R by Joanna Poe under CC BY-SA 2.0.
R is an open source language and environment for statistical computing and graphics [5], ranked by IEEE in 2020 as the 6th most popular programming language (Python, Java, and C are the top three) [6]. If you are new to R, some of its best features, paraphrasing Wickham [2], are:
- R is free, open source, and available on every major platform.
- R packages provide effective tools for data analysis and visualization.
- More than 17,750 open-source R packages were available as of July 2021, many of them cutting-edge tools.
RStudio, an integrated development environment (IDE) for R, includes a console, editor, and tools for plotting, history, debugging, and workspace management as well as access to GitHub for collaboration and version control [7].
4.1 Prerequisites
Before proceeding, you should have completed the steps in Install everything.
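One way to confirm from the R console that the installation step worked is sketched below. It is a minimal check, not part of the workshop materials: the `needed` vector is an assumption that lists only midfieldr (the package named later in this chapter), so extend it with whatever else you were asked to install.

```r
# Report the version of R that is running
R.version.string

# Packages to look for. Only midfieldr is named in this chapter; the rest of
# the list is up to you (this vector is an assumption made for the sketch).
needed <- c("midfieldr")

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise,
# without attaching it to the search path
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
installed

# List anything that still needs to be installed
missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
}
```

A `FALSE` entry means the package is not installed yet, so return to the installation instructions before continuing.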
4.2 Create a script
Launch your workshop project (workshop.Rproj, or whatever name you selected) to start the R session. Always work in an RStudio project environment.
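If you are unsure whether the session started inside the project, a quick console check can help. The sketch below assumes the project file sits in the top-level project folder. `has_rproj()` is a hypothetical helper written for this example, not an RStudio or base R function.

```r
# Return TRUE when `dir` contains at least one RStudio project file (.Rproj).
# Hypothetical helper for illustration only.
has_rproj <- function(dir = getwd()) {
  length(list.files(dir, pattern = "\\.Rproj$")) > 0
}

# Where is the session running?
getwd()

# Does that folder contain a project file?
has_rproj()
```

If `has_rproj()` returns `FALSE`, close the session and reopen it by launching the .Rproj file.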
Create a script:
- Use the pulldown menu File > New File > R Script.
- Use File > Save As…
- In the dialog box, navigate to your scripts directory, type a file name, for example 01-R-basics.R (file names in R can start with numerals), and click Save.
We suggest you start a new R script for each tutorial and save it to the scripts directory. For example, at the end of the workshop, your scripts directory might contain the following files:
\scripts
    \01-R-basics.R
    \02-getting-started.R
    \03-case-study-programs.R
    \04-case-study-students.R
    \etc.
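The same files can also be created from the console instead of the menus. The sketch below uses only base R. `project_dir` is a stand-in (a temporary folder) so the example can run anywhere; in your own session it would be the folder that contains your .Rproj file.

```r
# Stand-in for the project root (an assumption made for this sketch)
project_dir <- file.path(tempdir(), "workshop")
scripts_dir <- file.path(project_dir, "scripts")

# Create the scripts directory, and the stand-in project folder, if missing
dir.create(scripts_dir, recursive = TRUE, showWarnings = FALSE)

# One script per tutorial, using the file names from the listing above
script_names <- c(
  "01-R-basics.R",
  "02-getting-started.R",
  "03-case-study-programs.R",
  "04-case-study-students.R"
)
script_paths <- file.path(scripts_dir, script_names)

# file.create() truncates existing files, so only create the missing ones
new_paths <- script_paths[!file.exists(script_paths)]
created <- file.create(new_paths)

# Confirm the contents of the scripts directory
list.files(scripts_dir)
```

Skipping files that already exist matters here: calling `file.create()` on an existing script would erase its contents.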
4.3 New to R?
Prerequisites should be completed before proceeding. By the end of the workshop, our R beginners will have made progress on two or possibly three tutorials:
- R basics: An introduction to R.
- Getting started: Examine the MIDFIELD practice data.
- Case study programs: Gather CIP codes and program names.
If there is still time remaining, continue to any tutorial listed in the After the workshop section.
4.4 Familiar with R?
Prerequisites should be completed before proceeding. By the end of the workshop, our more experienced R users will have made substantive progress on two or possibly three tutorials:
- Getting started: Examine the MIDFIELD practice data.
- Case study programs: Gather CIP codes and program names.
- Case study students: Gather students who pass the data sufficiency criterion.
If there is still time remaining, continue to any tutorial listed in the After the workshop section.
4.5 After the workshop
At this point, your learning is self-directed. Choose the skills you want to continue working on. We have tutorials for graph basics and data basics, tutorials that continue the case study tour of midfieldr, and detailed vignettes for closer study of midfieldr functionality and student unit record analysis.
4.5.1 R skills
The basic skills tutorials take about an hour each.
4.5.2 Case study
The case study is a quick tour of a typical workflow using Student Unit Record (SUR) data. This is a “big picture” treatment: functions are used without detailed explanation so that we can focus on the logic of the analysis.
4.5.3 Vignettes
The vignettes are a deep dive into midfieldr functionality. The workflow follows the same general pattern as the quicker case study, but pauses at each function to examine its arguments and strategies for use. In general, each vignette is self-contained, so you may enter at almost any point.
- Program codes and names: Practice strategies for searching cip for the programs we want to study.
- Subsetting MIDFIELD data: Use program codes to subset the MIDFIELD data tables.
- Data sufficiency: What it is and how it is applied to student unit-record (SUR) data.
- Timely completion: What it is and how it is applied to SUR data.
- FYE programs: What they are and how they are accommodated with SUR data.
- Multiway graphs: How to graph and interpret a common data structure encountered when working with SUR data.
- Tabulating data: How to tabulate multiway data for publication.
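To give a feel for the first two vignettes, the sketch below searches a table of program names by keyword and then uses the matching codes to subset a student table. It uses base R only and does not call any midfieldr function. The tables `toy_cip` and `toy_students` and their column names are made up for illustration and are not the MIDFIELD data.

```r
# Toy program table: a code and a name per program (illustrative only)
toy_cip <- data.frame(
  cip6 = c("140801", "141001", "141901", "270101"),
  cip6name = c(
    "Civil Engineering, General",
    "Electrical and Electronics Engineering",
    "Mechanical Engineering",
    "Mathematics, General"
  ),
  stringsAsFactors = FALSE
)

# Step 1: search program names for a keyword, ignoring case
hits <- toy_cip[grepl("engineering", toy_cip$cip6name, ignore.case = TRUE), ]

# Toy student table: one row per student with a program code (illustrative only)
toy_students <- data.frame(
  id = c("S01", "S02", "S03", "S04", "S05"),
  cip6 = c("140801", "270101", "141901", "141001", "270101"),
  stringsAsFactors = FALSE
)

# Step 2: keep only the students enrolled in one of the matching programs
in_study <- toy_students[toy_students$cip6 %in% hits$cip6, ]
in_study
```

Storing the codes as character strings rather than numbers is a deliberate choice in this sketch, because it preserves any leading zeros in a code.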