Transform a data frame such that two independent categorical variables are factors with levels ordered for display in a multiway dot plot. Multiway data comprise a single quantitative value (or response) for every combination of levels of two categorical variables. The ordering of the rows and panels is crucial to the perception of effects (Cleveland, 1993).
Arguments
- dframe
Data frame containing a single quantitative value (or response) for every combination of levels of two categorical variables. Categories may be class character or factor. Two additional numeric columns are required when using the "percent" ordering method.
- quantity
Character, name (in quotes) of the single multiway quantitative variable
- categories
Character, vector of names (in quotes) of the two multiway categorical variables
- ...
Not used for passing values; forces subsequent arguments to be referable only by name.
- method
Character, “median” (default) or “percent”, method of ordering the levels of the categories. The median method computes the medians of the quantitative column grouped by category. The percent method computes percentages based on the same ratio underlying the quantitative percentage variable except grouped by category.
- ratio_of
Character vector with the names (in quotes) of the numerator and denominator columns that produced the quantitative variable, required when
method
is "percent". Names can be in any order; the algorithm assumes that the parameter with the larger column sum is the denominator of the ratio.
Value
A data frame in data.table
format with
the following properties: rows are preserved; columns specified by
categories
are converted to factors and ordered; the column specified
by quantity
is converted to type double; other columns are preserved
with the exception that columns added by the function overwrite existing
columns of the same name (if any); grouping structures are not preserved.
The added columns are:
CATEGORY_median
columns (when ordering method is "median")Numeric. Two columns of medians of the quantitative variable grouped by the categorical variables. The
CATEGORY
placeholder in the column name is replaced by a category name from thecategories
argument. For example, supposecategories = c("program", "people")
andmethod = "median"
. The two new column names would beprogram_median
andpeople_median.
CATEGORY_QUANTITY
columns (when ordering method is "percent")Numeric. Two columns of percentages based on the same ratio that produces the quantitative variable except grouped by the categorical variables. The
CATEGORY
placeholder in the column name is replaced by a category name from thecategories
argument; theQUANTITY
placeholder is replaced by the quantitative variable name in thequantity
argument. For example, supposecategories = c("program", "people")
, andquantity = "grad_rate"
, andmethod = "percent"
. The two new column names would beprogram_grad_rate
andpeople_grad_rate.
Details
In our context, "multiway" refers to the data structure and graph design defined by Cleveland (1993), not to the methods of analysis described by Kroonenberg (2008).
Multiway data comprise three variables: a categorical variable of m levels; a second independent categorical variable of n levels; and a quantitative variable (or response) of length mn that cross-classifies the categories, that is, there is a value of the response for each combination of levels of the two categorical variables.
In a multiway dot plot, one category is encoded by the panels, the second category is encoded by the rows of each panel, and the quantitative variable is encoded along identical horizontal scales.
References
Cleveland WS (1993). Visualizing Data. Hobart Press, Summit, NJ.
Kroonenberg PM (2008). Applied Multiway Data Analysis. Wiley, Hoboken, NJ.
Examples
# Subset of built-in data set
dframe <- study_results[program == "EE" | program == "ME"]
dframe[, people := paste(race, sex)]
#> program sex race ever_enrolled graduates stickiness
#> <char> <char> <char> <int> <int> <num>
#> 1: EE Female Asian 21 12 57.1
#> 2: EE Female Black 6 3 50.0
#> 3: EE Female International 28 9 32.1
#> 4: EE Female Latine 8 3 37.5
#> 5: EE Female Native American 1 0 0.0
#> 6: EE Female Other/Unknown 7 3 42.9
#> 7: EE Female White 118 56 47.5
#> 8: EE Male Asian 123 71 57.7
#> 9: EE Male Black 29 17 58.6
#> 10: EE Male International 195 90 46.2
#> 11: EE Male Latine 45 17 37.8
#> 12: EE Male Native American 3 0 0.0
#> 13: EE Male Other/Unknown 42 16 38.1
#> 14: EE Male White 864 439 50.8
#> 15: ME Female Asian 7 1 14.3
#> 16: ME Female Black 3 2 66.7
#> 17: ME Female International 19 11 57.9
#> 18: ME Female Latine 12 8 66.7
#> 19: ME Female Other/Unknown 8 4 50.0
#> 20: ME Female White 213 134 62.9
#> 21: ME Male Asian 76 49 64.5
#> 22: ME Male Black 30 19 63.3
#> 23: ME Male International 178 89 50.0
#> 24: ME Male Latine 79 42 53.2
#> 25: ME Male Native American 5 1 20.0
#> 26: ME Male Other/Unknown 80 41 51.2
#> 27: ME Male White 1596 953 59.7
#> program sex race ever_enrolled graduates stickiness
#> people
#> <char>
#> 1: Asian Female
#> 2: Black Female
#> 3: International Female
#> 4: Latine Female
#> 5: Native American Female
#> 6: Other/Unknown Female
#> 7: White Female
#> 8: Asian Male
#> 9: Black Male
#> 10: International Male
#> 11: Latine Male
#> 12: Native American Male
#> 13: Other/Unknown Male
#> 14: White Male
#> 15: Asian Female
#> 16: Black Female
#> 17: International Female
#> 18: Latine Female
#> 19: Other/Unknown Female
#> 20: White Female
#> 21: Asian Male
#> 22: Black Male
#> 23: International Male
#> 24: Latine Male
#> 25: Native American Male
#> 26: Other/Unknown Male
#> 27: White Male
#> people
dframe[, c("race", "sex") := NULL]
#> program ever_enrolled graduates stickiness people
#> <char> <int> <int> <num> <char>
#> 1: EE 21 12 57.1 Asian Female
#> 2: EE 6 3 50.0 Black Female
#> 3: EE 28 9 32.1 International Female
#> 4: EE 8 3 37.5 Latine Female
#> 5: EE 1 0 0.0 Native American Female
#> 6: EE 7 3 42.9 Other/Unknown Female
#> 7: EE 118 56 47.5 White Female
#> 8: EE 123 71 57.7 Asian Male
#> 9: EE 29 17 58.6 Black Male
#> 10: EE 195 90 46.2 International Male
#> 11: EE 45 17 37.8 Latine Male
#> 12: EE 3 0 0.0 Native American Male
#> 13: EE 42 16 38.1 Other/Unknown Male
#> 14: EE 864 439 50.8 White Male
#> 15: ME 7 1 14.3 Asian Female
#> 16: ME 3 2 66.7 Black Female
#> 17: ME 19 11 57.9 International Female
#> 18: ME 12 8 66.7 Latine Female
#> 19: ME 8 4 50.0 Other/Unknown Female
#> 20: ME 213 134 62.9 White Female
#> 21: ME 76 49 64.5 Asian Male
#> 22: ME 30 19 63.3 Black Male
#> 23: ME 178 89 50.0 International Male
#> 24: ME 79 42 53.2 Latine Male
#> 25: ME 5 1 20.0 Native American Male
#> 26: ME 80 41 51.2 Other/Unknown Male
#> 27: ME 1596 953 59.7 White Male
#> program ever_enrolled graduates stickiness people
data.table::setcolorder(dframe, c("program", "people"))
# Class before ordering
class(dframe$program)
#> [1] "character"
class(dframe$people)
#> [1] "character"
# Class and levels after ordering
mw1 <- order_multiway(dframe,
quantity = "stickiness",
categories = c("program", "people"))
class(mw1$program)
#> [1] "factor"
levels(mw1$program)
#> [1] "EE" "ME"
class(mw1$people)
#> [1] "factor"
levels(mw1$people)
#> [1] "Native American Female" "Native American Male" "Asian Female"
#> [4] "Other/Unknown Male" "International Female" "Latine Male"
#> [7] "Other/Unknown Female" "International Male" "Latine Female"
#> [10] "White Female" "White Male" "Black Female"
#> [13] "Black Male" "Asian Male"
# Display category medians
mw1
#> program people stickiness ever_enrolled graduates
#> <fctr> <fctr> <num> <int> <int>
#> 1: EE Asian Female 57.1 21 12
#> 2: EE Black Female 50.0 6 3
#> 3: EE International Female 32.1 28 9
#> 4: EE Latine Female 37.5 8 3
#> 5: EE Native American Female 0.0 1 0
#> 6: EE Other/Unknown Female 42.9 7 3
#> 7: EE White Female 47.5 118 56
#> 8: EE Asian Male 57.7 123 71
#> 9: EE Black Male 58.6 29 17
#> 10: EE International Male 46.2 195 90
#> 11: EE Latine Male 37.8 45 17
#> 12: EE Native American Male 0.0 3 0
#> 13: EE Other/Unknown Male 38.1 42 16
#> 14: EE White Male 50.8 864 439
#> 15: ME Asian Female 14.3 7 1
#> 16: ME Black Female 66.7 3 2
#> 17: ME International Female 57.9 19 11
#> 18: ME Latine Female 66.7 12 8
#> 19: ME Other/Unknown Female 50.0 8 4
#> 20: ME White Female 62.9 213 134
#> 21: ME Asian Male 64.5 76 49
#> 22: ME Black Male 63.3 30 19
#> 23: ME International Male 50.0 178 89
#> 24: ME Latine Male 53.2 79 42
#> 25: ME Native American Male 20.0 5 1
#> 26: ME Other/Unknown Male 51.2 80 41
#> 27: ME White Male 59.7 1596 953
#> program people stickiness ever_enrolled graduates
#> program_median people_median
#> <num> <num>
#> 1: 44.55 35.70
#> 2: 44.55 58.35
#> 3: 44.55 45.00
#> 4: 44.55 52.10
#> 5: 44.55 0.00
#> 6: 44.55 46.45
#> 7: 44.55 55.20
#> 8: 44.55 61.10
#> 9: 44.55 60.95
#> 10: 44.55 48.10
#> 11: 44.55 45.50
#> 12: 44.55 10.00
#> 13: 44.55 44.65
#> 14: 44.55 55.25
#> 15: 57.90 35.70
#> 16: 57.90 58.35
#> 17: 57.90 45.00
#> 18: 57.90 52.10
#> 19: 57.90 46.45
#> 20: 57.90 55.20
#> 21: 57.90 61.10
#> 22: 57.90 60.95
#> 23: 57.90 48.10
#> 24: 57.90 45.50
#> 25: 57.90 10.00
#> 26: 57.90 44.65
#> 27: 57.90 55.25
#> program_median people_median
# Existing factors (if any) are re-ordered
mw2 <- dframe
mw2$program <- factor(mw2$program, levels = c("ME", "EE"))
# Levels before conditioning
levels(mw2$program)
#> [1] "ME" "EE"
# Levels after conditioning
mw2 <- order_multiway(dframe,
quantity = "stickiness",
categories = c("program", "people"))
levels(mw2$program)
#> [1] "EE" "ME"
# Ordering using percent method
order_multiway(dframe,
quantity = "stickiness",
categories = c("program", "people"),
method = "percent",
ratio_of = c("graduates", "ever_enrolled"))
#> program people graduates ever_enrolled stickiness
#> <fctr> <fctr> <num> <num> <num>
#> 1: EE Asian Female 12 21 57.1
#> 2: EE Black Female 3 6 50.0
#> 3: EE International Female 9 28 32.1
#> 4: EE Latine Female 3 8 37.5
#> 5: EE Native American Female 0 1 0.0
#> 6: EE Other/Unknown Female 3 7 42.9
#> 7: EE White Female 56 118 47.5
#> 8: EE Asian Male 71 123 57.7
#> 9: EE Black Male 17 29 58.6
#> 10: EE International Male 90 195 46.2
#> 11: EE Latine Male 17 45 37.8
#> 12: EE Native American Male 0 3 0.0
#> 13: EE Other/Unknown Male 16 42 38.1
#> 14: EE White Male 439 864 50.8
#> 15: ME Asian Female 1 7 14.3
#> 16: ME Black Female 2 3 66.7
#> 17: ME International Female 11 19 57.9
#> 18: ME Latine Female 8 12 66.7
#> 19: ME Other/Unknown Female 4 8 50.0
#> 20: ME White Female 134 213 62.9
#> 21: ME Asian Male 49 76 64.5
#> 22: ME Black Male 19 30 63.3
#> 23: ME International Male 89 178 50.0
#> 24: ME Latine Male 42 79 53.2
#> 25: ME Native American Male 1 5 20.0
#> 26: ME Other/Unknown Male 41 80 51.2
#> 27: ME White Male 953 1596 59.7
#> program people graduates ever_enrolled stickiness
#> program_stickiness people_stickiness
#> <num> <num>
#> 1: 49.4 46.4
#> 2: 49.4 55.6
#> 3: 49.4 42.6
#> 4: 49.4 55.0
#> 5: 49.4 0.0
#> 6: 49.4 46.7
#> 7: 49.4 57.4
#> 8: 49.4 60.3
#> 9: 49.4 61.0
#> 10: 49.4 48.0
#> 11: 49.4 47.6
#> 12: 49.4 12.5
#> 13: 49.4 46.7
#> 14: 49.4 56.6
#> 15: 58.7 46.4
#> 16: 58.7 55.6
#> 17: 58.7 42.6
#> 18: 58.7 55.0
#> 19: 58.7 46.7
#> 20: 58.7 57.4
#> 21: 58.7 60.3
#> 22: 58.7 61.0
#> 23: 58.7 48.0
#> 24: 58.7 47.6
#> 25: 58.7 12.5
#> 26: 58.7 46.7
#> 27: 58.7 56.6
#> program_stickiness people_stickiness