SGP: analyzeSGP – R documentation

analyzeSGP

Analyze student data to produce student growth percentiles and student growth projections

Description

Wrapper function used to produce student growth percentiles and student growth projections (both cohort and baseline referenced) using long formatted data like that provided by prepareSGP.

Usage

analyzeSGP(sgp_object,
         state=NULL,
         years=NULL,
         content_areas=NULL,
         grades=NULL,
         sgp.percentiles=TRUE,
         sgp.projections=TRUE,
         sgp.projections.lagged=TRUE,
         sgp.percentiles.baseline=TRUE,
         sgp.projections.baseline=TRUE,
         sgp.projections.lagged.baseline=TRUE,
         sgp.percentiles.baseline.max.order=3,
         sgp.projections.baseline.max.order=3,
         sgp.projections.lagged.baseline.max.order=3,
         sgp.projections.max.forward.progression.years=3,
         sgp.projections.max.forward.progression.grade=NULL,
         sgp.projections.use.only.complete.matrices=NULL,
         sgp.minimum.default.panel.years=NULL,
         sgp.use.my.coefficient.matrices=NULL,
         sgp.use.my.sgp_object.baseline.coefficient.matrices=NULL,
         sgp.test.cohort.size=NULL,
         return.sgp.test.results=FALSE,
         simulate.sgps=TRUE,
         calculate.simex=NULL,
         calculate.simex.baseline=NULL,
         goodness.of.fit.print=TRUE,
         sgp.config=NULL,
         sgp.config.drop.nonsequential.grade.progression.variables=TRUE,
         sgp.baseline.panel.years=NULL,
         sgp.baseline.config=NULL,
         trim.sgp.config=TRUE,
         parallel.config=NULL,
         verbose.output=FALSE,
         print.other.gp=NULL,
         sgp.projections.projection.unit="YEAR",
         get.cohort.data.info=FALSE,
         sgp.sqlite=FALSE,
         sgp.percentiles.equated=NULL,
         sgp.percentiles.equating.method=NULL,
         sgp.percentiles.calculate.sgps=TRUE,
         SGPt=NULL,
         fix.duplicates=NULL,
         ...)

Arguments

`sgp_object`	An object of class `SGP` containing long formatted data in the `@Data` slot (from `prepareSGP`).
`state`	Acronym indicating the state associated with the data, used to access embedded knots and boundaries, cutscores, CSEMs, and other state-related assessment data.
`years`	A vector indicating year(s) in which to produce student growth percentiles and/or student growth projections/trajectories. If missing, the function will use the data to infer the year(s), assuming that at least three years of panel data are available for analyses.
`content_areas`	A vector indicating content area(s) in which to produce student growth percentiles and/or student growth projections/trajectories. If left missing the function will use the data to infer the content area(s) available for analyses.
`grades`	A vector indicating grades for which to calculate student growth percentiles and/or student growth projections/trajectories. If left missing the function will use the data to infer all the grade progressions for student growth percentile and student growth projections/trajectories analyses.
`sgp.percentiles`	Boolean variable indicating whether to calculate student growth percentiles. Defaults to TRUE.
`sgp.projections`	Boolean variable indicating whether to calculate student growth projections. Defaults to TRUE.
`sgp.projections.lagged`	Boolean variable indicating whether to calculate lagged student growth projections often used for growth to standard analyses. Defaults to TRUE.
`sgp.percentiles.baseline`	Boolean variable indicating whether to calculate baseline student growth percentiles and/or coefficient matrices. Defaults to TRUE.
`sgp.projections.baseline`	Boolean variable indicating whether to calculate baseline student growth projections. Defaults to TRUE.
`sgp.projections.lagged.baseline`	Boolean variable indicating whether to calculate lagged baseline student growth projections. Defaults to TRUE.
`sgp.percentiles.baseline.max.order`	Integer indicating the maximum order to calculate baseline student growth percentiles (regardless of maximum coefficient matrix order). Also the max order of baseline coefficient matrices to be calculated if requested. Default is 3. To utilize the maximum matrix order, set to NULL.
`sgp.projections.baseline.max.order`	Integer indicating the maximum order to calculate baseline student growth projections (regardless of maximum coefficient matrix order). Default is 3. To utilize the maximum matrix order, set to NULL.
`sgp.projections.lagged.baseline.max.order`	Integer indicating the maximum order to calculate lagged baseline student growth projections (regardless of maximum coefficient matrix order). Default is 3. To utilize the maximum matrix order, set to NULL.
`sgp.projections.max.forward.progression.years`	Integer indicating the maximum number of years forward that cohort based projections will be established for. Default is 3 years.
`sgp.projections.max.forward.progression.grade`	Integer indicating the maximum grade forward that cohort based projections will be established for. Default is NULL, the highest grade.
`sgp.projections.use.only.complete.matrices`	Boolean argument (defaults to TRUE/NULL) indicating whether to produce projections only when a complete set of coefficient matrices is available.
`sgp.minimum.default.panel.years`	Integer indicating the minimum number of panel years to use for default SGP analyses. Default value is NULL, which is converted to 3 years of data.
`sgp.use.my.coefficient.matrices`	Argument, defaults to NULL, indicating whether to use coefficient matrices embedded in the object supplied to `sgp_object` to calculate student growth percentiles.
`sgp.use.my.sgp_object.baseline.coefficient.matrices`	Argument, defaults to NULL (FALSE), indicating whether to utilize baseline matrices embedded in supplied `sgp_object` and not utilize baseline matrices embedded in `SGPstateData`.
`sgp.test.cohort.size`	Integer indicating the maximum number of students sampled from the full cohort to use in the calculation of student growth percentiles. Intended to be used as a test of the desired analyses to be run. The default, NULL, uses no restrictions (no tests are performed, and analyses use the entire cohort of students).
`return.sgp.test.results`	Boolean variable passed to `studentGrowthPercentiles` indicating whether the results from the cohort sample subset (if specified using `sgp.test.cohort.size`) should be returned for inspection. Defaults to FALSE. If TRUE, only the sample subset of the data used will be returned in the SGP object's `@Data` slot. Alternatively, the user can supply the character "ALL_DATA" to return the entire original data.
`simulate.sgps`	Boolean variable indicating whether to simulate SGP values for students based on test-specific Conditional Standard Errors of Measurement (CSEM). Test CSEM data must be available for simulation and included in `SGPstateData`. This argument must be set to TRUE for confidence interval construction. Defaults to TRUE.
`calculate.simex`	A character state acronym or list including state/csem variable, csem.data.vnames, csem.loss.hoss, simulation.iterations, lambda and extrapolation method. Returns both SIMEX adjusted SGP (`SGP_SIMEX`) as well as the percentile ranked SIMEX SGP (`RANK_SIMEX`) values as suggested by Castellano and McCaffrey (2017). Defaults to NULL, no simex calculations performed. Alternatively, setting the argument to TRUE sets the list up with state=state, lambda=seq(0,2,0.5), simulation.iterations=50, simex.sample.size=25000, extrapolation="linear" and save.matrices=TRUE.
`calculate.simex.baseline`	A character state acronym or list including state/csem variable, csem.data.vnames, csem.loss.hoss, simulation.iterations, lambda and extrapolation method. Defaults to NULL, no simex calculations performed. Alternatively, setting the argument to TRUE uses the same defaults as above along with `simex.use.my.coefficient.matrices = TRUE`, which assumes baseline SIMEX coefficient matrices are available.
`goodness.of.fit.print`	Boolean variable indicating whether to print out Goodness of Fit figures as PDF into a directory labeled Goodness of Fit. Defaults to TRUE.
`sgp.config`	If `years`, `content_areas`, and `grades` are missing, the user can directly specify a list containing three vectors: `sgp.content.areas`, `sgp.panel.years`, and `sgp.grade.sequences`. This advanced option is helpful for analysis of non-traditional grade progressions and other special cases. See examples for use cases.
`sgp.config.drop.nonsequential.grade.progression.variables`	Boolean variable (defaults to TRUE) indicating whether non-sequential grade progression variables should be dropped when sgp.config is processed. For example, if a grade progression of c(3,4,6) is provided, the data configuration will assume (default is TRUE) that data for a missing year needs to be dropped prior to applying `studentGrowthPercentiles` or `studentGrowthProjections` to the data.
`sgp.baseline.panel.years`	A vector of years to be used for baseline coefficient matrix calculation. Default is to use most recent five years of data.
`sgp.baseline.config`	A list containing three vectors: `baseline.content.areas`, `baseline.panel.years`, and `baseline.grade.sequences`, indicating how baseline student growth percentile analyses are to be conducted. In almost all cases this value is calculated by default within the function but can be specified directly for advanced use cases. See source code for more detail on this configuration option.
`trim.sgp.config`	A Boolean variable indicating whether the arguments `content_areas`, `years` and `grades` should be used to 'trim' any manually supplied configuration for analysis supplied by 'sgp.config'.
`parallel.config`	A named list with, at a minimum, two elements: 1) BACKEND, the package to be used for parallel computation, and 2) WORKERS, the number of processors to be used in each major analysis. The BACKEND element can be set to `FOREACH` or `PARALLEL`. Please consult the manuals and vignettes of these packages for more information. TYPE is a third element of the `parallel.config` list that provides necessary information when using the FOREACH or PARALLEL packages as the backend. With BACKEND="FOREACH", the TYPE element specifies the flavor of 'foreach' backend. As of version 1.0-1.0, only "doParallel" is supported. If BACKEND="PARALLEL", the `parallel` package will be used. This package combines the deprecated parallel packages `snow` and `multicore`. Using the "snow" implementation of `parallel`, the function will create a cluster object based on the TYPE element specified and the number of workers requested (see the WORKERS description below). The TYPE element indicates the user's preferred cluster type (either "PSOCK" for a socket cluster or "MPI" for an OpenMPI cluster). If Windows is the operating system, this "snow" implementation must be used and the TYPE element must be "PSOCK". If TYPE is missing, a default is assigned based on the operating system. Unix/Mac OS defaults to the "multicore" implementation, which avoids worker node pre-scheduling and appears to be more efficient on these operating systems.
The WORKERS element must contain, at a minimum, a single number of processors (nodes) desired or available. If WORKERS is specified in this manner, the same number of processors will be used for each analysis type (sgp.percentiles, sgp.projections, ... sgp.projections.lagged.baseline). Alternatively, the user may specify the number of processors used for each analysis. This allows for better memory management on systems that do not have enough RAM available per core.
The choice of the number of cores is a balance between the number of processors available, the amount of RAM a system has, and the size of the data (sgp_object). Each system will be different and will require some tailoring. One rule of thumb used by the authors is to allow for 4GB of memory per core when running large state data. The SGP Demonstration (and data of that size) requires closer to 1-2GB per core. As an example, PERCENTILES=4 and PROJECTIONS=2 might be used on a quad core machine with 4 GB of RAM. This will use all 4 available cores for the sgp.percentiles analysis and only 2 cores for the sgp.projections analysis, which requires more memory per core.
The WORKERS list accepts these elements: PERCENTILES, PROJECTIONS (for both cohort and baseline referenced projections), LAGGED_PROJECTIONS (for both cohort and baseline referenced lagged projections), BASELINE_MATRICES (used to produce the baseline coefficient matrices when not available in SGPstateData; very computationally intensive), and BASELINE_PERCENTILES (SGP calculation only, when baseline coefficient matrices have already been produced and are available; NOT very computationally intensive). Alternatively, the name of an external CLUSTER.OBJECT (PSOCK or MPI) set up by the user outside of the function can be used. Example use cases are provided below.
`verbose.output`	A Boolean argument (defaults to FALSE) indicating whether the function should output verbose diagnostic messages.
`print.other.gp`	A Boolean argument (defaults to NULL/FALSE) indicating whether the function should output SGPs of all orders.
`sgp.projections.projection.unit`	A character vector argument indicating whether the studentGrowthProjections function should produce projections relative to future grades or future years. Options are "YEAR" and "GRADE", with default being "YEAR".
`get.cohort.data.info`	A Boolean argument (defaults to FALSE) indicating whether a summary of all cohorts to be submitted to the `studentGrowthPercentiles` and `studentGrowthProjections` functions should be performed prior to analysis.
`sgp.sqlite`	A Boolean argument (defaults to FALSE) indicating whether a SQLite database file of the essential SGP data should be created from the `@Data` slot and subsequently used to extract data subsets for analysis with the `studentGrowthPercentiles` and `studentGrowthProjections` functions. If the size of the `@Data` object is greater than 1 GB, `sgp.sqlite` is set to TRUE internally. When TRUE, this can substantially reduce the amount of RAM required to conduct analyses. If set to TRUE, the file "TMP_SGP_Data.sqlite" will be created in the R temporary directory (see `?tempdir` for information). This file is deleted by default, although it can be kept by specifying the argument as the character "KEEP".
`sgp.percentiles.equated`	A Boolean argument (defaults to NULL/FALSE) indicating whether equating should be used on the most recent year of test data provided. Equating allows for student growth projections to be calculated across assessment transitions where the scale for the assessment changes.
`sgp.percentiles.equating.method`	A character vector (defaults to NULL/'equipercentile') indicating the type of equating method to use. Options include any combination of 'identity', 'mean', 'linear', and 'equipercentile'.
`sgp.percentiles.calculate.sgps`	A Boolean argument (defaults to TRUE) indicating whether to calculate percentiles in calls to studentGrowthPercentiles function. Setting to FALSE would indicate desire to calculate only coefficient matrices and no percentiles.
`SGPt`	An argument supplied to implement time-dependent SGP analyses (SGPt). Default is NULL, giving standard, non-time-dependent analyses. If set to TRUE, the function assumes the variables 'TIME' and 'TIME_LAG' are supplied as part of the panel.data. To specify other names, supply a list of the form: list(TIME='my_time_name', TIME_LAG='my_time_lag_name'), substituting your variable names.
`fix.duplicates`	Argument to control how duplicate records based upon the key of VALID_CASE, CONTENT_AREA, YEAR, and ID are dealt with. If set to 'KEEP.ALL', the function tries to fix the duplicate individual records by adding a '_DUP_***' suffix to the duplicate ID before running `studentGrowthPercentiles` in order to create unique records based upon the key. See `combineSGP` for additional info on `fix.duplicates` functionality.
`...`	Arguments to be passed to `studentGrowthPercentiles` or `studentGrowthProjections` for finer control over SGP calculations. NOTE: arguments can only be passed to one lower level function at a time, and only student growth percentiles OR projections can be created but not both at the same time.
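
The worker-count guidance given for `parallel.config` can be turned into a small calculation. The following is a minimal sketch in base R and is not part of the SGP package: `workers_for` and `make_parallel_config` are hypothetical helpers, and the per-worker memory budgets (1 GB for percentiles, 2 GB for projections) are assumptions loosely based on the 1-2GB per core figure quoted for data the size of the SGP Demonstration.

```r
# Hypothetical helper (not part of SGP): number of workers that fit
# within the available cores and RAM, given a per-worker memory budget.
workers_for <- function(cores, ram_gb, gb_per_worker) {
  max(1L, min(as.integer(cores), as.integer(floor(ram_gb / gb_per_worker))))
}

# Hypothetical helper (not part of SGP): assemble a parallel.config list.
# The memory budgets are assumptions; tune them for your own data.
make_parallel_config <- function(cores, ram_gb,
                                 gb_percentiles = 1,
                                 gb_projections = 2,
                                 windows = .Platform$OS.type == "windows") {
  config <- list(
    BACKEND = "PARALLEL",
    WORKERS = list(
      PERCENTILES = workers_for(cores, ram_gb, gb_percentiles),
      PROJECTIONS = workers_for(cores, ram_gb, gb_projections),
      LAGGED_PROJECTIONS = workers_for(cores, ram_gb, gb_projections)))
  # Windows must use a PSOCK socket cluster; on other systems TYPE is
  # left unset so that the operating-system default applies.
  if (windows) config$TYPE <- "PSOCK"
  config
}

# Quad core machine with 4 GB of RAM, treated as a Windows system
quad_core <- make_parallel_config(cores = 4, ram_gb = 4, windows = TRUE)
str(quad_core)
```

With 4 cores and 4 GB of RAM this reproduces the PERCENTILES=4, PROJECTIONS=2 split described for `parallel.config`. For large state data, a budget nearer the authors' 4GB per core rule of thumb may be more appropriate. The resulting list would be passed as `analyzeSGP(..., parallel.config = quad_core)`.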

Value

Function returns an object of class `SGP` containing the long data set in the `@Data` slot as a data.table keyed using VALID_CASE, CONTENT_AREA, YEAR, ID, and the student growth percentile and/or student growth projection/trajectory results in the `@SGP` slot.

Author(s)

Damian W. Betebenner dbetebenner@nciea.org and Adam Van Iwaarden vaniwaarden@colorado.edu

Examples

## Not run: 
## analyzeSGP is Step 2 of 5 of abcSGP
Demonstration_SGP <- sgpData_LONG
Demonstration_SGP <- prepareSGP(Demonstration_SGP)
Demonstration_SGP <- analyzeSGP(Demonstration_SGP)

## Or (explicitly pass state argument)

Demonstration_SGP <- prepareSGP(sgpData_LONG)
Demonstration_SGP <- analyzeSGP(Demonstration_SGP, state="DEMO")

###
###  Example uses of the sgp.config argument
###

#  Use only 3 years of data, for grades 3 to 6,
#  and only perform analyses for the most recent year (2013_2014)

my.custom.config <- list(
MATHEMATICS.2013_2014 = list(
	sgp.content.areas=rep("MATHEMATICS", 3), # Note, must be same length as sgp.panel.years
	sgp.panel.years=c('2011_2012', '2012_2013', '2013_2014'),
	sgp.grade.sequences=list(3:4, 3:5, 4:6)),
READING.2013_2014 = list(
	sgp.content.areas=rep("READING", 3),
	sgp.panel.years=c('2011_2012', '2012_2013', '2013_2014'),
	sgp.grade.sequences=list(3:4, 3:5, 4:6)))

Demonstration_SGP <- prepareSGP(sgpData_LONG)
Demonstration_SGP <- analyzeSGP(Demonstration_SGP,
	sgp.config=my.custom.config,
	sgp.percentiles.baseline = FALSE,
	sgp.projections.baseline = FALSE,
	sgp.projections.lagged.baseline = FALSE,
	simulate.sgps=FALSE)


##  Another example sgp.config list:

#  Use different CONTENT_AREA priors, and only 1 year of prior data
my.custom.config <- list(
MATHEMATICS.2013_2014.READ_PRIOR = list(
	sgp.content.areas=c("READING", "MATHEMATICS"),
	sgp.panel.years=c('2012_2013', '2013_2014'),
	sgp.grade.sequences=list(3:4, 4:5, 5:6)),
READING.2013_2014.MATH_PRIOR = list(
	sgp.content.areas=c("MATHEMATICS", "READING"),
	sgp.panel.years=c('2012_2013', '2013_2014'),
	sgp.grade.sequences=list(3:4, 4:5, 5:6)))


## An example showing multiple priors within a single year

Demonstration_SGP <- prepareSGP(sgpData_LONG)

DEMO.config <- list(
READING.2012_2013 = list(
	sgp.content.areas=c("MATHEMATICS", "READING", "MATHEMATICS", "READING", "READING"),
	sgp.panel.years=c('2010_2011', '2010_2011', '2011_2012', '2011_2012', '2012_2013'),
	sgp.grade.sequences=list(c(3,3,4,4,5), c(4,4,5,5,6), c(5,5,6,6,7), c(6,6,7,7,8))),
MATHEMATICS.2012_2013 = list(
	sgp.content.areas=c("READING", "MATHEMATICS", "READING", "MATHEMATICS", "MATHEMATICS"),
	sgp.panel.years=c('2010_2011', '2010_2011', '2011_2012', '2011_2012', '2012_2013'),
	sgp.grade.sequences=list(c(3,3,4,4,5), c(4,4,5,5,6), c(5,5,6,6,7), c(6,6,7,7,8))))

Demonstration_SGP <- analyzeSGP(
		Demonstration_SGP,
		sgp.config=DEMO.config,
		sgp.projections=FALSE,
		sgp.projections.lagged=FALSE,
		sgp.percentiles.baseline=FALSE,
		sgp.projections.baseline=FALSE,
		sgp.projections.lagged.baseline=FALSE,
		sgp.config.drop.nonsequential.grade.progression.variables=FALSE)


###
###  Example uses of the parallel.config argument
###

##  Windows users must use a snow socket cluster:
#  possibly a quad core machine with low RAM,
#  4 workers for percentiles, 2 workers for projections.
#  Note the PSOCK type cluster is used for single machines.

Demonstration_SGP <- prepareSGP(sgpData_LONG)
Demonstration_SGP <- analyzeSGP(Demonstration_SGP,
	parallel.config=list(
		BACKEND="PARALLEL", TYPE="PSOCK",
		WORKERS=list(PERCENTILES=4,
                    PROJECTIONS=2,
                    LAGGED_PROJECTIONS=2,
                    BASELINE_PERCENTILES=4)))

##  New parallel package - only available with R 2.14.0 or newer
#  Note there are up to 16 workers, and MPI is used,
#  suggesting this example is for an HPC cluster (not Windows, which requires TYPE="PSOCK").
	...
	parallel.config=list(
		BACKEND="PARALLEL", TYPE="MPI",
		WORKERS=list(PERCENTILES=16,
                    PROJECTIONS=8,
                    LAGGED_PROJECTIONS=6,
                    BASELINE_PERCENTILES=12))
	...

## FOREACH use cases:
	...
	parallel.config=list(
		BACKEND="FOREACH", TYPE="doParallel",
		WORKERS=3)
	...


#  NOTE:  This list of parallel.config specifications is NOT exhaustive.
#  See examples in analyzeSGP documentation for some others.

###
###  Advanced Example: restrict years, recalculate baseline SGP
###  coefficient matrices, and use parallel processing
###

#  Remove existing DEMO baseline coefficient matrices from
#  the SGPstateData object so that new ones will be computed.

SGPstateData$DEMO$Baseline_splineMatrix <- NULL

#  set up a customized sgp.config list

	. . .

#  set up a customized sgp.baseline.config list

	. . .

#  to be completed


## End(Not run)
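
The `sgp.config` lists in the examples above note that `sgp.content.areas` must be the same length as `sgp.panel.years`. A quick check before calling `analyzeSGP` might look like the following minimal sketch in base R. `check_sgp_config` is a hypothetical helper, not part of the SGP package, and the second check (no grade sequence longer than the number of panel years) is an assumption inferred from the examples rather than a documented rule.

```r
# Hypothetical helper (not part of SGP): return a character vector of
# problems found in an sgp.config list (empty if none are found).
check_sgp_config <- function(sgp.config) {
  problems <- character(0)
  for (config_name in names(sgp.config)) {
    entry <- sgp.config[[config_name]]
    n_years <- length(entry$sgp.panel.years)
    # Stated in the examples: content areas must match panel years in length
    if (length(entry$sgp.content.areas) != n_years) {
      problems <- c(problems, paste0(config_name,
        ": sgp.content.areas and sgp.panel.years differ in length"))
    }
    # Assumption inferred from the examples: a grade sequence should not
    # be longer than the number of panel years supplied
    too_long <- vapply(entry$sgp.grade.sequences,
                       function(grades) length(grades) > n_years,
                       logical(1))
    if (any(too_long)) {
      problems <- c(problems, paste0(config_name,
        ": a grade sequence is longer than sgp.panel.years"))
    }
  }
  problems
}

good_config <- list(
  MATHEMATICS.2013_2014 = list(
    sgp.content.areas = rep("MATHEMATICS", 3),
    sgp.panel.years = c("2011_2012", "2012_2013", "2013_2014"),
    sgp.grade.sequences = list(3:4, 3:5, 4:6)))

# Same configuration with one content area too few
bad_config <- good_config
bad_config$MATHEMATICS.2013_2014$sgp.content.areas <- rep("MATHEMATICS", 2)

check_sgp_config(good_config)
check_sgp_config(bad_config)
```

The helper only inspects list structure, so it runs without the SGP package or any data loaded. It does not confirm that the years, content areas, or grades actually exist in the `@Data` slot.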

SGP

Student Growth Percentiles & Percentile Growth Trajectories

v1.9-5.0

GPL-3

Authors

Damian W. Betebenner [aut, cre], Adam R. Van Iwaarden [aut], Ben Domingue [aut], Yi Shang [aut], Jonathan Weeks [ctb], John Stewart [ctb], Jinnie Choi [ctb], Xin Wei [ctb], Hi Shin Shim [ctb], Xiaoyuan Tan [ctb] (Arizona Department of Education), Carrie Giovannini [ctb] (Arizona Department of Education), Sarah Polasky [ctb] (Arizona State University), Rebecca Gau [ctb] (Arizona Charter School Association), Jeffrey Dean [ctb] (University of Arkansas), William Bonk [ctb] (Colorado Department of Education), Marie Huchton [ctb] (Colorado Department of Education), Allison Timberlake [ctb] (Georgia Department of Education), Qi Qin [ctb] (Georgia Department of Education), Melissa Fincher [ctb] (Georgia Department of Education), Kiran Athota [ctb] (Georgia Department of Education), Travis Allen [ctb] (Georgia Department of Education), Glenn Hirata [ctb] (Hawaii Department of Education), Glenn Nochi [ctb] (Hawaii Department of Education), Joshua Lee [ctb] (Hawaii Department of Education), Ayaka Nukui [ctb] (Idaho Department of Education), Carissa Miller [ctb] (Idaho Department of Education), Matthew Raimondi [ctb] (Elgin Area School District U46 (Illinois)), Wes Bruce [ctb] (Indiana Department of Education), Robert Hochsegang [ctb] (Indiana Department of Education), Tony Moss [ctb] (Kansas State Department of Education), Xuewen Sheng [ctb] (Kansas State Department of Education), Kathy Flanagan [ctb] (Massachusetts Department of Elementary and Secondary Education), Robert Lee [ctb] (Massachusetts Department of Elementary and Secondary Education), Ji Zeng [ctb] (Michigan Department of Education), Steve Viger [ctb] (Michigan Department of Education), Joe DeCastra [ctb] (Mississippi Department of Education), Ken Thompson [ctb] (Mississippi Department of Education), Soo Yeon Cho [ctb] (Missouri Department of Education), Jeff Halsell [ctb] (Clark County School District, Nevada), Selcuk Ozdemir [ctb] (Washoe County School District, Nevada), Roger Silva [ctb] (Nevada Department of 
Education), Deb Wiswell [ctb] (New Hampshire Department of Education), Katya Levitan-Reiner [ctb] (New Haven Public Schools), Catherine McCaslin [ctb] (New Haven Public Schools), Joshua Marland [ctb] (New York Education Department), W Joshua Rew [ctb] (Oregon Department of Education), Jason Becker [ctb] (Rhode Island Department of Education), Jessica Bailey [ctb] (Rhode Island Department of Education), Ana Karantonis [ctb] (Rhode Island Department of Education), Deborah Jonas [ctb] (Virginia Department of Education), Juan D'Brot [ctb] (West Virginia Department of Education), Nate Hixson [ctb] (West Virginia Department of Education), Deb Came [ctb] (Washington Office of Superintendent of Public Instruction), Ashley Colburn [ctb] (Washington Office of Superintendent of Public Instruction), Nick Hassell [ctb] (Washington Office of Superintendent of Public Instruction), Krissy Johnson [ctb] (Washington Office of Superintendent of Public Instruction), Daniel Bush [ctb] (Wisconsin Department of Education), Justin Meyer [ctb] (Wisconsin Department of Education), Joseph Newton [ctb] (Wisconsin Department of Education), Nick Stroud [ctb] (Wisconsin Department of Education), John Paul [ctb] (Wyoming Department of Education), Michael Flicek [ctb] (Michael Flicek Projects LLC working with Wyoming Department of Education), Phyllis Clay [ctb] (Albuquerque Public Schools), Peter Kinyua [ctb] (Albuquerque Public Schools), Brendan Houng [ctb] (University of Melbourne, Australia, NAPLAN), Leslie Rosale [ctb] (Ministry of Education, Guatemala), Nathan Wall [ctb] (eMetric working with Nevada Department of Education and South Dakota Department of Education), Narek Sahakyan [ctb] (World Class Instruction and Design (WIDA))

Initial release

2020-1-30