Package 'ImputeRobust' reference manual

Title:	Robust Multiple Imputation with Generalized Additive Models for Location Scale and Shape
Description:	Provides new imputation methods for the 'mice' package based on generalized additive models for location, scale, and shape (GAMLSS) as described in de Jong, van Buuren and Spiess <doi:10.1080/03610918.2014.911894>.
Authors:	Daniel Salfran [aut, cre], Martin Spieß [aut, ths]
Maintainer:	Daniel Salfran
License:	GPL-3
Version:	1.3-1
Built:	2025-03-31 04:10:27 UTC
Source:	https://github.com/dsalfran/imputerobust

Multiple Imputation with Generalized Additive Models for Location, Scale, and Shape

Description

De Jong (2012) and De Jong, van Buuren and Spiess (2016) introduced a new imputation method based on generalized additive models for location, scale, and shape (GAMLSS; Rigby and Stasinopoulos, 2005). GAMLSS is a class of univariate regression models in which the exponential family assumption is relaxed and replaced by a general distribution family. Compared with standard parametric imputation models, this allows more flexible modelling of not only the location (e.g., the mean) but also the scale (e.g., the variance) and the shape (e.g., skewness and kurtosis) of the conditional distribution of the dependent variable given all other variables.
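
For orientation, the model class can be written roughly as below, following the general formulation of Rigby and Stasinopoulos (2005). The symbols are introduced here for illustration only and are not used elsewhere in this manual: D is the chosen distribution family, theta_1 to theta_4 stand for mu, sigma, nu and tau, g_k is a link function, X_k beta_k is the linear part, and the h_jk are smooth functions of single covariates.

```latex
y_i \sim \mathcal{D}(\mu_i, \sigma_i, \nu_i, \tau_i), \qquad
g_k(\theta_k) = X_k \beta_k + \sum_{j=1}^{J_k} h_{jk}(x_{jk}), \quad k = 1, \dots, 4
```

Families with fewer than four parameters simply drop the unused equations.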

Author(s)

Daniel Salfran

Martin Spiess

References

de Jong, R., van Buuren, S. and Spiess, M. (2016). Multiple Imputation of Predictor Variables Using Generalized Additive Models. Communications in Statistics – Simulation and Computation, 45(3), 968–985.

de Jong, R. (2012). Robust Multiple Imputation. Universität Hamburg. http://ediss.sub.uni-hamburg.de/volltexte/2012/5971/.

Rigby, R. A. and Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(3), 507–554.

GAMLSS bootstrap method

Description

Creates a random generation function for the missing values, using a bootstrap sample from the GAMLSS model fitted to the completely observed data.

Usage

ImpGamlssBootstrap(incomplete.data, fit, R, ...)

Arguments

`incomplete.data`	Data frame with missing values on one variable.
`fit`	Random sample generator method.
`R`	Boolean matrix with the response indicator.
`...`	extra arguments for the control of the gamlss fitting function

Value

Returns an imputation sample generator.

GAMLSS imputation fit

Description

This function takes one data set to fit a gamlss model and another to predict the expected parameter values. It returns a function that generates a vector of random observations for the predicted parameters. The number of random observations equals the number of units in the data set used to obtain the predictions.

Usage

ImpGamlssFit(data, new.data, family, n.ind.par, gam.mod,
  mod.planb = list(type = "pb", par = list(degree = 1, order = 1)),
  n.par.planb = n.ind.par, lin.terms = NULL, n.cyc = 5, bf.cyc = 5,
  cyc = 5, forceNormal = FALSE, trace = FALSE, ...)

Arguments

`data`	Completely observed data frame used to fit the gamlss model.
`new.data`	Data frame used to predict the parameter values for given right-hand-side x-values of the gamlss model.
`family`	Family to be used for the response variable on the GAMLSS estimation.
`n.ind.par`	Number of individual parameters to be fitted. Currently it only allows one or two because of stability issues for more parameters.
`gam.mod`	list with the parameters of the GAMLSS imputation model.
`mod.planb`	list with the parameters of the alternative GAMLSS imputation model.
`n.par.planb`	number of individual parameters in the alternative model.
`lin.terms`	Character vector specifying which (if any) predictor variables should enter the model linearly.
`n.cyc`	number of cycles of the gamlss algorithm
`bf.cyc`	number of cycles in the backfitting algorithm
`cyc`	number of cycles of the fitting algorithm
`forceNormal`	Flag that, if set to 'TRUE', will use a normal family for the gamlss estimation as a last resort.
`trace`	whether to print at each iteration (TRUE) or not (FALSE)
`...`	extra arguments for the control of the gamlss fitting function

Value

Returns a method to generate random samples for the fitted gamlss model using "new.data" as covariates.
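
The "fit, predict, return a generator" pattern described above can be sketched in base R. This is a minimal illustration and not the package's implementation: `make_generator` is a hypothetical helper, a normal linear model from `lm()` stands in for the GAMLSS fit, and the response is assumed to sit in the first column of `data`.

```r
# Hypothetical stand-in for the idea behind ImpGamlssFit():
# fit on complete data, predict for new.data, return a generator closure.
make_generator <- function(data, new.data) {
  # Assumption: the response is the first column of `data`.
  response <- names(data)[1]
  predictors <- setdiff(names(data), response)
  # Assumption: a normal linear model replaces the GAMLSS fit.
  fit <- lm(reformulate(predictors, response), data = data)
  mu <- predict(fit, newdata = new.data)   # predicted location
  sigma <- summary(fit)$sigma              # residual standard deviation
  # The closure draws one random value per row of new.data.
  function() rnorm(nrow(new.data), mean = mu, sd = sigma)
}

set.seed(1)
x <- rnorm(50)
observed <- data.frame(y = 2 * x + rnorm(50), x = x)
to_predict <- data.frame(x = c(-1, 0, 1))

draw <- make_generator(observed, to_predict)
draws <- draw()   # one draw for each of the three rows in to_predict
```

Each call to `draw()` produces a fresh vector, so the same fitted model can serve several imputation draws.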

Multiple Imputation with Generalized Additive Models for Location, Scale, and Shape.

Description

Imputes univariate missing data using a generalized additive model for location, scale, and shape.

Usage

mice.impute.gamlss(y, ry, x, family = NO, n.ind.par = 2,
  fitted.gam = NULL, gam.mod = list(type = "pb"), EV = TRUE, ...)

mice.impute.gamlssNO(y, ry, x, fitted.gam = NULL, EV = TRUE, ...)

mice.impute.gamlssBI(y, ry, x, fitted.gam = NULL, EV = TRUE, ...)

mice.impute.gamlssJSU(y, ry, x, fitted.gam = NULL, EV = TRUE, ...)

mice.impute.gamlssPO(y, ry, x, fitted.gam = NULL, EV = TRUE, ...)

mice.impute.gamlssTF(y, ry, x, fitted.gam = NULL, EV = TRUE, ...)

mice.impute.gamlssGA(y, ry, x, fitted.gam = NULL, EV = TRUE, ...)

mice.impute.gamlssZIBI(y, ry, x, fitted.gam = NULL, EV = TRUE, ...)

mice.impute.gamlssZIP(y, ry, x, fitted.gam = NULL, EV = TRUE, ...)

fit.gamlss(y, ry, x, family = NO, n.ind.par = 2,
  gam.mod = list(type = "pb"), ...)

Arguments

`y`	Numeric vector with incomplete data.
`ry`	Response pattern of 'y' ('TRUE'=observed, 'FALSE'=missing).
`x`	Design matrix with 'length(y)' rows and 'p' columns containing complete covariates.
`family`	Distribution family to be used by GAMLSS. It defaults to NO but a range of families can be defined by calling the corresponding "gamlssFAMILY" method.
`n.ind.par`	Number of parameters from the distribution family to be individually estimated.
`fitted.gam`	A predefined bootstrap gamlss method returned by `fit.gamlss`. Mice by default refits the model with each imputation. The parameter is here for a future faster modified mice function.
`gam.mod`	list with the parameters of the GAMLSS imputation model.
`EV`	Logical value determining whether extreme imputed values are corrected. Such values can arise when the gamlss model is too flexible.
`...`	extra arguments for the control of the gamlss fitting function

Details

Imputation of y using generalized additive models for location, scale, and shape. A model is fitted with the observed part of the data set. Then a bootstrap sample is generated and used to refit the model and generate imputations.

The function fit.gamlss handles the fitting and the bootstrap and returns a method to generate imputations.

Because gamlss is a flexible nonparametric method, there may be problems with the fitting and imputation depending on the sample size. The imputation functions try to handle anomalies automatically, but the results should still be inspected.
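
The fit, bootstrap, refit and impute cycle described above can be sketched in base R. This is one possible reading of the description, not the package's code: `impute_sketch` is a hypothetical function, `lm()` with normal errors stands in for the GAMLSS fit, and the bootstrap is assumed to be parametric (new responses drawn from the fitted model).

```r
# Hypothetical sketch of the imputation cycle with lm() as a stand-in model.
impute_sketch <- function(y, ry, x) {
  x <- as.data.frame(x)
  obs <- data.frame(y = y[ry], x[ry, , drop = FALSE])

  # Step 1: fit the model to the observed part of the data.
  fit <- lm(y ~ ., data = obs)

  # Step 2 (assumed parametric bootstrap): draw a new response
  # from the fitted model at the observed covariate values.
  boot <- obs
  boot$y <- rnorm(nrow(obs), mean = fitted(fit), sd = summary(fit)$sigma)

  # Step 3: refit the model on the bootstrap sample.
  refit <- lm(y ~ ., data = boot)

  # Step 4: draw imputations for the rows where y is missing.
  mu <- predict(refit, newdata = x[!ry, , drop = FALSE])
  rnorm(sum(!ry), mean = mu, sd = summary(refit)$sigma)
}

set.seed(42)
x <- cbind(x1 = rnorm(100), x2 = rnorm(100))
y <- 1 + x[, "x1"] - x[, "x2"] + rnorm(100)
ry <- runif(100) > 0.3   # TRUE = observed, FALSE = missing
y[!ry] <- NA

imputed <- impute_sketch(y, ry, x)
```

The refit on a bootstrap sample is what lets the imputations reflect uncertainty about the model parameters as well as residual noise.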

Value

Numeric vector with imputed values for missing y values

Author(s)

Daniel Salfran

References

de Jong, R. (2012). Robust Multiple Imputation. Universität Hamburg. http://ediss.sub.uni-hamburg.de/volltexte/2012/5971/.

Rigby, R. A. and Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(3), 507–554.

Examples


library(mice)          # mice(), pool()
library(ImputeRobust)  # gamlss imputation methods and sample.data
library(lattice)
# Create the imputed data sets

predMat <- matrix(rep(0,25), ncol = 5)
predMat[4,1] <- 1
predMat[4,5] <- 1
predMat[2,1] <- 1
predMat[2,5] <- 1
predMat[2,4] <- 1
predMat[3,1] <- 1
predMat[3,5] <- 1
predMat[3,4] <- 1
predMat[3,2] <- 1
imputed.sets <- mice(sample.data, m = 2,
                     method = c("", "gamlssPO",
                                "gamlss", "gamlssBI", ""),
                     visitSequence = "monotone",
                     predictorMatrix = predMat,
                     maxit = 1, seed = 973,
                     n.cyc = 1, bf.cyc = 1,
                     cyc = 1)

fit <- with(imputed.sets, lm(y ~ X.1 + X.2 + X.3 + X.4))
summary(pool(fit))

stripplot(imputed.sets)


Model creator

Description

This is a helper function to be used within the gamlss fitting procedure. It automatically creates a formula object for the variables named in a given data frame. The dependent variable is the one in the first column and the rest are treated as independent.

Usage

ModelCreator(data, gam.model, lin.terms = NULL)

Arguments

`data`	Data frame that will provide the named variables.
`gam.model`	List of model parameters, containing the "type", with c("linear", "cs", "pb") as available choices, and "par", an optional list of parameters used if the model is not linear.
`lin.terms`	Specify which predictors should be included linearly. For example, binary variables can be added directly as an additive term instead of defining a spline.

Value

Returns a formula object.
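
To illustrate the kind of formula such a helper might produce, here is a minimal base R sketch. It is not the package's implementation: `build_formula` is a hypothetical function, it ignores the optional "par" list, and it only wraps each non-linear predictor in the chosen smoother name.

```r
# Hypothetical sketch: build a formula from the column names of a data frame.
build_formula <- function(data, type = "pb", lin.terms = NULL) {
  response <- names(data)[1]      # first column is the dependent variable
  predictors <- names(data)[-1]   # remaining columns are predictors
  # A predictor gets a smooth term unless it is listed in lin.terms
  # or the model type is "linear".
  smooth <- !(predictors %in% lin.terms) & type != "linear"
  terms <- ifelse(smooth, paste0(type, "(", predictors, ")"), predictors)
  as.formula(paste(response, "~", paste(terms, collapse = " + ")))
}

df <- data.frame(y = 1:3, a = 1:3, b = c(0, 1, 0))
f <- build_formula(df, type = "pb", lin.terms = "b")
deparse(f)   # "y ~ pb(a) + b"
```

Building the formula does not evaluate `pb()`, so the sketch runs without gamlss loaded; fitting with the formula would require it.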

Sample data set with a monotone missing pattern

Description

A simple data set with monotone missing pattern

Format

A data frame with 200 rows and the following 5 variables:

X.1: Numeric variable from a Normal distribution
X.2: Count data from a Poisson distribution
X.3: Numeric variable from a Normal distribution
X.4: Binary variable from a Binomial distribution
y: Response variable

Details

Sample data set with four predictors and a dependent variable. A monotone missing pattern was generated in three predictors to illustrate the gamlss imputation method.

For the data generation process a parameter beta equal to c(1.3, .8, 1.5, 2.5) and a predictor matrix X <- cbind(X.1, X.2, X.3, X.4) are defined. Then, the sample data set is created with the model y ~ X.1 + X.2 + X.3 + X.4.
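
A data set of this shape could be simulated along the following lines. This is a sketch under stated assumptions, not the script that produced `sample.data`: the distribution parameters, the standard normal error term, the absence of an intercept, the missingness proportions and the choice of which predictors are incomplete are all assumed for illustration.

```r
set.seed(2024)   # arbitrary seed, chosen for this sketch
n <- 200

# Predictors; all distribution parameters are assumptions.
X.1 <- rnorm(n)
X.2 <- rpois(n, lambda = 2)
X.3 <- rnorm(n)
X.4 <- rbinom(n, size = 1, prob = 0.5)

beta <- c(1.3, .8, 1.5, 2.5)
X <- cbind(X.1, X.2, X.3, X.4)

# Assumption: linear predictor without intercept plus standard normal noise.
y <- as.vector(X %*% beta) + rnorm(n)
sim <- data.frame(X.1, X.2, X.3, X.4, y)

# Assumed monotone pattern: if X.4 is missing, X.2 is missing too,
# and if X.2 is missing, X.3 is missing too.
miss.X4 <- runif(n) < 0.10
miss.X2 <- miss.X4 | runif(n) < 0.10
miss.X3 <- miss.X2 | runif(n) < 0.10
sim$X.4[miss.X4] <- NA
sim$X.2[miss.X2] <- NA
sim$X.3[miss.X3] <- NA
```

With a monotone pattern the incomplete variables can be ordered so that each one is imputed from variables that are complete or already imputed.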

Examples


head(sample.data)


Tropical Atmosphere Ocean (TAO) project data

Description

A sample from the Tropical Atmosphere Ocean (TAO) project data, downloaded from the GGOBI project.

Format

A data frame with 736 observations on the following 8 variables.

Year: a numeric vector
Latitude: a numeric vector
Longitude: a numeric vector
Sea.Surface.Temp: a numeric vector
Air.Temp: a numeric vector
Humidity: a numeric vector
UWind: a numeric vector
VWind: a numeric vector

Details

All cases recorded for five locations and two time periods.

Source

https://github.com/ggobi/ggobi/blob/master/data/tao.csv

Examples


head(tao)

