f.spade {SPADE.RIVM}    R Documentation

SPADE: a Statistical Program to Assess Dietary Exposure

Description

Function for estimating habitual intake distributions as a function of age, based on two short-term intakes per person.
f.spade handles both daily and episodical intakes.
Results can be presented per age unit or per age class, and scaled survey weights are allowed.
Comparisons with cut-off values (up to two EARs, one AI and up to two ULs) are integrated in SPADE.

Usage

	f.spade(frml.ia, frml.if, data, min.age, max.age, sex.lab, 
		weights.name = NULL, lambda = NULL, ia.method="one.pos", 
		outlier.ok=F, automatic.ok = T, backtrans.nr = 1, n.ppa = 5000,
		eps.prob = 0.0001, eps.norm = 0.0001, prb = c(0.05, 0.25, 0.5, 0.75, 0.95), 
		EAR.names = NULL, EAR.distr = NULL, EAR.vc = NULL, AI.names = NULL, 
		UL.names = NULL, age.classes = NULL, dt.pop = NULL, 
		verbose=T, plot.dev = 4, colors.ok=T, R.ok = F,	
		spade.output.path=SPADE.OUTPUT.PATH, csv.output.path=CSV.OUTPUT.PATH,
		output.name = NULL, dgts = 3, dgts.distr = 0, dgts.dri = 1,
		bootstrap.ok=F, boot3.ok=F, boot.dt.name=NULL)

Arguments

Arguments without an = sign must be supplied by the user. The value after an = sign is the default value, which f.spade uses when the argument is not given explicitly in the user command.

frml.ia

formula, for modelling the intake amounts, e.g. frml.ia = response ~ fp(age), where response is the micronutrient, food or food component to model and fp(age) denotes a fractional polynomial of age. At present no other covariates are allowed.

frml.if

formula or string, the formula for the intake frequencies.
For daily intakes use frml.if = "no.if" to indicate that no intake frequency modelling is needed.
For episodical intakes use frml.if = response ~ cs(age), where response is the micronutrient, food or food component to model and cs(age) denotes a cubic spline of age, a more flexible way of modelling the intake frequencies as a function of age than fp(age).

data

data frame, with exactly 2 observations (rows) per individual and with three mandatory variables: id = the individual's id, age = the individual's age, sex = the individual's gender.
The variable response used in the two formulas above must also be a column name in data.
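
As a rough illustration of this layout, the base R sketch below builds a small made-up data frame and checks the requirements stated above. All values, the use of folate as the response and the coding of sex are placeholders and are not taken from the DNFCS data; how f.spade expects sex to be coded is not stated in this help page.

```r
# Toy data in the layout described above: two rows per individual.
# All values are invented; the coding of sex is an assumption.
toy <- data.frame(
  id     = rep(1:3, each = 2),
  age    = rep(c(8, 25, 60), each = 2),
  sex    = rep(c("female", "male", "female"), each = 2),
  folate = c(180, 210, 250, 230, 300, 280)
)

frml.ia  <- folate ~ fp(age)       # a formula object; fp() is not evaluated here
response <- all.vars(frml.ia)[1]   # name of the response variable

# Check 1: the mandatory columns and the response are present
cols.ok <- all(c("id", "age", "sex", response) %in% names(toy))
# Check 2: exactly two observations per individual
rows.ok <- all(table(toy$id) == 2)

cols.ok && rows.ok
```

Running such checks before calling f.spade may help to catch layout problems early.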

min.age

number, the minimum age in the analysis.

max.age

number, the maximum age in the analysis.

sex.lab

string, (name between "quotes"), which indicates the gender in the analysis
use sex.lab = "male" or sex.lab = "men" for men
use sex.lab = "female" or sex.lab = "women" for women
use sex.lab = "both" for men and women together.

weights.name

string, the name of the column with survey weights in data
use e.g. weights.name = "weights" for column name "weights".

lambda

number, default NULL, which means that lambda is estimated by maximum likelihood (ML). For a log transformation use lambda=0; for a square root transformation use lambda=0.5.
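
The two special cases mentioned above are consistent with a Box-Cox-type power transformation. The sketch below assumes that family; the function name boxcox.sketch is hypothetical, and the exact form used inside f.spade should be checked in the manual.

```r
# Box-Cox-type power transformation (an assumption, not SPADE's own code)
boxcox.sketch <- function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

x <- c(1, 4, 9)
log.scale  <- boxcox.sketch(x, 0)     # identical to log(x)
sqrt.scale <- boxcox.sketch(x, 0.5)   # 2 * (sqrt(x) - 1), linear in sqrt(x)
```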

ia.method

string, indicating the method for modelling the intake amounts for episodical intakes.
Default ia.method="one.pos" uses the positive intake of each individual with one positive and one zero intake on the two survey days, plus one randomly chosen intake of each individual with two positive intakes.
ia.method="two.pos" uses only the intakes of individuals with two positive intakes; the positive intakes of individuals with one positive and one zero intake are not used.
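
The difference between the two selection rules can be sketched in base R as follows. This is only an illustration of the description above on invented data in a wide layout (columns day1 and day2 are hypothetical); it is not the code used inside f.spade.

```r
set.seed(1)
# Invented intakes on two survey days for four individuals
intakes <- data.frame(id   = 1:4,
                      day1 = c(0, 120, 80, 0),
                      day2 = c(50, 0, 95, 0))
n.pos <- (intakes$day1 > 0) + (intakes$day2 > 0)   # number of positive days

single <- intakes[n.pos == 1, ]   # one positive and one zero intake
double <- intakes[n.pos == 2, ]   # two positive intakes

# "one.pos": the single positive intake, plus one random intake of the doubles
pick    <- sample(1:2, nrow(double), replace = TRUE)
one.pos <- c(pmax(single$day1, single$day2),
             ifelse(pick == 1, double$day1, double$day2))

# "two.pos": only the intakes of individuals with two positive intakes
two.pos <- c(double$day1, double$day2)
```

Individuals with two zero intakes (id 4 here) contribute no amounts under either rule.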

outlier.ok

logical, option for detecting potential outliers in the amounts part during the fit of the model. Default outlier.ok=FALSE or briefly outlier.ok=F.
Use outlier.ok=T or outlier.ok=TRUE to detect outliers.

automatic.ok

logical, default automatic.ok=T means that f.spade runs the whole analysis without stopping. To run f.spade step by step use automatic.ok=F; at each stopping point, marked by output on the screen (text or plots), the user is asked to press Enter to continue.

backtrans.nr

number, indicating the way of back-transformation. Default backtrans.nr = 1, for a fast back-transformation.
backtrans.nr = 0 means no back-transformation, only model fitting.
backtrans.nr = 1 means for daily intakes an exact back-transformation and for episodical intakes a Monte Carlo simulation with n.ppa persons per age for the convolution of the intake probability distribution and the habitual amounts distribution to obtain the habitual intake distribution.
backtrans.nr = 2 means back-transformation by pseudo sampling, simulating n.ppa pseudo persons per individual.
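
To give an idea of what a Monte Carlo convolution of an intake probability distribution and a habitual amounts distribution involves, here is a generic sketch. The beta and lognormal distributions and all parameter values are invented for illustration; they are assumptions and do not reproduce the models or the algorithm used by f.spade.

```r
set.seed(42)
n.ppa <- 5000                                    # simulated persons for one age

# Invented distributions, for illustration only
p.intake <- rbeta(n.ppa, shape1 = 2, shape2 = 3)           # intake probability
amount   <- rlnorm(n.ppa, meanlog = 4, sdlog = 0.4)        # habitual amount

# Habitual intake = probability of intake on a day x amount on intake days
habitual <- p.intake * amount

quantile(habitual, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))
```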

n.ppa

number, the number of persons per age (*) or the number of pseudo-persons per individual (**) to simulate in the back-transformation.
(*) = episodical intakes with convolution (backtrans.nr = 1)
(**) = daily or episodical intakes with pseudo persons (backtrans.nr = 2)

eps.prob

number, default is eps.prob = 0.0001 (as shown under Usage), the precision of the numerical approximation of the percentiles of the habitual intake distribution, for daily intakes and backtrans.nr=1.
This option is intended for expert R users only.

eps.norm

number, default is eps.norm = 0.0001 (as shown under Usage), the precision of the numerical approximation of the proportions under or above a threshold, for daily intakes and backtrans.nr=1.
This option is intended for expert R users only.

prb

vector, with probabilities for the percentiles of the habitual intake distribution to be reported.
Default prb = c(0.05, 0.25, 0.5, 0.75, 0.95) for the 5th, 25th, 50th, 75th and 95th percentiles respectively.

EAR.names

character vector, with the name(s) of the R data frames with thresholds such as EARs, for which the proportion <= the threshold is estimated.
Use e.g. EAR.names="EAR_men" or for two EARs at once EAR.names=c("NL_EAR_men","EFSA.EAR.men").
See section Details for the requirements of these data frames.

EAR.distr

string, indicating the distribution ("normal" or "lognormal") to use for a probabilistic approach of the thresholds.
Default EAR.distr=NULL, which means that this option is not used. Use e.g. EAR.distr="lognormal".
This option can only be used in combination with EAR.vc.

EAR.vc

number, indicating the coefficient of variation of the distribution given in EAR.distr. Default EAR.vc=NULL, which means that this option is not used.
This option can only be used in combination with EAR.distr.

AI.names

name, the name of a data frame with AI thresholds; only one name is allowed. The comparison is only qualitative.

UL.names

vector of one or two names. See EAR.names.

age.classes

vector of numbers, to define the age classes to be reported.
Use e.g. age.classes=c(6,12,19,30) to define three age classes (6,12], (12,19] and (19,30], where (6,12] means the ages 7,8,...,12, etc. So if min.age = 7, age.classes starts with 6. It is also possible to fit the model for ages 7,8,...,69 and report for classes that do not cover all ages, e.g. age.classes=c(19,30,50), or to report only one overall age class with age.classes=c(6,69). The overall age class is reported by default, except with the default age.classes=NULL, which means that results are reported for all ages.
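
The right-closed interval notation used above matches the default behaviour of the base R function cut(). The sketch below only illustrates which ages fall into which class; whether f.spade uses cut() internally is not stated here.

```r
ages        <- 7:30
age.classes <- c(6, 12, 19, 30)

# cut() uses right-closed intervals (a, b] by default
classes <- cut(ages, breaks = age.classes)

levels(classes)   # "(6,12]"  "(12,19]" "(19,30]"
table(classes)    # 6, 7 and 11 ages per class
```

Because the intervals are open on the left, the first break has to be one below the youngest age to be reported.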

dt.pop

data frame, with the population numbers for men and women. It consists of three mandatory columns called age (1,2,...,100), pop.male and pop.female.
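
A data frame with this layout could be set up as in the sketch below. The object name dt.pop.toy and the population numbers are invented placeholders; real numbers would come from population statistics.

```r
# Invented population numbers, for illustrating the column layout only
dt.pop.toy <- data.frame(
  age        = 1:100,
  pop.male   = rep(100000, 100),
  pop.female = rep(100000, 100)
)
str(dt.pop.toy)
```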

verbose

logical, default verbose=T, results are printed to the screen.

plot.dev

number, indicating how SPADE produces the output plots. Options are:
plot.dev = 0 - no graphical output
plot.dev = 1 - graphical output to distinct plot windows
plot.dev = 2 - graphical output to distinct JPG files
plot.dev = 3 - graphical output to distinct WMF files
plot.dev = 4 - graphical output to one (or more) PDF files (default)

colors.ok

logical, indicating whether to use colors (default colors.ok=T) or grey-scale output for publishing (colors.ok=F).

R.ok

logical, default R.ok=F uses a fast compiled Fortran algorithm during the back-transformation; R.ok=T uses only R, which is very slow.

spade.output.path

string, indicating the path for the SPADE reports of the analysis and the plots. Default spade.output.path=SPADE.OUTPUT.PATH, which lets the user set this path outside f.spade by creating the object SPADE.OUTPUT.PATH and creating the corresponding folder, e.g. in Windows Explorer.
E.g. SPADE.OUTPUT.PATH <- "N:/Project/Workshop/".
If the object SPADE.OUTPUT.PATH does not exist in the current workspace, SPADE creates it by SPADE.OUTPUT.PATH <- "0_SPADEresults/".
SPADE creates the folder automatically in the current working directory if the folder does not exist.

csv.output.path

string, indicating the path for the SPADE csv or Excel output files with the percentiles and proportions of the habitual intake distribution. Default csv.output.path=CSV.OUTPUT.PATH; if this object does not exist in the current workspace, SPADE creates it by
CSV.OUTPUT.PATH <- "1_CSVresults/".
SPADE creates the folder automatically in the current working directory if the folder does not exist.
Again, the user can define CSV.OUTPUT.PATH and the corresponding folder outside f.spade.
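
Setting both path objects before calling f.spade could look like the sketch below. It uses tempdir() as a hypothetical base folder so that it can be run without touching a project directory; in practice base.dir would be a project folder of your own choice.

```r
# Hypothetical base folder; replace by your own project folder
base.dir <- tempdir()

spade.dir <- file.path(base.dir, "0_SPADEresults")
csv.dir   <- file.path(base.dir, "1_CSVresults")

# Create the folders if they do not exist yet
for (d in c(spade.dir, csv.dir)) {
  if (!dir.exists(d)) dir.create(d, recursive = TRUE)
}

# Path objects with a trailing slash, as in the examples above
SPADE.OUTPUT.PATH <- paste0(spade.dir, "/")
CSV.OUTPUT.PATH   <- paste0(csv.dir, "/")
```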

output.name

string, lets the user define a specific output name for the analysis. Default output.name=NULL, in which case a standard output name is generated inside SPADE, e.g. spade.folate.f.7.69... in the first example below.

dgts

number, of digits in the output of the analysis steps in SPADE. Default dgts=3. This does not apply to the reported percentiles of the habitual intake distribution, see dgts.distr, or to the proportions below or above a threshold, see dgts.dri.

dgts.distr

number, of digits in the reported percentiles of the habitual intake distribution. Default dgts.distr=0

dgts.dri

number, of digits in the reported proportions below or above a threshold. Default dgts.dri=1

bootstrap.ok

logical, indicating if f.spade is used in a bootstrap or not. Default bootstrap.ok=F. This argument is needed for programming purposes and transparency and should not be changed by the user.

boot3.ok

logical, indicating if f.spade is used by f.spade3.bootstrap or not. Default boot3.ok=F. This argument is needed for programming purposes and transparency and should not be changed by the user.

boot.dt.name

string, indicating the name of the bootstrap object. This argument is needed for programming purposes and transparency and should not be changed by the user.

Details

For the theoretical background see the manual.
For the definitions of the data frames needed for SPADE see ...

Value

SPADE writes all output to files outside R to facilitate the batch processing of many micronutrients, foods or food components.

spade.output.path

All text files (txt) and output plots (pdf, jpg or wmf) are saved in spade.output.path.
If output.name=NULL, the basic output name is generated automatically in f.spade as spade.c1.c2.c3.c4, where
c1 = name of the compound, e.g. folate or potato
c2 = m for male, f for female and b for both
c3 = min.age
c4 = max.age
Example: the analysis of folate for women aged 7-69 gets the output name spade.folate.f.7.69, followed by one of three extensions:
the file with the initial model definition is called spade.c1.c2.c3.c4_date&time.000.txt,
the complete log file is called spade.c1.c2.c3.c4_date&time.999.txt,
and the pdf file is called spade.c1.c2.c3.c4_date&time.pdf.
In this way all analyses get unique names in the same folder.

csv.output.path

In folder csv.output.path the output table(s) are saved in csv format and, for the bootstrap, also in Excel files.
For daily intakes, the output table is saved in csv format as spade.c1.c2.c3.c4_HI.csv.
For episodical intakes, three output tables are saved:
spade.c1.c2.c3.c4_HI.csv for the habitual intake distribution output table,
spade.c1.c2.c3.c4_freq.csv for the frequency distribution table,
spade.c1.c2.c3.c4_amnt.csv for the habitual amounts distribution table.
The last two tables are partial results for the Monte Carlo convolution or are used in the pseudo sampling back-transformation. The habitual amounts distribution is the distribution of the amounts given that the intakes are positive (see also argument ia.method).
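
The naming scheme described in this section can be reproduced with paste(), for instance to locate output files in a batch script. The sketch below assumes an episodical analysis of potato for both sexes aged 7-69; the variable names are hypothetical and the date&time stamp of the text and pdf files is left out because its format is not given here.

```r
compound <- "potato"   # c1
sex.code <- "b"        # c2: m, f or b
min.age  <- 7          # c3
max.age  <- 69         # c4

# Basic output name spade.c1.c2.c3.c4
base.name <- paste("spade", compound, sex.code, min.age, max.age, sep = ".")

# Names of the three csv tables for episodical intakes
csv.names <- paste0(base.name, c("_HI", "_freq", "_amnt"), ".csv")
```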

"bin_results"

Since R works entirely in memory, the model and the model results are saved as a list in the folder "bin_results/" of the current working directory as "*.model_bin", where "*" stands for the output name without date and time.
The final table is saved in the same folder under the name *.table.bin and can be loaded manually into R with the command
load("bin_results/*.table.bin").
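
Loading such a table could be scripted as in the sketch below. The output name is taken from the folate example and is only an assumption about which analysis has been run; the name of the object stored in the file is not documented here, so the sketch prints whatever load() restores.

```r
# Output name without date and time (assumed; adapt to your own analysis)
output.name <- "spade.folate.f.7.69"
table.file  <- file.path("bin_results", paste0(output.name, ".table.bin"))

if (file.exists(table.file)) {
  loaded <- load(table.file)   # load() returns the names of the restored objects
  print(loaded)
} else {
  message("File not found: ", table.file)
}
```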

Note

The help of this SPADE.RIVM package is concise, since a detailed manual is available at www.spade.nl.
The data used in the manual are available in two ways. The first is needed to work through the examples in the manual yourself, e.g. to import data of different types into R.

Import data from Excel, csv or sas7bdat files
These files can be found in the library folder SPADE.RIVM/extdata.
The SPADE.RIVM folder can be found in your library folder, which is the first folder listed by the R statement .libPaths().

Import data directly from R
The R statement data(package="SPADE.RIVM") shows all available data in SPADE.RIVM.
Object obj is imported into your workspace by the R statement data(obj).
The description of the content of the data objects can be found in the manual.
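
The extdata folder can also be located from within R. The sketch below uses the base R function system.file(), which searches all library paths rather than only the first one and returns an empty string when the package is not installed.

```r
lib.dir     <- .libPaths()[1]                                 # first library folder
extdata.dir <- system.file("extdata", package = "SPADE.RIVM")

if (nzchar(extdata.dir)) {
  print(list.files(extdata.dir))   # the Excel, csv and SAS example files
} else {
  message("SPADE.RIVM is not installed in the current library paths")
}
```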

Author(s)

Arnold Dekkers PhD
Janneke Verkaik-Kloosterman PhD
Marga Ocké PhD
National Institute of Public Health and the Environment (RIVM)
Bilthoven, the Netherlands
email: spade@rivm.nl (the SPADE TEAM)

References

Type in R the following statement
citation("SPADE.RIVM")
to see the citations of the 2014 paper and the manual or visit the SPADE website www.spade.nl

Examples

# Load the data DNFCS (Dutch National Food Consumption Survey 2007-2010)
# into your workspace
data(DNFCS)
# Example 1-part model
f.spade(frml.ia=folate~fp(age),frml.if="no.if",data=DNFCS,
	min.age=7, max.age=69,sex.lab="female")
	

# Example 2-part model
# Load population2008, a data frame with the Dutch population numbers of 2008
data(population2008)
f.spade(frml.ia=potato~fp(age),frml.if=potato~cs(age),data=DNFCS,
  min.age=7, max.age=69,sex.lab="both",age.classes=c(6,13,19,30,50,69),
  dt.pop=population2008)

[Package SPADE.RIVM version 3.2.07 Index]