Simulate a BIMETS model object
(Note: this is the html version of the reference manual. Please consider reading the pdf version of this reference manual, wherein there are figures and the mathematical expressions are better formatted than in html.)
The simulation of an econometric model basically consists in solving the system of the equations describing the model for each time period in the specified time interval. Since the equations may not be linear in the variables, and since the graph derived from the incidence matrix may be cyclic, the usual methods based on linear algebra are not applicable, and the simulation must be solved by using an iterative algorithm.
BIMETS simulation capabilities support:
- Static simulations: a static multiple equation simulation, in which the historical values for the lagged endogenous variables are used in the solutions of subsequent periods;
- Dynamic simulations: a dynamic simulation, in which the simulated values for the lagged endogenous variables are used in the solutions of subsequent periods;
- Residuals check: a single period, single equation simulation; simulated time series in output are just the computation of the RHS (right-hand-side) of their equation, by using the historical values of the involved time series and by accounting for error autocorrelation and PDLs, if any;
- Forecast simulations: similar to dynamic simulation, but during the initialization of the iterative algorithm the starting values of endogenous variables in a period are set equal to the simulated values of the previous period. This allows the simulation of future endogenous observations, i.e. the forecast;
- Partial or total exogenization of endogenous variables: in the provided time interval (i.e. partial exog.) or in the whole simulation time range (i.e. total exog.), the values of the selected endogenous variables can be definitely set equal to their historical values, by excluding their equations from the iterative algorithm of simulation;
- Constant adjustment of endogenous variables (add-factors): adds another exogenous time series - the "constant adjustment" - in the equation of the selected endogenous variables;
In details, the generic model suitable for simulation in BIMETS can be written as:
y_1=f_1(x,y) + z_1
...
y_n=f_n(x,y) + z_n
being:
n the number of equations in the model;
y=[y_1, ... , y_n] the n
-dimensional vector of the endogenous variables;
x=[x_1, ... , x_m] the m
-dimensional vector of the exogenous variables;
f_i(...), i=1..n any kind of functional expression able to be written by using the MDL
syntax;
As described later on, a modified Gauss-Seidel iterative algorithm can solve the system of equations. The convergence properties may vary depending on the model specifications. In some conditions, the algorithm may not converge for a specific model or a specific set of data.
A convergence criterion and a maximum number of iterations to be performed are provided by default. Users can change these criteria by using the simConvergence
and simIterLimit
arguments of the SIMULATE()
function.
The general conceptual scheme of the simulation process (for each time period) is the following:
1. initialize the solution for the current simulation period;
2. iteratively solve the system of equations;
3. save the solution, if any;
Step 2 means that for each iteration you will need to:
2.1 update the values of the current endogenous variables;
2.2 verify that the convergence criterion is satisfied or that the maximum number of allowed iterations has been reached;
The initial solution for the iterative process (step 1) can be given alternatively by:
- the historical value of the endogenous variables for the current simulation period (the default);
- the simulated value of the endogenous variables from the previous simulation period (this alternative is driven by the simType='FORECAST'
argument of the SIMULATE()
function);
In the "dynamic" simulations (i.e. simulations performed by using either the default simType='DYNAMIC'
or the simType='FORECAST'
), whenever lagged endogenous variables are needed in the computation, the simulated values of the endogenous variables y assessed in the previous time periods are used. In this case, the results of the simulation in a given time period depends on the results of the simulation in the previous time periods. This kind of simulation is defined as "multiple equation, multiple period".
As an alternative, the actual historical values can be used in the "static" simulations (i.e. simulations performed by using simType='STATIC'
) rather than simulated values whenever lagged endogenous variables are needed in the computations. In this case, the results of the simulation in a given time period does not depend on the results of the simulation in the previous time periods. This kind of simulation is defined as "multiple equation, single period".
The last simulation type available is the residual check (simType='RESCHECK'
). With this option a "single equation, single period" simulation is performed. In this case no iteration must be carried out. The endogenous variables are assessed for each single time period through the use of historical values for each variable on the right-hand side of the equation, for both lagged and current periods. This kind of simulation is very helpful for debugging and checking the logical coherence of the equations and the data, and can be used as a simple tool to compute the add-factors.
The debugging of the logical coherence of the model-equation and the data is carried out by means of a procedure called "Residual Check".
It consists in the following steps:
1. add another exogenous variable - the constant adjustment - to every equation of the model, both behavioral and technical identity (i.e. by using the ConstantAdjustment
argument of the SIMULATE()
function);
2. fill in with the estimated residuals all the constant adjustments for the behavioral equations;
3. fill in with zeroes the constant adjustments for the technical identities;
4. perform a simulation of the model with the simType='RESCHECK'
option;
5. compute the difference between the historical and the simulated values for all the endogenous variables;
6. check whether all the differences assessed in step 5 are zero in the whole time range;
If a perfect tracking of the history is obtained then the equations have been written coherently with the data, otherwise a simulated equation not tracking the historical values is an unambiguous symptom of data inconsistent with the model definition.
Aside from the residual check, the add-factors constitute an important tool to significantly improve the accuracy of forecasts made through an econometric model. Considering the following model:
y_1=f_1(x,y) + z_1
...
y_n=f_n(x,y) + z_n
the add-factors z=[z_1, ... ,z_n] can be interpreted as estimates of the future values of the disturbance terms or, alternatively, as adjustments of the intercepts in each equation. These add-factors round out the forecasts, by summarizing the effects of all the systematic factors not included in the model. One choice for the computation of the add-factors is given by past estimation residuals and past forecast errors or by an average of these errors. This consideration suggests an easy way of computing the add-factors:
1. add the constant adjustments to every equation of the model, both behavioral and technical identity;
2. fill in with zeroes all the constant adjustments;
3. solve the model, with the simType='RESCHECK'
option, in a time interval including some periods beyond the estimation sample;
4. compute the difference between the historical and the simulated values for each the endogenous variables;
5. average, or process in a suitable way, the difference arising from point 4 in the time periods beyond the estimation sample to compute the constant value to be used as an add-factor in the following forecasting exercises;
Please note that in case of equation that presents an LHS function the add-factor will be added before the application of the inverse function, i.e. during the simulation the following:
g_1(y_1)=f_1(x,y) + z_1
...
g_n(y_n)=f_n(x,y) + z_n
will be solved as:
y_1=g_1^(-1)( f_1(x,y) + z_1 )
...
y_n=g_n^(-1)( f_n(x,y) + z_n )
If a linear dependence from the simulated endogenous and the add-factor is preferred, users can manually insert an auxiliary equation w_i into the model definition, e.g. the following:
g_i(y_i)=f_i(x,y)
can be replaced by:
w_i=f_i(x,y)
y_i=g_i^(-1)(w_i)
During the simulation, the add-factors will be applied as in the following:
w_i=f_i(x,y) + v_i
y_i=g_i^(-1)(w_i) + z_i
given v_i, z_i as add-factors and the linear dependence from z_i and y_i.
THE OPTIMAL REORDERING |
In fact, the simulation process takes advantage of an appropriate ordering of the equations to increase the performances by iteratively solving only one subset of equations, while the others are solved straightforwardly. "...a different ordering of the equations can substantially affect the speed of convergence of the algorithm; indeed some orderings may produce divergence. The less feedback there is, the better the chances for fast convergence..." - Don, Gallo - Solving large sparse systems of equations in econometric models - Journal of Forecasting 1987.
The LOAD_MODEL
function builds the incidence matrix of the model, then defines the proper equation reordering. The incidence matrix is built from the equations of the model; it is a square matrix in which each row and each column represents an endogenous variable. If the (i,j)
element is equal to 1 then in the model definition the current value of the endogenous variable referred by the i
-row depends directly from the current value of the endogenous variable referred by the j
-column. You can see an incidence matrix example in the section "BIMETS package"
of this manual wherein the content of the kleinModel$incidence_matrix
variable is printed out.
In econometric models, the incidence matrix is usually very sparse. Only a few of the total set of endogenous variables are used in each equation. In this situation, ordering the equation in a certain sequence will lead to a sensible reduction of the number of iterations needed to achieve convergence. Reordering the equations is equivalent to rearranging rows and columns of the incidence matrix. In this way the incidence matrix might be made lower triangular for a subset of the equations.
For this subset, an endogenous variable determined in a specific equation has no incidence in any equation above it, although the same variable might have incidence in equations below it. Such a subset of equations is called recursive. Recursive systems are easy to solve. It is only necessary to solve each equation once if this is done in the right order. On the other hand, it is unlikely for the whole model to be recursive. Indeed the incidence graph is often cyclic, as in the Klein's model that presents the following circular dependecies in the incidence matrix: p <- w1 <- y <- p
as shown in the "BIMETS package"
section figure.
For a subset of the equations, some 1's will occur in the upper triangle of the incidence matrix for all possible orderings. Such subset of equations is called simultaneous. In order to be able to solve the endogenous variables in the simultaneous block of equations, an iterative algorithm has to be used. Nevertheless, the equations in the simultaneous block may be ordered so that the pattern of the 1's in the upper triangle of the incidence matrix forms a spike. The variables corresponding to the 1's in the upper triangle are called feedback variables.
(check the pdf version of this reference manual in order to see a graphical example)
The final pattern of an incidence matrix after the equation reordering generally features three blocks:
1. a recursive block (the pre-recursive block);
2. a simultaneous block;
3. another recursive block (the post-recursive block);
As said, the pre-recursive and the post-recursive blocks are lower triangular. Therefore the corresponding equations are solvable with a cascade substitution with no iteration. Just the simultaneous equations set needs an iterative algorithm to be solved. It is important to say that the convergence criterion may also be applied to these variables only: when the feedback variables converge, the rest of the simultaneous variables also do.
BIMETS builds and analyzes the incidence matrix of the model, and then it orders the equations in pre-recursive, post-recursive and simultaneous blocks. The simultaneous block is then analyzed in order to find a minimal set of feedback variables. This last problem is known to be NP-complete (Ref: Garey, Johnson - Computers and Intractability: a Guide to the Theory of NP-completeness - San Francisco, Freeman 1979).
The optimal reordering of the model equations is programmatically achieved through the use of an iterative algorithm applied to the incidence matrix that can produce 4 ordered lists of endogenous variables:
1. vpre
is the ordered list containing the names of the endogenous pre-recursive variables to be sequentially computed (by using their EQ>
definition in the MDL
) before the simulation iterative algorithm takes place;
2. vsim
is the ordered list containing the names of the endogenous variables to be sequentially computed during each iteration of the simulation iterative algorithm;
3. vfeed
is the list containing the names of the endogenous feedback variables;
4. vpost
is the ordered list containing the names of the endogenous post-recursive variables to be sequentially computed once the simulation iterative algorithm has converged;
If equations are reordered, the previous conceptual scheme is modified as follow:
- initialize the solution for the current simulation period;
- compute the pre-recursive equations (i.e. the equation of the endogenous variables in the vpre
ordered list);
- iteratively compute the system of simultaneous equations (i.e. the equation of the endogenous variables in the vsim
ordered list); for each iteration update the values of the current endogenous variables and verify that the convergence criterion is satisfied on the feedback variables or that the maximum number of iterations has been reached;
- compute the post-recursive equations (i.e. the equation of the endogenous variables in the vpost
ordered list);
- save the solutions;
Given y_(i,k) the value of the i-endogenous variable in the simultaneous block at the iteration k, with i the position of the equation in a reordered model, the modified Gauss-Seidel method simply takes for the approximation of the endogenous variable y_(i,k) the solution of the following:
y_(i,k)=f_i(x_1, ..., x_m, y_(1,k), ..., y_(i-1,k), y_(i,k-1),..., y_(n,k-1)
As said, the convergence is then tested at the end of each iteration on the feedback variables.
Newton's methods on a reordered model require the calculation of the Jacobian matrix on the feedback endogenous variables, i.e. at least f+2 iterations per simulation period, with f as the number of feedback variables. For large models (i.e. more than 30 feedback variables) if the overall required convergence is greater than 10^(-6) % the speedup over the Gauss-Siebel method is small or negative. Moreover the Gauss-Siebel method does not require a matrix inversion, therefore it is more robust against algebraical and numerical issues. For small models both methods are fast on modern computers.
The simulation of a non-trivial model, if computed by using the same data but on different hardware, software or numerical libraries, produces numerical differences. Therefore a convergence criterion smaller than 10^(-7) frequently leads to a local solution.
See Numerical methods for simulation and optimal control of large-scale macroeconomic models - Gabay, Nepomiastchy, Rachidi, Ravelli - 1980 for further information.
SIMULATE(model=NULL, TSRANGE=NULL, simType='DYNAMIC', simConvergence=0.01, simIterLimit=100, ZeroErrorAC=FALSE, BackFill=0, Exogenize=NULL, ConstantAdjustment=NULL, verbose=FALSE, verboseSincePeriod=0, verboseVars=NULL, MULTMATRIX=FALSE, TARGET=NULL, INSTRUMENT=NULL, MM_SHOCK=0.00001, quietly=FALSE, RESCHECKeqList=NULL, avoidCompliance=FALSE, ...)
model |
The BIMETS model object to be simulated. The simulation requires that all the behaviorals in the model have been previously estimated: all the behavioral coefficients (i.e. the estimation coefficients and the autoregression coefficients for the errors, if any) must be numerically defined in the model object. (see also |
TSRANGE |
The time range of the simulation, as a four dimensional numerical array, |
simType |
The simulation type requested: |
simConvergence |
The percentage convergence value requested for the iterative process, that stops when the percentage difference of all the feedback variables between iterations is less than |
simIterLimit |
The value representing the maximum number of iterations to be performed. The iterative procedure will stop when |
ZeroErrorAC |
If |
BackFill |
Defined as an |
Exogenize |
A named list that specifies the endogenous variables to be exogenized. During the simulation and inside the provided time range, the exogenized endogenous variables will be assigned to their historical values. List names must be the names of the endogenous variables to be exogenized; each element of this list contains the time range of the exogenization for the related endogenous variable, in the form of a 4-dimensional integer array, i.e. start_year, start_period, end_year, end_period. A list element can also be assigned |
ConstantAdjustment |
A named list that specifies the constant adjustments (i.e. add factors) to be added to the selected equations of the model. Each constant adjustment can be see as a new exogenous variable added to the equation of the specified endogenous variable. The list names are the names of the involved endogenous variables; each element of this is list contains the time series to be added to the equation of the related endogenous variable. Each provided time series must verify the compliance control check defined in |
verbose |
If |
verboseSincePeriod |
An integer that activate the verbose output, during the iterative process, only after the provided number of simulation periods; |
verboseVars |
A |
MULTMATRIX |
It is |
TARGET |
A |
INSTRUMENT |
A |
MM_SHOCK |
The value of the shocks added to variables in the derivative calculation of the multipliers. The default value is |
quietly |
If |
RESCHECKeqList |
If |
avoidCompliance |
If |
... |
Backward compatibility. |
This function will add a new element named simulation
into the output BIMETS model object.
The new simulation
element is a named R list; the names of the simulation
list are the names of the endogenous variables of the model; each element of the simulation
list contains the simulated time series of the related endogenous variable (see examples).
The simulation
list also contains the '__SIM_PARAMETERS__'
element that contains the arguments passed to the function call during the latest SIMULATE()
run, e.g. TSRANGE
, symType
, simConvergence
, symIterLimit
, Exogenize
, etc.: that can be useful in order to replicate the simulation results.
#define model myModelDefinition= "MODEL COMMENT> Klein Model 1 of the U.S. Economy COMMENT> Consumption BEHAVIORAL> cn TSRANGE 1921 1 1941 1 EQ> cn = a1 + a2*p + a3*TSLAG(p,1) + a4*(w1+w2) COEFF> a1 a2 a3 a4 COMMENT> Investment BEHAVIORAL> i TSRANGE 1921 1 1941 1 EQ> i = b1 + b2*p + b3*TSLAG(p,1) + b4*TSLAG(k,1) COEFF> b1 b2 b3 b4 COMMENT> Demand for Labor BEHAVIORAL> w1 TSRANGE 1921 1 1941 1 EQ> w1 = c1 + c2*(y+t-w2) + c3*TSLAG(y+t-w2,1) + c4*time COEFF> c1 c2 c3 c4 COMMENT> Gross National Product IDENTITY> y EQ> y = cn + i + g - t COMMENT> Profits IDENTITY> p EQ> p = y - (w1+w2) COMMENT> Capital Stock IDENTITY> k EQ> k = TSLAG(k,1) + i END" #define model data myModelData=list( cn =TIMESERIES(39.8,41.9,45,49.2,50.6,52.6,55.1,56.2,57.3,57.8,55,50.9, 45.6,46.5,48.7,51.3,57.7,58.7,57.5,61.6,65,69.7, START=c(1920,1),FREQ=1), g =TIMESERIES(4.6,6.6,6.1,5.7,6.6,6.5,6.6,7.6,7.9,8.1,9.4,10.7,10.2,9.3,10, 10.5,10.3,11,13,14.4,15.4,22.3, START=c(1920,1),FREQ=1), i =TIMESERIES(2.7,-.2,1.9,5.2,3,5.1,5.6,4.2,3,5.1,1,-3.4,-6.2,-5.1,-3,-1.3, 2.1,2,-1.9,1.3,3.3,4.9, START=c(1920,1),FREQ=1), k =TIMESERIES(182.8,182.6,184.5,189.7,192.7,197.8,203.4,207.6,210.6,215.7, 216.7,213.3,207.1,202,199,197.7,199.8,201.8,199.9, 201.2,204.5,209.4, START=c(1920,1),FREQ=1), p =TIMESERIES(12.7,12.4,16.9,18.4,19.4,20.1,19.6,19.8,21.1,21.7,15.6,11.4, 7,11.2,12.3,14,17.6,17.3,15.3,19,21.1,23.5, START=c(1920,1),FREQ=1), w1 =TIMESERIES(28.8,25.5,29.3,34.1,33.9,35.4,37.4,37.9,39.2,41.3,37.9,34.5, 29,28.5,30.6,33.2,36.8,41,38.2,41.6,45,53.3, START=c(1920,1),FREQ=1), y =TIMESERIES(43.7,40.6,49.1,55.4,56.4,58.7,60.3,61.3,64,67,57.7,50.7,41.3, 45.3,48.9,53.3,61.8,65,61.2,68.4,74.1,85.3, START=c(1920,1),FREQ=1), t =TIMESERIES(3.4,7.7,3.9,4.7,3.8,5.5,7,6.7,4.2,4,7.7,7.5,8.3,5.4,6.8,7.2, 8.3,6.7,7.4,8.9,9.6,11.6, START=c(1920,1),FREQ=1), time =TIMESERIES(NA,-10,-9,-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7,8,9,10, START=c(1920,1),FREQ=1), w2 =TIMESERIES(2.2,2.7,2.9,2.9,3.1,3.2,3.3,3.6,3.7,4,4.2,4.8,5.3,5.6,6,6.1, 7.4,6.7,7.7,7.8,8,8.5, START=c(1920,1),FREQ=1) ) #load model and model data myModel=LOAD_MODEL(modelText=myModelDefinition) myModel=LOAD_MODEL_DATA(myModel,myModelData) #estimate model myModel=ESTIMATE(myModel) #DYNAMIC SIMULATION #simulate model myModel=SIMULATE(myModel ,TSRANGE=c(1923,1,1941,1) ,simConvergence=0.00001 ,simIterLimit=100 ) # #Simulation: 100.00% #...SIMULATE OK #get simulated time series "cn" and "y" TABIT(myModel$simulation$cn) # # Date, Prd., myModel$simulation$cn # # 1923, 1 , 50.338 # 1924, 1 , 55.6994 # 1925, 1 , 56.7111 # ... # 1940, 1 , 66.7799 # 1941, 1 , 75.451 # TABIT(myModel$simulation$y) # # Date, Prd., myModel$simulation$y # # 1923, 1 , 56.0305 # 1924, 1 , 65.8526 # 1925, 1 , 64.265 # ... # 1940, 1 , 76.8049 # 1941, 1 , 93.4459 # #get latest simulation parameters print(myModel$simulation$'__SIM_PARAMETERS__') #$TSRANGE #[1] 1923 1 1941 1 # #$simType #[1] "DYNAMIC" # #$simConvergence #[1] 1e-05 # #$simIterLimit #[1] 100 # #$ZeroErrorAC #[1] FALSE # #...etc etc #RESCHECK SIMULATION #simulate model myModel=SIMULATE(myModel ,simType='RESCHECK' ,TSRANGE=c(1923,1,1941,1) ,simConvergence=0.00001 ,simIterLimit=100 ) # #Simulation: 100.00% #...SIMULATE OK #get consumption simulated vs historical differences TABIT(myModel$simulation$cn-myModel$modelData$cn) # # Date, Prd., myModel$simulation$cn - myModel$modelData$cn # # 1923, 1 , 1.56574 # 1924, 1 , 0.493503 # 1925, 1 , -0.0076079 # ... # 1939, 1 , -0.989201 # 1940, 1 , -0.785077 # 1941, 1 , 2.17345 # #FORECAST GNP in 1942 and 1943 #we need to extend exogenous variables in 1942 and 1943 myModel$modelData$w2 = TSEXTEND(myModel$modelData$w2, UPTO=c(1943,1)) myModel$modelData$t = TSEXTEND(myModel$modelData$t, UPTO=c(1943,1)) myModel$modelData$g = TSEXTEND(myModel$modelData$g, UPTO=c(1943,1)) myModel$modelData$time = TSEXTEND(myModel$modelData$time,UPTO=c(1943,1) ,EXTMODE='LINEAR') #simulate model myModel=SIMULATE(myModel ,simType='FORECAST' ,TSRANGE=c(1940,1,1943,1) ,simConvergence=0.00001 ,simIterLimit=100 ) # #Simulation: 100.00% #...SIMULATE OK #get forecasted GNP TABIT(myModel$simulation$y) # # Date, Prd., myModel$simulation$y # # 1940, 1 , 74.5781 # 1941, 1 , 94.0153 # 1942, 1 , 133.969 # 1943, 1 , 199.913 # #STATIC SIMULATION WITH EXOGENIZATION AND CONSTANT ADJUSTMENTS #define exogenization list #'cn' exogenized in 1923-1925 #'i' exogenized in the whole TSRANGE exogenizeList=list( cn = c(1923,1,1925,1), i = TRUE ) #define add factor list constantAdjList=list( cn = TIMESERIES(1,-1,START=c(1923,1),FREQ='A'), y = TIMESERIES(0.1,-0.1,-0.5,START=c(1926,1),FREQ='A') ) #simulate model myModel=SIMULATE(myModel ,simType='STATIC' ,TSRANGE=c(1923,1,1941,1) ,simConvergence=0.00001 ,simIterLimit=100 ,Exogenize=exogenizeList ,ConstantAdjustment=constantAdjList ) #SIMULATE(): endogenous variable cn has been exogenized from year-period 1923-1 to 1925-1 #SIMULATE(): endogenous variable i has been exogenized from year-period 1923-1 to 1941-1 #SIMULATE(): endogenous variable cn has a constant adjustment from year-period 1923-1 to 1924-1 #SIMULATE(): endogenous variable y has a constant adjustment from year-period 1926-1 to 1928-1 # #Simulation: 100.00% #...SIMULATE OK #VERBOSE SIMULATION myModel=SIMULATE(myModel ,TSRANGE=c(1923,1,1941,1) ,simConvergence=0.00001 ,simIterLimit=100 ,verbose=TRUE ,verboseSincePeriod=19 ,verboseVars=c('cn') ) #CHECK_MODEL_DATA(): warning, there are missing values in series "time". # #Simulation: 5.26% #Prd. 1 convergence reached in iter 34 #Simulation: 10.53% #Prd. 2 convergence reached in iter 42 #Simulation: 15.79% #Prd. 3 convergence reached in iter 40 #Simulation: 21.05% # #... # #Prd. 17 convergence reached in iter 40 #Simulation: 94.74% #Prd. 18 convergence reached in iter 37 #Simulation: 100.00% #VPRE eval #Prd. 19 Iter 0 cn 69.7 #VSIM eval #Prd. 19 Iter 1 cn 71.7197097805143 #VSIM eval #Prd. 19 Iter 2 cn 72.7386970702054 #... etc etc #Prd. 19 Iter 39 cn 75.4510243396176 #VSIM eval #Prd. 19 Iter 40 cn 75.4510298916399 #VPOST eval #Prd. 19 Iter 40 cn 75.4510299 #Prd. 19 convergence reached in iter 40 # #...SIMULATE OK
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.