
CONTINUITY EQUATIONS

Alternative Implementation of the Restricted Vector Autoregressive Model (RVAR) of Continuity Equations for Continuous Assurance
Erik van Kempen, Student Member, IEEE, <erikvankempen@ieee.org> Fontys Hogeschool Financieel Management, Fontys University of Applied Sciences

Abstract: Continuous assurance can use continuity equations by modeling predictors for reported quantities in business process steps. One of the best models for these continuity equations is the Restricted Vector Autoregressive Model (RVAR). In this model lags are estimated during the modeling process, as opposed to predefining fixed lags in the basic VAR model. Furthermore, insignificant variables are excluded from the final model to optimize the overall posterior R² of the model. In previous studies a model was implemented in the commercially available SAS environment for testing purposes. In this paper an alternative implementation is provided in the freely available and open source R environment.

Index Terms: Continuous assurance, continuity equations, RVAR, Restricted Vector Autoregressive model, data analysis, R, implementation.

I. INTRODUCTION
Continuity equations are a fundamental part of classical physics, but the application of these equations in the field of finance has not been explored in detail. In their paper on continuity equations, Kogan et al. have provided the theory and a number of solutions for applying these equations as a tool in the continuous assurance domain [1]. One of these solutions is the Subset Vector Autoregressive Model (Subset VAR). This model was implemented in the SAS language. In this paper an alternative implementation in R, the free and open source programming language, is presented.

July 26, 2013

II. CONTINUITY EQUATIONS
Continuity equations are a fundamental part of classical physics. These equations describe the transport of a conserved quantity while ensuring conservation of that quantity (such as mass or energy). Similar relations can be defined for the transport of quantities within a system in the financial domain. For example, the movement of reported quantities between steps in the key business processes can be described with continuity equations. The term continuity equations was coined by Vasarhelyi and Halper in 1991, when they modeled the flow of billing data at AT&T [2]. Although Vasarhelyi and Halper proposed continuity equations more than 20 years ago, little research has been performed on their application in practice or on the implementation of a sound continuity equations model. Research focusing on the VAR model in particular has rarely been performed in the last two decades. Only Dzeng [3] and Kogan et al. [1] have also considered the VAR model in their papers.

In most businesses the flow of goods is the most important basis for revenue recognition. As such, it can be used to provide evidence for the completeness, timeliness and accuracy of the reported revenue. If the continuity equations hold for a certain business process, one can assert that there are no leakages from the transaction flow, i.e. the integrity of the flow of goods can be asserted. Therefore, continuity equations provide a method to evidence the integrity of the basis for revenue recognition, which makes them a valuable tool in continuous assurance.

Continuity equations are based on historical data of quantities in the separate steps of business processes. For example, the sales cycle can be modeled as three separate steps: receiving the order from the customer, shipping goods to the customer and invoicing for the ordered and shipped goods. The quantity of goods ordered today will show up in the invoicing step a certain number of days later. The daily flow of goods between these steps can be defined as a certain quantity Q and a lag between the steps.

A. Base Model
In this paper we will focus on the sales cycle, with the three previously defined process steps. The continuity equations for the sales cycle can be represented as Equation 1. In this model order_t is the quantity ordered at time t, the \phi terms are N × 1 transition vectors for a multivariate linear model and the M terms are N × 1 vectors containing daily aggregates of quantities for the given dimension.

\begin{equation}
\begin{aligned}
\mathrm{order}_t &= \phi_{oo} M(\mathrm{order}) + \phi_{so} M(\mathrm{shipped}) + \phi_{io} M(\mathrm{invoiced}) \\
\mathrm{shipped}_t &= \phi_{os} M(\mathrm{order}) + \phi_{ss} M(\mathrm{shipped}) + \phi_{is} M(\mathrm{invoiced}) \\
\mathrm{invoiced}_t &= \phi_{oi} M(\mathrm{order}) + \phi_{si} M(\mathrm{shipped}) + \phi_{ii} M(\mathrm{invoiced})
\end{aligned}
\tag{1}
\end{equation}

Each of these sub-equations models a predictor for the reported quantities in a specific step in the business process. As previously defined, the quantities are related to quantities in the other process steps by a time delay (lag). For example, if orders are shipped in exactly one day, without exception, and invoicing is performed simultaneously with shipping, the

resulting predictors can be defined as Equation 2.

\begin{equation}
\begin{aligned}
\mathrm{order}_t &= \alpha_1 \, \mathrm{shipped}_{t+1} + \alpha_2 \, \mathrm{invoiced}_{t+1} \\
\mathrm{shipped}_t &= \beta_1 \, \mathrm{order}_{t-1} + \beta_2 \, \mathrm{invoiced}_t \\
\mathrm{invoiced}_t &= \gamma_1 \, \mathrm{order}_{t-1} + \gamma_2 \, \mathrm{shipped}_t
\end{aligned}
\tag{2}
\end{equation}
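The fixed-lag case can be illustrated with a small base R sketch. It is not part of the original implementation: the data are synthetic, the variable names (`ordered`, `shipped`, `invoiced`) are hypothetical, and only the order term of the shipping predictor is fitted, because shipping and invoicing coincide in this scenario.

```r
# Minimal sketch, assuming synthetic data (not the data set used in the paper).
set.seed(42)
n <- 100
ordered  <- rpois(n, lambda = 50)   # hypothetical daily ordered quantities
shipped  <- c(0, ordered[-n])       # every order ships exactly one day later
invoiced <- shipped                 # invoicing coincides with shipping

# Fit shipped_t on ordered_{t-1} without a constant term (days 2..n).
fit   <- lm(shipped[-1] ~ 0 + ordered[-n])
beta1 <- unname(coef(fit))
print(beta1)

# With no leakage, everything ordered on days 1..n-1 is shipped on days 2..n.
leakage <- sum(ordered[-n]) - sum(shipped[-1])
print(leakage)
```

In this idealized setting the estimated coefficient equals one and the leakage is zero; real data with varying lags will not behave this cleanly, which motivates the VAR models.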

In practice most business processes cannot be modeled adequately in such a simple way, due to varying lags and dependencies between process steps.

B. Basic Vector Autoregressive Model
In the basic Vector Autoregressive Model (VAR) the model is estimated by optimizing the overall R² while trying different lags for the process steps. Only the maximum expected lag is provided to the algorithm, which then tries to find the best fitting model by iterating through all lag possibilities. The exact lags do not have to be known prior to modeling, as the best fitting lags are determined while modeling. It is not always trivial to determine lags prior to the modeling process, e.g. lags in the purchasing cycle are highly dependent on the policies and processes at third parties. Therefore, the VAR model can be a powerful tool for modeling continuity equations if lags cannot be predefined trivially.

C. Restricted Vector Autoregressive Model
Kogan et al. have shown in their studies that the VAR model gives very good results. More importantly, they showed that the Subset VAR or Restricted VAR model gave even better results. With a MAPE (mean absolute percentage error) of 0.3374 on the test set it outscored several other models, namely SEM, GARCH and LRM. Only the BVAR model performed better if only the MAPE is taken into account, but the BVAR model resulted in a larger standard deviation for the absolute percentage error. The RVAR model was found to be one of the best models for continuity equations.

The Restricted Vector Autoregressive Model translates roughly to optimizing the R² of the predictor by removing insignificant variables from the VAR model. For example, if the mean lag between order and shipping is less than a month, then the shipment shipped_{t+365} a year after ordering is obviously not significant and is thus excluded from the model. This method iterates the modeling process per equation by removing all variables with t-values below a predefined threshold for the t-statistics, as explained in Figure 1.

D. Data
The proposed base model for the sales cycle is based on three different quantities: the ordered quantity, the quantity of goods sent and the quantity invoiced. These three variables can be provided by most ERP systems on a daily basis. In this implementation data was used from a wholesaler in technical supplies. This company uses an off-the-shelf Microsoft Dynamics AX 2009 solution. The data was extracted from separately generated reports for each of the process steps by merging the columns by date, as presented in Figure 2. The resulting data is exported as a CSV file to be imported by the implementation of the modeling tool in R.

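Fig. 1. RVAR modeling process. (Flowchart: start; initial model estimation from the data; exclude parameters with a t-statistic below the predefined threshold; re-estimate the model; repeat until no t-statistic remains below the threshold; final model.)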
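The loop in Fig. 1 can be sketched for a single equation in base R. This is an illustrative sketch under stated assumptions, not the code of the vars package or of the SAS implementation: the helper `restrict_equation`, the synthetic data and the true lag of two days are all hypothetical.

```r
# Sketch of the exclusion loop for one equation (hypothetical helper).
restrict_equation <- function(y, X, thresh = 2) {
  keep <- colnames(X)
  repeat {
    df  <- data.frame(y = y, X[, keep, drop = FALSE])
    fit <- lm(y ~ 0 + ., data = df)          # no constant, no trend
    ct  <- summary(fit)$coefficients
    tvals <- setNames(abs(ct[, "t value"]), rownames(ct))
    weak  <- names(tvals)[tvals < thresh]    # insignificant regressors
    if (length(weak) == 0) return(fit)       # all |t| >= thresh: final model
    keep <- setdiff(keep, weak)              # exclude them and re-estimate
    if (length(keep) == 0) return(NULL)      # nothing significant remains
  }
}

# Synthetic example: goods ship two days after ordering, plus noise.
set.seed(1)
n       <- 300
max_lag <- 5
ordered <- rpois(n, lambda = 50)
shipped <- c(0, 0, ordered[1:(n - 2)]) + rnorm(n)

# embed() returns columns for lag 0, 1, ..., max_lag; drop the lag-0 column.
X <- embed(ordered, max_lag + 1)[, -1, drop = FALSE]
colnames(X) <- paste0("ordered_l", 1:max_lag)
y <- shipped[(max_lag + 1):n]

final <- restrict_equation(y, X, thresh = 2)
print(names(coef(final)))
```

Dropping every regressor below the threshold in each pass follows the figure; a stricter variant would drop only the weakest regressor per pass before re-estimating.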

Fig. 2. Data model. (The tables SalesOrders, Shipments and Invoices, each with primary key Date and a Quantity column, are merged by date into SalesData with key Date and the columns SO, GS and IS.)
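The merge shown in Fig. 2 could be reproduced in base R as follows. This is a sketch with made-up report contents; the data frame names `so`, `gs` and `inv` are hypothetical, and only the column names Date, SO, GS and IS are taken from the data model.

```r
# Hypothetical per-step reports, each with a Date and a Quantity column.
so  <- data.frame(Date = as.Date(c("2013-01-02", "2013-01-03", "2013-01-04")),
                  Quantity = c(10, 5, 8))
gs  <- data.frame(Date = as.Date(c("2013-01-03", "2013-01-04", "2013-01-07")),
                  Quantity = c(10, 5, 8))
inv <- data.frame(Date = as.Date(c("2013-01-03", "2013-01-04", "2013-01-07")),
                  Quantity = c(10, 5, 8))

# Rename the quantity columns to the names used in the SalesData table.
names(so)[2]  <- "SO"
names(gs)[2]  <- "GS"
names(inv)[2] <- "IS"

# Full outer join on Date, so a date present in any report is kept.
sales <- Reduce(function(a, b) merge(a, b, by = "Date", all = TRUE),
                list(so, gs, inv))

# A date missing from one report means nothing was reported for that step.
qty <- c("SO", "GS", "IS")
sales[qty][is.na(sales[qty])] <- 0
print(sales)
```

Dates that appear in none of the reports, such as weekends, are still absent at this point; they are filled in during pre-processing.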

III. IMPLEMENTATION
A. R Environment
The R language has recently gathered significant attention from the financial sector. As a language focused on data analysis and statistics, it has the potential to become the language of choice in finance. R has some important advantages compared to the commercially available SAS language. First of all, R is an open source language. It is available for free from the R Project website [4]. Furthermore, due to its open source availability, thousands of practitioners and developers have contributed to the base code and to packages related to a plethora of fields, including finance. Another advantage of R is that it can be installed on almost any target platform. In addition, R code does not have to be invoked by a user: it can be executed as an automated script via cron or as an additional stage in an existing ETL procedure.

R provides a special purpose package for vector autoregressive modeling, vars. This package has been developed and published by Bernhard Pfaff and Matthieu Stigler and is available via CRAN [5]–[7]. The package includes several functions for modeling VARs, testing the VARs and presenting the results. It supports automated restriction of insignificant variables as an out-of-the-box feature.

B. Code
The RVAR modeling is implemented in three stages: pre-processing, modeling and prediction. The final result is presented in Appendix A.

1) Pre-processing: The data generated by the ERP system is usually not provided in the data format expected by the modeling functions. Therefore, the data has to be pre-processed before it can be used as input for the modeling stage. First the raw data has to be imported. If data is missing for a specific day, e.g. on weekends, the date is left out of the reports from the ERP system. The missing dates are added to the data set with quantities of zero, resulting in a complete time-linear data set. This complete data set can be converted to a multiple time series object (mts), which is used in the modeling stage.

2) Modeling: When the vars package is used, a fairly compact piece of code suffices for modeling. Only a few functions from the vars package are used. First a full VAR model is calculated using all variables up to the maximum lag. Trends and constant terms should be excluded from the VAR model. In the final step of the modeling stage the restrict function of the vars package is used to automate the exclusion process for insignificant variables. The result is an RVAR model containing only the significant variables.

3) Prediction: Finally, the RVAR model is used to generate predictions for subsequent time periods with the predict function. These predictions can be compared to the actual quantities reported in subsequent time periods for conformance. Deviations above a predefined threshold can be flagged as exceptions for further review.

IV. CONCLUSION
In conclusion, it is not at all necessary to use commercial statistical analysis software to implement a fully functional Restricted Vector Autoregressive model for continuity equations. With a small amount of code, using freely available packages in the open source analysis environment R, a complete solution can be implemented. An additional advantage of this implementation is that testing does not have to be invoked by a user. Testing can be performed automatically when integrated in existing ETL procedures. Most companies already have a decent infrastructure for analyses on ERP data, e.g. OLAP cubes. This implementation of the RVAR model can be added as an additional stage in the ETL procedure, reporting exceptions automatically to key users. Using continuity equations this way is a big step towards real continuous assurance.

V. RECOMMENDATIONS
Previous research on Vector Autoregressive models in the field of continuous assurance has been limited to only two publications, by Kogan et al. and Dzeng. The validity of the Restricted Vector Autoregressive model has to be reconfirmed for additional data from other companies in order to find support for this innovative audit approach.

APPENDIX A
CODE

# Load the vars library to be able to use
# VAR() and the predict() function for
# VAR models
library("vars")

# Some properties of the model have to
# be predefined.
# model.lag.max: the maximum lag between
# all steps.
# t.threshold: threshold value for the
# t-statistics in the iterative exclusion
# of uncorrelated variables.
model.lag.max <- 30
t.threshold <- 2

# Data is read from a CSV file.
# The data consists of daily aggregates of
# quantities per process step. In this
# example there are three process steps:
# SO: sales order
# GS: goods shipped
# IS: invoice sent
data.raw <- read.csv(
  file="Data/Sales-Quantities.csv",
  sep=";",
  header=TRUE,
  colClasses=c("Date", "numeric", "numeric", "numeric")
)

# Missing dates in the provided CSV file
# are filled by merging an empty data frame
# containing all dates with the provided
# data. Missing dates between the first and
# last day of the provided data are filled
# with zeros.
data.empty <- data.frame(
  Date=seq.Date(
    from=as.Date( head( sort( data.raw[,1] ), 1 ) ),
    to=as.Date( tail( sort( data.raw[,1] ), 1 ) ),
    by="1 day")
)
data.merged <- merge(
  data.empty, data.raw,
  by = c("Date"),
  all.x=TRUE, all.y=FALSE
)
data.merged[ is.na(data.merged) ] <- 0

# A multiple time series object is created
# by using the ts function on the merged
# data frame. This mts object can be used
# by the vars package for modeling.
data.tseries <- ts( data = data.merged[, 2:4] )

# The VAR is modeled by using the VAR
# function from the vars package based on
# the mts object.
# A maximum lag can be provided and since
# trend and constant terms should not be
# included in the model, type is set to "none".
# In this case the model contains all
# of the variables restricted by lag.max
# (30) in this example.
model.var <- VAR(
  data.tseries,
  p=1,
  lag.max=model.lag.max,
  type="none"
)

# The VAR model is restricted further
# to exclude all weakly correlated
# variables from the model.
model.var.restricted <- restrict(
  model.var,
  thresh=t.threshold,
  method = "ser"
)

# Several built-in functions can be used
# to present the resulting restricted model.
summary( model.var.restricted )
plot( model.var.restricted )

# The predict function from the vars package
# can be used to predict a number of dates
# following the provided data. In this
# example 10 days are predicted within a
# 95% confidence interval based on the
# restricted model.
model.predictions <- predict(
  model.var.restricted,
  n.ahead = 10,
  ci = 0.95
)

# Several built-in functions can be used
# to present the resulting predictions.
fanchart( model.predictions )
plot( model.predictions )
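The listing stops at plotting the predictions. The comparison with actual quantities described for the prediction stage could look like the following base R sketch. All numbers are made up, the names `predicted`, `lower`, `upper` and `actual` are hypothetical, and the 10% deviation threshold is an assumption rather than a value from the paper.

```r
# Hypothetical forecasts with interval bounds for four days (made-up numbers).
predicted <- c(100, 110, 95, 105)
lower     <- predicted - 15
upper     <- predicted + 15

# Hypothetical quantities actually reported for the same days.
actual <- c(102, 80, 96, 104)

# Absolute percentage error per day and its mean (MAPE).
ape  <- abs(actual - predicted) / abs(actual)
mape <- mean(ape)

# Flag a day when the deviation exceeds the assumed 10% threshold
# or when the actual value falls outside the prediction interval.
deviation.threshold <- 0.10
exceptions <- which(ape > deviation.threshold |
                    actual < lower | actual > upper)

print(round(mape, 4))
print(exceptions)
```

Only the second day is flagged here, since its reported quantity is far below the forecast; in an ETL stage the flagged days would be reported to key users for review.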

ACKNOWLEDGMENTS
I would like to thank M. Vasarhelyi, A. Kogan, M.G. Alles and J. Wu for their academic research on continuity equations and for providing the inspiration to implement the RVAR model in R. Furthermore, I would like to thank R. Schellekens, P. Thijs and N. Weterings for all of their guidance, support and encouragement.

REFERENCES
[1] A. Kogan, M. G. Alles, M. A. Vasarhelyi, and J. Wu, "Analytical procedures for continuous data level auditing: Continuity equations," 2010.
[2] M. A. Vasarhelyi and F. B. Halper, "The continuous audit of online systems," Auditing: A Journal of Practice & Theory, vol. 10, no. 1, pp. 110–125, 1991.
[3] S. Dzeng, "A comparison of analytical procedures expectation models using both aggregate and disaggregate data," Auditing: A Journal of Practice & Theory, vol. 13, no. Fall, pp. 1–24, 1994.
[4] R Foundation. (2013) The R Project for Statistical Computing. [Online]. Available: http://www.r-project.org/
[5] B. Pfaff, "VAR, SVAR and SVEC models: Implementation within R package vars," Journal of Statistical Software, vol. 27, no. 4, pp. 1–32, 2008.
[6] B. Pfaff, "vars: VAR modelling," R package version, pp. 1–3, 2008.
[7] B. Pfaff and K. im Taunus, "Using the vars package," 2007.

Erik van Kempen (born April 25, 1987) is a lecturer in Statistics, ERP and Business Intelligence at Fontys University of Applied Sciences in Eindhoven, The Netherlands. Prior to this lectureship, he obtained a Bachelor of Business Administration in Accountancy. His research interests are in the areas of continuous assurance, smart auditing, business intelligence and process mining.
