2015  2016
Preface
In my previous experience as an engineering student I was constantly trained to optimize
operations in terms of efficiency, while maintaining full effectiveness. It did not matter if it
was optimizing power efficiency in a specifically designed multicore processor or optimizing
efficiency with regards to everyday tasks such as making coffee. Tasks or activities of a highly
repetitive nature should never be done manually if automating the process could be done in
less time. A few years later I set foot in an audit firm for the first time. Within a day I found it
hard to believe that my coworkers and I were reconciling bank account statements with G/L
account statements manually, by drawing tick marks if two figures were the same.
As a former engineering student I immediately cringed. Why would I ever manually check
only a sample of figures if the whole population is already digitally available and could be
checked almost instantly? Unfortunately, the auditing profession is only slowly adopting
more innovative ways of testing. However, I am certain that this process could be catalyzed
by clearly presenting the benefits and drawbacks and by making it easier to understand and
implement innovative testing procedures.
This thesis should provide a better understanding of one of the innovative testing procedures:
continuity equations. Even though I am not able to end the tyranny of the status quo on my
own, this thesis is the first of a series of baby steps in that direction. I think it could be
beneficial for the audit profession to learn about this tool even if it proves to be not extremely
powerful.
First, I would like to thank my supervisor Robin Litjens for his guidance and support during
this epic adventure. Second, I would like to thank my friend Niels Weterings for his
encouragements to keep writing and running the testing scripts, while I was struggling with
time management, which proved to be quintessential to finalize this thesis. Third, I would
like to thank all my friends who helped in any way or form during this period. Last, I am very
grateful for receiving a grant from the Ministry of Education, Culture and Science (Ministerie
van Onderwijs, Cultuur en Wetenschap) to pursue my master's degree in accountancy.
Abstract
Continuous assurance is a methodology to provide assurance on financial data on a near
real-time basis. One of the fundamental elements of continuous assurance is continuous data
auditing in which the integrity of the data provided by the client is tested. Continuity
equations can be used to evidence assertions regarding data integrity. In order to do so, data
is tested by predicting subsequent values based on a fitting model. In total there are five
models: the linear regression model (LRM), the simultaneous equations model (SEM), the
vector autoregressive model (VAR), the restricted vector autoregressive model (RVAR) and
the autoregressive integrated moving average model (ARIMA). All models are compared to
each other by performance regarding Type I and Type II errors.
The standalone VAR model performs best with regards to Type I errors, while the standalone
RVAR model performs best with regards to Type II errors. A cascaded combination model
consisting of both the VAR and RVAR model performs best with regards to both error
measures.
Table of contents
I. Introduction
II.
III. Research Design
    Data
    Implementation of the models
    Testing of the models
IV. Results
    Models
    Model tests
    Model combination
V. Conclusion
    Type I errors
    Type II errors
    Overall performance
VI. Discussion
Appendix A. Data
Appendix B. Implementation in R
I. Introduction
For the last three decades, auditors and financial professionals have taken interest in the
subject of continuous assurance. However, significant research in this field was initiated only
after a proposed conceptual framework for continuous assurance was published by Vasarhelyi
et al. (2004). In the following years more aspects of continuous assurance were studied, but
most of these studies resulted in the development of new and innovative analysis methods and
further refining the theoretical framework. Comparison of existing analysis models was not yet
in scope. This thesis reports on the comparison of the anomaly detection capability of existing
models of continuity equations.
Conventional audit procedures focus on time-consuming manual testing of a fixed number
of randomly selected supporting documents, like invoices or inventory counts. By introducing
superior audit procedures from the continuous assurance domain, like continuity
equations, substantive testing can, in theory, be performed more efficiently and effectively.
The level of assurance can improve, while time consumption is reduced at the same time.
However, all these audit procedures from the continuous assurance domain are fairly new
and remain mostly untested in the real world. This research intends to investigate one of these
procedures, continuity equations, on a more detailed level. By using continuity equations,
business processes could be tested by detecting anomalies in one or more of the steps within
these processes. The audit procedures or manual testing can then be narrowed down to the
detected anomalies.
Efficient performance of anomaly detection could lead to a paradigm shift in the field of
auditing. Instead of sampling evidence randomly from the population, the level of assurance
can be improved by inspecting exceptions only: audit by exception.
The remainder of this thesis consists of five sections. First, in Section II prior literature is
explored and reviewed. Second, Section III covers the research design including a description
of the data used, the mathematical representations of the models and the testing procedures.
Third, in Section IV the results are presented leading to the conclusions in Section V. Section
VI focuses on possible improvements of the research design and interesting subjects for further
research.
II.
As one of the most important developments in business, IT has been adopted in the business
environment to a large extent. Accounting information systems, ERP and other forms of
digitization of business processes have currently become fairly ubiquitous in the field of
accounting. As a result of these developments, IT is better able to support the growing
complexity of businesses and their transactions. On the other hand, these developments also
have implications for internal and external auditing. The growth of information generation and
availability requires audits to become more effective and efficient (Bedard, Deis, Curtis, &
Jenkins, 2008; Bachlechner, Thalmann, & Manhart, 2014). Gathering of audit evidence has
become overly tedious over time and too complex to be done manually. Bachlechner et al.
(2014) argue that the conventional audit procedures are reaching their limits. They propose a
more substantial role of IT and a software-based approach to gathering of information and
testing in internal and external auditing as a key solution to the challenges imposed by the
developments in businesses.
Studies have found that involvement of IT and a software-based approach in audits leads to
significant improvements in productivity and efficiency (Banker, Chang, & Kao, 2002), while
other studies argue that the use of software-based audit automation and decision support
systems leads to higher audit quality (Dowling & Leech, 2007; Manson, McCartney, Sherer, &
Wallace, 1998). These studies all show that audit automation could be beneficial for the audit
process and its quality. A natural implementation of an audit automation program is continuous
assurance (Alles, Kogan, & Vasarhelyi, 2008).
Continuous assurance
The Canadian Institute of Chartered Accountants (1999) provides a definition of continuous
assurance: "Continuous auditing [or continuous assurance] is a methodology that enables
independent auditors to provide written assurance on a subject matter using a series of auditors'
reports issued simultaneously with, or a short period of time after, the occurrence of events
underlying the subject matter." The emphasis of continuous assurance is on reducing the lag
between preparing a report and subsequently providing assurance on the matters reported. The
timeliness of audit results is key.
In order to be able to provide assurance on a near real-time basis, the auditors have to rely
heavily on automated testing. Alles et al. (2006; 2008) and Vasarhelyi et al. (2004; 2010) have
2013; Vasarhelyi, Warren, Teeter, & Titera, 2014). The resulting predicted values from these
models are compared with actual values in near real-time to detect anomalies.
As part of the CDA element of continuous assurance, continuity equations can be used as a
framework of predictive models to evidence management assertions focusing on data integrity
(Chiu, Liu, & Vasarhelyi, 2014).
Continuity equations
Continuity equations have been a fundamental part of classical physics since the eighteenth
century. These equations describe the transport of a quantity, while simultaneously ensuring
conservation of this quantity (like mass and/or energy). Accordingly, similar relations can be
defined for the transport of quantities within a system in the financial domain. The movement
of reported quantities, e.g. ordered kilograms or invoiced units, between steps in the key
business processes can be described with continuity equations.
The term continuity equations, as a tool in the field of audit, was coined in 1991, when
Vasarhelyi and Halper (1991) modeled the flow of billing data at AT&T. Years later Alles et
al. (2008) properly defined continuity equations in the field of continuous assurance. Although
Vasarhelyi and Halper proposed continuity equations more than 20 years ago, little research
has been performed on the application in practice and implementation of a decent continuity
equations model.
In most businesses the flow of goods is the most important basis for revenue recognition. As
such, the flow of goods can be used to provide evidence for the completeness, timeliness and
accuracy of the reported revenue. If the continuity equations hold for a specific business
process, one can assert that there are no leakages from the transaction flow, i.e. the integrity
of the flow of goods can be asserted. Therefore, continuity equations provide a method to
evidence the integrity of the basis for revenue recognition, which makes them a valuable tool
in continuous assurance.
Continuity equations are based on historical data of quantities in the separate steps of business
processes. For example, the sales cycle can be modeled as three separate steps: receiving the
order from the customer, shipping goods to the customer and invoicing for the ordered and
shipped goods. The quantity of ordered goods today will of course show up in the invoicing
step a certain number of days later. The daily flow of goods between these steps can be defined
with a certain quantity and a lag between the steps. This research will focus on the sales
cycle consisting of the three previously defined process steps.
Previous research by Leitch and Chen (2003), Dzeng (1994), Kogan et al. (2010) and Alles et
al. (2005) has resulted in four theoretical models of continuity equations: linear regression
model (LRM), the simultaneous equations model (SEM), vector autoregressive model (VAR)
and the restricted vector autoregressive model (RVAR). Prior research did not include an
in-depth review of any other time series analysis models in terms of anomaly detection capability,
but the ARIMA model could provide value to flows of goods which are not optimally modeled
in autoregressive terms only.
Simultaneous Equations Model
Leitch and Chen (2003) proposed a first model of continuity equations in the field of
assurance: the Simultaneous Equations Model (SEM). When applied to the sales cycle, this
model can be represented as Equation (1). Each step in the sales cycle is simultaneously
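dependent on historic quantities from the previous step. These historic quantities are
represented with a lag $\Delta$ in each step. This model simplifies the sales cycle by assuming that there
is only a single fixed lag between each step.
\[
\begin{aligned}
\mathit{GS}_t &= \alpha_1 \, \mathit{SO}_{t-\Delta_1} + \varepsilon_1 \\
\mathit{IS}_t &= \alpha_2 \, \mathit{GS}_{t-\Delta_2} + \varepsilon_2
\end{aligned}
\tag{1}
\]
Here $\mathit{SO}_t$, $\mathit{GS}_t$ and $\mathit{IS}_t$ denote the quantities ordered, shipped and invoiced at time $t$.
The coefficients of this model are estimated by OLS linear regression, optimizing for the
overall R² of the model.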
Leitch and Chen tested the application of SEM on monthly data of financial statements. They
found that SEM outperformed other more conventional models of analytical procedures.
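As an illustration of how a single SEM coefficient could be estimated, the following Python sketch fits one step of the sales cycle by ordinary least squares with a fixed lag. It assumes a regression without intercept. The function name `fit_lagged_slope` and the toy quantities are hypothetical and are not taken from the thesis implementation, which is written in R.

```python
def fit_lagged_slope(previous_step, current_step, lag):
    """OLS slope (no intercept) of current_step[t] on previous_step[t - lag]."""
    pairs = [(previous_step[t - lag], current_step[t])
             for t in range(lag, len(current_step))]
    sum_xx = sum(x * x for x, _ in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    return sum_xy / sum_xx


# Hypothetical daily quantities: every order ships exactly one day later.
orders = [100, 120, 90, 110, 130, 95]
shipments = [0, 100, 120, 90, 110, 130]

# With a perfectly conserved flow the estimated coefficient equals 1.
alpha_1 = fit_lagged_slope(orders, shipments, lag=1)
print(alpha_1)
```

A coefficient well below 1 at the correct lag would suggest that part of the ordered quantity never reaches the shipping step, which is the kind of leakage continuity equations are meant to expose.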
Linear Regression Model
The second model is based on a simple linear regression of the invoiced quantities on the
ordered and shipped quantities as represented in Equation (2).
\[
\mathit{IS}_t = \beta_1 \, \mathit{SO}_{t-\Delta_1} + \beta_2 \, \mathit{GS}_{t-\Delta_2} + \varepsilon_t
\tag{2}
\]
Again, these historic quantities are represented with a lag $\Delta$ in each step. This model simplifies
the sales cycle by assuming that there is only a single fixed lag between each step. The
coefficients of this model are estimated by OLS linear regression, optimizing for the overall
R² of the model.
Basic Vector Autoregressive model
Alles et al. (2005) introduced another model: the basic Vector Autoregressive (VAR) model.
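This model for the sales cycle can be represented as Equation (3). In this model $\mathit{SO}_t$,
$\mathit{GS}_t$ and $\mathit{IS}_t$ are respectively the quantities ordered, shipped and invoiced at time
$t$, the $\Phi$ terms are $1 \times N$ transition vectors for a multivariate linear model, the $M(\cdot)$ terms are
$N \times 1$ vectors containing daily aggregates of quantities for the given dimension and $N$ is the
number of time periods covered in the model.
\[
\begin{aligned}
\mathit{SO}_t &= \Phi_{11} M(\mathit{SO}) + \Phi_{12} M(\mathit{GS}) + \Phi_{13} M(\mathit{IS}) \\
\mathit{GS}_t &= \Phi_{21} M(\mathit{SO}) + \Phi_{22} M(\mathit{GS}) + \Phi_{23} M(\mathit{IS}) \\
\mathit{IS}_t &= \Phi_{31} M(\mathit{SO}) + \Phi_{32} M(\mathit{GS}) + \Phi_{33} M(\mathit{IS})
\end{aligned}
\tag{3}
\]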
Each of these subequations models a predictor for the reported quantities in a specific step
in the business process. As previously defined, the quantities are related to quantities in the
other process steps by a time delay (lag). For example, if orders are shipped in exactly one day,
without exception, and invoicing is performed simultaneously with shipping, the resulting
predictors can be defined as Equation (4).
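\[
\begin{aligned}
\mathit{SO}_t &= a_1 \, \mathit{GS}_{t+1} + a_2 \, \mathit{IS}_{t+1} \\
\mathit{GS}_t &= b_1 \, \mathit{SO}_{t-1} + b_2 \, \mathit{IS}_t \\
\mathit{IS}_t &= c_1 \, \mathit{SO}_{t-1} + c_2 \, \mathit{GS}_t
\end{aligned}
\tag{4}
\]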
The VAR model is estimated by OLS linear regression, optimizing for the overall R² by
trying different lags for the process steps. Only the maximum expected lag is provided to the
algorithm, which then tries to find the best fitting model by iterating through all lag possibilities
up to the maximum expected lag. The exact lags do not have to be known prior to modeling,
as the best fitting lags are determined while modeling.
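The lag search described above can be sketched in Python for a deliberately simplified case: one pair of process steps and one lag term at a time, scored by the R² of a simple linear regression. This is not a full VAR estimation. The helper names `r_squared` and `best_lag` and the toy series are assumptions made for illustration only.

```python
def r_squared(xs, ys):
    """R-squared of a simple linear regression of ys on xs (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def best_lag(previous_step, current_step, max_lag):
    """Try every lag from 0 to max_lag and return the one with the highest R-squared."""
    scores = {}
    for lag in range(max_lag + 1):
        xs = previous_step[:len(previous_step) - lag]  # values at time t - lag
        ys = current_step[lag:]                        # values at time t
        scores[lag] = r_squared(xs, ys)
    return max(scores, key=scores.get), scores


# Hypothetical data: shipments repeat the orders of two days earlier.
orders = [100, 120, 90, 110, 130, 95, 105, 125]
shipments = [80, 85, 100, 120, 90, 110, 130, 95]

lag, scores = best_lag(orders, shipments, max_lag=3)
print(lag, scores)
```

Only the maximum lag is supplied by the caller; the lag itself is recovered from the data, which mirrors the property of the VAR approach that exact lags need not be known in advance.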
One can easily understand that it is not always trivial to determine lags prior to the modeling
process, e.g. lags in the purchasing cycle are highly dependent on the policies and processes at
third parties. Therefore, the VAR model can be a powerful tool for modeling continuity
equations when exact lags cannot be predefined easily.
Contrary to the SEM model, the VAR model does not assume that there is a single fixed
lag between steps. All lags up to a maximum are considered in the model. This can
result in an extensive estimated model with many terms. Therefore, most VAR models are represented using
matrix notation.
Restricted Vector Autoregressive model
Kogan et al. (2010) have shown in their studies that the VAR model shows outstanding
accuracy. More importantly, they showed that the Restricted VAR (RVAR) model resulted in
better accuracy. With a MAPE (mean absolute percentage error) of 0.3374 on the test set it
outscored even several other models, i.e. SEM and VAR type of models. Only the Bayesian
VAR model performed better when taking only the MAPE into account, but it also resulted in
a larger standard deviation for the absolute percentage error. Therefore, the Bayesian VAR
model is not considered viable for auditing purposes. The RVAR model was found to be one
of the best models for continuity equations.
The RVAR model translates roughly to optimizing for the R² of the predictor by removing
insignificant coefficients from the VAR model. For example, if the mean lag between order
and shipping is less than a month, a shipment term at a lag of 365 days, a year after ordering, is obviously not
significant and thus excluded from the model. This method iterates the modeling process per
equation by removing all coefficients with t-statistics below a predefined threshold, as
explained in Figure 1. Kogan et al. (2010) find that a significance threshold of 0.15 and its
corresponding t-statistic cutoff of t > 1.036 yield the model with the best prediction accuracy.
[Figure 1: flowchart. Inputs: data and threshold. Start → initial model estimation → exclude parameters with t-statistic below threshold → re-estimate model → all t-statistics above threshold? No: return to the exclusion step. Yes: final model.]
Figure 1. RVAR modeling process. The initial VAR model is restricted by excluding parameters with a t-statistic below a predefined threshold. The model is re-estimated, followed by the next exclusion iteration, until
all parameters satisfy the t-statistic requirement.
The RVAR model usually results in less extensive and more accurate estimated models due
to the restriction to significant terms only.
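The exclusion loop of Figure 1 can be sketched in Python for a single equation. The sketch assumes that the absolute value of the t-statistic is compared with the threshold and that the regression has no intercept; both are assumptions, as are the function names, the regressor names and the toy numbers, which were constructed so that the second regressor carries no information.

```python
import math


def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix."""
    k = len(matrix)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(matrix)]
    for c in range(k):
        pivot_row = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[pivot_row] = m[pivot_row], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(k):
            if r != c:
                factor = m[r][c]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[c])]
    return [row[k:] for row in m]


def ols_t_stats(X, y):
    """OLS coefficients and t-statistics for y regressed on the columns of X."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - sum(X[r][j] * beta[j] for j in range(k)) for r in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    t_stats = [beta[i] / math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    return beta, t_stats


def restrict(X, y, names, threshold=1.036):
    """Drop terms with |t| below the threshold, re-estimate, and repeat."""
    keep = list(range(len(names)))
    while keep:
        X_kept = [[row[j] for j in keep] for row in X]
        beta, t_stats = ols_t_stats(X_kept, y)
        weak = [j for j, t in zip(keep, t_stats) if abs(t) < threshold]
        if not weak:
            return {names[j]: b for j, b in zip(keep, beta)}
        keep = [j for j in keep if j not in weak]
    return {}


# Toy regressors (hypothetical): the first drives y, the second is irrelevant.
X = [[1, 1], [2, -1], [3, -1], [4, 1]]
y = [1.85, 4.1, 5.85, 8.1]
final_model = restrict(X, y, names=["SO_lag1", "SO_lag365"])
print(final_model)
```

In this toy case the irrelevant term is removed in the first iteration and the remaining coefficient is re-estimated on its own, which corresponds to one pass through the loop in Figure 1.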
Autoregressive Integrated Moving Average
The autoregressive integrated moving average (ARIMA) model differs from the previous
models by also including non-autoregressive terms. The ARIMA model accounts for both
autoregressive and moving average terms in the model. As in the VAR and RVAR model the
autoregressive terms account for the possibility that a value at time $t$ is related to its prior
values or lagged terms. The moving average terms account for the possibility that a value at
time $t$ is related to its residuals from prior periods. These terms seem plausible to include in a
model, since the actual residuals are also part of the flow of goods and are not accounted for
in the prior estimated lagged values. The ARIMA model combines both the autoregressive
and moving average terms in one model, as specified in the generic model definition in
Equation (5).
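\[
\begin{aligned}
\mathit{SO}_t &= c_1 + \sum_{i=1}^{p} \varphi_{1,i} \, \mathit{SO}_{t-i} + \sum_{j=1}^{q} \theta_{1,j} \, \varepsilon_{1,t-j} + \varepsilon_{1,t}, \\
\mathit{GS}_t &= c_2 + \sum_{i=1}^{p} \varphi_{2,i} \, \mathit{GS}_{t-i} + \sum_{j=1}^{q} \theta_{2,j} \, \varepsilon_{2,t-j} + \varepsilon_{2,t}, \\
\mathit{IS}_t &= c_3 + \sum_{i=1}^{p} \varphi_{3,i} \, \mathit{IS}_{t-i} + \sum_{j=1}^{q} \theta_{3,j} \, \varepsilon_{3,t-j} + \varepsilon_{3,t},
\end{aligned}
\tag{5}
\]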
The model requires the data to be stationary, i.e. its mean and variance do not vary in time.
However, our data probably incorporates some sort of trend. Therefore, we need to use a
differenced variables approach to model the variables, as generically defined in Equation (6).
\[
X'_t = X_t - X_{t-1}
\tag{6}
\]
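First-order differencing replaces each value by its change since the previous period, which removes a linear trend. A minimal Python sketch follows; the function name `difference` and the sample series are hypothetical.

```python
def difference(series, order=1):
    """Apply first-order differencing `order` times."""
    for _ in range(order):
        series = [current - previous
                  for previous, current in zip(series, series[1:])]
    return series


# A hypothetical series with an accelerating upward trend.
quantities = [10, 12, 15, 19]
print(difference(quantities))           # [2, 3, 4]
print(difference(quantities, order=2))  # [1, 1]
```

Each round of differencing shortens the series by one observation, so the first prediction can only be made for the second period onwards.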
prioritization (Cao, Chychyla, & Stewart, 2015) in order to reduce the noise in the detected
anomalies.
Research question
In total four different models of continuity equations are used in the field of continuous
assurance. Auditors rely on the accuracy and anomaly detection capability of these models to
provide assurance on the data. Kogan et al. (2014) have performed a first performance
comparison of the RVAR, LRM and SEM model on actual data from a procurement cycle in a
large medical supplier. They found that the models overall performed equally well. The SEM
model appeared to be superior in terms of false negative error rates, while the RVAR model
appeared to be superior in terms of false positive error rates. Therefore, Kogan et al. also
proposed to combine the models to result in even better anomaly detection. The somewhat
equal performance of the individual models might be caused by the unpredictability of the
procurement cycle, because not all lag terms are controlled by the firm. However, sales cycles
might be more predictable, because all the lag terms are controlled by the firm. Comparison of
the models in this cycle might yield different results, because oversimplification issues in the
LRM and SEM models might not be problematic in a more predictable cycle. This leads to my
research question:
Which of the existing models of continuity equations in continuous auditing has the best
anomaly detection capability?
III. Research Design
Data
The proposed base model for the sales cycle is based on three different quantities: the ordered
quantity, the quantity of goods shipped and the quantity invoiced. These three variables can be
provided by most ERP systems on a daily basis.
Data is provided by a wholesaler in technical supplies. This company uses an off-the-shelf
solution of Microsoft Dynamics AX 2009. The data was extracted from separately generated
reports containing transaction quantities for each of the process steps by merging the columns
by date, as presented in Figure 2.
[Figure 2: data model. The source tables SalesOrders, Shipments and Invoices (each with Date as primary key and a Quantity field) are joined on Date into SalesData (Date, SO, GS, IS).]
Figure 2. Data model consisting of daily aggregates for three different stages in the sales cycle: ordered quantity
(SO), quantity of goods shipped to customer (GS) and quantity invoiced (IS) combined by date via a SQL join
clause. The date serves as the primary and foreign keys of the data source involved.
The data reflects actual day-to-day transaction quantities of February 2007 up to November
2007, excluding Sundays and holidays during which the company was closed for business.
Saturdays are still included because, sometimes, high priority orders are shipped on Saturdays.
Two data sets are provided: data from a Dutch subsidiary and data from a German subsidiary.
The resulting data is exported as a CSV file to be imported by the model implementations in
R. The CSV file consists of four data fields, i.e. date, the quantities ordered, quantities shipped
and quantities invoiced. More detailed information about the data can be found in Appendix
A.
Panel A
Variable
Sales orders (SO)
Goods shipped (GS)
Invoices sent (IS)
n
264
264
264
Mean
Std.Dev. 25th Pct. Median 75th Pct.
66,845
60,676
38,384
62,548
83,122
62,068
46,099
42,295
63,326
40,865
60,211
47,237
78,393
60,745
81,303
Panel B: Pearson correlations
        SO       GS       IS
SO   1.000   0.600*   0.588*
GS           1.000    0.960*
IS                    1.000
Figure 3. Plot of daily aggregates for three different stages in the sales cycle: ordered quantity (SO), quantity
of goods shipped to customer (GS) and quantity invoiced (IS) as provided in the data set.
Table 1 and Figure 3 present descriptive statistics about the three quantity fields in the data
set of the Dutch subsidiary. The Pearson correlations show that the GS and IS variables are
strongly related. This is fully in line with the notion that invoices are generated at the same
time as the goods are shipped most of the time. Furthermore, the charts clearly show less
activity on Saturdays compared to weekdays. On Saturdays only priority orders and over-the-counter sales are handled.
The data is split into two separate parts, which account for roughly two thirds and one third
of the observations included in the data set respectively. The first part will be used as a training
set to estimate the model parameters for all models. The second part is used as a test set. After
estimation, the models will be tested by generating predictions for the test set.
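The split can be sketched in Python as follows. The sketch assumes that the split is chronological, with the earliest observations used for training, and that the cut-off is obtained by rounding; neither detail is confirmed by the text. The function name `split_train_test` is hypothetical.

```python
def split_train_test(rows, train_fraction=2 / 3):
    """Split chronologically ordered rows into a training part and a test part."""
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Nine hypothetical daily observations, oldest first.
observations = list(range(9))
training_set, holdout_set = split_train_test(observations)
print(len(training_set), len(holdout_set))
```

Keeping the time order intact matters for lagged models: shuffling the rows before splitting would break the relation between an observation and its predecessors.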
Implementation of the models
The models will be implemented in R, a widely accepted language for statistical processing
and data analytics. A rudimentary implementation of these models is already available in the
form of R packages. All models are implemented in four stages: data collection, preprocessing,
modeling and prediction.
The implementation of the LRM and SEM models is based on the built-in lm function and the
systemfit package, which has been developed and published by Arne Henningsen and Jeff D.
Hamann and is available via CRAN (Henningsen & Hamann, 2007).
The VAR and RVAR model implementation code is centered around the vars package, which
has been developed and published by Bernhard Pfaff and Matthieu Stigler and is available via
CRAN (Pfaff & Im Taunus, 2007; Pfaff, 2008; Pfaff, 2008). The package includes several
functions for modeling VARs, testing the VARs and presenting the results.
The ARIMA model is implemented by using the auto.arima function as provided by the
forecast package, as developed by Rob J. Hyndman. The package is made available via CRAN
and GitHub (Hyndman, 2015; Hyndman & Khandakar, 2008). It includes functions for
modeling and analyzing univariate time series model forecasts. The auto.arima function is used
to automatically select the optimal parameters for number of autoregressive terms, number of
moving average terms and differencing order to model the best fitting ARIMA model.
The modeling implementation in R can be found in Appendix B.
+ ,1
2
(7)
The mean number of Type I and Type II errors found serves as the test statistic for comparison
purposes. Robustness of the results is further tested by using a computer-simulated test set with
preset levels of noise and randomly injected anomalies.
The test procedure, as implemented in R, can be found in Appendix B.
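To make the error measures concrete, the following Python sketch counts Type I errors (false positives) and Type II errors (false negatives) by comparing the observations a model flags with the observations in which an anomaly was injected, and derives sensitivity, specificity and the predictive values from those counts. The function name `detection_metrics` and the sample flags are hypothetical; the actual test procedure is the R code in Appendix B.

```python
def detection_metrics(flagged, injected):
    """Compare model flags with injected anomalies, observation by observation."""
    pairs = list(zip(flagged, injected))
    tp = sum(1 for f, i in pairs if f and i)
    fp = sum(1 for f, i in pairs if f and not i)      # Type I errors
    fn = sum(1 for f, i in pairs if not f and i)      # Type II errors
    tn = sum(1 for f, i in pairs if not f and not i)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "type_i_errors": fp,
        "type_ii_errors": fn,
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Five hypothetical observations.
flagged = [True, True, False, False, True]
injected = [True, False, True, False, False]
metrics = detection_metrics(flagged, injected)
print(metrics)
```

Averaging `type_i_errors` and `type_ii_errors` over repeated runs with fresh random injections would give the mean error counts used for comparison.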
IV. Results
Models
The five models were trained with the training subset of the full data sets. The two data sets
with realistic data from the Dutch and German subsidiaries were split into a training set and a
validation set prior to the testing procedures. The simulated data set was split using the same
method. The training set is used to train the five models and obtain a definitive model definition
with fixed coefficients.
Panel A: adjusted R²
         Dutch subsidiary   German subsidiary   Simulated data set
LRM      0.9337             0.7989              0.0308
VAR      0.7671             0.8117              0.8361
RVAR     0.7648             0.8112              0.8346

Panel B: [grid of lags against the SO, GS and IS equations for the Dutch and German subsidiaries, with an X marking each retained lag; the positions of the marks are not recoverable from the source]
Table 2. Panel A: adjusted R² model characteristics, as based on the training subset; Panel B: lagged
coefficients for the three steps in the sales cycle which are left in the RVAR model after the exclusion of
insufficiently significant lagged terms.
Training resulted in the model characteristics as shown in Table 2. The adjusted R² of the
models could be an initial indication of how well the model fits the training data. The LRM
model fits the training subset of the Dutch subsidiary very well with an adjusted R² of 0.9337,
while the VAR and RVAR models resulted in a lower adjusted R² of 0.7671 and 0.7648
respectively. When the training subset of the German subsidiary was used to train the models,
a different effect was shown. The LRM model fit the data least well with an adjusted R² of
0.7989, while the VAR and RVAR models fit the data slightly better with an adjusted R² of
0.8117 and 0.8112 respectively. The simulated test set showed yet another set of results.
The LRM model fit the data worst with an adjusted R² of 0.0308, while the VAR and RVAR
models both fit considerably better with adjusted R² of 0.8361 and 0.8346 respectively. This
might be caused by the intentional non-linearity of the simulated data set.
The adjusted R² of all RVAR models was slightly lower than that of the VAR base model, even
though non-significant coefficients were eliminated from the final RVAR model definition.
This might be the result of overfitting effects of the VAR model. As shown in Table 2, most
coefficients were eliminated in the final RVAR model definition. The coefficients around lags
of 4, 7 and 14 days could indicate intuitive default delays between ordering and shipping.
Model tests
As shown in Table 3 all four models were tested on the two data sets with realistic data. The
validation subset of data from the Dutch subsidiary consisted of 103 samples with 10 randomly
injected anomalies per repetition. Prior to testing and injection of anomalies the LRM and SEM
model identified 40 and 43 anomalies in the validation subset, while the VAR model identified
18 anomalies and the RVAR model found 60 samples to be erroneous. The ARIMA model
found just one anomaly prior to injection. These pretest anomalies might be the result of the
non-smooth characteristics of the flow of goods on these sample days. Furthermore, the
realistic data was not audited prior to testing, so the data might contain actual anomalies, which
are identified with this test. Only the first explanation is confirmed to be apparent in the data
set, as can be seen in Figure 3. These pretest anomalies have to be taken into consideration
when interpreting the results. The expected average of Type I errors, as potentially identified
by the testing procedure, can be defined as in Equation (8). This equation describes the
expected average of Type I errors as the number of true positive injections in the pretest
anomaly set, which are correctly identified as such, subtracted from the number of pretest
anomalies.
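\[
E[\text{Type I}] = A_{\text{pretest}} - A_{\text{pretest}} \cdot \frac{n_{\text{injections}}}{n_{\text{samples}}}
\tag{8}
\]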
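The expectation described for Equation (8) can be checked with a short calculation. The Python sketch below assumes that injections are spread uniformly over the validation subset and that every injection landing on a pretest anomaly is identified correctly; the function name `expected_type_i` is hypothetical. The inputs are the pretest anomaly counts, the 10 injections and the 103 samples reported for the Dutch subsidiary.

```python
def expected_type_i(pretest_anomalies, injections, samples):
    """Pretest anomalies minus those expected to coincide with an injection."""
    expected_overlap = pretest_anomalies * injections / samples
    return pretest_anomalies - expected_overlap


for model, pretest in [("LRM", 40), ("SEM", 43), ("VAR", 18), ("ARIMA", 1)]:
    print(model, round(expected_type_i(pretest, injections=10, samples=103), 2))
```

Under these assumptions the calculation gives roughly 36.12, 38.83, 16.25 and 0.90 for the LRM, SEM, VAR and ARIMA models.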
All four models caused both Type I and Type II errors. The LRM and SEM models
perform similarly on average with around 36 to 39 Type I errors, while the VAR model
performed best in this regard with only 16.25 Type I errors on average. The RVAR model
performed worse with triple the number of Type I errors compared to its base VAR model. The
ARIMA model test resulted in 0.9 Type I errors on average. With regards to Type II errors the
LRM and SEM models also performed similarly with 2.72 false negatives identified on
average. The RVAR model performed best with only 1.27 false negatives identified. In total
the VAR model performed best in terms of specificity and positive predictive value, while the
RVAR model was superior in terms of sensitivity and negative predictive value. The overall
positive predictive value of the models turns out to be fairly low, with a maximum value of 0.38
for the VAR model; only the ARIMA model scores higher, at 0.92.
Testing based on data from the German subsidiary showed similar, but slightly worse, results.
The LRM and SEM models performed similarly with 46.09 and 52.43 Type I errors
respectively. The ARIMA model test did not result in a Type I error. The RVAR model
performed worst with 62.36 Type I errors, while its base VAR model performed best with
33.44 false positives identified. However, the RVAR model performed best regarding Type II
errors with 1.35 Type II errors. The other models performed worse with around 2.7 false
negatives identified during testing, of which the ARIMA model performed worst with
almost all injections remaining falsely unidentified. In terms of sensitivity the RVAR model
performed best with 0.8654, while the other models all performed worse with
sensitivity at around 0.74. As with the Dutch subsidiary the specificity was best when using
the VAR model with a specificity of 0.7506. The large number of Type I errors resulted in a
fairly low positive predictive value of 0.2302 using the VAR model or 1.00 using the ARIMA
model. Negative predictive value was as high as with data from the Dutch subsidiary.
The ARIMA model test results appear to be valuable in terms of Type I errors: the model
misidentifies almost no true negatives. However, in terms of Type II errors this model performs
worst, with almost all true positives misidentified as negatives. These results might be caused
by the excessively wide confidence intervals that are used to identify anomalies. The wide
confidence intervals accept a large range of possible values as valid actual values.
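The effect of the interval width on both error types can be illustrated with a small sketch. It is written in Python for illustration only; the function name `flag_anomaly`, the interval half-widths and all quantities are hypothetical and are not taken from the thesis scripts or data.

```python
def flag_anomaly(actual, predicted, half_width):
    """Flag an observation when it falls outside predicted +/- half_width."""
    return abs(actual - predicted) > half_width

# Hypothetical daily quantities: the model predicts 1000 units. The clean
# observation is 1010 units; an injected error turns it into 1400 units.
predicted = 1000.0
clean_value = 1010.0
injected_value = 1400.0

narrow = 150.0  # assumed half-width of a narrow interval
wide = 900.0    # assumed half-width of an excessively wide interval

# Each tuple holds (flag for the clean value, flag for the injected value).
narrow_flags = (flag_anomaly(clean_value, predicted, narrow),
                flag_anomaly(injected_value, predicted, narrow))
wide_flags = (flag_anomaly(clean_value, predicted, wide),
              flag_anomaly(injected_value, predicted, wide))

print(narrow_flags)  # (False, True): the injection is detected
print(wide_flags)    # (False, False): the injection becomes a Type II error
```

With the wide interval neither value is flagged, which mirrors the pattern described above: hardly any Type I errors, but almost every injected anomaly is missed.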
Model combination
The results show it might be interesting to combine the positive detection characteristics of
the VAR and RVAR models and possibly the ARIMA and RVAR models. Therefore, a combined
test procedure was additionally implemented to exploit the positive detection characteristics of
both models in one procedure. The procedure was implemented as shown in Figure 4. In the
first combination test procedure the ARIMA and RVAR models are cascaded to determine if
an observation should be flagged as an anomaly. In the second combination test procedure the
VAR and RVAR models are cascaded in exactly the same way. If the RVAR model suspects the
observation to be an anomaly, it is subsequently tested by the ARIMA model or VAR model.
If this subsequent model confirms the suspected observation to be erroneous, the observation
is definitively flagged as an anomaly. If the RVAR model initially does not flag the
observation, it is definitively not flagged.
[Figure 4: two flowcharts. (a) Start, RVAR condition test; if TRUE, ARIMA condition test; if TRUE again, the observation is FLAGGED. A FALSE result at either test means the observation is NOT FLAGGED. (b) The same flow with the VAR condition test as the second test.]
Figure 4. (a) Combined test procedure in which the ARIMA and RVAR models are cascaded. First the RVAR
model is used to test the observation; if it is suspected to be an anomaly, the ARIMA model is used to confirm
the anomaly. If the anomaly is confirmed by the ARIMA model, the observation is flagged. Otherwise the
observation is not flagged. (b) Combined test procedure in which the VAR and RVAR models are cascaded.
First the RVAR model is used to test the observation; if it is suspected to be an anomaly, the VAR model is
used to confirm the anomaly. If the anomaly is confirmed by the VAR model, the observation is flagged.
Otherwise the observation is not flagged.
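The decision logic of the cascade can be sketched as follows. This is a minimal Python illustration of the flow described for Figure 4, not the R implementation used for the tests; the function name `cascaded_test` and the example flags are hypothetical, and the second-stage results are assumed to be available as precomputed booleans.

```python
def cascaded_test(rvar_flags, confirm_flags):
    """Return the final flag per observation for a two-stage cascade.

    rvar_flags: booleans from the first-stage RVAR condition test.
    confirm_flags: booleans from the second-stage (ARIMA or VAR) test.
    """
    final = []
    for suspected, confirmed in zip(rvar_flags, confirm_flags):
        if not suspected:
            # RVAR does not flag the observation: definitively not flagged.
            final.append(False)
        else:
            # RVAR suspects an anomaly: flag only if the second model confirms.
            final.append(confirmed)
    return final

# Hypothetical outcomes of both condition tests for four observations.
rvar = [True, True, False, False]
var = [True, False, True, False]
result = cascaded_test(rvar, var)
print(result)  # [True, False, False, False]
```

In this sketch an observation is flagged only when both tests agree, so the second-stage result is irrelevant whenever the RVAR model does not raise a suspicion.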
The combined model results are also shown in Table 3. The combination of the ARIMA and
RVAR models theoretically performs best, but might not be usable due to the confidence
intervals being too wide, as stated previously. The combination of the VAR and RVAR models
incorporates the positive characteristics of both models, resulting in a low Type I error rate
combined with a low Type II error rate. The results show that the combined model tests retain
the low number of Type II errors of the RVAR model and the low number of Type I
errors of the VAR model. Using the combined test procedure of the RVAR and VAR
models results in the highest sensitivity, specificity, positive and negative predictive value
for both the Dutch and the German data.
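Dutch subsidiary

| Measure                        | LRM     | SEM     | VAR     | RVAR    | ARIMA   | Combi ARIMA (i) | Combi VAR (ii) |
|--------------------------------|---------|---------|---------|---------|---------|-----------------|----------------|
| Repetitions                    | 100,000 | 100,000 | 100,000 | 100,000 | 100,000 | 100,000         | 100,000        |
| N                              | 103     | 103     | 103     | 103     | 103     | 103             | 103            |
| Injected anomalies             | 10      | 10      | 10      | 10      | 10      | 10              | 10             |
| Pretest anomalies              | 40      | 43      | 18      | 60      | 1       | 1               | 18             |
| Type I errors: average         | 36.12   | 38.83   | 16.25   | 54.19   | 0.90    | 0.90            | 16.25          |
| Type I errors: minimum         | 31      | 33      | 10      | 50      |         |                 | 10             |
| Type I errors: maximum         | 40      | 43      | 18      | 60      | 1       | 1               | 18             |
| Type I errors: std. deviation  | 1.4728  | 1.4917  | 1.1477  | 1.4892  | 0.2960  | 0.2960          | 1.1477         |
| Type II errors: average        | 2.72    | 2.72    | 2.81    | 1.27    | 9.80    | 1.27            | 1.27           |
| Type II errors: minimum        |         |         |         |         | 8.0     |                 |                |
| Type II errors: maximum        | 9       | 9       | 9       | 7       | 10      | 7               | 7              |
| Type II errors: std. deviation | 1.3444  | 1.3444  | 1.3615  | 1.0063  | 0.4151  | 1.0063          | 1.0029         |
| Sensitivity                    | 0.7283  | 0.7283  | 0.7186  | 0.8735  | 0.0195  | 0.8735          | 0.8735         |
| Specificity                    | 0.7192  | 0.6900  | 0.9328  | 0.5249  | 1.0978  | 1.0978          | 0.9328         |
| PPV (iii)                      | 0.2168  | 0.2048  | 0.3809  | 0.1558  | 0.9172  | 0.9172          | 0.3809         |
| NPV (iv)                       | 0.9716  | 0.9716  | 0.9706  | 0.9866  | 0.9046  | 0.9866          | 0.9866         |

German subsidiary

| Measure                        | LRM     | SEM     | VAR     | RVAR    | ARIMA   | Combi ARIMA (i) | Combi VAR (ii) |
|--------------------------------|---------|---------|---------|---------|---------|-----------------|----------------|
| Repetitions                    | 100,000 | 100,000 | 100,000 | 100,000 | 100,000 | 100,000         | 100,000        |
| N                              | 104     | 104     | 104     | 104     | 104     | 104             | 104            |
| Injected anomalies             | 10      | 10      | 10      | 10      | 10      | 10              | 10             |
| Pretest anomalies              | 51      | 58      | 37      | 69      |         |                 | 35             |
| Type I errors: average         | 46.09   | 52.43   | 33.44   | 62.36   |         |                 | 31.64          |
| Type I errors: minimum         | 41      | 48      | 28      | 59      |         |                 | 26             |
| Type I errors: maximum         | 51      | 58      | 37      | 68      |         |                 | 35             |
| Type I errors: std. deviation  | 1.5048  | 1.5019  | 1.4482  | 1.4321  |         |                 | 1.4293         |
| Type II errors: average        | 2.79    | 2.79    | 2.41    | 1.35    | 8.75    | 1.35            | 1.35           |
| Type II errors: minimum        |         |         |         |         | 3.0     |                 |                |
| Type II errors: maximum        | 9       | 9       | 8       | 7       | 10      | 7               | 7              |
| Type II errors: std. deviation | 1.3514  | 1.3514  | 1.2877  | 1.0284  | 0.9985  | 1.0284          | 1.0284         |
| Sensitivity                    | 0.7210  | 0.7210  | 0.7592  | 0.8654  | 0.1250  | 0.8654          | 0.8654         |
| Specificity                    | 0.6161  | 0.5486  | 0.7506  | 0.4430  | 1.1064  | 1.1064          | 0.7698         |
| PPV (iii)                      | 0.1783  | 0.1602  | 0.2302  | 0.1382  | 1.0000  | 1.0000          | 0.2402         |
| NPV (iv)                       | 0.9712  | 0.9712  | 0.9750  | 0.9859  | 0.9148  | 0.9859          | 0.9859         |

i: cascaded test procedure using the RVAR and ARIMA model; ii: cascaded test procedure using the RVAR and VAR model; iii: positive predictive value; iv: negative predictive value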
Table 3. Test results of the four models on two data sets containing realistic data, from the Dutch subsidiary and German subsidiary
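The sensitivity figures in Table 3 appear consistent with one minus the average number of Type II errors divided by the number of injected anomalies. The Python sketch below checks this for the Dutch subsidiary. The helper name `sensitivity` and the tolerance are assumptions made for illustration; small differences are expected because the averages in the table are rounded to two decimals.

```python
def sensitivity(avg_type_ii, injected):
    """Share of injected anomalies that is detected on average."""
    return 1.0 - avg_type_ii / injected

INJECTED = 10  # injected anomalies per repetition, as listed in Table 3

# Average Type II errors and reported sensitivity, Dutch subsidiary (Table 3).
dutch_type_ii = {"LRM": 2.72, "SEM": 2.72, "VAR": 2.81, "RVAR": 1.27, "ARIMA": 9.80}
dutch_reported = {"LRM": 0.7283, "SEM": 0.7283, "VAR": 0.7186,
                  "RVAR": 0.8735, "ARIMA": 0.0195}

# Compare the recomputed value with the reported value for every model.
differences = {
    model: abs(sensitivity(fn, INJECTED) - dutch_reported[model])
    for model, fn in dutch_type_ii.items()
}
for model, diff in differences.items():
    print(f"{model}: difference {diff:.4f}")
```

The other performance measures are not recomputed here, since their exact definitions depend on the earlier sections of the thesis.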
| Measure                        | LRM     | SEM     | VAR     | RVAR    | ARIMA   | Combi ARIMA (i) | Combi VAR (ii) |
|--------------------------------|---------|---------|---------|---------|---------|-----------------|----------------|
| Repetitions                    | 100,000 | 100,000 | 100,000 | 100,000 | 100,000 | 100,000         | 100,000        |
| N                              | 801     | 801     | 801     | 801     | 801     | 801             | 801            |
| Injected anomalies             | 10      | 10      | 10      | 10      | 10      | 10              | 10             |
| Pretest anomalies              | 615     | 636     | 375     | 425     | 33      | 33              | 364            |
| Type I errors: average         | 607.33  | 628.06  | 370.32  | 419.69  | 32.59   | 32.59           | 359.45         |
| Type I errors: minimum         | 605     | 626     | 365     | 415     | 28      | 28              | 354            |
| Type I errors: maximum         | 613     | 634     | 375     | 425     | 33      | 33              | 364            |
| Type I errors: std. deviation  | 1.3293  | 1.2725  | 1.5717  | 1.5696  | 0.6264  | 0.6264          | 1.5689         |
| Type II errors: average        | 0.01    |         | 1.19    | 0.96    | 7.38    | 0.96            | 0.96           |
| Type II errors: minimum        |         |         |         |         | 1.0     |                 |                |
| Type II errors: maximum        | 1       |         | 7       | 6       | 10      | 6               | 6              |
| Type II errors: std. deviation | 0.1112  |         | 1.0207  | 0.9281  | 1.3841  | 0.9281          | 0.9281         |
| Sensitivity                    | 0.9987  | 1.0000  | 0.8811  | 0.9037  | 0.2624  | 0.9037          | 0.9037         |
| Specificity                    | 0.2448  | 0.2186  | 0.5445  | 0.4821  | 0.9714  | 0.9714          | 0.5582         |
| PPV (iii)                      | 0.0162  | 0.0157  | 0.0263  | 0.0233  | 0.2348  | 0.2348          | 0.0271         |
| NPV (iv)                       | 1.0000  | 1.0000  | 0.9985  | 0.9988  | 0.9908  | 0.9988          | 0.9988         |

i: cascaded test procedure using the RVAR and ARIMA model; ii: cascaded test procedure using the RVAR and
VAR model; iii: positive predictive value; iv: negative predictive value
Table 4. Test results of the four models on two data sets containing a computer generated simulation test set.
V. Conclusion
As shown in the previous section, all five models performed quite similarly in terms of the order
of magnitude of the two error types. However, differences in model performance do exist.
As stated in the research design, the research question to be answered is: which
of the existing models of continuity equations in continuous auditing has the best anomaly
detection capability? To fully answer this question, both Type I and Type II error performance
have to be taken into account.
Since the ARIMA model and the ARIMA-RVAR cascaded model appear to be weak
indicators of true anomalies due to their extremely wide testing intervals, i.e. almost any
observation will fall within the testing interval and thus remain unflagged, these models are not
considered to be viable models to detect anomalies.
Type I errors
Type I errors, or false positives, are an important aspect in assurance with respect to audit
efficiency. Identified anomalies need to be investigated further and could bring up the need for
additional assurance activities on the cycle or account under investigation. Falsely identified
anomalies waste resources when the further investigation proves to be unnecessary afterwards.
In terms of performance with regard to Type I errors, the VAR model performed best with
the expected average number of Type I errors as described by Equation (8).
Type II errors
Type II errors, or false negatives, are an important aspect in assurance with respect to audit
effectiveness and less with respect to audit efficiency. Anomalies that are not detected during
the testing procedures could lead to a false sense of certainty with regard to the audited object.
These kinds of risks are by definition part of the audit risk model as the detection risk element.
The detection risk element is the risk that an auditor fails to detect a material misstatement.
Auditors use the audit risk model to manage the overall risk of an audit engagement. If one of
the risk elements imposes an impermissible risk level, additional assurance activities have to
be performed. Minimizing Type II errors can therefore be considered indispensable for
controlling audit risk.
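The role of detection risk can be made concrete with the audit risk model in its commonly used multiplicative form, audit risk = inherent risk x control risk x detection risk. This form and all numbers in the Python sketch below are assumptions for illustration; the thesis does not state the formula in this section.

```python
def allowed_detection_risk(target_audit_risk, inherent_risk, control_risk):
    """Solve AR = IR * CR * DR for the detection risk DR."""
    return target_audit_risk / (inherent_risk * control_risk)

# Hypothetical engagement: 5% acceptable audit risk, 80% inherent risk
# and 50% control risk.
dr = allowed_detection_risk(0.05, 0.8, 0.5)
print(round(dr, 3))  # 0.125
```

Under these assumed inputs the testing procedures may miss at most 12.5% of material misstatements, which is the kind of threshold against which Type II error rates would have to be judged.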
In terms of Type II errors, the RVAR model performs best, with on average fewer than 1.4 false
negatives in the worst performing test set (German subsidiary). The number of Type
II errors during this test amounts to approximately 13.7% of injected anomalies in this set, while
it amounts to approximately 12.4% of injected anomalies in the data of the Dutch subsidiary.
Overall performance
The model with the overall best performance is the cascaded combination of the VAR and
RVAR models. The VAR model performs best with regard to Type I errors and worst with
regard to Type II errors, while the RVAR model performs best with regard to Type II errors
but worst with regard to Type I errors. Choosing a single test model from these two models
would result in optimal performance with regard to one error type, while making major and
possibly unacceptable concessions with regard to the other error type. In order to optimize
performance and remove the need to prioritize one error type, a combination model was
tested. The cascaded combination model takes the best properties of both models and combines
these into a new model, which performs best with regard to both Type I and Type II errors.
Therefore, the VAR-RVAR combination model has the best anomaly detection capability of
the existing continuity equations.
VI. Discussion
range of the data set and thus result in close to 100% Type II errors, since all reported/measured
quantities are within the prediction interval.
If the auditor refrains from the interpretation fallacy, the use of the narrower confidence
interval, instead of the statistically more correct prediction interval, would give a viable set of
boundaries.
Practical application
The use of innovative techniques by auditors should ultimately result in improvements
regarding audit efficiency and/or effectiveness. With regard to effectiveness, the level of
assurance could increase, while efficiency improves when conventional audit procedures can
be executed faster or the required level of assurance is reached with less effort. Without
improvement in one or both of these aspects, innovative techniques should not and will not be
used by auditors.
Type I errors
Type I errors, or falsely identified anomalies, potentially cause a decrease in audit efficiency.
The detection of anomalies implies that the auditor should perform additional testing on the
detected anomalies, under the assumption that the anomalies were correctly identified. Any
innovative audit technique should aim to decrease the likelihood of falsely identifying
anomalies.
The results in Table 3 show that the average number of Type I errors using the best possible
model accounts for over 15% of the total tested sample. If this implies that auditors should
perform further testing on 15% of the total population, even the best possible model would be
unacceptable. The sheer number of false positives is probably not acceptable to auditors, even
if the initial testing with the proposed model would incur zero cost.
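The share of falsely flagged observations follows directly from Table 3. The Python sketch below recomputes it for the VAR model from the average number of Type I errors and the sample size N; the helper name `flagged_share` is hypothetical.

```python
def flagged_share(avg_type_i, n_observations):
    """Average number of false positives as a share of the tested sample."""
    return avg_type_i / n_observations

dutch_var = flagged_share(16.25, 103)   # VAR model, Dutch subsidiary
german_var = flagged_share(33.44, 104)  # VAR model, German subsidiary

print(f"{dutch_var:.1%}")   # 15.8%
print(f"{german_var:.1%}")  # 32.2%
```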
However, no valid conclusions regarding the practicality of this model can be drawn from
these results. The number of Type I errors is highly influenced by the number of
identified pretest anomalies and reflects the number of Type I errors that is to be expected
based on Equation (8). It would be naive to assume that all the pretest anomalies are
Type I errors. The data was not audited integrally and in detail to ensure that no errors are
present in the data set.
For the purpose of this study practicality was not in scope. However, it should be noted that
practicality is an important aspect with respect to adoption of innovative audit techniques.
Type II errors
The number of Type II errors identified in this study is relatively high. In the
Dutch data set, using the best possible model, 12.4% of injected anomalies remained
undetected. In the simulated test set this figure drops to 4.1% using
the best possible model. For any model to be used in practice, the detection risk should
be as low as possible, and at least as low as the desired detection risk set by the auditor prior
to the engagement. The results on the realistic data indicate that the detection risk is possibly
not reduced to a level below the desired preset risk level when the current models of
continuity equations are used. This might impact the adoption of this tool by auditors.
Possible improvements in the proposed model
The models in this study are trained on roughly 65% of a year's worth of data. The remaining
35% of a year's worth of data is used for testing. In reality auditors might use the full
previous year's worth of data to train the models and then test a single sample at a time. When
this sample is not flagged as an anomaly, the model is retrained on the historical data, now
including the single tested sample. That model is then used to test the next single sample.
This method of retraining the model might improve model fit and thus detection performance.
Further research should be performed to test this hypothesis.
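The retraining scheme described above can be sketched as a walk-forward loop. The Python example below is a hypothetical illustration and not one of the tested models: the "model" is a placeholder that predicts the mean of the history and accepts values within k standard deviations, the data are made up, and it is an assumption that flagged samples are excluded from retraining.

```python
from statistics import mean, stdev

def walk_forward_flags(history, new_samples, k=3.0):
    """Test one sample at a time, retraining on the extended history.

    Placeholder model: mean of the history +/- k standard deviations.
    A sample that is not flagged is appended to the history before the
    next sample is tested; a flagged sample is left out (assumption).
    """
    history = list(history)
    flags = []
    for value in new_samples:
        centre = mean(history)    # "retrain" on the current history
        spread = stdev(history)
        flagged = abs(value - centre) > k * spread
        flags.append(flagged)
        if not flagged:
            history.append(value)
    return flags

training = [100, 102, 98, 101, 99, 103, 97, 100]  # hypothetical prior-year data
incoming = [101, 150, 99]                         # hypothetical new samples
flags = walk_forward_flags(training, incoming)
print(flags)  # [False, True, False]
```

In a real application the placeholder would be replaced by one of the continuity equation models, refitted at each step.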
REFERENCES
(CICA), C. I. (1999). Continuous Auditing. Continuous Auditing. Toronto, ON, Canada.
Alles, M. G., Brennan, G., Kogan, A., & Vasarhelyi, M. A. (2006). Continuous monitoring of
business process controls: A pilot implementation of a continuous auditing system at
Siemens. International Journal of Accounting Information Systems, 7(2), 137161.
Alles, M. G., Kogan, A., & Vasarhelyi, M. A. (2002). Feasibility and economics of
continuous assurance. Auditing: A Journal of Practice & Theory, 21(1), 125138.
Alles, M. G., Kogan, A., & Vasarhelyi, M. A. (2008). Putting Continuous Auditing Theory
into Practice: Lessons from Two Pilot Implementations. Journal of Information
Systems, 22(2), 195214.
Alles, M., Kogan, A., & Vasarhelyi, M. (2008). Audit automation for implementing
continuous auditing: Principles and problems. Ninth International Research
Symposium on Accounting Information Systems(August 2015), 124.
Alles, M., Kogan, A., & Vasarhelyi, M. (2008). Putting continuous auditing theory into
practice: Lessons from two pilot implementations. Journal of Information Systems,
22(2), 195214.
Alles, M., Kogan, A., Vasarhelyi, M., & Wu, J. (2005). Continuity Equations in Continuous
Auditing: Detecting Anomalies in Business Processes.
Bachlechner, D., Thalmann, S., & Manhart, M. (2014). Auditing service providers:
Supporting auditors in crossorganizational settings. Managerial Auditing Journal,
29(4), 286303.
Banker, J. D., Chang, H., & Kao, Y.c. (2002). Impact of Information Technology on Public
Accounting Firm Productivity. Journal of Information Systems, 16(2), 209222.
Bedard, J. C., Deis, D. R., Curtis, M. B., & Jenkins, J. G. (2008). Risk monitoring and control
in audit firms: A research synthesis. Auditing: A Journal of Practice & Theory, 27(1),
187218.
Cao, M., Chychyla, R., & Stewart, T. (2015). Big Data Analytics in Financial Statement
Audits. Accounting Horizons, 29(2), 110.
Chiu, V., Liu, Q., & Vasarhelyi, M. a. (2014, 12). The development and intellectual structure
of continuous auditing research. Journal of Accounting Literature, 33(12), 3757.
26
Dowling, C., & Leech, S. (2007). Audit support systems and decision aids: Current practice
and opportunities for future research. International Journal of Accounting Information
Systems, 8(2), 92116.
Dzeng, S. (1994). A comparison of analytical procedures expectation models using both
aggregate and disaggregate data. Auditing: A Journal of Practice & Theory, 13(3), 124.
Hardy, C. A. (2014). The messy matters of continuous assurance: Preliminary findings from
six Australian case studies. Journal of Information Systems, 28(2), 140428102139008.
Henningsen, A., & Hamann, J. D. (2007). systemfit: A Package for Estimating Systems of
Simultaneous Equations in R. Journal of Statistical Software, 23(4), 140. Retrieved
from http://www.jstatsoft.org/v23/i04/
Hyndman, R. J. (2015). forecast: Forecasting functions for time series and. Retrieved from
http://github.com/robjhyndman/forecast
Hyndman, R. J., & Khandakar, Y. (2008). Automatic Time Series Forecasting: The forecast
Package for R. Journal of Statistical Software, 27(3).
Kogan, A., Alles, M. G., Vasarhelyi, M. A., & Wu, J. (2010). Analytical Procedures for
Continuous Data Level Auditing: Continuity Equations.
Kogan, A., Alles, M. G., Vasarhelyi, M. A., & Wu, J. (2014). Design and Evaluation of a
Continuous Data Level Auditing System. Auditing: A Journal of Practice & Theory,
33(4), 221245.
Kogan, A., Alles, M., Vasarhelyi, M., & Wu, J. (2014). Design and Evaluation of a
Continuous Data Level Auditing System. Auditing: A Journal of Practice & Theory,
33(4), 221245. doi: 10.2308/ajpt50844
Krahel, J. P., & Vasarhelyi, M. a. (2014). AIS as a Facilitator of Accounting Change:
Technology, Practice, and Education. Journal of Information Systems, 28(2), 115.
Kuenkaikaew, S., & Vasarhelyi, M. a. (2013). The predictive audit framework. International
Journal of Digital Accounting Research, 13(April), 3771.
Leitch, R. A., & Chen, Y. (2003). The effectiveness of expectation models in recognizing
error patterns and generating and eliminating hypotheses while conducting analytical
procedures. Auditing: A Journal of Practice & Theory, 22(2), 147170.
Malaescu, I., & Sutton. (2015). The Reliance of External Auditors on Internal Audit's Use of
Continuous Audit. Journal of Information Systems, 29(1), 11481159.
27
Manson, S., McCartney, S., Sherer, M., & Wallace, W. a. (1998). Audit Automation in the
UK and the US: A Comparative Study. International Journal of Auditing, 2(3), 233246.
Pfaff, B. (2008). VAR, SVAR and SVEC models: Implementation within R package vars.
Journal of Statistical Software, 27(4), 132.
Pfaff, B. (2008). vars: VAR Modelling. R package version, 13.
Pfaff, B., & Im Taunus, K. (2007). Using the vars package.
Rezaee, Z., & Sharbatoghlie, A. (2002). Continuous auditing: Building automated auditing
capability. Auditing: A Journal of Practice & Theory, 21(1).
Vasarhelyi, M. A., & Halper, F. B. (1991). The continuous audit of online systems. Auditing:
A Journal of Practice & Theory, 10(1), 110125.
Vasarhelyi, M. A., & Romero, S. (2014). Technology in audit engagements: a case study.
Managerial Auditing Journal, 29(4), 350365.
Vasarhelyi, M. A., Alles, M. G., & Kogan, A. (2004). Principles of analytic monitoring for
continuous assurance. Journal of Emerging Technologies in Accounting, 1(1), 121.
Vasarhelyi, M. A., Alles, M., & Williams, K. T. (2010). Continuous assurance for the now
economy. Institute of Chartered Accountants in Australia Sydney, Australia.
Vasarhelyi, M. A., Warren, D., Teeter, R. A., & Titera, W. R. (2014). Embracing the
Automated Audit. Journal of Accountancy(April).
Appendix A. Data
The data is provided by a Dutch wholesaler in technical supplies and contains daily
aggregates of the three separate steps in the sales cycle.
[Figure 2: entity diagram. The tables SalesOrders, Shipments and Invoices each contain Date (PK) and Quantity. The table SalesData contains Date (PK, FK1, FK2, FK3), SO, GS and IS.]
Figure 2. Data model consisting of daily aggregates for three different stages in the sales cycle: ordered quantity
(SO), quantity of goods shipped to customer (GS) and quantity invoiced (IS) combined by date via a SQL join
clause. The date serves as the primary and foreign keys of the data source involved.
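The join described in Figure 2 can be sketched as follows, using Python's built-in sqlite3 module for illustration. The table and column names follow the figure; the dates, quantities and the choice of an inner join are assumptions, since the original query is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One table per stage of the sales cycle, keyed on the date.
for table in ("SalesOrders", "Shipments", "Invoices"):
    cur.execute(f"CREATE TABLE {table} (Date TEXT PRIMARY KEY, Quantity INTEGER)")

# Hypothetical daily aggregates.
cur.executemany("INSERT INTO SalesOrders VALUES (?, ?)",
                [("2015-01-05", 120), ("2015-01-06", 95)])
cur.executemany("INSERT INTO Shipments VALUES (?, ?)",
                [("2015-01-05", 110), ("2015-01-06", 100)])
cur.executemany("INSERT INTO Invoices VALUES (?, ?)",
                [("2015-01-05", 105), ("2015-01-06", 98)])

# Combine the three stages into one row per date (SO, GS, IS).
sales_data = cur.execute(
    """
    SELECT so.Date, so.Quantity AS SO, gs.Quantity AS GS, inv.Quantity AS "IS"
    FROM SalesOrders AS so
    JOIN Shipments AS gs ON gs.Date = so.Date
    JOIN Invoices AS inv ON inv.Date = so.Date
    ORDER BY so.Date
    """
).fetchall()
conn.close()

print(sales_data)
```

With an inner join, a date that is missing from one of the three tables is dropped from the combined set; an outer join would be needed to keep such dates.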
Appendix B. Implementation in R
The code used for generating the simulation test set, modelling, testing and reporting is
presented in this appendix. It is also available via GitHub and contained on an
accompanying CD-ROM.
Simulation test set generator
Model implementation
Test procedure
Report generator