
SPE 167437

Intelligent Data Quality Control of Real-time Rig Data


A. Arnaout, P. Zoellner, TDE Thonhauser Data Engineering GmbH; N. Johnstone, TDE Norge AS; G.
Thonhauser, Montanuniversität Leoben

Copyright 2013, Society of Petroleum Engineers

This paper was prepared for presentation at the SPE Middle East Intelligent Energy Conference and Exhibition held in Dubai, UAE, 28–30 October 2013.

This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents of the paper have not
been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect any position of the Society of Petroleum
Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written consent of the Society of Petroleum Engineers is prohibited.
Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of SPE
copyright.

Abstract
There is an increasing requirement for centralized analysis of drilling data to improve performance. Typically, data from
service companies is transmitted from the rig to the client's Real Time Operating Centre (RTOC). A major problem is the
quality of this data, which can lead to incorrect analysis and a breakdown in trust between the rig and the RTOC.
Currently, real-time data is, at best, only checked for completeness.
In this work, we present a model for data quality control in real-time operating centres. The following points are
considered in our suggested model:
− Monitoring the quality of real-time data for:
   o Completeness
   o Continuity
   o Timeliness
   o Validity
   o Accuracy
   o Consistency
   o Integrity
− Alarming on all unexpected data;
− Generating daily data quality reports.

The suggested data quality control model has different groups of key performance indicators, calculated in real time from
the streamed data. The model takes the values of these performance indicators as input and evaluates different
properties of data quality as the output. Additionally, a new group of intelligent, model-based key performance indicators
is suggested. These indicators make it possible to monitor the actual quality and measure it against the expected quality. The
result is a measurement of the Quality of Service (QoS) supplied by the data provider at the rig.

Introduction
Nowadays a huge amount of sensor data is acquired at the rigsite and used to monitor the drilling process. This
data carries information not only on the measured physical phenomena but also on the drilling process in general. It is
preferred to refer to these as “Data Channels”, as data providers also deliver values which are calculated
from sensor readings, e.g. the measured depth of the hole or the flow rate of the mud pumps.
Usually, the data channels are acquired and stored locally at the rigsite and then transferred using various methods such
as WITSML, WITS or OPC over TCP/IP protocols or stored on local storage disks and transferred physically.

The received data channels may contain problems originating from different sources, such as wrong sensor
calibrations, physically damaged sensors or wrongly calculated data channels. These problems may
cause a wrong situation assessment, which can in turn lead to inaccurate decisions. Mathis and Thonhauser (2007)
reported the importance of data quality control for real-time rig performance evaluation. Typically, a data provider
service company takes responsibility for the quality of the data channels which are sent to the interested parties. The
problem is that there is usually no clear agreement on the quality of service provided by such service companies apart
from the completeness of data. Data completeness is a poor measure of data channel quality, as it only gives an idea of
the ratio of data received to data expected, which says nothing about the validity of the data, e.g. when a sensor delivers
shifted values due to a wrong calibration or a malfunction.

In this work, a model is presented which helps to monitor and quality control the data channels delivered by a service
company. The data problems are listed and categorized, and Key Performance Indicators (KPIs) are then suggested to
monitor the quality of sensor data channels. A model of Quality of Service (QoS) is presented. It is important to note that
this paper does not consider the reasons that might cause quality problems, but shows how the quality of data channels
can be evaluated and monitored.

Data Acquisition at the Rigsite

The most common surface drilling mechanics measurements come from sensors at the rigsite. These may be the rig's
own sensors or sensors provided by a third party, typically a mud logging contractor. As described above, some sensor values are
used directly (e.g. pressures and hookload), while other data channels are calculated from the sensor values (e.g. bit and hole
depths from block movement when out of slips).

Figure 1: Sensor data acquisition at rigsite.

Figure 1 shows how directly measured and calculated data channels are combined into one data stream
using an instrumentation server and transferred to WITSML servers, making them available to third parties remote from the rig.
WITS and WITSML are used as standard data formats to transmit data channels.

Real-time Sensor Data Problems


The sensor data received at the office site may have many problems related to time, depth or the data channels themselves.
Detailed examples of these problems are given in the Appendix.

Time Problems
Time is the principal reference for real time data but is subject to several problems:
− Missing Timestamp
− Invalid Time Format
− Wrong time zone
− Incorrect, or no time synchronisation

Depth Problems
− Bit depth/hole depth resets
− Heave compensation (floating rigs)

Data Channel Problems


− Wrong channel description
− Wrong units
− Calibration
− Gaps (missing values and null values)
− Different frequencies
− Outliers
− Drifting values

Data Quality Control of Rig Sensor Data


The amount of data collected at the rigsite has grown to a volume that is barely manageable. This collected data forms
the basis for many critical decisions. Currently, rig operators have no clear idea of the quality of the data received from
the rig, so there is an urgent requirement to monitor and control data quality.

Key Performance Indicators (KPIs) are a suitable tool for evaluating data quality.
A KPI is a value used to evaluate a specific property of the data; for example, the number of missing values is a good
KPI for measuring the completeness of the data. KPIs can be categorized based on their severity or on the quality of service.
The severity category is linked to alarming applications, while the quality of service category is used in data quality
reporting.
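
As a minimal illustration of this KPI idea, the sketch below (Python) computes a completeness KPI as the ratio of non-null samples received to samples expected over a reporting interval. The channel layout, the nominal sampling period and all names are illustrative assumptions, not part of any specific implementation.

from datetime import datetime, timedelta

def completeness_kpi(samples, interval_start, interval_end, expected_period_s=10.0):
    """Ratio of non-null samples received to samples expected in the interval.

    `samples` is a list of (timestamp, value) tuples; `expected_period_s` is the
    nominal transmission period (e.g. 5 or 10 seconds). All names are illustrative.
    """
    expected = (interval_end - interval_start).total_seconds() / expected_period_s
    received = sum(
        1 for ts, value in samples
        if value is not None and interval_start <= ts < interval_end
    )
    return min(received / expected, 1.0) if expected > 0 else 0.0

# Example: one hour of 10-second data with a few values missing.
start = datetime(2013, 10, 28, 8, 0, 0)
end = start + timedelta(hours=1)
data = [(start + timedelta(seconds=10 * i), 42.0 if i % 50 else None) for i in range(360)]
print(f"Completeness: {completeness_kpi(data, start, end):.1%}")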

Alarming applications are used to notify clients of data problems, in particular when there is a real-time
application or analysis service that uses the data. Data quality reports are issued regularly on a
daily, weekly, monthly or yearly basis; they govern the relations between data providers and clients. Currently, clients
are unaware of the quality problems in the data received from data providers, which can have adverse consequences for remote
applications. Moreover, monitoring data quality will help the data providers to improve their service by resolving the
underlying causes of such quality issues.

Intelligent Key Performance Indicators


Intelligent KPIs can be designed to check the difference between predicted data channel values and the actual ones. The
predicted data channel can be calculated using either heuristic or analytical models (Kucs 2008; Niedermayr 2010;
Fruhwirth 2006; Arnaout 2012).
An intelligent KPI can be based on the difference between the actual and predicted data channels. For example,
the pump pressure data channel can be predicted using a heuristic model (trained on data from offset wells)
and also by running an analytical hydraulics programme. The resulting KPI might be the ratio or
percentage of pump pressure values that did not match the actual pressure.

Figure 5 shows this using both models.

Figure 5: Intelligent Key Performance Indicator
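
A minimal sketch of such an intelligent KPI is given below, assuming a sequence of predicted values (from a heuristic or analytical model) is already available. The 5% relative tolerance and all names are illustrative assumptions rather than the models used for Figure 5.

def intelligent_kpi(actual, predicted, rel_tolerance=0.05):
    """Fraction of samples where the actual channel value deviates from the
    model prediction by more than `rel_tolerance` (relative error).

    `actual` and `predicted` are equal-length sequences of pump pressure values;
    the tolerance and structure are illustrative assumptions.
    """
    mismatches = 0
    compared = 0
    for a, p in zip(actual, predicted):
        if a is None or p is None or p == 0:
            continue  # skip gaps; they are covered by the completeness KPI
        compared += 1
        if abs(a - p) / abs(p) > rel_tolerance:
            mismatches += 1
    return mismatches / compared if compared else 0.0

# Example: pressures in bar from a hypothetical hydraulics model vs. the sensor.
predicted = [210.0, 212.0, 215.0, 214.0, 213.0]
actual = [209.0, 211.5, 245.0, 214.5, None]     # one outlier, one gap
print(f"Share of non-matching pump pressure: {intelligent_kpi(actual, predicted):.0%}")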



Data Quality Centre


A Data Quality Centre is an extension of the Real-time Operating Centre (RTOC) and can be used to monitor data
quality issues in order to issue alarms, generate data quality reports and send feedback to data providers. Figure 6 shows
where the data quality centre should be located in the business context. The data channels usually flow from the
WITSML server to the data quality centre, where all KPIs are calculated in order to produce quality control reports
and to send alarms as required.

Figure 6: Data Quality Control Centre

QoS of Rig Data Provider


Quality of Service (QoS) is the concept of measuring the status of the services provided by the data providers to their
clients. Currently, the only evaluation of rig data providers is by data completeness. In reality, the received data has many
problems, so it is important to develop detailed concepts for QoS measurement.

Using KPIs to measure the data problems can be the key factor in designing new QoS concepts.

The following list represents the services that can be evaluated to measure the QoS of data providers (a sketch
illustrating two of these measures follows the list):
1. Validity: the ratio of received data values that are valid, i.e. lie within a pre-defined domain specified by
engineers. For example, block position values may lie between 0 and 90 metres depending on the rig derrick;
any value outside this range is considered invalid.
2. Accuracy: the ratio of accurate received data; accurate data reflects the exactness of the measurements
performed by the measurement system.
3. Consistency: the degree of uniformity of the received data, which gives an idea of its regularity, for example
the ratio of duplicated records in the received data.
4. Integrity: the ratio of valid, consistent relations between different sensors, for example whether the bit depth
changes while the drillstring is in the “In Slips” state; other examples are RPM with no torque readings and
pump strokes without any mud flow-in rate.
5. Timeliness: indicates whether the data is available at the time it is needed.
6. Completeness: checks whether all necessary data was received as expected.
7. Continuity: represents the regularity of the received data, or the regularity of gaps in the data: are gaps
received regularly or randomly?
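
The sketch below illustrates how two of these measures, validity (item 1) and integrity (item 4), might be computed from received records. The range limits, the in-slips rule and all field names are illustrative assumptions standing in for the relations described above.

def validity_kpi(values, lower, upper):
    """Ratio of values inside the engineer-defined domain [lower, upper]."""
    checked = [v for v in values if v is not None]
    if not checked:
        return 0.0
    valid = sum(1 for v in checked if lower <= v <= upper)
    return valid / len(checked)

def integrity_kpi(records):
    """Ratio of records whose cross-channel relations are consistent, e.g. no
    bit depth change while 'in slips' and no RPM without torque.

    `records` are dicts with illustrative keys; the rules are simple assumptions.
    """
    consistent = 0
    for prev, curr in zip(records, records[1:]):
        ok = True
        if curr["in_slips"] and curr["bit_depth"] != prev["bit_depth"]:
            ok = False                      # bit depth must not move while in slips
        if curr["rpm"] > 0 and curr["torque"] == 0:
            ok = False                      # rotation should produce some torque
        consistent += ok
    return consistent / max(len(records) - 1, 1)

# Example: block position limited to 0-90 m for this derrick (assumed).
print(f"Validity:  {validity_kpi([12.3, 45.0, 95.2, 60.1, None], 0.0, 90.0):.0%}")
records = [
    {"in_slips": True,  "bit_depth": 1500.0, "rpm": 0.0,   "torque": 0.0},
    {"in_slips": True,  "bit_depth": 1503.0, "rpm": 0.0,   "torque": 0.0},   # inconsistent
    {"in_slips": False, "bit_depth": 1505.0, "rpm": 120.0, "torque": 8.5},
]
print(f"Integrity: {integrity_kpi(records):.0%}")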

Quality Reports of Rig Sensor Data


To illustrate the suggested concept of QoS, Figure 7 shows the QoS results from two different data providers.

This example shows interesting results. As mentioned, the service supplied by data providers is currently evaluated only
by data completeness. For Data Provider 1 the completeness service quality is very high for all rigs, but the quality of the
other services is at a lower level and is sometimes very low (rig 5). The average QoS per rig shows a maximum quality
in the range of 85-92%, while completeness alone shows quality values between 94-99%.
The QoS report for Data Provider 1 therefore shows a consistently high completeness service quality but low quality for
Consistency, Integrity, Validity and Timeliness. The overall QoS of Data Provider 1 is 90%, compared to a
completeness service of 97%.

Data Provider 2 shows different results. The completeness of the data is low, but the other quality service measures show
high values. The overall QoS is 95%, compared to a completeness service of 89%.
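
Figures such as these can be obtained with a simple aggregation of the per-rig service scores. The sketch below assumes an unweighted average over the seven service measures and over the rigs, which is one possible choice and not necessarily the weighting behind Figure 7; the numbers are illustrative and not taken from the report.

# Per-rig service scores (in %) for a hypothetical data provider; values are
# illustrative, not taken from Figure 7.
SERVICES = ["validity", "accuracy", "consistency", "integrity",
            "timeliness", "completeness", "continuity"]

provider_scores = {
    "rig_1": {"validity": 88, "accuracy": 90, "consistency": 85, "integrity": 87,
              "timeliness": 84, "completeness": 98, "continuity": 92},
    "rig_2": {"validity": 91, "accuracy": 89, "consistency": 88, "integrity": 90,
              "timeliness": 86, "completeness": 97, "continuity": 93},
}

def overall_qos(scores_by_rig):
    """Unweighted average of all service scores over all rigs (assumption)."""
    all_scores = [rig[s] for rig in scores_by_rig.values() for s in SERVICES]
    return sum(all_scores) / len(all_scores)

def service_average(scores_by_rig, service):
    """Average score of a single service over all rigs."""
    return sum(rig[service] for rig in scores_by_rig.values()) / len(scores_by_rig)

print(f"Overall QoS:          {overall_qos(provider_scores):.1f}%")
print(f"Completeness service: {service_average(provider_scores, 'completeness'):.1f}%")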

The importance of QoS reports comes from their ability to highlight the weak points, limitations and advantages of the
services supplied by data providers. Furthermore, it is possible to give feedback to the data providers and show them
where the restrictions of their services lie.

Figure 7: Data Quality of Service Report – Sample

Summary
Data problems significantly affect not only the accuracy of remote analysis services but also critical decisions at the
rigsite. Typical sources of data problems are inadequate sensors, poor data acquisition systems, bad sensor calibration and
communication failures.

In this paper an overview of data quality problems and their categories has been presented. The concept of using
Key Performance Indicators to measure the quality of data received from the rigsite, including the use of intelligent
model-based KPIs, was then suggested. The concept of Quality of Service of data providers was proposed to evaluate the
services they supply. All data quality problems, quality KPI calculations and QoS reports would be handled and generated by
the Data Quality Centre, which is part of the Real-time Operating Centre.

A complete analysis of the provided services can be offered by the Data Quality Centre to help clients gain a
correct picture of the quality of the data they obtain from different data providers. Another important role of the Data Quality
Centre is to send feedback to the data providers so they can improve their services by taking the actions required to rectify
data problems.

The appendix of this paper contains examples of common data quality problems.

References
Kucs, Richard, et al. "Automated Real-Time Hookload and Torque Monitoring." IADC/SPE Drilling Conference.
2008.
Niedermayr, Michael, et al. "Case Study - Field Implementation of Automated Torque-and-Drag Monitoring for
Maari Field Development." IADC/SPE Drilling Conference and Exhibition. 2010.
Fruhwirth, Rudolf, Gerhard Thonhauser, and Wolfgang Mathis. "Hybrid Simulation Using Neural Networks To
Predict Drilling Hydraulics in Real Time." SPE Annual Technical Conference and Exhibition. 2006.
Arnaout, A., et al. "Model-Based Hookload Monitoring and Prediction at Drilling Rigs using Neural Networks
and Forward-Selection Algorithm." EGU General Assembly Conference Abstracts. Vol. 14. 2012.
Mathis, Wolfgang, and Gerhard Thonhauser. "Mastering Real-Time Data Quality Control - How to Measure and
Manage the Quality of (Rig) Sensor Data." SPE/IADC Middle East Drilling and Technology Conference.
2007.

Appendix: Data Channel Problems

Examples of the more common problems are listed below:

Time problems:

Missing Timestamp:  Data is sent without any time reference, so there is no way of knowing when the data was generated.

Invalid Timestamp Format: The timestamp is sent in the wrong format, for example with a false month value.

Wrong Time Zone: Daylight saving time changes can result in data being overwritten when the clocks go back.
Similarly, though of less importance, there can be a gap of one hour when the clocks go forward.

Time synchronization: When data is sent from more than one provider on a rig, their acquisition systems' clocks are often
offset. This means that direct comparisons are not possible, particularly between MWD memory data and surface sensor data.
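
A minimal sketch of how incoming timestamps could be screened for the problems above (missing values, invalid format, non-increasing times caused by clock changes or overwrites) is given below; the expected timestamp format and the checks applied are illustrative assumptions.

from datetime import datetime

def check_timestamps(raw_timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Screen raw timestamp strings for missing values, invalid format and
    non-increasing times (e.g. duplicates caused by a daylight saving change).

    The expected format string is an assumption for illustration.
    """
    issues = []
    previous = None
    for i, raw in enumerate(raw_timestamps):
        if not raw:
            issues.append((i, "missing timestamp"))
            continue
        try:
            ts = datetime.strptime(raw, fmt)
        except ValueError:
            issues.append((i, f"invalid format: {raw!r}"))
            continue
        if previous is not None and ts <= previous:
            issues.append((i, "time not increasing (possible DST overlap or overwrite)"))
        previous = ts
    return issues

stream = ["2013-10-27 02:55:00", "2013-10-27 02:05:00",   # clocks set back one hour
          "", "2013-13-27 02:10:00"]                      # missing value, false month
for index, problem in check_timestamps(stream):
    print(f"sample {index}: {problem}")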

Depth problems:

Bit Depth/Hole Depth Resets: Due to poor depth tracking it is very common for the bit depth (and sometimes also for the
hole depth) to be corrected (by reference to the pipe tally) during a connection.  

[Example plot: bit depth correction during a connection, with points A, B and C marked.]

In the above example, during the connection the bit depth was corrected by 5m (A). Drilling of the next stand started at
(B), but the data provider did not recognise drilling until (C), when bit depth equalled hole depth reached before the
connection.

[Example plot: depth tracking problems after connections, with points A to E marked.]

In this example, the depth tracking is affected by the bit position not going back to bottom after connections. At (A),
drilling is not recognised. At (B), both bit and hole depth were adjusted upwards. At (C), the bit depth was manually
corrected to show drilling. At (D), two joints were drilled, but shown as off bottom events. This was then corrected at
(E), where both bit and hole depths were adjusted downwards.

[Example plot: running in hole with a drilling BHA after a liner run, with points A and B marked.]

The example above shows running into the hole with a drilling BHA after a liner run. The bit depth had not been corrected
after setting the liner and pulling the running string. Therefore, at (A), the bit depth became greater than the hole depth, so
the tripping was seen as drilling. This was manually corrected in two steps at (B).

Heave compensation (floating rigs):   On floating rigs, as the rig is moving up and down, there can be difficulties in
providing hole depth relative to the fixed point of reference. As the block position is normally measured relative to the
rig floor, the heave measurement needs to be taken into account to do this. This is often not done.

 
In the above example there is no heave compensation at all – bit depth mirrors the block position exactly.

 
This causes a distinct ‘stepping’ of hole depth, which does not correspond to the real rate of penetration.

Data channel problems:

Wrong channel description:   The wrong channel may be assigned to a data stream, for example Torque may be assigned
as RPM. Some of these errors are obvious, but many are not.  

Wrong units: It is common for the data provider to state the wrong unit for the channel, such as the value being in SI
units but stated to be in API units.  
 
In this example, the depth was in metres, not feet as stated.

Calibration: Calibration errors are extremely common.  

The above plot shows one stand being drilled – but the block position shows 154m movement.  

[Example plot: bit depth held constant while pulling out of hole, with points A and B marked.]

In this example, at (A) the bit depth stays constant while still pulling out of hole. A manual depth reset was made at (B),
but from then on the bit depth remained the same. The cause of this was almost certainly an incorrect “in slips” threshold.

Calculated values can also be incorrect; for example, the pump liners may be changed but not the pump displacement used in
the flow rate calculation.

Gaps (missing values and null values): Data gaps may be due to problems at the rigsite (so no data is recorded) or to
transmission problems. In addition, for individual channels a sensor may fail, but the last value is still sent.

Example of short data gaps.
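
A minimal sketch of how such gaps and "frozen" channels might be detected is given below; the nominal 10-second period, the factor of two used to flag a gap and the repeat count are illustrative assumptions.

from datetime import datetime, timedelta

def find_gaps_and_frozen(samples, expected_period_s=10.0, frozen_count=30):
    """Detect transmission gaps (spacing much larger than the nominal period)
    and 'frozen' channels where the same value repeats for many samples,
    e.g. a failed sensor whose last value is still being sent.

    `samples` is a list of (timestamp, value) tuples; thresholds are illustrative.
    """
    gaps, frozen = [], []
    repeat = 1
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if (t1 - t0).total_seconds() > 2 * expected_period_s:
            gaps.append((t0, t1))
        repeat = repeat + 1 if v1 == v0 else 1
        if repeat == frozen_count:
            frozen.append(t1)          # report each frozen stretch only once
    return gaps, frozen

# Example: a channel that is frozen at 50.0 and has a short transmission gap.
base = datetime(2013, 10, 28, 8, 0, 0)
samples = [(base + timedelta(seconds=10 * i), 50.0) for i in range(40)]
samples += [(base + timedelta(seconds=10 * i), 51.0 + i) for i in range(40, 45)]
del samples[10:13]                     # simulate a short transmission gap
gaps, frozen = find_gaps_and_frozen(samples)
print(f"{len(gaps)} gap(s), frozen stretches starting at: {frozen}")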



Outliers: Outliers are values outside the expected range of data values; for example, the following figure shows outliers in
the hookload data over a complete well.
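
One simple way to flag such outliers automatically is a robust deviation test. The sketch below uses a median/MAD threshold, which is an illustrative choice and not the method used to produce the figure.

import statistics

def flag_outliers(values, threshold=5.0):
    """Flag values whose distance from the median exceeds `threshold` times the
    median absolute deviation (MAD). Threshold and method are illustrative.
    """
    data = [v for v in values if v is not None]
    med = statistics.median(data)
    mad = statistics.median(abs(v - med) for v in data) or 1e-9
    return [i for i, v in enumerate(values)
            if v is not None and abs(v - med) / mad > threshold]

# Hookload values in tonnes with two implausible spikes (illustrative numbers).
hookload = [82.1, 83.0, 81.7, 250.4, 82.5, 82.9, -10.0, 83.3]
print("Outlier sample indices:", flag_outliers(hookload))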

Different frequencies: Data is usually transmitted at a 5 or 10 second interval. Is the value transmitted the actual value at
that time, or an average/maximum/minimum value from the 5/10 second period?

Drifting values:

 
The above plot shows drifting of the block position while tripping.

 
The faster movement of the block without weight produces more pulses from the encoder than the acquisition
system can process, causing a 2-3 m difference in the measured movement distance. The block position therefore drifts
upwards when pulling out and downwards when running in.
