Overview
System identification is an important area for data analysis in systems engineering, but there are other equally important areas. In particular, how can we use the historical databases that are collected routinely by process computers?
Difficult nature of historical data
Latent variable methods
Problems and industrial applications
Hundreds to thousands of variables measured every few seconds for years
Not the result of designed experiments
Non-causal
Identifying the causal effect of one variable on another is not generally possible
Non-full rank
Variables are highly correlated with one another
Statistical rank is very low
Rank is independent of the number of variables measured
It depends on the number of independent sources of variation occurring in the process
Missing data
10-20% missing is common
Analysis methods must be able to handle this trivially
Little information in any one variable
Need multivariate methods to extract the information from all the variables
Measurements on k variables: $x = [x_1, x_2, \ldots, x_k]$
The process is actually driven by a small set of a independent latent variables: $z = [z_1, z_2, \ldots, z_a]$, with $a \ll k$
Measurements are related to these as $x^T = z^T R + e^T$
z is not identifiable, but the space of the latent variables is: $x^T = t^T P + e^T$
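The latent variable model above can be illustrated numerically. The following is a minimal sketch, assuming simulated Gaussian data and using an SVD-based PCA as the estimator of the latent space; the sizes `n`, `k`, `a` and the noise level are illustrative choices, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, a = 500, 20, 3                     # observations, measured variables, latent variables

Z = rng.normal(size=(n, a))              # independent latent sources z (one row per observation)
R = rng.normal(size=(a, k))              # mixing matrix R (unknown in practice)
E = 0.05 * rng.normal(size=(n, k))       # measurement noise e
X = Z @ R + E                            # x^T = z^T R + e^T, stacked row-wise

# PCA via SVD of the mean-centred data
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
cum_explained = np.cumsum(s**2) / np.sum(s**2)

P = Vt[:a]                               # (a, k): rows span the estimated latent space
T = Xc @ P.T                             # scores t
X_hat = T @ P                            # x^T is approximated by t^T P

# z itself is not recovered, but the space spanned by the rows of R is:
# projecting R onto the rows of P leaves almost nothing behind.
subspace_residual = np.linalg.norm(R - R @ P.T @ P) / np.linalg.norm(R)
```

Although 20 variables are measured, `cum_explained[a - 1]` is close to 1, which is the sense in which the statistical rank depends on the number of independent sources of variation rather than on the number of measurements.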
Symmetric in X and Y
Both are functions of the LVs
No assumption of a causal direction
Both measured with error
Choice of X & Y decided by objectives and by what will be available in future
Prediction: $\hat{Y} = T Q^T = X W Q^T = X B$
Operating space summarized by a few orthogonal LVs ($t_1, t_2, \ldots$) and by the distance of an observation $x_i$ from this space, given by
$$\mathrm{SPE}_i = \sum_{j=1}^{k} (x_{ij} - \hat{x}_{ij})^2$$
Max. variance components in X space
Max. covariance
Max. var of y explained by correlation with X
Max. correlation
Discussion of LV Methods
All estimation methods provide a set of orthogonal LVs
Only PCR, PLS, ML provide a good model for the X-space
The X-space model is the most important part of the model in many applications
Why do we need a model for the X-space?
In identification, X is full rank by design of experiments
With process data, X is of very low rank (a << k)
Need to define this operating subspace!
Model for X is used to:
treat missing data
detect outliers
monitor processes
provide realizable inverses for control
It is this modeling of the X-space that makes LV methods very different from other regression methods
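As an illustration of how an X-space model can treat missing data, here is a minimal sketch, assuming a PCA model stands in for the X-space part of the LV model and that scores are estimated by least squares from the observed entries only. The function names `fit_pca` and `impute_with_model` are hypothetical helpers, and the data are simulated.

```python
import numpy as np

def fit_pca(X, a):
    """Return the column means and the first a loading vectors as rows of P, shape (a, k)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:a]

def impute_with_model(x, mean, P):
    """Fill NaN entries of x using scores estimated from the observed entries."""
    obs = ~np.isnan(x)
    P_obs = P[:, obs]                                    # loadings of the observed variables
    # Least-squares scores: minimise ||(x_obs - mean_obs) - P_obs^T t||
    t, *_ = np.linalg.lstsq(P_obs.T, x[obs] - mean[obs], rcond=None)
    x_filled = x.copy()
    x_filled[~obs] = mean[~obs] + t @ P[:, ~obs]         # model prediction for the missing part
    return t, x_filled

# Simulated low-rank process data (illustrative sizes and noise level)
rng = np.random.default_rng(1)
n, k, a = 300, 12, 2
R = rng.normal(size=(a, k))
X = rng.normal(size=(n, a)) @ R + 0.01 * rng.normal(size=(n, k))
mean, P = fit_pca(X, a)

x_true = rng.normal(size=a) @ R          # a new observation lying in the operating subspace
x_missing = x_true.copy()
x_missing[[1, 5, 8]] = np.nan            # three sensors unavailable
t_hat, x_filled = impute_with_model(x_missing, mean, P)
```

This only works because X is of low rank: the observed variables carry enough information about the few latent variables to predict the missing ones. With a full-rank X there would be no such redundancy to exploit.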
Analysis of Industrial Data-bases
Process Monitoring and FDI
Soft Sensors / Inferential Models
Extracting information from multivariate sensors
System identification
Process control in reduced dimensional LV spaces
Many other interesting areas
Currently a major area of application of these LV methods in industry
A major justification for every computer system was to collect data for process improvement!
But little has been done with these databases
Data graveyards!
Massive data sets, missing data, outliers, extreme correlation among variables, non-causal nature of data, data compression algorithms, etc.
Latent variable models are ideal for analyzing these data
Two common analysis problems:
Weekly averages, hourly averages, minute or second data
Build local models to detect & diagnose problems
LV score plots (e.g. t1 vs t2) show the important process behavior in the LV space
Loading plots (w1, w2) allow interpretation of general movements in the scores (ti = Xwi)
Contribution plots show the contribution of each variable to local changes in the scores & SPE
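The idea behind contribution plots can be sketched in a few lines. This is a minimal example, assuming a PCA model with orthonormal loadings in place of the PLS weights, simulated data with a deliberately simple loading structure, and a hypothetical sensor bias on variable 4; `spe_contributions` and `score_contributions` are illustrative helper names.

```python
import numpy as np

def fit_pca(X, a):
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:a]                      # P has shape (a, k)

def spe_contributions(x, mean, P):
    """Per-variable squared residuals; their sum is the SPE of the observation."""
    xc = x - mean
    t = xc @ P.T
    residual = xc - t @ P
    return residual**2

def score_contributions(x, mean, P, lv):
    """Per-variable terms x_j * p_j; their sum is the score on latent variable lv."""
    return (x - mean) * P[lv]

rng = np.random.default_rng(3)
n, k, a = 400, 10, 2
# Two orthogonal sources of variation spread evenly over all variables (an assumption)
R = np.vstack([np.ones(k), np.tile([1.0, -1.0], k // 2)])
X = rng.normal(size=(n, a)) @ R + 0.05 * rng.normal(size=(n, k))
mean, P = fit_pca(X, a)

x_fault = X[0].copy()
x_fault[4] += 5.0                            # hypothetical bias on sensor 4
c_spe = spe_contributions(x_fault, mean, P)
c_t1 = score_contributions(x_fault, mean, P, lv=0)
suspect = int(np.argmax(c_spe))              # variable contributing most to the SPE
```

Plotting `c_spe` or `c_t1` as a bar chart over the variables gives the contribution plot; the tallest bars point to the variables to investigate, not necessarily to the root cause.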
[Figure: batch data structure (batches, initial conditions, variable trajectories) and LV score plot of t1 vs t2]
Build a new PLS model from historical data containing only acceptable operation
Any deviation from this model will reveal unacceptable behavior
Statistics to plot:
Hotelling's T2: $T^2 = \sum_{l=1}^{a} \frac{t_l^2}{s_l^2}$
Residual SPE: $\mathrm{SPE} = \sum_{j=1}^{k} (x_j - \hat{x}_j)^2$
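The two monitoring statistics can be computed as follows. This is a sketch under stated assumptions: a PCA model of X is used as a stand-in for the PLS model, the training data are simulated, and the control limits are taken as empirical 99th percentiles of the training statistics, which is an illustrative choice rather than the method used in the slides.

```python
import numpy as np

def fit_monitor(X, a):
    """Fit the X-space model on data from acceptable operation only."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:a]                               # (a, k) loadings
    T = Xc @ P.T
    s2 = T.var(axis=0, ddof=1)               # s_l^2: variance of each score
    return {"mean": mean, "P": P, "s2": s2}

def t2_spe(model, X_new):
    """Hotelling's T2 and SPE for each row of X_new."""
    Xc = np.atleast_2d(X_new) - model["mean"]
    T = Xc @ model["P"].T
    t2 = np.sum(T**2 / model["s2"], axis=1)              # sum_l t_l^2 / s_l^2
    spe = np.sum((Xc - T @ model["P"])**2, axis=1)       # sum_j (x_j - x_hat_j)^2
    return t2, spe

rng = np.random.default_rng(2)
n, k, a = 400, 10, 2
R = rng.normal(size=(a, k))
X_train = rng.normal(size=(n, a)) @ R + 0.1 * rng.normal(size=(n, k))
model = fit_monitor(X_train, a)

t2_train, spe_train = t2_spe(model, X_train)
t2_limit = np.percentile(t2_train, 99)       # empirical limits (assumption)
spe_limit = np.percentile(spe_train, 99)

x_sensor_fault = X_train[0].copy()
x_sensor_fault[3] += 5.0                     # breaks the correlation structure
x_large_move = model["mean"] + np.array([6.0, 6.0]) @ R   # unusual but on the model plane

t2_fault, spe_fault = t2_spe(model, x_sensor_fault)
t2_move, spe_move = t2_spe(model, x_large_move)
```

The two statistics are complementary: the sensor fault moves the observation off the model plane and shows up in SPE, while the large move within the plane shows up in T2.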
Soft sensors built from process data using regression, ANNs, PLS
Advantage of PLS models when there are:
Large number of highly correlated measurements
Missing data
Occasional outliers in the X measurements
Adaptive PLS and nonlinear PLS often used
Key point in building inferential models is the nature of the data used
Problems
Boiler fed with time varying mixture of waste hydrocarbon streams and natural gas. Energy content of waste stream varies considerably
Want to estimate energy content of waste stream in real time
Want to estimate the steam generation rate
Large 3-dimensional image arrays obtained every second
Multi-way PCA
Obtain very stable LV score plots of the highly variable flame images
Averaging/filtering done in score space
Extract feature information from the PCA score space
Relate features to boiler performance via PLS
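One common way to apply multi-way PCA to a colour image is to unfold the 3-dimensional array into a pixels-by-channels matrix and decompose that. The sketch below assumes this unfolding, no mean-centring, and a random synthetic image in place of a flame image; `mpca_image` and `score_histogram` are hypothetical helper names, and the feature extraction used in the original work may differ.

```python
import numpy as np

def mpca_image(img, a=2):
    """Unfold an (height, width, channels) image and return loadings and score images."""
    h, w, c = img.shape
    X = img.reshape(h * w, c).astype(float)      # unfold: one row per pixel
    kernel = X.T @ X                             # small (c, c) kernel matrix
    eigval, eigvec = np.linalg.eigh(kernel)      # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:a]         # keep the a largest
    P = eigvec[:, order]                         # (c, a) loadings
    T = X @ P                                    # (h*w, a) scores
    return P, T.reshape(h, w, a)

def score_histogram(T_img, bins=32):
    """2-D histogram of pixel scores: a density version of the t1 vs t2 score plot."""
    t1 = T_img[..., 0].ravel()
    t2 = T_img[..., 1].ravel()
    hist, _, _ = np.histogram2d(t1, t2, bins=bins)
    return hist

rng = np.random.default_rng(4)
image = rng.integers(0, 256, size=(48, 64, 3))   # synthetic stand-in for one RGB frame
P_img, T_img = mpca_image(image, a=2)
hist = score_histogram(T_img)
```

Because the score histogram no longer depends on where in the image a pixel sits, it can be averaged or filtered over successive frames, and features computed from it can then be used as the X block of a PLS model.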
[Figure: predicted value and measured value versus time; (a) Case I, (b) Case II]
[Figure: prediction (ppm) versus observation (ppm), training set and test set]
Revolution in new micro/molecular sensors
More use of fiber optics spectrometers, imaging, acoustical, etc. sensors
Characteristics:
Greatly enhance possibilities for control
Problem is extracting the information from the large number of highly correlated measurements at each time
On-line Monitoring and Feedback Control of Snack Food Quality using Digital Imaging
[Figure: imaging system schematic: computer, camera, lighting, unseasoned product, tumbler, seasoning, conveyor belt]
[Figure: lab analysis of non-seasoned, low-seasoned and high-seasoned product]
[Figure: image processing flow: on-line image, product image, background image, product mask, PLS model, model predicted value, cumulative histogram, seasoning distribution, visual inspection]
Closed-loop control of seasoning content and seasoning distribution from digital camera
[Figure: predicted seasoning level and seasoning level set point]
N4SID algorithms: variants of RRR
CVA algorithms: CCA
Both of these involve LV methods that do not model the X-space (no need in this case)
CV and/or MV spaces are high dimensional and non-full rank
Spatial control of sheet and film processes
Control of distributed properties (MWD, PSD)
MVs are trajectories in batch processes
[Figure: reactor pressure and jacket pressure versus time (min)]
Control in the LV space of the PLS model
From the optimized values of the LVs (t1, t2) compute the entire remaining MV trajectories
Identification:
PLS model using process variable & MV trajectory data from past batch operation plus a few batches with designed experiments at the control points
Prediction:
At each decision period predict final quality using the PLS model
Problem: we do not have the trajectory data for the rest of the batch!
Must use the PLS model of the X-space to impute the process variable trajectories for the remaining part of the batch (missing data)
Control:
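$$\min_{\Delta t(\theta_i)} \left\{ (\hat{y} - y_{sp})^T Q_1 (\hat{y} - y_{sp}) + \Delta t^T Q_2 \Delta t + \lambda T^2 \right\}$$
s.t.
$$\hat{y}^T = (\Delta t + t_{present})^T Q^T$$
$$T^2 = \sum_{a=1}^{A} \frac{(\Delta t + t_{present})_a^2}{s_a^2}$$
$$\Delta t_{min} \le \Delta t \le \Delta t_{max}$$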
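Because the objective is quadratic in the score move, the problem without the bound constraints has a closed-form solution. The sketch below assumes that reading of the control problem (score move, present scores, PLS Y-loadings, weights Q1 and Q2, and a weight on Hotelling's T2, here called `lam`); all numerical values are made up for illustration, and the final clipping to the bounds is a simplification that is not the exact constrained optimum in general.

```python
import numpy as np

def objective(dt, t_present, y_sp, Q_load, Q1, Q2, lam, s2):
    """Quadratic cost: quality error + move penalty + weighted Hotelling's T2."""
    t_new = t_present + dt
    err = Q_load @ t_new - y_sp                  # y_hat - y_sp, with y_hat = Q (dt + t_present)
    return err @ Q1 @ err + dt @ Q2 @ dt + lam * np.sum(t_new**2 / s2)

def solve_score_move(t_present, y_sp, Q_load, Q1, Q2, lam, s2, dt_min, dt_max):
    """Set the gradient to zero, then clip to the bounds (approximate handling of bounds)."""
    S_inv = np.diag(1.0 / s2)
    H = Q_load.T @ Q1 @ Q_load + Q2 + lam * S_inv
    g = Q_load.T @ Q1 @ (y_sp - Q_load @ t_present) - lam * S_inv @ t_present
    dt = np.linalg.solve(H, g)
    return np.clip(dt, dt_min, dt_max)

# Illustrative numbers only (two latent variables, two quality variables)
Q_load = np.array([[1.0, 0.5],
                   [0.2, 1.5]])                  # PLS Y-loadings (assumed values)
t_present = np.array([0.5, -1.0])                # scores estimated at the decision point
y_sp = np.array([1.0, 2.0])                      # quality set points
Q1 = np.eye(2)
Q2 = 0.1 * np.eye(2)
lam = 0.05
s2 = np.array([4.0, 1.0])                        # score variances from the training batches
dt_min = np.array([-5.0, -5.0])
dt_max = np.array([5.0, 5.0])

dt_opt = solve_score_move(t_present, y_sp, Q_load, Q1, Q2, lam, s2, dt_min, dt_max)
J_opt = objective(dt_opt, t_present, y_sp, Q_load, Q1, Q2, lam, s2)
y_pred = Q_load @ (t_present + dt_opt)           # predicted final quality after the move
```

When a bound is active, a proper quadratic programming solver should replace the clipping step. The appeal of the formulation is its size: the decision variables are a handful of scores rather than the full MV trajectories.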
Trajectory reconstruction of the full MV trajectories using the X-space model from PLS:
$$x_2^T = (t^T - x_1^T W_1)(P_2^T W_2)^{-1} P_2^T$$
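The reconstruction formula can be checked numerically. The sketch below assumes that x1 holds the variables already known, x2 the remaining part of the trajectory, and that W and P are stored with one column per latent variable and partitioned by rows in the same way as x. For simplicity a PCA model is used, so W equals P; with a real PLS model the two matrices differ. `reconstruct_remaining` is a hypothetical helper name.

```python
import numpy as np

def reconstruct_remaining(t, x1, W1, W2, P2):
    """x2^T = (t^T - x1^T W1) (P2^T W2)^{-1} P2^T."""
    rhs = t - x1 @ W1                            # part of the scores still to be supplied by x2
    c = np.linalg.solve((P2.T @ W2).T, rhs)      # solves c^T (P2^T W2) = rhs^T
    return P2 @ c                                # x2 lies in the space spanned by P2

rng = np.random.default_rng(5)
n, k, a, k1 = 200, 12, 2, 5                      # k1 variables known, k - k1 to reconstruct
X = rng.normal(size=(n, a)) @ rng.normal(size=(a, k)) + 0.05 * rng.normal(size=(n, k))
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
P = Vt[:a].T                                     # (k, a) loadings with orthonormal columns
W = P                                            # PCA stand-in for the PLS weights (assumption)
W1, W2, P2 = W[:k1], W[k1:], P[k1:]

# Case 1: an observation lying exactly on the model plane is recovered exactly
t_plane = np.array([1.5, -0.7])
x_plane = P @ t_plane
x2_plane = reconstruct_remaining(t_plane, x_plane[:k1], W1, W2, P2)

# Case 2: a target score chosen by the controller, with x1 already fixed
t_target = np.array([0.3, 2.0])
x1_known = x_plane[:k1]
x2_new = reconstruct_remaining(t_target, x1_known, W1, W2, P2)
t_achieved = np.concatenate([x1_known, x2_new]) @ W
```

In the second case the known part of the trajectory is left untouched, and the reconstructed remainder is exactly what is needed for the complete trajectory to project onto the target scores.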
[Figure: reactor pressure and jacket pressure versus time (min) for the nominal condition, -10% in W and +10% in W]
SUMMARY
Presented overview of data-based methods for process analysis, monitoring and control.
Latent variable models provide the basis for treating these subspace problems
High dimensionality, extreme correlation & reduced rank
Missing data & outliers
They provide models for the X-space
Analysis of data-bases / troubleshooting
Process monitoring / FDI
Soft sensors and control from digital images
Control in reduced dimensional spaces
Acknowledgements
All my excellent graduate students who have contributed to this research In particular to