Predictive Analysis Overview 2013

SAP HANA Predictive Analysis Library (PAL) Reference

SAP HANA Appliance Software SPS 05
Target Audience: Consultants, Administrators, SAP Hardware Partner, Others
Public March 2013
SAP AG 2013
Copyright
2013 SAP AG or an SAP affiliate company. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice. Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors. Adobe, the Adobe logo, Acrobat, PostScript, and Reader are trademarks or registered trademarks of Adobe Systems Incorporated in the United States and other countries. Apple, App Store, FaceTime, iBooks, iPad, iPhone, iPhoto, iPod, iTunes, Multi-Touch, Objective-C, Retina, Safari, Siri, and Xcode are trademarks or registered trademarks of Apple Inc. Bluetooth is a registered trademark of Bluetooth SIG Inc. Citrix, ICA, Program Neighborhood, MetaFrame now XenApp, WinFrame, VideoFrame, and MultiWin are trademarks or registered trademarks of Citrix Systems Inc. Computop is a registered trademark of Computop Wirtschaftsinformatik GmbH. Edgar Online is a registered trademark of EDGAR Online Inc., an R.R. Donnelley & Sons Company. Facebook, the Facebook and F logo, FB, Face, Poke, Wall, and 32665 are trademarks of Facebook. Google App Engine, Google Apps, Google Checkout, Google Data API, Google Maps, Google Mobile Ads, Google Mobile Updater, Google Mobile, Google Store, Google Sync, Google Updater, Google Voice, Google Mail, Gmail, YouTube, Dalvik, and Android are trademarks or registered trademarks of Google Inc. HP is a registered trademark of the Hewlett-Packard Development Company L.P. HTML, XML, XHTML, and W3C are trademarks, registered trademarks, or claimed as generic terms by the Massachusetts Institute of Technology (MIT), European Research Consortium for Informatics and Mathematics (ERCIM), or Keio University. 
IBM, DB2, DB2 Universal Database, System i, System i5, System p, System p5, System x, System z, System z10, z10, z/VM, z/OS, OS/390, zEnterprise, PowerVM, Power Architecture, Power Systems, POWER7, POWER6+, POWER6, POWER, PowerHA, pureScale, PowerPC, BladeCenter, System Storage, Storwize, XIV, GPFS, HACMP, RETAIN, DB2 Connect, RACF, Redbooks, OS/2, AIX, Intelligent Miner, WebSphere, Tivoli, Informix, and Smarter Planet are trademarks or registered trademarks of IBM Corporation. Microsoft, Windows, Excel, Outlook, PowerPoint, Silverlight, and Visual Studio are registered trademarks of Microsoft Corporation. INTERMEC is a registered trademark of Intermec Technologies Corporation. IOS is a registered trademark of Cisco Systems Inc. The Klout name and logos are trademarks of Klout Inc. Linux is the registered trademark of Linus Torvalds in the United States and other countries. Motorola is a registered trademark of Motorola Trademark Holdings LLC. Mozilla and Firefox and their logos are registered trademarks of the Mozilla Foundation. Novell and SUSE Linux Enterprise Server are registered trademarks of Novell Inc. OpenText is a registered trademark of OpenText Corporation. Oracle and Java are registered trademarks of Oracle and its affiliates. QR Code is a registered trademark of Denso Wave Incorporated. RIM, BlackBerry, BBM, BlackBerry Curve, BlackBerry Bold, BlackBerry Pearl, BlackBerry Torch, BlackBerry Storm, BlackBerry Storm2, BlackBerry PlayBook, and BlackBerry AppWorld are trademarks or registered trademarks of Research in Motion Limited. SAVO is a registered trademark of The Savo Group Ltd. The Skype name is a trademark of Skype or related entities. Twitter and Tweet are trademarks or registered trademarks of Twitter. UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group. Wi-Fi is a registered trademark of Wi-Fi Alliance. 
SAP, R/3, ABAP, BAPI, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP BusinessObjects Explorer, StreamWork, SAP HANA, the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, Sybase, Adaptive Server, Adaptive Server Enterprise, iAnywhere, Sybase 365, SQL Anywhere, Crossgate, B2B 360 and B2B 360 Services, m@gic EDDY, Ariba, the Ariba logo, Quadrem, b-process, Ariba Discovery, SuccessFactors, Execution is the Difference, BizX Mobile Touchbase, It's time to love work again, SuccessFactors Jam and BadAss SaaS, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany or an SAP affiliate company. All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary. These materials are subject to change without notice. These materials are provided by SAP AG and its affiliated companies ("SAP Group") for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.
Table of Contents
1 What is PAL?
2 Getting Started
  2.1 Prerequisites
  2.2 Application Function Libraries (AFL)
  2.3 Security
  2.4 How to Call PAL Functions
    2.4.1 Parameter Table Structure
3 PAL Functions
  3.1 Clustering Algorithms
    3.1.1 Anomaly Detection
    3.1.2 K-means
    3.1.3 Self-Organizing Maps
  3.2 Classification Algorithms
    3.2.1 Bi-Variate Geometric Regression
    3.2.2 Bi-Variate Natural Logarithmic Regression
    3.2.3 C4.5 Decision Tree
    3.2.4 CHAID Decision Tree
    3.2.5 Exponential Regression
    3.2.6 KNN
    3.2.7 Multiple Linear Regression
    3.2.8 Polynomial Regression
    3.2.9 Logistic Regression
  3.3 Association Algorithms
    3.3.1 Apriori
  3.4 Time Series Algorithms
    3.4.1 Single Exponential Smoothing
    3.4.2 Double Exponential Smoothing
    3.4.3 Triple Exponential Smoothing
  3.5 Preprocessing Algorithms
    3.5.1 Binning
    3.5.2 Inter-quartile Range Test
    3.5.3 Sampling
    3.5.4 Scaling Range
    3.5.5 Variance Test
  3.6 Miscellaneous
    3.6.1 ABC Analysis
    3.6.2 Weighted Score Table
4 End-to-End Scenarios
5 Best Practices
1 What is PAL?
SAP HANA's SQLScript, an extension of SQL that includes enhanced control-flow capabilities, lets developers define complex application logic inside database procedures. However, it is difficult or even impossible to describe predictive analysis logic with procedures. For example, an application may need to perform a cluster analysis on a huge customer table with 1T records. It is impossible to implement the analysis in a procedure using the simple classic K-means algorithm, let alone the more complicated algorithms in the data-mining area. Transferring large tables to the application server to perform the K-means calculation would also be costly.

The Predictive Analysis Library (PAL) defines functions that can be called from within SQLScript procedures to perform analytic algorithms. Currently, PAL includes classic and universal predictive analysis algorithms in six data-mining categories:
- Clustering
- Classification
- Association
- Time Series
- Preprocessing
- Miscellaneous

The algorithms in PAL were carefully selected based on the following criteria:
- The algorithms are needed for SAP HANA applications.
- The algorithms are the most commonly used based on market surveys (e.g., Rexer Analytics and KDnuggets polls).
- The algorithms are generally available in other database products.
2 Getting Started

2.1 Prerequisites

To use the PAL functions, you must:
- Install SAP HANA SPS05.
- Install the Application Function Library (AFL), which includes PAL. For more information, see the section Installing Application Function Libraries (AFLs) on a SAP HANA System in the SAP HANA Installation Guide with Unified Installer.
2.2 Application Function Libraries (AFL)

You can dramatically increase performance by executing complex computations in the database instead of at the application server level. SAP HANA provides several techniques to move application logic into the database, and one of the most important is the use of application functions. Application functions are like database procedures written in C++ and called from outside to perform data-intensive and complex operations. Functions for a particular topic are grouped into an application function library (AFL), such as the Predictive Analysis Library (PAL) and the Business Function Library (BFL).

Currently, all AFLs are delivered in one archive (that is, one SAR file with the name AFL<version_string>.SAR). The AFL archive is not part of the HANA appliance, and must be installed separately by the administrator.

Each release of AFL has a version in the form of <revision_number>.<patch_level>. For example, AFL 40.01 refers to revision 40 and patch level 01. The revision of the AFL must match the revision of SAP HANA. Thus, an AFL revision 40 (any patch level) should be installed with SAP HANA revision 40 only.
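The revision rule above can be sketched in a few lines of Python. This is only an illustration of the matching logic, not an SAP tool: the function names parse_afl_version and afl_matches_hana are hypothetical, and the sketch assumes the version string has exactly the <revision_number>.<patch_level> form described in the text.

```python
def parse_afl_version(version):
    """Split '<revision_number>.<patch_level>' into two integers."""
    revision, patch_level = version.split(".")
    return int(revision), int(patch_level)


def afl_matches_hana(afl_version, hana_revision):
    """Return True when the AFL revision equals the SAP HANA revision.

    The patch level is ignored, because any patch level of a matching
    revision is acceptable according to the text.
    """
    revision, _patch_level = parse_afl_version(afl_version)
    return revision == hana_revision


if __name__ == "__main__":
    print(parse_afl_version("40.01"))
    print(afl_matches_hana("40.01", 40))
    print(afl_matches_hana("40.01", 41))
```

With the example from the text, AFL 40.01 is accepted for SAP HANA revision 40 and rejected for revision 41.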
2.3 Security

This section provides detailed security information that can help administrators and architects answer some common questions.

1. User and Schema
During startup, the system creates the user _SYS_AFL, with default schema _SYS_AFL. All AFL objects (such as areas, packages, functions, and procedures) are created under this user and schema. Therefore, all these objects have fully qualified names in the form of _SYS_AFL.<object name>.

2. Role Assignment
For each AFL library, there is a role. You must be assigned this role to execute the functions in the library. The role for the PAL library is named:
    AFL__SYS_AFL_AFLPAL_EXECUTE

Note: There are two underscores between AFL and SYS.

Once a role is created, it cannot be dropped. In other words, even when an area with all its objects is dropped and re-created during system startup, the user still keeps the role originally granted.
2.4 How to Call PAL Functions

To use PAL functions, you must do the following:
1. Create the AFL_WRAPPER_GENERATOR procedure. This only needs to be done once.
2. From within SQLScript code, generate a procedure that wraps the PAL function.
3. Call the procedure, for example, from an SQLScript procedure.

Step 1 Create the AFL_WRAPPER_GENERATOR Procedure

Before using any AFL function, you need to create the AFL_WRAPPER_GENERATOR procedure. It is used to generate a wrapper for the AFL functions that take tables with a variable number of columns as inputs. This procedure only needs to be created once.
1. Make sure you are the SYSTEM user.
2. Go to /hanamnt/<SID>/HDB<instance_number>/exe/plugins/afl/ and execute the afl_wrapper_generator.sql script file. The AFL_WRAPPER_GENERATOR procedure is then owned by the SYSTEM user.
3. Grant the EXECUTE privilege on system.afl_wrapper_generator to other users. For example, if the user name is USER1, run the command:
    GRANT EXECUTE ON system.afl_wrapper_generator to USER1

Note: The above steps need to be performed each time after the HANA instance is restarted.

Step 2 Generate a PAL Procedure

Any user granted the EXECUTE privilege on the system.afl_wrapper_generator procedure can generate a procedure for a specific PAL function. The syntax is shown below:
    CALL SYSTEM.AFL_WRAPPER_GENERATOR( '<procedure_name>', '<area_name>', '<function_name>', <signature_table>);

<procedure_name>: A name for the PAL procedure. This can be anything you want.
<area_name>: Always set to AFLPAL.
<function_name>: A PAL built-in function name.
<signature_table>: A user-defined table variable. The table contains records to describe the input table type, parameter table type, and result table type. A typical table variable references a table with the following definition:

Index  Table Type Name          Direction
1      <INPUT table type>       in
2      <PARAMETER table type>   in
3      <OUTPUT table type>      out
Notes
1. The system.afl_wrapper_generator procedure is in definer mode, which means that the user who generates a PAL procedure should grant the SELECT privilege on the signature table to the SYSTEM user, who is the definer of system.afl_wrapper_generator. For example, if the user name is USER1, run the command:
    GRANT SELECT ON user1.<signature table> to SYSTEM
2. The records in the signature table must follow this order: first input table types, next parameter table type, and then output table types.
3. The signature table must be created before generating the PAL procedure. The table type names are user-defined. You can find detailed table type definitions for each PAL function in Chapter 3.
4. It is suggested that you add <schema_name> before the table type name in <signature_table>.
5. Since all the generated procedures and the procedure parameter table types belong to the _SYS_AFL schema, their names must be unique. The procedure names are defined by users. When generating a PAL procedure, make sure you give a unique procedure name. The parameter table type names are given by the system, so the names are guaranteed to be unique.
6. If you want to drop an existing procedure and then generate it again, you can use the user SYSTEM to remove the generated procedure and all its parameter table types, by running:
    DROP PROCEDURE _SYS_AFL.<PROCEDURE_NAME>;
    DROP TYPE _SYS_AFL.<PROCEDURE_NAME>__TT_P1;
    DROP TYPE _SYS_AFL.<PROCEDURE_NAME>__TT_P2;
    DROP TYPE _SYS_AFL.<PROCEDURE_NAME>__TT_P3;
    (until all table type names in the signature table are dropped)

Step 3 Call a PAL Procedure

After generating a PAL procedure, any user that has the AFL__SYS_AFL_AFLPAL_EXECUTE role can call the procedure, using the syntax below.
    CALL <procedure_name>( <data_input_table> {,}, <parameter_table>, <output_table> {,}) with overview;

<procedure_name>: The procedure name specified when generating the procedure in Step 2.
<data_input_table>: User-defined name(s) of the procedure's input table(s). Detailed input table definitions for each procedure can be found in Chapter 3.
<parameter_table>: User-defined name of the procedure's parameter table. The table structure is described in Section 2.4.1. Detailed parameter table definitions for each procedure can be found in Chapter 3.
<output_table>: User-defined name(s) of the procedure's output table(s). Detailed output table definitions for each procedure can be found in Chapter 3.

Notes
1. The input, parameter, and output tables must be created before calling the procedure.
2. Some PAL algorithms have more than one input table or more than one output table.
3. All AFL objects are owned by the _SYS_AFL user and reside in the _SYS_AFL schema. To call the PAL procedure generated in Step 2, you need the AFL__SYS_AFL_AFLPAL_EXECUTE role.
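The ordering rule for signature table records (Note 2 under Step 2) lends itself to a quick client-side check before the generator is called. The following Python sketch is a hypothetical helper, not part of PAL or AFL: it treats each record as an (index, type name, direction) tuple and reports ordering problems. Requiring at least two 'in' rows and one 'out' row is an assumption derived from the typical table shown in Step 2.

```python
def check_signature_rows(rows):
    """Validate (index, type_name, direction) rows of a signature table.

    Rules taken from the text: indexes run 1, 2, 3, ... and every 'in' row
    (input table types, then the parameter table type) precedes every 'out'
    row. The minimum row counts checked at the end are an assumption.
    Returns a list of problem descriptions; an empty list means no issue found.
    """
    problems = []
    seen_out = False
    for position, (index, type_name, direction) in enumerate(rows, start=1):
        if index != position:
            problems.append(f"row {position}: expected index {position}, got {index}")
        if direction not in ("in", "out"):
            problems.append(f"row {position}: unknown direction {direction!r}")
        elif direction == "out":
            seen_out = True
        elif seen_out:
            problems.append(f"row {position}: 'in' row {type_name} follows an 'out' row")
    in_count = sum(1 for _, _, direction in rows if direction == "in")
    out_count = sum(1 for _, _, direction in rows if direction == "out")
    if in_count < 2:
        problems.append("expected at least one input and one parameter table type")
    if out_count < 1:
        problems.append("expected at least one output table type")
    return problems


if __name__ == "__main__":
    rows = [
        (1, "DM_PAL.PAL_AD_DATA_T", "in"),
        (2, "DM_PAL.PAL_CONTROL_T", "in"),
        (3, "DM_PAL.PAL_AD_RESULT_T", "out"),
    ]
    print(check_signature_rows(rows))
```

The check cannot tell an input table type from the parameter table type, because both use the direction 'in'; that distinction still has to be verified against the table type definitions in Chapter 3.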
2.4.1 Parameter Table Structure

PAL functions use parameter tables to transfer parameter values. Each PAL function has its own parameter table. To avoid a conflict of table names when several users call PAL functions at the same time, the parameter table must be created as a local temporary column table, so that each parameter table has its own unique scope per session. The table structure is as follows:

Column Name  Data Type        Description
Name         Varchar or char  Parameter name
intArgs      Integer          Integer parameter value
doubleArgs   Double           Double parameter value
stringArgs   Varchar or char  String parameter value

Each row contains only one parameter value, either integer, double, or string. The following table is an example of a parameter table with three parameters. The first parameter, THREAD_NUMBER, is an integer parameter. Thus, in the THREAD_NUMBER row, you should fill the parameter value in the intArgs column, and leave the doubleArgs and stringArgs columns blank.

Name           intArgs  doubleArgs  stringArgs
THREAD_NUMBER  1
SUPPORT                 0.2
VAR_NAME                            hello
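The one-value-per-row rule can be illustrated with a small Python helper that builds parameter rows on the client side. The function name parameter_row is hypothetical, and representing a blank column as None (which a database driver would typically send as NULL) is an assumption, in line with the null values used in the examples in Chapter 3.

```python
def parameter_row(name, value):
    """Return (Name, intArgs, doubleArgs, stringArgs) with exactly one value set.

    The column that matches the Python type of `value` is filled; the other
    two value columns stay None.
    """
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so reject it explicitly.
        raise TypeError("pass 0 or 1 as an integer instead of a bool")
    if isinstance(value, int):
        return (name, value, None, None)
    if isinstance(value, float):
        return (name, None, value, None)
    if isinstance(value, str):
        return (name, None, None, value)
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


if __name__ == "__main__":
    rows = [
        parameter_row("THREAD_NUMBER", 1),
        parameter_row("SUPPORT", 0.2),
        parameter_row("VAR_NAME", "hello"),
    ]
    for row in rows:
        print(row)
```

The three rows correspond to the example table above: THREAD_NUMBER fills intArgs, SUPPORT fills doubleArgs, and VAR_NAME fills stringArgs.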
3 PAL Functions

The following are the available algorithms and functions in the Predictive Analysis Library.

Category        PAL Algorithm                              Built-in Function Name
Clustering      Anomaly Detection                          ANOMALYDETECTION
                K-means                                    KMEANS
                                                           VALIDATEKMEANS
                Self-Organizing Maps                       SELFORGMAP
Classification  Bi-Variate Geometric Regression            GEOREGRESSION
                                                           FORECASTWITHGEOR
                Bi-Variate Natural Logarithmic Regression  LNREGRESSION
                                                           FORECASTWITHLNR
                C4.5 Decision Tree                         CREATEDT
                                                           PREDICTWITHDT
                CHAID Decision Tree                        CREATEDTWITHCHAID
                                                           PREDICTWITHDT
                Exponential Regression                     EXPREGRESSION
                                                           FORECASTWITHEXPR
                KNN                                        KNN
                Multiple Linear Regression                 LRREGRESSION
                                                           FORECASTWITHLR
                Polynomial Regression                      POLYNOMIALREGRESSION
                                                           FORECASTWITHPOLYNOMIALR
                Logistic Regression                        LOGISTICREGRESSION
                                                           FORECASTWITHLOGISTICR
Association     Apriori                                    APRIORIRULE
                                                           LITEAPRIORIRULE
Preprocessing   Binning                                    BINNING
                Inter-Quartile Range Test                  IQRTEST
                Sampling                                   SAMPLING
                Scaling Range                              SCALINGRANGE
                Variance Test                              VARIANCETEST
Time Series     Single Exponential Smoothing               SINGLESMOOTH
                Double Exponential Smoothing               DOUBLESMOOTH
                Triple Exponential Smoothing               TRIPLESMOOTH
Miscellaneous   ABC Analysis                               ABC
                Weighted Score Table                       WEIGHTEDTABLE
3.1 Clustering Algorithms

3.1.1 Anomaly Detection
Anomaly detection is used to find existing data objects that do not comply with the general behavior or model of the data. Such data objects, which are grossly different from or inconsistent with the remaining set of data, are called anomalies or outliers. In different application domains, anomalies are also referred to as discordant observations, exceptions, aberrations, surprises, peculiarities, or contaminants.

Anomalies in data can translate to significant (and often critical) actionable information in a wide variety of application domains. For example, an anomalous traffic pattern in a computer network could mean that a hacked computer is sending out sensitive data to an unauthorized destination. An anomalous MRI image may indicate the presence of malignant tumors. Anomalies in credit card transaction data could indicate credit card or identity theft, and anomalous readings from a spacecraft sensor could signify a fault in some component of the spacecraft.

PAL uses k-means to realize anomaly detection in two steps:
1. Use k-means to group the original data into k clusters.
2. Identify points that are far from all cluster centers as anomalies.
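Step 2 can be sketched in Python under the assumption that the cluster centers from step 1 are already available. This is an illustration of the idea only, not PAL's implementation: the names euclidean and find_anomalies are hypothetical, "far from all centers" is interpreted as a large distance to the nearest center, and flagging at least one point is an assumption.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two coordinate tuples of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def find_anomalies(points, centers, outlier_fraction):
    """Return the IDs of the points that lie farthest from every center.

    points:           dict mapping an ID to its coordinates
    centers:          list of cluster center coordinates (result of step 1)
    outlier_fraction: share of the points to report, e.g. 0.2 for 20 percent
    """
    scored = []
    for point_id, coords in points.items():
        # A point is far from all centers when even the nearest one is far.
        nearest = min(euclidean(coords, center) for center in centers)
        scored.append((nearest, point_id))
    scored.sort(reverse=True)
    count = max(1, round(outlier_fraction * len(points)))  # assumption: at least 1
    return [point_id for _, point_id in scored[:count]]


if __name__ == "__main__":
    # Hypothetical data: four points around (1, 1) and one distant point.
    points = {
        0: (0.5, 0.5),
        1: (1.5, 0.5),
        2: (1.5, 1.5),
        3: (0.5, 1.5),
        20: (-1.0, -1.0),
    }
    print(find_anomalies(points, centers=[(1.0, 1.0)], outlier_fraction=0.2))
```

In this hypothetical data set, the point with ID 20 is the only one whose nearest center is more than one unit away, so it is the single point reported.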
Prerequisites
- The input data contains an ID column, and the other columns are of integer or double data type.
- The input data does not contain null values. The algorithm will issue errors when encountering null values.
ANOMALYDETECTION
Procedure Generation

    CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'ANOMALYDETECTION', <signature table>);

The signature table should contain the following records:

Index  Table Type Name          Direction
1      <INPUT table type>       in
2      <PARAMETER table type>   in
3      <OUTPUT table type>      out

Procedure Calling

    CALL <procedure name>(<input table>, <parameter table>, <output table>) with overview;

The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.
Signature

Input Table

Table  Column         Column Data Type   Description     Constraint
Data   1st column     Integer or string  ID              It must be the first column.
       Other columns  Integer or double  Attribute data
Parameter Table

Name                Data Type  Description
GROUP_NUMBER        Integer    Number of groups (k). If k is not specified, the G-means method
                               will be used to determine the number of clusters.
DISTANCE_LEVEL      Integer    Computes the distance between the item and the cluster center.
                               1 = Manhattan distance
                               2 = Euclidean distance
                               3 = Minkowski distance
OUTLIER_PERCENTAGE  Double     Indicates the proportion of anomalies in the source data.
OUTLIER_DEFINE      Integer    Specifies which point should be defined as outlier:
                               1 = max distance between the point and the center it belongs to
                               2 = max sum distance from the point to all centers
MAX_ITERATION       Integer    Maximum number of iterations.
INIT_TYPE           Integer    Center initialization type:
                               1 = first K
                               2 = random with replacement
                               3 = random without replacement
                               4 = one patent of selecting the init center (US 6,882,998 B1)
NORMALIZATION       Integer    Normalization type:
                               0 = no
                               1 = yes. For each point X(x1,x2,...,xn), the normalized value will be
                                   X'(|x1|/S,|x2|/S,...,|xn|/S), where S = |x1|+|x2|+...+|xn|.
                               2 = for each column C, get the min and max value of C, and then
                                   C[i] = (C[i]-min)/(max-min).
THREAD_NUMBER       Integer    Number of threads.
EXIT_THRESHOLD      Double     Threshold (actual value) for exiting the iterations.
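The two NORMALIZATION options in the parameter table can be written out as a short Python sketch. It follows the formulas given in the table; the function names are hypothetical, and the handling of the degenerate cases (S = 0, or max = min in a column) is an assumption, since the text does not state what PAL does there.

```python
def normalize_by_row_sum(point):
    """NORMALIZATION = 1: map X(x1,...,xn) to X'(|x1|/S,...,|xn|/S).

    S is the sum of the absolute values of the coordinates.
    """
    total = sum(abs(x) for x in point)
    if total == 0:
        # Assumption: an all-zero point is returned as all zeros.
        return [0.0 for _ in point]
    return [abs(x) / total for x in point]


def normalize_columns_min_max(rows):
    """NORMALIZATION = 2: per column C, C[i] = (C[i] - min) / (max - min)."""
    scaled_columns = []
    for column in zip(*rows):
        low, high = min(column), max(column)
        span = high - low
        if span == 0:
            # Assumption: a constant column is mapped to zeros.
            scaled_columns.append([0.0] * len(column))
        else:
            scaled_columns.append([(value - low) / span for value in column])
    # Transpose back so that each inner list is one data row again.
    return [list(row) for row in zip(*scaled_columns)]


if __name__ == "__main__":
    print(normalize_by_row_sum([1.0, -3.0]))
    print(normalize_columns_min_max([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]))
```

Option 1 rescales each row independently, so it removes differences in overall magnitude between points, whereas option 2 rescales each attribute to the range 0 to 1 across all rows.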
Output Table

Table   Column         Description
Result  1st column     ID
        Other columns  Coordinates of outliers

Constraint: It must have the same type as the input data table.
Example

Assume that:
- DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and
- USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.

SET SCHEMA DM_PAL;

-- table types for the result, input data, and parameter tables
DROP TYPE PAL_AD_RESULT_T;
CREATE TYPE PAL_AD_RESULT_T AS TABLE(
    "ID" INT,
    "V000" DOUBLE,
    "V001" DOUBLE
);

DROP TYPE PAL_AD_DATA_T;
CREATE TYPE PAL_AD_DATA_T AS TABLE(
    "ID" INT,
    "V000" DOUBLE,
    "V001" DOUBLE,
    primary key("ID")
);

DROP TYPE PAL_CONTROL_T;
CREATE TYPE PAL_CONTROL_T AS TABLE(
    "NAME" VARCHAR (50),
    "INTARGS" INTEGER,
    "DOUBLEARGS" DOUBLE,
    "STRINGARGS" VARCHAR (100)
);

-- create procedure
DROP TABLE PDATA;
CREATE COLUMN TABLE PDATA(
    "ID" INT,
    "TYPENAME" VARCHAR(100),
    "DIRECTION" VARCHAR(100)
);

-- signature table records: input, parameter, output
INSERT INTO PDATA VALUES (1, 'DM_PAL.PAL_AD_DATA_T', 'in');
INSERT INTO PDATA VALUES (2, 'DM_PAL.PAL_CONTROL_T', 'in');
INSERT INTO PDATA VALUES (3, 'DM_PAL.PAL_AD_RESULT_T', 'out');

GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('PAL_ANOMALY_DETECTION', 'AFLPAL', 'ANOMALYDETECTION', PDATA);

-- input data
DROP TABLE PAL_AD_DATA_TAB;
CREATE COLUMN TABLE PAL_AD_DATA_TAB (
    "ID" INT,
    "V000" DOUBLE,
    "V001" DOUBLE,
    primary key("ID")
);

INSERT INTO PAL_AD_DATA_TAB VALUES (0 , 0.5, 0.5);
INSERT INTO PAL_AD_DATA_TAB VALUES (1 , 1.5, 0.5);
INSERT INTO PAL_AD_DATA_TAB VALUES (2 , 1.5, 1.5);
INSERT INTO PAL_AD_DATA_TAB VALUES (3 , 0.5, 1.5);
INSERT INTO PAL_AD_DATA_TAB VALUES (4 , 1.1, 1.2);
INSERT INTO PAL_AD_DATA_TAB VALUES (5 , 0.5, 15.5);
INSERT INTO PAL_AD_DATA_TAB VALUES (6 , 1.5, 15.5);
INSERT INTO PAL_AD_DATA_TAB VALUES (7 , 1.5, 16.5);
INSERT INTO PAL_AD_DATA_TAB VALUES (8 , 0.5, 16.5);
INSERT INTO PAL_AD_DATA_TAB VALUES (9 , 1.2, 16.1);
INSERT INTO PAL_AD_DATA_TAB VALUES (10, 15.5, 15.5);
INSERT INTO PAL_AD_DATA_TAB VALUES (11, 16.5, 15.5);
INSERT INTO PAL_AD_DATA_TAB VALUES (12, 16.5, 16.5);
INSERT INTO PAL_AD_DATA_TAB VALUES (13, 15.5, 16.5);
INSERT INTO PAL_AD_DATA_TAB VALUES (14, 15.6, 16.2);
INSERT INTO PAL_AD_DATA_TAB VALUES (15, 15.5, 0.5);
INSERT INTO PAL_AD_DATA_TAB VALUES (16, 16.5, 0.5);
INSERT INTO PAL_AD_DATA_TAB VALUES (17, 16.5, 1.5);
INSERT INTO PAL_AD_DATA_TAB VALUES (18, 15.5, 1.5);
INSERT INTO PAL_AD_DATA_TAB VALUES (19, 15.7, 1.6);
INSERT INTO PAL_AD_DATA_TAB VALUES (20,-1.0, -1.0);

-- parameter table
DROP TABLE PAL_CONTROL_TAB;
CREATE COLUMN TABLE PAL_CONTROL_TAB (
    "NAME" VARCHAR (50),
    "INTARGS" INTEGER,
    "DOUBLEARGS" DOUBLE,
    "STRINGARGS" VARCHAR (100)
);

INSERT INTO PAL_CONTROL_TAB VALUES ('THREAD_NUMBER',2,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('GROUP_NUMBER',4,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('INIT_TYPE',4,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('DISTANCE_LEVEL',2,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('MAX_ITERATION',100,null,null);

-- result table
DROP TABLE PAL_AD_RESULT_TAB;
CREATE COLUMN TABLE PAL_AD_RESULT_TAB (
    "ID" INT,
    "V000" DOUBLE,
    "V001" DOUBLE
);

CALL _SYS_AFL.PAL_ANOMALY_DETECTION(PAL_AD_DATA_TAB, PAL_CONTROL_TAB, PAL_AD_RESULT_TAB) with overview;

select * from PAL_AD_RESULT_TAB;
Expected Result PAL_AD_RESULT_TAB:
3.1.2 K-means

In predictive analysis, k-means clustering is a method of cluster analysis. The k-means algorithm partitions n observations or records into k clusters in which each observation belongs to the cluster with the nearest center. In marketing and customer relationship management areas, this algorithm uses customer data to track customer behavior and create strategic business initiatives. Organizations can thus divide their customers into segments based on variables such as demography, customer behavior, customer profitability, measure of risk, and lifetime value of a customer or retention probability.

Clustering works to group records together according to an algorithm or mathematical formula that attempts to find centroids, or centers, around which similar records gravitate. The most common algorithm uses an iterative refinement technique. It is also referred to as Lloyd's algorithm.

Given an initial set of k means m1, ..., mk, the algorithm proceeds by alternating between two steps:
- Assignment step: assigns each observation to the cluster with the closest mean.
- Update step: calculates the new means to be the center of the observations in the cluster.

The algorithm repeats until the assignments no longer change.

The k-means implementation in PAL supports multithreading, data normalization, different distance level measurement, and cluster quality measurement (Silhouette). The implementation does not support categorical data, but this can be managed through data transformation. The first K and random K starting methods are supported.
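The alternation between the assignment step and the update step can be sketched in plain Python. This is a minimal single-threaded illustration of Lloyd's algorithm with "first K" initialization and Euclidean distance, not the PAL implementation; the function names are hypothetical, and keeping the previous center when a cluster becomes empty is an assumption.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two coordinate tuples of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def lloyd_kmeans(points, k, max_iteration=100):
    """Cluster a list of coordinate tuples into k groups.

    Returns (assignment, centers), where assignment[i] is the index of the
    center that points[i] belongs to.
    """
    centers = [tuple(p) for p in points[:k]]  # "first K" initialization
    assignment = None
    for _ in range(max_iteration):
        # Assignment step: each observation goes to the closest mean.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, centers[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # assignments no longer change
        assignment = new_assignment
        # Update step: each mean becomes the center of its observations.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its old center
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
    print(lloyd_kmeans(data, k=2))
```

The loop stops either when an assignment step reproduces the previous assignment or when max_iteration is reached, which mirrors the role of the MAX_ITERATION parameter.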
Prerequisites
- The input data contains an ID column, and the other columns are of integer or double data type.
- The input data does not contain null values. The algorithm will issue errors when encountering null values.
KMEANS
This is a clustering function using the k-means algorithm.

Procedure Generation

    CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'KMEANS', <signature table>);

The signature table should contain the following records:

Index  Table Type Name                   Direction
1      <INPUT table type>                in
2      <PARAMETER table type>            in
3      <Result OUTPUT table type>        out
4      <Center Point OUTPUT table type>  out
Procedure Calling

    CALL <procedure name>(<input table>, <parameter table>, <result output table>, <center point output table>) with overview;

The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.

Signature

Input Table

Table  Column         Constraint
Data   1st column     This must be the first column.
       Other columns
Parameter Table

Name             Data Type  Description
GROUP_NUMBER     Integer    Number of groups (k).
DISTANCE_LEVEL   Integer    Computes the distance between the item and cluster center.
                            1 = Manhattan distance
                            2 = Euclidean distance
                            3 = Minkowski distance
MAX_ITERATION    Integer    Maximum iterations.
INIT_TYPE        Integer    Center initialization type:
                            1 = first K
                            2 = random with replacement
                            3 = random without replacement
                            4 = one patent of selecting the init center (US 6,882,998 B1)
NORMALIZATION    Integer    Normalization type:
                            0 = no
                            1 = yes. For each point X(x1,x2,...,xn), the normalized value will be
                                X'(|x1|/S,|x2|/S,...,|xn|/S), where S = |x1|+|x2|+...+|xn|.
                            2 = for each column C, get the min and max value of C, and then
                                C[i] = (C[i]-min)/(max-min).
THREAD_NUMBER    Integer    Number of threads.
EXIT_THRESHOLD   Double     Threshold (actual value) for exiting the iterations.
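The three DISTANCE_LEVEL options correspond to standard distance definitions, sketched below in Python. The function name distance is hypothetical. The text does not say which exponent PAL uses for the Minkowski distance, so the minkowski_power argument and its default of 3 are assumptions made only for this illustration.

```python
def distance(p, q, distance_level, minkowski_power=3):
    """Distance between two coordinate tuples for a given DISTANCE_LEVEL.

    1 = Manhattan distance: sum of absolute coordinate differences
    2 = Euclidean distance: square root of the sum of squared differences
    3 = Minkowski distance: generalization with exponent minkowski_power
        (the exponent is an assumption, not a documented PAL parameter)
    """
    diffs = [abs(a - b) for a, b in zip(p, q)]
    if distance_level == 1:
        return sum(diffs)
    if distance_level == 2:
        return sum(d * d for d in diffs) ** 0.5
    if distance_level == 3:
        return sum(d ** minkowski_power for d in diffs) ** (1.0 / minkowski_power)
    raise ValueError(f"unsupported DISTANCE_LEVEL: {distance_level}")


if __name__ == "__main__":
    origin, point = (0.0, 0.0), (3.0, 4.0)
    for level in (1, 2, 3):
        print(level, distance(origin, point, level))
```

Manhattan and Euclidean distance are the special cases of the Minkowski distance with exponents 1 and 2, which is why a larger exponent yields a smaller or equal value for the same pair of points.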
Output Tables
Table           Column          Column Data Type    Description
Result          1st column      Integer or string   ID
                2nd column      Integer or double   Clustered item assigned to class number
                3rd column      Integer or double   The distance between the cluster and each point in the cluster
Center Points   1st column      Integer             Cluster center ID
                Other columns   Double              Cluster center coordinates

Example
Assume that:
DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and
USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.

SET SCHEMA DM_PAL;
DROP TYPE PAL_KMEANS_RESASSIGN_T;
CREATE TYPE PAL_KMEANS_RESASSIGN_T AS TABLE("ID" INT, "CENTER_ASSIGN" INT, "DISTANCE" DOUBLE);
DROP TYPE PAL_KMEANS_DATA_T;
CREATE TYPE PAL_KMEANS_DATA_T AS TABLE("ID" INT, "V000" DOUBLE, "V001" DOUBLE, primary key("ID"));
DROP TYPE PAL_KMEANS_CENTERS_T;
CREATE TYPE PAL_KMEANS_CENTERS_T AS TABLE("CENTER_ID" INT, "V000" DOUBLE, "V001" DOUBLE);
DROP TYPE PAL_CONTROL_T;
CREATE TYPE PAL_CONTROL_T AS TABLE("NAME" VARCHAR (50), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR (100));

-- create kmeans procedure
DROP TABLE PDATA;
CREATE COLUMN TABLE PDATA("ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100));
INSERT INTO PDATA VALUES (1, 'DM_PAL.PAL_KMEANS_DATA_T', 'in');
INSERT INTO PDATA VALUES (2, 'DM_PAL.PAL_CONTROL_T', 'in');
INSERT INTO PDATA VALUES (3, 'DM_PAL.PAL_KMEANS_RESASSIGN_T', 'out');
INSERT INTO PDATA VALUES (4, 'DM_PAL.PAL_KMEANS_CENTERS_T', 'out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('PAL_KMEANS', 'AFLPAL', 'KMEANS', PDATA);

DROP TABLE PAL_KMEANS_DATA_TAB;
CREATE COLUMN TABLE PAL_KMEANS_DATA_TAB("ID" INT, "V000" DOUBLE, "V001" DOUBLE, primary key("ID"));
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (0 , 0.5, 0.5);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (1 , 1.5, 0.5);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (2 , 1.5, 1.5);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (3 , 0.5, 1.5);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (4 , 1.1, 1.2);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (5 , 0.5, 15.5);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (6 , 1.5, 15.5);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (7 , 1.5, 16.5);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (8 , 0.5, 16.5);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (9 , 1.2, 16.1);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (10, 15.5, 15.5);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (11, 16.5, 15.5);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (12, 16.5, 16.5);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (13, 15.5, 16.5);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (14, 15.6, 16.2);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (15, 15.5, 0.5);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (16, 16.5, 0.5);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (17, 16.5, 1.5);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (18, 15.5, 1.5);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (19, 15.7, 1.6);

DROP TABLE PAL_CONTROL_TAB;
CREATE COLUMN TABLE PAL_CONTROL_TAB("NAME" VARCHAR (50), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR (100));
INSERT INTO PAL_CONTROL_TAB VALUES ('THREAD_NUMBER',2,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('GROUP_NUMBER',4,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('INIT_TYPE',4,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('DISTANCE_LEVEL',2,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('MAX_ITERATION',100,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('EXIT_THRESHOLD',null,0.000001,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('NORMALIZATION',0,null,null);

--clean kmeans result
DROP TABLE PAL_KMEANS_RESASSIGN_TAB;
CREATE COLUMN TABLE PAL_KMEANS_RESASSIGN_TAB("ID" INT, "CENTER_ASSIGN" INT, "DISTANCE" DOUBLE, primary key("ID"));
DROP TABLE PAL_KMEANS_CENTERS_TAB;
CREATE COLUMN TABLE PAL_KMEANS_CENTERS_TAB("CENTER_ID" INT, "V000" DOUBLE, "V001" DOUBLE);

CALL _SYS_AFL.PAL_KMEANS(PAL_KMEANS_DATA_TAB, PAL_CONTROL_TAB, PAL_KMEANS_RESASSIGN_TAB, PAL_KMEANS_CENTERS_TAB) with overview;
SELECT * FROM PAL_KMEANS_CENTERS_TAB;
SELECT * FROM PAL_KMEANS_RESASSIGN_TAB;
Expected Result PAL_KMEANS_RESASSIGN_TAB:
PAL_KMEANS_CENTERS_TAB:
VALIDATEKMEANS
This is a quality measurement function for k-means clustering.

Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'VALIDATEKMEANS', <signature table>);
The signature table should contain the following records:
Index   Table Type Name             Direction
1       <Data INPUT table type>     in
2       <Type INPUT table type>     in
3       <PARAMETER table type>      in
4       <OUTPUT table type>         out

Procedure Calling
CALL <procedure name>(<data input table>, <type input table>, <parameter table>, <output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.

Signature

Input Tables
Table                   Column          Column Data Type    Description
Data                    1st column      Integer             ID
                        Other columns   Integer or double   Attribute data
Type Data/ Class Data   1st column      Integer             ID
                        2nd column      Integer             Class type

Parameter Table
Name            Data Type   Description
VARIABLE_NUM    Integer     Number of variables
THREAD_NUMBER   Integer     Number of threads

Output Table
Table    Column       Column Data Type   Description
Result   1st column   Varchar or char    Name
         2nd column   Double             Measure result
Example
Assume that:
DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and
USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.

SET SCHEMA DM_PAL;
DROP TYPE T_KMEANS_DATA;
CREATE TYPE T_KMEANS_DATA AS TABLE("ID" INT, "V000" DOUBLE, "V001" DOUBLE, primary key("ID"));
DROP TYPE T_KMEANS_TYPE_ASSIGN;
CREATE TYPE T_KMEANS_TYPE_ASSIGN AS TABLE("ID" INTEGER, "TYPE_ASSIGN" INTEGER);
DROP TYPE T_KMEANS_RESULT_SVALUE;
CREATE TYPE T_KMEANS_RESULT_SVALUE AS TABLE("NAME" VARCHAR (50), "S" DOUBLE);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));

DROP table PDATA;
CREATE column table PDATA("ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100));
insert into PDATA values (1,'DM_PAL.T_KMEANS_DATA','in');
insert into PDATA values (2,'DM_PAL.T_KMEANS_TYPE_ASSIGN','in');
insert into PDATA values (3,'DM_PAL.CONTROL_T','in');
insert into PDATA values (4,'DM_PAL.T_KMEANS_RESULT_SVALUE','out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('palValidateKMeans','AFLPAL','VALIDATEKMEANS',PDATA);

DROP TABLE #CONTROL_TAB;
CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TAB ("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));
INSERT INTO #CONTROL_TAB VALUES ('VARIABLE_NUM', 2, null, null);
INSERT INTO #CONTROL_TAB VALUES ('THREAD_NUMBER', 1, null, null);

DROP VIEW V_KMEANS_TYPE_ASSIGN;
CREATE VIEW V_KMEANS_TYPE_ASSIGN AS SELECT "ID", "CENTER_ASSIGN" AS "TYPE_ASSIGN" FROM PAL_KMEANS_RESASSIGN_TAB;

DROP TABLE KMEANS_SVALUE_TAB;
CREATE COLUMN TABLE KMEANS_SVALUE_TAB ("NAME" VARCHAR (50), "S" DOUBLE);

CALL _SYS_AFL.palValidateKMeans(PAL_KMEANS_DATA_TAB, V_KMEANS_TYPE_ASSIGN, "#CONTROL_TAB", KMEANS_SVALUE_TAB) with overview;
SELECT * FROM KMEANS_SVALUE_TAB;
Expected Result KMEANS_SVALUE_TAB:
3.1.3 Self-Organizing Maps
Self-organizing feature maps (SOMs) are one of the most popular neural network methods for cluster analysis. They are sometimes referred to as Kohonen self-organizing feature maps, after their creator, Teuvo Kohonen, or as topologically ordered maps. SOMs aim to represent all points in a high-dimensional source space by points in a low-dimensional (usually 2-D or 3-D) target space, such that the distance and proximity relationships are preserved as much as possible. This makes SOMs useful for visualizing low-dimensional views of high-dimensional data, akin to multidimensional scaling. SOMs can also be viewed as a constrained version of k-means clustering, in which the cluster centers tend to lie in a low-dimensional manifold in the feature or attribute space.
The learning process mainly includes three steps:
1. Initialize the weight vectors in each unit.
2. Select the Best Matching Unit (BMU) for every point and update the weight vectors of the BMU and its neighbours.
3. Repeat Step 2 until convergence or until the maximum number of iterations is reached.
The SOM approach has many applications such as visualization, web document clustering, and speech recognition.
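The three learning steps can be sketched in a few lines of Python. This is an illustrative toy, not the PAL implementation: the initialization from randomly chosen input points, the linearly decaying learning rate, the Gaussian neighbourhood function, and all function names are assumptions, and the loop stops only on the iteration limit rather than on a convergence test.

```python
import math
import random


def best_matching_unit(weights, point):
    """Index of the unit whose weight vector is closest to the point."""
    def sq_dist(w):
        return sum((a - b) ** 2 for a, b in zip(w, point))
    return min(range(len(weights)), key=lambda u: sq_dist(weights[u]))


def train_som(points, n, max_iteration, seed=0):
    """Train an n x n map; returns the list of n*n weight vectors."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Step 1: initialize the weight vector of each unit
    # (assumption: copies of randomly chosen input points).
    weights = [list(rng.choice(points)) for _ in range(n * n)]
    # Step 3: repeat step 2 up to the iteration limit.
    for it in range(max_iteration):
        frac = 1.0 - it / max_iteration
        rate = 0.5 * frac                    # assumed learning-rate schedule
        radius = max(1.0, (n / 2.0) * frac)  # assumed neighbourhood radius
        for p in points:
            # Step 2: find the BMU, then pull it and its grid neighbours
            # towards the point.
            bmu_row, bmu_col = divmod(best_matching_unit(weights, p), n)
            for u, w in enumerate(weights):
                row, col = divmod(u, n)
                grid_dist = math.hypot(row - bmu_row, col - bmu_col)
                if grid_dist <= radius:
                    influence = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                    for d in range(dim):
                        w[d] += rate * influence * (p[d] - w[d])
    return weights


data = [[0.1, 0.2], [0.3, 0.4], [1.1, 15.1], [1.3, 15.3], [16.2, 1.5], [50.1, 50.2]]
som = train_som(data, n=4, max_iteration=50)
assignment = [best_matching_unit(som, p) for p in data]
print(assignment)
```

The grid size `n` plays the role of SIZE_OF_MAP and `max_iteration` the role of MAX_ITERATION; the final list corresponds conceptually to the assignment of each input tuple to a unit cell.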
Prerequisites
The first column of the input data is an ID column and the other columns are of integer or double data type.
The input data does not contain null values. The algorithm will issue errors when encountering null values.
SELFORGMAP
Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'SELFORGMAP', <signature table>);
The signature table should contain the following records:
Index   Table Type Name               Direction
1       <INPUT table type>            in
2       <PARAMETER table type>        in
3       <Map OUTPUT table type>       out
4       <Assign OUTPUT table type>    out

Procedure Calling
CALL <procedure name>(<input table>, <parameter table>, <map output table>, <assign output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.

Signature

Input Table
Table   Column          Constraint
Data    1st column      This must be the first column.
        Other columns

Parameter Table
Name            Data Type   Description
MAX_ITERATION   Integer     Maximum number of iterations.
NORMALIZATION   Integer     Normalization type:
                            0 = no
                            1 = transform to new range (0.0, 1.0)
                            2 = z-score normalization
THREAD_NUMBER   Integer     Number of threads.
SIZE_OF_MAP     Integer     The self-organizing map is made up of n × n unit cells. This parameter specifies n.

Output Tables
Table        Column                              Column Data Type    Description
SOM Map      1st column                          Integer             Unit cell ID.
             Other columns except the last one   Double              Weight vectors used to simulate the original tuples.
             Last column                         Integer             Number of original tuples that every unit cell contains.
SOM Assign   1st column                          Integer or string   ID of original tuples
             2nd column                          Integer             ID of the unit cells
Example
Assume that:
DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and
USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.

SET SCHEMA DM_PAL;
DROP TYPE PAL_SOM_DATA_T;
CREATE TYPE PAL_SOM_DATA_T AS TABLE("TRANS_ID" INT, "V000" DOUBLE, "V001" DOUBLE, primary key("TRANS_ID"));
DROP TYPE PAL_CONTROL_T;
CREATE TYPE PAL_CONTROL_T AS TABLE("NAME" VARCHAR (50), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR (100));
DROP TYPE PAL_SOM_MAP_T;
CREATE TYPE PAL_SOM_MAP_T AS TABLE("CELL_ID" INT, "WEIGHT000" DOUBLE, "WEIGHT001" DOUBLE, "NUMS_TUPLE" INT);
DROP TYPE PAL_SOM_RESASSIGN_T;
CREATE TYPE PAL_SOM_RESASSIGN_T AS TABLE("TRANS_ID" INT, "CELL_ID" INT, primary key("TRANS_ID"));

-- create procedure
DROP TABLE PDATA;
CREATE COLUMN TABLE PDATA("ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100));
INSERT INTO PDATA VALUES (1, 'DM_PAL.PAL_SOM_DATA_T', 'in');
INSERT INTO PDATA VALUES (2, 'DM_PAL.PAL_CONTROL_T', 'in');
INSERT INTO PDATA VALUES (3, 'DM_PAL.PAL_SOM_MAP_T', 'out');
INSERT INTO PDATA VALUES (4, 'DM_PAL.PAL_SOM_RESASSIGN_T', 'out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('PAL_SELF_ORG_MAP', 'AFLPAL', 'SELFORGMAP', PDATA);

DROP TABLE PAL_SOM_DATA_TAB;
CREATE COLUMN TABLE PAL_SOM_DATA_TAB("TRANS_ID" INT, "V000" DOUBLE, "V001" DOUBLE, primary key("TRANS_ID"));
INSERT INTO PAL_SOM_DATA_TAB VALUES (0 , 0.1, 0.2);
INSERT INTO PAL_SOM_DATA_TAB VALUES (1 , 0.22, 0.25);
INSERT INTO PAL_SOM_DATA_TAB VALUES (2 , 0.3, 0.4);
INSERT INTO PAL_SOM_DATA_TAB VALUES (3 , 0.4, 0.5);
INSERT INTO PAL_SOM_DATA_TAB VALUES (4 , 0.5, 1.0);
INSERT INTO PAL_SOM_DATA_TAB VALUES (5 , 1.1, 15.1);
INSERT INTO PAL_SOM_DATA_TAB VALUES (6 , 2.2, 11.2);
INSERT INTO PAL_SOM_DATA_TAB VALUES (7 , 1.3, 15.3);
INSERT INTO PAL_SOM_DATA_TAB VALUES (8 , 1.4, 15.4);
INSERT INTO PAL_SOM_DATA_TAB VALUES (9 , 3.5, 15.9);
INSERT INTO PAL_SOM_DATA_TAB VALUES (10,13.1, 1.1);
INSERT INTO PAL_SOM_DATA_TAB VALUES (11,16.2, 1.5);
INSERT INTO PAL_SOM_DATA_TAB VALUES (12,16.3, 1.3);
INSERT INTO PAL_SOM_DATA_TAB VALUES (13,12.4, 2.4);
INSERT INTO PAL_SOM_DATA_TAB VALUES (14,16.9, 1.9);
INSERT INTO PAL_SOM_DATA_TAB VALUES (15,49.0, 40.1);
INSERT INTO PAL_SOM_DATA_TAB VALUES (16,50.1, 50.2);
INSERT INTO PAL_SOM_DATA_TAB VALUES (17,50.2, 48.3);
INSERT INTO PAL_SOM_DATA_TAB VALUES (18,55.3, 50.4);
INSERT INTO PAL_SOM_DATA_TAB VALUES (19,50.4, 56.5);

DROP TABLE PAL_CONTROL_TAB;
CREATE COLUMN TABLE PAL_CONTROL_TAB ("NAME" VARCHAR (50), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR (100));
INSERT INTO PAL_CONTROL_TAB VALUES ('THREAD_NUMBER', 2, null, null);
INSERT INTO PAL_CONTROL_TAB VALUES ('MAX_ITERATION', 200, null, null);
INSERT INTO PAL_CONTROL_TAB VALUES ('SIZE_OF_MAP', 4, null, null);
INSERT INTO PAL_CONTROL_TAB VALUES ('NORMALIZATION', 0, null, null);

DROP TABLE PAL_SOM_MAP_TAB;
CREATE COLUMN TABLE PAL_SOM_MAP_TAB ("CELL_ID" INT, "WEIGHT000" DOUBLE, "WEIGHT001" DOUBLE, "NUMS_TUPLE" INT);
DROP TABLE PAL_SOM_RESASSIGN_TAB;
CREATE COLUMN TABLE PAL_SOM_RESASSIGN_TAB ("TRANS_ID" INT, "CELL_ID" INT, primary key("TRANS_ID"));

CALL _SYS_AFL.PAL_SELF_ORG_MAP(PAL_SOM_DATA_TAB, PAL_CONTROL_TAB, PAL_SOM_MAP_TAB, PAL_SOM_RESASSIGN_TAB) with overview;
select * from PAL_SOM_MAP_TAB;
select * from PAL_SOM_RESASSIGN_TAB;
Expected Result PAL_SOM_MAP_TAB:
PAL_SOM_RESASSIGN_TAB:
3.2 Classification Algorithms
3.2.1 Bi-Variate Geometric Regression
Geometric regression is an approach used to model the relationship between a scalar variable y and one or more variables denoted X. In geometric regression, data are modeled using geometric functions, and unknown model parameters are estimated from the data. Such models are called geometric models.
In PAL, geometric regression is implemented by transforming the model into a linear regression and solving that:
$y = \beta_0 x^{\beta_1}$
where $\beta_0$ and $\beta_1$ are the parameters that need to be calculated.
The steps are:
1. Take the natural logarithm of both sides: $\ln(y) = \ln(\beta_0 x^{\beta_1})$
2. Transform it into: $\ln(y) = \ln(\beta_0) + \beta_1 \ln(x)$
3. Let $y' = \ln(y)$, $x' = \ln(x)$, $\beta_0' = \ln(\beta_0)$, which gives $y' = \beta_0' + \beta_1 x'$
Thus, y' and x' have a linear relationship, which can be solved with the linear regression method.
The implementation also supports calculating the F value and R^2 to determine statistical significance.
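The log-log transformation can be tried out with plain least squares. The Python sketch below is only an illustration of the idea, not the PAL code: the function names are hypothetical, the input is synthetic data generated from y = 2 x^3, and it assumes all x and y values are positive so that the logarithms exist.

```python
import math


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_geometric(xs, ys):
    """Fit y = beta0 * x ** beta1 by regressing ln(y) on ln(x)."""
    log_x = [math.log(x) for x in xs]   # x' = ln(x)
    log_y = [math.log(y) for y in ys]   # y' = ln(y)
    beta0_prime, beta1 = fit_linear(log_x, log_y)
    # Undo the substitution beta0' = ln(beta0).
    return math.exp(beta0_prime), beta1


xs = [1, 2, 3, 4, 5]
ys = [2.0 * x ** 3 for x in xs]   # synthetic data, not from the PAL example
beta0, beta1 = fit_geometric(xs, ys)
print(beta0, beta1)
```

Note that least squares is applied to the transformed values, so the fit minimizes errors in ln(y) rather than in y itself.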
Prerequisites
No missing or null data in the inputs. The data is numeric, not categorical.
GEOREGRESSION
This is a geometric regression function.

Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'GEOREGRESSION', <signature table>);
The signature table should contain the following records:
Index   Table Type Name                    Direction
1       <INPUT table type>                 in
2       <PARAMETER table type>             in
3       <Result OUTPUT table type>         out
4       <Fitted OUTPUT table type>         out
5       <Significance OUTPUT table type>   out
6       <PMML OUTPUT table type>           out

Procedure Calling
CALL <procedure name>(<input table>, <parameter table>, <result output table>, <fitted output table>, <significance output table>, <PMML output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.

Signature

Input Table
Table   Column       Column Data Type     Description
Data    1st column   Integer or varchar   ID
        2nd column   Integer or double    Variable y
        3rd column   Integer or double    Variable x

Parameter Table
Name            Data Type   Description
THREAD_NUMBER   Integer     Number of threads.
PMML_EXPORT     Integer     0 (default): does not export geometric regression model in PMML.
                            1: exports geometric regression model in PMML in single row.
                            2: exports geometric regression model in PMML in several rows, each row containing a maximum of 5000 characters.

Output Tables
Table          Column       Column Data Type     Description
Result         1st column   Integer              ID
               2nd column   Integer or double    Value Ai
                                                 A0: intercept
                                                 A1: beta coefficient for X1
Fitted Data    1st column   Integer or varchar   ID
               2nd column   Integer or double    Value Yi
Significance   1st column   Varchar or char      Name (R^2 / F)
               2nd column   Double               Value
PMML Result    1st column   Integer              ID
               2nd column   CLOB or varchar      Geometric regression model in PMML format
Example
Assume that:
DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and
USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.

SET SCHEMA DM_PAL;
DROP TYPE DATA_T;
CREATE TYPE DATA_T AS TABLE("ID" INT, "Y" DOUBLE, "X1" DOUBLE);
DROP TYPE RESULT_T;
CREATE TYPE RESULT_T AS TABLE("ID" INT, "Ai" DOUBLE);
DROP TYPE FITTED_T;
CREATE TYPE FITTED_T AS TABLE("ID" INT, "Fitted" DOUBLE);
DROP TYPE SIGNIFICANCE_T;
CREATE TYPE SIGNIFICANCE_T AS TABLE("NAME" varchar(50), "VALUE" DOUBLE);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));
DROP TYPE MODEL_T;
CREATE TYPE MODEL_T AS TABLE("ID" INT, "Model" varchar(5000));

DROP table PDATA;
CREATE column table PDATA("ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100));
insert into PDATA values (1,'DM_PAL.DATA_T','in');
insert into PDATA values (2,'DM_PAL.CONTROL_T','in');
insert into PDATA values (3,'DM_PAL.RESULT_T','out');
insert into PDATA values (4,'DM_PAL.FITTED_T','out');
insert into PDATA values (5,'DM_PAL.SIGNIFICANCE_T','out');
insert into PDATA values (6,'DM_PAL.MODEL_T','out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('palGeoR','AFLPAL','GEOREGRESSION',PDATA);

DROP TABLE #CONTROL_TAB;
CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TAB ("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));
INSERT INTO #CONTROL_TAB VALUES ('THREAD_NUMBER',8,null,null);
INSERT INTO #CONTROL_TAB VALUES ('PMML_EXPORT',2,null,null);

DROP TABLE DATA_TAB;
CREATE COLUMN TABLE DATA_TAB ("ID" INT, "Y" DOUBLE, "X1" DOUBLE);
INSERT INTO DATA_TAB VALUES (0,1.1,1);
INSERT INTO DATA_TAB VALUES (1,4.2,2);
INSERT INTO DATA_TAB VALUES (2,8.9,3);
INSERT INTO DATA_TAB VALUES (3,16.3,4);
INSERT INTO DATA_TAB VALUES (4,24,5);
INSERT INTO DATA_TAB VALUES (5,36,6);
INSERT INTO DATA_TAB VALUES (6,48,7);
INSERT INTO DATA_TAB VALUES (7,64,8);
INSERT INTO DATA_TAB VALUES (8,80,9);
INSERT INTO DATA_TAB VALUES (9,101,10);

DROP TABLE RESULTS_TAB;
CREATE COLUMN TABLE RESULTS_TAB ("ID" INT, "Ai" DOUBLE);
DROP TABLE FITTED_TAB;
CREATE COLUMN TABLE FITTED_TAB ("ID" INT, "Fitted" DOUBLE);
DROP TABLE SIGNIFICANCE_TAB;
CREATE COLUMN TABLE SIGNIFICANCE_TAB ("NAME" varchar(50), "VALUE" DOUBLE);
DROP TABLE MODEL_TAB;
CREATE COLUMN TABLE MODEL_TAB ("ID" INT, "PMMLMODEL" VARCHAR(5000));

CALL _SYS_AFL.palGeoR(DATA_TAB, "#CONTROL_TAB", RESULTS_TAB, FITTED_TAB, SIGNIFICANCE_TAB, MODEL_TAB) with overview;
SELECT * FROM RESULTS_TAB;
SELECT * FROM FITTED_TAB;
SELECT * FROM SIGNIFICANCE_TAB;
SELECT * FROM MODEL_TAB;
Expected Result RESULTS_TAB:
FITTED_TAB:
SIGNIFICANCE_TAB:
PAL_PMMLMODEL_TAB:
FORECASTWITHGEOR
This function performs prediction with the geometric regression result.

Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'FORECASTWITHGEOR', <signature table>);
The signature table should contain the following records:
Index   Table Type Name                  Direction
1       <Predictive INPUT table type>    in
2       <Coefficient INPUT table type>   in
3       <PARAMETER table type>           in
4       <OUTPUT table type>              out

Procedure Calling
CALL <procedure name>(<predictive input table>, <coefficient input table>, <parameter table>, <output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.

Signature

Input Tables
Table             Column       Column Data Type     Description
Predictive Data   1st column   Integer or varchar   ID
                  2nd column   Integer or double    Variable X
Coefficient       1st column   Integer              ID (start from 0)
                  2nd column   Integer or double    Value Ai

Parameter Table
Name            Data Type   Description
THREAD_NUMBER   Integer     Number of threads

Output Table
Table           Column       Column Data Type     Description
Fitted Result   1st column   Integer or varchar   ID
                2nd column   Integer or double    Value Yi
Example
Assume that:
DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and
USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.

SET SCHEMA DM_PAL;
DROP TYPE PREDICT_T;
CREATE TYPE PREDICT_T AS TABLE("ID" INT, "X1" DOUBLE);
DROP TYPE COEFFICIENT_T;
CREATE TYPE COEFFICIENT_T AS TABLE("ID" INT, "Ai" DOUBLE);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));
DROP TYPE FITTED_T;
CREATE TYPE FITTED_T AS TABLE("ID" INT, "Fitted" DOUBLE);

DROP table PDATA;
CREATE column table PDATA("ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100));
insert into PDATA values (1,'DM_PAL.PREDICT_T','in');
insert into PDATA values (2,'DM_PAL.COEFFICIENT_T','in');
insert into PDATA values (3,'DM_PAL.CONTROL_T','in');
insert into PDATA values (4,'DM_PAL.FITTED_T','out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('palForecastWithGeoR','AFLPAL','FORECASTWITHGEOR',PDATA);

DROP TABLE #CONTROL_TAB;
CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TAB ("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));
INSERT INTO #CONTROL_TAB VALUES ('THREAD_NUMBER',8,null,null);

DROP TABLE PREDICTDATA_TAB;
CREATE COLUMN TABLE PREDICTDATA_TAB ("ID" INT, "X1" DOUBLE);
INSERT INTO PREDICTDATA_TAB VALUES (0,1);
INSERT INTO PREDICTDATA_TAB VALUES (1,2);
INSERT INTO PREDICTDATA_TAB VALUES (2,3);
INSERT INTO PREDICTDATA_TAB VALUES (3,4);
INSERT INTO PREDICTDATA_TAB VALUES (4,5);

DROP TABLE COEEFICIENT_TAB;
CREATE COLUMN TABLE COEEFICIENT_TAB ("ID" INT, "Ai" DOUBLE);
INSERT INTO COEEFICIENT_TAB VALUES (0,1);
INSERT INTO COEEFICIENT_TAB VALUES (1,1.99);

DROP TABLE FITTED_TAB;
CREATE COLUMN TABLE FITTED_TAB ("ID" INT, "Fitted" DOUBLE);

CALL _SYS_AFL.palForecastWithGeoR(PREDICTDATA_TAB, COEEFICIENT_TAB, "#CONTROL_TAB", FITTED_TAB) with overview;
SELECT * FROM FITTED_TAB;
Expected Result FITTED_TAB:
3.2.2 Bi-Variate Natural Logarithmic Regression
Bi-variate natural logarithmic regression is an approach to modeling the relationship between a scalar variable y and one variable denoted X. In natural logarithmic regression, data are modeled using natural logarithmic functions, and unknown model parameters are estimated from the data. Such models are called natural logarithmic models.
In PAL, natural logarithmic regression is implemented by transforming the model into a linear regression and solving that:
$y = \beta_1 \ln(x) + \beta_0$
where $\beta_0$ and $\beta_1$ are the parameters that need to be calculated.
Let $x' = \ln(x)$
Then $y = \beta_0 + \beta_1 x'$
Thus, y and x' have a linear relationship, which can be solved with the linear regression method.
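The substitution x' = ln(x) can be checked with a short least-squares computation. The Python sketch below is an illustration only, not the PAL code; the function name is hypothetical. It uses the sample data of the LNREGRESSION example in this section, and the fitted values should come out close to the coefficients 14.8616 and 98.2936 that the FORECASTWITHLNR example uses.

```python
import math


def fit_ln_regression(xs, ys):
    """Fit y = beta0 + beta1 * ln(x) by least squares on x' = ln(x)."""
    log_x = [math.log(x) for x in xs]   # requires x > 0
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, ys))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


# Sample data (X1, Y) from the LNREGRESSION example.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [10, 80, 130, 160, 180, 190, 192]
beta0, beta1 = fit_ln_regression(xs, ys)
print(beta0, beta1)
```

Unlike geometric regression, only x is transformed here, so the residuals are measured directly in y.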
Prerequisites
No missing or null data in the inputs. The data is numeric, not categorical. Given the structure as Y and X, there are more than 2 records available for analysis.
LNREGRESSION
This is a logarithmic regression function.

Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'LNREGRESSION', <signature table>);
The signature table should contain the following records:
Index   Table Type Name                    Direction
1       <INPUT table type>                 in
2       <PARAMETER table type>             in
3       <Result OUTPUT table type>         out
4       <Fitted OUTPUT table type>         out
5       <Significance OUTPUT table type>   out
6       <PMML OUTPUT table type>           out

Procedure Calling
CALL <procedure name>(<input table>, <parameter table>, <result output table>, <fitted output table>, <significance output table>, <PMML output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.

Signature

Input Table
Table   Column       Description
Data    1st column   ID
        2nd column   Variable y
        3rd column   Variable X

Parameter Table
Name            Data Type   Description
THREAD_NUMBER   Integer     Number of threads
PMML_EXPORT     Integer     0 (default): does not export logarithmic regression model in PMML.
                            1: exports logarithmic regression model in PMML in single row.
                            2: exports logarithmic regression model in PMML in several rows, each row containing a maximum of 5000 characters.

Output Tables
Table          Column       Description
Result         1st column   ID
               2nd column   Value Ai
                            A0: intercept
                            A1: beta coefficient for X1
                            A2: beta coefficient for X2
Fitted Data    1st column   ID
               2nd column   Value Yi
Significance   1st column   Name (R^2 / F)
               2nd column   Value
PMML Result    1st column   ID
               2nd column   Logarithmic regression model in PMML format
Example
Assume that:
DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and
USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.

SET SCHEMA DM_PAL;
DROP TYPE DATA_T;
CREATE TYPE DATA_T AS TABLE("ID" INT, "Y" DOUBLE, "X1" DOUBLE);
DROP TYPE RESULT_T;
CREATE TYPE RESULT_T AS TABLE("ID" INT, "Ai" DOUBLE);
DROP TYPE FITTED_T;
CREATE TYPE FITTED_T AS TABLE("ID" INT, "Fitted" DOUBLE);
DROP TYPE SIGNIFICANCE_T;
CREATE TYPE SIGNIFICANCE_T AS TABLE("NAME" varchar(50), "VALUE" DOUBLE);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));
DROP TYPE MODEL_T;
CREATE TYPE MODEL_T AS TABLE("ID" INT, "Model" varchar(5000));

DROP table PDATA;
CREATE column table PDATA("ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100));
insert into PDATA values (1,'DM_PAL.DATA_T','in');
insert into PDATA values (2,'DM_PAL.CONTROL_T','in');
insert into PDATA values (3,'DM_PAL.RESULT_T','out');
insert into PDATA values (4,'DM_PAL.FITTED_T','out');
insert into PDATA values (5,'DM_PAL.SIGNIFICANCE_T','out');
insert into PDATA values (6,'DM_PAL.MODEL_T','out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('palLnR','AFLPAL','LNREGRESSION',PDATA);

DROP TABLE #CONTROL_TAB;
CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TAB ("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));
INSERT INTO #CONTROL_TAB VALUES ('THREAD_NUMBER',8,null,null);
INSERT INTO #CONTROL_TAB VALUES ('PMML_EXPORT',2,null,null);

DROP TABLE DATA_TAB;
CREATE COLUMN TABLE DATA_TAB ("ID" INT, "Y" DOUBLE, "X1" DOUBLE);
INSERT INTO DATA_TAB VALUES (0,10,1);
INSERT INTO DATA_TAB VALUES (1,80,2);
INSERT INTO DATA_TAB VALUES (2,130,3);
INSERT INTO DATA_TAB VALUES (3,160,4);
INSERT INTO DATA_TAB VALUES (4,180,5);
INSERT INTO DATA_TAB VALUES (5,190,6);
INSERT INTO DATA_TAB VALUES (6,192,7);

DROP TABLE RESULTS_TAB;
CREATE COLUMN TABLE RESULTS_TAB ("ID" INT, "Ai" DOUBLE);
DROP TABLE FITTED_TAB;
CREATE COLUMN TABLE FITTED_TAB ("ID" INT, "Fitted" DOUBLE);
DROP TABLE SIGNIFICANCE_TAB;
CREATE COLUMN TABLE SIGNIFICANCE_TAB ("NAME" varchar(50), "VALUE" DOUBLE);
DROP TABLE MODEL_TAB;
CREATE COLUMN TABLE MODEL_TAB ("ID" INT, "PMMLMODEL" VARCHAR(5000));

CALL _SYS_AFL.palLnR(DATA_TAB, "#CONTROL_TAB", RESULTS_TAB, FITTED_TAB, SIGNIFICANCE_TAB, MODEL_TAB) with overview;
SELECT * FROM RESULTS_TAB;
SELECT * FROM FITTED_TAB;
SELECT * FROM SIGNIFICANCE_TAB;
SELECT * FROM MODEL_TAB;
FITTED_TAB:
SIGNIFICANCE_TAB:
PAL_PMMLMODEL_TAB:
FORECASTWITHLNR
This function performs prediction with the natural logarithmic regression result.

Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'FORECASTWITHLNR', <signature table>);
The signature table should contain the following records:
Index   Table Type Name                  Direction
1       <Predictive INPUT table type>    in
2       <Coefficient INPUT table type>   in
3       <PARAMETER table type>           in
4       <OUTPUT table type>              out

Procedure Calling
CALL <procedure name>(<predictive input table>, <coefficient input table>, <parameter table>, <output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.

Signature

Input Tables
Table             Column
Predictive Data   1st column
                  2nd column
Coefficient       1st column
                  2nd column
Column Data Type Integer or varchar Integer or double
Example
Assume that:
DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and
USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.

SET SCHEMA DM_PAL;
DROP TYPE PREDICT_T;
CREATE TYPE PREDICT_T AS TABLE("ID" INT, "X1" DOUBLE);
DROP TYPE COEFFICIENT_T;
CREATE TYPE COEFFICIENT_T AS TABLE("ID" INT, "Ai" DOUBLE);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));
DROP TYPE FITTED_T;
CREATE TYPE FITTED_T AS TABLE("ID" INT, "Fitted" DOUBLE);

DROP table PDATA;
CREATE column table PDATA("ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100));
insert into PDATA values (1,'DM_PAL.PREDICT_T','in');
insert into PDATA values (2,'DM_PAL.COEFFICIENT_T','in');
insert into PDATA values (3,'DM_PAL.CONTROL_T','in');
insert into PDATA values (4,'DM_PAL.FITTED_T','out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('palForecastWithLnR','AFLPAL','FORECASTWITHLNR',PDATA);

DROP TABLE #CONTROL_TAB;
CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TAB ("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));
INSERT INTO #CONTROL_TAB VALUES ('THREAD_NUMBER',8,null,null);

DROP TABLE PREDICTDATA_TAB;
CREATE COLUMN TABLE PREDICTDATA_TAB ("ID" INT, "X1" DOUBLE);
INSERT INTO PREDICTDATA_TAB VALUES (0,1);
INSERT INTO PREDICTDATA_TAB VALUES (1,2);
INSERT INTO PREDICTDATA_TAB VALUES (2,3);
INSERT INTO PREDICTDATA_TAB VALUES (3,4);
INSERT INTO PREDICTDATA_TAB VALUES (4,5);
INSERT INTO PREDICTDATA_TAB VALUES (5,6);
INSERT INTO PREDICTDATA_TAB VALUES (6,7);

DROP TABLE COEEFICIENT_TAB;
CREATE COLUMN TABLE COEEFICIENT_TAB ("ID" INT, "Ai" DOUBLE);
INSERT INTO COEEFICIENT_TAB VALUES (0,14.86160299);
INSERT INTO COEEFICIENT_TAB VALUES (1,98.29359746);

DROP TABLE FITTED_TAB;
CREATE COLUMN TABLE FITTED_TAB ("ID" INT, "Fitted" DOUBLE);

CALL _SYS_AFL.palForecastWithLnR(PREDICTDATA_TAB, COEEFICIENT_TAB, "#CONTROL_TAB", FITTED_TAB) with overview;
SELECT * FROM FITTED_TAB;

Expected Result
FITTED_TAB:
3.2.3 C4.5 Decision Tree
A decision tree is used as a classifier for determining an appropriate action or decision among a predetermined set of actions for a given case. A decision tree helps you to effectively identify the factors to consider and how each factor has historically been associated with different outcomes of the decision. A decision tree uses a tree-like structure of conditions and their possible consequences. Each node of a decision tree can be a leaf node or a decision node.
Leaf node: gives the value of the dependent (target) variable.
Decision node: contains one condition that specifies some test on an attribute value. The outcome of the condition is further divided into branches with subtrees or leaf nodes.
As a classification algorithm, C4.5 builds decision trees from a set of training data, using the concept of information entropy. The training data is a set of already classified samples. At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits the samples into subsets enriched in one class or the other. Its criterion is the normalized information gain (difference in entropy) that results from choosing an attribute for splitting the data. The attribute with the highest normalized information gain is chosen to make the decision. The C4.5 algorithm then proceeds recursively until some stopping criterion is met, such as the minimum number of cases in a leaf node.
The C4.5 decision tree functions implemented in PAL support both discrete and continuous values. A continuous attribute is discretized using fixed intervals provided by the user. For example, if the salary ranges from $100 to $20000, the intervals could be $0 to $8000, $8000 to $18000, and $18000 to $20000. An attribute value falls into exactly one of these intervals.
In the PAL implementation, the REP (Reduced Error Pruning) algorithm is used as the pruning method.
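The attribute-selection criterion can be made concrete with a small Python sketch. It is not the PAL implementation: it assumes that "normalized information gain" means the usual C4.5 gain ratio (information gain divided by the split information), it handles discrete attributes only, and the function names and the tiny data set are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by the split information."""
    total = len(labels)
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(v, []).append(label)
    # Weighted entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    # An attribute with a single value cannot split the data.
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels):
    """Index of the attribute with the highest gain ratio."""
    n_attributes = len(rows[0])
    scores = [gain_ratio([r[i] for r in rows], labels) for i in range(n_attributes)]
    return max(range(n_attributes), key=scores.__getitem__)


# Hypothetical training data: (outlook, humidity) -> play
rows = [("sunny", "high"), ("sunny", "low"), ("rain", "high"), ("rain", "low")]
labels = ["no", "no", "yes", "yes"]
print(best_attribute(rows, labels))
```

Here the first attribute separates the two classes completely while the second carries no information, so the first one is selected for the root decision node.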
Prerequisites
The column order and column number of the predicted data are the same as the order and number used in tree model building.
The last column of the training data is used as a predicted field and is of discrete type.
The predicted data set has an ID column.
The input data does not contain null values. The algorithm will issue errors when encountering null values.
The table used to store the tree model is a column table.
CREATEDT
This function creates a decision tree from the input training data.
Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'CREATEDT', <signature table>);
The signature table should contain the following records:
Index  Table Type Name             Direction
1      <INPUT table type>          in
2      <PARAMETER table type>      in
3      <Result OUTPUT table type>  out
4      <PMML OUTPUT table type>    out
Procedure Calling
CALL <procedure name>(<input table>, <parameter table>, <result output table>, <PMML output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.
Signature
Input Table
Table                       Column   Column Data Type                   Description
Training / Historical Data  Columns  Varchar, char, integer, or double  Table used to build the predictive tree model
Constraint: Discrete value: integer or varchar/char. Continuous value: integer or double.
Parameter Table
Name              Data Type                     Description
PERCENTAGE        Double                        The percentage to be applied to the input training data set.
MIN_NUMS_RECORDS  Integer                       Controls the minimum number of training records in every leaf node. The default is zero.
THREAD_NUMBER     Integer                       Number of threads.
IS_SPLIT_MODEL    Integer                       Indicates whether the string of the tree model should be split. If the value is not equal to 0, the tree model is split, and the maximum length of each unit is k.
CONTINUOUS_COL    Integer or double (optional)  Defines which column needs discretization and the interval provided by the user. The column index starts from zero. The integer value specifies the column position. The double value specifies the interval.
PMML_EXPORT       Integer                       0 (default): does not export the PMML tree model. 1: exports the PMML tree model in a single row. 2: exports the PMML tree model in several rows, each row containing a maximum of 5000 characters.
Output Tables
Table                Column      Column Data Type  Description
Result (tree model)  1st column  Integer           ID
                     2nd column  CLOB or varchar   Tree model saved as a JSON string.
PMML Result          1st column  Integer           ID
                     2nd column  CLOB or varchar   C4.5 decision tree model in PMML format
Constraint (Result): The table must be a column table. The maximum length is 5000.
Example
Assume that:
DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and
USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.
SET SCHEMA DM_PAL;
DROP TYPE PAL_DATA_T;
CREATE TYPE PAL_DATA_T AS TABLE("REGION" VARCHAR(50), "SALESPERIOD" VARCHAR(50), "REVENUE" Double, "CLASSLABEL" VARCHAR(50));
DROP TYPE PAL_JSONMODEL_T;
CREATE TYPE PAL_JSONMODEL_T AS TABLE("ID" INT, "JSONMODEL" VARCHAR(5000));
DROP TYPE PAL_PMMLMODEL_T;
CREATE TYPE PAL_PMMLMODEL_T AS TABLE("ID" INT, "PMMLMODEL" VARCHAR(5000));
DROP TYPE PAL_CONTROL_T;
CREATE TYPE PAL_CONTROL_T AS TABLE("NAME" VARCHAR (50), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR(100));
-- create procedure
DROP TABLE PDATA;
CREATE COLUMN TABLE PDATA("ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100));
INSERT INTO PDATA VALUES (1, 'DM_PAL.PAL_DATA_T', 'in');
INSERT INTO PDATA VALUES (2, 'DM_PAL.PAL_CONTROL_T', 'in');
INSERT INTO PDATA VALUES (3, 'DM_PAL.PAL_JSONMODEL_T', 'out');
INSERT INTO PDATA VALUES (4, 'DM_PAL.PAL_PMMLMODEL_T', 'out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('PAL_CREATEDT', 'AFLPAL', 'CREATEDT', PDATA);
-- training data
DROP TABLE PAL_TRAINING_TAB;
CREATE COLUMN TABLE PAL_TRAINING_TAB("REGION" VARCHAR(50), "SALESPERIOD" VARCHAR(50), "REVENUE" Double, "CLASSLABEL" VARCHAR(50));
INSERT INTO PAL_TRAINING_TAB VALUES ('South', 'Winter', 100000, 'Good');
INSERT INTO PAL_TRAINING_TAB VALUES ('North', 'Spring', 45000, 'Average');
INSERT INTO PAL_TRAINING_TAB VALUES ('West', 'Summer', 30000, 'Poor');
INSERT INTO PAL_TRAINING_TAB VALUES ('East', 'Autumn', 5000, 'Poor');
INSERT INTO PAL_TRAINING_TAB VALUES ('West', 'Spring', 5000, 'Poor');
INSERT INTO PAL_TRAINING_TAB VALUES ('East', 'Spring', 200000, 'Good');
INSERT INTO PAL_TRAINING_TAB VALUES ('South', 'Summer', 25000, 'Poor');
INSERT INTO PAL_TRAINING_TAB VALUES ('South', 'Spring', 10000, 'Average');
INSERT INTO PAL_TRAINING_TAB VALUES ('North', 'Winter', 50000, 'Average');
-- control parameters
DROP TABLE PAL_CONTROL_TAB;
CREATE COLUMN TABLE PAL_CONTROL_TAB("NAME" VARCHAR (50), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR (100));
INSERT INTO PAL_CONTROL_TAB VALUES ('PERCENTAGE',null,1.0,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('THREAD_NUMBER',2,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('IS_SPLIT_MODEL',1,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('PMML_EXPORT', 2, null, null);
INSERT INTO PAL_CONTROL_TAB VALUES ('CONTINUOUS_COL',2,25000,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('CONTINUOUS_COL',2,60000,null);
-- output tables
DROP TABLE PAL_JSONMODEL_TAB;
CREATE COLUMN TABLE PAL_JSONMODEL_TAB("ID" INT, "JSONMODEL" VARCHAR(5000));
DROP TABLE PAL_PMMLMODEL_TAB;
CREATE COLUMN TABLE PAL_PMMLMODEL_TAB("ID" INT, "PMMLMODEL" VARCHAR(5000));
CALL _SYS_AFL.PAL_CREATEDT(PAL_TRAINING_TAB, PAL_CONTROL_TAB, PAL_JSONMODEL_TAB, PAL_PMMLMODEL_TAB) with overview;
SELECT * FROM PAL_JSONMODEL_TAB;
SELECT * FROM PAL_PMMLMODEL_TAB;
Expected Result PAL_JSONMODEL_TAB:
PAL_PMMLMODEL_TAB:
PREDICTWITHDT
This function uses decision trees to perform prediction.
Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'PREDICTWITHDT', <signature table>);
The signature table should contain the following records:
Index  Table Type Name            Direction
1      <Data INPUT table type>    in
2      <PARAMETER table type>     in
3      <Model INPUT table type>   in
4      <OUTPUT table type>        out
Procedure Calling
CALL <procedure name>(<data input table>, <parameter table>, <model input table>, <output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.
Signature
Input Tables
Table             Column         Column Data Type  Description
Predicted Data    1st column     Integer           ID
                  Other columns  Varchar or char   Data to be classified (predicted)
Predictive Model  1st column     Integer           ID
                  2nd column     Varchar           Serialized tree model
Output Table
Table   Column      Column Data Type  Description
Result  1st column  Integer           ID
        2nd column  Varchar or char   Predictive result
Example
Assume that:
DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and
USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.
-- Note: Before generating this model, make sure you have created the tree model using the CREATEDT function.
SET SCHEMA DM_PAL;
DROP TYPE PAL_DATA_T;
CREATE TYPE PAL_DATA_T AS TABLE("ID" INT, "REGION" VARCHAR(50), "SALESPERIOD" VARCHAR(50), "REVENUE" Double);
DROP TYPE PAL_JSONMODEL_T;
CREATE TYPE PAL_JSONMODEL_T AS TABLE("ID" INT, "JSONMODEL" VARCHAR(5000));
DROP TYPE PAL_CONTROL_T;
CREATE TYPE PAL_CONTROL_T AS TABLE("NAME" VARCHAR (50), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR (100));
DROP TYPE PAL_RESULT_T;
CREATE TYPE PAL_RESULT_T AS TABLE("ID" INT, "CLASSLABEL" VARCHAR(50));
-- create procedure
DROP TABLE PDATA;
CREATE COLUMN TABLE PDATA("ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100));
INSERT INTO PDATA VALUES (1, 'DM_PAL.PAL_DATA_T', 'in');
INSERT INTO PDATA VALUES (2, 'DM_PAL.PAL_CONTROL_T', 'in');
INSERT INTO PDATA VALUES (3, 'DM_PAL.PAL_JSONMODEL_T', 'in');
INSERT INTO PDATA VALUES (4, 'DM_PAL.PAL_RESULT_T', 'out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('PAL_PREDICTWITHDT', 'AFLPAL', 'PREDICTWITHDT', PDATA);
-- data to be classified
DROP TABLE PAL_DATA_TAB;
CREATE COLUMN TABLE PAL_DATA_TAB ("ID" INT, "REGION" VARCHAR(50), "SALESPERIOD" VARCHAR(50), "REVENUE" Double);
INSERT INTO PAL_DATA_TAB VALUES (0,'South', 'Autumn', 60000);
INSERT INTO PAL_DATA_TAB VALUES (1,'North', 'Spring', 30000);
INSERT INTO PAL_DATA_TAB VALUES (2,'South', 'Summer', 25000);
INSERT INTO PAL_DATA_TAB VALUES (3,'West', 'Winter', 5000);
-- control parameters
DROP TABLE PAL_CONTROL_TAB;
CREATE COLUMN TABLE PAL_CONTROL_TAB ("NAME" VARCHAR (50), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR (100));
INSERT INTO PAL_CONTROL_TAB VALUES ('THREAD_NUMBER',2,null,null);
-- output table
DROP TABLE PAL_RESULT_TAB;
CREATE TABLE PAL_RESULT_TAB("ID" INT, "CLASSLABEL" VARCHAR(50));
CALL _SYS_AFL.PAL_PREDICTWITHDT(PAL_DATA_TAB, PAL_CONTROL_TAB, PAL_JSONMODEL_TAB, PAL_RESULT_TAB) with overview;
SELECT * FROM PAL_RESULT_TAB;
Expected Result PAL_RESULT_TAB:
3.2.4
CHAID Decision Tree
CHAID stands for CHi-squared Automatic Interaction Detection. It is similar to the C4.5 decision tree. CHAID is a classification method for building decision trees by using chi-square statistics to identify optimal splits. CHAID examines the cross tabulations between each of the input fields and the outcome, and tests for significance using a chi-square independence test. If more than one of these relations is statistically significant, CHAID will select the input field that is the most significant (smallest p value). CHAID can generate non-binary trees.
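The chi-square independence test that drives the split selection can be sketched in Python as follows. This is only an illustration of the statistic on a cross tabulation and not PAL's implementation; the helper names (`cross_tab`, `chi_square_statistic`, `chi_square_p_value`) are hypothetical, and the p-value is computed with a plain series expansion of the regularized incomplete gamma function. The rows are the REGION, SALESPERIOD, and CLASSLABEL values from the training example in this section. With so few rows the expected counts are small, so the resulting p-values are rough at best.

```python
import math

# (REGION, SALESPERIOD, CLASSLABEL) taken from the training example.
ROWS = [
    ("South", "Winter", "Good"), ("North", "Spring", "Average"),
    ("West", "Summer", "Poor"), ("East", "Autumn", "Poor"),
    ("West", "Spring", "Poor"), ("East", "Spring", "Good"),
    ("South", "Summer", "Poor"), ("South", "Spring", "Average"),
    ("North", "Winter", "Average"),
]


def cross_tab(rows, attr_index):
    """Count table: one row per attribute value, one column per class label."""
    values = sorted({row[attr_index] for row in rows})
    labels = sorted({row[-1] for row in rows})
    return [[sum(1 for r in rows if r[attr_index] == v and r[-1] == l)
             for l in labels] for v in values]


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom of a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def chi_square_p_value(stat, dof):
    """Upper-tail probability of the chi-square distribution (series expansion)."""
    a, x = dof / 2.0, stat / 2.0
    if x <= 0:
        return 1.0
    term = 1.0 / a
    series = term
    n = 0
    while abs(term) > 1e-15 * abs(series):
        n += 1
        term *= x / (a + n)
        series += term
    lower = series * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


p_values = {}
for name, index in {"REGION": 0, "SALESPERIOD": 1}.items():
    stat, dof = chi_square_statistic(cross_tab(ROWS, index))
    p_values[name] = chi_square_p_value(stat, dof)
most_significant = min(p_values, key=p_values.get)
print(p_values, most_significant)
```

Selecting the input field with the smallest p-value corresponds to the `min` call at the end. A real CHAID implementation also merges categories and applies significance thresholds, which this sketch leaves out.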
Prerequisites
The column order and column number of the predicted data are the same as the order and number used in tree model building.
The last column of the training data is used as a predicted field and is of discrete type.
The predicted data set has an ID column.
The input data does not contain null values. The algorithm issues errors when it encounters null values.
The table used to store the tree model is a column table.
CREATEDTWITHCHAID
This function creates a decision tree from the input training data.
Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'CREATEDTWITHCHAID', <signature table>);
The signature table should contain the following records:
Index  Table Type Name             Direction
1      <INPUT table type>          in
2      <PARAMETER table type>      in
3      <Result OUTPUT table type>  out
4      <PMML OUTPUT table type>    out
Procedure Calling
CALL <procedure name>(<input table>, <parameter table>, <result output table>, <PMML output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.
Signature
Input Table
Table                       Column   Column Data Type                   Description
Training / Historical Data  Columns  Varchar, char, integer, or double  Table used to build the predictive tree model
Constraint: Discrete value: integer or varchar/char. Continuous value: integer or double.
Parameter Table
Name              Data Type                     Description
MIN_NUMS_RECORDS  Integer                       Controls the minimum number of training records in every leaf node. The default is zero.
PERCENTAGE        Double                        The percentage to be applied to determine the input training data set.
IS_SPLIT_MODEL    Integer                       Indicates whether the string of the tree model should be split. If the value does not equal zero, the tree model is split, and the maximum length of each unit is 1k.
THREAD_NUMBER     Integer                       Number of threads.
CONTINUOUS_COL    Integer or double (optional)  Defines which column needs discretization and the interval provided by the user. The column index starts from zero. The integer value specifies the column position. The double value specifies the interval.
PMML_EXPORT       Integer                       0 (default): does not export the PMML tree model. 1: exports the PMML tree model in a single row. 2: exports the PMML tree model in several rows, each row containing a maximum of 5000 characters.
Output Tables
Table                Column      Column Data Type  Description
Result (tree model)  1st column  Integer           ID
                     2nd column  Varchar or CLOB   Tree model saved as a JSON string.
PMML Result          1st column                    ID
                     2nd column                    CHAID decision tree model in PMML format
Constraint (Result): The table must be a column table. The maximum length is 5000.
Example
Assume that:
DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and
USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.
SET SCHEMA DM_PAL;
DROP TYPE PAL_DATA_T;
CREATE TYPE PAL_DATA_T AS TABLE("REGION" VARCHAR(50), "SALESPERIOD" VARCHAR(50), "REVENUE" Double, "CLASSLABEL" VARCHAR(50));
DROP TYPE PAL_PMMLMODEL_T;
CREATE TYPE PAL_PMMLMODEL_T AS TABLE("ID" INT, "PMMLMODEL" VARCHAR(5000));
DROP TYPE PAL_JSONMODEL_T;
CREATE TYPE PAL_JSONMODEL_T AS TABLE("ID" INT, "JSONMODEL" VARCHAR(5000));
DROP TYPE PAL_CONTROL_T;
CREATE TYPE PAL_CONTROL_T AS TABLE("NAME" VARCHAR (50), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR (100));
-- create procedure
DROP TABLE PDATA;
CREATE COLUMN TABLE PDATA("ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100));
INSERT INTO PDATA VALUES (1, 'DM_PAL.PAL_DATA_T', 'in');
INSERT INTO PDATA VALUES (2, 'DM_PAL.PAL_CONTROL_T', 'in');
INSERT INTO PDATA VALUES (3, 'DM_PAL.PAL_JSONMODEL_T', 'out');
INSERT INTO PDATA VALUES (4, 'DM_PAL.PAL_PMMLMODEL_T', 'out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('PAL_CREATEDT_WITH_CHAID', 'AFLPAL', 'CREATEDTWITHCHAID', PDATA);
-- training data
DROP TABLE PAL_TRAINING_TAB;
CREATE COLUMN TABLE PAL_TRAINING_TAB("REGION" VARCHAR(50), "SALESPERIOD" VARCHAR(50), "REVENUE" Double, "CLASSLABEL" VARCHAR(50));
INSERT INTO PAL_TRAINING_TAB VALUES ('South', 'Winter', 100000, 'Good');
INSERT INTO PAL_TRAINING_TAB VALUES ('North', 'Spring', 45000, 'Average');
INSERT INTO PAL_TRAINING_TAB VALUES ('West', 'Summer', 30000, 'Poor');
INSERT INTO PAL_TRAINING_TAB VALUES ('East', 'Autumn', 5000, 'Poor');
INSERT INTO PAL_TRAINING_TAB VALUES ('West', 'Spring', 5000, 'Poor');
INSERT INTO PAL_TRAINING_TAB VALUES ('East', 'Spring', 200000, 'Good');
INSERT INTO PAL_TRAINING_TAB VALUES ('South', 'Summer', 25000, 'Poor');
INSERT INTO PAL_TRAINING_TAB VALUES ('South', 'Spring', 10000, 'Average');
INSERT INTO PAL_TRAINING_TAB VALUES ('North', 'Winter', 50000, 'Average');
-- control parameters
DROP TABLE PAL_CONTROL_TAB;
CREATE COLUMN TABLE PAL_CONTROL_TAB("NAME" VARCHAR (50), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR (100));
INSERT INTO PAL_CONTROL_TAB VALUES ('PERCENTAGE',null,1.0,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('THREAD_NUMBER',2,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('IS_SPLIT_MODEL',0,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('MIN_NUMS_RECORDS',1,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('CONTINUOUS_COL',2,25000,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('CONTINUOUS_COL',2,60000,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('PMML_EXPORT',2,null,null);
-- output tables
DROP TABLE PAL_JSONMODEL_TAB;
CREATE COLUMN TABLE PAL_JSONMODEL_TAB("ID" INT, "JSONMODEL" VARCHAR(5000));
DROP TABLE PAL_PMMLMODEL_TAB;
CREATE COLUMN TABLE PAL_PMMLMODEL_TAB("ID" INT, "PMMLMODEL" VARCHAR(5000));
CALL _SYS_AFL.PAL_CREATEDT_WITH_CHAID(PAL_TRAINING_TAB, PAL_CONTROL_TAB, PAL_JSONMODEL_TAB, PAL_PMMLMODEL_TAB) with overview;
SELECT * FROM PAL_JSONMODEL_TAB;
SELECT * FROM PAL_PMMLMODEL_TAB;
Expected Result PAL_JSONMODEL_TAB:
PAL_PMMLMODEL_TAB:
PREDICTWITHDT
This function uses decision trees to perform prediction.
Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'PREDICTWITHDT', <signature table>);
The signature table should contain the following records:
Index  Table Type Name            Direction
1      <Data INPUT table type>    in
2      <PARAMETER table type>     in
3      <Model INPUT table type>   in
4      <OUTPUT table type>        out
Procedure Calling
CALL <procedure name>(<data input table>, <parameter table>, <model input table>, <output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.
Signature
Input Tables
Table             Column         Column Data Type  Description
Predicted Data    1st column     Integer           ID
                  Other columns  Varchar or char   Data to be classified (predicted)
Predictive Model  1st column     Integer           ID
                  2nd column     Varchar           Serialized tree model
Output Table
Table   Column      Column Data Type  Description
Result  1st column  Integer           ID
        2nd column  Varchar or char   Predictive result
Example
Assume that:
DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and
USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.
-- Note: Before generating this model, make sure you have created the tree model using the CREATEDTWITHCHAID function.
SET SCHEMA DM_PAL;
DROP TYPE PAL_DATA_T;
CREATE TYPE PAL_DATA_T AS TABLE("ID" INT, "REGION" VARCHAR(50), "SALESPERIOD" VARCHAR(50), "REVENUE" Double);
DROP TYPE PAL_JSONMODEL_T;
CREATE TYPE PAL_JSONMODEL_T AS TABLE("ID" INT, "JSONMODEL" VARCHAR(5000));
DROP TYPE PAL_CONTROL_T;
CREATE TYPE PAL_CONTROL_T AS TABLE("NAME" VARCHAR (50), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR (100));
DROP TYPE PAL_RESULT_T;
CREATE TYPE PAL_RESULT_T AS TABLE("ID" INT, "CLASSLABEL" VARCHAR(50));
-- create procedure
DROP TABLE PDATA;
CREATE COLUMN TABLE PDATA("ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100));
INSERT INTO PDATA VALUES (1, 'DM_PAL.PAL_DATA_T', 'in');
INSERT INTO PDATA VALUES (2, 'DM_PAL.PAL_CONTROL_T', 'in');
INSERT INTO PDATA VALUES (3, 'DM_PAL.PAL_JSONMODEL_T', 'in');
INSERT INTO PDATA VALUES (4, 'DM_PAL.PAL_RESULT_T', 'out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('PAL_PREDICTWITHDT', 'AFLPAL', 'PREDICTWITHDT', PDATA);
-- data to be classified
DROP TABLE PAL_DATA_TAB;
CREATE COLUMN TABLE PAL_DATA_TAB ("ID" INT, "REGION" VARCHAR(50), "SALESPERIOD" VARCHAR(50), "REVENUE" Double);
INSERT INTO PAL_DATA_TAB VALUES (0,'South', 'Autumn', 60000);
INSERT INTO PAL_DATA_TAB VALUES (1,'North', 'Spring', 30000);
INSERT INTO PAL_DATA_TAB VALUES (2,'South', 'Summer', 25000);
INSERT INTO PAL_DATA_TAB VALUES (3,'West', 'Winter', 5000);
-- control parameters
DROP TABLE PAL_CONTROL_TAB;
CREATE COLUMN TABLE PAL_CONTROL_TAB ("NAME" VARCHAR (50), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR (100));
INSERT INTO PAL_CONTROL_TAB VALUES ('THREAD_NUMBER',2,null,null);
-- output table
DROP TABLE PAL_RESULT_TAB;
CREATE TABLE PAL_RESULT_TAB("ID" INT, "CLASSLABEL" VARCHAR(50));
CALL _SYS_AFL.PAL_PREDICTWITHDT(PAL_DATA_TAB, PAL_CONTROL_TAB, PAL_JSONMODEL_TAB, PAL_RESULT_TAB) with overview;
SELECT * FROM PAL_RESULT_TAB;
3.2.5
Exponential Regression
Exponential regression is an approach to modeling the relationship between a scalar variable y and one or more variables denoted X. In exponential regression, data are modeled using exponential functions, and unknown model parameters are estimated from the data. Such models are called exponential models. In PAL, the implementation of exponential regression is to transform to linear regression and solve it:
$$y = \beta_0 \exp(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n)$$
Where $\beta_0, \dots, \beta_n$ are parameters that need to be calculated.
The steps are:
1. Take the natural logarithm of both sides:
$$\ln(y) = \ln(\beta_0 \exp(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n))$$
2. Transform it into:
$$\ln(y) = \ln(\beta_0) + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$
3. Let $y' = \ln(y)$ and $\beta_0' = \ln(\beta_0)$:
$$y' = \beta_0' + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$
Thus, $y'$ and $x_1, \dots, x_n$ have a linear relationship, which can be solved using the linear regression method.
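The transformation can be tried out with a small Python sketch for the single-variable case. It is a minimal illustration under stated assumptions, not the PAL implementation: the function name `fit_exponential` and the synthetic data are made up for this example, and the fit uses the closed-form least-squares line through the points (x, ln y). Because the method takes ln(y), it only works when every y value is strictly positive.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = beta0 * exp(beta1 * x) by least squares on (x, ln y)."""
    if any(y <= 0 for y in ys):
        raise ValueError("ln(y) requires strictly positive y values")
    log_ys = [math.log(y) for y in ys]  # y' = ln(y)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log_y = sum(log_ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ly - mean_log_y) for x, ly in zip(xs, log_ys))
    beta1 = sxy / sxx                           # slope of the linear model
    beta0_prime = mean_log_y - beta1 * mean_x   # intercept: beta0' = ln(beta0)
    return math.exp(beta0_prime), beta1         # undo the logarithm for beta0


# Synthetic data generated from y = 2 * exp(0.5 * x).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
beta0, beta1 = fit_exponential(xs, ys)
print(beta0, beta1)
```

Since the synthetic data follows the model exactly, the fit recovers the generating parameters up to floating-point error. With noisy data, the result minimizes the squared error in ln(y), not in y itself.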
Prerequisites
No missing or null data in the inputs.
The data is numeric, not categorical.
Given the structure as Y and X1...Xn, there are more than n+1 records available for analysis.
EXPREGRESSION
This is an exponential regression function.
Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'EXPREGRESSION', <signature table>);
The signature table should contain the following records:
Index  Table Type Name                   Direction
1      <INPUT table type>                in
2      <PARAMETER table type>            in
3      <Result OUTPUT table type>        out
4      <Fitted OUTPUT table type>        out
5      <Significance OUTPUT table type>  out
6      <PMML OUTPUT table type>          out
Procedure Calling
CALL <procedure name>(<input table>, <parameter table>, <result output table>, <fitted output table>, <significance output table>, <PMML output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.
Signature
Input Table
Table  Column         Description
Data   1st column     ID
       2nd column     Variable y
       Other columns  Variable Xn
Parameter Table
Name           Data Type  Description
THREAD_NUMBER  Integer    Number of threads
PMML_EXPORT    Integer    0 (default): does not export the exponential regression model in PMML. 1: exports the exponential regression model in PMML in a single row. 2: exports the exponential regression model in PMML in several rows, each row containing a maximum of 5000 characters.
Output Tables
Table         Column      Description
Result        1st column  ID
              2nd column  Value Ai. A0: the intercept. A1: the beta coefficient for X1. A2: the beta coefficient for X2.
Fitted Data   1st column  ID
              2nd column  Value Yi
Significance  1st column  Name (R^2 / F)
              2nd column  Value
PMML Result   1st column  ID
              2nd column  Exponential regression model in PMML format
Example
Assume that:
DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and
USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.
SET SCHEMA DM_PAL;
DROP TYPE DATA_T;
CREATE TYPE DATA_T AS TABLE("ID" INT,"Y" DOUBLE,"X1" DOUBLE, "X2" DOUBLE);
DROP TYPE RESULT_T;
CREATE TYPE RESULT_T AS TABLE("ID" INT,"Ai" DOUBLE);
DROP TYPE FITTED_T;
CREATE TYPE FITTED_T AS TABLE("ID" INT,"Fitted" DOUBLE);
DROP TYPE SIGNIFICANCE_T;
CREATE TYPE SIGNIFICANCE_T AS TABLE("NAME" varchar(50),"VALUE" DOUBLE);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE,"strArgs" VARCHAR(100));
DROP TYPE MODEL_T;
CREATE TYPE MODEL_T AS TABLE("ID" INT,"Model" varchar(5000));
-- create procedure
DROP table PDATA;
CREATE column table PDATA("ID" INT,"TYPENAME" VARCHAR(100),"DIRECTION" VARCHAR(100));
insert into PDATA values (1,'DM_PAL.DATA_T','in');
insert into PDATA values (2,'DM_PAL.CONTROL_T','in');
insert into PDATA values (3,'DM_PAL.RESULT_T','out');
insert into PDATA values (4,'DM_PAL.FITTED_T','out');
insert into PDATA values (5,'DM_PAL.SIGNIFICANCE_T','out');
insert into PDATA values (6,'DM_PAL.MODEL_T','out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('palExpR','AFLPAL','EXPREGRESSION',PDATA);
-- control parameters
DROP TABLE #CONTROL_TAB;
CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TAB ("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE,"strArgs" VARCHAR(100));
INSERT INTO #CONTROL_TAB VALUES ('THREAD_NUMBER',8,null,null);
INSERT INTO #CONTROL_TAB VALUES ('PMML_EXPORT',2,null,null);
-- input data
DROP TABLE DATA_TAB;
CREATE COLUMN TABLE DATA_TAB ("ID" INT,"Y" DOUBLE,"X1" DOUBLE, "X2" DOUBLE);
INSERT INTO DATA_TAB VALUES (0,0.5,0.13,0.33);
INSERT INTO DATA_TAB VALUES (1,0.15,0.14,0.34);
INSERT INTO DATA_TAB VALUES (2,0.25,0.15,0.36);
INSERT INTO DATA_TAB VALUES (3,0.35,0.16,0.35);
INSERT INTO DATA_TAB VALUES (4,0.45,0.17,0.37);
INSERT INTO DATA_TAB VALUES (5,0.55,0.18,0.38);
INSERT INTO DATA_TAB VALUES (6,0.65,0.19,0.39);
INSERT INTO DATA_TAB VALUES (7,0.75,0.19,0.31);
INSERT INTO DATA_TAB VALUES (8,0.85,0.11,0.32);
INSERT INTO DATA_TAB VALUES (9,0.95,0.12,0.33);
-- output tables
DROP TABLE RESULTS_TAB;
CREATE COLUMN TABLE RESULTS_TAB ("ID" INT,"Ai" DOUBLE);
DROP TABLE FITTED_TAB;
CREATE COLUMN TABLE FITTED_TAB ("ID" INT,"Fitted" DOUBLE);
DROP TABLE SIGNIFICANCE_TAB;
CREATE COLUMN TABLE SIGNIFICANCE_TAB ("NAME" varchar(50),"VALUE" DOUBLE);
DROP TABLE MODEL_TAB;
CREATE COLUMN TABLE MODEL_TAB ("ID" INT, "PMMLMODEL" VARCHAR(5000));
CALL _SYS_AFL.palExpR(DATA_TAB, "#CONTROL_TAB", RESULTS_TAB, FITTED_TAB, SIGNIFICANCE_TAB, MODEL_TAB) with overview;
SELECT * FROM RESULTS_TAB;
SELECT * FROM FITTED_TAB;
SELECT * FROM SIGNIFICANCE_TAB;
SELECT * FROM MODEL_TAB;
FITTED_TAB:
SIGNIFICANCE_TAB:
PAL_PMMLMODEL_TAB:
FORECASTWITHEXPR
This function performs prediction with the exponential regression result.
Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'FORECASTWITHEXPR', <signature table>);
The signature table should contain the following records:
Index  Table Type Name                 Direction
1      <Data INPUT table type>         in
2      <Coefficient INPUT table type>  in
3      <PARAMETER table type>          in
4      <OUTPUT table type>             out
Procedure Calling
CALL <procedure name>(<data input table>, <coefficient input table>, <parameter table>, <output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.
Signature
Input Tables
Table            Column         Description
Predictive Data  1st column     ID
                 Other columns  Variable Xn
Coefficient      1st column     ID (start from 0)
                 2nd column     Value Ai
Example
Assume that:
DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and
USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.
SET SCHEMA DM_PAL;
DROP TYPE PREDICT_T;
CREATE TYPE PREDICT_T AS TABLE("ID" INT,"X1" DOUBLE, "X2" DOUBLE);
DROP TYPE COEFFICIENT_T;
CREATE TYPE COEFFICIENT_T AS TABLE("ID" INT,"Ai" DOUBLE);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE,"strArgs" VARCHAR(100));
DROP TYPE FITTED_T;
CREATE TYPE FITTED_T AS TABLE("ID" INT,"Fitted" DOUBLE);
-- create procedure
DROP table PDATA;
CREATE column table PDATA("ID" INT,"TYPENAME" VARCHAR(100),"DIRECTION" VARCHAR(100));
insert into PDATA values (1,'DM_PAL.PREDICT_T','in');
insert into PDATA values (2,'DM_PAL.COEFFICIENT_T','in');
insert into PDATA values (3,'DM_PAL.CONTROL_T','in');
insert into PDATA values (4,'DM_PAL.FITTED_T','out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('palForecastWithExpR','AFLPAL','FORECASTWITHEXPR',PDATA);
-- control parameters
DROP TABLE #CONTROL_TAB;
CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TAB ("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE,"strArgs" VARCHAR(100));
INSERT INTO #CONTROL_TAB VALUES ('THREAD_NUMBER',8,null,null);
-- data to be predicted
DROP TABLE PREDICTDATA_TAB;
CREATE COLUMN TABLE PREDICTDATA_TAB ("ID" INT,"X1" DOUBLE, "X2" DOUBLE);
INSERT INTO PREDICTDATA_TAB VALUES (0,0.5,0.3);
INSERT INTO PREDICTDATA_TAB VALUES (1,4,0.4);
INSERT INTO PREDICTDATA_TAB VALUES (2,0,1.6);
INSERT INTO PREDICTDATA_TAB VALUES (3,0.3,0.45);
INSERT INTO PREDICTDATA_TAB VALUES (4,0.4,1.7);
-- coefficients
DROP TABLE COEEFICIENT_TAB;
CREATE COLUMN TABLE COEEFICIENT_TAB ("ID" INT,"Ai" DOUBLE);
INSERT INTO COEEFICIENT_TAB VALUES (0,1.7120914258645001);
INSERT INTO COEEFICIENT_TAB VALUES (1,0.2652771198483208);
INSERT INTO COEEFICIENT_TAB VALUES (2,-3.471103742302148);
-- output table
DROP TABLE FITTED_TAB;
CREATE COLUMN TABLE FITTED_TAB ("ID" INT,"Fitted" DOUBLE);
CALL _SYS_AFL.palForecastWithExpR(PREDICTDATA_TAB, COEEFICIENT_TAB, "#CONTROL_TAB", FITTED_TAB) with overview;
SELECT * FROM FITTED_TAB;
3.2.6
KNN
K-Nearest Neighbor (KNN) is a machine learning algorithm for classifying objects based on learning by analogy, that is, comparing a given tuple with similar training tuples. The training tuples are described by n attributes, each tuple representing a point in an n-dimensional space. All the training tuples are stored in an n-dimensional pattern space. Once there is an unknown tuple, the KNN method searches the pattern space for the k training tuples that are closest to the unknown tuple. These k training tuples are the k nearest neighbors of the unknown tuple.
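The neighbor search and a simple majority vote can be sketched in Python as follows. This is an illustration only: the Euclidean distance is an assumption (this section does not state which distance PAL uses), the function name `knn_classify` is hypothetical, and the training points are the TYPE, X1, and X2 values from the KNN example in this section.

```python
import math
from collections import Counter

# (class type, (X1, X2)) taken from the DATA_TAB rows of the KNN example.
TRAINING = [
    (2, (1, 1)), (3, (10, 10)), (3, (10, 11)), (3, (10, 10)),
    (1, (1000, 1000)), (1, (1000, 1001)), (1, (1000, 999)),
    (1, (999, 999)), (1, (999, 1000)), (1, (1000, 1000)),
]


def knn_classify(training, point, k):
    """Return the majority class among the k training tuples closest to point."""
    # Assumption: closeness is measured by Euclidean distance.
    nearest = sorted(training, key=lambda item: math.dist(item[1], point))[:k]
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]


# Three of the unknown tuples from CLASSDATA_TAB, classified with k = 3.
for unknown in [(2, 1), (9, 10), (15000, 15000)]:
    print(unknown, knn_classify(TRAINING, unknown, 3))
```

For the tuple (2, 1), the three nearest training tuples are one of class 2 and two of class 3, so the majority vote yields class 3 even though the single closest tuple belongs to class 2. Distance-weighted voting would weight each neighbor by its closeness instead of counting every vote equally.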
Prerequisites
The first column of the training data and input data is an ID column.
The second column of the training data is of class type. The class type column is of integer type.
Other data columns are of integer or double type.
The input data does not contain null values.
KNN
This is a classification function using the KNN algorithm.
Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'KNN', <signature table>);
The signature table should contain the following records:
Index  Table Type Name              Direction
1      <Training INPUT table type>  in
2      <Class INPUT table type>     in
3      <PARAMETER table type>       in
4      <OUTPUT table type>          out
Procedure Calling
CALL <procedure name>(<training input table>, <class input table>, <parameter table>, <output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.
Signature
Input Tables
Table          Column         Column Data Type    Description
Training Data  1st column     Integer or varchar  ID
               2nd column     Integer             Class type
               Other columns  Integer or double   Attribute data
Class Data     1st column     Integer or varchar  ID
               Other columns  Integer or double   Attribute data
Parameter Table
Name                  Data Type  Description
K_NEAREST_NEIGHBOURS  Integer    Number of nearest neighbors (k)
ATTRIBUTE_NUM         Integer    Number of attributes
VOTING_TYPE           Integer    Voting type: 0 = majority voting, 1 = distance-weighted voting
THREAD_NUMBER         Integer    Number of threads
Output Table
Table   Column      Description
Result  1st column  ID
        2nd column  Class type
Example
Assume that:
DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and
USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.
SET SCHEMA DM_PAL;
DROP TYPE DATA_T;
CREATE TYPE DATA_T AS TABLE("ID" INT,"TYPE" INT,"X1" DOUBLE, "X2" DOUBLE);
DROP TYPE CLASSDATA_T;
CREATE TYPE CLASSDATA_T AS TABLE("ID" INT,"X1" DOUBLE, "X2" DOUBLE);
DROP TYPE RESULT_T;
CREATE TYPE RESULT_T AS TABLE("ID" INT,"Type" INT);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE,"strArgs" VARCHAR(100));
-- create procedure
DROP table PDATA;
CREATE column table PDATA("ID" INT,"TYPENAME" VARCHAR(100),"DIRECTION" VARCHAR(100));
insert into PDATA values (1,'DM_PAL.DATA_T','in');
insert into PDATA values (2,'DM_PAL.CLASSDATA_T','in');
insert into PDATA values (3,'DM_PAL.CONTROL_T','in');
insert into PDATA values (4,'DM_PAL.RESULT_T','out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('palKNN','AFLPAL','KNN',PDATA);
-- control parameters
DROP TABLE #CONTROL_TAB;
CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TAB ("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE,"strArgs" VARCHAR(100));
INSERT INTO #CONTROL_TAB VALUES ('K_NEAREST_NEIGHBOURS',3,null,null);
INSERT INTO #CONTROL_TAB VALUES ('ATTRIBUTE_NUM',2,null,null);
INSERT INTO #CONTROL_TAB VALUES ('VOTING_TYPE',0,null,null);
INSERT INTO #CONTROL_TAB VALUES ('THREAD_NUMBER',8,null,null);
-- training data
DROP TABLE DATA_TAB;
CREATE COLUMN TABLE DATA_TAB ("ID" INT,"TYPE" INT,"X1" DOUBLE, "X2" DOUBLE);
INSERT INTO DATA_TAB VALUES (0,2,1,1);
INSERT INTO DATA_TAB VALUES (1,3,10,10);
INSERT INTO DATA_TAB VALUES (2,3,10,11);
INSERT INTO DATA_TAB VALUES (3,3,10,10);
INSERT INTO DATA_TAB VALUES (4,1,1000,1000);
INSERT INTO DATA_TAB VALUES (5,1,1000,1001);
INSERT INTO DATA_TAB VALUES (6,1,1000,999);
INSERT INTO DATA_TAB VALUES (7,1,999,999);
INSERT INTO DATA_TAB VALUES (8,1,999,1000);
INSERT INTO DATA_TAB VALUES (9,1,1000,1000);
-- data to be classified
DROP TABLE CLASSDATA_TAB;
CREATE COLUMN TABLE CLASSDATA_TAB ("ID" INT,"X1" DOUBLE, "X2" DOUBLE);
INSERT INTO CLASSDATA_TAB VALUES (0,2,1);
INSERT INTO CLASSDATA_TAB VALUES (1,9,10);
INSERT INTO CLASSDATA_TAB VALUES (2,9,11);
INSERT INTO CLASSDATA_TAB VALUES (3,15000,15000);
INSERT INTO CLASSDATA_TAB VALUES (4,1000,1000);
INSERT INTO CLASSDATA_TAB VALUES (5,500,1001);
INSERT INTO CLASSDATA_TAB VALUES (6,500,999);
INSERT INTO CLASSDATA_TAB VALUES (7,199,999);
-- output table
DROP TABLE RESULTS_TAB;
CREATE COLUMN TABLE RESULTS_TAB ("ID" INT,"Type" INT);
CALL _SYS_AFL.palKNN(DATA_TAB, CLASSDATA_TAB, "#CONTROL_TAB", RESULTS_TAB) with overview;
SELECT * FROM RESULTS_TAB;
3.2.7
Multiple Linear Regression
Linear regression is an approach to modeling the linear relationship between a variable Y, usually referred to as the dependent variable, and one or more variables, usually referred to as independent variables, denoted X1, X2, X3, ... In linear regression, data are modeled using linear functions, and unknown model parameters are estimated from the data. Such models are called linear models. According to linear least-squares estimation, linear regression solves the following equation:
$$(A^T A)\,x = A^T y$$
Where A is an M x N matrix, x is an N x 1 matrix, and y is an M x 1 matrix.
The implementation also supports calculating F and R^2 to determine statistical significance.
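As an illustration only, the least-squares equation above can be sketched in plain Python. This is not PAL's implementation: the function names are hypothetical, the matrix A is assumed to carry a leading column of ones so that the first coefficient is the intercept A0, and R^2 and F use the standard textbook definitions, which the source does not spell out.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("A^T A is singular: columns of A are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_linear_regression(x_rows, y):
    """Return (coefficients, r_squared, f_statistic) from the normal equations."""
    # Assumption: prepend 1.0 to every row so coefficient 0 is the intercept.
    a = [[1.0] + [float(v) for v in row] for row in x_rows]
    n = len(a[0])
    ata = [[sum(r[i] * r[j] for r in a) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * yi for r, yi in zip(a, y)) for i in range(n)]
    coef = solve_linear_system(ata, aty)          # solves (A^T A) x = A^T y
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in a]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    p = n - 1                                     # number of independent variables
    dof = len(y) - p - 1                          # residual degrees of freedom
    if r2 < 1.0 and dof > 0:
        f_stat = (r2 / p) / ((1.0 - r2) / dof)
    else:
        f_stat = float("inf")
    return coef, r2, f_stat


# Hypothetical data scattered around y = 1 + 2*x1 + 3*x2.
x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
y = [1.1, 2.9, 4.2, 5.8, 8.1, 9.0]
coef, r2, f_stat = fit_linear_regression(x_rows, y)
print(coef, r2, f_stat)
```

Solving the normal equations directly keeps the sketch close to the formula; a production solver would more likely use a QR or Cholesky factorization for numerical stability.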
Prerequisites
LRREGRESSION
This is a multiple linear regression function.

Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>','AFLPAL','LRREGRESSION', <signature table>);
The signature table should contain the following records:

Index  Table Type Name                    Direction
1      <INPUT table type>                 in
2      <PARAMETER table type>             in
3      <Result OUTPUT table type>         out
4      <Fitted OUTPUT table type>         out
5      <Significance OUTPUT table type>   out
6      <PMML OUTPUT table type>           out

Procedure Calling
CALL <procedure name>(<input table>, <parameter table>, <result output table>, <fitted output table>, <significance output table>, <PMML output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.

Signature

Input Table
Table  Column         Description
Data   1st column     ID
       2nd column     Variable y
       Other columns  Variable Xn

Parameter Table
Name           Data Type  Description
VARIABLE_NUM   Integer    Number of variable X.
THREAD_NUMBER  Integer    Number of threads.
PMML_EXPORT    Integer    0 (default): does not export multiple linear regression model in PMML.
                          1: exports multiple linear regression model in PMML in single row.
                          2: exports multiple linear regression model in PMML in several rows, each row containing a maximum of 5000 characters.

Output Tables
Table         Column      Description                                      Constraint
Result        1st column  ID
              2nd column  Value Ai                                         A0: the intercept
                                                                           A1: the beta coefficient for X1
                                                                           A2: the beta coefficient for X2
                                                                           ...
Fitted Data   1st column  ID
              2nd column  Value Yi
Significance  1st column  Name (R^2 / F)
              2nd column  Value
PMML Result   1st column  ID
              2nd column  Multiple linear regression model in PMML format
Example
Assume that DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator, and that USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.
SET SCHEMA DM_PAL;
DROP TYPE DATA_T;
CREATE TYPE DATA_T AS TABLE( "ID" INT,"Y" DOUBLE,"X1" DOUBLE, "X2" DOUBLE);
DROP TYPE RESULT_T;
CREATE TYPE RESULT_T AS TABLE("ID" INT,"Ai" DOUBLE);
DROP TYPE FITTED_T;
CREATE TYPE FITTED_T AS TABLE("ID" INT,"Fitted" DOUBLE);
DROP TYPE SIGNIFICANCE_T;
CREATE TYPE SIGNIFICANCE_T AS TABLE("NAME" varchar(50),"VALUE" DOUBLE);
DROP TYPE MODEL_T;
CREATE TYPE MODEL_T AS TABLE("ID" INT,"Model" varchar(5000));
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE,"strArgs" VARCHAR(100));
DROP table PDATA;
CREATE column table PDATA("ID" INT,"TYPENAME" VARCHAR(100),"DIRECTION" VARCHAR(100));
insert into PDATA values (1,'DM_PAL.DATA_T','in');
insert into PDATA values (2,'DM_PAL.CONTROL_T','in');
insert into PDATA values (3,'DM_PAL.RESULT_T','out');
insert into PDATA values (4,'DM_PAL.FITTED_T','out');
insert into PDATA values (5,'DM_PAL.SIGNIFICANCE_T','out');
insert into PDATA values (6,'DM_PAL.MODEL_T','out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('palLR','AFLPAL','LRREGRESSION',PDATA);
DROP TABLE #CONTROL_TAB;
CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TAB ("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE,"strArgs" VARCHAR(100));
INSERT INTO #CONTROL_TAB VALUES ('THREAD_NUMBER',8,null,null);
INSERT INTO #CONTROL_TAB VALUES ('PMML_EXPORT',0,null,null);
DROP TABLE DATA_TAB;
CREATE COLUMN TABLE DATA_TAB ( "ID" INT,"Y" DOUBLE,"X1" DOUBLE, "X2" DOUBLE);
INSERT INTO DATA_TAB VALUES (0,0.5,0.13,0.33);
INSERT INTO DATA_TAB VALUES (1,0.15,0.14,0.34);
INSERT INTO DATA_TAB VALUES (2,0.25,0.15,0.36);
INSERT INTO DATA_TAB VALUES (3,0.35,0.16,0.35);
INSERT INTO DATA_TAB VALUES (4,0.45,0.17,0.37);
INSERT INTO DATA_TAB VALUES (5,0.55,0.18,0.38);
INSERT INTO DATA_TAB VALUES (6,0.65,0.19,0.39);
INSERT INTO DATA_TAB VALUES (7,0.75,0.19,0.31);
INSERT INTO DATA_TAB VALUES (8,0.85,0.11,0.32);
INSERT INTO DATA_TAB VALUES (9,0.95,0.12,0.33);
DROP TABLE RESULTS_TAB;
CREATE COLUMN TABLE RESULTS_TAB ("ID" INT,"Ai" DOUBLE);
DROP TABLE FITTED_TAB;
CREATE COLUMN TABLE FITTED_TAB ("ID" INT,"Fitted" DOUBLE);
DROP TABLE SIGNIFICANCE_TAB;
CREATE COLUMN TABLE SIGNIFICANCE_TAB ("NAME" varchar(50),"VALUE" DOUBLE);
DROP TABLE PAL_PMMLMODEL_TAB;
CREATE COLUMN TABLE PAL_PMMLMODEL_TAB ("ID" INT, "PMMLMODEL" VARCHAR(5000));
CALL _SYS_AFL.palLR(DATA_TAB, "#CONTROL_TAB", RESULTS_TAB, FITTED_TAB, SIGNIFICANCE_TAB, PAL_PMMLMODEL_TAB) with overview;
SELECT * FROM RESULTS_TAB;
SELECT * FROM FITTED_TAB;
SELECT * FROM SIGNIFICANCE_TAB;
Expected Result: the output is returned in RESULTS_TAB, FITTED_TAB, SIGNIFICANCE_TAB, and PAL_PMMLMODEL_TAB (result figures not reproduced).
FORECASTWITHLR
This function performs prediction with the linear regression result.

Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>','AFLPAL','FORECASTWITHLR', <signature table>);
The signature table should contain the following records:

Index  Table Type Name                  Direction
1      <Data INPUT table type>          in
2      <Coefficient INPUT table type>   in
3      <PARAMETER table type>           in
4      <OUTPUT table type>              out

Procedure Calling
CALL <procedure name>(<data input table>, <coefficient input table>, <parameter table>, <output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.

Signature

Input Tables
Table            Column         Column Data Type    Description
Predictive Data  1st column     Integer or Varchar  ID
                 Other columns  Integer or double   Variable Xn
Coefficient      1st column     Integer             ID (start from 0)
                 2nd column     Integer or double   Value Ai

Parameter Table
Name           Description
VARIABLE_NUM   Number of variable X
THREAD_NUMBER  Number of threads

Output Table
Table          Column
Fitted Result  1st column
               2nd column
Example
Assume that DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator, and that USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.
SET SCHEMA DM_PAL;
DROP TYPE PREDICT_T;
CREATE TYPE PREDICT_T AS TABLE( "ID" INT,"X1" DOUBLE, "X2" DOUBLE);
DROP TYPE COEFFICIENT_T;
CREATE TYPE COEFFICIENT_T AS TABLE("ID" INT,"Ai" DOUBLE);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE,"strArgs" VARCHAR(100));
DROP TYPE FITTED_T;
CREATE TYPE FITTED_T AS TABLE("ID" INT,"Fitted" DOUBLE);
DROP table PDATA;
CREATE column table PDATA("ID" INT,"TYPENAME" VARCHAR(100),"DIRECTION" VARCHAR(100));
insert into PDATA values (1,'DM_PAL.PREDICT_T','in');
insert into PDATA values (2,'DM_PAL.COEFFICIENT_T','in');
insert into PDATA values (3,'DM_PAL.CONTROL_T','in');
insert into PDATA values (4,'DM_PAL.FITTED_T','out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('palForecastWithLR','AFLPAL','FORECASTWITHLR',PDATA);
DROP TABLE #CONTROL_TAB;
CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TAB ("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE,"strArgs" VARCHAR(100));
INSERT INTO #CONTROL_TAB VALUES ('THREAD_NUMBER',2,null,null);
DROP TABLE PREDICTDATA_TAB;
CREATE COLUMN TABLE PREDICTDATA_TAB ("ID" INT,"X1" DOUBLE, "X2" DOUBLE);
INSERT INTO PREDICTDATA_TAB VALUES (0,0.5,0.3);
INSERT INTO PREDICTDATA_TAB VALUES (1,4,0.4);
INSERT INTO PREDICTDATA_TAB VALUES (2,0,1.6);
INSERT INTO PREDICTDATA_TAB VALUES (3,0.3,0.45);
INSERT INTO PREDICTDATA_TAB VALUES (4,0.4,1.7);
DROP TABLE COEEFICIENT_TAB;
CREATE COLUMN TABLE COEEFICIENT_TAB ("ID" INT,"Ai" DOUBLE);
INSERT INTO COEEFICIENT_TAB VALUES (0,1.7120914258645001);
INSERT INTO COEEFICIENT_TAB VALUES (1,0.2652771198483208);
INSERT INTO COEEFICIENT_TAB VALUES (2,-3.471103742302148);
DROP TABLE FITTED_TAB;
CREATE COLUMN TABLE FITTED_TAB ("ID" INT,"Fitted" DOUBLE);
CALL _SYS_AFL.palForecastWithLR(PREDICTDATA_TAB, COEEFICIENT_TAB, "#CONTROL_TAB", FITTED_TAB) with overview;
SELECT * FROM FITTED_TAB;
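The arithmetic behind this forecast can be sketched in Python, assuming that the fitted value of each row is A0 + A1*X1 + A2*X2, with coefficient ID 0 as the intercept as listed in the coefficient table. The function name forecast_with_lr is hypothetical; the values are copied from the example tables, and this is not the PAL procedure itself.

```python
def forecast_with_lr(coefficients, rows):
    """Return (id, fitted) pairs, with fitted = A0 + sum(Ai * Xi)."""
    intercept, betas = coefficients[0], coefficients[1:]
    fitted = []
    for row_id, *xs in rows:
        if len(xs) != len(betas):
            raise ValueError("each row needs one value per non-intercept coefficient")
        fitted.append((row_id, intercept + sum(b * x for b, x in zip(betas, xs))))
    return fitted


# Values copied from COEEFICIENT_TAB and PREDICTDATA_TAB in the example.
coefficients = [1.7120914258645001, 0.2652771198483208, -3.471103742302148]
predict_rows = [(0, 0.5, 0.3), (1, 4, 0.4), (2, 0, 1.6), (3, 0.3, 0.45), (4, 0.4, 1.7)]
for row_id, value in forecast_with_lr(coefficients, predict_rows):
    print(row_id, value)
```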
3.2.8 Polynomial Regression
Polynomial regression is an approach to modeling the relationship between a scalar variable y and one or more variables denoted X. In polynomial regression, data are modeled using polynomial functions, and unknown model parameters are estimated from the data. Such models are called polynomial models. In PAL, the implementation of polynomial regression transforms the problem into a linear regression and solves it:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_n x^n$$

where $\beta_0, \ldots, \beta_n$ are the parameters that need to be calculated.

Let $x_1 = x,\; x_2 = x^2,\; \ldots,\; x_n = x^n$; then

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$$

So the relationship between y and $x_1, \ldots, x_n$ is linear and can be solved using the linear regression method.
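The substitution can be illustrated with a short Python sketch. The helper names are hypothetical and the coefficient symbols are written as beta by assumption; the point is only that the expanded powers of x behave like ordinary regression variables, so the linear form and the polynomial give the same value.

```python
def polynomial_features(x, degree):
    """Map one x to the linear-regression variables x1 = x, x2 = x^2, ..., xn = x^n."""
    return [x ** k for k in range(1, degree + 1)]


def linear_form(betas, features):
    """Evaluate beta0 + beta1*x1 + ... + betan*xn on already expanded features."""
    return betas[0] + sum(b * f for b, f in zip(betas[1:], features))


def polynomial_value(betas, x):
    """Evaluate beta0 + beta1*x + ... + betan*x^n directly (Horner's scheme)."""
    result = 0.0
    for b in reversed(betas):
        result = result * x + b
    return result


betas = [4.0, 3.0, 2.0, 1.0]          # hypothetical coefficients, beta0 first
x = 2.0
features = polynomial_features(x, degree=3)
print(features)                        # the three regression variables for x
print(linear_form(betas, features), polynomial_value(betas, x))
```

Both printed values agree, which is why a linear regression solver can be reused once each input x has been expanded into its powers.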
Prerequisites
POLYNOMIALREGRESSION
This is a polynomial regression function.

Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>','AFLPAL', 'POLYNOMIALREGRESSION', <signature table>);
The signature table should contain the following records:

Index  Table Type Name                    Direction
1      <INPUT table type>                 in
2      <PARAMETER table type>             in
3      <Result OUTPUT table type>         out
4      <Fitted OUTPUT table type>         out
5      <Significance OUTPUT table type>   out
6      <PMML OUTPUT table type>           out

Procedure Calling
CALL <procedure name> (<input table>, <parameter table>, <result output table>, <fitted output table>, <significance output table>, <PMML output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.

Signature

Input Table
Table  Column      Description
Data   1st column  ID
       2nd column  Variable y
       3rd column  Variable X
Parameter Table
Name           Data Type  Description
THREAD_NUMBER  Integer    Number of threads.
PMML_EXPORT    Integer    0 (default): does not export polynomial regression model in PMML.
                          1: exports polynomial regression model in PMML in single row.
                          2: exports polynomial regression model in PMML in several rows, each row containing a maximum of 5000 characters.

Output Tables
Table   Column      Description  Constraint
Result  1st column  ID
        2nd column  Value Ai     A0: the intercept
                                 A1: the beta coefficient for X1
                                 A2: the beta coefficient for X2
                                 ...

Table         Column      Column Data Type    Description
Fitted Data   1st column  Integer or varchar  ID
              2nd column  Integer or double   Value Yi
Significance  1st column  VARCHAR/CHAR        Name (R^2 / F)
              2nd column  Double              Value
PMML Result   1st column  Integer             ID
              2nd column  CLOB or varchar     Polynomial regression model in PMML format
Example
Assume that DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator, and that USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.
SET SCHEMA DM_PAL;
DROP TYPE DATA_T;
CREATE TYPE DATA_T AS TABLE( "ID" INT,"Y" DOUBLE,"X1" DOUBLE);
DROP TYPE RESULT_T;
CREATE TYPE RESULT_T AS TABLE("ID" INT,"Ai" DOUBLE);
DROP TYPE FITTED_T;
CREATE TYPE FITTED_T AS TABLE("ID" INT,"Fitted" DOUBLE);
DROP TYPE SIGNIFICANCE_T;
CREATE TYPE SIGNIFICANCE_T AS TABLE("NAME" varchar(50),"VALUE" DOUBLE);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE,"strArgs" VARCHAR(100));
DROP TYPE MODEL_T;
CREATE TYPE MODEL_T AS TABLE("ID" INT,"Model" varchar(5000));
DROP table PDATA;
CREATE column table PDATA("ID" INT,"TYPENAME" VARCHAR(100),"DIRECTION" VARCHAR(100));
insert into PDATA values (1,'DM_PAL.DATA_T','in');
insert into PDATA values (2,'DM_PAL.CONTROL_T','in');
insert into PDATA values (3,'DM_PAL.RESULT_T','out');
insert into PDATA values (4,'DM_PAL.FITTED_T','out');
insert into PDATA values (5,'DM_PAL.SIGNIFICANCE_T','out');
insert into PDATA values (6,'DM_PAL.MODEL_T','out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('palPolynomialR','AFLPAL','POLYNOMIALREGRESSION',PDATA);
DROP TABLE #CONTROL_TAB;
CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TAB ("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE,"strArgs" VARCHAR(100));
INSERT INTO #CONTROL_TAB VALUES ('VARIABLE_NUM',3,null,null);
INSERT INTO #CONTROL_TAB VALUES ('THREAD_NUMBER',8,null,null);
INSERT INTO #CONTROL_TAB VALUES ('PMML_EXPORT',2,null,null);
DROP TABLE DATA_TAB;
CREATE COLUMN TABLE DATA_TAB ( "ID" INT,"Y" DOUBLE,"X1" DOUBLE);
INSERT INTO DATA_TAB VALUES (0,5,1);
INSERT INTO DATA_TAB VALUES (1,20,2);
INSERT INTO DATA_TAB VALUES (2,43,3);
INSERT INTO DATA_TAB VALUES (3,89,4);
INSERT INTO DATA_TAB VALUES (4,166,5);
INSERT INTO DATA_TAB VALUES (5,247,6);
INSERT INTO DATA_TAB VALUES (6,403,7);
DROP TABLE RESULTS_TAB;
CREATE COLUMN TABLE RESULTS_TAB ("ID" INT,"Ai" DOUBLE);
DROP TABLE FITTED_TAB;
CREATE COLUMN TABLE FITTED_TAB ("ID" INT,"Fitted" DOUBLE);
DROP TABLE SIGNIFICANCE_TAB;
CREATE COLUMN TABLE SIGNIFICANCE_TAB ("NAME" varchar(50),"VALUE" DOUBLE);
DROP TABLE MODEL_TAB;
CREATE COLUMN TABLE MODEL_TAB ("ID" INT, "PMMLMODEL" VARCHAR(5000));
CALL _SYS_AFL.palPolynomialR(DATA_TAB, "#CONTROL_TAB", RESULTS_TAB, FITTED_TAB, SIGNIFICANCE_TAB, MODEL_TAB) with overview;
SELECT * FROM RESULTS_TAB;
SELECT * FROM FITTED_TAB;
SELECT * FROM SIGNIFICANCE_TAB;
SELECT * FROM MODEL_TAB;
Expected Result: the output is returned in RESULTS_TAB, FITTED_TAB, SIGNIFICANCE_TAB, and MODEL_TAB (result figures not reproduced).
FORECASTWITHPOLYNOMIALR
This function performs prediction with the polynomial regression result.

Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>','AFLPAL', 'FORECASTWITHPOLYNOMIALR', <signature table>);
The signature table should contain the following records:

Index  Table Type Name                  Direction
1      <Data INPUT table type>          in
2      <Coefficient INPUT table type>   in
3      <PARAMETER table type>           in
4      <OUTPUT table type>              out

Procedure Calling
CALL <procedure name>(<data input table>, <coefficient input table>, <parameter table>, <output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.

Signature

Input Tables
Table            Column
Predictive Data  1st column
                 2nd column
Coefficient      1st column
                 2nd column
Example
Assume that DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator, and that USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.
SET SCHEMA DM_PAL;
DROP TYPE PREDICT_T;
CREATE TYPE PREDICT_T AS TABLE( "ID" INT,"X1" DOUBLE);
DROP TYPE COEFFICIENT_T;
CREATE TYPE COEFFICIENT_T AS TABLE("ID" INT,"Ai" DOUBLE);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE,"strArgs" VARCHAR(100));
DROP TYPE FITTED_T;
CREATE TYPE FITTED_T AS TABLE("ID" INT,"Fitted" DOUBLE);
DROP table PDATA;
CREATE column table PDATA("ID" INT,"TYPENAME" VARCHAR(100),"DIRECTION" VARCHAR(100));
insert into PDATA values (1,'DM_PAL.PREDICT_T','in');
insert into PDATA values (2,'DM_PAL.COEFFICIENT_T','in');
insert into PDATA values (3,'DM_PAL.CONTROL_T','in');
insert into PDATA values (4,'DM_PAL.FITTED_T','out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('palForecastWithPolynomialR','AFLPAL','FORECASTWITHPOLYNOMIALR',PDATA);
DROP TABLE #CONTROL_TAB;
CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TAB ("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE,"strArgs" VARCHAR(100));
INSERT INTO #CONTROL_TAB VALUES ('VARIABLE_NUM',3,null,null);
INSERT INTO #CONTROL_TAB VALUES ('THREAD_NUMBER',8,null,null);
DROP TABLE PREDICTDATA_TAB;
CREATE COLUMN TABLE PREDICTDATA_TAB ( "ID" INT,"X1" DOUBLE);
INSERT INTO PREDICTDATA_TAB VALUES (0,0.3);
INSERT INTO PREDICTDATA_TAB VALUES (1,4.0);
INSERT INTO PREDICTDATA_TAB VALUES (2,1.6);
INSERT INTO PREDICTDATA_TAB VALUES (3,0.45);
INSERT INTO PREDICTDATA_TAB VALUES (4,1.7);
DROP TABLE COEEFICIENT_TAB;
CREATE COLUMN TABLE COEEFICIENT_TAB ("ID" INT,"Ai" DOUBLE);
INSERT INTO COEEFICIENT_TAB VALUES (0,4.0);
INSERT INTO COEEFICIENT_TAB VALUES (1,3.0);
INSERT INTO COEEFICIENT_TAB VALUES (2,2.0);
INSERT INTO COEEFICIENT_TAB VALUES (3,1.0);
DROP TABLE FITTED_TAB;
CREATE COLUMN TABLE FITTED_TAB ("ID" INT,"Fitted" DOUBLE);
CALL _SYS_AFL.palForecastWithPolynomialR(PREDICTDATA_TAB, COEEFICIENT_TAB, "#CONTROL_TAB", FITTED_TAB) with overview;
SELECT * FROM FITTED_TAB;
3.2.9 Logistic Regression
Logistic regression is a prediction approach similar to Ordinary Least Squares (OLS) regression, but logistic regression can be used to predict a dichotomous outcome. Logistic regression allows you to predict a discrete outcome, such as group membership, from a set of variables that are continuous, discrete, dichotomous, or a mix of any of these. Generally, the dependent or response variable is dichotomous, such as presence/absence or success/failure. Discriminant analysis is also used to predict group membership with only two groups, but only from continuous independent variables. Thus, when independent variables are categorical, or a mix of continuous and categorical, logistic regression is preferred. A simple logistic regression function can be defined by the formula:
$$p(t) = \frac{1}{1 + e^{-t}}$$

In PAL, the logistic regression model is:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + \exp(-\theta^T x)}$$

where $\theta^T x = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n$.

Assuming that there are only two class labels, {0,1}, you get the following formulas:

$$P(y = 1 \mid x; \theta) = h_\theta(x)$$
$$P(y = 0 \mid x; \theta) = 1 - h_\theta(x)$$

These can be merged into:

$$P(y \mid x; \theta) = h_\theta(x)^{y}\,(1 - h_\theta(x))^{1-y}$$

where $\theta_0, \theta_1, \ldots, \theta_n$ are regression coefficients whose values can be obtained through the Maximum Likelihood Estimation (MLE) method. The log-likelihood function is:

$$
\begin{aligned}
\ell(\theta) &= \ln L(\theta) = \ln \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) \\
&= \ln \prod_{i=1}^{m} \left(h_\theta(x^{(i)})\right)^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}} \\
&= \sum_{i=1}^{m} \ln\!\left( \left(h_\theta(x^{(i)})\right)^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}} \right) \\
&= \sum_{i=1}^{m} \left( y^{(i)} \ln h_\theta(x^{(i)}) + (1 - y^{(i)}) \ln\!\left(1 - h_\theta(x^{(i)})\right) \right)
\end{aligned}
$$

The maximum value of this function can be obtained through Newton iteration or stochastic gradient ascent. You choose a threshold and then iterate the update formula until the difference between the new value and the old value is smaller than this threshold. The formula is:

$$\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$$

where $\alpha$ is the step size.
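The update rule and the log-likelihood can be sketched in plain Python as follows. This is a minimal illustration and not PAL's implementation: the function names are hypothetical, each input row is assumed to start with x0 = 1 for the intercept, the stopping test compares the log-likelihood between passes (the text does not say which value is compared), and the 0.5 cutoff used for the class label is likewise an assumption.

```python
import math


def sigmoid(t):
    """Numerically stable 1 / (1 + e^-t)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x)."""
    return sigmoid(sum(tj * xj for tj, xj in zip(theta, x)))


def log_likelihood(theta, xs, ys):
    """Sum of y*ln(h) + (1-y)*ln(1-h) over all records."""
    eps = 1e-12                      # keeps log() away from 0
    total = 0.0
    for x, y in zip(xs, ys):
        h = min(max(hypothesis(theta, x), eps), 1.0 - eps)
        total += y * math.log(h) + (1 - y) * math.log(1.0 - h)
    return total


def fit_gradient_ascent(xs, ys, step_size=0.1, exit_threshold=1e-6, max_iteration=1000):
    """Apply theta_j := theta_j + step_size * (y - h) * x_j record by record."""
    theta = [0.0] * len(xs[0])
    previous = log_likelihood(theta, xs, ys)
    for _ in range(max_iteration):
        for x, y in zip(xs, ys):
            error = y - hypothesis(theta, x)
            theta = [tj + step_size * error * xj for tj, xj in zip(theta, x)]
        current = log_likelihood(theta, xs, ys)
        if abs(current - previous) < exit_threshold:   # assumed stopping rule
            break
        previous = current
    return theta


def predict(theta, x, cutoff=0.5):
    """Return (probability of class 1, class label); the cutoff is an assumption."""
    p = hypothesis(theta, x)
    return p, (1 if p >= cutoff else 0)


# Hypothetical data; each row is [x0 = 1, x1].
xs = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0], [1.0, 2.5], [1.0, 3.0]]
ys = [0, 0, 1, 0, 1, 1]
theta = fit_gradient_ascent(xs, ys)
print(theta, log_likelihood(theta, xs, ys))
print(predict(theta, [1.0, 3.0]))
```

The keyword arguments mirror the roles of the STEP_SIZE, EXIT_THRESHOLD, and MAX_ITERATION parameters, but their default values here are chosen only for this toy data set.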
Prerequisites
- No missing or null data in inputs.
- Data is numeric, not categorical.
- Given the structure as Y and X1...Xn, there must be more than n+1 records available for analysis.
LOGISTICREGRESSION
This is a logistic regression function.

Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>','AFLPAL', 'LOGISTICREGRESSION', <signature table>);
The signature table should contain the following records:

Index  Table Type Name               Direction
1      <INPUT table type>            in
2      <PARAMETER table type>        in
3      <Result OUTPUT table type>    out
4      <PMML OUTPUT table type>      out

Procedure Calling
CALL <procedure name>(<input table>, <parameter table>, <result output table>, <PMML output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.

Signature

Input Table
Table  Column       Column Data Type   Description    Constraint
Data   Columns      Integer or double  Variable Xn
       Type column  Integer            Variable TYPE  Only 0 and 1 are supported

Parameter Table
Name            Data Type  Description
VARIABLE_NUM    Integer    Number of variable X.
METHOD          Integer    0 (recommended): uses the Newton iteration method.
                           1: uses the gradient-descent method.
STEP_SIZE       Double     Step size for convergence. This parameter is used only when METHOD is 1.
EXIT_THRESHOLD  Double     Threshold (actual value) for exiting the iterations.
THREAD_NUMBER   Integer    Number of threads.
                           Note: It is recommended to set this parameter to a value equal to or greater than 4.
Name           Description
MAX_ITERATION  Maximum number of iterations.
PMML_EXPORT    0 (default): does not export logistic regression model in PMML.
               1: exports logistic regression model in PMML in single row.
               2: exports logistic regression model in PMML in several rows, each row containing a maximum of 5000 characters.

Output Tables
Table                                     Column      Description
Result                                    1st column  ID
                                          2nd column  Value Ai
                                                      A0: intercept
                                                      A1: beta coefficient for X1
                                                      A2: beta coefficient for X2
PMML Result (logistic regression model)   1st column  ID
                                          2nd column  Logistic regression model in PMML format
Example
Assume that DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator, and that USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.
SET SCHEMA DM_PAL;
DROP TYPE DATA_T;
CREATE TYPE DATA_T AS TABLE("X1" DOUBLE,"X2" DOUBLE,"TYPE" INT);
DROP TYPE RESULT_T;
CREATE TYPE RESULT_T AS TABLE("ID" INT,"Ai" DOUBLE);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE( "Name" VARCHAR (50),"intArgs" INTEGER,"doubleArgs" DOUBLE,"stringArgs" VARCHAR (100));
DROP TYPE PAL_PMMLMODEL_T;
CREATE TYPE PAL_PMMLMODEL_T AS TABLE( "ID" INT, "PMMLMODEL" VARCHAR(5000) );
DROP table PDATA;
CREATE column table PDATA("ID" INT,"TYPENAME" VARCHAR(100),"DIRECTION" VARCHAR(100));
insert into PDATA values (1,'DM_PAL.DATA_T','in');
insert into PDATA values (2,'DM_PAL.CONTROL_T','in');
insert into PDATA values (3,'DM_PAL.RESULT_T','out');
insert into PDATA values (4,'DM_PAL.PAL_PMMLMODEL_T','out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('palLogisticR','AFLPAL','LOGISTICREGRESSION',PDATA);
DROP TABLE DATA_TAB;
CREATE COLUMN TABLE DATA_TAB ("X1" DOUBLE,"X2" DOUBLE,"TYPE" INT);
INSERT INTO DATA_TAB VALUES (110,2.62,1);
INSERT INTO DATA_TAB VALUES (110,2.875,1);
INSERT INTO DATA_TAB VALUES (93,2.32,1);
INSERT INTO DATA_TAB VALUES (110,3.215,0);
INSERT INTO DATA_TAB VALUES (175,3.44,0);
INSERT INTO DATA_TAB VALUES (105,3.46,0);
INSERT INTO DATA_TAB VALUES (245,3.57,0);
INSERT INTO DATA_TAB VALUES (62,3.19,0);
INSERT INTO DATA_TAB VALUES (95,3.15,0);
INSERT INTO DATA_TAB VALUES (123,3.44,0);
INSERT INTO DATA_TAB VALUES (123,3.44,0);
INSERT INTO DATA_TAB VALUES (180,4.07,0);
INSERT INTO DATA_TAB VALUES (180,3.73,0);
INSERT INTO DATA_TAB VALUES (180,3.78,0);
INSERT INTO DATA_TAB VALUES (205,5.25,0);
INSERT INTO DATA_TAB VALUES (215,5.424,0);
INSERT INTO DATA_TAB VALUES (230,5.345,0);
INSERT INTO DATA_TAB VALUES (66,2.2,1);
INSERT INTO DATA_TAB VALUES (52,1.615,1);
INSERT INTO DATA_TAB VALUES (65,1.835,1);
INSERT INTO DATA_TAB VALUES (97,2.465,0);
INSERT INTO DATA_TAB VALUES (150,3.52,0);
INSERT INTO DATA_TAB VALUES (150,3.435,0);
INSERT INTO DATA_TAB VALUES (245,3.84,0);
INSERT INTO DATA_TAB VALUES (175,3.845,0);
INSERT INTO DATA_TAB VALUES (66,1.935,1);
INSERT INTO DATA_TAB VALUES (91,2.14,1);
INSERT INTO DATA_TAB VALUES (113,1.513,1);
INSERT INTO DATA_TAB VALUES (264,3.17,1);
INSERT INTO DATA_TAB VALUES (175,2.77,1);
INSERT INTO DATA_TAB VALUES (335,3.57,1);
INSERT INTO DATA_TAB VALUES (109,2.78,1);
DROP TABLE #CONTROL_TAB;
CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TAB ( "Name" VARCHAR (50), "intArgs" INTEGER, "doubleArgs" DOUBLE, "stringArgs" VARCHAR (100));
INSERT INTO #CONTROL_TAB VALUES ('VARIABLE_NUM',2,null,null);
INSERT INTO #CONTROL_TAB VALUES ('EXIT_THRESHOLD',null,0.00001,null);
INSERT INTO #CONTROL_TAB VALUES ('THREAD_NUMBER',8,null,null);
INSERT INTO #CONTROL_TAB VALUES ('MAX_ITERATION',80,null,null);
INSERT INTO #CONTROL_TAB VALUES ('PMML_EXPORT', 1, null, null);
INSERT INTO #CONTROL_TAB VALUES ('METHOD', 0, null, null);
DROP TABLE RESULTS_TAB;
CREATE COLUMN TABLE RESULTS_TAB ("ID" INT,"Ai" DOUBLE);
DROP TABLE PAL_PMMLMODEL_TAB;
CREATE COLUMN TABLE PAL_PMMLMODEL_TAB( "ID" INT, "PMMLMODEL" VARCHAR(5000) );
CALL _SYS_AFL.palLogisticR(DATA_TAB, "#CONTROL_TAB", RESULTS_TAB, PAL_PMMLMODEL_TAB) with overview;
SELECT * FROM RESULTS_TAB;
SELECT * FROM PAL_PMMLMODEL_TAB;
Expected Result: the output is returned in RESULTS_TAB and PAL_PMMLMODEL_TAB (result figures not reproduced).
FORECASTWITHLOGISTICR
This function performs prediction with the logistic regression result.

Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>','AFLPAL', 'FORECASTWITHLOGISTICR', <signature table>);
The signature table should contain the following records:

Index  Table Type Name                  Direction
1      <Data INPUT table type>          in
2      <PARAMETER table type>           in
3      <Coefficient INPUT table type>   in
4      <OUTPUT table type>              out

Procedure Calling
CALL <procedure name>(<data input table>, <parameter table>, <coefficient input table>, <output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.

Signature

Input Table
Table            Column         Column Data Type   Description
Predictive Data  1st column     Integer            ID
                 Other columns  Integer or double  Variable Xn
Coefficient      1st column     Integer            ID
                 2nd column     Integer or double  Value Ai

Parameter Table
Name           Data Type  Description
VARIABLE_NUM   Integer    Number of variable X
THREAD_NUMBER  Integer    Number of threads
Output Table
Table          Column      Column Data Type   Description
Fitted Result  1st column  Integer            ID
               2nd column  Integer or double  Value Yi
               3rd column  Integer            TYPE

Example
Assume that DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator, and that USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.
set schema DM_PAL;
DROP TYPE PREDICT_T;
CREATE TYPE PREDICT_T AS TABLE("ID" INT,"X1" DOUBLE,"X2" DOUBLE);
DROP TYPE COEFFICIENT_T;
CREATE TYPE COEFFICIENT_T AS TABLE("ID" INT,"Ai" DOUBLE);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE,"strArgs" VARCHAR(100));
DROP TYPE FITTED_T;
CREATE TYPE FITTED_T AS TABLE("ID" INT,"Fitted" DOUBLE,"TYPE" INT);
DROP table PDATA;
CREATE column table PDATA("ID" INT,"TYPENAME" VARCHAR(100),"DIRECTION" VARCHAR(100));
insert into PDATA values (1,'DM_PAL.PREDICT_T','in');
insert into PDATA values (2,'DM_PAL.CONTROL_T','in');
insert into PDATA values (3,'DM_PAL.COEFFICIENT_T','in');
insert into PDATA values (4,'DM_PAL.FITTED_T','out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('palForecastWithLogisticR','AFLPAL','FORECASTWITHLOGISTICR',PDATA);
DROP TABLE PREDICTDATA_TAB;
CREATE COLUMN TABLE PREDICTDATA_TAB ( "ID" INT,"X1" DOUBLE, "X2" DOUBLE);
INSERT INTO PREDICTDATA_TAB VALUES (0,120,2.8);
INSERT INTO PREDICTDATA_TAB VALUES (1,110,2.875);
INSERT INTO PREDICTDATA_TAB VALUES (2,93,2.32);
INSERT INTO PREDICTDATA_TAB VALUES (3,110,3.215);
INSERT INTO PREDICTDATA_TAB VALUES (4,175,3.44);
INSERT INTO PREDICTDATA_TAB VALUES (5,105,3.46);
INSERT INTO PREDICTDATA_TAB VALUES (6,245,3.57);
INSERT INTO PREDICTDATA_TAB VALUES (7,62,3.19);
INSERT INTO PREDICTDATA_TAB VALUES (8,95,3.15);
INSERT INTO PREDICTDATA_TAB VALUES (9,123,3.44);
INSERT INTO PREDICTDATA_TAB VALUES (10,123,3.44);
INSERT INTO PREDICTDATA_TAB VALUES (11,180,4.07);
INSERT INTO PREDICTDATA_TAB VALUES (12,180,3.73);
INSERT INTO PREDICTDATA_TAB VALUES (13,180,3.78);
INSERT INTO PREDICTDATA_TAB VALUES (14,205,5.25);
INSERT INTO PREDICTDATA_TAB VALUES (15,215,5.424);
INSERT INTO PREDICTDATA_TAB VALUES (16,230,5.345);
INSERT INTO PREDICTDATA_TAB VALUES (17,66,2.2);
INSERT INTO PREDICTDATA_TAB VALUES (18,52,1.615);
INSERT INTO PREDICTDATA_TAB VALUES (19,65,1.835);
INSERT INTO PREDICTDATA_TAB VALUES (20,97,2.465);
INSERT INTO PREDICTDATA_TAB VALUES (21,150,3.52);
INSERT INTO PREDICTDATA_TAB VALUES (22,150,3.435);
INSERT INTO PREDICTDATA_TAB VALUES (23,245,3.84);
INSERT INTO PREDICTDATA_TAB VALUES (24,175,3.845);
INSERT INTO PREDICTDATA_TAB VALUES (25,66,1.935);
INSERT INTO PREDICTDATA_TAB VALUES (26,91,2.14);
INSERT INTO PREDICTDATA_TAB VALUES (27,113,1.513);
INSERT INTO PREDICTDATA_TAB VALUES (28,264,3.17);
INSERT INTO PREDICTDATA_TAB VALUES (29,175,2.77);
INSERT INTO PREDICTDATA_TAB VALUES (30,335,3.57);
INSERT INTO PREDICTDATA_TAB VALUES (31,109,2.78);
DROP TABLE #CONTROL_TAB;
CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TAB ("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE,"strArgs" VARCHAR(100));
INSERT INTO #CONTROL_TAB VALUES ('VARIABLE_NUM',2,null,null);
INSERT INTO #CONTROL_TAB VALUES ('THREAD_NUMBER',8,null,null);
DROP TABLE COEEFICIENT_TAB;
CREATE COLUMN TABLE COEEFICIENT_TAB ("ID" INT,"Ai" DOUBLE);
INSERT INTO COEEFICIENT_TAB VALUES (0, 18.866298717199392);
INSERT INTO COEEFICIENT_TAB VALUES (1, 0.03625559608220791);
INSERT INTO COEEFICIENT_TAB VALUES (2, -8.08347518244258);
DROP TABLE FITTED_TAB;
CREATE COLUMN TABLE FITTED_TAB ("ID" INT, "Fitted" DOUBLE,"TYPE" INT);
CALL _SYS_AFL.palForecastWithLogisticR(PREDICTDATA_TAB, "#CONTROL_TAB", COEEFICIENT_TAB, FITTED_TAB) with overview;
SELECT * FROM FITTED_TAB;
Expected Result: the output is returned in FITTED_TAB (result figure not reproduced).
3.3 Association Algorithms
3.3.1 Apriori
Apriori is a classic predictive analysis algorithm for finding association rules used in association analysis. Association analysis uncovers the hidden patterns, correlations, or causal structures among a set of items or objects. For example, association analysis enables you to understand what products and services customers tend to purchase at the same time. By analyzing the purchasing trends of your customers with association analysis, you can predict their future behavior.

Apriori is designed to operate on databases containing transactions. As is common in association rule mining, given a set of items, the algorithm attempts to find subsets which are common to at least a minimum number of the item sets. Apriori uses a bottom-up approach, where frequent subsets are extended one item at a time, a step known as candidate generation, and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found.

Apriori uses breadth-first search and a tree structure to count candidate item sets efficiently. It generates candidate item sets of length k from item sets of length k-1, and then prunes the candidates which have an infrequent subpattern. The candidate set contains all frequent k-length item sets. After that, it scans the transaction database to determine frequent item sets among the candidates.

The Apriori function in PAL uses vertical data format to store the transaction data in memory. The function can take varchar/char or integer transaction ID and item ID as input. It supports the output of confidence, support, and lift value, but does not limit the number of output rules. However, you can use SQL to limit the number of output rules, for example:
SELECT TOP 2000 * FROM RULE_RESULTS WHERE lift > 0.5
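The candidate generation, pruning, and rule measures described above can be sketched in plain Python. This is a simplified illustration, not PAL's implementation: it scans the transactions directly instead of using a vertical data format, the function names are hypothetical, and support, confidence, and lift follow their usual textbook definitions.

```python
from itertools import combinations


def apriori(transactions, min_support, max_item_length=10):
    """Return {frozenset(items): support} for all frequent item sets."""
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / total

    singles = {frozenset([item]) for t in transactions for item in t}
    current = {}
    for s in singles:
        sup = support(s)
        if sup >= min_support:
            current[s] = sup
    frequent = {}
    k = 1
    while current and k <= max_item_length:
        frequent.update(current)
        k += 1
        # Candidate generation: join frequent (k-1)-item sets into k-item sets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Pruning: drop candidates that contain an infrequent (k-1)-item subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        next_level = {}
        for c in candidates:
            sup = support(c)
            if sup >= min_support:
                next_level[c] = sup
        current = next_level
    return frequent


def association_rules(frequent, min_confidence):
    """Return (leading items, dependent items, support, confidence, lift) tuples."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lead in combinations(sorted(itemset), size):
                lead = frozenset(lead)
                dependent = itemset - lead
                confidence = sup / frequent[lead]
                if confidence >= min_confidence:
                    lift = confidence / frequent[dependent]
                    rules.append((tuple(sorted(lead)), tuple(sorted(dependent)),
                                  sup, confidence, lift))
    return rules


# Hypothetical transactions, one set of items per transaction ID.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b"}]
frequent = apriori(transactions, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.7)
for rule in rules:
    print(rule)
```

Because every subset of a frequent item set is itself frequent, the support lookups for the leading and dependent items in association_rules always succeed.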
Prerequisite
The input data does not contain null value.
APRIORIRULE
This function reads input transaction data and generates association rules by the Apriori algorithm.

Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>','AFLPAL','APRIORIRULE', <signature table>);
The signature table should contain the following records:

Index  Table Type Name               Direction
1      <INPUT table type>            in
2      <PARAMETER table type>        in
3      <Result OUTPUT table type>    out
4      <PMML OUTPUT table type>      out

Procedure Calling
CALL <procedure name>(<input table>, <parameter table>, <result output table>, <PMML output table>) WITH overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.

Signature

Input Table
Table                     Column       Column Data Type          Description
Dataset/ Historical Data  1st column   Integer, varchar, or char Transaction ID
                          Item column  Integer, varchar, or char Item ID

Parameter Table
Name            Data Type  Description
MIN_SUPPORT     Double     User-specified minimum support (actual value).
MIN_CONFIDENCE  Double     User-specified minimum confidence (actual value).
THREAD_NUMBER   Integer    Number of threads.
MAXITEMLENGTH   Integer    Total length of leading items and dependent items in the output. The default is 10.
PMML_EXPORT     Integer    0 (default): does not export Apriori model in PMML.
                           1: exports Apriori model in PMML in single row.
                           2: exports Apriori model in PMML in several rows, each row containing a maximum of 5000 characters.
Output Tables

Table         Column       Column Data Type    Description
Result        1st column   Varchar or char     Leading items
              2nd column   Varchar or char     Dependent items
              3rd column   Double              Support value
              4th column   Double              Confidence value
              5th column   Double              Lift value
PMML Result   1st column   Integer             ID
              2nd column   CLOB or varchar     Apriori model in PMML format
Example Assume that: DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role. SET SCHEMA DM_PAL; DROP TYPE PAL_DATA_T; CREATE TYPE PAL_DATA_T AS TABLE( "CUSTOMER" INT, "ITEM" VARCHAR(20) ); DROP TYPE PAL_RESULT_T; CREATE TYPE PAL_RESULT_T AS TABLE( "PRERULE" VARCHAR(500), "POSTRULE" VARCHAR(500), "SUPPORT" DOUBLE, "CONFIDENCE" DOUBLE, "LIFT" DOUBLE ); DROP TYPE PAL_PMMLMODEL_T; CREATE TYPE PAL_PMMLMODEL_T AS TABLE( "ID" INT, "PMMLMODEL" VARCHAR(5000) ); DROP TYPE PAL_CONTROL_T; CREATE TYPE PAL_CONTROL_T AS TABLE( "NAME" VARCHAR (50),
"INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR (100) ); DROP TABLE PDATA; CREATE COLUMN TABLE PDATA( "ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100) ); INSERT INTO PDATA VALUES (1, 'DM_PAL.PAL_DATA_T', 'in'); INSERT INTO PDATA VALUES (2, 'DM_PAL.PAL_CONTROL_T', 'in'); INSERT INTO PDATA VALUES (3, 'DM_PAL.PAL_RESULT_T', 'out'); INSERT INTO PDATA VALUES (4, 'DM_PAL.PAL_PMMLMODEL_T', 'out'); GRANT SELECT ON DM_PAL.PDATA to SYSTEM; call SYSTEM.afl_wrapper_generator('PAL_APRIORI_RULE', 'AFLPAL', 'APRIORIRULE', PDATA); DROP TABLE PAL_TRANS_TAB; CREATE COLUMN TABLE PAL_TRANS_TAB( "CUSTOMER" INT, "ITEM" VARCHAR(20) ); INSERT INTO PAL_TRANS_TAB VALUES (2, 'item2'); INSERT INTO PAL_TRANS_TAB VALUES (2, 'item3'); INSERT INTO PAL_TRANS_TAB VALUES (3, 'item1'); INSERT INTO PAL_TRANS_TAB VALUES (3, 'item2'); INSERT INTO PAL_TRANS_TAB VALUES (3, 'item4'); INSERT INTO PAL_TRANS_TAB VALUES (4,'item1'); INSERT INTO PAL_TRANS_TAB VALUES (4,'item3'); INSERT INTO PAL_TRANS_TAB VALUES (5, 'item2'); INSERT INTO PAL_TRANS_TAB VALUES (5, 'item3'); INSERT INTO PAL_TRANS_TAB VALUES (6, 'item1'); INSERT INTO PAL_TRANS_TAB VALUES (6, 'item3'); INSERT INTO PAL_TRANS_TAB VALUES (0, 'item1'); INSERT INTO PAL_TRANS_TAB VALUES (0, 'item2'); INSERT INTO PAL_TRANS_TAB VALUES (0, 'item5'); INSERT INTO PAL_TRANS_TAB VALUES (1, 'item2'); INSERT INTO PAL_TRANS_TAB VALUES (1, 'item4'); INSERT INTO PAL_TRANS_TAB VALUES (7, 'item1'); INSERT INTO PAL_TRANS_TAB VALUES (7, 'item2'); INSERT INTO PAL_TRANS_TAB VALUES (7, 'item3'); INSERT INTO PAL_TRANS_TAB VALUES (7, 'item5'); INSERT INTO PAL_TRANS_TAB VALUES (8, 'item1');
INSERT INTO PAL_TRANS_TAB VALUES (8, 'item2'); INSERT INTO PAL_TRANS_TAB VALUES (8, 'item3'); DROP TABLE PAL_CONTROL_TAB; CREATE COLUMN TABLE PAL_CONTROL_TAB( "NAME" VARCHAR (50), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR (100) ); INSERT INTO PAL_CONTROL_TAB VALUES ('THREAD_NUMBER', 2, null, null); INSERT INTO PAL_CONTROL_TAB VALUES ('MIN_SUPPORT', null, 0.2, null); INSERT INTO PAL_CONTROL_TAB VALUES ('MIN_CONFIDENCE', null, 0.4, null); DROP TABLE PAL_RESULT_TAB; CREATE COLUMN TABLE PAL_RESULT_TAB( "PRERULE" VARCHAR(500), "POSTRULE" VARCHAR(500), "SUPPORT" Double, "CONFIDENCE" Double, "LIFT" DOUBLE ); DROP TABLE PAL_PMMLMODEL_TAB; CREATE COLUMN TABLE PAL_PMMLMODEL_TAB( "ID" INT, "PMMLMODEL" VARCHAR(5000) ); CALL _SYS_AFL.PAL_APRIORI_RULE(PAL_TRANS_TAB, PAL_CONTROL_TAB, PAL_RESULT_TAB, PAL_PMMLMODEL_TAB) WITH overview; SELECT * FROM PAL_RESULT_TAB; SELECT * FROM PAL_PMMLMODEL_TAB;
Expected Result: PAL_RESULT_TAB:
PAL_PMMLMODEL_TAB:
LITEAPRIORIRULE
This is a light association rule mining algorithm that implements a reduced form of the Apriori algorithm. It calculates only large item sets that contain two items.

Procedure Generation

CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'LITEAPRIORIRULE', <signature table>);

The signature table should contain the following records:

Index   Table Type Name               Direction
1       <INPUT table type>            in
2       <PARAMETER table type>        in
3       <Result OUTPUT table type>    out
4       <PMML OUTPUT table type>      out

Procedure Calling

CALL <procedure name>(<input table>, <parameter table>, <result output table>, <PMML output table>) WITH overview;

The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.

Signature

Input Table

Table                     Column       Column Data Type            Description
Dataset/Historical Data   1st column   Integer, varchar, or char   Transaction ID
                          2nd column   Integer, varchar, or char   Item ID
Parameter Table

Name                Data Type           Description
MIN_SUPPORT         Double              User-specified minimum support (actual value).
MIN_CONFIDENCE      Double              User-specified minimum confidence (actual value).
THREAD_NUMBER       Integer             Number of threads.
OPTIMIZATION_TYPE   Integer or double   If you want to use the entire data, set it to 0. If you want to sample the source input data, specify a double value as the sampling percentage.
IS_RECALCULATE      Integer             If you use the sampling data, this parameter indicates whether to calculate the precise result. The setting 0 means that the precise result is not recalculated.
PMML_EXPORT         Integer             0 (default): does not export the liteApriori model in PMML.
                                        1: exports the liteApriori model in PMML in a single row.
                                        2: exports the liteApriori model in PMML in several rows, each row containing a maximum of 5000 characters.

Output Tables

Table         Column       Column Data Type    Description
Result        1st column   Varchar or char     Leading items
              2nd column   Varchar or char     Dependent items
              3rd column   Double              Support value
              4th column   Double              Confidence value
              5th column   Double              Lift value
PMML Result   1st column   Integer             ID
              2nd column   CLOB or varchar     liteApriori model in PMML format
Example Assume that: DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role. SET SCHEMA DM_PAL; DROP TYPE PAL_DATA_T; CREATE TYPE PAL_DATA_T AS TABLE( "CUSTOMER" INT, "ITEM" VARCHAR(20) ); DROP TYPE PAL_RESULT_T; CREATE TYPE PAL_RESULT_T AS TABLE( "PRERULE" VARCHAR(500), "POSTRULE" VARCHAR(500), "SUPPORT" DOUBLE, "CONFIDENCE" DOUBLE, "LIFT" DOUBLE ); DROP TYPE PAL_PMMLMODEL_T; CREATE TYPE PAL_PMMLMODEL_T AS TABLE( "ID" INT, "PMMLMODEL" VARCHAR(5000) ); DROP TYPE PAL_CONTROL_T; CREATE TYPE PAL_CONTROL_T AS TABLE( "NAME" VARCHAR (50), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR (100) ); DROP TABLE PDATA; CREATE COLUMN TABLE PDATA( "ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100) ); INSERT INTO PDATA VALUES (1, 'DM_PAL.PAL_DATA_T', 'in'); INSERT INTO PDATA VALUES (2, 'DM_PAL.PAL_CONTROL_T', 'in'); INSERT INTO PDATA VALUES (3, 'DM_PAL.PAL_RESULT_T', 'out'); INSERT INTO PDATA VALUES (4, 'DM_PAL.PAL_PMMLMODEL_T', 'out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM; call SYSTEM.afl_wrapper_generator('PAL_LITE_APRIORI_RULE', 'AFLPAL', 'LITEAPRIORIRULE', PDATA); DROP TABLE PAL_TRANS_TAB; CREATE COLUMN TABLE PAL_TRANS_TAB( "CUSTOMER" INT, "ITEM" VARCHAR(20) ); INSERT INTO PAL_TRANS_TAB VALUES (2, 'item2'); INSERT INTO PAL_TRANS_TAB VALUES (2, 'item3'); INSERT INTO PAL_TRANS_TAB VALUES (3, 'item1'); INSERT INTO PAL_TRANS_TAB VALUES (3, 'item2'); INSERT INTO PAL_TRANS_TAB VALUES (3, 'item4'); INSERT INTO PAL_TRANS_TAB VALUES (4,'item1'); INSERT INTO PAL_TRANS_TAB VALUES (4,'item3'); INSERT INTO PAL_TRANS_TAB VALUES (5, 'item2'); INSERT INTO PAL_TRANS_TAB VALUES (5, 'item3'); INSERT INTO PAL_TRANS_TAB VALUES (6, 'item1'); INSERT INTO PAL_TRANS_TAB VALUES (6, 'item3'); INSERT INTO PAL_TRANS_TAB VALUES (0, 'item1'); INSERT INTO PAL_TRANS_TAB VALUES (0, 'item2'); INSERT INTO PAL_TRANS_TAB VALUES (0, 'item5'); INSERT INTO PAL_TRANS_TAB VALUES (1, 'item2'); INSERT INTO PAL_TRANS_TAB VALUES (1, 'item4'); INSERT INTO PAL_TRANS_TAB VALUES (7, 'item1'); INSERT INTO PAL_TRANS_TAB VALUES (7, 'item2'); INSERT INTO PAL_TRANS_TAB VALUES (7, 'item3'); INSERT INTO PAL_TRANS_TAB VALUES (7, 'item5'); INSERT INTO PAL_TRANS_TAB VALUES (8, 'item1'); INSERT INTO PAL_TRANS_TAB VALUES (8, 'item2'); INSERT INTO PAL_TRANS_TAB VALUES (8, 'item3'); DROP TABLE PAL_CONTROL_TAB; CREATE COLUMN TABLE PAL_CONTROL_TAB( "NAME" VARCHAR (50), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR (100)); INSERT INTO PAL_CONTROL_TAB VALUES ('THREAD_NUMBER', 2, null, null); INSERT INTO PAL_CONTROL_TAB VALUES ('MIN_SUPPORT', null, 0.3, null); INSERT INTO PAL_CONTROL_TAB VALUES ('MIN_CONFIDENCE', null, 0.4, null); INSERT INTO PAL_CONTROL_TAB VALUES ('OPTIMIZATION_TYPE', 0, 0.7, null); INSERT INTO PAL_CONTROL_TAB VALUES ('IS_RECALCULATE', 1, null, null);
DROP TABLE PAL_RESULT_TAB; CREATE COLUMN TABLE PAL_RESULT_TAB( "PRERULE" VARCHAR(500), "POSTRULE" VARCHAR(500), "SUPPORT" Double, "CONFIDENCE" Double, "LIFT" DOUBLE ); DROP TABLE PAL_PMMLMODEL_TAB; CREATE COLUMN TABLE PAL_PMMLMODEL_TAB( "ID" INT, "PMMLMODEL" VARCHAR(5000) ); CALL _SYS_AFL.PAL_LITE_APRIORI_RULE(PAL_TRANS_TAB, PAL_CONTROL_TAB, PAL_RESULT_TAB, PAL_PMMLMODEL_TAB) WITH overview; SELECT * FROM PAL_RESULT_TAB; SELECT * FROM PAL_PMMLMODEL_TAB; Expected Result PAL_RESULT_TAB:
PAL_PMMLMODEL_TAB:
3.4 Time Series Algorithms

3.4.1 Single Exponential Smoothing
Single exponential smoothing is often used for financial market and economic data. In PAL, the algorithm begins by setting $S_1$ to $y_0$, where $S$ stands for the smoothed observation, $y$ stands for the original observation, and the subscripts refer to the time periods $0, 1, \ldots, n$. There is no $S_0$, because the smoothed series starts with the smoothed version of the second observation. For any time period $t$, the smoothed value $S_t$ is found by computing

\[ S_t = \alpha y_{t-1} + (1 - \alpha) S_{t-1} \quad (t > 1) \]

where the constant or parameter $\alpha$ is the smoothing factor, and $0 < \alpha < 1$.
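As a minimal sketch of this recurrence, the following Python function computes the smoothed series from a list of observations. It assumes the initialization $S_1 = y_0$ given in this section; the function name `single_smooth` is hypothetical, and the code is not the PAL implementation.

```python
def single_smooth(y, alpha):
    """Return [S_1, S_2, ..., S_n] for the observations y = [y_0, ..., y_(n-1)]."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    if not y:
        return []
    smoothed = [y[0]]  # S_1 = y_0
    for t in range(2, len(y) + 1):
        # S_t = alpha * y_(t-1) + (1 - alpha) * S_(t-1), for t > 1
        smoothed.append(alpha * y[t - 1] + (1.0 - alpha) * smoothed[-1])
    return smoothed


# First three raw values of the SINGLESMOOTH example, with ALPHA = 0.1.
s = single_smooth([200.0, 135.0, 195.0], alpha=0.1)
```

With these inputs, $S_2 = 0.1 \times 135 + 0.9 \times 200 = 193.5$. The last element of the returned list depends only on observations up to the final one, so it can be read as a one-step-ahead value.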
The first smoothed value $S_1$ is obtained from the equation below:

\[ S_1 = y_0 \]

Prerequisites
SINGLESMOOTH
This is a single exponential smoothing function.

Procedure Generation

CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'SINGLESMOOTH', <signature table>);

The signature table should contain the following records:

Index   Table Type Name           Direction
1       <INPUT table type>        in
2       <PARAMETER table type>    in
3       <OUTPUT table type>       out

Signature

Input Table

Table   Column       Description
Data    1st column   ID
        2nd column   Raw data

Parameter Table

Name           Data Type   Description
RAW_DATA_COL   Integer     Column number of the column that contains the raw data.
ALPHA          Double      Value of the smoothing constant alpha (0 < α < 1).
FORECAST_NUM   Integer     Number of values to be forecast. When it is set to 1, the algorithm only forecasts one value.
STARTTIME      Integer     Start time of raw data sequence. The default is 1.

Output Table

Table    Column       Description
Result   1st column   ID
         2nd column   Output result

Example

Assume that: DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.

SET SCHEMA DM_PAL;
DROP TYPE DATA_T;
CREATE TYPE DATA_T AS TABLE("ID" INT, "RAWDATA" DOUBLE);
DROP TYPE RESULT_T;
CREATE TYPE RESULT_T AS TABLE("TIME" INT, "OUTPUT" DOUBLE);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));
DROP table PDATA;
CREATE column table PDATA("ID" INT,"TYPENAME" VARCHAR(100),"DIRECTION" VARCHAR(100));
insert into PDATA values (1,'DM_PAL.DATA_T','in');
insert into PDATA values (2,'DM_PAL.CONTROL_T','in');
insert into PDATA values (3,'DM_PAL.RESULT_T','out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('SINGLESMOOTH_TEST','AFLPAL','SINGLESMOOTH',PDATA);
DROP TABLE CONTROL_TAB;
CREATE COLUMN TABLE CONTROL_TAB ("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100)); INSERT INTO CONTROL_TAB VALUES ('RAW_DATA_COL',1,null,null); INSERT INTO CONTROL_TAB VALUES ('ALPHA',null,0.1,null); INSERT INTO CONTROL_TAB VALUES ('FORECAST_NUM',1,null,null); INSERT INTO CONTROL_TAB VALUES ('STARTTIME',2000,null,null); DROP TABLE SINGLE_TAB; CREATE COLUMN TABLE SINGLE_TAB ("ID" INT, "RAWDATA" DOUBLE); INSERT INTO SINGLE_TAB VALUES (0,200.0); INSERT INTO SINGLE_TAB VALUES (1,135.0); INSERT INTO SINGLE_TAB VALUES (2,195.0); INSERT INTO SINGLE_TAB VALUES (3,197.5); INSERT INTO SINGLE_TAB VALUES (4,310.0); INSERT INTO SINGLE_TAB VALUES (5,175.0); INSERT INTO SINGLE_TAB VALUES (6,155.0); INSERT INTO SINGLE_TAB VALUES (7,130.0); INSERT INTO SINGLE_TAB VALUES (8,220.0); INSERT INTO SINGLE_TAB VALUES (9,277.5); INSERT INTO SINGLE_TAB VALUES (10,235.0); DROP TABLE RESULT_TAB; CREATE COLUMN TABLE RESULT_TAB ("TIME" INT, "OUTPUT" DOUBLE); CALL _SYS_AFL.SINGLESMOOTH_TEST(SINGLE_TAB, CONTROL_TAB, RESULT_TAB) with overview; SELECT * FROM RESULT_TAB;
Expected Result RESULT_TAB:
3.4.2 Double Exponential Smoothing
In PAL, double exponential smoothing is also referred to as "Holt-Winters double exponential smoothing." The algorithm uses weighted historical trending to predict the future values of an account. It is more accurate for accounts that tend to trend in one direction over time. In PAL, the result of double exponential smoothing is computed by the following formula:
\[
\begin{aligned}
S_0 &= X_0 \\
B_0 &= X_1 - X_0 \\
S_t &= \alpha X_t + (1 - \alpha)(S_{t-1} + B_{t-1}) \\
B_t &= \beta (S_t - S_{t-1}) + (1 - \beta) B_{t-1} \\
F_{t+m} &= S_t + m B_t
\end{aligned}
\]

Where:

$\{X_t\}$: raw data sequence of observations, beginning at time $t = 0$.
$\{S_t\}$: smoothed value for time $t$.
$\{B_t\}$: the best estimate of the trend at time $t$.
$F_{t+m}$: output of the algorithm, which is an estimate of $x$ at time $t + m$ based on the raw data up to time $t$.
$\alpha$: data smoothing factor. The range is $0 < \alpha < 1$.
$\beta$: trend smoothing factor. The range is $0 < \beta < 1$.

Note: $F_0$ is not defined because there is no estimation for time 0. According to the definition, you can get $F_1 = S_0 + B_0$, and so on.
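The recurrences above can be sketched in Python as follows. This is an illustration under the stated initialization ($S_0 = X_0$, $B_0 = X_1 - X_0$) and not the PAL implementation; the function name `double_smooth` and its `horizon` argument are hypothetical.

```python
def double_smooth(x, alpha, beta, horizon):
    """Return the forecasts F_(t+1) ... F_(t+horizon) after the last observation."""
    if len(x) < 2:
        raise ValueError("at least two observations are needed to initialize the trend")
    s = x[0]          # S_0 = X_0
    b = x[1] - x[0]   # B_0 = X_1 - X_0
    for t in range(1, len(x)):
        s_prev = s
        # S_t = alpha * X_t + (1 - alpha) * (S_(t-1) + B_(t-1))
        s = alpha * x[t] + (1.0 - alpha) * (s_prev + b)
        # B_t = beta * (S_t - S_(t-1)) + (1 - beta) * B_(t-1)
        b = beta * (s - s_prev) + (1.0 - beta) * b
    # F_(t+m) = S_t + m * B_t
    return [s + m * b for m in range(1, horizon + 1)]


# A perfectly linear series keeps its slope, so the forecasts continue the line.
forecasts = double_smooth([1.0, 2.0, 3.0, 4.0], alpha=0.5, beta=0.3, horizon=2)
```

For this linear input the level tracks the data and the trend stays at 1, so the two forecasts are approximately 5 and 6.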
Prerequisites
DOUBLESMOOTH
This is a double exponential smoothing function.

Procedure Generation

CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'DOUBLESMOOTH', <signature table>);

The signature table should contain the following records:

Index   Table Type Name           Direction
1       <INPUT table type>        in
2       <PARAMETER table type>    in
3       <OUTPUT table type>       out

Procedure Calling

CALL <procedure name>(<input table>, <parameter table>, <output table>) with overview;

The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.

Signature

Input Table

Table   Column
Data    1st column
        2nd column

Parameter Table

Name           Data Type   Description
RAW_DATA_COL   Integer     Column number of the column that contains the raw data.
ALPHA          Double      Value of the smoothing constant alpha (0 < α < 1).
BETA           Double      Value of the smoothing constant beta (0 < β < 1).
FORECAST_NUM   Integer     Number of values to be forecast (num > 0).
STARTTIME      Integer     Start time of raw data sequence. The default is 1.

Output Table

Table    Column
Result   1st column
         2nd column
Example

Assume that: DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.

SET SCHEMA DM_PAL;
DROP TYPE DATA_T;
CREATE TYPE DATA_T AS TABLE("ID" INT, "RAWDATA" DOUBLE);
DROP TYPE RESULT_T;
CREATE TYPE RESULT_T AS TABLE("TIME" INT, "OUTPUT" DOUBLE);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));
DROP table PDATA;
CREATE column table PDATA("ID" INT,"TYPENAME" VARCHAR(100),"DIRECTION" VARCHAR(100));
insert into PDATA values (1,'DM_PAL.DATA_T','in');
insert into PDATA values (2,'DM_PAL.CONTROL_T','in');
insert into PDATA values (3,'DM_PAL.RESULT_T','out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('DOUBLESMOOTH_TEST','AFLPAL','DOUBLESMOOTH',PDATA);
DROP TABLE CONTROL_TAB;
CREATE COLUMN TABLE CONTROL_TAB ("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));
INSERT INTO CONTROL_TAB VALUES ('RAW_DATA_COL',1,null,null);
INSERT INTO CONTROL_TAB VALUES ('ALPHA',null,0.501,null);
INSERT INTO CONTROL_TAB VALUES ('BETA',null,0.072,null);
INSERT INTO CONTROL_TAB VALUES ('FORECAST_NUM',6,null,null);
INSERT INTO CONTROL_TAB VALUES ('STARTTIME',2000,null,null);
DROP TABLE DOUBLE_TAB;
CREATE COLUMN TABLE DOUBLE_TAB ("ID" INT, "RAWDATA" DOUBLE);
INSERT INTO DOUBLE_TAB VALUES (0,143.0);
INSERT INTO DOUBLE_TAB VALUES (1,152.0);
INSERT INTO DOUBLE_TAB VALUES (2,161.0);
INSERT INTO DOUBLE_TAB VALUES (3,139.0);
INSERT INTO DOUBLE_TAB VALUES (4,137.0);
INSERT INTO DOUBLE_TAB VALUES (5,174.0); INSERT INTO DOUBLE_TAB VALUES (6,142.0); INSERT INTO DOUBLE_TAB VALUES (7,141.0); INSERT INTO DOUBLE_TAB VALUES (8,162.0); INSERT INTO DOUBLE_TAB VALUES (9,180.0); INSERT INTO DOUBLE_TAB VALUES (10,164.0); INSERT INTO DOUBLE_TAB VALUES (11,171.0); INSERT INTO DOUBLE_TAB VALUES (12,206.0); INSERT INTO DOUBLE_TAB VALUES (13,193.0); INSERT INTO DOUBLE_TAB VALUES (14,207.0); INSERT INTO DOUBLE_TAB VALUES (15,218.0); INSERT INTO DOUBLE_TAB VALUES (16,229.0); INSERT INTO DOUBLE_TAB VALUES (17,225.0); INSERT INTO DOUBLE_TAB VALUES (18,204.0); INSERT INTO DOUBLE_TAB VALUES (19,227.0); INSERT INTO DOUBLE_TAB VALUES (20,223.0); INSERT INTO DOUBLE_TAB VALUES (21,242.0); INSERT INTO DOUBLE_TAB VALUES (22,239.0); INSERT INTO DOUBLE_TAB VALUES (23,266.0); DROP TABLE RESULT_TAB; CREATE COLUMN TABLE RESULT_TAB ("TIME" INT, "OUTPUT" DOUBLE); CALL _SYS_AFL.DOUBLESMOOTH_TEST(DOUBLE_TAB, CONTROL_TAB, RESULT_TAB) with overview; SELECT * FROM RESULT_TAB;
Expected Result RESULT_TAB
3.4.3 Triple Exponential Smoothing
Triple exponential smoothing is used to handle time series data that contains a seasonal component. This method is based on three smoothing equations: Stationary Component, Trend, and Seasonal. Both Seasonal and Trend can be additive or multiplicative. In PAL, the algorithm is implemented in its multiplicative form, and triple exponential smoothing is given by the formulas below:
\[
\begin{aligned}
S_t &= \alpha \frac{X_t}{C_{t-L}} + (1 - \alpha)(S_{t-1} + B_{t-1}) \\
B_t &= \beta (S_t - S_{t-1}) + (1 - \beta) B_{t-1} \\
C_t &= \gamma \frac{X_t}{S_t} + (1 - \gamma) C_{t-L} \\
F_{t+m} &= (S_t + m B_t)\, C_{t-L+1+((m-1) \bmod L)}
\end{aligned}
\]

Where:

$\alpha$: data smoothing factor. The range is $0 < \alpha < 1$.
$\beta$: trend smoothing factor. The range is $0 < \beta < 1$.
$\gamma$: seasonal change smoothing factor. The range is $0 < \gamma < 1$.
$X$: observation
$S$: smoothed observation
$B$: trend factor
$C$: seasonal index
$F$: the forecast at $m$ periods ahead
$t$: the index that denotes a time period

Note: $\alpha$, $\beta$, and $\gamma$ are the constants that must be estimated in such a way that the MSE of the error is minimized.

The formula for the initial trend estimate is:

[Formula not preserved in this copy.]

Setting the initial estimates for the seasonal indices $C_i$ for $i = 0, 1, \ldots, L-1$ is a bit more involved:

[Formula not preserved in this copy.]

Where the averaging term (symbol not preserved in this copy) is the average value of $x$ in the $L$ cycle of your data.
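The update and forecast equations at the top of this section can be sketched as two small Python functions. Because the initial trend and seasonal estimates are not available here, the sketch covers only a single update step and the forecast, and takes the previous state as input. The function names and argument names are hypothetical, and the code is not the PAL implementation.

```python
def triple_smooth_step(x_t, s_prev, b_prev, c_old, alpha, beta, gamma):
    """One multiplicative update; c_old is the seasonal index C_(t-L)."""
    # S_t = alpha * X_t / C_(t-L) + (1 - alpha) * (S_(t-1) + B_(t-1))
    s_t = alpha * x_t / c_old + (1.0 - alpha) * (s_prev + b_prev)
    # B_t = beta * (S_t - S_(t-1)) + (1 - beta) * B_(t-1)
    b_t = beta * (s_t - s_prev) + (1.0 - beta) * b_prev
    # C_t = gamma * X_t / S_t + (1 - gamma) * C_(t-L)
    c_t = gamma * x_t / s_t + (1.0 - gamma) * c_old
    return s_t, b_t, c_t


def triple_smooth_forecast(s_t, b_t, last_cycle, m):
    """F_(t+m); last_cycle holds [C_(t-L+1), ..., C_t], so its length is L."""
    cycle_length = len(last_cycle)
    # The index t-L+1+((m-1) mod L) is offset (m-1) mod L within last_cycle.
    return (s_t + m * b_t) * last_cycle[(m - 1) % cycle_length]


# Made-up state values, chosen so that the observation matches the model exactly.
s1, b1, c1 = triple_smooth_step(120.0, 95.0, 5.0, 1.2, alpha=0.5, beta=0.5, gamma=0.5)
f1 = triple_smooth_forecast(100.0, 5.0, [0.8, 1.0, 1.2, 1.0], m=1)
f5 = triple_smooth_forecast(100.0, 5.0, [0.8, 1.0, 1.2, 1.0], m=5)
```

In the first call the observation equals the previous level plus trend, scaled by the seasonal index, so the state is reproduced (level about 100, trend about 5, index about 1.2). Forecasts for m = 1 and m = 5 reuse the same seasonal index because the cycle length is 4.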
Prerequisites
TRIPLESMOOTH
This is a triple exponential smoothing function.

Procedure Generation

CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'TRIPLESMOOTH', <signature table>);

The signature table should contain the following records:

Index   Table Type Name           Direction
1       <INPUT table type>        in
2       <PARAMETER table type>    in
3       <OUTPUT table type>       out

Procedure Calling

CALL <procedure name>(<input table>, <parameter table>, <output table>) with overview;

The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.

Signature

Input Table

Table   Column
Data    1st column
        2nd column

Parameter Table

Name           Data Type   Description
RAW_DATA_COL   Integer     Column number of the column that contains the raw data.
ALPHA          Double      Value of the smoothing constant alpha (0 < α < 1).
BETA           Double      Value of the smoothing constant beta (0 < β < 1).
GAMMA          Double      Value of the smoothing constant gamma (0 < γ < 1).
CYCLE          Integer     A cycle of length L (L > 1). For example, the cycle of quarterly data is 4, and the cycle of monthly data is 12.
FORECAST_NUM   Integer     Number of values to be forecast (num > 0).
STARTTIME      Integer     Start time of raw data sequence (default value = 1).
Output Table

Table    Column
Result   1st column
         2nd column

Example

Assume that: DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.

SET SCHEMA DM_PAL;
DROP TYPE DATA_T;
CREATE TYPE DATA_T AS TABLE("ID" INT, "RAWDATA" DOUBLE);
DROP TYPE RESULT_T;
CREATE TYPE RESULT_T AS TABLE("TIME" INT, "OUTPUT" DOUBLE);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));
DROP table PDATA;
CREATE column table PDATA("ID" INT,"TYPENAME" VARCHAR(100),"DIRECTION" VARCHAR(100));
insert into PDATA values (1,'DM_PAL.DATA_T','in');
insert into PDATA values (2,'DM_PAL.CONTROL_T','in');
insert into PDATA values (3,'DM_PAL.RESULT_T','out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('TRIPLESMOOTH_TEST','AFLPAL','TRIPLESMOOTH',PDATA);
DROP TABLE CONTROL_TAB;
CREATE COLUMN TABLE CONTROL_TAB ("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));
INSERT INTO CONTROL_TAB VALUES ('RAW_DATA_COL',1,null,null); INSERT INTO CONTROL_TAB VALUES ('ALPHA',null,0.822,null); INSERT INTO CONTROL_TAB VALUES ('BETA',null,0.055,null); INSERT INTO CONTROL_TAB VALUES ('GAMMA',null,0.055,null); INSERT INTO CONTROL_TAB VALUES ('CYCLE',4,null,null); INSERT INTO CONTROL_TAB VALUES ('STARTTIME',2000,null,null); INSERT INTO CONTROL_TAB VALUES ('FORECAST_NUM',6,null,null); DROP TABLE TRIPLE_TAB; CREATE COLUMN TABLE TRIPLE_TAB ("ID" INT, "RAWDATA" DOUBLE); INSERT INTO TRIPLE_TAB VALUES (0,362.0); INSERT INTO TRIPLE_TAB VALUES (1,385.0); INSERT INTO TRIPLE_TAB VALUES (2,432.0); INSERT INTO TRIPLE_TAB VALUES (3,341.0); INSERT INTO TRIPLE_TAB VALUES (4,382.0); INSERT INTO TRIPLE_TAB VALUES (5,409.0); INSERT INTO TRIPLE_TAB VALUES (6,498.0); INSERT INTO TRIPLE_TAB VALUES (7,387.0); INSERT INTO TRIPLE_TAB VALUES (8,473.0); INSERT INTO TRIPLE_TAB VALUES (9,513.0); INSERT INTO TRIPLE_TAB VALUES (10,582.0); INSERT INTO TRIPLE_TAB VALUES (11,474.0); INSERT INTO TRIPLE_TAB VALUES (12,544.0); INSERT INTO TRIPLE_TAB VALUES (13,582.0); INSERT INTO TRIPLE_TAB VALUES (14,681.0); INSERT INTO TRIPLE_TAB VALUES (15,557.0); INSERT INTO TRIPLE_TAB VALUES (16,628.0); INSERT INTO TRIPLE_TAB VALUES (17,707.0); INSERT INTO TRIPLE_TAB VALUES (18,773.0); INSERT INTO TRIPLE_TAB VALUES (19,592.0); INSERT INTO TRIPLE_TAB VALUES (20,627.0); INSERT INTO TRIPLE_TAB VALUES (21,725.0); INSERT INTO TRIPLE_TAB VALUES (22,854.0); INSERT INTO TRIPLE_TAB VALUES (23,661.0); DROP TABLE RESULT_TAB; CREATE COLUMN TABLE RESULT_TAB ("TIME" INT, "OUTPUT" DOUBLE); CALL _SYS_AFL.TRIPLESMOOTH_TEST(TRIPLE_TAB, CONTROL_TAB, RESULT_TAB) with overview; SELECT * FROM RESULT_TAB;
3.5 Preprocessing Algorithms

3.5.1 Binning
Binning data is a common requirement prior to running certain predictive algorithms. It generally reduces the complexity of the model, for example, the model in a decision tree. Binning methods smooth a sorted data value by consulting its neighborhood, that is, the values around it. The sorted values are distributed into a number of buckets, or bins. Because binning methods consult the neighborhood of values, they perform local smoothing.

There are four binning methods:

- Equal widths based on the number of bins
- Equal widths based on the bin width
- Equal number of records per bin
- Mean / standard deviation bin boundaries

And three methods for smoothing:

- Smoothing by bin means: each value in a bin is replaced by the mean value of the bin.
- Smoothing by bin medians: each bin value is replaced by the bin median.
- Smoothing by bin boundaries: the minimum and maximum values in a given bin are identified as the bin boundaries. Each bin value is then replaced by its closest boundary value.
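To make one combination concrete, the following Python sketch applies equal-width binning based on the number of bins, followed by smoothing by bin means. It is an illustration only: how PAL assigns values that fall exactly on a bin edge is not stated here, so the sketch assumes each bin includes its lower edge and the last bin also includes the maximum. The function name `bin_equal_width_means` is hypothetical.

```python
def bin_equal_width_means(values, bin_count):
    """Return (bin indices, smoothed values) using equal-width bins and bin means."""
    low, high = min(values), max(values)
    width = (high - low) / bin_count
    if width == 0:
        # All values are identical, so everything falls into the first bin.
        indices = [0] * len(values)
    else:
        # Assumed edge rule: lower edge inclusive, maximum clamped into the last bin.
        indices = [min(int((v - low) / width), bin_count - 1) for v in values]
    # Mean of each non-empty bin.
    sums, counts = {}, {}
    for idx, v in zip(indices, values):
        sums[idx] = sums.get(idx, 0.0) + v
        counts[idx] = counts.get(idx, 0) + 1
    means = {idx: sums[idx] / counts[idx] for idx in sums}
    # Smoothing by bin means: replace each value with the mean of its bin.
    smoothed = [means[idx] for idx in indices]
    return indices, smoothed


# Temperature values from the BINNING example, split into 4 bins.
temperatures = [6.0, 12.0, 13.0, 15.0, 10.0, 23.0, 24.0, 30.0, 32.0, 25.0, 38.0]
bins, smoothed = bin_equal_width_means(temperatures, bin_count=4)
```

With a range of 6 to 38 and 4 bins, each bin is 8 units wide, so 6, 12, 13, and 10 share the first bin and are all replaced by their mean, 10.25. The bin indices in this sketch start at 0, which may differ from the numbering PAL uses.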
Prerequisites
The input data does not contain null values. The data is numeric, not categorical.
BINNING
This function preprocesses the data.

Procedure Generation

CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'BINNING', <signature table>);

The signature table should contain the following records:

Index   Table Type Name           Direction
1       <INPUT table type>        in
2       <PARAMETER table type>    in
3       <OUTPUT table type>       out

Procedure Calling

CALL <procedure name>(<input table>, <parameter table>, <output table>) with overview;

The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.

Signature

Input Table

Table   Column       Column Data Type     Description
Data    1st column   Integer or string    ID
        2nd column   Integer or double    Variable temperature

Parameter Table

Name             Data Type   Description
BINNING_METHOD   Integer     Binning methods:
                             0: equal widths based on the number of bins
                             1: equal widths based on the bin width
                             2: equal number of records per bin
                             3: mean / standard deviation bin boundaries
SMOOTH_METHOD    Integer     Smoothing methods:
                             0: smoothing by bin means
                             1: smoothing by bin medians
                             2: smoothing by bin boundaries
BIN_NUMBER       Integer     Number of needed bins.
BIN_DISTANCE     Integer     Specifies the distance for binning. This is required only when you have set BINNING_METHOD to 1.
SD               Integer     Specifies the standard deviation method. This is required only when you have set BINNING_METHOD to 3. Examples: 1 S.D.; 2 S.D.; 3 S.D.

Output Table

Table    Column       Column Data Type     Description
Result   1st column   Integer or string    ID
         2nd column   Integer              Variable TYPE
         3rd column   Integer or double    Variable PRE_RESULT
Example

Assume that: DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.

SET SCHEMA DM_PAL;
DROP TYPE DATA_T;
CREATE TYPE DATA_T AS TABLE("ID" INT, "TEMPERATURE" DOUBLE);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR (50), "intArgs" INTEGER, "doubleArgs" DOUBLE, "stringArgs" VARCHAR (100));
DROP TYPE RESULT_T;
CREATE TYPE RESULT_T AS TABLE("ID" INT, "BIN_NUMBER" INT, "PRE_RESULT" DOUBLE);
DROP table PDATA;
CREATE column table PDATA("ID" INT,"TYPENAME" VARCHAR(100),"DIRECTION" VARCHAR(100));
insert into PDATA values (1,'DM_PAL.DATA_T','in');
insert into PDATA values (2,'DM_PAL.CONTROL_T','in');
insert into PDATA values (3,'DM_PAL.RESULT_T','out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('BINNING_TEST','AFLPAL','BINNING',PDATA);
DROP TABLE DATA_TAB;
CREATE COLUMN TABLE DATA_TAB ("ID" INT, "TEMPERATURE" DOUBLE);
INSERT INTO DATA_TAB VALUES (0, 6.0);
INSERT INTO DATA_TAB VALUES (1, 12.0);
INSERT INTO DATA_TAB VALUES (2, 13.0);
INSERT INTO DATA_TAB VALUES (3, 15.0);
INSERT INTO DATA_TAB VALUES (4, 10.0);
INSERT INTO DATA_TAB VALUES (5, 23.0);
INSERT INTO DATA_TAB VALUES (6, 24.0);
INSERT INTO DATA_TAB VALUES (7, 30.0);
INSERT INTO DATA_TAB VALUES (8, 32.0);
INSERT INTO DATA_TAB VALUES (9, 25.0);
INSERT INTO DATA_TAB VALUES (10, 38.0);
DROP TABLE #CONTROL_TAB; CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TAB ( "Name" VARCHAR (50), "intArgs" INTEGER, "doubleArgs" DOUBLE, "stringArgs" VARCHAR (100)); INSERT INTO #CONTROL_TAB VALUES ('BINNING_METHOD',0,null,null); INSERT INTO #CONTROL_TAB VALUES ('SMOOTH_METHOD',0,null,null); INSERT INTO #CONTROL_TAB VALUES ('BIN_NUMBER',4,null,null); INSERT INTO #CONTROL_TAB VALUES ('BIN_DISTANCE',10,null,null); INSERT INTO #CONTROL_TAB VALUES ('SD',1,null,null); DROP TABLE RESULT_TAB; CREATE TABLE RESULT_TAB ("ID" INT, "BIN_NUMBER" INT, "PRE_RESULT" DOUBLE) ; CALL _SYS_AFL.BINNING_TEST(DATA_TAB, "#CONTROL_TAB", RESULT_TAB) with overview; SELECT * FROM RESULT_TAB; Expected Result RESULT_TAB:
3.5.2 Inter-quartile Range Test
Given a series of numeric data, the inter-quartile range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1) of the data.

\[ IQR = Q_3 - Q_1 \]

$Q_1$ is equal to the 25th percentile and $Q_3$ is equal to the 75th percentile.

The p-th percentile of a numeric vector is a number that is greater than or equal to p% of all the values of this numeric vector.

The IQR test is a method for detecting outliers in a series of numeric data. The algorithm performs the following tasks:

1. Calculates $Q_1$, $Q_3$, and $IQR$.
2. Sets the upper and lower bounds as follows:
   Upper-bound = $Q_3 + 1.5 \times IQR$
   Lower-bound = $Q_1 - 1.5 \times IQR$
3. Tests all the values of the numeric vector to determine whether each one is in the range. A value outside the range is marked as an outlier, meaning it does not pass the IQR test.
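The three steps can be sketched in Python with the standard library. The quartile interpolation rule PAL uses is not specified here, so this sketch assumes the "inclusive" method of `statistics.quantiles`; the function name `iqr_test` is hypothetical, and the flags follow the convention 0 = in range, 1 = out of range.

```python
from statistics import quantiles


def iqr_test(values, multiplier=1.5):
    """Return (Q1, Q3, flags), where flags[i] is 1 if values[i] is an outlier."""
    # Step 1: quartiles and IQR (assumed interpolation: the "inclusive" method).
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Step 2: bounds around the quartiles.
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    # Step 3: flag every value that lies outside [lower, upper].
    flags = [0 if lower <= v <= upper else 1 for v in values]
    return q1, q3, flags


# Values P1 to P15 from the IQRTEST example.
ids = ["P%d" % i for i in range(1, 16)]
vals = [10, 11, 10, 9, 10, 24, 11, 12, 10, 9, 1, 11, 12, 13, 12]
q1, q3, flags = iqr_test(vals)
outliers = [i for i, f in zip(ids, flags) if f == 1]
```

Under the assumed quartile method, Q1 is 10 and Q3 is 12, so the bounds are 7 and 15, and the sketch flags P6 (24) and P11 (1). A different quartile definition could shift the bounds on other data sets.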
Prerequisites
The input data does not contain null values. The algorithm issues errors when it encounters null values.
IQRTEST
This function performs the inter-quartile range test and outputs the test results.
Procedure Generation

CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'IQRTEST', <signature table>);

The signature table should contain the following records:

Index   Table Type Name             Direction
1       <INPUT table type>          in
2       <PARAMETER table type>      in
3       <IQR OUTPUT table type>     out
4       <Test OUTPUT table type>    out

Procedure Calling

CALL <procedure name>(<input table>, <parameter table>, <IQR output table>, <test output table>) with overview;

The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.

Signature

Input Table

Table   Column       Column Data Type            Description
Data    1st column   Integer, varchar, or char   ID
        2nd column   Integer or double           Data that needs to be tested

Parameter Table

Name         Data Type   Description
MULTIPLIER   Double      The multiplier used in the IQR test. The default is 1.5.

Output Tables

Table         Column       Column Data Type     Description
IQR Values    1st column   Double               Q1 value
              2nd column   Double               Q3 value
Test Result   1st column   Integer              ID
              2nd column   Integer or double    Test result:
                                                0: a value is in the range
                                                1: a value is out of range
Example Assume that: DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role. SET SCHEMA DM_PAL; DROP TYPE DATA_T; CREATE TYPE DATA_T AS TABLE( "ID" VARCHAR(10),"VAL" DOUBLE); DROP TYPE CONTROL_T; CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "stringArgs" VARCHAR (100)); DROP TYPE IQR_T; CREATE TYPE IQR_T AS TABLE("Q1" DOUBLE, "Q3" DOUBLE); DROP TYPE RESULT_T; CREATE TYPE RESULT_T AS TABLE("ID" VARCHAR(10), "TEST" INT); DROP table PDATA; CREATE column table PDATA("ID" INT,"TYPENAME" VARCHAR(100),"DIRECTION" VARCHAR(100)); insert into PDATA values (1,'DM_PAL.DATA_T','in'); insert into PDATA values (2,'DM_PAL.CONTROL_T','in'); insert into PDATA values (3,'DM_PAL.IQR_T','out'); insert into PDATA values (4,'DM_PAL.RESULT_T','out'); GRANT SELECT ON DM_PAL.PDATA to SYSTEM; call SYSTEM.afl_wrapper_generator('palIQR','AFLPAL','IQRTEST',PDATA); DROP TABLE TESTDT_TAB;
CREATE COLUMN TABLE TESTDT_TAB("ID" VARCHAR(10),"VAL" DOUBLE); INSERT INTO TESTDT_TAB VALUES ('P1', 10); INSERT INTO TESTDT_TAB VALUES ('P2', 11); INSERT INTO TESTDT_TAB VALUES ('P3', 10); INSERT INTO TESTDT_TAB VALUES ('P4', 9); INSERT INTO TESTDT_TAB VALUES ('P5', 10); INSERT INTO TESTDT_TAB VALUES ('P6', 24); INSERT INTO TESTDT_TAB VALUES ('P7', 11); INSERT INTO TESTDT_TAB VALUES ('P8', 12); INSERT INTO TESTDT_TAB VALUES ('P9', 10);
INSERT INTO TESTDT_TAB VALUES ('P10', 9); INSERT INTO TESTDT_TAB VALUES ('P11', 1); INSERT INTO TESTDT_TAB VALUES ('P12', 11); INSERT INTO TESTDT_TAB VALUES ('P13', 12); INSERT INTO TESTDT_TAB VALUES ('P14', 13); INSERT INTO TESTDT_TAB VALUES ('P15', 12); DROP TABLE #CONTROL_TAB;
CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TAB ("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "stringArgs" VARCHAR (100)); INSERT INTO #CONTROL_TAB VALUES ('MULTIPLIER',null,1.5,null); DROP TABLE IQR_TAB; CREATE COLUMN TABLE IQR_TAB ("Q1" DOUBLE, "Q3" DOUBLE); DROP TABLE RESULTS_TAB; CREATE COLUMN TABLE RESULTS_TAB ("ID" VARCHAR(10), "TEST" INT); CALL _SYS_AFL.palIQR(TESTDT_TAB, "#CONTROL_TAB", IQR_TAB, RESULTS_TAB) with overview; SELECT * FROM IQR_TAB; SELECT * FROM RESULTS_TAB;
Expected Result IQR value:
Test result:
3.5.3 Sampling
Sampling extracts a subset of units from a population. It is usually difficult for researchers to observe every individual in the population of concern directly, so they extract a portion of the units for research. The basic requirement for sampling is that the extracted sample is representative of the whole population. There are many sampling methods. PAL supports eight of them:
First_N
Middle_N
Last_N
Every_Nth
SimpleRandom_WithReplacement
SimpleRandom_WithoutReplacement
Systematic
Stratified
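As a rough illustration of what some of these methods select, the following Python sketch covers First_N, Last_N, Every_Nth, and the two simple random variants. It is not PAL's implementation: the function names are hypothetical, and the convention that Every_Nth starts at the N-th row is an assumption.

```python
import random


def first_n(rows, n):
    """First_N: the first n rows."""
    return rows[:n]


def last_n(rows, n):
    """Last_N: the last n rows."""
    return rows[-n:] if n > 0 else []


def every_nth(rows, interval):
    """Every_Nth: every interval-th row.

    Assumption: the first selected row is row number `interval`.
    """
    return rows[interval - 1::interval]


def simple_random(rows, n, with_replacement, seed=None):
    """SimpleRandom_WithReplacement / SimpleRandom_WithoutReplacement."""
    rng = random.Random(seed)
    if with_replacement:
        # The same row may be drawn more than once.
        return [rng.choice(rows) for _ in range(n)]
    # Each row is drawn at most once, so n must not exceed len(rows).
    return rng.sample(rows, n)


employee_numbers = list(range(1, 26))
print(first_n(employee_numbers, 8))
print(every_nth(employee_numbers, 5))
```

Sampling with replacement can return more rows than the population holds, while sampling without replacement cannot.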
Prerequisites
The input data does not contain null values.
SAMPLING
This function takes samples from a population.
Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>','AFLPAL','SAMPLING', <signature table>);
The signature table should contain the following records:
Index  Table Type Name         Direction
1      <INPUT table type>      in
2      <PARAMETER table type>  in
3      <OUTPUT table type>     out
Signature
Input Table
Table  Column   Column Data Type                   Description
Data   Columns  Integer, double, varchar, or char  Any data users need

Parameter Table
Name             Data Type  Description
SAMPLING_METHOD  Integer    Sampling method: 0: First_N; 1: Middle_N; 2: Last_N; 3: Every_Nth; 4: SimpleRandom_WithReplacement; 5: SimpleRandom_WithoutReplacement; 6: Systematic; 7: Stratified
SAMPLING_SIZE    Integer    Number of the samples. Use this parameter when PERCENTAGE is not set.
PERCENTAGE       Double     Percentage of the samples. Use this parameter when SAMPLING_SIZE is not set.
THREAD_NUMBER    Integer    Number of threads
INTERVAL         Integer    The interval between two samples. Note: This parameter is only required for the Every_Nth method. If this parameter is not specified, the SAMPLING_SIZE parameter will be used.
STRATA_NUM       Integer    The number of the sub-populations. Note: This parameter is only required for the stratified method. In this function a population with three strata is sampled.
STRATA1_COUNT    Integer    The needed number for the first stratum.
STRATA2_COUNT    Integer    The needed number for the second stratum.
STRATA3_COUNT    Integer    The needed number for the third stratum.

Output Table
Table   Column   Column Data Type                   Description
Result  Columns  Integer, double, varchar, or char  The Output Table has the same structure as defined in the Input Table.
Example
Assume that: DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.
SET SCHEMA DM_PAL;
DROP TYPE DATA_T;
CREATE TYPE DATA_T AS TABLE("EMPNO" INT, "GENDER" VARCHAR (50), "INCOME" DOUBLE);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR (50), "intArgs" INTEGER, "doubleArgs" DOUBLE, "stringArgs" VARCHAR (100));
DROP TYPE RESULT_T;
CREATE TYPE RESULT_T AS TABLE("RESULT_EMPNO" INT, "RESULT_GENDER" VARCHAR (50), "RESULT_INCOME" DOUBLE);
DROP TABLE PDATA;
CREATE COLUMN TABLE PDATA("ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100));
INSERT INTO PDATA VALUES (1, 'DM_PAL.DATA_T', 'in');
INSERT INTO PDATA VALUES (2, 'DM_PAL.CONTROL_T', 'in');
INSERT INTO PDATA VALUES (3, 'DM_PAL.RESULT_T', 'out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
CALL SYSTEM.afl_wrapper_generator('SAMPLING_TEST', 'AFLPAL', 'SAMPLING', PDATA);
DROP TABLE DATA_TAB;
CREATE COLUMN TABLE DATA_TAB ("EMPNO" INT, "GENDER" VARCHAR (50), "INCOME" DOUBLE);
INSERT INTO DATA_TAB VALUES (1, 'male', 4000.5);
INSERT INTO DATA_TAB VALUES (2, 'male', 5000.7);
INSERT INTO DATA_TAB VALUES (3, 'female', 5100.8);
INSERT INTO DATA_TAB VALUES (4, 'male', 5400.9);
INSERT INTO DATA_TAB VALUES (5, 'female', 5500.2);
INSERT INTO DATA_TAB VALUES (6, 'male', 5540.4);
INSERT INTO DATA_TAB VALUES (7, 'male', 4500.9);
INSERT INTO DATA_TAB VALUES (8, 'female', 6000.8);
INSERT INTO DATA_TAB VALUES (9, 'male', 7120.8);
INSERT INTO DATA_TAB VALUES (10, 'female', 8120.9);
INSERT INTO DATA_TAB VALUES (11, 'female', 7453.9);
INSERT INTO DATA_TAB VALUES (12, 'male', 7643.8);
INSERT INTO DATA_TAB VALUES (13, 'male', 6754.3);
INSERT INTO DATA_TAB VALUES (14, 'male', 6759.9);
INSERT INTO DATA_TAB VALUES (15, 'male', 9876.5);
INSERT INTO DATA_TAB VALUES (16, 'female', 9873.2);
INSERT INTO DATA_TAB VALUES (17, 'male', 9889.9);
INSERT INTO DATA_TAB VALUES (18, 'male', 9910.4);
INSERT INTO DATA_TAB VALUES (19, 'male', 7809.3);
INSERT INTO DATA_TAB VALUES (20, 'female', 8705.7);
INSERT INTO DATA_TAB VALUES (21, 'male', 8756.0);
INSERT INTO DATA_TAB VALUES (22, 'female', 7843.2);
INSERT INTO DATA_TAB VALUES (23, 'male', 8576.9);
INSERT INTO DATA_TAB VALUES (24, 'male', 9560.9);
INSERT INTO DATA_TAB VALUES (25, 'female', 8794.9);
DROP TABLE #CONTROL_TAB;
CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TAB ("Name" VARCHAR (50), "intArgs" INTEGER, "doubleArgs" DOUBLE, "stringArgs" VARCHAR (100));
INSERT INTO #CONTROL_TAB VALUES ('SAMPLING_METHOD', 0, null, null);
INSERT INTO #CONTROL_TAB VALUES ('SAMPLING_SIZE', 8, null, null);
--INSERT INTO #CONTROL_TAB VALUES ('PERCENTAGE', NULL, 0.1, null);
INSERT INTO #CONTROL_TAB VALUES ('THREAD_NUMBER', 2, null, null);
INSERT INTO #CONTROL_TAB VALUES ('INTERVAL', 5, null, null);
INSERT INTO #CONTROL_TAB VALUES ('STRATA_NUM', 3, null, null);
INSERT INTO #CONTROL_TAB VALUES ('STRATA1_COUNT', 9, null, null);
INSERT INTO #CONTROL_TAB VALUES ('STRATA2_COUNT', 9, null, null);
INSERT INTO #CONTROL_TAB VALUES ('STRATA3_COUNT', 7, null, null);
DROP TABLE RESULT_TAB;
CREATE TABLE RESULT_TAB ("RESULT_EMPNO" INT, "RESULT_GENDER" VARCHAR (50), "RESULT_INCOME" DOUBLE);
CALL _SYS_AFL.SAMPLING_TEST(DATA_TAB, "#CONTROL_TAB", RESULT_TAB) WITH OVERVIEW;
SELECT * FROM RESULT_TAB;
Expected Result If method is 0 and SAMPLING_SIZE is 8:
If method is 1 and SAMPLING_SIZE is 8:
If method is 3 and INTERVAL is 5:
If method is 0 and PERCENTAGE is 0.1:
3.5.4 Scaling Range
This function scales attribute data so that the values fall within a specified range, such as -1.0 to 1.0 or 0.0 to 1.0. Normalization is particularly useful for classification algorithms involving neural networks, or distance measurements such as nearest neighbor classification and clustering. There are many data normalization methods. In PAL, the scaling range algorithm includes three methods: min-max normalization, z-score normalization, and normalization by decimal scaling.
Min-max normalization performs a linear transformation on the original data. Suppose that $min_A$ and $max_A$ are the minimum and maximum values of an attribute A. Min-max normalization maps a value $v$ of A to $v'$ in the range $[new\_min_A, new\_max_A]$ by computing
$$v' = \frac{(v - min_A)\,(new\_max_A - new\_min_A)}{max_A - min_A} + new\_min_A$$
In z-score normalization (or zero-mean normalization), the values for an attribute A are normalized based on the mean and standard deviation of A. A value $v$ of A is normalized to $v'$ by computing
$$v' = \frac{v - \bar{A}}{\sigma_A}$$
where $\bar{A}$ and $\sigma_A$ are the mean and standard deviation, respectively, of attribute A. This method of normalization is useful when the actual minimum and maximum of attribute A are unknown, or when there are outliers that dominate the min-max normalization.
Normalization by decimal scaling normalizes by moving the decimal point of the values of attribute A. The number of decimal places moved depends on the maximum absolute value of A. A value $v$ of A is normalized to $v'$ by computing
$$v' = \frac{v}{10^{j}}$$
where $j$ is the smallest integer such that $\max(|v'|) < 1$.
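The three normalization methods can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not PAL's implementation: the function names are hypothetical, the z-score uses the population standard deviation, and the decimal scaling exponent j is assumed to be non-negative.

```python
import statistics


def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Min-max normalization; assumes at least two distinct values."""
    lo, hi = min(values), max(values)
    return [(v - lo) * (new_max - new_min) / (hi - lo) + new_min for v in values]


def z_score_scale(values):
    """Z-score normalization (population standard deviation is an assumption)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def decimal_scale(values):
    """Decimal scaling: divide by 10**j, with j the smallest non-negative
    integer such that every scaled absolute value is below 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


print(min_max_scale([0.0, 5.0, 10.0], -1.0, 1.0))
print(z_score_scale([2, 4, 4, 4, 5, 5, 7, 9]))
print(decimal_scale([6.0, 38.7]))
```

For the value 38.7, decimal scaling needs j = 2, because 38.7 / 10 is still not below 1.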
Prerequisites
The input data does not contain null values. The data is numeric, not categorical.
SCALINGRANGE
This function normalizes the data.
Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>','AFLPAL','SCALINGRANGE', <signature table>);
The signature table should contain the following records:
Index  Table Type Name         Direction
1      <INPUT table type>      in
2      <PARAMETER table type>  in
3      <OUTPUT table type>     out
Procedure Calling
CALL <procedure name>(<input table>, <parameter table>, <output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.
Signature
Input Table
Table  Column         Description
Data   1st column     ID
       Other columns  Variable Xn

Parameter Table
Name            Data Type          Description
SCALING_METHOD  Integer            Scaling method: 0: Min-max normalization; 1: Z-Score normalization; 2: Decimal scaling normalization
THREAD_NUMBER   Integer            Number of threads
NEW_MAX         Double or integer  The new maximum value of the min-max normalization method
NEW_MIN         Double or integer  The new minimum value of the min-max normalization method
Output Table
Table   Column         Description
Result  1st column     ID
        Other columns  Variable Xn
Example
Assume that: DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.
SET SCHEMA DM_PAL;
DROP TYPE DATA_T;
CREATE TYPE DATA_T AS TABLE("ID" INT, "X1" DOUBLE, "X2" DOUBLE);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR (50), "intArgs" INTEGER, "doubleArgs" DOUBLE, "stringArgs" VARCHAR (100));
DROP TYPE RESULT_T;
CREATE TYPE RESULT_T AS TABLE("ID" INT, "PRE_X1" DOUBLE, "PRE_X2" DOUBLE);
DROP TABLE PDATA;
CREATE COLUMN TABLE PDATA("ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100));
INSERT INTO PDATA VALUES (1, 'DM_PAL.DATA_T', 'in');
INSERT INTO PDATA VALUES (2, 'DM_PAL.CONTROL_T', 'in');
INSERT INTO PDATA VALUES (3, 'DM_PAL.RESULT_T', 'out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
CALL SYSTEM.afl_wrapper_generator('SCALINGRANGE_TEST', 'AFLPAL', 'SCALINGRANGE', PDATA);
DROP TABLE DATA_TAB;
CREATE COLUMN TABLE DATA_TAB ("ID" INT, "X1" DOUBLE, "X2" DOUBLE);
INSERT INTO DATA_TAB VALUES (0, 6.0, 9.0);
INSERT INTO DATA_TAB VALUES (1, 12.1, 8.3);
INSERT INTO DATA_TAB VALUES (2, 13.5, 15.3);
INSERT INTO DATA_TAB VALUES (3, 15.4, 18.7);
INSERT INTO DATA_TAB VALUES (4, 10.2, 19.8);
INSERT INTO DATA_TAB VALUES (5, 23.3, 20.6);
INSERT INTO DATA_TAB VALUES (6, 24.4, 24.3);
INSERT INTO DATA_TAB VALUES (7, 30.6, 25.3);
INSERT INTO DATA_TAB VALUES (8, 32.5, 27.6);
INSERT INTO DATA_TAB VALUES (9, 25.6, 28.5);
INSERT INTO DATA_TAB VALUES (10, 38.7, 29.4);
INSERT INTO DATA_TAB VALUES (11, 38.7, 29.4);
DROP TABLE #CONTROL_TAB;
CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TAB ("Name" VARCHAR (50), "intArgs" INTEGER, "doubleArgs" DOUBLE, "stringArgs" VARCHAR (100));
INSERT INTO #CONTROL_TAB VALUES ('SCALING_METHOD', 2, null, null);
INSERT INTO #CONTROL_TAB VALUES ('THREAD_NUMBER', 2, null, null);
INSERT INTO #CONTROL_TAB VALUES ('NEW_MAX', null, 1.0, null);
INSERT INTO #CONTROL_TAB VALUES ('NEW_MIN', null, 0.0, null);
DROP TABLE RESULT_TAB;
CREATE TABLE RESULT_TAB ("ID" INT, "PRE_X1" DOUBLE, "PRE_X2" DOUBLE);
CALL _SYS_AFL.SCALINGRANGE_TEST(DATA_TAB, "#CONTROL_TAB", RESULT_TAB) with overview;
SELECT * FROM RESULT_TAB;
Expected Result If method is 0:
If method is 1:
If method is 2:
3.5.5 Variance Test
Variance Test is a method to identify the outliers of n numeric data values $\{x_i\}$, where $0 < i < n + 1$, using the mean $\mu$ and the standard deviation $\sigma$ of the n values.
Below is the algorithm for Variance Test:
1. Calculate the mean ($\mu$) and the standard deviation ($\sigma$):
$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$$
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}$$
2. Set the upper and lower bounds as follows:
Upper-bound = $\mu$ + multiplier * $\sigma$
Lower-bound = $\mu$ - multiplier * $\sigma$
Where the multiplier is a double type coefficient provided by the user to test whether all the values of a numeric vector are in the range. If a value is outside the range, it does not pass the Variance Test and is marked as an outlier.
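The two steps above can be sketched in Python as follows. This is an illustration, not PAL's implementation: the function name is hypothetical, and dividing by n (rather than n - 1) in the standard deviation is an assumption. The sample values are the 20 values inserted into DATA_TAB in the example of this section.

```python
import math


def variance_test(values, multiplier):
    """Return (mean, standard deviation, flags); flag 1 marks an outlier."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (division by n is an assumption).
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    lower = mean - multiplier * sd
    upper = mean + multiplier * sd
    flags = [0 if lower <= x <= upper else 1 for x in values]
    return mean, sd, flags


data = [25, 20, 23, 29, 26, 23, 22, 21, 22, 25,
        26, 28, 29, 27, 26, 23, 22, 23, 25, 103]
mean, sd, flags = variance_test(data, 3.0)
print(mean, sd)
print(flags)
```

With a multiplier of 3.0, only the value 103 lies outside the bounds in this sketch.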
Prerequisites
VARIANCETEST
This is a variance test function.
Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>','AFLPAL','VARIANCETEST', <signature table>);
The signature table should contain the following records:
Index  Table Type Name             Direction
1      <INPUT table type>          in
2      <PARAMETER table type>      in
3      <Result OUTPUT table type>  out
4      <Test OUTPUT table type>    out
Procedure Calling
CALL <procedure name>(<input table>, <parameter table>, <result output table>, <test output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.
Signature
Input Table
Table  Column
Data   1st column
       2nd column

Parameter Table
Name           Data Type  Description
SIGMA_NUM      Double     Multiplier for sigma
THREAD_NUMBER  Integer    Number of threads

Output Tables
Table   Column      Column Data Type    Description         Constraint
Result  1st column  Double              Mean value
        2nd column  Double              Standard deviation
Test    1st column  Integer or varchar  ID
        2nd column  Integer             Result output       0: in bounds; 1: out of bounds
Example
Assume that: DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.
SET SCHEMA DM_PAL;
DROP TYPE DATA_T;
CREATE TYPE DATA_T AS TABLE("ID" INT, "X" DOUBLE);
DROP TYPE RESULT_T;
CREATE TYPE RESULT_T AS TABLE("MEAN" DOUBLE, "SD" DOUBLE);
DROP TYPE TEST_T;
CREATE TYPE TEST_T AS TABLE("ID" INT, "Test" INT);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));
DROP TABLE PDATA;
CREATE COLUMN TABLE PDATA("ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100));
INSERT INTO PDATA VALUES (1, 'DM_PAL.DATA_T', 'in');
INSERT INTO PDATA VALUES (2, 'DM_PAL.CONTROL_T', 'in');
INSERT INTO PDATA VALUES (3, 'DM_PAL.RESULT_T', 'out');
INSERT INTO PDATA VALUES (4, 'DM_PAL.TEST_T', 'out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
CALL SYSTEM.afl_wrapper_generator('palVarianceTest', 'AFLPAL', 'VARIANCETEST', PDATA);
DROP TABLE #CONTROL_TAB;
CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TAB ("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));
INSERT INTO #CONTROL_TAB VALUES ('SIGMA_NUM', null, 3.0, null);
INSERT INTO #CONTROL_TAB VALUES ('THREAD_NUMBER', 8, null, null);
DROP TABLE DATA_TAB;
CREATE COLUMN TABLE DATA_TAB ("ID" INT, "X" DOUBLE);
INSERT INTO DATA_TAB VALUES (0, 25);
INSERT INTO DATA_TAB VALUES (1, 20);
INSERT INTO DATA_TAB VALUES (2, 23);
INSERT INTO DATA_TAB VALUES (3, 29);
INSERT INTO DATA_TAB VALUES (4, 26);
INSERT INTO DATA_TAB VALUES (5, 23);
INSERT INTO DATA_TAB VALUES (6, 22);
INSERT INTO DATA_TAB VALUES (7, 21);
INSERT INTO DATA_TAB VALUES (8, 22);
INSERT INTO DATA_TAB VALUES (9, 25);
INSERT INTO DATA_TAB VALUES (10, 26);
INSERT INTO DATA_TAB VALUES (11, 28);
INSERT INTO DATA_TAB VALUES (12, 29);
INSERT INTO DATA_TAB VALUES (13, 27);
INSERT INTO DATA_TAB VALUES (14, 26);
INSERT INTO DATA_TAB VALUES (15, 23);
INSERT INTO DATA_TAB VALUES (16, 22);
INSERT INTO DATA_TAB VALUES (17, 23);
INSERT INTO DATA_TAB VALUES (18, 25);
INSERT INTO DATA_TAB VALUES (19, 103);
DROP TABLE RESULT_TAB;
CREATE COLUMN TABLE RESULT_TAB ("MEAN" DOUBLE, "SD" DOUBLE);
DROP TABLE TEST_TAB;
CREATE COLUMN TABLE TEST_TAB ("ID" INT, "Test" INT);
CALL _SYS_AFL.palVarianceTest(DATA_TAB, "#CONTROL_TAB", RESULT_TAB, TEST_TAB) with overview;
SELECT * FROM RESULT_TAB;
SELECT * FROM TEST_TAB;
TEST_TAB:
3.6 Miscellaneous
3.6.1 ABC Analysis
This algorithm is used to classify objects (such as customers, employees, or products) based on a particular measure (such as revenue or profit). It suggests that the inventories of an organization are not of equal value and can therefore be grouped into three categories (A, B, and C) by their estimated importance. A items are very important for an organization. B items are of medium importance, that is, less important than A items and more important than C items. C items are of the least importance.
An example of ABC classification is as follows:
A items: 20% of the items (customers) account for 70% of the revenue.
B items: 30% of the items (customers) account for 20% of the revenue.
C items: 50% of the items (customers) account for 10% of the revenue.
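One common way to carry out such a classification is to rank the items by value and assign classes by cumulative share of the total. The Python sketch below follows that approach. It is not PAL's implementation: the function name, the item names, and the rule for items that cross a class boundary are assumptions.

```python
def abc_classify(items, percent_a, percent_b):
    """Classify (name, value) pairs into A, B, or C by cumulative share.

    Assumption: an item's class is decided by the cumulative share of the
    total reached after including that item. Items beyond A and B fall into C.
    """
    total = sum(value for _, value in items)
    ranked = sorted(items, key=lambda pair: pair[1], reverse=True)
    result = []
    running = 0.0
    for name, value in ranked:
        running += value
        share = running / total
        if share <= percent_a:
            label = "A"
        elif share <= percent_a + percent_b:
            label = "B"
        else:
            label = "C"
        result.append((label, name))
    return result


# Hypothetical items; the values sum to 100 so the shares are easy to read.
items = [("p3", 12.0), ("p1", 50.0), ("p5", 5.0), ("p2", 25.0), ("p4", 8.0)]
for label, name in abc_classify(items, 0.7, 0.2):
    print(label, name)
```

Here the cumulative shares are 50%, 75%, 87%, 95%, and 100%, so p1 is an A item, p2 and p3 are B items, and p4 and p5 are C items.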
Prerequisites
Input data cannot contain null values. The item names in the Input table must be of string data type and be unique.
ABC
This function performs the ABC analysis algorithm.
Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>','AFLPAL','ABC', <signature table>);
The signature table should contain the following records:
Index  Table Type Name         Direction
1      <INPUT table type>      in
2      <PARAMETER table type>  in
3      <OUTPUT table type>     out
Signature
Input Table
Table  Column      Column Data Type  Description
Data   1st column  VARCHAR/CHAR      Item name
       2nd column  Double            Value

Parameter Table
Name           Data Type  Description
THREAD_NUMBER  Integer    Number of threads
PERCENT_A      Double     Interval for A class
PERCENT_B      Double     Interval for B class
PERCENT_C      Double     Interval for C class

Output Table
Table   Column      Column Data Type  Description
Result  1st column  VARCHAR/CHAR      ABC class
        2nd column  VARCHAR/CHAR      Items

Example
Assume that: DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.
SET SCHEMA DM_PAL;
DROP TYPE DATA_T;
CREATE TYPE DATA_T AS TABLE("ITEM" VARCHAR(100), "VALUE" DOUBLE);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));
DROP TYPE RESULT_T;
CREATE TYPE RESULT_T AS TABLE("ABC" VARCHAR(10), "ITEM" VARCHAR(100));
DROP TABLE PDATA;
CREATE COLUMN TABLE PDATA("ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100));
INSERT INTO PDATA VALUES (1, 'DM_PAL.DATA_T', 'in');
INSERT INTO PDATA VALUES (2, 'DM_PAL.CONTROL_T', 'in');
INSERT INTO PDATA VALUES (3, 'DM_PAL.RESULT_T', 'out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
CALL SYSTEM.afl_wrapper_generator('PAL_ABC', 'AFLPAL', 'ABC', PDATA);
DROP TABLE #CONTROL_TBL;
CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TBL ("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));
INSERT INTO #CONTROL_TBL VALUES ('THREAD_NUMBER', 1, null, null);
INSERT INTO #CONTROL_TBL VALUES ('PERCENT_A', null, 0.7, null);
INSERT INTO #CONTROL_TBL VALUES ('PERCENT_B', null, 0.2, null);
INSERT INTO #CONTROL_TBL VALUES ('PERCENT_C', null, 0.1, null);
DROP TABLE TESTABCTAB;
CREATE COLUMN TABLE TESTABCTAB("ITEM" VARCHAR(100), "VALUE" DOUBLE);
INSERT INTO TESTABCTAB VALUES ('item1', 15.4);
INSERT INTO TESTABCTAB VALUES ('item2', 200.4);
INSERT INTO TESTABCTAB VALUES ('item3', 280.4);
INSERT INTO TESTABCTAB VALUES ('item4', 100.9);
INSERT INTO TESTABCTAB VALUES ('item5', 40.4);
INSERT INTO TESTABCTAB VALUES ('item6', 25.6);
INSERT INTO TESTABCTAB VALUES ('item7', 18.4);
INSERT INTO TESTABCTAB VALUES ('item8', 10.5);
INSERT INTO TESTABCTAB VALUES ('item9', 96.15);
INSERT INTO TESTABCTAB VALUES ('item10', 9.4);
DROP TABLE RESULT_TBL;
CREATE COLUMN TABLE RESULT_TBL("ABC" VARCHAR(10), "ITEM" VARCHAR(100));
CALL _SYS_AFL.PAL_ABC(TESTABCTAB, "#CONTROL_TBL", RESULT_TBL) with overview;
SELECT * FROM RESULT_TBL;
Expected Result RESULT_TBL:
3.6.2 Weighted Score Table
A weighted score table is a method of evaluating alternatives when the importance of each criterion differs. In a weighted score table, each alternative is given a score for each criterion. These scores are then weighted by the importance of each criterion. All of an alternative's weighted scores are then added together to calculate its total weighted score. The alternative with the highest total score should be the best alternative.
You can use weighted score tables to make predictions about future customer behavior. You first create a model based on historical data in the data mining application, and then apply the model to new data to make the prediction. The prediction, that is, the output of the model, is called a score. You can create a single score for your customers by taking into account different dimensions.
A function defined by weighted score tables is a linear combination of functions of a variable:
$$f(x_1, \ldots, x_n) = w_1 f_1(x_1) + \cdots + w_n f_n(x_n)$$
Prerequisites
The input data does not contain null values. The columns of the Map Function table are sorted by the attribute order of the Input Data table.
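The linear combination above can be sketched in Python as follows. This only illustrates the formula, not the WEIGHTEDTABLE function: the function name, the per-attribute score functions, and the sample weights are hypothetical, and how PAL maps continuous attributes to scores is not modeled here.

```python
def weighted_score(row, weights, score_functions):
    """Compute w1*f1(x1) + ... + wn*fn(xn) for one row of attribute values."""
    return sum(w * f(x) for w, f, x in zip(weights, score_functions, row))


# Hypothetical score functions, one per attribute.
gender_scores = {"male": 2.0, "female": 1.5}


def gender_score(gender):
    # Discrete attribute: look the score up by key.
    return gender_scores[gender]


def income_score(income):
    # Hypothetical step function for a continuous attribute.
    return 1.0 if income >= 5500 else 0.0


customer = ("male", 6000)
print(weighted_score(customer, [0.5, 2.0], [gender_score, income_score]))
```

For this customer the score is 0.5 * 2.0 + 2.0 * 1.0.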
WEIGHTEDTABLE
This function performs weighted table calculation. It is similar to the Volume Driver function in the Business Function Library (BFL). Volume Driver calculates only one column, but weightedTable calculates multiple columns at the same time.
Procedure Generation
CALL SYSTEM.AFL_WRAPPER_GENERATOR('<procedure name>', 'AFLPAL', 'WEIGHTEDTABLE', <signature table>);
The signature table should contain the following records:
Index  Table Type Name             Direction
1      <Data INPUT table type>     in
2      <Map INPUT table type>      in
3      <Control INPUT table type>  in
4      <PARAMETER table type>      in
5      <OUTPUT table type>         out
Procedure Calling
CALL <procedure name>(<data input table>, <map input table>, <control input table>, <parameter table>, <output table>) with overview;
The procedure name is the same as specified in the procedure generation. The input, parameter, and output tables must be of the types specified in the signature table.
Signature
Input Tables
Table: Target/Input Data
Column: Columns
Column Data Type: Varchar, char, integer, or double
Description: Specifies the data used to calculate the scores
Constraint: Discrete value: integer, string, double. Continuous value: integer, double. An ID column is mandatory. Its data type should be integer.

Table: Map Function
Column: Columns
Column Data Type: Varchar, char, integer, or double
Description: Creates the map function
Constraint: Every attribute (except ID) in the Input Data table maps to two columns in the Map Function table: Key column and Value column. The Value column must be of double type.

Table: Control
Column: Columns
Column Data Type: Integer or double
Constraint: This table has three columns. When the Input Data table has n attributes (except ID), the Weight Table will have n rows.

Output Table
Column      Column Data Type  Description
1st column  Integer           ID
2nd column  Double            Result value
Example
Assume that: DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.
SET SCHEMA DM_PAL;
DROP TYPE PAL_DATA_T;
CREATE TYPE PAL_DATA_T AS TABLE( "ID" INT, "GENDER" VARCHAR(10), "INCOME" INT, "HEIGHT" DOUBLE );
DROP TYPE PAL_MAP_FUN_T;
CREATE TYPE PAL_MAP_FUN_T AS TABLE( "GENDER" VARCHAR(10), "VAL1" DOUBLE, "INCOME" INT, "VAL2" DOUBLE, "HEIGHT" DOUBLE, "VAL3" DOUBLE );
DROP TYPE PAL_PARA_T;
CREATE TYPE PAL_PARA_T AS TABLE( "WEIGHT" DOUBLE, "ISDIS" INT, "ROWNUM" INT );
DROP TYPE PAL_CONTROL_T;
CREATE TYPE PAL_CONTROL_T AS TABLE( "NAME" VARCHAR (50), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR (100) );
DROP TYPE PAL_RESULT_T;
CREATE TYPE PAL_RESULT_T AS TABLE( "ID" INT, "RESULT" DOUBLE );
-- create procedure
DROP TABLE PDATA;
CREATE COLUMN TABLE PDATA( "ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100) );
INSERT INTO PDATA VALUES (1, 'DM_PAL.PAL_DATA_T', 'in');
INSERT INTO PDATA VALUES (2, 'DM_PAL.PAL_MAP_FUN_T', 'in');
INSERT INTO PDATA VALUES (3, 'DM_PAL.PAL_PARA_T', 'in');
INSERT INTO PDATA VALUES (4, 'DM_PAL.PAL_CONTROL_T', 'in');
INSERT INTO PDATA VALUES (5, 'DM_PAL.PAL_RESULT_T', 'out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
CALL SYSTEM.afl_wrapper_generator('PAL_WEIGHTEDTABLE', 'AFLPAL', 'WEIGHTEDTABLE', PDATA);
DROP TABLE PAL_DATA_TAB;
CREATE COLUMN TABLE PAL_DATA_TAB ( "ID" INT, "GENDER" VARCHAR(10), "INCOME" INT, "HEIGHT" DOUBLE );
INSERT INTO PAL_DATA_TAB VALUES (0,'male',5000,1.73);
INSERT INTO PAL_DATA_TAB VALUES (1,'male',9000,1.80);
INSERT INTO PAL_DATA_TAB VALUES (2,'female',6000,1.55);
INSERT INTO PAL_DATA_TAB VALUES (3,'male',15000,1.65);
INSERT INTO PAL_DATA_TAB VALUES (4,'female',2000,1.70);
INSERT INTO PAL_DATA_TAB VALUES (5,'female',12000,1.65);
INSERT INTO PAL_DATA_TAB VALUES (6,'male',1000,1.65);
INSERT INTO PAL_DATA_TAB VALUES (7,'male',8000,1.60);
INSERT INTO PAL_DATA_TAB VALUES (8,'female',5500,1.85);
INSERT INTO PAL_DATA_TAB VALUES (9,'female',9500,1.85);
DROP TABLE PAL_MAP_FUN_TAB;
CREATE COLUMN TABLE PAL_MAP_FUN_TAB ( "GENDER" VARCHAR(10), "VAL1" DOUBLE, "INCOME" INT, "VAL2" DOUBLE, "HEIGHT" DOUBLE, "VAL3" DOUBLE );
INSERT INTO PAL_MAP_FUN_TAB VALUES ('male',2.0, 0,0.0, 1.5,0.0);
INSERT INTO PAL_MAP_FUN_TAB VALUES ('female',1.5, 5500,1.0, 1.6,1.0);
INSERT INTO PAL_MAP_FUN_TAB VALUES (null,0.0, 9000,2.0, 1.71,2.0);
INSERT INTO PAL_MAP_FUN_TAB VALUES (null,0.0, 12000,3.0, 1.80,3.0);
DROP TABLE PAL_PARA_TAB;
CREATE COLUMN TABLE PAL_PARA_TAB ( "WEIGHT" DOUBLE, "ISDIS" INT, "ROWNUM" INT );
INSERT INTO PAL_PARA_TAB VALUES (0.5,1,2);
INSERT INTO PAL_PARA_TAB VALUES (2.0,-1,4);
INSERT INTO PAL_PARA_TAB VALUES (1.0,-1,4);
DROP TABLE PAL_CONTROL_TAB;
CREATE COLUMN TABLE PAL_CONTROL_TAB ( "NAME" VARCHAR (50), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR (100) );
INSERT INTO PAL_CONTROL_TAB VALUES ('THREAD_NUMBER',2,null,null);
DROP TABLE PAL_RESULT_TAB;
CREATE COLUMN TABLE PAL_RESULT_TAB( "ID" INT, "RESULT" DOUBLE );
CALL _SYS_AFL.PAL_WEIGHTEDTABLE(PAL_DATA_TAB, PAL_MAP_FUN_TAB, PAL_PARA_TAB, PAL_CONTROL_TAB, PAL_RESULT_TAB) with overview;
SELECT * FROM PAL_RESULT_TAB;
End-to-End Scenarios
Scenario
You want to predict the segmentation/clustering of new customers for a supermarket. First use the K-means function in PAL to perform segmentation/clustering for existing customers in the supermarket. The output can then be used as the training data for the C4.5 Decision Tree function to predict the segmentation/clustering of new customers.
Technology Background
K-means clustering is a method of cluster analysis whereby the algorithm partitions N observations or records into K clusters, in which each observation belongs to the cluster with the nearest center. It is one of the most commonly used clustering algorithms.
Decision trees are powerful and popular tools for classification and prediction. Decision tree learning, used in statistics, data mining, and machine learning, uses a decision tree as a predictive model which maps the observations about an item to the conclusions about the item's target value.
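The nearest-center idea behind K-means can be sketched in Python as follows. This is a plain Lloyd-style iteration for illustration, not PAL's KMEANS function: the function names are hypothetical, the initial centers are picked by hand, and no normalization is applied. The points are the (AGE, INCOME) pairs used in Step 1 of this scenario.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centers, max_iterations=100):
    """Assign each point to its nearest center, then recompute the centers."""
    centers = list(initial_centers)
    assignment = []
    for _ in range(max_iterations):
        new_assignment = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the result is stable
        assignment = new_assignment
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centers


points = [(20, 100000), (21, 101000), (22, 102000),
          (30, 200000), (31, 201000), (32, 202000),
          (40, 400000), (41, 401000), (42, 402000)]
assignment, centers = kmeans(points, [points[0], points[3], points[6]])
print(assignment)
print(centers)
```

Because the income values are far larger than the ages, income dominates the distance in this unnormalized sketch.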
Implementation Steps
Assume that: DM_PAL is a schema belonging to USER1, who has been granted the privilege of executing SYSTEM.afl_wrapper_generator; and USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE role.
Step 1
Input customer data and use the K-means function to partition the data set into K clusters. In this example, nine rows of data will be input. K equals 3, which means the customers will be partitioned into three levels.
SET SCHEMA DM_PAL;
DROP TYPE PAL_KMEANS_RESASSIGN_T;
CREATE TYPE PAL_KMEANS_RESASSIGN_T AS TABLE( "ID" INT, "CENTER_ASSIGN" INT, "DISTANCE" DOUBLE );
DROP TYPE PAL_KMEANS_DATA_T;
CREATE TYPE PAL_KMEANS_DATA_T AS TABLE( "ID" INT, "AGE" DOUBLE, "INCOME" DOUBLE, primary key("ID") );
DROP TYPE PAL_KMEANS_CENTERS_T;
CREATE TYPE PAL_KMEANS_CENTERS_T AS TABLE( "CENTER_ID" INT, "V000" DOUBLE, "V001" DOUBLE );
DROP TYPE PAL_CONTROL_T;
CREATE TYPE PAL_CONTROL_T AS TABLE( "NAME" VARCHAR (50), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR (100) );
-- create kmeans procedure
DROP TABLE PDATA;
CREATE COLUMN TABLE PDATA( "ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100) );
INSERT INTO PDATA VALUES (1, 'DM_PAL.PAL_KMEANS_DATA_T', 'in');
INSERT INTO PDATA VALUES (2, 'DM_PAL.PAL_CONTROL_T', 'in');
INSERT INTO PDATA VALUES (3, 'DM_PAL.PAL_KMEANS_RESASSIGN_T', 'out');
INSERT INTO PDATA VALUES (4, 'DM_PAL.PAL_KMEANS_CENTERS_T', 'out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
CALL SYSTEM.afl_wrapper_generator('PAL_KMEANS', 'AFLPAL', 'KMEANS', PDATA);
DROP TABLE PAL_KMEANS_DATA_TAB;
CREATE COLUMN TABLE PAL_KMEANS_DATA_TAB( "ID" INT, "AGE" DOUBLE, "INCOME" DOUBLE, primary key("ID") );
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (0, 20, 100000);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (1, 21, 101000);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (2, 22, 102000);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (3, 30, 200000);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (4, 31, 201000);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (5, 32, 202000);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (6, 40, 400000);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (7, 41, 401000);
INSERT INTO PAL_KMEANS_DATA_TAB VALUES (8, 42, 402000);
DROP TABLE PAL_CONTROL_TAB;
CREATE COLUMN TABLE PAL_CONTROL_TAB( "NAME" VARCHAR (50), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR (100) );
INSERT INTO PAL_CONTROL_TAB VALUES ('THREAD_NUMBER',2,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('GROUP_NUMBER',3,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('INIT_TYPE',4,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('DISTANCE_LEVEL',2,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('MAX_ITERATION',100,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('EXIT_THRESHOLD',null,0.000001,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('NORMALIZATION',0,null,null);
--clean kmeans result
DROP TABLE PAL_KMEANS_RESASSIGN_TAB;
CREATE COLUMN TABLE PAL_KMEANS_RESASSIGN_TAB( "ID" INT, "CENTER_ASSIGN" INT, "DISTANCE" DOUBLE, primary key("ID") );
DROP TABLE PAL_KMEANS_CENTERS_TAB;
CREATE COLUMN TABLE PAL_KMEANS_CENTERS_TAB( "CENTER_ID" INT, "V000" DOUBLE, "V001" DOUBLE );
CALL _SYS_AFL.PAL_KMEANS(PAL_KMEANS_DATA_TAB, PAL_CONTROL_TAB, PAL_KMEANS_RESASSIGN_TAB, PAL_KMEANS_CENTERS_TAB) with overview;
SELECT * FROM PAL_KMEANS_CENTERS_TAB;
SELECT * FROM PAL_KMEANS_RESASSIGN_TAB;
DROP TABLE PAL_KMEANS_RESULT_TAB;
CREATE COLUMN TABLE PAL_KMEANS_RESULT_TAB( "AGE" DOUBLE, "INCOME" DOUBLE, "LEVEL" INT );
TRUNCATE TABLE PAL_KMEANS_RESULT_TAB;
INSERT INTO PAL_KMEANS_RESULT_TAB(
  SELECT PAL_KMEANS_DATA_TAB.AGE, PAL_KMEANS_DATA_TAB.INCOME, PAL_KMEANS_RESASSIGN_TAB.CENTER_ASSIGN
  FROM PAL_KMEANS_RESASSIGN_TAB
  INNER JOIN PAL_KMEANS_DATA_TAB ON PAL_KMEANS_RESASSIGN_TAB.ID = PAL_KMEANS_DATA_TAB.ID);
SELECT * FROM PAL_KMEANS_RESULT_TAB;
The result should show the following in PAL_KMEANS_RESULT_TAB.
Step 2 Use the above output as the training data of C4.5 Decision Tree. The C4.5 Decision Tree function will generate a tree model which maps the observations about an item to the conclusions about the item's target value. SET SCHEMA DM_PAL; DROP TYPE PAL_DATA_T; CREATE TYPE PAL_DATA_T AS TABLE( "AGE" DOUBLE, "INCOME" DOUBLE, "LEVEL" INT ); DROP TYPE PAL_JSONMODEL_T; CREATE TYPE PAL_JSONMODEL_T AS TABLE( "ID" INT, "JSONMODEL" VARCHAR(5000) ); DROP TYPE PAL_PMMLMODEL_T; CREATE TYPE PAL_PMMLMODEL_T AS TABLE( "ID" INT, "PMMLMODEL" VARCHAR(5000) ); DROP TYPE PAL_CONTROL_T; CREATE TYPE PAL_CONTROL_T AS TABLE( "NAME" VARCHAR (50), "INTARGS" INTEGER, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR(100) ); --create procedure DROP TABLE PDATA; CREATE COLUMN TABLE PDATA( "ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100) ); INSERT INTO PDATA VALUES (1, 'DM_PAL.PAL_DATA_T', 'in'); INSERT INTO PDATA VALUES (2, 'DM_PAL.PAL_CONTROL_T', 'in'); INSERT INTO PDATA VALUES (3, 'DM_PAL.PAL_JSONMODEL_T', 'out'); INSERT INTO PDATA VALUES (4, 'DM_PAL.PAL_PMMLMODEL_T', 'out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('PAL_CREATEDT', 'AFLPAL', 'CREATEDT', PDATA);
DROP TABLE PAL_TRAINING_TAB;
CREATE COLUMN TABLE PAL_TRAINING_TAB(
    "REGION" VARCHAR(50),
    "SALESPERIOD" VARCHAR(50),
    "REVENUE" Double,
    "CLASSLABEL" VARCHAR(50)
);
DROP TABLE PAL_CONTROL_TAB;
CREATE COLUMN TABLE PAL_CONTROL_TAB(
    "NAME" VARCHAR (50),
    "INTARGS" INTEGER,
    "DOUBLEARGS" DOUBLE,
    "STRINGARGS" VARCHAR (100)
);
INSERT INTO PAL_CONTROL_TAB VALUES ('PERCENTAGE',null,1.0,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('THREAD_NUMBER',2,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('IS_SPLIT_MODEL',1,null,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('PMML_EXPORT', 2, null, null);
INSERT INTO PAL_CONTROL_TAB VALUES ('CONTINUOUS_COL',1,102001,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('CONTINUOUS_COL',1,202001,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('CONTINUOUS_COL',0,23,null);
INSERT INTO PAL_CONTROL_TAB VALUES ('CONTINUOUS_COL',0,12,null);
DROP TABLE PAL_JSONMODEL_TAB;
CREATE COLUMN TABLE PAL_JSONMODEL_TAB(
    "ID" INT,
    "JSONMODEL" VARCHAR(5000)
);
DROP TABLE PAL_PMMLMODEL_TAB;
CREATE COLUMN TABLE PAL_PMMLMODEL_TAB(
    "ID" INT,
    "PMMLMODEL" VARCHAR(5000)
);
CALL _SYS_AFL.PAL_CREATEDT(PAL_KMEANS_RESULT_TAB, PAL_CONTROL_TAB, PAL_JSONMODEL_TAB, PAL_PMMLMODEL_TAB) with overview;
SELECT * FROM PAL_JSONMODEL_TAB;
SELECT * FROM PAL_PMMLMODEL_TAB;
Step 3
Use the above tree model to map each new customer to the level he or she belongs to.
SET SCHEMA DM_PAL;
DROP TYPE PAL_DATA_T;
CREATE TYPE PAL_DATA_T AS TABLE(
    "ID" INT,
    "AGE" DOUBLE,
    "INCOME" DOUBLE
);
DROP TYPE PAL_JSONMODEL_T;
CREATE TYPE PAL_JSONMODEL_T AS TABLE(
    "ID" INT,
    "JSONMODEL" VARCHAR(5000)
);
DROP TYPE PAL_CONTROL_T;
CREATE TYPE PAL_CONTROL_T AS TABLE(
    "NAME" VARCHAR (50),
    "INTARGS" INTEGER,
    "DOUBLEARGS" DOUBLE,
    "STRINGARGS" VARCHAR (100)
);
DROP TYPE PAL_RESULT_T;
CREATE TYPE PAL_RESULT_T AS TABLE(
    "ID" INT,
    "CLASSLABEL" VARCHAR(50)
);
-- create procedure
DROP TABLE PDATA;
CREATE COLUMN TABLE PDATA(
    "ID" INT,
    "TYPENAME" VARCHAR(100),
    "DIRECTION" VARCHAR(100)
);
INSERT INTO PDATA VALUES (1, 'DM_PAL.PAL_DATA_T', 'in');
INSERT INTO PDATA VALUES (2, 'DM_PAL.PAL_CONTROL_T', 'in');
INSERT INTO PDATA VALUES (3, 'DM_PAL.PAL_JSONMODEL_T', 'in');
INSERT INTO PDATA VALUES (4, 'DM_PAL.PAL_RESULT_T', 'out');
GRANT SELECT ON DM_PAL.PDATA to SYSTEM;
call SYSTEM.afl_wrapper_generator('PAL_PREDICTWITHDT', 'AFLPAL', 'PREDICTWITHDT', PDATA);
DROP TABLE PAL_DATA_TAB;
CREATE COLUMN TABLE PAL_DATA_TAB(
    "ID" INT,
    "AGE" DOUBLE,
    "INCOME" DOUBLE
);
INSERT INTO PAL_DATA_TAB VALUES (10, 20, 100003);
INSERT INTO PAL_DATA_TAB VALUES (11, 30, 200003);
INSERT INTO PAL_DATA_TAB VALUES (12, 40, 400003);
DROP TABLE PAL_CONTROL_TAB;
CREATE COLUMN TABLE PAL_CONTROL_TAB(
    "NAME" VARCHAR (50),
    "INTARGS" INTEGER,
    "DOUBLEARGS" DOUBLE,
    "STRINGARGS" VARCHAR (100)
);
INSERT INTO PAL_CONTROL_TAB VALUES ('THREAD_NUMBER',2,null,null);
DROP TABLE PAL_RESULT_TAB;
CREATE TABLE PAL_RESULT_TAB(
    "ID" INT,
    "CLASSLABEL" VARCHAR(50)
);
CALL _SYS_AFL.PAL_PREDICTWITHDT(PAL_DATA_TAB, PAL_CONTROL_TAB, PAL_JSONMODEL_TAB, PAL_RESULT_TAB) with overview;
SELECT * FROM PAL_RESULT_TAB;
The expected prediction result is as follows:
Best Practices
Create an SQL view for the input table if the table structure does not match what is specified in this guide.
Avoid null values in the input data. PAL functions cannot infer default values, so replace null values with defaults through an SQL statement (SQL view or SQL update).
Create the parameter table as a local temporary table to avoid table name conflicts.
If you do not use PMML export, you do not need to create a PMML output table to store the result. Set the PMML_EXPORT parameter to 0 and pass ? or null to the function.
When using the KMEANS function, different INIT_TYPE and NORMALIZATION settings may produce different results. You may need to try a few combinations of these two parameters to get the best result.
When using the APRIORIRULE function, the rule set can be huge in some circumstances. To avoid an excessively long runtime, set the MAXITEMLENGTH parameter to a smaller number, such as 2 or 3.
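The first three practices can be sketched in SQL as shown here. This is a minimal illustration only: the source table CUSTOMER_RAW, its column CUSTOMER_ID, the view name PAL_KMEANS_INPUT_V, and the replacement value 0 are hypothetical and do not come from this guide. Choose default values that make sense for your own data.

```sql
-- Hypothetical raw table whose layout differs from what the PAL function
-- expects (ID as the first column, followed by numeric attributes).
-- The view renames the key column and replaces nulls with a default.
CREATE VIEW PAL_KMEANS_INPUT_V AS
    SELECT "CUSTOMER_ID"       AS "ID",
           IFNULL("AGE", 0)    AS "AGE",     -- 0 is a placeholder default
           IFNULL("INCOME", 0) AS "INCOME"   -- 0 is a placeholder default
    FROM CUSTOMER_RAW;

-- Parameter table as a local temporary table: it is visible only to the
-- current session, so its name cannot clash with other users' tables.
CREATE LOCAL TEMPORARY COLUMN TABLE #PAL_CONTROL_TAB(
    "NAME" VARCHAR(50),
    "INTARGS" INTEGER,
    "DOUBLEARGS" DOUBLE,
    "STRINGARGS" VARCHAR(100)
);
INSERT INTO #PAL_CONTROL_TAB VALUES ('THREAD_NUMBER',2,null,null);
```

The view and the temporary table would then take the place of the data table and the parameter table in the procedure call, provided their column types match the table types registered for the wrapper procedure.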
