NG *****************/
There are three keywords to this course:
      1. Credit - Funds lent out on the 'credit' of the borrowers, to be repaid later.
      2. Risk - An environment of uncertainty. Risk, in business, refers to a situation where there is uncertainty in the outcome, which creates randomness in the cash flows.
      3. Modelling - A model is a replication of a real-life business problem. Models help us identify, ex ante, the probable outcomes that can happen. Models are constructed under certain assumptions, and they help in predicting the possible average outcomes based on the present average outcomes embodied in the data.
How does a bank make profit from credit business?
      -> A bank borrows from a low-risk segment and lends to a high-risk segment. It offers a lower rate of interest to its depositors and charges a higher rate of interest to its borrowers. For example: Mr.A keeps Rs.10000 in ABC bank. Mr.B needs Rs.10000 as a personal loan. So ABC bank uses the money kept by Mr.A to finance Mr.B. Suppose the market rate of interest on personal loans is 14% and the rate of interest on savings accounts is 4%. Then the profit made by ABC on each Rupee lent out is (14% - 4%) = 10%.
      -> An overdraft (O/D) is a type of revolving credit that banks offer on current accounts. How does a bank make money on O/D? If an account is overdrawn, the bank generally borrows from the treasury and gives the money to the customer. The rate of interest on borrowing from the treasury is lower than the rate of interest charged on the overdrawn current account. Therefore, the interest differential is the profit of the bank, or the Credit interest income. Some banks also pay interest on current accounts if funds are maintained there. In this case, the bank lends these funds to the treasury and earns interest. So the interest income from this source is the interest earned from the treasury less the interest paid to the customer. This is known as the Debit interest income. The Credit interest income + Debit interest income is together called the Net Interest Income (NII).
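The spread arithmetic above can be written out as a small Python sketch. The 14% and 4% rates and the Rs.10000 amount come from the example; the overdraft figures (balances, the 12% overdraft rate, the 2% rate paid on credit balances and the 6% treasury rate) are hypothetical values, used only to illustrate how the two components add up to the NII.

```python
# Spread on the Mr.A / Mr.B example (figures taken from the text).
loan_rate = 0.14        # rate charged on the personal loan
deposit_rate = 0.04     # rate paid on the savings account
principal = 10_000      # Rs. lent to Mr.B out of Mr.A's deposit
annual_spread_income = principal * (loan_rate - deposit_rate)


def net_interest_income(overdrawn_balance, overdraft_rate,
                        credit_balance, rate_paid_to_customer,
                        treasury_rate):
    """NII on current accounts, using the labels from the notes above."""
    # Overdrawn accounts: funded from treasury, charged at the overdraft rate.
    credit_interest_income = overdrawn_balance * (overdraft_rate - treasury_rate)
    # Accounts in credit: funds lent to treasury, less interest paid out.
    debit_interest_income = credit_balance * (treasury_rate - rate_paid_to_customer)
    return credit_interest_income + debit_interest_income


# Hypothetical current-account book: every number below is an assumption.
nii = net_interest_income(
    overdrawn_balance=50_000, overdraft_rate=0.12,
    credit_balance=200_000, rate_paid_to_customer=0.02,
    treasury_rate=0.06,
)
print(f"Spread income on the loan: Rs.{annual_spread_income:.2f}")
print(f"NII on current accounts:   Rs.{nii:.2f}")
```

Both figures are before defaults and operating costs, which is exactly where the risks discussed next come in.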
How is the risk coming into play?
      -> Now suppose that after paying back Rs.3000, Mr.B defaults and does not repay the rest.
            -> This is CREDIT RISK, where the borrower does not pay back the loan. This causes a randomness in the cash flows, which can manifest itself in different kinds of risk.
The next day Mr.A comes in to withdraw his money. What happens?
            -> The first risk that materialises is: the bank cannot pay Mr.A the entire sum of money. Hence, the bank goes in for liquidation, or asks the authorities to bail it out. This is known as Liquidity Risk.
            -> Once the bank files for liquidation, the market loses confidence in the bank. Therefore, the reputation of the bank is damaged. This is the case of Reputational risk.
Different kinds of risk that the bank faces:
 1. Credit Risk: This is the risk arising from lending to borrowers. Hence the factors which explain the borrower's risk also explain the credit risk.
 2. Market Risk: This is the risk of losses experienced due to fluctuations in the financial system or the entire market. Such risks include stock market crashes, price fluctuations, interest rate fluctuations etc.
 3. Operational Risk: Risks resulting from breakdowns in the internal procedures and operational inefficiencies of the bank. For ex: server breakdown, improper working of ATMs etc.
 4. Reputational Risk: Risk arising from a negative perception on the part of the customers.
 5. Liquidity Risk: Risk that the bank cannot meet its payment obligations when they fall due (as with Mr.A's withdrawal), or that an asset owner is unable to recover the full value of an asset when it is sold.
How can credit risk be managed?
   -> Identify the sources of the credit risk: the borrower's capacity to pay back the loan, the stability of the borrower, the willingness of the borrower to pay back, changes in the customer's risk profile, and the exposure of the borrower to macroeconomic fluctuations.
   -> Manage these sources from where the risks are likely to arise.
   -> To manage these sources of the borrower's risk, we need to identify the major drivers of the borrower's risk. That is where the role of credit risk models largely comes in.
For ex: A customer applies for a loan. The bank asks for all the prerequisite information about the customer and feeds it into the system. The business problem here is: is the customer a Good customer or a Bad customer? So if I am developing a model to identify whether an applicant is good or bad, then I need to technically define the dependent variable for my model. The dependent variable here is:
      Y = 1 if the customer is good
        = 0 otherwise.
We try to identify the chances of a customer being a good customer. In the statistical paradigm, we calculate the probability of the customer being a good or a bad customer, i.e. based on the variables we have to explain the behaviour of an average customer, we predict P(Y=1) and, as a result, P(Y=0) = 1 - P(Y=1). The decision rule which helps us do this is based on the Application score. From P(Y=1) an application score is calculated, and there is a pre-defined application cut-off score. If the application score of an applicant lies above the cut-off score, the customer is given the loan; if the score lies below the cut-off score, the application is rejected. This is known as the Application Scorecard. It is a basic credit risk model which helps in managing the credit risk that may arise at the time of origination of accounts in the books of the bank.
Now suppose the customer who applied for the loan has been granted it. This means that he passed all the application criteria and the affordability requirement, so the bank is mostly sure that the customer has the capacity to pay back the loan. What the bank is now concerned with is: is the customer willing to pay back the loan? This calls for analysing the behaviour of the customer. Banks want to understand the chances of the customer defaulting over the next 12 months. So the dependent variable for this business problem would be:
      Y = 1 if the customer defaults over the next 12 months
        = 0 otherwise.
So the bank wants to assess [...] objective is to calculate [...] the customer is assigned a [...] the Probability of default [...] score. [...]uld be a risk to the bank.
Most profitable customers for the bank are those who revolve a decent percentage of their credit line and make payments regularly to keep things in check. So the variables which capture the willingness of the customer to pay back are the most important for capturing the default probabilities of the customer. If, given the capacity to pay back, the willingness of the customer to pay back is low, then the chances of default will be higher. However, it can also be the case that the customer's capability to pay back has fallen. Therefore, the final model variables in a behaviour scorecard model of a credit card must capture both these aspects of the customer. These are captured using derived variables.
What are derived variables? -> Derived variables are variables which are derived from the raw data variables. Raw variables are also known as Primary variables. Some primary variables which are seen in credit card datasets:
      1. Date Variables: Account Opening Date, Date of first Origin with the Bank, Application Date, Date of Last Purchase, Date of Last Payment, As on Date, Account Closing Date.
      2. Unique identifiers or ID variables: CustomerID, AccountID, ProductID etc.
      3. Categorical Variables: Product_type, VIP_status, Fraud_Bankruptcy_indicator, Delinquency_status, Account Status etc.
      4. Numerical Variables: Amount of Payments made, No. of Payments made, Credit Limit, Balance Outstanding, Purchase amount, No. of purchases made, Number of Cash payments made, Amount of Cash payments made, Salary or the income of the customer etc.
Derived Variables are obtained as a function of the Primary variables. Some important derived variables that we can frame here are as follows:
  1. Highest Balance Ever in the last 12 months: Let the balance outstanding be BO1, BO2, ..., BO12 for the last 12 months. Then the Highest Balance Ever for the customer is MAX(BO1, BO2, ..., BO12). Why is this derived variable important to me? -> The higher the value of this variable for a given account, the higher is the tendency of the account to revolve the credit.
       Is it possible for me to say that a customer whose highest balance ever in the last 12 months is $2000 is more risky than a customer whose highest balance ever in the last 12 months is $900?
              -> This variable gives only a partial picture of the customer's riskiness. It might be that the person whose highest balance in the last twelve months is $2000 generally does not have that high an outstanding or, even if he has, he repays it over time. Therefore, we need variables which capture other aspects of the borrower, like his average outstanding and his payment propensities.
  2. Average outstanding in the last twelve months: Sum of the outstanding balances / total number of months (12). This gives us an idea of the average outstanding balance of a given account.
       Using 1 and 2 we may create a third variable as well. Let us call this variable:
                      V001 = Average balance in the last 12 months / Highest Balance Ever in the last 12 months.
       For a particular account: Average bal = $1800 and highest balance = $2000, therefore V001 = 1800/2000 = 0.9
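The three derived variables can be computed directly from the twelve monthly balances. The sketch below is only illustrative: the function name, the dictionary keys and the balance history are hypothetical, with the history chosen so that the average is $1800 and the highest balance is $2000, as in the worked figure above.

```python
def derived_balance_variables(balances):
    """Derive variables from the 12 month-end balances BO1..BO12."""
    if len(balances) != 12:
        raise ValueError("expected 12 monthly balances")
    highest = max(balances)                  # Highest Balance Ever (12 months)
    average = sum(balances) / len(balances)  # Average outstanding (12 months)
    # V001 is undefined when the account never carried a balance; returning
    # 0.0 in that case is an assumption made for this sketch.
    v001 = average / highest if highest > 0 else 0.0
    return {
        "highest_balance_12m": highest,
        "average_balance_12m": average,
        "V001": v001,
    }


# Hypothetical balance history for one account (in $).
balances = [1600, 1700, 1800, 1900, 2000, 1900,
            1800, 1700, 1800, 1800, 1800, 1800]
derived = derived_balance_variables(balances)
print(derived)
```

A V001 close to 1 means the account sits near its peak balance most of the time; a value close to 0 means the peak was a one-off.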
[...]
Account_id    As_on_date    Default_flag
PL_12356      31.03.2015    0
PL_12356      30.06.2015    0
PL_12356      30.09.2015    1
PL_12356      31.12.2015    1
Variable                                                    Category     Score
Current Utilisation                                         < 5%         28
                                                            [5%,15%]     21
                                                            [15%,25%]    17
                                                            [25%,40%]    10
                                                            > 50%        03
%time in delinquency > 0 in the last 12 months              < 5%         37
                                                            [5%,25%]     18
                                                            [25%,50%]    09
                                                            > 50%        00
Number of months with purchase > 0 in the last 12 months    < 3          27
                                                            [3,6]        20
                                                            [6,9]        16
                                                            > 9          05
Month on books                                              < 6          00
                                                            [6,18]       25
                                                            > 18         37
The base score is 88. So if a person has a score below 88, he is classified as a defaulter (Bad). If he has a score above 88, he is not considered a defaulter (Good).
An account has an outstanding of 20000 and the card limit is 50000. This customer had defaulted once in the last 12 months. He generally purchases on his credit card in just three months of the year and tries to repay the majority of the amount within the next 30 days. The score would be:
Current Utilisation = 20000/50000 = 0.4 = 40% -> Score = 10
%time in delinquency in the last 12 months = 1/12, about 8% -> Score = 18
Number of months with purchases > 0 = 3 -> Score = 20
Total score of the customer = 10 + 18 + 20 = 48 (Month on books is not given for this account, so it is not scored here). Now 48 < 88. Therefore, the customer is Bad as per the developed scorecard.
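The lookup-and-sum logic of this worked example can be sketched in Python as follows. The points and the 88 threshold are taken from the scorecard table; the variable names, the rule that a value on a shared bin edge (e.g. 15%) takes the first matching bin, and the treatment of a score of exactly 88 as Bad are assumptions of this sketch. The table defines no utilisation bin between 40% and 50%, so the sketch raises an error there rather than guessing.

```python
# Each characteristic maps to an ordered list of (bin test, points).
# The first matching bin wins (assumed tie-breaking for shared edges).
SCORECARD = {
    "current_utilisation_pct": [
        (lambda v: v < 5, 28),
        (lambda v: 5 <= v <= 15, 21),
        (lambda v: 15 <= v <= 25, 17),
        (lambda v: 25 <= v <= 40, 10),
        (lambda v: v > 50, 3),      # no bin is defined for (40%, 50%]
    ],
    "pct_time_delinquent_12m": [
        (lambda v: v < 5, 37),
        (lambda v: 5 <= v <= 25, 18),
        (lambda v: 25 <= v <= 50, 9),
        (lambda v: v > 50, 0),
    ],
    "months_with_purchase_12m": [
        (lambda v: v < 3, 27),
        (lambda v: 3 <= v <= 6, 20),
        (lambda v: 6 <= v <= 9, 16),
        (lambda v: v > 9, 5),
    ],
    "months_on_books": [
        (lambda v: v < 6, 0),
        (lambda v: 6 <= v <= 18, 25),
        (lambda v: v > 18, 37),
    ],
}
CUT_OFF = 88


def points_for(characteristic, value):
    """Return the points of the first bin that contains the value."""
    for matches, points in SCORECARD[characteristic]:
        if matches(value):
            return points
    raise ValueError(f"no bin defined for {characteristic}={value}")


def score_account(characteristics):
    """Sum the points over the characteristics supplied for the account."""
    return sum(points_for(name, value)
               for name, value in characteristics.items())


# The account from the worked example (Month on books is not given).
account = {
    "current_utilisation_pct": 100 * 20000 / 50000,
    "pct_time_delinquent_12m": 100 * 1 / 12,
    "months_with_purchase_12m": 3,
}
total = score_account(account)
decision = "Good" if total > CUT_OFF else "Bad"
print(total, decision)
```

Keeping the bins as data rather than as nested if-statements makes it easy to swap in a re-developed scorecard without touching the scoring logic.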
How to build a scorecard? -> We will discuss the main steps of building a scorecard.
/* **** STEPS OF BUILDING A SCORECARD **** */
Step-1. Understanding the Business Problem.
       -> Justification to the business about the model development. This involves putting forward arguments as to why the proposed model is to be developed.
          To provide the justification, the following lines of argument are seen frequently:
                a. The existing model is not performing well in terms of stability, accuracy or discriminatory power. Therefore, the reasons for the poor performance of the present model have to be identified. So any scorecard model development, in reality, begins with the validation of the existing model. Following are some of the important observations that the model developer can make:
                     -> The model's discriminatory power has deteriorated: deterioration of a scorecard is identified through the change in the Gini coefficient.
                     -> The population for which the scorecard was developed has changed: a large shift in the Population Stability Index is observed.
                     -> The variables used in the model have changed over time in terms of their characteristics: a large shift in the Variable Deviation Index and the Characteristic Stability Index is observed.
                     -> The segments in the model have changed: the segments do not rank order. It may be that over time the segments have shrunk in size, which prevents proper rank ordering.
LIST OF CONCEPTS :
      1. GINI COEFFICIENT 2. POPULATION STABILITY INDEX 3. CHARACTERISTIC STABILITY INDEX 4. VARIABLE DEVIATION INDEX 5. RANK ORDERING 6. SEGMENTATION
       -> Analysis of the portfolio: A portfolio is defined as a collection of loans. It is characterised by the Number of Accounts and the Receivables (or the balance outstanding) for the portfolio. A deep-dive analysis of the portfolio comprises understanding the balance by delinquency bucket. Looking into the distribution of the accounts and the balance by delinquency bucket gives the analyst an idea of the riskiness associated with the portfolio.
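Two of the validation measures named under Step-1 can be sketched in a few lines of Python. The notes do not give formulas, so the sketch uses the commonly used definitions: PSI as the sum over score bands of (actual share - expected share) * ln(actual share / expected share), and Gini as 2*AUC - 1. The function names, the score-band counts and the sample scores are hypothetical.

```python
import math


def population_stability_index(expected_counts, actual_counts):
    """PSI between the development (expected) and recent (actual) population."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both populations must use the same score bands")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    psi = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        e_share = expected / expected_total
        a_share = actual / actual_total
        if e_share == 0 or a_share == 0:
            raise ValueError("empty score band; merge bands before computing PSI")
        psi += (a_share - e_share) * math.log(a_share / e_share)
    return psi


def gini_coefficient(scores, is_bad):
    """Gini = 2*AUC - 1, with AUC the chance that a random good account
    scores higher than a random bad account (ties count as half)."""
    goods = [s for s, bad in zip(scores, is_bad) if not bad]
    bads = [s for s, bad in zip(scores, is_bad) if bad]
    if not goods or not bads:
        raise ValueError("need at least one good and one bad account")
    wins = 0.0
    for good_score in goods:
        for bad_score in bads:
            if good_score > bad_score:
                wins += 1.0
            elif good_score == bad_score:
                wins += 0.5
    auc = wins / (len(goods) * len(bads))
    return 2.0 * auc - 1.0


# Hypothetical accounts per score band: development sample vs recent month.
development = [200, 300, 300, 200]
recent = [180, 290, 310, 220]
print("PSI :", round(population_stability_index(development, recent), 4))

# Hypothetical scores with observed outcomes (1 = bad, 0 = good).
scores = [90, 80, 30, 70, 20]
outcomes = [0, 0, 0, 1, 1]
print("Gini:", round(gini_coefficient(scores, outcomes), 4))
```

A commonly quoted rule of thumb reads a PSI below 0.1 as a stable population and above 0.25 as a large shift, but the thresholds a bank actually uses are a policy choice. Comparing the Gini on the development sample with the Gini on recent data gives the "change in Gini" referred to above.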
Step-2. Defining the dependent variable and understanding the relevant independent variables.
       -> The dependent variable in a credit risk model is the variable which is to be modelled. For ex: In an application scorecard, the probability of the customer being a 'good' customer is estimated. Therefore, the dependent variable will be:
                  Y = 1 if the customer is good
                    = 0 otherwise       -> The aim is to develop a model to predict the chances of the customer being a good customer, i.e. P(Y=1).
Similarly, for developing a behaviour scorecard, the dependent variable in the model is:
                  Y = 1 if the customer is a defaulter
                    = 0 otherwise.      -> The problem is to model P(Y=1), i.e. the chances of the customer being a defaulter.
Dataset B
IDVar X3 X4
1
2
3
->
Master_Data
IDVar X1 X2 X3 X4
1
2
3
The variable IDVar is the unique merging key. One data challenge can be that the ID variables are maintained in different formats across the two datasets, which makes merging them very difficult. Therefore, we must ensure that a unique merging key exists. For retail banking data, Customer ID and Account ID are the two most widely used merging keys. For commercial portfolios, Obligor ID and Transaction ID are used as the common merging keys. (MERGING AND APPENDING IN SAS - data step merge and proc sql joins)
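The notes point to SAS (data step merge, proc sql joins) for this step; the sketch below shows the same idea in plain Python instead, so it is an illustration and not the SAS procedure. It assumes a first dataset holding X1 and X2 (only Dataset B and Master_Data are shown above), and the key formats ("001", " 2 ") and the normalisation rule are hypothetical examples of IDs being stored differently in the two sources.

```python
def normalise_key(raw):
    """Bring an ID to one canonical text form before merging.
    The rule used here (trim, upper-case, drop leading zeros) is an assumption."""
    text = str(raw).strip().upper()
    return text.lstrip("0") or "0"


def merge_on_key(left_rows, right_rows, key="IDVar"):
    """Inner-join two lists of dict rows on a normalised unique key."""
    right_index = {}
    for row in right_rows:
        k = normalise_key(row[key])
        if k in right_index:
            raise ValueError(f"merging key {k!r} is not unique in the right dataset")
        right_index[k] = row
    merged, unmatched = [], []
    for row in left_rows:
        k = normalise_key(row[key])
        match = right_index.get(k)
        if match is None:
            unmatched.append(row[key])      # keep for investigation
            continue
        combined = dict(row)
        combined.update({col: val for col, val in match.items() if col != key})
        combined[key] = k
        merged.append(combined)
    return merged, unmatched


# Hypothetical data: the same accounts, with IDs stored in different formats.
dataset_a = [
    {"IDVar": 1, "X1": 10, "X2": 0.5},
    {"IDVar": 2, "X1": 20, "X2": 0.7},
    {"IDVar": 3, "X1": 30, "X2": 0.9},
]
dataset_b = [
    {"IDVar": "001", "X3": "A", "X4": 7},
    {"IDVar": " 2 ", "X3": "B", "X4": 8},
    {"IDVar": "4", "X3": "C", "X4": 9},
]
master_data, unmatched = merge_on_key(dataset_a, dataset_b)
print(master_data)
print("IDs without a match:", unmatched)
```

Reporting the unmatched keys, rather than silently dropping them, is what reveals a format mismatch in the merging key early.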
      4. All the variables are in sync with their business definitions. For ex: Loan outstanding -> This variable cannot have a zero value for accounts which are on books. If an account is still on books, it means that some fraction of the loan is still outstanding. Therefore, Loan outstanding > 0. If Loan outstanding is 0 and the account is still open, then there is some data issue: this is actually a missing observation under Loan outstanding which has been recorded as 0. So, it is suggested that for such variables a frequency distribution is done for zero and non-zero values, and it is checked that the accounts which have Loan outstanding = 0 are actually closed. Another check of this kind is to see whether the account opening date > As on date. If this condition holds true, then there is some issue on the data entry side and the account opening date is actually missing. Such missing values need to be addressed.
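The two business-definition checks described above can be sketched as follows. The field names (account_id, status, loan_outstanding, opening_date, as_on_date) and the three sample accounts are hypothetical; a real dataset would use the bank's own variable names.

```python
from collections import Counter
from datetime import date


def business_definition_issues(account):
    """Return the business-definition violations found for one account."""
    issues = []
    # An account still on books should carry some outstanding balance.
    if account["status"] == "open" and account["loan_outstanding"] == 0:
        issues.append("zero loan outstanding on an open account")
    # An account cannot be opened after the snapshot (as-on) date.
    if account["opening_date"] > account["as_on_date"]:
        issues.append("account opening date after as-on date")
    return issues


# Hypothetical snapshot of three accounts.
accounts = [
    {"account_id": "A1", "status": "open", "loan_outstanding": 5200,
     "opening_date": date(2013, 5, 1), "as_on_date": date(2015, 3, 31)},
    {"account_id": "A2", "status": "open", "loan_outstanding": 0,
     "opening_date": date(2014, 1, 15), "as_on_date": date(2015, 3, 31)},
    {"account_id": "A3", "status": "closed", "loan_outstanding": 0,
     "opening_date": date(2016, 2, 1), "as_on_date": date(2015, 3, 31)},
]

# Frequency distribution of zero / non-zero outstanding by account status.
frequency = Counter(
    (a["status"], "zero" if a["loan_outstanding"] == 0 else "non-zero")
    for a in accounts
)
flagged = {a["account_id"]: business_definition_issues(a) for a in accounts}
print(frequency)
print(flagged)
```

Flagged values would then be treated as missing rather than as genuine zeros or dates.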
Step 4 - Data Quality Checks : One of the most important mandates faced by banks is to ensure that they maintain modelling data of a sufficiently high standard. To ensure the robustness of the data, certain checks need to be performed. In the risk modelling domain, datasets are maintained at different frequencies. For eg: some organisations maintain their data at quarterly intervals, others at monthly intervals. Banks mostly maintain data on a monthly basis, and monthly snapshots are used for the model development exercise. When data over a long time horizon is used, it becomes necessary to check whether the behaviour of a variable is consistent or robust over time. A list of checks is performed on the characteristics of the data. Such checks are known as Data quality checks.
Some basic nuances of the data quality check procedure:
      a. To check for recent database changes or changes in the data architecture of the organisation -> If there is a database change close to the model development period, then it is important to check the variables common to the two databases and identify whether they have the same values recorded at a given point of time. Also, the distribution of the common variables needs to be checked in the immediate neighbourhood of the time when the change of database took place.
      b. To check for the presence of variables over time -> (First Occurrence and Last Occurrence Analysis). For building business models like credit scorecards, this analysis is not always important because the span of time over which the data is considered is not very large. However, for developing Basel models this analysis is important because a sufficiently long period of time is considered for the model development data. Such periods are long enough to include policy changes and changes in management decisions. Thus, First Occurrence and Last Occurrence analysis is important for regulatory model building.
      c. To check for relevant variables and observations: For developing credit risk models, not all the variables in the database can be used. Only those variables which reflect credit and borrower's risk are included. Any variables which reflect operational risk or market risk are to be removed. For ex: operational variables like Cheque_Bk_number, Branch_code, etc. are removed from the analysis since they reflect operational aspects of the information. Similarly, not all accounts (observations) are used in the model development exercise. There are two types of exclusions: Observation exclusions and Performance exclusions. Accounts which satisfy the exclusion criteria in the observation period are called Observation exclusions. Similarly, accounts which satisfy the exclusion criteria in the performance period are known as Performance exclusions. Accounts which are fraud or bankrupt are removed since they are operational risk factors. Similarly, accounts which are deceased, closed or immaterial are removed from the analysis. For credit cards, 'lost cards' also form an important exclusion criterion.
      d. Identifying the missing percentage in the data: Some variables must be checked for missing observations in particular - ID variables, Date variables, Categorical variables (like product type, account type etc.).
      e. Descriptive univariate analysis of numerical and categorical variables: For numerical variables, univariate analysis comprises the basic measures of central tendency (Mean, Median, Mode), the basic measures of dispersion (range, variance, standard deviation) and the measures of location (Percentiles, Deciles, Quartiles etc.). The measures of location are very important for the identification and treatment of outliers. For categorical variables, the frequency distribution is used to analyse the behaviour of the variable over time. (PROC UNIVARIATE in SAS)
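Checks d and e can be sketched with Python's standard statistics module; the notes refer to PROC UNIVARIATE in SAS for this, so the code below is only an illustrative equivalent. The function names and the small credit_limit sample (with None standing for a missing observation) are hypothetical.

```python
import statistics


def missing_percentage(values):
    """Percentage of observations that are missing (None)."""
    if not values:
        raise ValueError("no observations supplied")
    return 100.0 * sum(v is None for v in values) / len(values)


def univariate_summary(values):
    """Basic univariate measures for a numerical variable, ignoring missing values."""
    present = [v for v in values if v is not None]
    return {
        # Central tendency
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "mode": statistics.mode(present),
        # Dispersion
        "range": max(present) - min(present),
        "variance": statistics.variance(present),   # sample variance
        "std_dev": statistics.stdev(present),       # sample standard deviation
        # Location: the three quartile cut points
        "quartiles": statistics.quantiles(present, n=4),
    }


# Hypothetical credit limits (in thousands) with one missing observation.
credit_limit = [10, 20, 20, 30, None, 40]
print(f"missing: {missing_percentage(credit_limit):.1f}%")
for measure, value in univariate_summary(credit_limit).items():
    print(measure, value)
```

Running the same summary on each monthly snapshot and comparing the results over time is what turns these one-off statistics into a data quality check.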
How do we infer about abnormal trends in the data quality exercise?
          -> A RED-GREEN trigger is used to identify abnormal trends in the behaviour of the values. A normal distribution is a symmetric distribution of a variable about its mean, i.e. a distribution which does not have any asymmetry created by the presence of extreme observations. Assuming that a variable is normally distributed, about 99.73% of the observations are expected to lie within +-3*Std_Deviation of the mean. So, if a standard normal variate is created:
                Z = (Value - Mean)/Std_Deviation, then if |Z| > 3 a RED is triggered, else (|Z| <= 3) a GREEN is triggered.
For ex: There is a balance_outstanding variable for the twelve months of 2015. For each month the average value is calculated and the trend of that average is checked:
Month     Mean_Outstnd_bal
201501    1500
201502    1800
201503    1650
201504    1570
201505    1770
201506    1590
201507    1700
201508    1850
201509    2000
201510    15000
201511    1680
201512    2200
For 201510 we see that the standard normal variate is greater than 3 (about 3.2 when the mean and the sample standard deviation are computed over the twelve monthly averages). Therefore, we can say that there was an abnormal trend in that month.
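The RED-GREEN trigger can be sketched on the twelve monthly averages from the example. The mean and standard deviation are computed over the same twelve values, using the sample standard deviation; that choice, the function name and the default threshold argument are assumptions of this sketch.

```python
from statistics import mean, stdev

# Monthly averages of balance_outstanding for 2015 (from the example).
monthly_means = {
    "201501": 1500, "201502": 1800, "201503": 1650, "201504": 1570,
    "201505": 1770, "201506": 1590, "201507": 1700, "201508": 1850,
    "201509": 2000, "201510": 15000, "201511": 1680, "201512": 2200,
}


def red_green_flags(series, threshold=3.0):
    """Flag each period RED if |Z| exceeds the threshold, else GREEN."""
    values = list(series.values())
    centre = mean(values)
    spread = stdev(values)          # sample standard deviation
    flags = {}
    for period, value in series.items():
        # A constant series has no spread, so nothing can be abnormal.
        z = 0.0 if spread == 0 else (value - centre) / spread
        flags[period] = "RED" if abs(z) > threshold else "GREEN"
    return flags


flags = red_green_flags(monthly_means)
for period, flag in flags.items():
    print(period, flag)
```

With only twelve points, the outlier itself inflates the mean and the standard deviation it is compared against, so the Z value for 201510 only just exceeds 3. On short series, some practitioners therefore compute the mean and standard deviation from a longer history or use a lower threshold.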
Step 5 - Variable Selection Process : This process describes the techniques for selecting the independent variables in the model. It acts as a waterfall of variables and helps us to zero in on the most important variables which we need for developing our scorecard.
          Given a variable X1 for my scorecard, when will I consider it to be a potential variable for my model? (Remember the dependent variable was Y = 1 if event, Y = 0 if non-event.) -> X1 will be included as an explanatory variable for Y if it has the capacity to distinguish between the Y = 1 and Y = 0 groups. What are the techniques that will help me know whether this variable has the capacity to distinguish between Y = 1 and Y = 0?
          -> Parametric and non-parametric Mean-difference tests (For numerical variables)
          -> Kolmogorov-Smirnov tests (KS test) - (For Categorical variables)
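To make the two techniques concrete, the sketch below computes a parametric mean-difference statistic (Welch's t, one possible choice of mean-difference test) and the two-sample KS statistic for one candidate variable split by Y. It only produces the test statistics, not p-values. The KS function is applied here to a numerical variable (it works the same way on ordered bins), which is an assumption of the sketch; the function names and the utilisation figures are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t_statistic(event_values, non_event_values):
    """Difference in group means divided by its standard error (Welch's t)."""
    standard_error = math.sqrt(
        variance(event_values) / len(event_values)
        + variance(non_event_values) / len(non_event_values)
    )
    return (mean(event_values) - mean(non_event_values)) / standard_error


def ks_statistic(event_values, non_event_values):
    """Largest gap between the cumulative distributions of the two groups."""
    cut_points = sorted(set(event_values) | set(non_event_values))
    largest_gap = 0.0
    for x in cut_points:
        share_events = sum(v <= x for v in event_values) / len(event_values)
        share_non_events = sum(v <= x for v in non_event_values) / len(non_event_values)
        largest_gap = max(largest_gap, abs(share_events - share_non_events))
    return largest_gap


# Hypothetical utilisation (%) for accounts with Y = 1 and with Y = 0.
utilisation_y1 = [80, 90, 85, 70]
utilisation_y0 = [20, 30, 25, 40]
print("Welch t:", round(welch_t_statistic(utilisation_y1, utilisation_y0), 2))
print("KS     :", ks_statistic(utilisation_y1, utilisation_y0))
```

A KS statistic near 0 means the variable is distributed almost identically in the two groups and is unlikely to help; a value near 1 means the groups barely overlap on that variable.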