
Employee Turnover Prediction

TEAM-1

• A.BHANU PRAKASH
• K.HANEETH
• R.SIREESHA
• N.ANJALI
Case Study:

The dataset consists of 14,999 observations (roughly 15,000) of 10 features, which are described below:


• satisfaction level (0–1)
• last evaluation (Time since last evaluation in years)
• number projects (Number of projects completed while at work)
• average_monthly_hours (Average monthly hours at the workplace)
• time_spend_company (Time spent at the company in years)
• Work accident (Whether the employee had a workplace accident)
• left (Whether the employee left the company: 1 = left, 0 = stayed)
• promotion_last_5years (Whether the employee was promoted in the last five years)
• sales (Department the employee works in)
• salary (Relative level of salary)
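As a sketch of how one row of this dataset might be held in code, the dataclass below mirrors the ten columns listed above. The snake_case field names, the `low`/`medium`/`high` salary levels and the `to_features` helper are assumptions for illustration, not taken from the original project. Because department (`sales`) and salary are categorical, they have to be turned into numbers before a logistic regression or an SVM can use them; this sketch ordinal-encodes salary and one-hot encodes the department.

```python
from dataclasses import dataclass
from typing import List

# Assumed salary levels and their ordinal codes (hypothetical; the slides only
# describe salary as a "relative level").
SALARY_LEVELS = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Employee:
    """One observation: nine input columns plus the `left` target."""
    satisfaction_level: float     # 0-1
    last_evaluation: float
    number_projects: int
    average_monthly_hours: float
    time_spend_company: int       # years at the company
    work_accident: int            # 1 = had a workplace accident
    promotion_last_5years: int    # 1 = promoted in the last five years
    sales: str                    # department
    salary: str                   # relative salary level
    left: int                     # target: 1 = left, 0 = stayed

    def to_features(self, departments: List[str]) -> List[float]:
        """Numeric columns, ordinal-encoded salary, then a one-hot department."""
        numeric = [
            self.satisfaction_level,
            self.last_evaluation,
            self.number_projects,
            self.average_monthly_hours,
            self.time_spend_company,
            self.work_accident,
            self.promotion_last_5years,
            SALARY_LEVELS[self.salary],
        ]
        one_hot = [1 if self.sales == d else 0 for d in departments]
        return numeric + one_hot


# Hypothetical example row.
example = Employee(0.38, 0.53, 2, 157, 3, 0, 0, "sales", "low", 1)
print(example.to_features(["hr", "sales", "technical"]))
```

The target `left` is deliberately left out of the feature vector so the model cannot see the answer it is asked to predict.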
Describe():
Problem Statement and Solution:

• The main objective of the case study is to retain the organization's employees.
• The prediction aims to reduce the organization's attrition rate using analytical methods.
• Analytics can help organizations control employee turnover through predictive models that inform retention strategies.
Data Exploration:

• First of all, let us find out the number of employees who left the company and the number who didn't:

• In our data, 3,571 employees left and 11,428 stayed. Let us get a sense of the numbers across these two classes.
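A minimal sketch of this exploration step in plain Python: the first lines recompute the class balance from the counts reported above (about 23.8% of employees left, so the classes are imbalanced), and the two helpers count the classes and average a feature within each class. The toy `rows` and the helper names are hypothetical; with pandas the same step would typically be `value_counts()` and `groupby("left").mean()`.

```python
from collections import Counter
from statistics import mean

# Class balance from the counts reported in the slides.
left, stayed = 3571, 11428
turnover_rate = left / (left + stayed)
print(f"{left + stayed} employees, turnover rate {turnover_rate:.1%}")

# Hypothetical toy rows standing in for the real data.
rows = [
    {"satisfaction_level": 0.38, "average_monthly_hours": 157, "left": 1},
    {"satisfaction_level": 0.80, "average_monthly_hours": 262, "left": 1},
    {"satisfaction_level": 0.72, "average_monthly_hours": 180, "left": 0},
    {"satisfaction_level": 0.65, "average_monthly_hours": 200, "left": 0},
]


def class_counts(records, target="left"):
    """How many records fall into each value of the target column."""
    return Counter(r[target] for r in records)


def class_means(records, feature, target="left"):
    """Mean of one numeric feature within each target class."""
    groups = {}
    for r in records:
        groups.setdefault(r[target], []).append(r[feature])
    return {cls: mean(values) for cls, values in groups.items()}


print(class_counts(rows))
print(class_means(rows, "average_monthly_hours"))
```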
Data Visualization:

• Let us visualize the data to get a clearer picture of it and of the most significant features.
• Bar chart of turnover frequency by the department the employee works for:

• The chart suggests that the frequency of employee turnover depends a great deal on the department the employee works for. Thus, department can be a good predictor of the outcome variable.
Logistic Regression Model

• Logistic regression is a technique that machine learning borrowed from the field of statistics. Here it models the probability that an employee leaves (left = 1) as a function of the other features.
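To show how logistic regression turns features into a turnover probability, here is a minimal sketch: the model estimates P(left = 1 | x) = σ(w·x + b), where σ is the logistic function, and fits w and b by batch gradient descent on the log-loss. The single toy feature, the learning rate and the epoch count are assumptions chosen for illustration; a library implementation such as scikit-learn's `LogisticRegression` would normally be used instead.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Estimated P(left = 1 | x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the average log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/dz for this row
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy data: one feature (satisfaction level), 1 = left.
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
print(round(predict_proba(w, b, [0.15]), 2), round(predict_proba(w, b, [0.85]), 2))
```

Because low satisfaction is labelled as leaving in the toy data, the fitted weight comes out negative, and the predicted probability can be read directly as an attrition-risk score.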
Random Forest

• Random Forest is a flexible, easy-to-use machine learning algorithm that produces good results most of the time, even without hyperparameter tuning.
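Random Forest combines many decision trees, each grown on a bootstrap sample of the rows and restricted to a random subset of features at every split, and predicts by majority vote. The from-scratch sketch below shows those three ingredients using Gini impurity and midpoint thresholds; the toy data, the defaults (`n_trees=25`, `max_depth=5`, about √(number of features) candidate features per split) and all function names are assumptions. In practice a library class such as scikit-learn's `RandomForestClassifier` would be used.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def best_split(X, y, features):
    """Lowest weighted-Gini (score, feature, threshold) among candidate features."""
    best = None
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint threshold, so both sides are non-empty
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth, n_sub, rng):
    """Grow one tree; each node considers a random subset of n_sub features."""
    if depth == 0 or len(set(y)) == 1:
        return majority(y)
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:  # the sampled feature has a single value in these rows
        return majority(y)
    _, f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth - 1, n_sub, rng),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth - 1, n_sub, rng))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=5, seed=0):
    """Bagging plus random feature subsets; no tuning beyond these defaults."""
    rng = random.Random(seed)
    n_sub = max(1, int(len(X[0]) ** 0.5))  # about sqrt(n_features) per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 max_depth, n_sub, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote over the trees."""
    return majority([predict_tree(tree, x) for tree in forest])


# Hypothetical toy data: [satisfaction_level, average_monthly_hours], 1 = left.
X = [[0.1, 260], [0.2, 270], [0.3, 280], [0.4, 250],
     [0.6, 150], [0.7, 160], [0.8, 180], [0.9, 200]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
forest = fit_forest(X, y)
print(predict_forest(forest, [0.15, 265]), predict_forest(forest, [0.85, 170]))
```

Both toy features separate the classes, so each tree here needs only one split; real data calls for deeper trees, which is where the averaging over many randomized trees pays off.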
Support Vector Machine

• Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression problems.
• Tuning an SVM's hyperparameters is computationally expensive for two reasons:
  • With large datasets, training becomes very slow.
  • It has several hyperparameters to tune (for example C, the kernel, and gamma), and searching over them takes a long time on a CPU.
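To make both reasons concrete, a minimal sketch that counts how many SVM fits an exhaustive grid search with k-fold cross-validation performs, and how much slower each fit gets as the data grows. The grid values and the 5-fold setting are hypothetical. The quadratic lower bound follows the scikit-learn `SVC` documentation's note that fit time scales at least quadratically with the number of samples; treat the results as rough orders of magnitude.

```python
import math

# Hypothetical search grid; the slides do not say which values were tried.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1, 1],
    "kernel": ["rbf", "poly", "sigmoid"],
}


def count_fits(grid, cv_folds):
    """Number of model fits an exhaustive grid search with k-fold CV performs."""
    return math.prod(len(values) for values in grid.values()) * cv_folds


def min_slowdown(n_old, n_new):
    """Lower bound on per-fit time growth if training is at least quadratic in n."""
    return (n_new / n_old) ** 2


print(count_fits(param_grid, cv_folds=5))   # → 240
print(min_slowdown(1_500, 15_000))          # → 100.0
```

Multiplying the two gives the overall picture: every extra hyperparameter value multiplies the number of fits, and every fit already grows at least quadratically with the dataset size.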
Benefits of Turnover Prediction:

This model helps with the following decisions:

• Evaluation of employee requirements, strengths, and weaknesses
• Lower cost of new talent acquisition, guided by employee profiling and company requirements
• Analysis and assessment of the loss of expertise and skill sets
• Measurement of the financial and productivity loss caused by attrition
• Planning to minimize that loss
• A better understanding of workforce supply and demand
• Contingency plans based on the insight and foresight the model provides
Output:

• The output depends on the chosen model. For instance, a logistic model produces a scorecard for each employee based on their predicted attrition risk, while a classification model sorts employees into broader categories, such as more or less likely to quit, or high or low risk.
• Either way, the bottom line is to keep the output simple enough to understand and act on. Varying the input factors helps assess the impact of changes and make the right decisions.
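The scorecard idea can be sketched as a ranking of employees by predicted attrition probability, with each score mapped to a coarse band. The employee IDs, the probabilities and the 0.7 / 0.3 band cut-offs are hypothetical; any model that outputs probabilities (for example the logistic model) could supply the inputs.

```python
def risk_band(p, high=0.7, low=0.3):
    """Map a predicted attrition probability to a coarse band."""
    if p >= high:
        return "high risk"
    if p <= low:
        return "low risk"
    return "medium risk"


def scorecard(probabilities, high=0.7, low=0.3):
    """Rank employees by predicted attrition risk (score 0-100), highest first."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    return [(emp, round(p * 100), risk_band(p, high, low)) for emp, p in ranked]


# Hypothetical employee IDs and model probabilities.
probs = {"E101": 0.82, "E102": 0.15, "E103": 0.50}
for row in scorecard(probs):
    print(row)
```

Moving the cut-offs trades off how many employees are flagged against how confident each flag is, which is one simple way to keep the output easy to act on.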
Conclusion:

• Let's conclude by printing the test accuracy of every classifier we've trained so far and plotting their ROC curves. Then we will pick the classifier with the largest area under the ROC curve.
• Random Forest has the highest accuracy (99.27%) and F1-score (99.44%). We therefore conclude that Random Forest outperforms the other classifiers on this data. Now let's look at the feature importance of the Random Forest classifier.
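The selection step described above (compare the classifiers, keep the one with the largest area under the ROC curve) can be sketched without any library. The sketch also computes accuracy and F1 at a 0.5 threshold, the two metrics reported for Random Forest. The labels, scores and model names are hypothetical toy values, not the slides' results; in practice scikit-learn's `roc_auc_score`, `accuracy_score` and `f1_score` would do the same job.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the chance a random leaver outscores a random stayer (ties = 1/2).

    Pairwise O(n_pos * n_neg) version: fine for a sketch, slow on large data.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def compare(y_true, model_scores, threshold=0.5):
    """Accuracy, F1 and AUC per model; returns the name with the highest AUC."""
    report = {}
    for name, scores in model_scores.items():
        preds = [1 if s >= threshold else 0 for s in scores]
        report[name] = (accuracy(y_true, preds), f1_score(y_true, preds),
                        roc_auc(y_true, scores))
    best = max(report, key=lambda name: report[name][2])
    return best, report


# Hypothetical test labels and predicted probabilities.
y_test = [1, 1, 0, 0]
scores = {
    "logistic_regression": [0.8, 0.4, 0.6, 0.2],
    "random_forest": [0.9, 0.7, 0.3, 0.1],
    "svm": [0.6, 0.5, 0.5, 0.4],
}
best, report = compare(y_test, scores)
print(best, report[best])
```

AUC is computed from the raw scores, so it does not depend on the 0.5 threshold, while accuracy and F1 do; that is why ranking by AUC and by accuracy can disagree.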
