BDA Report Smart Home Prediction

Smart Home Prediction
A Report
Submitted for the Requirements of the Course
Big Data Analytics
by
Aditya Vikram Verma (Roll No. 1600161C203)
Amritesh Rai (Roll No. 160069C203)
Amber Bhargava (Roll No. 1600168C203)
Raghav Maheshwari (Roll No. 1600306C203)
Department of Computer Science and Engineering

BML Munjal University
April 2019
1. Introduction
This project predicts the number of family members present on a given day and at a given
time in two houses, which we call House A and House B.
2. The Dataset
The data is split into a training set (126 records) and a test set (42 records). The
training set contains both input and output variables. The task is to predict the number
of family members present for the test set.
Variable   Description
Day        Day of week
Time       Time of day
House A    Number of people in House A
House B    Number of people in House B
PeopleA    Family members in House A
PeopleB    Family members in House B
3. Methods to Predict the Number of Family Members Present
3.1 Studying the Data (Pre-processing)
In this section we studied the data to learn what type of data we have, how many
columns there are, which columns are categorical and which are numeric (integer or
float), and which columns have missing values.
Code:
import pandas as pd
import matplotlib.pyplot as plt

# Load the data and keep an untouched copy
df = pd.read_csv('sampledata.csv')
df1 = df.copy()

# Basic structure: first rows, column names, shape
print(df.head())
print(df.columns)
print(df.shape)
print(df.nunique())  # number of distinct values per column

print("*********quantitative*********")
print(df.describe())
print("*********qualitative*********")
print(df.describe(include=[object]))

# Histograms of the numeric columns
df.select_dtypes(include=["float64", "int64"]).hist(figsize=[11, 11])
plt.show()

# Missing values per column
print('total no. of missing values')
print(df.isnull().sum())
From this we inferred that there are 6 columns, the last two of which hold categorical
values. No column has missing values, so no data cleaning is needed in this case.
3.2 Data cleaning
Not required as there were no missing values.
3.3 Data Visualizations
Bar charts are used to analyze the relation between the day and number of family
members present and also between time and number of family members present in
both the houses.
[Figure: bar charts for House A]
[Figure: bar charts for House B]
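A minimal sketch of how such bar charts could be produced with pandas and matplotlib. The sample values, the use of integer hours for Time, and the helper name mean_by are assumptions for illustration; they are not the project's data or code.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample using the column names from Section 2 (not the project's data)
sample = pd.DataFrame({
    "Day": ["Monday", "Monday", "Tuesday", "Tuesday", "Sunday", "Sunday"],
    "Time": [9, 21, 9, 21, 9, 21],  # assumed: hour of day as an integer
    "PeopleA": [1, 4, 0, 3, 4, 4],
    "PeopleB": [2, 3, 1, 3, 2, 3],
})

def mean_by(frame, key):
    """Average family members present, grouped by `key` (hypothetical helper)."""
    return frame.groupby(key, sort=False)[["PeopleA", "PeopleB"]].mean()

by_day = mean_by(sample, "Day")
by_time = mean_by(sample, "Time")

# One bar chart per grouping; each shows House A and House B side by side
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
by_day.plot.bar(ax=axes[0], title="Mean family members present by day")
by_time.plot.bar(ax=axes[1], title="Mean family members present by time")
fig.tight_layout()
# Call plt.show() to display the figure in an interactive session
```

Grouping before plotting keeps each bar tied to one summary value per day or per time slot, which is easier to read than plotting every record.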
3.4 Model Building
In this section we build our model for prediction.
K-Nearest Neighbors (KNN) Analysis
In this section we applied the K-Nearest Neighbors (KNN) algorithm and analysed the result.
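For intuition, a KNN classifier labels a new point by majority vote among the k training points closest to it. The sketch below is a from-scratch illustration on made-up (encoded day, hour) pairs; the function name knn_predict and the data are assumptions, while the project itself relies on scikit-learn's KNeighborsClassifier.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Euclidean distance from the query to every training point
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the labels of the k closest points
    nearest = [label for _, label in sorted(distances, key=lambda d: d[0])[:k]]
    # The most common label among them wins
    return Counter(nearest).most_common(1)[0][0]

# Made-up training data: (encoded day, hour of day) -> family members present
train_X = [(1, 9), (1, 20), (2, 10), (2, 21), (6, 11), (6, 22)]
train_y = [1, 4, 1, 4, 3, 4]

print(knn_predict(train_X, train_y, query=(3, 20), k=3))  # 4
```

With k=3, the three nearest neighbours of (3, 20) are all evening records labelled 4, so the vote is unanimous.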
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: dataset is the same file loaded in Section 3.1
dataset = pd.read_csv('sampledata.csv')

# Data pre-processing: encode the day of week as an integer
dataset['Day_encoded'] = dataset['Day'].map({'Monday': 1, 'Tuesday': 2,
    'Wednesday': 3, 'Thursday': 4, 'Friday': 5, 'Saturday': 6, 'Sunday': 7})

# Dividing the dataset into features (X) and targets (YA, YB)
X = dataset.drop(columns=["House A", "House B", "PeopleA", "PeopleB", "Day"])
YA = dataset["PeopleA"]
YB = dataset["PeopleB"]

# Dividing the dataset into train and test sets for House A
X_trainA, x_testA, Y_trainA, y_testA = train_test_split(X, YA, random_state=0)

# Dividing the dataset into train and test sets for House B
X_trainB, x_testB, Y_trainB, y_testB = train_test_split(X, YB, random_state=0)

# Creating a KNN model for House A and for House B
knnHouseA = KNeighborsClassifier(n_neighbors=2).fit(X_trainA, Y_trainA)
knnHouseB = KNeighborsClassifier(n_neighbors=3).fit(X_trainB, Y_trainB)

# Getting the accuracy of both models on the test data
scoreA = knnHouseA.score(x_testA, y_testA)
scoreB = knnHouseB.score(x_testB, y_testB)
Result from KNN: the accuracy is 90.47% for House A and 97.61% for House B.
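As a sanity check on these figures: assuming the scores were computed on the 42-record test set described in Section 2, each accuracy is the number of correct predictions divided by 42. Under that assumption, the reported values are consistent with 38 and 41 correct predictions respectively, with the percentages truncated rather than rounded. The helper name accuracy_percent is hypothetical.

```python
import math

TEST_SIZE = 42  # test-set size stated in Section 2 (assumed to be what was scored)

def accuracy_percent(correct, total=TEST_SIZE):
    """Accuracy as a percentage, truncated to two decimal places."""
    return math.floor(correct / total * 10000) / 100

print(accuracy_percent(38))  # 90.47
print(accuracy_percent(41))  # 97.61
```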
4. Conclusion
We selected 2 columns for prediction: 'Day' and 'Time'.
After applying 4 ML algorithms, we concluded that KNN was the best ML algorithm for
our dataset.
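A minimal sketch of how several classifiers could be compared on the two selected columns. The choice of classifiers, the synthetic data, and the rule that generates the labels are all assumptions made for illustration; they are not the project's models, data, or results.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the two selected columns: encoded day (1-7) and hour (0-23)
days, hours = np.meshgrid(np.arange(1, 8), np.arange(24))
X = np.column_stack([days.ravel(), hours.ravel()])  # 7 * 24 = 168 rows
# Made-up rule: more people at home in the evening and on weekends
y = np.where(X[:, 1] >= 18, 4, 1) + np.where(X[:, 0] >= 6, 1, 0)

# Same split settings as the report's code (default 25% test share)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Fit each model and record its accuracy on the held-out test set
scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in models.items()}

for name, score in scores.items():
    print(f"{name}: {score:.2%}")
```

Scoring every model on the same held-out split keeps the comparison fair; the ranking on real data may differ from what this synthetic example produces.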
References:
[1] https://medium.com/@contactsunny/linear-regression-in-python-using-scikit-learn-f0f7b125a204
[2] http://dataaspirant.com/2017/02/01/decision-tree-algorithm-python-with-scikit-learn/
[3] https://stackabuse.com/random-forest-algorithm-with-python-and-scikit-learn/
[4] https://www.datacamp.com/community/tutorials/k-means-clustering-python
Contributions of each member:
S.No  Name                 Contributions                                       Signature
1     Aditya Vikram Verma  Data analysis, data pre-processing and frontend     Aditya Vikram Verma
2     Amber Bhargava       Data visualization and analysis                     Amber Bhargava
3     Amritesh Rai         Model building and ML algorithms analysis           Amritesh Rai
4     Raghav Maheshwari    Dataset building and part of frontend               Raghav Maheshwari
