
Smart Home Prediction

A Report

Submitted for the Requirements of the Course

Big Data Analytics

by

Aditya Vikram Verma (Roll No.- 1600161C203)
Amritesh Rai (Roll No.- 160069C203)
Amber Bhargava (Roll No.- 1600168C203)
Raghav Maheshwari (Roll No.- 1600306C203)

Department of Computer Science and Engineering


BML Munjal University
April 2019
1. Introduction
This project predicts the family members present at a given day and time in two
houses, which we call House A and House B.
2. The Dataset
The data is split into a training set (126 records) and a test set (42 records). The
training set contains both the input and the output variables. The task is to predict
the output variables (PeopleA and PeopleB) for the test set.
Variable    Description
Day         Day of week
Time        Time of day
House A     Number of people in House A
House B     Number of people in House B
PeopleA     Family members in House A
PeopleB     Family members in House B
3. Methods to Predict the Family Members Present
3.1 Studying the Data (Pre-processing of Data)
In this section we studied the data to find out what type of data we have, how
many columns there are, which columns are categorical and which are numeric,
and which columns have missing values.
Code-
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('sampledata.csv')
df1 = df.copy()  # untouched copy of the raw data

# First rows, column names, shape, and distinct values per column
print(df.head())
print(df.columns)
print(df.shape)
print(df.nunique())  # a DataFrame has no unique(); nunique() counts distinct values per column

print("*********quantitative*********")
print(df.describe())
print("*********qualitative*********")
print(df.describe(include=[object]))

# Histograms of the numeric columns
df.select_dtypes(include=["float64", "int64"]).hist(figsize=[11, 11])
plt.show()

print('total no. of missing values')
print(df.isnull().sum())

From this we inferred that the data set has 6 columns, of which the last two hold
categorical values. No column has missing values, so no data cleaning is needed
in this case.
3.2 Data cleaning
Not required as there were no missing values.
3.3 Data Visualizations
Bar charts are used to analyze the relation between the day and number of family
members present and also between time and number of family members present in
both the houses.

[Figure: bar charts for House A]
[Figure: bar charts for House B]
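As a rough illustration of how such bar charts can be produced, the sketch below aggregates the number of people present by day and by time and plots each as a bar chart. The rows are made up for illustration and are not taken from the report's data set; treating Time as an integer hour is also an assumption, since the report does not state the format of that column.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up illustrative rows; NOT taken from the report's data set.
toy = pd.DataFrame({
    "Day": ["Monday", "Monday", "Tuesday", "Tuesday", "Sunday", "Sunday"],
    "Time": [9, 21, 9, 21, 9, 21],  # assumption: hour of day as an integer
    "House A": [1, 4, 2, 4, 4, 3],
})

day_order = ["Monday", "Tuesday", "Sunday"]

# Mean number of people present per day, kept in calendar order.
mean_by_day = toy.groupby("Day")["House A"].mean().reindex(day_order)
# Mean number of people present per time of day.
mean_by_time = toy.groupby("Time")["House A"].mean()

fig, (ax_day, ax_time) = plt.subplots(1, 2, figsize=(10, 4))
mean_by_day.plot(kind="bar", ax=ax_day, title="House A: people present by day")
mean_by_time.plot(kind="bar", ax=ax_time, title="House A: people present by time")
ax_day.set_ylabel("Mean number of people")
fig.tight_layout()
# Call fig.savefig("house_a_bars.png") here to write the chart to disk.
```

The same two group-by steps, applied to the House B column, would give the corresponding charts for House B.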
3.4 Model Building
In this section we build our model for prediction.
K-Nearest Neighbours (KNN) Analysis
In this section we applied the K-Nearest Neighbours (KNN) algorithm and analysed the result.
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Data pre-processing: encode the day of week as an integer.
# `dataset` is the DataFrame read from sampledata.csv in section 3.1.
dataset['Day_encoded'] = dataset['Day'].map({'Monday': 1, 'Tuesday': 2,
    'Wednesday': 3, 'Thursday': 4, 'Friday': 5, 'Saturday': 6, 'Sunday': 7})

# Dividing the dataset into X (features: Time, Day_encoded) and Y (targets)
X = dataset.drop(columns=["House A", "House B", "PeopleA", "PeopleB", "Day"])
YA = dataset["PeopleA"]
YB = dataset["PeopleB"]

# Dividing the dataset into train and test sets for House A
X_trainA, x_testA, Y_trainA, y_testA = train_test_split(X, YA,
    random_state=0)

# Dividing the dataset into train and test sets for House B
X_trainB, x_testB, Y_trainB, y_testB = train_test_split(X, YB,
    random_state=0)

# Creating a KNN model for House A and for House B
knnHouseA = KNeighborsClassifier(n_neighbors=2).fit(X_trainA, Y_trainA)
knnHouseB = KNeighborsClassifier(n_neighbors=3).fit(X_trainB, Y_trainB)

# Getting the prediction score (mean accuracy) for both models on the test data
scoreA = knnHouseA.score(x_testA, y_testA)
scoreB = knnHouseB.score(x_testB, y_testB)
print(scoreA, scoreB)
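
To show the idea behind the classifier used above, here is a minimal pure-Python sketch of k-nearest-neighbours voting. It is not scikit-learn's implementation, only an illustration of the technique. The function name `knn_predict`, the training rows, and the labels are all invented for this example, and Time is assumed to be an integer hour.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Return the majority label among the k training points nearest to `query`."""
    # Rank training rows by Euclidean distance to the query point.
    ranked = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    # Count the labels of the k closest rows and return the most frequent one.
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented (Day_encoded, hour) rows and labels, for illustration only.
train_X = [(1, 8), (1, 9), (1, 20), (1, 21), (6, 10), (6, 11)]
train_y = ["parents", "parents", "all", "all", "all", "all"]

morning = knn_predict(train_X, train_y, query=(1, 7), k=3)
evening = knn_predict(train_X, train_y, query=(1, 22), k=3)
print(morning, evening)  # prints: parents all
```

Because the distance is computed on the raw numbers, the integer day encoding places Sunday (7) far from Monday (1), and the wider range of the hour values gives Time more weight than Day. Both effects are worth keeping in mind when choosing the encoding.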

Result from KNN: the accuracy is 90.47% for House A and 97.61% for House B.
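
For a classifier, scikit-learn's `score` returns the mean accuracy, that is, the fraction of test rows predicted correctly. The sketch below works through that arithmetic. With the 42 test records from section 2, the reported figures are consistent with 38 and 41 correct predictions, truncated to two decimals. Those counts are our inference from the arithmetic, not something the report states, and the helper names are hypothetical.

```python
import math


def accuracy_percent(y_true, y_pred):
    """Percentage of predictions that exactly match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def truncate(value, decimals=2):
    """Cut a non-negative number off after `decimals` places without rounding."""
    factor = 10 ** decimals
    return math.floor(value * factor) / factor


# Tiny worked example: 3 of 4 predictions are correct.
example = accuracy_percent(["a", "b", "a", "a"], ["a", "b", "b", "a"])

# Assumption: 42 test rows, with 38 (House A) and 41 (House B) correct.
test_rows = 42
house_a = truncate(100.0 * 38 / test_rows)
house_b = truncate(100.0 * 41 / test_rows)
print(example, house_a, house_b)  # prints: 75.0 90.47 97.61
```

The 126/42 split also matches a 75/25 division of 168 records, which is what `train_test_split` produces when no test size is given.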
4. Conclusion
We selected 2 columns for prediction: 'Day' and 'Time'.
After applying 4 ML algorithms, we concluded that KNN was the best ML algorithm
for our dataset.

References:
[1] https://medium.com/@contactsunny/linear-regression-in-python-using-scikit-learn-f0f7b125a204
[2] http://dataaspirant.com/2017/02/01/decision-tree-algorithm-python-with-scikit-learn/
[3] https://stackabuse.com/random-forest-algorithm-with-python-and-scikit-learn/
[4] https://www.datacamp.com/community/tutorials/k-means-clustering-python
Contributions of each member:

S.No  Name                 Contributions                                       Signature
1     Aditya Vikram Verma  Data analysis, data pre-processing and frontend     Aditya Vikram Verma
2     Amber Bhargava       Data visualization and analysis                     Amber Bhargava
3     Amritesh Rai         Model building and ML algorithms analysis           Amritesh Rai
4     Raghav Maheshwari    Dataset building and part of frontend               Raghav Maheshwari
