Sensing Complex Social Systems Through Mobile Phone Data

Reality Mining: Sensing Complex Social Systems
Nathan Eagle, Alex Pentland
Personal and Ubiquitous Computing, 2006
Aim
How data collected from mobile phones can be used to uncover regular rules and structure in the behavior of both individuals and organizations
Mobile Phones as Wearable Sensors

Social scientists use surveys to learn about human behavior
Usual survey techniques suffer from:
bias
sparsity of data
lack of continuity between discrete questionnaires
absence of dense, continuous data
Use of phones to collect data on human behavior
Bluetooth
Bluetooth is a short-range RF network
10-30 meters in practice
Device-discovery is a standard among Bluetooth devices

Bluetooth MAC address (BTID), device name, device type
BTID is unique
Bluetooth scan is energy-consuming
Dataset & Privacy

Prior consent and human subject approval
Dataset
100 Nokia 6600 users
75 Lab users
20 incoming master's students
5 incoming freshmen
~450k hours of information about users' location, communication, and usage behavior
http://reality.media.mit.edu
User modeling
Easily identifiable routines in every person's life
Simple model of behavior
Home, work, elsewhere
Data collected from

Bluetooth, cell tower, temporal information from phone
Incorporate information from static BT devices
BT on a desktop
User modeling
Accurate location from cell tower
Complicated, as a phone can receive signals from far-away towers
Accuracy improves if the user spends enough time in one place
Distribution of time spent with a set of towers adds accuracy
Cell tower probability density functions

The probability of being associated with one of the 25 visible cell towers is plotted for five users who work on the third floor corner of the same office building. Each tower is listed on the x-axis and the probability of the phone logging it while the user is in his office is shown on the y-axis. (Range was assured to 10 m by the presence of a static Bluetooth device.) Each user sees a different distribution of cell towers depending on the location of his office, with the exception of Users 4 and 5, who are officemates and have the same distribution despite being in the office at different times.
Office mates
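The per-user tower distributions described above can be estimated by counting how often each tower is logged while a static Bluetooth device confirms that the user is in the office. The following is a minimal sketch, assuming a hypothetical log format of (tower_id, in_office) pairs. The tower names, the sample logs, and the choice of total variation distance for comparing two users are illustrative assumptions, not details from the paper.

```python
from collections import Counter

def tower_distribution(observations):
    """Empirical probability of each tower among in-office observations.

    observations: iterable of (tower_id, in_office) pairs (hypothetical format).
    """
    counts = Counter(tower for tower, in_office in observations if in_office)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tower: n / total for tower, n in counts.items()}

def total_variation(p, q):
    """Half the L1 distance between two distributions (0 = identical, 1 = disjoint)."""
    towers = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in towers)

# Made-up logs: users 4 and 5 share an office, user 1 sits elsewhere on the floor.
user4 = [("T1", True)] * 6 + [("T2", True)] * 4 + [("T9", False)] * 3
user5 = [("T1", True)] * 3 + [("T2", True)] * 2
user1 = [("T3", True)] * 5 + [("T1", True)] * 5

p4, p5, p1 = (tower_distribution(u) for u in (user4, user5, user1))
print(total_variation(p4, p5))  # officemates: distance near 0
print(total_variation(p4, p1))  # different office: clearly larger
```

Observations logged outside the office (the T9 entries) are ignored, which mirrors the role of the static Bluetooth device as a ground-truth location marker.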
Observations
Different sets of towers for users within a 10 m radius
6% of the time, users were without signal
21% to 29% of the time, users were in range of Bluetooth devices or other mobile phones
Could Bluetooth be used for localization inside buildings during such times?
GPS does not work indoors
Encountered devices for a subject during the month of January

The subject is only regularly proximate to other Bluetooth devices between 9:00 and 17:00, while at work, but never at any other times. This predictable behavior will be defined as low entropy. The subject's desktop computer is logged most frequently throughout the day, with the exception of the hour between 14:00 and 15:00. During this time window, Subject 9 is most often proximate to Subject 4.
Models for location & activity

Human life is imbued with routine across time scales
From minute-to-minute routines to yearly patterns
There is inherent randomness present among the routines
An information entropy metric quantifies how predictable a subject's behavior is
A low-entropy subject's daily distribution of home/work transitions.

The most likely location of the subject: Work, Home, Elsewhere, and No Signal. While the subject's state sporadically jumps to No Signal, the other states occur with very regular frequency. This is confirmed by the Bluetooth encounters plotted below, representing the structured working schedule of the low-entropy subject.
A low-entropy subject's daily distribution of encountered Bluetooth devices.
Entropy across demographics

Entropy, H(x), was calculated from the {work, home, no signal, elsewhere} set of behaviors for 100 samples of a 7-day period. The Media Lab freshmen have the least predictable schedules, which makes sense because they come to the lab on a much less regular basis. The staff and faculty have the least entropic schedules, typically adhering to a consistent work routine.
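The entropy of a subject's behavior can be illustrated with the standard Shannon formula, H(x) = -sum over states of p(x) log2 p(x), applied to the four location states. The sketch below computes it from a sequence of state labels. The hourly sampling and the two sample days are made-up assumptions, and the paper's exact sampling procedure is not reproduced here.

```python
import math
from collections import Counter

def shannon_entropy(states):
    """Shannon entropy, in bits, of the empirical distribution of `states`."""
    counts = Counter(states)
    total = len(states)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical hourly location labels for one day of two subjects.
regular = ["home"] * 14 + ["work"] * 9 + ["elsewhere"] * 1
erratic = ["home"] * 6 + ["work"] * 6 + ["elsewhere"] * 6 + ["no signal"] * 6

print(shannon_entropy(regular))  # lower: time is concentrated in two states
print(shannon_entropy(erratic))  # 2.0 bits, the maximum for four states
```

This version only measures how evenly time is spread across the states. It ignores the order of the states, so it is a simplification of any measure that also accounts for when transitions happen.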
User modeling
The role of time is very clear in predicting user behavior
Uses an HMM trained with EM on 1 month of data
95% accuracy achieved
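To make the HMM idea concrete, the sketch below decodes the most likely sequence of hidden location states from a sequence of observations using the Viterbi algorithm. All probabilities and the "tower cluster" observation symbols are hand-picked, hypothetical values. In the approach summarized above they would be learned with EM, and the conditioning on time is omitted here for brevity.

```python
import math

STATES = ["home", "work", "elsewhere"]

# Hypothetical parameters; in practice these would be learned with EM.
START = {"home": 0.6, "work": 0.3, "elsewhere": 0.1}
TRANS = {
    "home":      {"home": 0.8, "work": 0.15, "elsewhere": 0.05},
    "work":      {"home": 0.1, "work": 0.8,  "elsewhere": 0.1},
    "elsewhere": {"home": 0.3, "work": 0.3,  "elsewhere": 0.4},
}
# Observations are coarse groups of cell towers (also hypothetical).
EMIT = {
    "home":      {"towers_A": 0.8, "towers_B": 0.1, "towers_C": 0.1},
    "work":      {"towers_A": 0.1, "towers_B": 0.8, "towers_C": 0.1},
    "elsewhere": {"towers_A": 0.2, "towers_B": 0.2, "towers_C": 0.6},
}

def viterbi(observations):
    """Return the most likely hidden state sequence for the observations."""
    if not observations:
        return []
    # score[s] = log-probability of the best path that ends in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
             for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: score[r] + math.log(TRANS[r][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][obs]))
            pointers[s] = prev
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

obs = ["towers_A", "towers_A", "towers_B", "towers_B", "towers_C", "towers_A"]
print(viterbi(obs))
# ['home', 'home', 'work', 'work', 'elsewhere', 'home']
```

Log-probabilities are used so that long observation sequences do not underflow to zero.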
Mobile Usage Pattern

35% of subjects use the clock application regularly
Yet it takes 10 keystrokes to open the application
More used at home
Not much use of sophisticated features
Snake used as much as the elaborate media player
Average application usage in three locations (other, work, and home) for 100 subjects.
The x-axis displays the fraction of time each application is used, as a function of total application usage. For example, the usage at home of the clock application comprises almost 3% of the total times the phone is used. The phone application itself comprises more than 80% of the total usage and was not included in this figure
Data characterization and validation

Data stored on a flash memory card
Flash memory cards have a finite number of read-write cycles
Frequent updates led to corruption of memory cards

10 cards were lost
Later increments were done in RAM and final logs were written to the card
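The fix described above, accumulating updates in RAM and writing to the card only occasionally, can be sketched as a small buffered logger. The class name, the flush threshold, and the JSON-lines file format are hypothetical choices for illustration, not the logging application used in the study.

```python
import json
import os
import tempfile

class BufferedLog:
    """Collects records in memory and appends them to a file in batches."""

    def __init__(self, path, flush_every=100):
        self.path = path
        self.flush_every = flush_every  # hypothetical batch size
        self.buffer = []
        self.writes = 0  # number of physical write operations so far

    def append(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        with open(self.path, "a", encoding="utf-8") as f:
            f.write("".join(json.dumps(r) + "\n" for r in self.buffer))
        self.writes += 1
        self.buffer.clear()

path = os.path.join(tempfile.mkdtemp(), "scan.log")
log = BufferedLog(path, flush_every=50)
for i in range(120):
    log.append({"t": i, "btid": "00:11:22:33:44:55"})
log.flush()  # write the remainder
print(log.writes)  # 3 batched writes instead of 120 individual ones
```

The trade-off is that records still in the buffer are lost if the application crashes before the next flush, so the batch size balances card wear against potential data loss.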
Bluetooth errors
Several technical issues in verifying the accuracy of collected data
10 m range with ability to penetrate walls
Periodic scans miss short proximity events
A device may not be discovered (1% to 3%)
Application crash (once every three days)
Redundancy could be leveraged
Most of the time, the above problems were identified as noise

Logs help in finding anomalies
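One way the redundancy mentioned above could be leveraged: when two subjects both carry scanning phones, a proximity event can be counted if either phone logs the other, and events logged by only one side can be flagged for inspection. The sketch below assumes a hypothetical log format of (time_slot, scanner_id, seen_id) tuples. The paper's actual cleaning procedure is not reproduced here.

```python
def symmetric_proximity(scans):
    """Unordered proximity events: a pair counts if either phone saw the other.

    scans: iterable of (time_slot, scanner_id, seen_id) tuples (hypothetical format).
    """
    return {(slot, frozenset((a, b))) for slot, a, b in scans if a != b}

def one_sided(scans):
    """Directed sightings that the other phone did not report in the same slot."""
    directed = {(slot, a, b) for slot, a, b in scans}
    return {(slot, a, b) for slot, a, b in directed
            if (slot, b, a) not in directed}

scans = [
    (0, "S4", "S9"), (0, "S9", "S4"),  # both phones agree
    (1, "S4", "S9"),                   # S9 missed S4 in this slot
    (2, "S9", "S4"),                   # S4 missed S9 in this slot
]
print(len(symmetric_proximity(scans)))  # 3 events recovered
print(sorted(one_sided(scans)))         # the two one-sided sightings
```

One-sided sightings are candidates for missed discoveries, but they can also reflect a phone that was off or crashed, so they are flagged rather than corrected automatically.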
Human-induced errors
Two main errors
Phone being off
Battery exhausted
Explicit turn-off
1/5 of users do it regularly: in classrooms, at night, at movies
The log is time-stamped before the turn-off
Separated from user

Phone is on but not carried by the user
More severe problem
Human-induced errors
Forgetting phone
30% claim never to forget it
40% claim to forget it about once a month
30% claim to forget it about once a week
A forgotten-phone classifier
Identifying a forgotten phone is challenging

Subject could be sick
Subject could have casually moved more than 10 m from the phone
Not enough unique features
Missing data
Major causes
Data corruption
Powered-off devices
Logs accounting for 85.3% of the time

<5%: data corruption
Rest: devices powered off by 1/5 of users
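A coverage figure such as the one above can be computed as the fraction of the study period covered by the union of logged intervals. The sketch below assumes intervals given as hypothetical (start, end) pairs in hours and merges overlaps so that no time is counted twice. The sample values are made up.

```python
def coverage(intervals, study_start, study_end):
    """Fraction of [study_start, study_end) covered by the union of intervals."""
    # Clip each interval to the study window and sort by start time.
    clipped = sorted(
        (max(s, study_start), min(e, study_end))
        for s, e in intervals
        if e > study_start and s < study_end
    )
    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # A gap: close the current merged interval and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (study_end - study_start)

logged = [(0, 8), (6, 10), (14, 24)]  # hours with log data, made up
print(coverage(logged, 0, 24))  # 20 of 24 hours are covered
```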
Surveys
Subjects were also surveyed about their social network
For senior students
High correlation
Logged BTID and dyadic self-report/proximity data
For incoming students

No significant correlation
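The comparison between self-reported and logged proximity can be illustrated with a Pearson correlation computed over dyads. This is a minimal sketch with made-up numbers. The choice of Pearson correlation and the per-dyad units are assumptions for illustration, since the slides do not state which statistic was used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)

# One value per dyad (pair of subjects); all numbers are hypothetical.
self_reported_hours = [1, 2, 3, 4, 5]
logged_hours_close = [2, 4, 6, 8, 10]   # tracks the self-report exactly
logged_hours_noisy = [4, 1, 5, 2, 3]    # weakly related to the self-report

print(pearson(self_reported_hours, logged_hours_close))
print(pearson(self_reported_hours, logged_hours_noisy))
```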
Community structure
Human landmarks
Who the user will meet can be guessed
Relationship inference
Nature of association can be inferred
Used GMM for clustering
Proximity Frequency
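As an illustration of clustering dyads by proximity frequency with a Gaussian mixture model, the sketch below fits a two-component, one-dimensional mixture with EM and labels each dyad by its more likely component. The single feature (hours of proximity per week), the sample values, and the two-cluster setup are simplifying assumptions. The features and number of components used in the paper are not reproduced here.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(xs, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns (means, labels), where labels[i] is the component for xs[i].
    """
    lo, hi = min(xs), max(xs)
    means = [lo, hi]  # deterministic initialization at the extremes
    variances = [max((hi - lo) ** 2 / 4, 1e-6)] * 2
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            likes = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(likes)
            resp.append([like / total for like in likes])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var, 1e-6)  # floor avoids a collapsed component
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, labels

# Hypothetical hours of proximity per week for eleven dyads.
low = [0.5, 1.0, 1.5, 1.0, 0.8, 1.2]
high = [20, 22, 24, 21, 23]
hours = low + high
means, labels = fit_gmm_1d(hours)
print(means)   # one mean near the low group, one near the high group
print(labels)
```

With clearly separated groups the fit recovers them directly. On real proximity data the groups overlap, and the labels should be read as soft evidence about the nature of a relationship rather than a definitive assignment.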
Proximity networks
Different from the organizational structure
Structured around the faculty director
Hub-and-spoke with changing roles
Proximity network data is extremely dynamic and sparse
Deadlines bring more reliance on the support of the group
Exploring dynamics of a group in response to both external and internal stimuli
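A proximity network of this kind can be sketched as a weighted graph in which each edge counts how often two people were logged near each other. The node with the largest weighted degree is then a candidate hub. The event format and names below are hypothetical, and weighted degree is only one simple way to look for a hub.

```python
from collections import Counter, defaultdict

def proximity_edges(events):
    """Count proximity events per unordered pair.

    events: iterable of (person_a, person_b) sightings (hypothetical format).
    """
    edges = Counter()
    for a, b in events:
        if a != b:
            edges[frozenset((a, b))] += 1
    return edges

def weighted_degree(edges):
    """Sum of edge weights attached to each node."""
    degree = defaultdict(int)
    for pair, weight in edges.items():
        for node in pair:
            degree[node] += weight
    return dict(degree)

events = [
    ("director", "s1"), ("director", "s2"), ("s2", "director"),
    ("director", "s3"), ("s1", "s2"),
]
edges = proximity_edges(events)
degree = weighted_degree(edges)
hub = max(degree, key=degree.get)
print(hub, degree[hub])  # director 4
```

Because proximity data is dynamic, the same computation can be repeated over separate time windows (for example, per week) to see how the hub and the edge weights change around deadlines.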
Proximity networks
People's free time and schedules shift dramatically to meet deadlines and project goals
Spending much of the night in lab just before the event
How the aggregate work cycles expand in reaction to global deadlines

Visit of sponsors
Conclusions
First paper to log data at such a magnitude and depth
Provides ethnographic studies, individual user modeling, group user modeling
