Panagiotis Papadimitriou
papadimitriou@stanford.edu
Hector Garcia-Molina
hector@cs.stanford.edu
Leakage Problem
[Figure: Facebook profiles (Sarah, female; Mark, male; Jeremy; Kathryn) and applications App. U1 and App. U2]
Stanford Infolab
Outline
Problem Description
Guilt Models (e.g., Pr{U1 leaked data} = 0.7, Pr{U2 leaked data} = 0.2)
Distribution Strategies
Problem Entities
Entity | Example
Distributor | Facebook
Dataset T | Set of all Facebook profiles
R1, ..., Rn | Ri: set of profiles of people who have added application Ui
Leaker |
Explicit data request: all people who added the application (the example we have used so far)
Guilty agent: an agent who leaks at least one profile.
Pr{Gi|S}: the probability that agent Ui is guilty, given the leaked set of profiles S.
[Figure: alternative cases combined with "or", with probabilities p·p, p(1-p), (1-p)p, and (1-p)^2]
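The slide does not say what p denotes. Assuming the four cases in the figure are the outcomes of two independent events, each occurring with probability p, the cases are exhaustive, and the three cases in which at least one event occurs combine as follows:

$$p \cdot p + p(1-p) + (1-p)p + (1-p)^2 = \bigl(p + (1-p)\bigr)^2 = 1$$

$$p \cdot p + p(1-p) + (1-p)p = 1 - (1-p)^2$$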
[Figure: plots of Pr{G1} and Pr{G2}]
[Figure: leaked set S alongside agent data sets R1, R3, R4 (agents U3, U4 shown)]
Pr{G1|S} >> Pr{G2|S}
Pr{G1|S} >> Pr{G4|S}
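The slides do not spell out how Pr{Gi|S} is computed. Below is a minimal sketch under an assumed model: each leaked profile t was obtained by the target on its own with probability p (our assumed parameter); otherwise it was leaked by one of the agents holding it (the set V_t), each equally likely, independently across profiles. Under these assumptions Pr{Gi|S} = 1 - ∏ over t in S ∩ Ri of (1 - (1 - p)/|V_t|). The helper name `guilt_probability` is hypothetical.

```python
def guilt_probability(agent, leaked, allocations, p):
    """Hypothetical helper: Pr{G_agent | S} under the assumed model above.

    agent: name of the agent, e.g. "U1"
    leaked: the leaked set of profiles S
    allocations: dict mapping each agent to the set Ri it received
    p: assumed probability that the target obtained a profile on its own
    """
    prob_not_guilty = 1.0
    # Only profiles this agent actually received can implicate it.
    for t in leaked & allocations[agent]:
        # |V_t|: number of agents that received profile t
        holders = sum(1 for r in allocations.values() if t in r)
        # Assumed: t was leaked by some agent with prob. (1 - p),
        # and each holder is equally likely to be that agent.
        prob_not_guilty *= 1.0 - (1.0 - p) / holders
    return 1.0 - prob_not_guilty


R = {"U1": {"a", "b"}, "U2": {"b"}}
S = {"a", "b"}
print(guilt_probability("U1", S, R, 0.5))  # 0.625
print(guilt_probability("U2", S, R, 0.5))  # 0.25
```

In this toy case profile "a" was given only to U1, so its appearance in S points squarely at U1, which is why Pr{G1|S} ends up well above Pr{G2|S}.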
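$$\text{minimize}\quad \frac{1}{|R_i|}\,\lvert R_i \cap R_j \rvert \qquad \text{for all } i \neq j,\quad i, j \in \{1, \ldots, n\}$$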
Intuition: minimizing the data shared among agents makes it easier for leaked data to reveal the guilty agents.
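The objective above has one term per ordered pair of agents. A minimal sketch of how an allocation could be scored, assuming the pairwise terms are simply summed into a single number (the summation is our choice for illustration; the slide lists the terms individually). The name `overlap_objective` is hypothetical, and every Ri is assumed non-empty.

```python
def overlap_objective(allocations):
    """Sum of |Ri ∩ Rj| / |Ri| over all ordered pairs i != j.

    allocations: dict mapping each agent to its (non-empty) set Ri.
    Lower values mean less data sharing among agents.
    """
    total = 0.0
    for i, r_i in allocations.items():
        for j, r_j in allocations.items():
            if i != j:
                total += len(r_i & r_j) / len(r_i)
    return total


print(overlap_objective({"U1": {"a", "b"}, "U2": {"c", "d"}}))  # 0.0 (disjoint)
print(overlap_objective({"U1": {"a", "b"}, "U2": {"b", "c"}}))  # 1.0
```

Dividing by |Ri| makes the score relative: sharing one object hurts an agent with two objects more than an agent with a hundred.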
[Figure: example data allocations to agents U1, U2, U3, U4]
Distribution Strategies
Sample Data Requests
The distributor is free to select which data items to give the agents.
General idea: give agents data sets that are as disjoint as possible.
Problem: in some cases the distributed data must overlap, e.g., when |R1| + ... + |Rn| > |T|.
Explicit Data Requests
The distributor must give agents the data they request.
General idea: add fake data to the distributed sets to minimize the overlap of distributed data.
Problem: agents can collude and identify the fake data (NOT COVERED in this talk).
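For sample requests, one simple heuristic (not necessarily the strategy the authors use) is to hand each agent the objects currently held by the fewest agents. A minimal sketch, assuming each agent asks only for a count of objects and every count is at most |T|; `allocate_samples` is a hypothetical name. It also illustrates the stated problem: once the total request exceeds |T|, overlap cannot be avoided.

```python
def allocate_samples(objects, requests):
    """Hypothetical greedy allocator for sample data requests.

    objects: list of the data objects in T
    requests: dict mapping each agent to the number of objects it asked for
    Each agent gets the objects currently held by the fewest agents, so the
    sets stay disjoint as long as the total request fits within |T|.
    """
    usage = {t: 0 for t in objects}  # how many agents hold each object
    allocations = {}
    for agent, m in requests.items():
        # sorted() is stable, so ties keep the order of `objects`
        chosen = sorted(objects, key=lambda t: usage[t])[:m]
        for t in chosen:
            usage[t] += 1
        allocations[agent] = set(chosen)
    return allocations


T = ["a", "b", "c", "d"]
small = allocate_samples(T, {"U1": 2, "U2": 2})
large = allocate_samples(T, {"U1": 3, "U2": 3})
print(small["U1"].isdisjoint(small["U2"]))  # True: 2 + 2 <= |T|
print(len(large["U1"] & large["U2"]))       # 2: 3 + 3 exceeds |T| by 2
```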
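For explicit requests, the real objects in each Ri are fixed, so overlap can only be diluted: a fresh fake object given to a single agent enlarges |Ri| without adding to any intersection. A minimal sketch, assuming a budget of fake objects, a greedy one-at-a-time placement, and the summed overlap score from the objective slide. `add_fake_objects` and the ("fake", k) naming are hypothetical, and agents are assumed to accept extra objects beyond what they asked for.

```python
def overlap_objective(allocations):
    """Sum of |Ri ∩ Rj| / |Ri| over all ordered pairs i != j."""
    total = 0.0
    for i, r_i in allocations.items():
        for j, r_j in allocations.items():
            if i != j:
                total += len(r_i & r_j) / len(r_i)
    return total


def add_fake_objects(allocations, budget):
    """Hypothetical greedy placement of `budget` fake objects.

    Each fake object is new and given to exactly one agent, so it never adds
    to any intersection; it only enlarges |Ri| and dilutes the overlap ratio.
    """
    result = {agent: set(r) for agent, r in allocations.items()}
    for k in range(budget):
        fake = ("fake", k)  # assumed to be distinct from every real object
        best_agent, best_value = None, float("inf")
        for agent in result:
            trial = dict(result)
            trial[agent] = result[agent] | {fake}
            value = overlap_objective(trial)
            if value < best_value:
                best_agent, best_value = agent, value
        result[best_agent].add(fake)
    return result


R = {"U1": {"a", "b"}, "U2": {"b"}}
print(overlap_objective(R))                       # 1.5
print(overlap_objective(add_fake_objects(R, 1)))  # 1.0: the fake goes to U2
```

Greedy placement favors the agent whose set is mostly shared (here U2, whose only object is also held by U1), since one extra object halves its overlap ratio.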
Conclusions
Data leakage modeled as a maximum-likelihood problem
Data distribution strategies that help identify the guilty agents
Thank You!