rubenszmm@gmail.com
http://github.com/RubensZimbres
NAÏVE BAYES
$P(a \mid c) = \frac{P(c \mid a)\,P(a)}{P(c)}$

MIXTURE MODELS
$P(B) = \sum_{A} P(B \mid A)\,P(A)$
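A minimal numeric sketch of Bayes' rule with the denominator from total probability; the prior and likelihood values are made-up illustration numbers.

# Bayes' rule: P(a|c) = P(c|a) * P(a) / P(c)
p_a = 0.01                    # prior P(a), assumed for illustration
p_c_given_a = 0.9             # likelihood P(c|a)
p_c_given_not_a = 0.2         # P(c|not a)
p_c = p_c_given_a * p_a + p_c_given_not_a * (1 - p_a)  # total probability P(c)
p_a_given_c = p_c_given_a * p_a / p_c                  # posterior P(a|c)
print(p_a_given_c)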
BAYES OPTIMAL CLASSIFIER
$\hat{c} = \arg\max_{c} \sum_{T} P(x \mid T)\, P(T \mid D)$

MIXTURE OF GAUSSIANS
$P(x \mid \bar{x}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, \exp\!\left(-\frac{1}{2}\left(\frac{x - \bar{x}}{\sigma}\right)^{2}\right)$
$Z_{ij} = \frac{N_{1} C_{1} + N_{2} C_{2}}{N_{1} + N_{2}}$

NAÏVE BAYES CLASSIFIER
$\arg\max\ P(Spo \mid Tot) \cdot P(Soc \mid Spo)$

ANOMALY DETECTION
$P(Z_{ij}) \to 0.50$
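A sketch of the Gaussian density above used as an anomaly score; the cutoff value is an assumption for illustration.

import math

def gaussian_pdf(x, mean, sigma):
    # P(x) = 1/sqrt(2*pi*sigma^2) * exp(-0.5 * ((x - mean)/sigma)^2)
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / math.sqrt(2 * math.pi * sigma ** 2)

# flag x as anomalous when its density falls below an assumed threshold
print(gaussian_pdf(3.5, mean=0.0, sigma=1.0) < 1e-2)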
BAYES MAP (maximum a posteriori)
$h_{MAP} = \arg\max\ P(c \mid a) \cdot P(a)$

MAXIMUM LIKELIHOOD
$h_{ML} = \arg\max\ P(c \mid a)$

EM ALGORITHM
E step: $P(z \mid x) = \frac{P(z)\, P(x \mid z)}{\sum_{z'} P(z')\, P(x \mid z')}$ (assign values)
M step: $P(z') = \frac{\sum_{x} P(z \mid x)}{n}$, e.g. $P(B = 1 \mid A = 1, C = 0)$

TOTAL PROBABILITY
$P(B) = \sum_{A} P(B \mid A)\, P(A)$
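A compact sketch of the E and M steps for a two-component 1-D Gaussian mixture; the data, initial means, and fixed variances are assumed values.

import math

def pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / math.sqrt(2 * math.pi * s * s)

data = [0.2, 0.5, 0.9, 4.1, 4.4, 4.8]
mu, sigma, w = [0.0, 5.0], [1.0, 1.0], [0.5, 0.5]   # assumed initial parameters

for _ in range(20):
    # E step: responsibilities P(z|x) = P(z) P(x|z) / sum_z' P(z') P(x|z')
    resp = []
    for x in data:
        p = [w[k] * pdf(x, mu[k], sigma[k]) for k in range(2)]
        tot = sum(p)
        resp.append([pk / tot for pk in p])
    # M step: re-estimate mixture weights and means from the responsibilities
    for k in range(2):
        nk = sum(r[k] for r in resp)
        w[k] = nk / len(data)
        mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk

print(mu, w)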
LAPLACE ESTIMATE (small samples)
$P(A) = \frac{A + 0.5}{A + B + 1}$

QUOTIENT RULE
$\frac{d}{dx}\,\frac{f(x)}{g(x)} = \frac{f'(x)\, g(x) - f(x)\, g'(x)}{g(x)^{2}}$

$\frac{d}{dx}\, 2 f(x) = 2\, \frac{d}{dx} f(x)$
$\frac{d}{dx}\left[f(x) + g(x)\right] = \frac{d}{dx} f(x) + \frac{d}{dx} g(x)$
$\frac{d}{dx}\left[f(x) + 2 g(x)\right] = \frac{d}{dx} f(x) + 2\, \frac{d}{dx} g(x)$

BAYESIAN NETWORKS
tuples $\neg$ for $y = 0 \wedge y = 1$
LIMITS
$\lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$, where $h = \Delta x = x' - x$

CHAIN RULE
$\frac{d}{dx}\, g(f(x)) = g'(f(x)) \cdot f'(x)$
(solve $f(x)$, apply in $g'(x)$)
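A quick numerical check of the limit definition and the chain rule; the step h and the test functions are arbitrary choices.

def deriv(f, x, h=1e-6):
    # limit definition: (f(x + h) - f(x)) / h
    return (f(x + h) - f(x)) / h

f = lambda x: x ** 2          # f(x) = x^2, so f'(x) = 2x
g = lambda x: x ** 3          # g(x) = x^3, so g'(x) = 3x^2

# chain rule: d/dx g(f(x)) = g'(f(x)) * f'(x) = 3 (x^2)^2 * 2x
x = 1.5
print(deriv(lambda t: g(f(t)), x), 3 * (x ** 2) ** 2 * 2 * x)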
DERIVATIVES
$\frac{\partial}{\partial x}\, x^{n} = n\, x^{n-1}$
$\frac{\partial y^{n}}{\partial x} = \frac{\partial y^{n}}{\partial y} \cdot \frac{\partial y}{\partial x}$

PRODUCT RULE
$\frac{d}{dx}\left[f(x)\, g(x)\right] = f'(x)\, g(x) + f(x)\, g'(x)$

VARIANCE
$Var = \frac{\sum (x - \bar{x})^{2}}{n - 1}$

STANDARD DEVIATION
$\sigma = \sqrt{Variance}$
COVARIANCE
$Cov = \frac{\sum (x - \bar{x})\,(y - \bar{y})}{n - 1}$

LOSS
$Loss = Bias^{2} + Variance + Noise$
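A sketch of the sample statistics above, written out directly against their definitions; the data values are arbitrary.

x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 5.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

var = sum((xi - mx) ** 2 for xi in x) / (n - 1)                       # Var
std = var ** 0.5                                                      # standard deviation
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)    # Cov
print(var, std, cov)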
SUM OF SQUARED ERRORS
$E_{w} = \frac{\sum (y - \hat{y})^{2}}{2}$

CONFIDENCE INTERVAL
$\bar{x} \pm 1.96\, \frac{\sigma}{\sqrt{n}}$

COST FUNCTION
$J(\theta) = \frac{\sum (y - \hat{y})^{2}}{2}$, minimized by the update $\theta_{j} := \theta_{j} - \eta\, \frac{\partial J}{\partial \theta_{j}}$
CONFIDENCE INTERVAL ERROR
$error \pm 1.96\, \sqrt{\frac{error\,(1 - error)}{N}}$

GINI COEFFICIENT
$Gini = \frac{N + 1 - 2\, \frac{\sum_{i}(N + 1 - i)\, y_{i}}{\sum_{i} y_{i}}}{N}$

CHI SQUARED
$Chi = \sum \frac{(y - \hat{y})^{2}}{\hat{y}} = \sum \frac{\delta^{2}}{\hat{y}}$
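A sketch of the Gini coefficient formula above; it assumes the values are sorted in ascending order, which the code enforces.

def gini(y):
    y = sorted(y)
    n = len(y)
    # Gini = (N + 1 - 2 * sum((N + 1 - i) * y_i) / sum(y_i)) / N, i = 1..N
    weighted = sum((n + 1 - i) * yi for i, yi in enumerate(y, start=1))
    return (n + 1 - 2 * weighted / sum(y)) / n

print(gini([1, 1, 2, 10]))   # more unequal values push the result toward 1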
NUMBER OF EXAMPLES
$m \ge \frac{1}{\epsilon}\left(\log(N_{H}) + \log\frac{1}{\delta}\right)$, where $\delta = y - \hat{y}$

R SQUARED
$R^{2} = \frac{\left(n \sum xy - \sum x \sum y\right)^{2}}{\left(n \sum x^{2} - (\sum x)^{2}\right)\left(n \sum y^{2} - (\sum y)^{2}\right)}$
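A direct transcription of the R² formula above; the data values are arbitrary.

x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 8.1]
n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sx2, sy2 = sum(a * a for a in x), sum(b * b for b in y)
r2 = (n * sxy - sx * sy) ** 2 / ((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
print(r2)   # close to 1 for a near-linear relation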
MARKOV CHAINS
$f(x) = Eigenvector^{T} \cdot [x_{1} \ldots x_{n}]$
$P_{t+1}(X = x) = \sum_{x'} P_{t}(X = x') \cdot T(x' \to x)$
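A sketch of the distribution update $P_{t+1} = P_t \cdot T$; the 2-state transition matrix is an assumed toy example.

import numpy as np

T = np.array([[0.9, 0.1],     # T[i, j] = P(next = j | current = i), assumed values
              [0.5, 0.5]])
p = np.array([1.0, 0.0])      # initial distribution P_0

for _ in range(50):
    p = p @ T                 # P_{t+1}(x) = sum_{x'} P_t(x') T(x' -> x)
print(p)                      # converges to the stationary distribution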
K NEAREST NEIGHBOR
$f(x) \leftarrow \frac{\sum_{i} f(x_{i})}{k}$
$D_{E}(x_{i}, x_{j}) = \sqrt{(x_{i} - x_{j})^{2} + (y_{i} - y_{j})^{2}}$

WEIGHTED NEAREST NEIGHBOR
$f(x) = \frac{\sum_{i} f(x_{i}) / D(x_{i}, x_{j})^{2}}{\sum_{i} 1 / D(x_{i}, x_{j})^{2}}$

t-SNE
$Condit.Prob = p_{j|i} = \frac{\exp\left(-\|x_{i} - x_{j}\|^{2} / 2\sigma^{2}\right)}{\sum_{k \ne i} \exp\left(-\|x_{i} - x_{k}\|^{2} / 2\sigma^{2}\right)}$
$Condit.Prob = q_{j|i} = \frac{\exp\left(-\|y_{i} - y_{j}\|^{2} / 2\sigma^{2}\right)}{\sum_{k \ne i} \exp\left(-\|y_{i} - y_{k}\|^{2} / 2\sigma^{2}\right)}$
$Perplexity = 2^{H(P_{i})}$, where $H(P_{i}) = -\sum_{j} p_{j|i}\, \log_{2} p_{j|i}$
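A sketch of the conditional probabilities and perplexity above for one reference point; the points and σ are assumed values.

import numpy as np

X = np.array([[0.0, 0.0], [0.1, 0.2], [3.0, 3.0], [3.1, 2.9]])
i, sigma = 0, 1.0
d2 = np.sum((X - X[i]) ** 2, axis=1)          # ||x_i - x_j||^2
p = np.exp(-d2 / (2 * sigma ** 2))
p[i] = 0.0                                    # exclude j = i
p /= p.sum()                                  # p_{j|i}

H = -np.sum(p[p > 0] * np.log2(p[p > 0]))     # H(P_i)
print(2 ** H)                                 # Perplexity = 2^H(P_i)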
PRINCIPAL COMPONENTS ANALYSIS
$x' = x - \bar{x}$
Eigenvalues: $\det(A - \lambda I) = 0$
Eigenvectors: $A\, v = \lambda\, v$

COSINE DISTANCE
$Cos = \frac{u \cdot v}{\|u\|\, \|v\|}$
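A sketch of PCA via the eigendecomposition of the covariance of centered data; the data matrix is an arbitrary example.

import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
Xc = X - X.mean(axis=0)                  # x' = x - mean
C = np.cov(Xc, rowvar=False)             # covariance matrix A
eigvals, eigvecs = np.linalg.eig(C)      # solves det(A - lambda I) = 0 and A v = lambda v
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order]           # principal axes, largest variance first
print(Xc @ components[:, :1])            # projection onto the first component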
TF-IDF
$w_{ij} = tf_{ij} \cdot \log \frac{N}{df_{i}}$
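A sketch of the weight $w_{ij} = tf_{ij} \cdot \log(N / df_i)$ over a toy corpus of three documents.

import math

docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
N = len(docs)

def tf_idf(term, doc):
    tf = doc.count(term)                    # tf_ij: term count in this document
    df = sum(1 for d in docs if term in d)  # df_i: documents containing the term
    return tf * math.log(N / df)

print(tf_idf("cat", docs[0]), tf_idf("the", docs[0]))  # "the" scores 0: it appears everywhere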
LINEAR REGRESSION
$m_{1} = \frac{\sum x_{2}^{2} \sum x_{1} y - \sum x_{1} x_{2} \sum x_{2} y}{\sum x_{1}^{2} \sum x_{2}^{2} - \left(\sum x_{1} x_{2}\right)^{2}}$
$b = \bar{y} - m_{1} \bar{x}_{1} - m_{2} \bar{x}_{2}$
$f(x) = \sum_{i} m_{i} x_{i} + b$
$A = (X^{T} X)^{-1}\, X^{T}\, Y$, where $A = \begin{bmatrix} b \\ m \end{bmatrix}$
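A sketch of the closed-form fit $A = (X^T X)^{-1} X^T Y$, with a column of ones appended so that $A = [b, m]$; the data are arbitrary.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])
X = np.column_stack([np.ones_like(x), x])   # [1, x] so the solution is [b, m]
A = np.linalg.inv(X.T @ X) @ X.T @ y        # normal equation
b, m = A
print(b, m)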
DECISION TREES
$Entropy = -\sum P \cdot \log(P)$
$InfoGain = Entropy(parent) - \sum_{i} P_{i} \cdot Entropy(child_{i})$

LOGISTIC REGRESSION
$Odds\ Ratio = \log \frac{P}{1 - P} = mx + b$
$e^{mx + b} = \frac{p}{1 - p} \Rightarrow P = \frac{e^{mx + b}}{e^{mx + b} + 1}$, for $y = 0 \wedge y = 1$; fit until $-2LL \to 0$
$J(\theta) = -\frac{\sum \left[y \log(\hat{y}) + (1 - y) \log(1 - \hat{y})\right]}{n}$, where $\hat{y} = \frac{1}{1 + e^{-(wx + b)}}$
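A sketch of entropy and information gain for one binary split; the class labels and the candidate split are made up.

import math

def entropy(labels):
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

parent = [1, 1, 1, 0, 0, 0, 0, 0]
left, right = [1, 1, 1, 0], [0, 0, 0, 0]   # a candidate split

gain = entropy(parent) - (len(left) / len(parent)) * entropy(left) \
       - (len(right) / len(parent)) * entropy(right)
print(gain)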
MUTUAL INFORMATION
$I(A, B) = H(A) - H(A \mid B)$

RULE INDUCTION
$Gain = P \cdot \left[-P_{t+1} \log(P_{t+1}) - \left(-P_{t} \log(P_{t})\right)\right]$
EIGENVECTOR CENTRALITY = PAGE RANK
$PR(A) = \frac{1 - d}{n} + d \left(\frac{PR(B)}{Out(B)} + \ldots + \frac{PR(n)}{Out(n)}\right)$
where $d \to 1$ for few connections

RULE VOTE
$Weight = accuracy \cdot coverage$
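A sketch of the PageRank recurrence by power iteration; the toy link graph is assumed, and d = 0.85 is the conventional damping value.

import numpy as np

links = {0: [1, 2], 1: [2], 2: [0]}   # assumed toy graph: node -> outlinks
n, d = 3, 0.85
pr = np.full(n, 1.0 / n)

for _ in range(50):
    new = np.full(n, (1 - d) / n)
    for page, outs in links.items():
        for dest in outs:
            new[dest] += d * pr[page] / len(outs)   # d * PR(B) / Out(B)
    pr = new
print(pr)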
ENTROPY
$H(A) = -\sum P(A) \cdot \log P(A)$

JOINT ENTROPY
$H(A, B) = -\sum P(A, B) \cdot \log P(A, B)$

CONDITIONAL ENTROPY
$H(A \mid B) = -\sum P(A, B) \cdot \log P(A \mid B)$

RATING
$R = \bar{R}_{u} + \alpha \sum_{v} w_{uv} \cdot (R_{vi} - \bar{R}_{v})$

SIMILARITY
$w_{uv} = \frac{\sum_{i} (R_{ui} - \bar{R}_{u}) \cdot (R_{vi} - \bar{R}_{v})}{\sqrt{\sum_{i} (R_{ui} - \bar{R}_{u})^{2} \cdot \sum_{i} (R_{vi} - \bar{R}_{v})^{2}}}$

CONTENT-BASED RECOMMENDATION
$Rating = \sum_{i=1}^{class} \sum_{j=1}^{n} x_{i}\, y_{j}$
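A sketch of the Pearson similarity weight above between two users' ratings; the rating vectors are made-up examples.

import numpy as np

ru = np.array([4.0, 3.0, 5.0, 4.0])   # user u's ratings on common items
rv = np.array([3.0, 2.0, 4.0, 3.0])   # user v's ratings
du, dv = ru - ru.mean(), rv - rv.mean()
w_uv = (du * dv).sum() / np.sqrt((du ** 2).sum() * (dv ** 2).sum())
print(w_uv)   # 1.0 here: the two users are perfectly correlated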
LOGIT
$\log(odds) = wx + b = \log \frac{p}{1 - p}$

COLLABORATIVE FILTERING
$\hat{R}_{ui} = \bar{R}_{u} + \alpha \sum_{v} w_{uv} \cdot (R_{vi} - \bar{R}_{v})$, with $w_{uv}$ as in SIMILARITY above

SOFTMAX NORMALIZATION
$S(f(x)) = \frac{e^{wx + b}}{\sum e^{wx + b}}$
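A sketch of softmax normalization over raw scores $wx + b$; the score values are arbitrary.

import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())   # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # probabilities that sum to 1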
BATCH GRADIENT DESCENT
$\theta_{j} := \theta_{j} - \eta\, \frac{\sum (\hat{y} - y) \cdot x_{j}}{n}$

STOCHASTIC GRADIENT DESCENT
$\theta_{j} := \theta_{j} - \eta\, (\hat{y} - y) \cdot x_{j}$

CROSS ENTROPY
$H(S(f(x)), L(x)) = -\sum L(x) \cdot \log S(f(x))$

LOSS
$Loss = \frac{\sum H(S(f(x)), L(x))}{N}$
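A sketch contrasting the two update rules on a 1-D linear fit; the data, learning rate, and iteration count are assumed values.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
theta, b, eta = 0.0, 0.0, 0.01

# batch: average the gradient over all n examples before each update
for _ in range(2000):
    err = (theta * x + b) - y
    theta -= eta * (err * x).mean()
    b -= eta * err.mean()
# stochastic: the same update applied one example at a time,
# e.g. theta -= eta * err_i * x_i inside a loop over shuffled examples

print(theta, b)   # approaches m = 2, b = 1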
L2 REGULARIZATION
$Loss \mathrel{+}= \frac{\lambda\, w^{2}}{2}$, giving the update $w \leftarrow w - \eta\, (\delta \cdot x + \lambda\, w)$

NEURAL NETWORKS
$f(x) = o = w_{0} + \sum_{i=1}^{n} w_{i}\, x_{i}$
SIGMOID
$\sigma(x) = \frac{1}{1 + e^{-(wx + b)}}$

AVOID OVERFIT NEURAL NETWORKS (L2)
$E = \frac{\sum (t - o)^{2}}{2} + F \sum w_{ji}^{2}$, where F = penalty

RADIAL BASIS FUNCTION
$h(x) = e^{-\frac{(x - c)^{2}}{\sigma^{2}}}$

BACKPROPAGATION
$\delta_{k} = o_{k} \cdot (1 - o_{k}) \cdot (t - o_{k})$

PERCEPTRON
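A minimal sketch of a single-unit perceptron with a sigmoid activation, trained with the $\delta_k$ rule above; the initial weights, example, and learning rate are assumed values.

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b, eta = 0.5, 0.0, 0.1      # assumed initial weights and learning rate
x, t = 1.0, 1.0                # one training example with target t

for _ in range(100):
    o = sigmoid(w * x + b)             # forward pass
    delta = o * (1 - o) * (t - o)      # delta_k = o_k (1 - o_k)(t - o_k)
    w += eta * delta * x               # weight update from the delta
    b += eta * delta

print(o)   # the output moves toward the target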