# Gaussian Mixture Models, Expectation Maximization

## Preview text

Gaussian Mixture Models, Expectation Maximization
Instructor: Jessica Wu, Harvey Mudd College

## Gaussian Mixture Models Overview

Learning Goals

- Describe the differences between k-means and GMMs

KͲMeans:AnotherView

BasedonslidesbyDavidKauchak

Initializeclustercenters Assignexamplestoclosestcenter
kͲmeansassumessphericalclusters Updateclustercenters

GaussianMixtureModels

 AssumedatacamefrommixtureofGaussians (ellipticaldata)  Assigndatatoclusterwithcertainprobability (softclustering)

kͲmeans

GMMs

p(blue)=0.8 p(red)=0.2
p(blue)=0.9 p(red)=0.1

 VerysimilarathighͲleveltokͲmeans:iteratebetweenassigning examplesandupdatingclustercenters
BasedonslidesbyDavidKauchak

GMMExample

initializeclustercenters

softclusterexamples

updateclustercenters
(basedonweighted contributionofexamples)

keepiterating…

BasedonslidesbyDavidKauchak [ImagesbyChrisBishop,PRML]

LearningGMMs
LearningGoals
 DescribethetechnicaldetailsofGMMs

UnivariateGaussianDistribution
(scalar)randomvariableX parameters:(scalar)meanN,(scalar)varianceT2
X ~ N(N, T2)
Wikipedia[NormalDistribution]
MultivariateGaussianDistribution
randomvariablevectorX = [X1, …, Xn]T parameters: meanvectorN  n
covariancematrix4 (symmetric,positivedefinite) X ~ N(N, 4)
contourline: typicallydrawn at1/e ofpeakheight
Wikipedia[MultivariateNormalDistribution]

CovarianceMatrix

Recallforpairofr.v.’s X andY,covarianceisdefinedas cov(X,Y) = [(X – [X])(Y – [Y])]

ForX = [X1, …, Xn]T,covariancematrix summarizes covariances acrossallpairsofvariables:
4 = [(X – [X])(X – [X])T]
4 isn q n matrixs.t. 4ij = cov(Xi, Xj)

___parameters

___params
0 0

___params
0 0

GMMsasGenerativeModel

 Therearek components
 Componentj
± hasassociatedmeanvectorNj andcovariancematrix4j
± generatesdatafromN(Nj, 4j)
 Eachexamplex(i) is
generatedaccordingto followingrecipe:
± pickcomponentj atrandom withprobabilityGj
± samplex(i) ~ N(Nj, 4j)

G1 G2 N2

N1

x

G3

N3

BasedonslidesbyAndrewMoore

GMMsasGenerativeModel
Wearegiventrainingset{x(1), …, x(n)} (w/olabels) Wemodeldatabyspecifyingjointdistribution
p(x(i), z(i)) = p(x(i)) | z(i)) p(z(i))

Here,fork components, z(i) ~ Multinomial(G)
x(i) | z(i) = j ~ N(Nj, 4j)

Gj = p(z(i) = j) Gj p 0,

Goals: Determinez(i) (softclusterassignments) DeterminemodelparametersGj,Nj,4j (1 b j b k)
Note: z(i) arelatent r.v.’s (theyarehidden/unobserved)
Thisiswhatmakesestimationproblemdifficult

BasedonnotesbyAndrewNg

GMMOptimization
Assumesupervisedsetting(knownclusterassignments) MLEforunivariateGaussian
sumoverpointsgenerated fromthisGaussian
MLEformultivariateGaussian

BasedonslidesbyDavidSontag

GMMOptimization
Whatifunobserveddata?
Nowwhat?
BasedonnotesbyAndrewNg

BasedonslidesbyAndrewMoore

## Expectation Maximization

Learning Goals

- Describe when EM is useful
- Describe the two steps of EM
- Practice EM on a toy problem

Expectation Maximization is a clever method for maximizing marginal likelihoods:

- an excellent approach for unsupervised learning
- can do "trivial" things (upcoming example)
- one of the most general unsupervised approaches, with many other uses (e.g. HMM inference)

Overview:

- Begin with a guess for the model parameters
- Repeat until convergence:
  - Update the latent variables based on our expectations [E-step]
  - Update the model parameters to maximize the log likelihood [M-step]

Based on notes by Andrew Ng and slides by Andrew Moore

SillyExample

P(A)=½

component2=getsaB component3=getsaC component4=getsaD

P(B)=p P(C)=2p P(D)=½ – 3p

(note0 b p b 1/6)

Assumewewanttoestimatep fromdata.Inagivenclass,therewere

a A’s,b B’s,c C’s,d D’s.

WhatistheMLEofp givena,b,c,d?

soifclassgot

a bc d

14 6 9 10

BasedonslidesbyAndrewMoore[ClusteringwithGaussianMixtures]

SameProblemwithHiddenInformation

WhatistheMLEofp now?

Remember P(A)=½
P(B)=p P(C)=2p P(D)=½ – 3p

EXPECTATION Ifweknowvalueofp, wecouldcomputeexpected valuesofa andb.

MAXIMIZATION Ifweknowexpectedvaluesofa andb, wecouldcomputemaximum likelihoodvalueofp.

BasedonslidesbyAndrewMoore[ClusteringwithGaussianMixtures]

EMforOurSillyExample
 Beginwithinitialguessforp  IteratebetweenExpectationandMaximizationtoimproveour
estimatesofp anda &b

 Definep(t) =estimateofp ontth iteration

b(t) =estimateofb ontth iteration

 Repeatuntilconvergence

EͲstep

= [b|p(t)]

MͲstep

= MLEofp givenb(t)

BasedonslidesbyAndrewMoore[ClusteringwithGaussianMixtures]

EMConvergence

Aside(ideabehindconvergenceproof)

 likelihoodmustincreaseorremainsamebetweeneach

iteration[notobvious]

 likelihoodcanneverexceed1[obvious]

 solikelihoodmustconverge[obvious]

t p(t)

b(t)

00

0

Inourexample,supposewehad h = 20, c = 10, d = 10
p(0) = 0

1 0.0833 2.857 2 0.0937 3.158 3 0.0947 3.185 4 0.0948 3.187

Errorgenerallydecreasesbyconstantfactoreachtimestep 5 0.0948 3.187

(e.g.convergenceislinear)

6 0.0948 3.187

BasedonslidesbyAndrewMoore[ClusteringwithGaussianMixtures]