Gaussian Mixture Models, Expectation Maximization


Instructor: Jessica Wu -- Harvey Mudd College
The instructor gratefully acknowledges Andrew Ng (Stanford), Andrew Moore (CMU), Eric Eaton (UPenn), David Kauchak (Pomona), and the many others who made their course materials freely available online.
Robot Image Credit: Viktoriya Sukhanova © 123RF.com
Gaussian Mixture Models Overview
Learning Goals
• Describe the differences between k-means and GMMs

K-Means: Another View

Based on slides by David Kauchak

Initialize cluster centers; assign examples to closest center; update cluster centers.
(k-means assumes spherical clusters)

Gaussian Mixture Models

• Assume data came from a mixture of Gaussians (elliptical data)
• Assign data to clusters with a certain probability (soft clustering)

[Figure: k-means vs. GMMs, with soft assignments such as p(blue) = 0.8, p(red) = 0.2 and p(blue) = 0.9, p(red) = 0.1]

• Very similar at a high level to k-means: iterate between assigning examples and updating cluster centers
Based on slides by David Kauchak

GMM Example

initialize cluster centers
soft cluster examples
update cluster centers (based on weighted contribution of examples)
keep iterating…

Based on slides by David Kauchak [Images by Chris Bishop, PRML]

Learning GMMs
Learning Goals
• Describe the technical details of GMMs

Univariate Gaussian Distribution
(scalar) random variable X; parameters: (scalar) mean μ, (scalar) variance σ²
X ~ N(μ, σ²)
Wikipedia [Normal Distribution]
MultivariateGaussianDistribution
randomvariablevectorX = [X1, …, Xn]T parameters: meanvectorN  n
covariancematrix4 (symmetric,positivedefinite) X ~ N(N, 4)
contourline: typicallydrawn at1/e ofpeakheight
Wikipedia[MultivariateNormalDistribution]
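
As a concrete companion to the definition above (not part of the original slides), here is a minimal NumPy/SciPy sketch of evaluating and sampling X ~ N(μ, Σ); the specific μ and Σ values are made up for illustration.

    import numpy as np
    from scipy.stats import multivariate_normal

    # Hypothetical parameters: mean vector mu and a symmetric,
    # positive definite covariance matrix Sigma.
    mu = np.array([1.0, -2.0])
    Sigma = np.array([[2.0, 0.6],
                      [0.6, 1.0]])

    rv = multivariate_normal(mean=mu, cov=Sigma)

    x = np.array([0.5, -1.5])
    print("density p(x):", rv.pdf(x))        # height of the Gaussian at x
    print("log density:", rv.logpdf(x))

    # Draw samples X ~ N(mu, Sigma); each row is one example.
    samples = rv.rvs(size=1000, random_state=0)
    print("sample mean:", samples.mean(axis=0))                 # close to mu
    print("sample covariance:", np.cov(samples, rowvar=False))  # close to Sigma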

Covariance Matrix

Recall for a pair of r.v.'s X and Y, covariance is defined as cov(X, Y) = E[(X – E[X])(Y – E[Y])]

For X = [X1, …, Xn]ᵀ, the covariance matrix summarizes covariances across all pairs of variables:
Σ = E[(X – E[X])(X – E[X])ᵀ]
Σ is an n × n matrix s.t. Σij = cov(Xi, Xj)

[Slide: fill-in-the-blank parameter counts for different covariance matrix structures (full vs. constrained, with zero off-diagonal entries)]
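
To make the definition Σ = E[(X – E[X])(X – E[X])ᵀ] concrete, here is a small sketch of my own, with synthetic data, that estimates Σ from samples and checks it against NumPy's built-in estimator.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic data: 1000 examples of a 3-dimensional random vector,
    # with some correlation between the first two coordinates.
    X = rng.normal(size=(1000, 3))
    X[:, 1] += 0.8 * X[:, 0]

    mean = X.mean(axis=0)
    centered = X - mean
    Sigma_hat = centered.T @ centered / X.shape[0]   # estimate of E[(X - E[X])(X - E[X])^T]

    print(Sigma_hat)                                  # n x n; Sigma_hat[i, j] ~ cov(X_i, X_j)
    print(np.cov(X, rowvar=False, bias=True))         # NumPy's version of the same estimate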

GMMs as Generative Model

• There are k components
• Component j
  – has associated mean vector μj and covariance matrix Σj
  – generates data from N(μj, Σj)
• Each example x(i) is generated according to the following recipe (see the sketch after this slide):
  – pick component j at random with probability πj
  – sample x(i) ~ N(μj, Σj)

[Figure: three Gaussian components with mixing weights π1, π2, π3 and means μ1, μ2, μ3, and a sample x drawn from the mixture]

Based on slides by Andrew Moore
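
The generative recipe above can be written in a few lines. This is a sketch of my own, assuming a hypothetical 3-component mixture in 2-D; the weights π, the means μj, the covariances Σj, and the helper name sample_gmm are all invented for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical 3-component mixture in 2-D (values chosen arbitrarily).
    pi = np.array([0.5, 0.3, 0.2])                         # mixing weights, sum to 1
    mus = np.array([[0.0, 0.0], [4.0, 4.0], [-3.0, 5.0]])  # component means
    Sigmas = np.array([np.eye(2),
                       [[1.0, 0.5], [0.5, 1.0]],
                       [[0.5, 0.0], [0.0, 2.0]]])          # component covariances

    def sample_gmm(n):
        """Follow the recipe: pick component j with probability pi_j, then sample N(mu_j, Sigma_j)."""
        z = rng.choice(len(pi), size=n, p=pi)
        X = np.array([rng.multivariate_normal(mus[j], Sigmas[j]) for j in z])
        return X, z

    X, z = sample_gmm(500)
    print(X.shape, np.bincount(z) / len(z))   # component frequencies should be close to pi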

GMMs as Generative Model

We are given a training set {x(1), …, x(n)} (w/o labels). We model the data by specifying the joint distribution
p(x(i), z(i)) = p(x(i) | z(i)) p(z(i))

Here, for k components:
z(i) ~ Multinomial(π)
x(i) | z(i) = j ~ N(μj, Σj)
πj = p(z(i) = j), with πj ≥ 0 and Σj πj = 1

Goals: determine z(i) (soft cluster assignments) and determine the model parameters πj, μj, Σj (1 ≤ j ≤ k)
Note: the z(i) are latent r.v.'s (they are hidden/unobserved).
This is what makes the estimation problem difficult.

Based on notes by Andrew Ng

GMM Optimization

Assume supervised setting (known cluster assignments).

MLE for univariate Gaussian (sum over points generated from this Gaussian):
μ̂j = (1/nj) Σi x(i),   σ̂j² = (1/nj) Σi (x(i) – μ̂j)²

MLE for multivariate Gaussian:
μ̂j = (1/nj) Σi x(i),   Σ̂j = (1/nj) Σi (x(i) – μ̂j)(x(i) – μ̂j)ᵀ

(each sum runs over the nj points assigned to Gaussian j)

Based on slides by David Sontag
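
A sketch of mine, not from the slides, of these supervised-setting MLEs: with known assignments z, each component's parameters are just the sample mean and (biased) sample covariance of its points, and the mixing weight is the fraction of points assigned to it. The helper names gaussian_mle and fit_supervised are my own.

    import numpy as np

    def gaussian_mle(X):
        """MLE for one multivariate Gaussian given the points X (one example per row)."""
        mu_hat = X.mean(axis=0)
        centered = X - mu_hat
        Sigma_hat = centered.T @ centered / X.shape[0]   # note: divides by n_j, not n_j - 1
        return mu_hat, Sigma_hat

    def fit_supervised(X, z, k):
        """Supervised setting: cluster assignments z are observed, so fit each Gaussian separately."""
        params = []
        for j in range(k):
            Xj = X[z == j]                        # points generated from Gaussian j
            pi_j = len(Xj) / len(X)               # MLE of the mixing weight
            mu_j, Sigma_j = gaussian_mle(Xj)
            params.append((pi_j, mu_j, Sigma_j))
        return params

With X, z from the earlier sampling sketch, fit_supervised(X, z, 3) should recover parameters close to the ones used to generate the data.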

GMM Optimization

What if the cluster assignments are unobserved? The marginal log likelihood
ℓ(π, μ, Σ) = Σi log Σj πj N(x(i); μj, Σj)
no longer has a closed-form maximizer. Now what?

Based on notes by Andrew Ng
Based on slides by Andrew Moore
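
For reference, a short sketch of my own for evaluating this marginal log likelihood, using log-sum-exp for numerical stability; the function name gmm_log_likelihood is mine.

    import numpy as np
    from scipy.special import logsumexp
    from scipy.stats import multivariate_normal

    def gmm_log_likelihood(X, pi, mus, Sigmas):
        """Marginal log likelihood: sum_i log sum_j pi_j N(x_i; mu_j, Sigma_j)."""
        # log_probs[i, j] = log pi_j + log N(x_i; mu_j, Sigma_j)
        log_probs = np.stack(
            [np.log(pi[j]) + multivariate_normal.logpdf(X, mus[j], Sigmas[j])
             for j in range(len(pi))],
            axis=1)
        return logsumexp(log_probs, axis=1).sum()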

Expectation Maximization

Expectation Maximization
Learning Goals
• Describe when EM is useful
• Describe the two steps of EM
• Practice EM on a toy problem
Expectation Maximization
• Clever method for maximizing marginal likelihoods
• Excellent approach for unsupervised learning
• Can do "trivial" things (upcoming example)
• One of the most general unsupervised approaches with many other uses (e.g. HMM inference)

Overview
• Begin with a guess for the model parameters
• Repeat until convergence (sketched in code below)
  – Update latent variables based on our expectations [E-step]
  – Update model parameters to maximize log likelihood [M-step]
Based on notes by Andrew Ng and slides by Andrew Moore
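
A bare-bones sketch of these two steps for a GMM, under the standard update formulas; this is my own illustration, not course code. The E-step computes responsibilities p(z(i) = j | x(i)), and the M-step re-estimates π, μ, Σ from the weighted examples.

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_gmm(X, k, n_iters=100, seed=0):
        """Bare-bones EM for a Gaussian mixture; X has one example per row."""
        rng = np.random.default_rng(seed)
        n, d = X.shape

        # Begin with a guess for the model parameters.
        pi = np.full(k, 1.0 / k)
        mus = X[rng.choice(n, size=k, replace=False)]    # k random examples as initial means
        Sigmas = np.array([np.cov(X, rowvar=False)] * k)

        for _ in range(n_iters):
            # E-step: responsibilities r[i, j] = p(z_i = j | x_i, current parameters)
            r = np.stack([pi[j] * multivariate_normal.pdf(X, mus[j], Sigmas[j])
                          for j in range(k)], axis=1)
            r /= r.sum(axis=1, keepdims=True)

            # M-step: re-estimate parameters from the weighted contributions of the examples
            nj = r.sum(axis=0)                           # effective number of points per component
            pi = nj / n
            mus = (r.T @ X) / nj[:, None]
            for j in range(k):
                diff = X - mus[j]
                Sigmas[j] = (r[:, j, None] * diff).T @ diff / nj[j]

        return pi, mus, Sigmas

For example, pi, mus, Sigmas = em_gmm(X, k=3) on data generated by the earlier sampling sketch should recover parameters close to the ones used to generate it, up to permutation of the components and local optima.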

Silly Example

Let events be "grades in a class":
component 1 = gets an A    P(A) = ½
component 2 = gets a B     P(B) = p
component 3 = gets a C     P(C) = 2p
component 4 = gets a D     P(D) = ½ – 3p
(note 0 ≤ p ≤ 1/6)

Assume we want to estimate p from data. In a given class, there were a A's, b B's, c C's, d D's.

What is the MLE of p given a, b, c, d?
Maximizing (½)^a p^b (2p)^c (½ – 3p)^d gives p̂ = (b + c) / (6(b + c + d)).

So if the class got a = 14, b = 6, c = 9, d = 10, then p̂ = 15/150 = 1/10.

Based on slides by Andrew Moore [Clustering with Gaussian Mixtures]
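
A quick numeric sanity check of my own for the closed-form answer above, comparing it against brute-force maximization of the log likelihood a log ½ + b log p + c log 2p + d log(½ – 3p) over the allowed range 0 < p < 1/6.

    import numpy as np

    a, b, c, d = 14, 6, 9, 10                      # observed grade counts from the slide

    def log_likelihood(p):
        # P(A) = 1/2, P(B) = p, P(C) = 2p, P(D) = 1/2 - 3p
        return a*np.log(0.5) + b*np.log(p) + c*np.log(2*p) + d*np.log(0.5 - 3*p)

    p_closed_form = (b + c) / (6 * (b + c + d))    # = 15/150 = 0.1

    grid = np.linspace(1e-6, 1/6 - 1e-6, 200_000)  # search the allowed range 0 < p < 1/6
    p_brute_force = grid[np.argmax(log_likelihood(grid))]

    print(p_closed_form, p_brute_force)            # both approximately 0.1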

Same Problem with Hidden Information

Someone tells us that
# of high grades (A's + B's) = h
# of C's = c
# of D's = d
What is the MLE of p now?

Remember: P(A) = ½, P(B) = p, P(C) = 2p, P(D) = ½ – 3p

We can answer this question circularly:
EXPECTATION: If we know the value of p, we could compute the expected values of a and b.
MAXIMIZATION: If we know the expected values of a and b, we could compute the maximum likelihood value of p.

Based on slides by Andrew Moore [Clustering with Gaussian Mixtures]

EM for Our Silly Example
• Begin with an initial guess for p
• Iterate between Expectation and Maximization to improve our estimates of p and a & b
• Define p(t) = estimate of p on t-th iteration, b(t) = estimate of b on t-th iteration
• Repeat until convergence

E-step:  b(t) = E[b | p(t)] = h · p(t) / (½ + p(t))

M-step:  p(t+1) = MLE of p given b(t) = (b(t) + c) / (6(b(t) + c + d))

Based on slides by Andrew Moore [Clustering with Gaussian Mixtures]

EM Convergence
• Good news: converging to local optima is guaranteed
• Bad news: local optima

Aside (idea behind convergence proof)
• likelihood must increase or remain the same between each iteration [not obvious]
• likelihood can never exceed 1 [obvious]
• so likelihood must converge [obvious]

In our example, suppose we had h = 20, c = 10, d = 10, and p(0) = 0.

t    p(t)     b(t)
0    0        0
1    0.0833   2.857
2    0.0937   3.158
3    0.0947   3.185
4    0.0948   3.187
5    0.0948   3.187
6    0.0948   3.187

Error generally decreases by a constant factor each time step (i.e., convergence is linear).

Based on slides by Andrew Moore [Clustering with Gaussian Mixtures]
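
A few lines of my own implementing the E-step and M-step from the previous slide reproduce this table for h = 20, c = 10, d = 10, p(0) = 0; the printed values match the table up to rounding.

    h, c, d = 20, 10, 10        # observed counts: high grades (A's + B's), C's, D's
    p = 0.0                     # p(0), the initial guess

    for t in range(1, 7):
        b = h * p / (0.5 + p)                    # E-step: expected number of B's given current p
        p = (b + c) / (6 * (b + c + d))          # M-step: MLE of p given that expected b
        b_next = h * p / (0.5 + p)               # b(t) column of the table: E[b | p(t)]
        print(t, round(p, 4), round(b_next, 3))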
