
University of Rhode Island
[email protected]
Open Access Master's Theses 2016
An Implementation of Deep Belief Networks Using Restricted Boltzmann Machines in Clojure
James Christopher Sims University of Rhode Island, [email protected]
Follow this and additional works at: https://digitalcommons.uri.edu/theses

Recommended Citation:
Sims, James Christopher, "An Implementation of Deep Belief Networks Using Restricted Boltzmann Machines in Clojure" (2016). Open Access Master's Theses. Paper 804. https://digitalcommons.uri.edu/theses/804

This Thesis is brought to you for free and open access by [email protected] It has been accepted for inclusion in Open Access Master's Theses by an authorized administrator of [email protected] For more information, please contact [email protected]

AN IMPLEMENTATION OF DEEP BELIEF NETWORKS USING RESTRICTED BOLTZMANN MACHINES IN CLOJURE BY JAMES CHRISTOPHER SIMS
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE
UNIVERSITY OF RHODE ISLAND 2016

MASTER OF SCIENCE THESIS OF
JAMES CHRISTOPHER SIMS

APPROVED:

Thesis Committee: Major Professor

Lutz Hamel

Gavino Puggioni

Resit Sendag

Nasser H. Zawia DEAN OF THE GRADUATE SCHOOL

UNIVERSITY OF RHODE ISLAND 2016

ABSTRACT

In a work that ultimately heralded a resurgence of deep learning as a viable and successful machine learning model, Dr. Geoffrey Hinton described a fast learning algorithm for Deep Belief Networks [1]. This study explores that result and the underlying models and assumptions that power it. The outcome of the study is a complete Clojure library (deebn) implementing Deep Belief Networks, Deep Neural Networks, and Restricted Boltzmann Machines. deebn can generate a predictive or classification model from varying input parameters and datasets, and it is available to a wide audience of Clojure users via Clojars1, the community repository for Clojure libraries. These capabilities were not present in a native Clojure library at the outset of this study. deebn performs quite well on the reference MNIST dataset with no dataset modification or hyperparameter tuning, achieving a best error rate of 2.00% in early tests.
1https://clojars.org/deebn
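Since the library is published to Clojars, it can be pulled into a Leiningen project with an ordinary dependency entry. The sketch below is illustrative only: the project name is invented, and the deebn version string is hypothetical — consult the Clojars page above for the actual released versions.

```clojure
;; project.clj — a minimal Leiningen project depending on deebn.
;; The deebn version shown here is a placeholder, not necessarily a
;; real release; see https://clojars.org/deebn for current versions.
(defproject mnist-experiment "0.1.0-SNAPSHOT"
  :description "Example project pulling in the deebn library"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [deebn "0.1.0"]])
```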

ACKNOWLEDGMENTS

I’d like to first thank Dr. Lutz Hamel for his tremendous help and support of this work, and for his patience with my many questions. His guidance, from topic selection and refinement to technical feedback, was invaluable and made this study possible. I’d like to thank my committee members Dr. Gavino Puggioni and Dr. Haibo He for their help in the technical review of this study, and Dr. Resit Sendag for agreeing to serve as a committee member on such short notice. I’d also like to thank Dr. Cheryl Foster for serving as Chair of my thesis committee. I’d also like to thank Lorraine Berube, who helped me in a myriad of ways throughout my time as a student at the University of Rhode Island. She was always available with a helpful smile, and I am grateful for it. Finally, I’d like to thank my loving wife Kasey, who was a constant and unending source of support and determination. She was there every day to keep me on track with patience and encouragement.

Table of Contents

Abstract
Acknowledgments
Table of Contents

1 Introduction
   1.1 Summary of Remaining Chapters

2 Review of Literature
   2.1 Machine Learning
      2.1.1 Leading to a Deep Belief Network
   2.2 Probabilistic Graphical Models
      2.2.1 Conditional Independence
      2.2.2 Inference
      2.2.3 Markov Random Fields
   2.3 Energy-Based Models
      2.3.1 Learning a Markov Random Field
      2.3.2 Boltzmann Machines
   2.4 Artificial Neural Networks
      2.4.1 The Perceptron
      2.4.2 Activation Functions
      2.4.3 Cost Function
      2.4.4 Backpropagation
   2.5 Clojure

3 Deep Learning Models
   3.1 Restricted Boltzmann Machines
      3.1.1 Training a Restricted Boltzmann Machine
      3.1.2 Contrastive Divergence
   3.2 Deep Belief Networks
      3.2.1 Greedy By-Layer Pre-Training
   3.3 Deep Neural Networks
   3.4 Training a Better Restricted Boltzmann Machine
      3.4.1 Initialization
      3.4.2 Momentum
      3.4.3 Monitoring the Energy Gap for an Early Stop
   3.5 Training a Better Neural Network
      3.5.1 Cross-Entropy Error
      3.5.2 L2 Regularization
   3.6 Related Work
      3.6.1 Deep Boltzmann Machines
      3.6.2 Deep Autoencoders

4 Implementation
   4.1 deebn
   4.2 An Abundance of Matrix Operations
      4.2.1 core.matrix
      4.2.2 vectorz-clj
      4.2.3 Clojure Records
   4.3 Restricted Boltzmann Machines
      4.3.1 Creating a Restricted Boltzmann Machine
      4.3.2 Learning a Restricted Boltzmann Machine
   4.4 Deep Belief Networks
      4.4.1 Creating a Deep Belief Network
      4.4.2 Learning a Deep Belief Network
   4.5 Deep Neural Networks
      4.5.1 Creating a Deep Neural Network
      4.5.2 Learning a Deep Neural Network
   4.6 Using the Library

5 Performance
   5.1 Preliminary Performance on MNIST Dataset
   5.2 Cross-Validated Results

6 Conclusion
   6.1 Future Work
      6.1.1 Java Interoperability
      6.1.2 Visualization
      6.1.3 Using Different Matrix Libraries
      6.1.4 Persistent Contrastive Divergence
      6.1.5 Mutation and Performance
   6.2 A Stepping Stone

A Algorithms
   A.1 Deep Belief Network Learning
   A.2 Restricted Boltzmann Machine Learning

List of Figures

2.1 Markov random field
2.2 Artificial Neural Network with one hidden layer
2.3 A single perceptron
2.4 The logistic function
2.5 The hypertangent function
3.1 Restricted Boltzmann Machine
3.2 Deep Belief Network
3.3 An infinite logistic belief net with tied weights [1]
3.4 Pre-training and fine-tuning a deep autoencoder [9]
6.1 Exact log likelihood with 25 hidden units on MNIST dataset [42]
