One of the Programming Assignments (PA) for the Neural Networks for Machine Learning course on Coursera is to investigate the effect of various parameters on a Neural Network's (NN) performance. The input is the MNIST handwritten digits dataset provided as part of the MATLAB starter code, for which I substituted the simplified version available at the UCI Machine Learning Repository. As in previous posts where I have used Coursera PAs as inspiration for my own learning, I will formally state the obvious - the data and approach are very different, and hence are likely to produce incorrect results for the PA.
The NN itself consists of an input layer of 64 neurons, one for each pixel in the 8x8 handwritten digit, a hidden layer of sigmoid activation units, and an output layer of 10 softmax activation units corresponding to the digits 0-9. There are 3,823 records in the training set and 1,797 records in the testing set. I split the training set 50/50 for training and cross-validation, then varied several common NN tunable parameters and observed their effect on the error rate. The Wikibooks link provides a good overview of these tunable parameters. Code for creating and evaluating the NN with various tunable parameters is shown below:
// Source: src/main/scala/com/mycompany/scalcium/langmodel/EncogNNEval.scala
package com.mycompany.scalcium.langmodel
import java.io.File
import scala.collection.JavaConversions._
import scala.io.Source
import scala.util.Random
import org.encog.Encog
import org.encog.engine.network.activation.ActivationSigmoid
import org.encog.engine.network.activation.ActivationSoftMax
import org.encog.mathutil.randomize.RangeRandomizer
import org.encog.ml.data.MLDataSet
import org.encog.ml.data.basic.BasicMLData
import org.encog.ml.data.basic.BasicMLDataSet
import org.encog.neural.networks.BasicNetwork
import org.encog.neural.networks.layers.BasicLayer
import org.encog.neural.networks.training.propagation.back.Backpropagation
class EncogNNEval {
val Debug = false
val encoder = new OneHotEncoder(10)
def evaluate(trainfile: File, decay: Float, hiddenLayerSize: Int,
numIters: Int, learningRate: Float, momentum: Float,
miniBatchSize: Int, earlyStopping: Boolean):
(Double, Double, BasicNetwork) = {
// parse training file into a 50/50 training and validation set
val datasets = parseFile(trainfile, 0.5F)
val trainset = datasets._1; val valset = datasets._2
// build network
val network = new BasicNetwork()
network.addLayer(new BasicLayer(null, true, 8 * 8))
network.addLayer(new BasicLayer(new ActivationSigmoid(), true, hiddenLayerSize))
network.addLayer(new BasicLayer(new ActivationSoftMax(), false, 10))
network.getStructure().finalizeStructure()
new RangeRandomizer(-1, 1).randomize(network)
// set up trainer
val trainer = new Backpropagation(network, trainset, learningRate, momentum)
trainer.setBatchSize(miniBatchSize)
var currIter = 0
var trainError = 0.0D
var valError = 0.0D
    var pValError = Double.MaxValue // start high so the first early-stopping check passes
var contLoop = false
do {
trainer.iteration()
      // decay the learning rate linearly: lr(t) = lr0 * (1 - decay * t / numIters)
      if (decay > 0.0F) trainer.setLearningRate(
        (1.0 - decay * currIter / numIters) * learningRate)
// calculate training and validation error
trainError = error(network, trainset)
valError = error(network, valset)
if (Debug) {
Console.println("Epoch: %d, Train error: %.3f, Validation Error: %.3f"
.format(currIter, trainError, valError))
}
currIter += 1
contLoop = shouldContinue(currIter, numIters, earlyStopping,
valError, pValError)
pValError = valError
} while (contLoop)
trainer.finishTraining()
Encog.getInstance().shutdown()
(trainError, valError, network)
}
def parseFile(f: File, holdout: Float): (MLDataSet, MLDataSet) = {
val trainset = new BasicMLDataSet()
val valset = new BasicMLDataSet()
Source.fromFile(f).getLines()
.foreach(line => {
val cols = line.split(",")
val inputs = cols.slice(0, 64).map(_.toDouble / 64.0D)
val output = encoder.encode(cols(64).toInt)
if (Random.nextDouble < holdout)
valset.add(new BasicMLData(inputs), new BasicMLData(output))
else trainset.add(new BasicMLData(inputs), new BasicMLData(output))
})
(trainset, valset)
}
  // fraction of records misclassified, ie the error rate
  def error(network: BasicNetwork, dataset: MLDataSet): Double = {
    var numWrong = 0.0D
    var numTested = 0.0D
    dataset.foreach(pair => {
      val predicted = network.compute(pair.getInput()).getData()
      val actual = encoder.decode(pair.getIdeal().getData())
      if (actual != predicted.indexOf(predicted.max)) numWrong += 1.0D
      numTested += 1.0D
    })
    numWrong / numTested
  }
  // with early stopping, continue only while the validation error is not increasing
  def shouldContinue(currIter: Int, numIters: Int, earlyStopping: Boolean,
      validationError: Double, prevValidationError: Double): Boolean =
    if (earlyStopping)
      (currIter < numIters && validationError <= prevValidationError)
    else currIter < numIters
}
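The OneHotEncoder used above is a small helper class that is not shown in the listing. Here is a minimal sketch of what it needs to provide, inferred from the calling code - the class and method names come from the listing above, but the implementation is my assumption:

// Sketch of the OneHotEncoder helper used by EncogNNEval; implementation
// inferred from its usage above, not taken from the original listing.
package com.mycompany.scalcium.langmodel

class OneHotEncoder(val arity: Int) {

  // encode a label as a one-hot vector, eg 3 -> [0,0,0,1,0,0,0,0,0,0]
  def encode(value: Int): Array[Double] = {
    val encoded = Array.fill[Double](arity)(0.0D)
    encoded(value) = 1.0D
    encoded
  }

  // recover the label as the index of the largest element
  def decode(encoded: Array[Double]): Int = encoded.indexOf(encoded.max)
}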
The first experiment varies the learning rate, with and without momentum; the NN is trained for 70 iterations in each case. The second experiment repeats the first, but uses early stopping to halt training once the cross-validation error starts to increase. The third experiment tests the effect of weight decay, i.e., lowering the learning rate a little at each iteration; I keep the momentum at 0 in this case. The fourth experiment tests the effect of varying the number of hidden units on the error rate, keeping all other parameters constant. The final experiment runs the training for many more iterations with the best parameter values discovered in the previous experiments.
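As a side note on the third experiment, my reading of the decay logic in evaluate() above is a linear schedule: at iteration t the learning rate becomes lr0 * (1 - decay * t / numIters). A standalone sketch of that schedule (the function name is mine, for illustration only):

// Linear learning rate decay as used (per my reading) in EncogNNEval.evaluate().
// The rate shrinks from lr0 at t = 0 down to lr0 * (1 - decay) at t = numIters.
def decayedLearningRate(lr0: Double, decay: Double, t: Int, numIters: Int): Double =
  lr0 * (1.0 - decay * t / numIters)

// for example: lr0 = 0.05, decay = 0.1, numIters = 70 gives lr(35) = 0.0475

Here is the code for the unit tests: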
// Source: src/test/scala/com/mycompany/scalcium/langmodel/EncogNNEvalTest.scala
package com.mycompany.scalcium.langmodel
import java.io.File
import java.io.FileWriter
import java.io.PrintWriter
import org.junit.Test
class EncogNNEvalTest {
val trainfile = new File("src/main/resources/langmodel/optdigits_train.txt")
val testfile = new File("src/main/resources/langmodel/optdigits_test.txt")
@Test
def testVaryLearningRateAndMomentum(): Unit = {
val results = new PrintWriter(new FileWriter(
new File("results1.csv")), true)
val nneval = new EncogNNEval()
val weightDecay = 0.0F
val numHiddenUnit = 10
val numIterations = 70
val learningRates = Array[Float](0.002F, 0.01F, 0.05F, 0.2F, 1.0F,
5.0F, 20.0F)
val momentums = Array[Float](0.0F, 0.9F)
val miniBatchSize = 10
val earlyStopping = false
var lineNo = 0
for (learningRate <- learningRates;
momentum <- momentums) {
runAndReport(nneval, results, trainfile, weightDecay, numHiddenUnit,
numIterations, learningRate, momentum, miniBatchSize, earlyStopping,
lineNo == 0)
lineNo += 1
}
results.flush()
results.close()
}
@Test
def testVaryLearningRateAndMomentumWithEarlyStopping(): Unit = {
val results = new PrintWriter(new FileWriter(
new File("results2.csv")), true)
val nneval = new EncogNNEval()
val weightDecay = 0.0F
val numHiddenUnit = 10
val numIterations = 70
val learningRates = Array[Float](0.002F, 0.01F, 0.05F, 0.2F, 1.0F,
5.0F, 20.0F)
val momentums = Array[Float](0.0F, 0.9F)
val miniBatchSize = 10
val earlyStopping = true
var lineNo = 0
for (learningRate <- learningRates;
momentum <- momentums) {
runAndReport(nneval, results, trainfile, weightDecay, numHiddenUnit,
numIterations, learningRate, momentum, miniBatchSize, earlyStopping,
lineNo == 0)
lineNo += 1
}
results.flush()
results.close()
}
@Test
def testVaryWeightDecay(): Unit = {
val results = new PrintWriter(new FileWriter(
new File("results3.csv")), true)
val nneval = new EncogNNEval()
val weightDecays = Array[Float](10.0F, 1.0F, 0.0F, 0.1F, 0.01F, 0.001F)
val numHiddenUnit = 10
val numIterations = 70
val learningRate = 0.05F
val momentum = 0.0F
val miniBatchSize = 10
val earlyStopping = true
var lineNo = 0
for (weightDecay <- weightDecays) {
runAndReport(nneval, results, trainfile, weightDecay, numHiddenUnit,
numIterations, learningRate, momentum, miniBatchSize, earlyStopping,
lineNo == 0)
lineNo += 1
}
results.flush()
results.close()
}
@Test
def testVaryHiddenUnits(): Unit = {
val results = new PrintWriter(new FileWriter(
new File("results4.csv")), true)
val nneval = new EncogNNEval()
val weightDecay = 0.0F
val numHiddenUnits = Array[Int](10, 50, 100, 150, 200, 250, 500)
val numIterations = 70
val learningRate = 0.05F
val momentum = 0.0F
val miniBatchSize = 10
val earlyStopping = true
var lineNo = 0
for (numHiddenUnit <- numHiddenUnits) {
runAndReport(nneval, results, trainfile, weightDecay, numHiddenUnit,
numIterations, learningRate, momentum, miniBatchSize, earlyStopping,
lineNo == 0)
lineNo += 1
}
results.flush()
results.close()
}
@Test
def testFinalRun(): Unit = {
val nneval = new EncogNNEval()
val weightDecay = 0.0F
val numHiddenUnit = 200
val numIterations = 1000
val learningRate = 0.1F
val momentum = 0.9F
val miniBatchSize = 100
val earlyStopping = true
val scores = nneval.evaluate(trainfile, weightDecay, numHiddenUnit,
numIterations, learningRate, momentum, miniBatchSize, earlyStopping)
// verify on test set
val testds = nneval.parseFile(testfile, 0.0F)
val network = scores._3
val testError = nneval.error(network, testds._1)
Console.println("Train Error: %.3f, Validation Error: %.3f, Test Error: %.3f"
.format(scores._1, scores._2, testError))
}
def runAndReport(nneval: EncogNNEval, results: PrintWriter,
trainfile: File, weightDecay: Float, numHiddenUnit: Int,
numIterations: Int, learningRate: Float, momentum: Float,
miniBatchSize: Int, earlyStopping: Boolean,
writeHeader: Boolean): Unit = {
val scores = nneval.evaluate(trainfile, weightDecay, numHiddenUnit,
numIterations, learningRate, momentum, miniBatchSize, earlyStopping)
if (writeHeader)
results.println("DECAY\tHUNITS\tITERS\tLR\tMOM\tBS\tES\tTRNERR\tVALERR")
results.println("%.3f\t%d\t%d\t%.3f\t%.3f\t%d\t%d\t%.3f\t%.3f"
.format(weightDecay, numHiddenUnit, numIterations, learningRate,
momentum, miniBatchSize, if (earlyStopping) 1 else 0,
scores._1, scores._2))
}
}
I used matplotlib to chart the results for each of the four experiments described above. Here is the code:
# Source: nneval_charts.py
import pandas as pd
import matplotlib.pyplot as plt
import os
DATA_DIR = "/path/to/data/files"
def draw_4chart(xs, ys1, ys2, ys3, ys4, title, xlabel, ylabel, legends):
plt.plot(xs, ys1, label=legends[0])
plt.plot(xs, ys2, label=legends[1])
plt.plot(xs, ys3, label=legends[2])
plt.plot(xs, ys4, label=legends[3])
plt.title(title)
plt.xlabel(xlabel)
plt.ylabel(ylabel)
plt.legend()
plt.show()
def chart1():
    rdf = pd.read_csv(os.path.join(DATA_DIR, "results1.csv"),
                      sep="\t", header=0)  # first row holds the column names
# split into momentum groups
rdf0 = rdf[rdf["MOM"] == 0.0]
xvals = rdf0["LR"].values
yvals0_tr = rdf0["TRNERR"].values
yvals0_vl = rdf0["VALERR"].values
rdf1 = rdf[rdf["MOM"] > 0.0]
yvals1_tr = rdf1["TRNERR"].values
yvals1_vl = rdf1["VALERR"].values
draw_4chart(xvals, yvals0_tr, yvals0_vl, yvals1_tr, yvals1_vl,
"Error vs Learning Rate and Momentum",
"Learning Rate", "Error Rate",
["Trn Err (Mom=0)", "CV Err (Mom=0)",
"Trn Err (Mom=0.9)", "CV Err (Mom=0.9)"])
def chart2():
    rdf = pd.read_csv(os.path.join(DATA_DIR, "results2.csv"),
                      sep="\t", header=0)
# split into momentum groups
rdf0 = rdf[rdf["MOM"] == 0.0]
xvals = rdf0["LR"].values
yvals0_tr = rdf0["TRNERR"].values
yvals0_vl = rdf0["VALERR"].values
rdf1 = rdf[rdf["MOM"] > 0.0]
yvals1_tr = rdf1["TRNERR"].values
yvals1_vl = rdf1["VALERR"].values
draw_4chart(xvals, yvals0_tr, yvals0_vl, yvals1_tr, yvals1_vl,
"Error vs Learning Rate & Momentum (w/Early Stopping)",
"Learning Rate", "Error Rate",
["Trn Err (Mom=0)", "CV Err (Mom=0)",
"Trn Err (Mom=0.9)", "CV Err (Mom=0.9)"])
def draw_2chart(xs, ys1, ys2, title, xlabel, ylabel, legends):
plt.plot(xs, ys1, label=legends[0])
plt.plot(xs, ys2, label=legends[1])
plt.title(title)
plt.xlabel(xlabel)
plt.ylabel(ylabel)
plt.legend()
plt.show()
def chart3():
    rdf = pd.read_csv(os.path.join(DATA_DIR, "results3.csv"),
                      sep="\t", header=0)
xvals = rdf["DECAY"].values
yvals1 = rdf["TRNERR"].values
yvals2 = rdf["VALERR"].values
draw_2chart(xvals, yvals1, yvals2,
"Error vs Weight Decay (w/Early Stopping)",
"Weight Decay", "Error Rate", ["Trn Err", "CV Err"])
def chart4():
    rdf = pd.read_csv(os.path.join(DATA_DIR, "results4.csv"),
                      sep="\t", header=0)
xvals = rdf["HUNITS"].values
yvals1 = rdf["TRNERR"].values
yvals2 = rdf["VALERR"].values
draw_2chart(xvals, yvals1, yvals2,
"Error vs #-Hidden Units (w/Early Stopping)",
"#-Hidden Units", "Error Rate", ["Trn Err", "CV Err"])
chart1()
chart2()
chart3()
chart4()
The first three results mostly coincide with our intuition that the graphs should look like a hockey stick. However, in the case of the number of hidden units, somewhat surprisingly, the error rate is lowest with 10 hidden units.
The final run completed with a training error of 0.267, a validation error of 0.269, and a test set error of 0.257.
For me, this exercise was a way to understand the various knobs you can turn to get a NN to perform better, as well as a way to familiarize myself further with the Encog library. The NN here still needs quite a bit of tuning - although the results are not terrible, handwritten digit recognition is a well-studied problem, and reported accuracies are in the high 90% range (ie, error rates under 0.1).