Salmon Run: Experiments in Tuning Neural Networks

One of the Programming Assignments (PA) for the Neural Networks for Machine Learning course on Coursera asks you to investigate the effect of various parameters on a Neural Network's (NN) performance. The input for the PA is the MNIST handwritten digits dataset provided as part of the MATLAB starter code, for which I substituted the simplified version available at the UCI Machine Learning Repository. As in previous posts where I have used Coursera PAs as inspiration for my own learning, I will formally state the obvious: the data and approach are very different, and hence are likely to produce incorrect results for the PA.

The NN itself consists of an input layer of 64 neurons, one for each pixel in the 8x8 handwritten digit, a hidden layer of sigmoid activation units, and an output layer of 10 softmax activation units corresponding to the digits 0-9. There are 3,823 records in the training set and 1,797 records in the testing set. I split the training set 50/50 for training and cross-validation. I then varied several common NN tunable parameters and observed their effect on the error rate. Wikibooks provides a good overview of these tunable parameters. Code for creating and evaluating the NN with various tunable parameters is shown below:

// Source: src/main/scala/com/mycompany/scalcium/langmodel/EncogNNEval.scala
package com.mycompany.scalcium.langmodel

import java.io.File

import scala.collection.JavaConversions._
import scala.io.Source
import scala.util.Random

import org.encog.Encog
import org.encog.engine.network.activation.ActivationSigmoid
import org.encog.engine.network.activation.ActivationSoftMax
import org.encog.mathutil.randomize.RangeRandomizer
import org.encog.ml.data.MLDataSet
import org.encog.ml.data.basic.BasicMLData
import org.encog.ml.data.basic.BasicMLDataSet
import org.encog.neural.networks.BasicNetwork
import org.encog.neural.networks.layers.BasicLayer
import org.encog.neural.networks.training.propagation.back.Backpropagation

class EncogNNEval {
  
  val Debug = false
  val encoder = new OneHotEncoder(10)

  def evaluate(trainfile: File, decay: Float, hiddenLayerSize: Int, 
      numIters: Int, learningRate: Float, momentum: Float, 
      miniBatchSize: Int, earlyStopping: Boolean): 
      (Double, Double, BasicNetwork) = {
    // parse training file into a 50/50 training and validation set
    val datasets = parseFile(trainfile, 0.5F)
    val trainset = datasets._1; val valset = datasets._2
    // build network
    val network = new BasicNetwork()
    network.addLayer(new BasicLayer(null, true, 8 * 8))
    network.addLayer(new BasicLayer(new ActivationSigmoid(), true, hiddenLayerSize))
    network.addLayer(new BasicLayer(new ActivationSoftMax(), false, 10))
    network.getStructure().finalizeStructure()
    new RangeRandomizer(-1, 1).randomize(network)
    // set up trainer
    val trainer = new Backpropagation(network, trainset, learningRate, momentum)
    trainer.setBatchSize(miniBatchSize)
    var currIter = 0
    var trainError = 0.0D
    var valError = 0.0D
    var pValError = 0.0D
    var contLoop = false
    do {
      trainer.iteration()
      // linearly scale the learning rate down from its initial value
      if (decay > 0.0F) trainer.setLearningRate(
        (1.0 - (decay * currIter / numIters)) * learningRate)
      // calculate training and validation error
      trainError = error(network, trainset)
      valError = error(network, valset)
      if (Debug) {
        Console.println("Epoch: %d, Train error: %.3f, Validation Error: %.3f"
          .format(currIter, trainError, valError))
      }
      currIter += 1
      contLoop = shouldContinue(currIter, numIters, earlyStopping, 
        valError, pValError)
      pValError = valError
    } while (contLoop)
    trainer.finishTraining()
    Encog.getInstance().shutdown()
    (trainError, valError, network)
  }

  def parseFile(f: File, holdout: Float): (MLDataSet, MLDataSet) = {
    val trainset = new BasicMLDataSet()
    val valset = new BasicMLDataSet()
    Source.fromFile(f).getLines()
      .foreach(line => {
        val cols = line.split(",")
        val inputs = cols.slice(0, 64).map(_.toDouble / 64.0D)
        val output = encoder.encode(cols(64).toInt)
        if (Random.nextDouble < holdout)
          valset.add(new BasicMLData(inputs), new BasicMLData(output))
        else trainset.add(new BasicMLData(inputs), new BasicMLData(output))
      })
    (trainset, valset)
  } 
  
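  // NOTE: despite its name, this returns the fraction of records classified
  // correctly (numCorrect / numTested); subtract from 1.0 for an error rate.
  def error(network: BasicNetwork, dataset: MLDataSet): Double = {
    var numCorrect = 0.0D
    var numTested = 0.0D
    dataset.foreach(pair => {
      val predicted = network.compute(pair.getInput()).getData()
      val actual = encoder.decode(pair.getIdeal().getData())
      // the predicted digit is the index of the largest softmax output
      if (actual == predicted.indexOf(predicted.max)) numCorrect += 1.0D
      numTested += 1.0D
    })
    numCorrect / numTested
  }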

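  // With early stopping, training continues only while the validation value
  // returned by error() is higher than in the previous iteration.
  def shouldContinue(currIter: Int, numIters: Int, earlyStopping: Boolean,
      validationError: Double, prevValidationError: Double): Boolean = 
    if (earlyStopping) 
      (currIter < numIters && prevValidationError < validationError)
    else currIter < numIters  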
}
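
The OneHotEncoder helper used above is not shown in this post. As a rough illustration of the idea, here is a minimal Python sketch of one-hot encoding, argmax decoding, and the resulting accuracy and error rate. The function names are my own for this sketch and are not part of Encog or of the Scala code above.

```python
# Hypothetical helper names, for illustration only.

def one_hot_encode(label, num_classes=10):
    """Return a list of num_classes floats with 1.0 at index `label`."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range: %d" % label)
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, targets):
    """Fraction of rows whose predicted class matches the one-hot target."""
    correct = sum(1 for p, t in zip(predictions, targets)
                  if argmax(p) == argmax(t))
    return correct / float(len(targets))


def error_rate(predictions, targets):
    """Fraction of rows classified incorrectly."""
    return 1.0 - accuracy(predictions, targets)


if __name__ == "__main__":
    # Toy two-class example: three of four rows are classified correctly.
    preds = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
    targets = [one_hot_encode(c, 2) for c in (1, 0, 0, 0)]
    print(accuracy(preds, targets))    # 0.75
    print(error_rate(preds, targets))  # 0.25
```

Keeping accuracy and error rate as separate functions makes it harder to confuse the two when reading results.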

The first experiment is to vary the learning rate with and without momentum. The NN is trained for 70 iterations in each case. The second experiment is a repeat of the first, but uses early stopping to stop training if the cross-validation error starts to increase. The third experiment tests the effect of what the code calls weight decay, i.e., lowering the learning rate at each iteration by a fixed amount (strictly speaking this is learning rate decay, since weight decay usually refers to a penalty on the size of the weights). I keep the momentum at 0 in this case. The fourth experiment tests the effect of varying the number of hidden units on error rate, keeping all other parameters constant. The final experiment is to run the test for many more iterations with the optimal parameter values discovered in the previous experiments. Here is the code for the unit test.

// Source: src/test/scala/com/mycompany/scalcium/langmodel/EncogNNEvalTest.scala
package com.mycompany.scalcium.langmodel

import java.io.File
import java.io.FileWriter
import java.io.PrintWriter

import org.junit.Test

class EncogNNEvalTest {

  val trainfile = new File("src/main/resources/langmodel/optdigits_train.txt")
  val testfile = new File("src/main/resources/langmodel/optdigits_test.txt")
  
  @Test
  def testVaryLearningRateAndMomentum(): Unit = {
    val results = new PrintWriter(new FileWriter(
      new File("results1.csv")), true)
    val nneval = new EncogNNEval()
    val weightDecay = 0.0F
    val numHiddenUnit = 10
    val numIterations = 70
    val learningRates = Array[Float](0.002F, 0.01F, 0.05F, 0.2F, 1.0F, 
                                     5.0F, 20.0F)
    val momentums = Array[Float](0.0F, 0.9F)
    val miniBatchSize = 10
    val earlyStopping = false
    var lineNo = 0
    for (learningRate <- learningRates;
         momentum <- momentums) {
      runAndReport(nneval, results, trainfile, weightDecay, numHiddenUnit, 
        numIterations, learningRate, momentum, miniBatchSize, earlyStopping,
        lineNo == 0)
      lineNo += 1
    }
    results.flush()
    results.close()
  }
  
  @Test
  def testVaryLearningRateAndMomentumWithEarlyStopping(): Unit = {
    val results = new PrintWriter(new FileWriter(
      new File("results2.csv")), true)
    val nneval = new EncogNNEval()
    val weightDecay = 0.0F
    val numHiddenUnit = 10
    val numIterations = 70
    val learningRates = Array[Float](0.002F, 0.01F, 0.05F, 0.2F, 1.0F, 
                                     5.0F, 20.0F)
    val momentums = Array[Float](0.0F, 0.9F)
    val miniBatchSize = 10
    val earlyStopping = true
    var lineNo = 0
    for (learningRate <- learningRates;
         momentum <- momentums) {
      runAndReport(nneval, results, trainfile, weightDecay, numHiddenUnit, 
        numIterations, learningRate, momentum, miniBatchSize, earlyStopping,
        lineNo == 0)
      lineNo += 1
    }
    results.flush()
    results.close()
  }
  
  @Test
  def testVaryWeightDecay(): Unit = {
    val results = new PrintWriter(new FileWriter(
      new File("results3.csv")), true)
    val nneval = new EncogNNEval()
    val weightDecays = Array[Float](10.0F, 1.0F, 0.0F, 0.1F, 0.01F, 0.001F)
    val numHiddenUnit = 10
    val numIterations = 70
    val learningRate = 0.05F
    val momentum = 0.0F
    val miniBatchSize = 10
    val earlyStopping = true
    var lineNo = 0
    for (weightDecay <- weightDecays) {
      runAndReport(nneval, results, trainfile, weightDecay, numHiddenUnit, 
        numIterations, learningRate, momentum, miniBatchSize, earlyStopping,
        lineNo == 0)
      lineNo += 1
    }
    results.flush()
    results.close()
  }
  
  @Test
  def testVaryHiddenUnits(): Unit = {
    val results = new PrintWriter(new FileWriter(
      new File("results4.csv")), true)
    val nneval = new EncogNNEval()
    val weightDecay = 0.0F
    val numHiddenUnits = Array[Int](10, 50, 100, 150, 200, 250, 500)
    val numIterations = 70
    val learningRate = 0.05F
    val momentum = 0.0F
    val miniBatchSize = 10
    val earlyStopping = true
    var lineNo = 0
    for (numHiddenUnit <- numHiddenUnits) {
      runAndReport(nneval, results, trainfile, weightDecay, numHiddenUnit, 
        numIterations, learningRate, momentum, miniBatchSize, earlyStopping,
        lineNo == 0)
      lineNo += 1
    }
    results.flush()
    results.close()
  }
  
  @Test
  def testFinalRun(): Unit = {
    val nneval = new EncogNNEval()
    val weightDecay = 0.0F
    val numHiddenUnit = 200
    val numIterations = 1000
    val learningRate = 0.1F
    val momentum = 0.9F
    val miniBatchSize = 100
    val earlyStopping = true
    val scores = nneval.evaluate(trainfile, weightDecay, numHiddenUnit, 
      numIterations, learningRate, momentum, miniBatchSize, earlyStopping) 
    // verify on test set
    val testds = nneval.parseFile(testfile, 0.0F)
    val network = scores._3
    val testError = nneval.error(network, testds._1)
    Console.println("Train Error: %.3f, Validation Error: %.3f, Test Error: %.3f"
      .format(scores._1, scores._2, testError))
  }
  
  def runAndReport(nneval: EncogNNEval, results: PrintWriter, 
      trainfile: File, weightDecay: Float, numHiddenUnit: Int, 
      numIterations: Int, learningRate: Float, momentum: Float, 
      miniBatchSize: Int, earlyStopping: Boolean,
      writeHeader: Boolean): Unit = {
    val scores = nneval.evaluate(trainfile, weightDecay, numHiddenUnit, 
      numIterations, learningRate, momentum, miniBatchSize, earlyStopping) 
    if (writeHeader)
      results.println("DECAY\tHUNITS\tITERS\tLR\tMOM\tBS\tES\tTRNERR\tVALERR")
    results.println("%.3f\t%d\t%d\t%.3f\t%.3f\t%d\t%d\t%.3f\t%.3f"
      .format(weightDecay, numHiddenUnit, numIterations, learningRate, 
        momentum, miniBatchSize, if (earlyStopping) 1 else 0, 
        scores._1, scores._2))
  }
}
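
The linear learning rate schedule and the early stopping rule described for these experiments can be sketched in a few lines of Python. This is a simplified sketch under my own assumptions rather than the Encog implementation: the function names are hypothetical, the rate is clamped at zero (decay values above 1, such as the 10.0 in the third experiment, would otherwise drive it negative), and `validation_error_at` stands in for whatever trains one epoch and returns the validation error.

```python
# Hypothetical names; a sketch of the two ideas, not Encog's API.

def decayed_learning_rate(base_rate, decay, epoch, num_epochs):
    """Scale base_rate linearly by (1 - decay * epoch / num_epochs).

    The factor is clamped at zero (an assumption of this sketch) so that
    large decay values cannot produce a negative learning rate.
    """
    factor = 1.0 - decay * epoch / float(num_epochs)
    return max(0.0, factor) * base_rate


def run_epochs(validation_error_at, num_epochs, early_stopping=True):
    """Run up to num_epochs; stop once validation error starts to increase.

    validation_error_at(epoch) is a caller-supplied function that trains
    one epoch and returns the validation error. Returns a tuple of
    (epochs_run, last validation error seen before any increase).
    """
    previous = float("inf")
    epochs_run = 0
    for epoch in range(num_epochs):
        current = validation_error_at(epoch)
        epochs_run += 1
        if early_stopping and current > previous:
            break  # validation error went up, so stop here
        previous = current
    return epochs_run, previous


if __name__ == "__main__":
    # Halfway through 70 epochs with decay=1.0 the rate has halved.
    print(decayed_learning_rate(0.1, 1.0, 35, 70))
    # A made-up validation curve that worsens at the fourth epoch.
    errors = [0.5, 0.4, 0.3, 0.35, 0.2]
    print(run_epochs(lambda e: errors[e], len(errors)))         # (4, 0.3)
    print(run_epochs(lambda e: errors[e], len(errors), False))  # (5, 0.2)
```

Stopping at the first increase is the simplest possible rule. It can stop too early on a noisy validation curve, as the made-up example shows, where the error would have dropped further had training continued.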

I used matplotlib to chart the results for each of the first four experiments described above. Here is the code:

# Source: nneval_charts.py
import pandas as pd
import matplotlib.pyplot as plt
import os

DATA_DIR = "/path/to/data/files"

def draw_4chart(xs, ys1, ys2, ys3, ys4, title, xlabel, ylabel, legends):
    plt.plot(xs, ys1, label=legends[0])
    plt.plot(xs, ys2, label=legends[1])
    plt.plot(xs, ys3, label=legends[2])
    plt.plot(xs, ys4, label=legends[3])
    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.legend()
    plt.show()
    
def chart1():
    # the first row of the results file holds the column names
    rdf = pd.read_csv(os.path.join(DATA_DIR, "results1.csv"), 
                      sep="\t", header=0)
    # split into momentum groups
    rdf0 = rdf[rdf["MOM"] == 0.0]
    xvals = rdf0["LR"].values
    yvals0_tr = rdf0["TRNERR"].values
    yvals0_vl = rdf0["VALERR"].values
    rdf1 = rdf[rdf["MOM"] > 0.0]
    yvals1_tr = rdf1["TRNERR"].values
    yvals1_vl = rdf1["VALERR"].values
    draw_4chart(xvals, yvals0_tr, yvals0_vl, yvals1_tr, yvals1_vl, 
               "Error vs Learning Rate and Momentum", 
               "Learning Rate", "Error Rate", 
               ["Trn Err (Mom=0)", "CV Err (Mom=0)",
                "Trn Err (Mom=0.9)", "CV Err (Mom=0.9)"])

def chart2():
    # the first row of the results file holds the column names
    rdf = pd.read_csv(os.path.join(DATA_DIR, "results2.csv"), 
                      sep="\t", header=0)
    # split into momentum groups
    rdf0 = rdf[rdf["MOM"] == 0.0]
    xvals = rdf0["LR"].values
    yvals0_tr = rdf0["TRNERR"].values
    yvals0_vl = rdf0["VALERR"].values
    rdf1 = rdf[rdf["MOM"] > 0.0]
    yvals1_tr = rdf1["TRNERR"].values
    yvals1_vl = rdf1["VALERR"].values
    draw_4chart(xvals, yvals0_tr, yvals0_vl, yvals1_tr, yvals1_vl, 
               "Error vs Learning Rate & Momentum (w/Early Stopping)", 
               "Learning Rate", "Error Rate",
               ["Trn Err (Mom=0)", "CV Err (Mom=0)",
                "Trn Err (Mom=0.9)", "CV Err (Mom=0.9)"])

def draw_2chart(xs, ys1, ys2, title, xlabel, ylabel, legends):
    plt.plot(xs, ys1, label=legends[0])
    plt.plot(xs, ys2, label=legends[1])
    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.legend()
    plt.show()
    
def chart3():
    # the first row of the results file holds the column names
    rdf = pd.read_csv(os.path.join(DATA_DIR, "results3.csv"), 
                      sep="\t", header=0)
    xvals = rdf["DECAY"].values
    yvals1 = rdf["TRNERR"].values
    yvals2 = rdf["VALERR"].values
    draw_2chart(xvals, yvals1, yvals2, 
                "Error vs Weight Decay (w/Early Stopping)", 
                "Weight Decay", "Error Rate", ["Trn Err", "CV Err"])    

def chart4():
    # the first row of the results file holds the column names
    rdf = pd.read_csv(os.path.join(DATA_DIR, "results4.csv"), 
                      sep="\t", header=0)
    xvals = rdf["HUNITS"].values
    yvals1 = rdf["TRNERR"].values
    yvals2 = rdf["VALERR"].values
    draw_2chart(xvals, yvals1, yvals2, 
                "Error vs #-Hidden Units (w/Early Stopping)", 
                "#-Hidden Units", "Error Rate", ["Trn Err", "CV Err"])    

    
chart1()
chart2()
chart3()
chart4()

The first three results mostly coincide with our intuition that the graphs should look like a hockey stick. However, in the case of the number of hidden units, the error rate appears to be lowest with 10 hidden units.

The final run completed with a training error of 0.267, a validation error of 0.269, and a test set error of 0.257.

For me, this exercise was a way to understand the various knobs you can turn to get a NN to perform better, as well as a way to familiarize myself more with the Encog library. The NN here still needs to be tuned quite a bit: although the results are not terrible, handwritten digit recognition is a well-studied problem in this area and accuracies are in the high 90% range (i.e., error rates under 0.1).