Network architecture selection

For convolutional neural network (CNN) architectures, the following parameters are crucial:

  • number of layers, which defines the network depth
  • kind of each layer, one of the following:
    • convolutional, for which the number of filters to be learned by the network, the filter size, padding, and stride must be specified (see the sketch after this list),
    • subsampling, with a pooling operation (MAX, AVG) and a filter size,
    • local response normalization,
    • dense, a classic fully connected layer, usually with some dropout regularization,
    • output, with a softmax function that produces a distribution over class labels
  • network input parameters: image width, height, and number of channels (3 for RGB images)
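
As an illustration of these parameters, a single convolutional layer can be configured on its own (a minimal sketch in DL4J's builder API; the kernel size, filter count, stride, and padding below are example values, not taken from the networks that follow):

    // example: 64 filters of size 3x3, stride 1, zero padding 1
    ConvolutionLayer conv = new ConvolutionLayer.Builder(3, 3) // filter size
            .nIn(3)                  // input channels (3 for RGB)
            .nOut(64)                // number of filters learned by the network
            .stride(1, 1)            // stride
            .padding(1, 1)           // zero padding
            .activation("leakyrelu")
            .build();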

Here, several different architectures are tested:

  • [CONV(3x3, 64) -> NORM -> SUB(2x2, MAX)] * 2 -> [CONV(3x3, 64)] * 2 -> SUB(2x2, MAX) -> FC(384)

  • [CONV(3x3, 64) -> SUB(2x2, MAX)] * 3 -> NORM -> FC(384)

  • [CONV(3x3, 32) -> SUB(2x2, MAX)] * 2 -> CONV(3x3, 64) -> SUB(2x2, MAX) -> NORM -> FC(384)

  • CONV(5x5, 25) -> NORM -> SUB(2x2, MAX) -> CONV(3x3, 50) -> SUB(2x2, MAX) -> NORM -> FC(400)

where CONV stands for a convolutional layer, SUB for a subsampling layer, NORM for a local response normalization layer, and FC for a fully connected (dense) layer. The numbers in parentheses give the most important parameters of each layer: filter size and number of filters for CONV, kernel size and pooling type for SUB, and the number of units for FC.
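
To see how these layers transform the input, the standard output-size formula for convolution and pooling can be applied layer by layer: out = (in + 2 * padding - kernel) / stride + 1. A small self-contained sketch (the 32x32 input is only an illustrative assumption; the architectures above do not fix the image size):

    public class LayerSizes {

        // spatial output size of a convolution or pooling layer
        static int outSize(int in, int kernel, int padding, int stride) {
            return (in + 2 * padding - kernel) / stride + 1;
        }

        public static void main(String[] args) {
            // CONV(3x3) with padding 1 and stride 1 preserves a 32x32 input...
            System.out.println(outSize(32, 3, 1, 1)); // 32
            // ...while SUB(2x2) with stride 2 halves it
            System.out.println(outSize(32, 2, 0, 2)); // 16
        }
    }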

The definitions are stored in a helper class and look as follows:

    public static NetworkModel net1 = (learningRate, width, height, channels, numLabels) -> {

        int iterations = 1;

        int layer = 0;
        MultiLayerConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
                .seed(seed) // seed is assumed to be a constant defined in the helper class
                .iterations(iterations)
                .regularization(true).l1(0.0001).l2(0.0001) // elastic net regularization (L1 + L2)
                .learningRate(learningRate)
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                .updater(Updater.NESTEROVS).momentum(0.9)
                .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
                .useDropConnect(true)
                .leakyreluAlpha(0.02)
                .list()
                // block 1: CONV(3x3, 64) -> NORM -> SUB(2x2, MAX)
                .layer(layer++, new ConvolutionLayer.Builder(3, 3)
                        .nIn(channels)
                        .padding(1, 1)
                        .nOut(64)
                        .weightInit(WeightInit.RELU)
                        .activation("leakyrelu")
                        .build())
                .layer(layer++, new LocalResponseNormalization.Builder().build())
                .layer(layer++, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                        .kernelSize(2, 2)
                        .build())

                // block 2: CONV(3x3, 64) -> NORM -> SUB(2x2, MAX)
                .layer(layer++, new ConvolutionLayer.Builder(3, 3)
                        .padding(1, 1)
                        .nOut(64)
                        .weightInit(WeightInit.RELU)
                        .activation("leakyrelu")
                        .build())
                .layer(layer++, new LocalResponseNormalization.Builder().build())
                .layer(layer++, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                        .kernelSize(2, 2)
                        .build())

                // two CONV(3x3, 64) layers followed by a single SUB(2x2, MAX)
                .layer(layer++, new ConvolutionLayer.Builder(3, 3)
                        .padding(0, 0)
                        .nOut(64)
                        .weightInit(WeightInit.RELU)
                        .activation("leakyrelu")
                        .build())
                .layer(layer++, new ConvolutionLayer.Builder(3, 3)
                        .padding(0, 0)
                        .nOut(64)
                        .weightInit(WeightInit.RELU)
                        .activation("leakyrelu")
                        .build())
                .layer(layer++, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                        .kernelSize(2, 2)
                        .build())
                // FC(384): fully connected layer with 50% dropout
                .layer(layer++, new DenseLayer.Builder().activation("relu")
                        .name("dense")
                        .weightInit(WeightInit.NORMALIZED)
                        .nOut(384)
                        .dropOut(0.5)
                        .build())
                // softmax output producing a distribution over the class labels
                .layer(layer++, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nOut(numLabels)
                        .weightInit(WeightInit.XAVIER)
                        .activation("softmax")
                        .build())
                .backprop(true)
                .pretrain(false)
                .cnnInputSize(width, height, channels);
        return builder.build();
    };

    public static NetworkModel net2 = (learningRate, width, height, channels, numLabels) -> {

        int iterations = 1;

        int layer = 0;

        MultiLayerConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
                .seed(seed)
                .iterations(iterations)
                .regularization(true).l1(0.0001).l2(0.0001)
                .learningRate(learningRate)
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                .updater(Updater.NESTEROVS).momentum(.9)
                .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
                .useDropConnect(true)
                .leakyreluAlpha(0.02)
                .list()
                .layer(layer++, new ConvolutionLayer.Builder(3, 3)
                        .padding(1, 1)
                        .nOut(64)
                        .weightInit(WeightInit.VI)
                        .activation("leakyrelu")
                        .build())
                .layer(layer++, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                        .kernelSize(2, 2)
                        .build())

                .layer(layer++, new ConvolutionLayer.Builder(3, 3)
                        .padding(1, 1)
                        .nOut(64)
                        .weightInit(WeightInit.VI)
                        .activation("leakyrelu")
                        .build())
                .layer(layer++, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                        .kernelSize(2, 2)
                        .build())

                .layer(layer++, new ConvolutionLayer.Builder(3, 3)
                        .padding(0, 0)
                        .nOut(64)
                        .weightInit(WeightInit.VI)
                        .activation("leakyrelu")
                        .build())
                .layer(layer++, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                        .kernelSize(2, 2)
                        .build())

                .layer(layer++, new LocalResponseNormalization.Builder().build())

                .layer(layer++, new DenseLayer.Builder().activation("relu")
                        .name("dense")
                        .weightInit(WeightInit.VI)
                        .nOut(384)
                        .dropOut(0.5)
                        .build())
                .layer(layer++, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nOut(numLabels)
                        .weightInit(WeightInit.VI)
                        .activation("softmax")
                        .build())
                .backprop(true)
                .pretrain(false)
                .cnnInputSize(width, height, channels);
        return builder.build();
    };

    public static NetworkModel net3 = (learningRate, width, height, channels, numLabels) -> {

        int iterations = 1;

        int layer = 0;

        MultiLayerConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
                .seed(seed)
                .iterations(iterations)
                .regularization(true).l1(0.0001).l2(0.0001)
                .learningRate(learningRate)
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                .updater(Updater.NESTEROVS).momentum(0.9)
                .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
                .useDropConnect(true)
                .leakyreluAlpha(0.02)
                .list()
                .layer(layer++, new ConvolutionLayer.Builder(3, 3)
                        .padding(1, 1)
                        .nOut(32)
                        .weightInit(WeightInit.VI)
                        .activation("leakyrelu")
                        .build())
                .layer(layer++, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                        .kernelSize(2, 2)
                        .build())

                .layer(layer++, new ConvolutionLayer.Builder(3, 3)
                        .padding(1, 1)
                        .nOut(32)
                        .weightInit(WeightInit.VI)
                        .activation("leakyrelu")
                        .build())
                .layer(layer++, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                        .kernelSize(2, 2)
                        .build())

                .layer(layer++, new ConvolutionLayer.Builder(3, 3)
                        .padding(0, 0)
                        .nOut(64)
                        .weightInit(WeightInit.VI)
                        .activation("leakyrelu")
                        .build())
                .layer(layer++, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                        .kernelSize(2, 2)
                        .build())

                .layer(layer++, new LocalResponseNormalization.Builder().build())

                .layer(layer++, new DenseLayer.Builder().activation("relu")
                        .name("dense")
                        .weightInit(WeightInit.VI)
                        .nOut(384)
                        .dropOut(.5)
                        .build())
                .layer(layer++, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nOut(numLabels)
                        .weightInit(WeightInit.VI)
                        .activation("softmax")
                        .build())
                .backprop(true)
                .pretrain(false)
                .cnnInputSize(width, height, channels);
        return builder.build();
    };

    public static NetworkModel net4 = (learningRate, width, height, channels, numLabels) -> {

        int iterations = 1;

        int layer = 0;

        MultiLayerConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
                .seed(seed)
                .iterations(iterations)
                .regularization(true).l2(0.0005)
                .learningRate(learningRate)
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                .updater(Updater.NESTEROVS).momentum(.9)
                .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
                .useDropConnect(true)
                .leakyreluAlpha(0.02)
                .minimize(false) // caution: false maximizes the objective; training normally requires the default minimize(true)
                .list()
                .layer(layer++, new ConvolutionLayer.Builder(5, 5)
                        .nIn(channels)
                        .padding(2, 2)
                        .nOut(25)
                        .weightInit(WeightInit.RELU)
                        .activation("leakyrelu")
                        .build())
                .layer(layer++, new LocalResponseNormalization.Builder().build())
                .layer(layer++, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                        .kernelSize(2, 2)
                        .build())

                .layer(layer++, new ConvolutionLayer.Builder(3, 3)
                        .padding(1, 1)
                        .nOut(50)
                        .weightInit(WeightInit.RELU)
                        .activation("leakyrelu")
                        .build())
                .layer(layer++, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                        .kernelSize(2, 2)
                        .build())
                .layer(layer++, new LocalResponseNormalization.Builder().build())

                .layer(layer++, new DenseLayer.Builder().activation("relu")
                        .name("dense")
                        .weightInit(WeightInit.NORMALIZED)
                        .nOut(400)
                        .dropOut(0.5)
                        .build())
                .layer(layer++, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nOut(numLabels)
                        .weightInit(WeightInit.XAVIER)
                        .activation("softmax")
                        .build())
                .backprop(true)
                .pretrain(false)
                .cnnInputSize(width, height, channels);
        return builder.build();
    };

NetworkModel is a Java 8 functional interface:

    @FunctionalInterface
    public interface NetworkModel extends Serializable {
        MultiLayerConfiguration apply(double learningRate, int width, int height, int channels, int numLabels);
    }
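
Building a trainable network from one of the definitions above then takes only a few lines (a sketch; the ModelLibrary class name and the learning rate, input size, and label count are illustrative assumptions, to be replaced with dataset-specific values):

    // assuming the definitions live in a helper class called ModelLibrary
    MultiLayerConfiguration conf = ModelLibrary.net1.apply(0.001, 32, 32, 3, 10);

    MultiLayerNetwork network = new MultiLayerNetwork(conf);
    network.init();
    // network.fit(trainIterator); // train on a DataSetIterator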
