The Theory and Practice of Neural Networks…

“You can talk for years about the theory of tying your shoes and still not be able to do it.”

-Terry Meeks, 1984

Introduction:

In their paper “Nonlinear signal processing using neural networks: Prediction and system modeling”, Alan Lapedes and Robert Farber explored training neural networks to predict the behavior of various non-linear data sets.  As I continue to explore deep learning, I thought it might be good for me to review some of their work.  Lapedes and Farber’s early work on training neural networks still applies to the disciplines of data science and deep learning.  In the process of reviewing the work of Lapedes and Farber, I also hope to repeat some of their work using modern tools, including brushing off my python skills, which I haven’t used for a few years.  I am also looking forward to working with Google’s TensorFlow and the Keras python framework.

Something to Predict:

Before we start to dig into neural networks, let’s set the stage a bit with an overview of what we’re trying to accomplish.  The goal of Lapedes and Farber’s work is to develop a mechanism that can predict the behavior of a nonlinear system.  The first nonlinear system explored by Lapedes and Farber in their paper is a sequence of numbers called “The Feigenbaum Map”.  The Feigenbaum Map is familiar to many people who have explored chaotic and/or dynamical systems, and is often accompanied by the following diagram:

Feigenbaum Map:  x(t+1) = r * x(t) * ( 1 – x(t) )

[Figure: bifurcation diagram of the logistic map]

This fun little formula has many interesting behaviors.  The key point for our exploration is that if we fix r at 4, the Feigenbaum map displays some very non-linear, chaotic behavior.  Consider the sequence of points for the map with r=4 over time:

[Figure: the first values of the sequence x(t) for r = 4, starting from x(0) = 0.23]

It may be obvious at this point that the sequence of numbers generated by the map seems fairly random.   To compute the sequence, you pick a number between 0 and 1.  (I picked 0.23 above.)  The next value in the sequence then becomes four times that number, multiplied by the value of 1 minus the number, or   x(t+1) = 4 * x(t) * ( 1 – x(t) ) .
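As a quick sanity check (a minimal sketch of my own, not from the paper), here is that iteration in python, starting from 0.23:

# Iterate the map x(t+1) = 4 * x(t) * (1 - x(t)), starting at 0.23
x = 0.23
for t in range(10):
    x_next = 4.0 * x * (1.0 - x)
    print("x(%d) = %.10f  ->  x(%d) = %.10f" % (t, x, t + 1, x_next))
    x = x_next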

Now, imagine that we don’t know the formula, and all we know are the data points.  Getting from the data points back to the formula is very difficult.  There are a number of ways to do it, including polynomial regression.  However, even polynomial regression requires that you somehow suspect that the underlying system involves a polynomial of some degree.  When all you have is a sequence of data that is fairly evenly distributed, that is a hard guess to make.  Lapedes and Farber were able to show that a neural network could be trained to very accurately predict the behavior of the sequence.  Let’s take a look at how they did it.

The Network:

The network that Lapedes and Farber used to “learn” the Feigenbaum Map looked like this:

[Figure: Lapedes and Farber’s network: one input node, five hidden nodes, one output node, and a direct connection from the input node to the output node]

The above diagram shows the connections between the nodes.  It does not show, however, the weights associated with each connection, the activation functions in the nodes, or the biases associated with each activation function.  One interesting aspect of this simple network that Lapedes and Farber used is the “short circuit” between the input and output nodes.  Lapedes and Farber simply state in their paper that they “…chose a network architecture with 5 hidden units as illustrated.”  They don’t go into detail as to why they chose to include the connection directly from the input to the output node.  We will return to this observation later.  For now, however, we will just go with it.

The Process:

The process for working with neural networks typically is as follows:

  1.  Obtain a set of data that includes known inputs and outputs.  This is sometimes referred to as “The Training Set”.
  2. Construct a neural network that has a number of input nodes and output nodes that match the training set.  (For our example, we are using 1 input node, and one output node because our data has an expected output based on a single input data value.)
  3. Pass in the training set data and make adjustments to the network so that the network output matches the training set output to within some limit, or continue training for a set period of time.
  4. Once the network has been trained, you feed it new data for which you don’t know the expected values.  The network then makes predictions based on its training.

This is just one simple way to use neural networks.  More common uses also include using neural networks to categorize and recognize patterns.  We will visit those topics in future blog posts.

Training and Back Propagation of Error:

So, what was that step 3 again?  Let’s consider a very simple neural network:

[Figure: a single-node network with input I, weight w, bias b, activation function f, and output O]
figure 1

This network has the following parts:

  • Input (I):  This is the input value for the network from our training set.
  • Weight (w):  This is the weight that we multiply the input by.
  • Bias (b):  The bias is used to tune the activation function.
  • Activation Function (f):  The activation function takes the input, weight, and bias as its inputs.  For our purposes, we can use the sigmoid function:

f(x) = 1 / ( 1 + e^(-x) )

  • Output (O):  The output of the network.
  • Target (T):  This is what we WANT the network to return.  It is our training value.
  • Error (E):  This is how far off the network was from our desired target.

We can think of the output of the network as follows:

O = f( (i*w) – b )        (formula 1)

And we can think of the error of a single pattern as:

E = ( T – O )^2         (formula 2)

Another way to think of this is just trying to minimize the distance between O and T.  We square the difference because we want our Error to be positive, and we will try and get the positive value as close to zero as possible.

So, substituting in our values for the above O, we have the following.

E =( T – f( (i*w) – b ) )^2     (formula 3)

Now, the cool part.  If our function, f, is a continuous, differentiable function, we can use a technique called gradient descent to minimize the error.  To do that, we simply take the partial derivatives of the above equation with respect to w and b.  The partial derivative with respect to the weight, w, looks like this (note: we need to use the chain rule here):

dE/dw = -2 * ( T – O ) * f'( (i*w) – b ) * i
(formula 4)

Similarly, the formula for the partial derivative of the error function with respect to the bias looks like this:

dE/db = 2 * ( T – O ) * f'( (i*w) – b )
(formula 5)

In more complex neural networks, we can generalize the equations over the various nodes.  And, we can walk our way back up the network as we compute the various partial derivatives.  In this way, we are back-propagating the partial derivatives of the error function with respect to the weights and biases of the network.
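To make that concrete, here is a minimal python sketch (my own illustration, not Lapedes and Farber’s code) of a gradient descent step for the single-node network in figure 1, using formulas 4 and 5 and the fact that the sigmoid’s derivative is f'(x) = f(x) * (1 - f(x)):

import math

def f(x):
    # Sigmoid activation function
    return 1.0 / (1.0 + math.exp(-x))

def train_step(i, target, w, b, learning_rate=0.5):
    o = f(i * w - b)                            # formula 1: the network output
    f_prime = o * (1.0 - o)                     # derivative of the sigmoid at (i*w - b)
    dE_dw = -2.0 * (target - o) * f_prime * i   # formula 4
    dE_db = 2.0 * (target - o) * f_prime        # formula 5
    w -= learning_rate * dE_dw                  # step "downhill" against the gradient
    b -= learning_rate * dE_db
    return w, b, (target - o) ** 2              # formula 2: the remaining error

w, b = 0.1, 0.1
for _ in range(5000):
    w, b, error = train_step(0.23, 0.7084, w, b)
print(w, b, error)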

In the case of the simple one node network in figure 1, we can actually visualize what we are doing as follows:  What we have in formula 3 is a function of two variables, w and b.  The values of T and I are the input and target values provided by our training set.  What we are trying to do is pick values of w and b such that E is as small as possible.  If we consider the function of w and b and the output E as a 3-dimensional graph, what we are trying to do is find the point on the surface that is the lowest.  (In this diagram, the Z values represent the output of our function, E.  The X and Y axes correspond to our input values of w and b.)

[Figure: a smooth, bowl-shaped error surface E plotted over w and b]

Our function is not as smooth as above, and it changes with every input-output pair.  That said, the graph does give an intuitive feel for what we are trying to do.

Keep in mind that we are trying to find the minimum of the above graph.  Because our error function is continuous and differentiable, we can use gradient descent to figure out which way to head in order to move in the direction of steepest descent.  The negatives of the partial derivatives that we calculated in the above equations point us downhill toward a minimum.

You can think of it like placing a marble on the graph, and seeing which way it would roll down to the lowest point.  In the above graph, the marble would roll right to the bottom.  This is a trivial case.  However, what if our error graph looked like this:

[Figure: a bumpy error surface with many local minima]

In this case, we might literally get stuck in one of the wrinkles of the surface and not be at the global minimum.  We will incorporate the idea of a “momentum” term that helps us “roll out” of a local minimum until we settle at what we hope is the true minimum of the function.  The other technique that we can use is to pick up the marble and place it at a different spot on the graph.  We do this by simply randomizing the weights and biases.  I like to think of that technique like pounding on the side of a pinball machine.
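Here is a minimal sketch of the momentum idea (my own illustration; the decay factor and learning rate are assumed values):

momentum = 0.9        # how much of the previous step to keep (assumed value)
learning_rate = 0.5   # assumed value

def momentum_step(w, b, dE_dw, dE_db, velocity_w, velocity_b):
    # Blend the new gradient with the previous "velocity" so the marble
    # keeps some of its speed and can roll out of small wrinkles.
    velocity_w = momentum * velocity_w - learning_rate * dE_dw
    velocity_b = momentum * velocity_b - learning_rate * dE_db
    return w + velocity_w, b + velocity_b, velocity_w, velocity_b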

OK.  Enough of this trivial case.  Let’s go back to Lapedes and Farber’s first neural network.  It has considerably more weights and biases than our single node network.  Lapedes and Farber describe in their paper how to calculate all of the partial derivatives with respect to weights and biases.  That was then, this is now.

TensorFlow is a relatively new framework that takes care of a lot of the coding around the partial derivatives and the gradient descent algorithm required to train neural networks.  It offers many different activation functions and a number of different optimization algorithms in addition to gradient descent.  Another useful framework written on top of TensorFlow is Keras.  The Keras framework makes constructing and training neural networks even easier.

Code:

When I did my neural network research in graduate school, I wrote everything in C.  Back then, we didn’t have many of the more modern languages and platforms like Java or .NET.  C was the “new hotness.”

Much of the programming work done today in the field of “Data Science” is done in python.  There are several python packages, including “numpy”, that make mathematical and scientific programming easier.  My original neural network code was several pages long.  The code for constructing and training a network similar to Lapedes and Farber’s is much shorter today thanks to TensorFlow and Keras.  Let’s take a look:

# *****************************************
# A simple neural net using
# Keras and TensorFlow for
# training a neural network to learn
# the Feigenbaum Map
#
# June 1, 2017
# Miles R. Porter, Painted Harmony Group
# This code is free to use and distribute
# *****************************************

import numpy
import pandas
from keras.models import Model
from keras.layers import Dense, Input
from matplotlib import pyplot

# load dataset
dataframe = pandas.read_csv("logistics.csv", delim_whitespace=True, header=None)
dataset = dataframe.values

# split into input (X) and output (Y) variables
X = dataset[:, 0]
Y = dataset[:, 1]

#Set up the network
def baseline_model():
    inputs = Input(shape=(1,))
    x = Dense(30, activation='sigmoid')(inputs)
    predictions = Dense(1, activation='sigmoid')(x)
    model = Model(inputs=inputs, outputs=predictions)
    model.summary()
    model.compile(optimizer='rmsprop',
                  loss='mean_squared_error',
                  metrics=['accuracy'])
    return model

# Do the work
model = baseline_model()
results = model.fit(X, Y, epochs=1000, batch_size=2, verbose=2)
print("Training is complete.\n")

# Make predictions
print "The prediction is:\n"
newdata = numpy.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
prediction = model.predict(newdata, batch_size=1, verbose=1)
print(prediction)

# Plot the predicted and expected results
expected = newdata * 4.0 * (1.0 - newdata)
pyplot.plot(prediction, color="blue")
pyplot.plot(expected, color="green")
pyplot.show()

Results:

The above program references a logistics.csv file.  That file is just a two-column file that contains a sequence of numbers that starts with 0.23 and follows the Feigenbaum Map like so…

0.23    0.7084
0.7084 0.82627776
0.82627776 0.5741712933
0.5741712933   0.977994477
0.977994477    0.08608511987
0.08608511987  0.314697888
...
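If you want to recreate that file yourself, a little script like this will do it (my own sketch, assuming the same 0.23 starting value; the number of rows is an arbitrary choice):

# Generate logistics.csv: each row holds x(t) followed by x(t+1)
x = 0.23
with open("logistics.csv", "w") as outfile:
    for _ in range(500):
        x_next = 4.0 * x * (1.0 - x)
        outfile.write("%.10f %.10f\n" % (x, x_next))
        x = x_next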

After we have trained our network on the sample data, we can run some new values through it.  Because we are training to a known function, we are able to compute just how accurate the predictions are:

[Figure: predicted vs. expected values for the initial network]

If we play around with the network parameters, like the number of nodes, and the number of passes through the training set (epochs), we can improve the accuracy:

[Figure: predicted vs. expected values after tuning the number of nodes and epochs]

In fact, if we play around with the number of layers in the network, we can start to get some pretty amazing accuracy…

[Figure: predicted and expected values nearly overlapping]
Neural network with 2 hidden layers of 10 nodes each, trained over 10,000 epochs with a batch size of 50.
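For reference, here is roughly what that deeper model looks like in Keras (a sketch of the configuration described in the caption, reusing the imports and the X, Y data from the script above; the optimizer and loss are unchanged):

def deeper_model():
    inputs = Input(shape=(1,))
    x = Dense(10, activation='sigmoid')(inputs)    # first hidden layer
    x = Dense(10, activation='sigmoid')(x)         # second hidden layer
    predictions = Dense(1, activation='sigmoid')(x)
    model = Model(inputs=inputs, outputs=predictions)
    model.compile(optimizer='rmsprop', loss='mean_squared_error')
    return model

deep_model = deeper_model()
deep_model.fit(X, Y, epochs=10000, batch_size=50, verbose=0)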

Is this perfect? No.  But often these types of predictions don’t need to be.  The important thing here is that we are able to train a neural network to estimate the values of a seemingly random sequence.  In a sense, the network is able to find hidden meaning in what would appear at first to be random noise.  Again, it is true that you could use techniques like polynomial regression to discover or closely model the sequence.  After all, the underlying polynomial expression is simply f(x) = 4x(1-x).  Keep in mind, however, that our neural network doesn’t know that the underlying polynomial is a second-degree polynomial… or even that there is an underlying polynomial.
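For comparison, here is a sketch (my own, not from the paper) of what polynomial regression looks like if you already suspect a second-degree polynomial; numpy.polyfit recovers coefficients very close to -4, 4, and 0:

import numpy

# Build (x(t), x(t+1)) pairs from the map and fit a degree-2 polynomial.
x = [0.23]
for _ in range(200):
    x.append(4.0 * x[-1] * (1.0 - x[-1]))
x = numpy.array(x)

coefficients = numpy.polyfit(x[:-1], x[1:], 2)
print(coefficients)   # roughly [-4.  4.  0.], i.e. f(x) = -4x^2 + 4x = 4x(1 - x)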

A valid argument against this approach is that it is entirely possible for the training algorithm to get stuck in a local minimum and never actually converge.  In that respect, it is possible that you may NOT be able to train a neural network to learn a non-linear system, particularly if you make bad choices in the parameters that define the network.  That said, there is still significant value in neural networks.  While they may not be able to predict every sequence, or find logic where there is none, they CAN find logic in systems that seem random.  An algorithm doesn’t need to be perfect in order to provide value.  If it is a cloudy day, do you bring an umbrella, even though it may not rain?

A number of mathematical theorems address the convergence of neural network training, including A Generalized Convergence Theorem for Neural Networks by Jehoshua Bruck and Joseph W. Goodman.

A Little Black Magic Never Hurts:

In looking at the networks that we trained above, you may notice a couple of things.  First, Lapedes and Farber’s network had an extra connection from their input node to the output node.  Secondly, Lapedes and Farber used a neural network that had a single hidden layer with 5 nodes.  Why 5 and not 3, or maybe 7?  The networks that I have trained above have either 10 or 30 hidden nodes and no direct connection between the input and output layers.  There is some black magic involved in knowing how to construct the networks that way.  As we have seen above, it is also possible to add multiple hidden layers to neural networks for various purposes.  As I continue to explore the topic of deep learning, I plan to blog about how to use various network topologies to learn different training sets.  It is also possible to vary the structure of the network as part of training.  I hope to touch on that as well.

Further Work and Looking Ahead:

The problem of “learning” the Feigenbaum map is just the beginning of what neural networks can do.  Lapedes and Farber go further in their article as they explore training neural nets on the Mackey-Glass Equation, and modeling adaptive control systems.  (Lapedes and Farber p.19-20)

Also, as I have mentioned before, much of the new work involving neural networks involves classification of data.  This not only includes simple time series data (for example, classifying ECG signals in an attempt to uncover heart arrhythmias) but also scenarios like classifying sounds and audio signals, photographs, movie clips, etc.  In those cases, we might be interested in discovering when observed data diverges from a trained network’s prediction.  For instance, what if you trained a neural network to estimate the oil pressure of an engine based on the engine RPMs?  It would then be possible to continuously estimate the oil pressure, and raise an alert when the actual pressure significantly diverged from the network’s prediction.
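Sketching that idea in python (hypothetical names and threshold here, just to illustrate the pattern; this is not a real monitoring system), the check might look something like this:

import numpy

ALERT_THRESHOLD_PSI = 5.0   # acceptable gap between predicted and measured pressure (made-up value)

def check_oil_pressure(model, rpm, actual_pressure):
    # Ask the trained network what pressure it expects at this RPM, then
    # flag the reading if reality has drifted too far from the prediction.
    predicted = model.predict(numpy.array([[rpm]]))[0][0]
    if abs(predicted - actual_pressure) > ALERT_THRESHOLD_PSI:
        print("ALERT: expected %.1f PSI at %d RPM, but measured %.1f PSI"
              % (predicted, rpm, actual_pressure))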

For a good overview of deep learning in general, see Deep Learning – MIT Technology Review.

Another great resource is the Udacity.com course on deep learning:

https://www.udacity.com/course/deep-learning–ud730

And the book Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville:

https://github.com/HFTrader/DeepLearningBook

I hope you have enjoyed this post, and will check back for more interesting topics in software engineering, IoT, and deep learning.

Dedication:


I was going to start this blog post with a song reference, but I just found out that one of my teachers in high school recently died.  Terry Meeks taught humanities, was a writer, a musician and a seeker of truth and beauty.  Though I haven’t seen her in years, the memories of her in front of the class teaching us how to write, to love music, and to appreciate art and literature are still vivid.  She was one of a kind.

I remember doing a presentation in her class that involved bouncing laser beams off speaker cones to create a sort of spirograph laser show.  I remember her being as interested in the trigonometry of sine and cosine waves as she was in enjoying the Bach Brandenburg Concerto…  which she was probably hearing for the thousandth time.  Mrs. Meeks was always open to attempts to bridge the gap between art and science.

Mrs. Meeks (Terry, as she instructed us to call her when we graduated from high school) had a big impact on me.  From her, I learned a little about how to write.  I also learned to love the search for beauty and truth.

Deep Learning

Way back in the day, when I was a grad student at Colorado State University, I took a class called Pattern Analysis from Michael Kirby (http://www.math.colostate.edu/~kirby/).  A part of this class was dedicated to the study of neural networks and specifically looking at some of the work of Lapedes and Farber of Los Alamos Labs. (https://papers.nips.cc/paper/59-how-neural-nets-work.pdf)

Whenever I think about the topic of neural networks, it is hard not to have the image of Star Trek’s Commander Data come to mind.

[Image: Star Trek’s Commander Data]

Now that we have that out of the way…

The concept of neural networks dates back to the 1950s and before, and the basic idea is not overly complex.  A digital neural network mimics the structure of a biological neural network with a collection of nodes (neurons) and edges (synapses).  As such, a digital neural network can be considered mathematically to be a directed graph.  Unlike a biological neural network, however, a digital neural network is frequently constructed in a very ordered, layered architecture.

Data flows through a neural network from its “input” layer through various “hidden layers” and eventually appears at its “output layer”.  As the data flows through the network, “activation functions” at each node calculate the output of the node based on the sum of that node’s inputs.  Data that flows along the edges (synapses) of the network is multiplied by a weight prior to being added to the downstream nodes.
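As a minimal sketch (my own illustration), here is what happens at a single node: multiply each incoming value by its edge weight, sum the results, subtract the bias, and pass the total through the activation function:

import math

def node_output(inputs, weights, bias):
    # Weighted sum of the incoming edge values, shifted by the bias,
    # squashed through a sigmoid activation function.
    total = sum(i * w for i, w in zip(inputs, weights)) - bias
    return 1.0 / (1.0 + math.exp(-total))

print(node_output([0.5, 0.2], [0.8, -0.3], 0.1))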

[Figure: a single neuron with weighted inputs, a summation, and an activation function]
Borrowed from Stack Exchange

Nodes and edges are combined into a larger network similar to what we can see below.

[Figure: a layered neural network with an input layer, hidden layers, and an output layer]
Borrowed from Stack Exchange

Neural networks can be “trained” based on a known set of input and expected outputs. A sample from a training set gets passed into the network, and then the error is calculated based on the output of the neural network, and the desired output.  The error is then a mathematical function of the biases and weights associated with the nodes and edges of the network, and that function can be minimized using numerical analysis techniques like gradient descent.

Back in the late 80s and early 90s, the limits of computational power and smaller datasets limited the size and trainability of neural networks.  Neural networks declined in popularity through the 90s and 2000s.  Recently, however, advances in computational power, cloud computing, and the large amounts of data generated by social media and services like YouTube have sparked a huge resurgence in the popularity of digital neural networks as a computational and analytical tool for dealing with data.  Today neural networks play a large part in many speech recognition and image recognition applications.  Over the summer, I will continue to explore this area and generate more blog posts.

I plan to follow the basic outline of this online course:

https://www.udacity.com/course/deep-learning–ud730

And also use portions of this book:

https://github.com/HFTrader/DeepLearningBook

My goal is to use this blog to track my research in this area.  While Deep Learning does not directly deal with IoT, I believe that the two can be used together.  I am still thinking about how this may come about…  Maybe Commander Data will make a re-appearance in this blog…

The Minitron!

[Photo: the assembled Minitron]

A few posts back, I started writing about building a miniature Jumbotron…  or Minitron.  I have finally managed to get enough of the pieces together to call this thing done.  There are still some rough edges, but it is close enough that I am satisfied that I completed what I set out to do and am now ready to move on.  🙂

Quick Background on Minitron

Minitron is a very small scrolling textual display.  The project was inspired by something that I saw at the 2016 Embedded Software Conference in Minneapolis.  I think that there may be several uses for the project, including not just scrolling text messages, but also displaying things like news headlines, stock ticker symbols, or just about any kind of textual information that you want in semi-realtime.  Backing the hardware components is a small app running on Heroku.  That app allows users to sign up, register their own “Minitron”, and define the messages they want it to display.  The app allows up to 12 messages to be predefined for each minitron.  The Minitron periodically communicates with the web app running on Heroku to update the message it is displaying.  The hardware includes a single push button that allows users to select which message they want to appear on their Minitron.

Hardware/Firmware Components

The Minitron hardware consists of these basic components:

[Photo: the Minitron hardware components]

  1.  An Arduino Uno.
  2. An Adafruit Charlieplex Display (https://www.adafruit.com/product/2947)
  3. An Adafruit Charlieplex Display driver board (https://www.adafruit.com/product/2946)
  4. A Sparkfun Arduino ESP8266 WIFI Shield (https://www.sparkfun.com/products/13287)
  5. A tactile pushbutton (not shown).
  6. Various hookup wires (not shown).

Smoke Rising from the Bench…

The first step in building the minitron is to solder together all the components, starting with the backpack and the charlieplex display.  In order to do this, you should first solder the connectors to the display, and then solder on the backpack.  The tricky part of this process is to make sure that the pins are straight.  You can use a breadboard to hold the headers as shown below…  just be careful not to melt the breadboard.

Once you have the front soldered, I recommend soldering a header onto the charlieplex backpack before you solder the backpack to the display.  See the image below.  This will allow you to plug the display assembly directly into a breadboard, or you can use the header to plug wires in.  The latter will come in handy if you decide that you want to have the display in a “landscape” orientation vs. “portrait”, or if you want to run some wires so that you can position the display in a different location.

Once the headers have been soldered to the charlieplex display, solder the backpack onto the back.  Make sure that you can read the printing on the back.  In other words, pin 1 on the display should match up with pin A1 on the driver board, and pin 9 on the display should match up with pin B9 on the driver board.  The finished display and driver board with the header should look like this:

Now that we have the display ready to go, we need to assemble the Sparkfun ESP8266 WiFi shield.  This consists of soldering the headers onto the shield.  When soldering the headers on, make sure that the female end is pointing up, and that on the digital side you leave the RX and TX pins (the two leftmost pins with the silkscreen oriented up) and the SDA and SCL pins open.  See below:

[Photo: the ESP8266 WiFi shield with headers soldered on]

Once the WiFi shield is complete, attach it to the Arduino, and then wire as shown below.

[Diagram: wiring between the Arduino, the Charlieplex driver board, and the pushbutton]

NOTE:  THE SPARKFUN ESP8266 WIFI ARDUINO SHIELD IS NOT PICTURED.  Simply plug the shield into the Uno as you would expect.  The headers are exactly the same.  Also, note that the Charlieplex daughter board (https://www.digikey.com/catalog/en/partgroup/is31fl3731-adafruit-16×9-charlieplexed-pwm-led-matrix-driver/59819?WT.srch=1&gclid=CPSi9Pvj_NMCFR62wAodANoMkQ) is shown, but not the actual display.  That needs to be assembled and plugged into the driver as described above.

Software Components

[Diagram: Minitron software components]

To understand the software involved with the Minitron, let’s start with a basic user workflow.

  1.  A user logs into the Minitron administrative app via a browser (minitron.herokuapp.com)
  2. After registering, the user registers a device name on the site.  Once the name has been successfully recorded, a device code is returned to the user.  This code is then used in the Arduino program (sketch) that will run on the user’s Minitron.
  3. The user is able to use their browser and the Minitron web app to program 12 different messages (0-11) that they will want to display on their minitron.
  4. The user programs their minitron Arduino using the program provided in the GIT repository,  and fills in the SSID and password of their Wifi network, along with the device code they obtained from the website.
  5. Once programming is complete, the minitron makes a call back to an endpoint in the web application that includes the device code and a message number.  The server responds with a message that the minitron then displays.
  6. If the user wishes to change the message displayed on the minitron, they simply hold down the button on the minitron.  The minitron checks whether the button is being pressed every third time it displays a message.  If the button is down, the minitron starts to display the numbers 0-11 in sequence.  When the user sees the number of the message they want displayed, they release the button.

There are two basic pieces of software involved with this system: the Minitron Arduino program, and the web application running on Heroku.  Both are stored in the Minitron GIT repo.

https://github.com/fractalbass/minitron

Arduino Code:

The Arduino code, which is in the GIT repo above, will need to be flashed onto the Arduino.  In order to do that, you will need the Arduino IDE.  (There is also a web-based tool for Arduino.  I have not used it, so cannot comment on it either way.)  You will need to have the ESP8266 Shield library and the Adafruit Charlieplex driver library installed in the Arduino IDE.  You can find information about those on the product links at the beginning of this article.

You will need to make some modifications to the code before you upload it to your Arduino.  Those modifications include setting your wireless SSID and password, as well as the device code for your minitron.  Refer to the workflow described above for more information on the device code.  You will need to register your Minitron on the web app before programming the Arduino.  Registering the Minitron with the web app will generate the device code that you need for the programming step.  Below are the lines that you will need to modify in the Arduino program.

...
// Replace SSID and PWD with the appropriate values for
// your WiFi network.
const char mySSID[] = "PUT_YOUR_SSID_HERE";
const char myPSK[] = "PUT_YOUR_NETWORK_PASSWORD_HERE";

// Replace DEVICE_CODE with your device code.
const char deviceCode[] = "PUT_YOUR_DEVICE_CODE_HERE";
...

Web Application Code:

The web application code is included in the GIT repo mentioned above.  If you simply want to build your own Minitron, you don’t need to worry about it.  The app is up and running on Heroku and available for you to use…  at least until too many people start to hit it and I need to start paying for the app.  If that happens, I would be surprised.  That said, my plan is simply to email folks that have registered devices, and ask them to make a donation to help pay for the next level of Heroku.

The web app is written in Groovy/Springboot.  It involves a very basic Postgres database.  I tried to test drive the app…  that said I developed it solo.  Without a pair to keep me honest, I am afraid that there may be some holes where I didn’t do a good job of test driving things.  Feel free, if you would like, to fork the app and make a pull request for any changes/enhancements.

One thing you may notice in the app is that I have included a docker image that can be used to bring up the underlying Postgres database for integration testing.  Please check out my recent blog post https://pragmaticiot.wordpress.com/2017/03/05/springboot-integration-testing-with-dependencies-running-in-docker/ for more information on that subject.  Credit should go to Thom Dieterich for teaching me a ton about using docker in this fashion.  (As well as for teaching me tons of stuff about groovy, pairing, and XP in general.)  Though I didn’t work with him on this project, his ideas and practices have heavily influenced the web server portion of Minitron.

Thom is a wizard dev, and I had the great pleasure of working with him at Bluestem Brands and PeopleNet over the past couple of years.  Thanks, Thom.

The code for the Minitron application is free to whoever wishes to use it.  Please feel free to download/clone/fork the repo.  All I ask is that you give credit (or blame) when referencing the code or app…  and you consider offering me (and Thom) a super high paying gig at some point in the future.  🙂

Conclusion:

There are several interesting things that I learned while working on this project.  They include:

  1.  The Sparkfun ESP8266 WiFi Shield is a great way to use the ESP8266 with your Arduino projects.  I tried, initially, to use an Adafruit FeatherIO board and a separate ESP8266.  While the FeatherIO board worked great with the charlieplex display, using it with the ESP8266 was a pain.  In general, I have not had much luck with raw ESP8266 devices.  The Sparkfun shield, however, was a dream to work with.
  2. This app demonstrates:
    1. Test driven development.
    2. Programming a RESTful API with SpringBoot and Groovy.
    3. Docker (see my previous post.)
    4. Arduino development, and some basic Arduino circuit design
    5. Basic electrical component assembly.
  3. This app DOES NOT DO JUSTICE TO SECURITY.  There are a number of shortcomings to this project in terms of security.  User passwords are encrypted in the database; however, messages to and from the server are not.  This is a major problem, and as a result, this application should NOT BE USED for any kind of medical, financial, or other sensitive use case unless the security issues are addressed.  Specifically:
    1. Messages should be encrypted.
    2. Arduino code should be required to provide additional authentication information, rather than just a device code.
    3. The web application should be modified to be

I hope you enjoyed this post and will consider building your own Minitron!!!

Springboot Integration Testing with Dependencies Running in Docker


Intro

Way back in the beginning of this blog, I wrote about the definition of the term “Pragmatic”.  From the all-knowing Google, we have this definition:


prag·mat·ic
praɡˈmadik/
adjective

– dealing with things sensibly and realistically in a way that is based on practical rather than theoretical considerations.


[Screenshot: Getting Started with Docker]

One of the best ways to achieve this in software is to be able to demonstrate what our code does.  If we can demonstrate that our code works, it becomes a real thing and is no longer theoretical.  If we can make sure that our code works in a way that is easily repeatable, all the better.  I have, in past posts, written about the value of testing and I won’t go into it again here.  The important point is that a sure path to writing pragmatic software is to write tests that can be repeated.

In recent years, Docker containers have become increasingly popular and, fortunately, increasingly stable and easy to use.  Docker containers can be thought of as a sort of lightweight virtual machine that helps developers standardize the dependencies and environments that their code runs in.  While Docker containers can be used in a number of different ways, one way that I am particularly fond of is using them to spin up application dependencies while running local integration tests.

To be clear, there is still some effort that goes with using docker containers.  They are not a silver bullet, and they are not magic.  For example, they don’t somehow magically provide your development machine with more CPUs or more memory than it already has.  Also, spinning up the containers raises additional considerations, not the least of which is managing networking and ports.  That said, I believe that getting docker containers to work as part of our integration testing strategy is not a bad idea.  So, let’s take a little bit of a look at how to go about doing that.

The next few steps will discuss installing and running Docker.  For a more complete overview, check out https://docs.docker.com/engine/getstarted/.

Install Docker for Mac

The first step in working with Docker on the Mac (my platform of choice) is to install Docker.  Fortunately, this has become much easier with the advent of the native release of Docker for Mac.  Prior to this release, Docker on a Mac meant using docker-machine and VirtualBox.  Essentially this was running docker in a VM on Mac OS X.  It was a bit complicated.  Now, however, installing docker is very simple.  For more info, check out:  https://www.docker.com/get-docker

Create Dockerfile

Once you have docker on your machine, there are basically two steps involved in using it.  The first step is to build a docker image.  Images are built based on definitions that are stored in a “Dockerfile”.  For the minitron app, I am using Postgres as my database engine.  As it turns out, there is a pre-built Postgres docker file that I can use…

https://hub.docker.com/_/postgres/

Once I have the file, the next step is to build the docker image.

Note:  I could have also downloaded an image from Docker Hub.  See the tutorial mentioned above for examples and info on how to do that.

Build Docker Image

A “Dockerfile” defines how a docker image should be constructed.  Once an image is built, we can re-use that image as much as we like.  This makes using docker fairly speedy.  In order to build the Postgres image from its docker file, we can issue a command like this:

(From the same directory as the Dockerfile I am using.)


> docker build .

This will kick off the process of building the new image.  It will take a while, but only the first time we wish to build these images.

In a bit, I’ll discuss how we are going to use this docker image as part of our integration testing.  However, just running postgres is not going to be enough to support our integration tests.  The tests will depend on having database tables created as well.  Here we have a couple of choices.  We could, if we chose, build the creation of the tables into the docker image that we are using.  We could also hook into the initialization process that the docker image runs at startup via a technique called entrypoints.  Lastly, we could handle the creation of the database tables ourselves in code.  Since this is a fairly small application, I am going to manage the database structure in the code.  That said, I am not suggesting that is always the best way to go.  Database schema management is an issue all unto itself and is beyond the scope of this blog post.  One thing that I will mention is that there has been increasing discussion and use of Liquibase as a tool for managing database schemas.  I’ll let you read more about it here…  http://www.liquibase.org/

Write the Database Creation Script

For the minitron database, we will have essentially 3 tables.  Each table will have a primary key (that we will generate in code via UUID) and possibly some uniqueness constraints on columns.  I wrote a very simple helper class that connects to postgres and then reads through and executes the SQL statements in a file.  (I could have used the psql CLI for that, but since I am going to be creating and tearing down the database from tests, this seemed to be a reasonable approach.)  So, here is what our database schema looks like:


DROP TABLE IF EXISTS mtUser;

CREATE TABLE mtuser(
   userId text,
   email text,
   password text,
   PRIMARY KEY( userId )
);

CREATE UNIQUE INDEX user_key ON mtuser USING btree (userId);
CREATE UNIQUE INDEX email_key ON mtuser USING btree (email);

DROP TABLE IF EXISTS device;
CREATE TABLE device(
   deviceId text,
   deviceCode text,
   userId text,
   PRIMARY KEY( deviceId )
);

CREATE UNIQUE INDEX device_key ON device USING btree (deviceId);
CREATE UNIQUE INDEX deviceCode_key ON device USING btree (deviceCode);

DROP TABLE IF EXISTS message;
CREATE TABLE message(

   deviceId text,
   channel integer,
   message text,
   messageId text,
   PRIMARY KEY( messageId )
);

CREATE UNIQUE INDEX message_key ON message USING btree (messageId);

Switching back to docker…

If you recall from above, I created a docker image that holds my postgres server.  In order to test my script above, I can fire up docker from a command line and use a GUI client ( or CLI) to see if my database creation script works.  I can do that with this command (which is all on the same line…)


docker run --name some-postgres -e POSTGRES_PASSWORD=secret -d -p 5432:5432 postgres


Now from my GUI (in my case, Postico) I can debug the script above.

[Screenshot: the minitron database in Postico]

Start Docker from a Springboot Integration Test

At this point, I have my Dockerfile, my image, and a script to create the database.  All I need to do now is to incorporate all of this into a test.  Easy right?  Well….

Actually it is not too  bad.  Let’s start with the test:


@DockerDependent
@SpringBootTest(classes = MinitronApplication.class,
        webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
class UserControllerSpec extends Specification {

    @Shared
    RESTClient restClient

    @Value('${local.server.port}')
    int port // random port chosen by spring test

    def setupSpec() {
        DatabaseUtil dbUtil = new DatabaseUtil()
        dbUtil.createDatabase()
    }

    def setup() {
        UserDao userDao = new UserDao()
        userDao.clean()

        restClient = new RESTClient("http://localhost:${port}/")
        restClient.handler.failure = { resp -> resp.status }
    }

    def 'I can save a user.'() {
        given:
        restClient != null

        when:
        def resp = restClient.post(path: 'user/mporter@paintedharmony.com',
                body: 'thisIsAPassword',
                requestContentType: ContentType.JSON)

        then:
        resp.status == 201
    }
}


A couple of notes:

The @DockerDependent annotation does all of the heavy lifting in order to fire up my database.  I am using a docker-compose.yml file to specify the parameters of my database much in the same way that I used command line switches when I did docker run.  My docker-compose.yml looks like this:

version: '3'
services:
# ------------------------------------------
  postgres:
    image: asimio/postgres:latest
    ports:
      - "5432:5432"
    environment:
      POSTGRES_PASSWORD: secret
      POSTGRES_USER: postgres

Because this is a yaml file, the spacing is very important!

Once the database is up and my annotation has finished, I initialize the database with the help of the DatabaseUtil class.

Finally, when we are all finished, the @DockerDependent annotation tears down everything and makes sure that docker is done.

I have found that, when playing with docker, it is very important to keep track of what docker containers are running, and also what networks have been configured.  To that end, it is sometimes helpful to have a script that can clean up docker stuff that gets left hanging around.  Here is one that I am currently using from time to time…

#!/bin/bash

echo "stopping and removing ALL docker containers"
for dockerid in $(docker ps -qa); do
    echo -n "stopping "
    docker stop $dockerid
    sleep 2
    echo -n "removing "
    docker rm $dockerid
done

echo "cleaning out volumes"
docker volume rm $(docker volume ls -q)

echo "cleaning out dockertest networks"
docker network rm $(docker network ls -qf 'name=dockertest*')

Conclusion

Docker is a very powerful tool that can really help with developing services and microservices.  While it plays a huge role in the container world, it can also be an appropriate technology to use to spin up dependencies required by integration tests.

As always, source code for this app (which is a work in progress) can be found here:

https://github.com/fractalbass/minitron

All Under One Roof.

[Photo: the new Vikings stadium under construction]

I live in the western part of the Minneapolis metro area, and I don’t often head to the eastern part of our fair city.  Because of this, I am often struck by things that have been added or changed in the time since my last visit.  Like…  the huge new sports stadium on the eastern side of downtown Minneapolis.  I am not going to comment (much) on building such giant structures with public money.  Regardless of what you think about such things, it is hard not to be impressed with the size of the new stadium and the rate at which it was built.

Having the ability to bring things up and down quickly as we develop software applications is a tremendous advancement of the past several years.  Containers and Docker provide us with excellent tools to test and even deploy applications without worrying about specific hardware concerns, or dealing with the old concerns of giant virtual machine images.  Check out the tutorials on the Docker site if you are interested in more details.  Inside that fancy new stadium, you will quickly notice this…

[Photo: the US Bank Stadium Jumbotron]

The new Vikings (or should I say  US Bank) Stadium Jumbotron.  Being that I like to build things, I decided to try and build my own…  but on a much smaller scale.  I give you the “minitron”…

[Photo: the minitron prototype]

And, yes, that is a small part of an “S” scrolling by on the screen.  The idea for this little guy goes beyond just the “charlieplexed” LED display and an Arduino Uno.  There is also a SparkFun ESP8266 module involved and, of course, a nice little Heroku app backing it.  The goal of the project is to create more of these little minitrons, and allow them to call back into the Heroku app to pick up messages to display.  This is all fronted by a web application that allows users to log in, register their own minitron (each one comes with its own preset device code) and enter in up to 16 different messages that can scroll by on the screen.  The user can also custom-draw a 9×16 pixel image that will be displayed on the screen too.

So, how do we go about building this kind of thing?  Let’s start with the web application.

I have written a very simple Springboot 1.4 application that will be hosted on Heroku…  because it is free.  I am storing the data in Postgres (because that is a no-brainer when dealing with Heroku.)  I will post more about the hardware and Arduino code later, but for now, let’s start with some thoughts regarding testing.

If you have read any of my previous blog posts, you know that I am a pretty big fan of TDD.  Let’s briefly touch on the different kinds of tests that I have been working on: DAO, service, and integration tests.  I will also say a few words about using docker for test dependencies.

DAO Tests

As with most web apps, the minitron web app needs to access a data repository.  In this case, that repository is Postgres.  There are a number of ways to accomplish this with Spring.  For the minitron app, I have decided to go with a pretty basic JDBC implementation for accessing the data, because the data store will only have 3 tables and the object-to-relational mapping is extremely simple.  In order to access the database, I will be using prepared statements.  This provides me with an excellent opportunity to test those classes.  Here is a sample test:

class DeviceDaoSpec extends Specification{

    DeviceDao deviceDao
    Connection connection

    def setup() {
        deviceDao = new DeviceDao()
        connection = Mock(Connection)
        deviceDao.conn = connection
    }

    def 'I should be able to save a device'() {
        given:
        UUID deviceId = UUID.randomUUID()
        UUID userId = UUID.randomUUID()
        Device device = new Device(userId: userId, deviceCode: "xyz123", deviceId: deviceId)
        PreparedStatement preparedStatement = Mock(PreparedStatement)

        when:
        deviceDao.save(device)

        then:
        1 * connection.prepareStatement("insert into device (deviceCode, deviceId, userId) values (?,?,?)") >> preparedStatement
        1 * preparedStatement.setString(1,"xyz123")
        1 * preparedStatement.setString(2, deviceId.toString())
        1 * preparedStatement.setString(3, userId.toString())
        1 * preparedStatement.execute()
        0 * _
    }
}

Clearly, this is a Spock test.  (http://spockframework.org/spock/docs/1.1-rc-3/index.html)  I have recently been using Spock more and more for unit testing (which is what this test is) because of its great built-in mocking features.  Note that the test follows the standard “given, when, then” format, making it painfully easy to read.  A quick note about the 1 * and 0 * stuff included in the then section…  for lines with a 1 *, the test will assert that the statement gets called exactly once.  The 0 * _ asserts that nothing else gets called.  I really find this useful when testing these types of systems.  It is very easy to test that some subsequent thing happens when you call a method.  Testing that nothing else did can be a bit more tricky…  the 0 * _ feature of Spock mocks makes this much easier.

Service Tests

The service layer tests for the application pretty much mirror the DAO layer tests, except that rather than confirming that we are generating the expected prepared statements, we are testing slightly higher-level business logic.  In the minitron application, there is a pretty close correspondence between the service- and DAO-level tests and methods.

Integration Tests

Integration testing gets a bit more interesting.  I could, and maybe should, write tests for the individual controllers, and mock out the behavior of the underlying service layer.  I decided to skip those tests for now, as the same code will end up being tested with a full-blown integration test.  Those integration tests will use RESTClient to send data to the controllers via the running application, and assert on the expected data returned…  and possibly on the presence or absence of data in the underlying database.  I initially started working on this with the database actually running locally.  That is not a bad approach, and gives us the ability to move quickly.  However, it is not the greatest approach either, in the sense that we now have to worry about cleaning and restoring the database as part of the test framework.  Also, we would require any subsequent system that wishes to run the tests (other developers, or CI/CD servers) to run those dependencies as well.  Enter Docker….

Docker

To solve the problem of having a dependent system database system up and running, I have decided to create an annotation that I can use with my springboot integration tests that will bring up and initialize a Postgres database (with the appropriate users, tables, etc) inside of a docker container.  Stay tuned to my next blog post for more details on those steps.

In the meantime…  get out and enjoy the wonderful weather this weekend.  Who says we need an enclosed stadium in Minneapolis, anyway?!

Note:  as this project continues, please feel free to check out the public GIT repo:

https://github.com/fractalbass/minitron


Keep it Simple… and Safe

Ikken Hissatsu is a philosophy that is a part of Shotokan Karate and other martial arts.  It translates to something similar to “Kill with one strike”.  It is not a complicated idea.  Basically, it means execute every strike (or kick) with the intention of ending the fight.  In other words, don’t get fancy.  In software development, there is a similar saying…  Just do the simplest thing that can possibly work (if you’re not sure what to do yet).  The other day, I came across a coding problem called “looksay”.  Consider the following sequence.

1
1,1
2,1
1,2,1,1

The challenge is to come up with the next line in the sequence.  The pattern is derived based on “saying” what is on the previous line.  For instance, the second line is “1, 1”, or “one one” because there is one “1” on the previous line.  The third line is “2, 1” because there are two ones on the second line.  You can see my solution here:

https://looksay.herokuapp.com/
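My hosted solution aside, a plain (non-regex) python implementation of the next line might look like this sketch:

def next_looksay(line):
    # "Say" the previous line: count each run of identical digits.
    result = []
    index = 0
    while index < len(line):
        run_value = line[index]
        run_length = 1
        while index + run_length < len(line) and line[index + run_length] == run_value:
            run_length += 1
        result.extend([run_length, run_value])
        index += run_length
    return result

line = [1]
for _ in range(4):
    print(",".join(str(d) for d in line))
    line = next_looksay(line)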

My solution is not the only way to solve this problem.  There is a way to solve this with a regular expression, all in one line.  The problem with that solution is that most developers would have a heck of a time figuring out what the regex did.  One line of regex is an elegant solution…  but absolutely NOT the simplest thing that could possibly work.

Now, consider the opposite problem…  I was talking to a fellow developer just the other day about a job candidate he had just interviewed.  One of the questions my co-worker asked the candidate was “How do you avoid SQL injection in Java?”  The answer my co-worker was looking for was “with prepared statements.”  The candidate he was interviewing didn’t know the answer.  He also didn’t get the gig.

A little closer to home, I came across some code where I was seriously tempted to leave something too simple.  Consider the following method from the EKG Field Monitor project.  This is from a DAO that manages strips in the database:

public List<Strip> getStrips(UUID monitorId){
    jdbcTemplate = new JdbcTemplate(dataSource);
    String sql = "select * from strip where monitorid = ?";
    List<Strip> stripList = jdbcTemplate.query(
            sql, new Object[] { monitorId },
            new RowMapper<Strip>() {
                public Strip mapRow(ResultSet rs, int rownumber)
                        throws SQLException {
                    Strip s = new Strip();
                    s.setMonitorId(UUID.fromString(rs.getString("monitorId")));
                    s.setUploaded(rs.getDate("uploaded"));
                    s.setID(UUID.fromString(rs.getString("id")));
                    return s;
                }
            });
    return stripList;
}

This code uses a prepared statement to help prevent a SQL injection attack.

I could potentially have problems if the code looked like this:

 String sql = "select * from strip where monitorid = '" + monitorId + "'";

For more information about SQL Injection, check out this great resource:  https://www.owasp.org/index.php/SQL_Injection_Prevention_Cheat_Sheet

OWASP is a wonderful resource on security issues related to software development, and I.T.

So, back to the main point of this post.  I believe that we should always strive to keep our code as simple and readable as possible.  As developers, we need to balance that simplicity with safety.  Using prepared statements for SQL calls is not the absolute simplest thing that can work…  however it is the safest thing that can work.  Furthermore, it really isn’t all that complicated.  Using regex to solve complicated parsing problems can result in elegant solutions.  Unfortunately, it is often not the simplest thing that can work.  As my friend likes to say… “You solved it that way?  Great.  Now you have 2 problems.”

In software development, like Shotokan Karate, the prettiest solution may not be the right one.  A simple solution may not get the job done either.  Solving problems completely, and in an understandable way leaves you in a good spot…  just in case you (or the next person to come along) need a second or third attempt to end the battle.

Chef Tell, Code Smells and Entropy

Chef Tell

[Photo: Chef Tell]

Back in the day, my family used to watch Chef Tell on TV.  Friedman Paul Erhardt was Tell’s real name.  He was a German chef who had a syndicated series of cooking shows.  I remember watching him on a short segment of the local evening news in Denver.  Tell had a wonderful quote that he used often, and it always had my family, and particularly my mom, in stitches:  “It smells so good, I wish they had Smell-o-vision!”  I was sorry to learn, as I was looking him up, that he died in 2007.  He was a character, and always appeared to have so much joy in what he was doing on TV.

Smelly Code

I am going to write about a different kind of smell today.  Martin Fowler credits Kent Beck for coining the term “code smell”.  Fowler used the term in his book, Refactoring.

I had one code smell in particular after my last post that has been bothering me a bit.  Below is a small piece of the DDL (Data Definition Language) from the EKG Field Monitor app.  This code is responsible for creating the table that holds the EKG monitor information:


CREATE TABLE monitor
(
    id INTEGER default nextval('monitor_id_seq'::regclass) NOT NULL UNIQUE,
    uuid VARCHAR(40) NOT NULL
);

Code smells are often the first indication of deeper problems.  The smell that I noticed was related to having two “IDs” in this table essentially doing the same thing.  After thinking about it a bit, I started to realize several other issues.  Isn’t there a bit more to a monitor than just the “ID”?  Also, why am I using a VARCHAR(40) to store a UUID?  Isn’t there a built-in type for that?

There is no shortage of posts out there about using UUIDs or GUIDs over autogenerated keys.  Here is an example that dives into some of the performance considerations and myths of using UUIDs.

After doing some research and refactoring, my code to create the monitor table now looks like this:

CREATE TABLE monitor
(
    id UUID NOT NULL UNIQUE,
    description VARCHAR(250)
);

That’s a lot nicer to look at.  I continued refactoring and removing the auto-generated IDs from the tables and correcting the database creation script.  The script is now a lot shorter and cleaner…

drop table if exists public."sample" cascade;
drop table if exists public."sampleset" cascade;
drop table if exists public."strip" cascade;
drop table if exists public."monitor" cascade;
drop sequence if exists public.monitor_id_seq cascade;
drop sequence if exists public.strip_id_seq cascade;
drop sequence if exists public.sampleset_id_seq cascade;
drop sequence if exists public.sample_id_seq cascade;

CREATE TABLE monitor
(
    id UUID NOT NULL UNIQUE,
    description VARCHAR(250)
);

CREATE TABLE strip
(
    id UUID NOT NULL UNIQUE,
    uploaded TIMESTAMP DEFAULT now(),
    monitorid UUID NOT NULL,
    CONSTRAINT strip_monitor_id_fk FOREIGN KEY (monitorid) REFERENCES monitor (id)
);

CREATE TABLE sampleset
(
    id UUID NOT NULL UNIQUE,
    stripid UUID NOT NULL,
    CONSTRAINT strip_set_id_fk FOREIGN KEY (stripid) REFERENCES strip (id)
);
CREATE UNIQUE INDEX sampleset_id_uindex ON sampleset (id);

CREATE TABLE sample
(
    val INTEGER NOT NULL,
    id UUID NOT NULL,
    time INTEGER NOT NULL,
    samplesetid UUID NOT NULL,
    CONSTRAINT sample_sampleset_id_fk FOREIGN KEY (samplesetid) REFERENCES sampleset (id)
);

Micro and Macro Refactoring

When I ran my tests again, all kinds of things broke.  That is not surprising, and it is a GOOD thing.  I had written my tests before to help me better structure my code.  Now my tests help me with regression.  As my app continues to evolve, I will continue to need to go back and rework things that I have done in the past.  In a small way, I am doing lots of little cycles of writing tests and fixing code, in much the same way that, on a macro scale, I am developing features and evolving the app.  It is all very fractal.

[Image: a zoomed-in detail of the Mandelbrot set]

Entropy and Overcoming Disorder

The second law of thermodynamics basically states that closed systems tend toward maximum disorder.  Welcome to entropy.

Coding and software development are a constant battle against this law.  Just as planes and birds can fly in defiance of gravity, our computer code does not have to become a jumbled mess.  However, just as birds and planes have to expend energy to overcome gravity, we as developers have to spend energy to overcome the tendency of our code and our systems toward disorder.

Writing and running tests is a tremendous tool that helps us be as efficient as we can in keeping things organized.  It is probably worth noting that clean, organized code is not just something to do because it looks nice.  Clean code is easier to understand than messy code.  Code that is easier to understand is less likely to break.  And, when the code does break, clean code is much easier and faster to fix.

[Image: Chaos by Jean-claude Berens]