One of the most effective ways to bring a software project to success is to aggressively pursue simplicity throughout the entire design. Simplicity, as Rich Hickey and the dictionary define it, means not intertwined. It’s a criterion that can be objectively assessed. The best designers have a multitude of different techniques at their disposal to achieve this goal at every layer of a system.
At the highest level, queues, transactions, commutativity, and immutable data structures help us architect beautiful systems composed of multiple programs. Further down the stack, pure functions, communicating sequential processes, and software transactional memory guide us to the promised land of decomplected. Perhaps even a little below this level lies the challenge of effectively designing for error handling, logging, monitoring, metrics, and conditions. What constructs does the masterful designer turn to for fending off this potential complexity?
I think the answer to this question is that many of us have simply given up. I’ve seen too many software projects littered with functions like the following:
It’s my contention that the small things often end up drowning us in complexity. This hypothetical function, while only 10 lines, is hopelessly complected in at least 7 ways. Aside from actually executing a query and returning its results, it also logs, validates preconditions, records metrics, logs error states, handles error conditions, and determines error values.
In my opinion, most developers choose to ignore the littering that ensues from this approach because aspects like logging and metrics usually don’t affect the output of the function. I’m not saying this is always the case, because excessive logging can fill up the local disk, and metric calls can raise socket errors. But the vast majority of the time, less skilled developers are operating in the closed mode and care only about the fact that the output is typically as expected.
We know better.
"But no matter what technology you use …, the complexity will eventually kill you. It will kill you in a way that will make every sprint accomplish less - most sprints be about completely redoing things you’ve already done. And the net effect is that you’re not moving forward in any significant way." - Rich Hickey
To these challenging aspects, I have devised a solution that dramatically reduces complexity. About a year and a half ago, I invented and open sourced Dire. Dire provides decomplected, ad-hoc error handling, conditions, and a few other helpful constructs. Dire was, in my opinion, a step in the right direction.
Perhaps the most interesting thing one can do with Dire is conditional loading of function modifiers. This technique pleasantly lets you compose loggers, exception handlers, conditions, and other common aspects. The recent API addition for removing these function modifiers means that runtime decisions can be made to add or drop aspects.
This technique was criticized by others. With the flexibility of runtime composition, we lost temporal control over who, and when, these aspects were being added or removed. I spent a couple of months in search of a remedy to this incidental complexity. I never came up with an answer.
Sometime later, I was happy to hear that Stuart Sierra open sourced a library called Component. Stuart’s library provides just enough structure to control the stateful parts of a program in a way that’s particularly easy to understand. Conceptual components can be composed, dependencies made explicit, and set up/tear down made convenient from the REPL. Stuart unknowingly finished the puzzle that tortured me.
The following Gist shows how we can capture each aspect as a Component record (it’s on GitHub too). This allows for aspects to be composed and returns the temporal control that we previously lost. I’ve used this technique of Dire in combination with Component many times over the last few months with great success. In particular, this makes a stellar way to slot Riemann metrics reporting into your program without mucking up your application logic.
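To give a rough feel for the idea without either library, here is a dependency-free sketch. It is not the Gist and it uses neither Dire's nor Component's API: the `Aspect` protocol is a hypothetical stand-in for a Component lifecycle, `alter-var-root` stands in for a Dire hook, and `fetch-user` is an invented application function. The point is only that installing an aspect on start and removing it on stop makes it explicit who adds the aspect and when.

```clojure
;; Hypothetical stand-in for a component lifecycle (not Component's API).
(defprotocol Aspect
  (start [this] "Installs the aspect and returns the started record.")
  (stop [this] "Removes the aspect and returns the stopped record."))

;; Invented application function with no logging inside it.
(defn fetch-user [id] {:id id})

;; An aspect that records the arguments of every call to `target` (a var).
(defrecord CallLogger [target log original]
  Aspect
  (start [this]
    (let [f @target]
      ;; Swap the var's root for a wrapper that logs, then delegates.
      (alter-var-root target
                      (fn [g] (fn [& args]
                                (swap! log conj args)
                                (apply g args))))
      ;; Remember the unwrapped function so stop can restore it.
      (assoc this :original f)))
  (stop [this]
    (when original
      (alter-var-root target (constantly original)))
    (assoc this :original nil)))
```

Starting the record turns logging on for `fetch-user`, and stopping it restores the original function, so the aspect's lifetime is tied to one explicit pair of calls.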
I hope this helps you build genuinely simpler programs.
My name is Michael Drogalis. I’m an independent software engineering consultant and contractor. Interested in working with me? Send a tweet over to @MichaelDrogalis or an email to mjd3089 at rit dot edu. I’d be delighted to hear from you.
There are many techniques for parallelizing the execution of programs. We’ll be looking at just one in this post - pipelining. I’ve constructed a little idiom you can follow when this design technique is appropriate.
Pipelining is a method that can be applied to tasks that meet two criteria:
1. The task can be broken into subtasks.
2. No single subtask can be executing more than once at the same time.
The Wikipedia article about pipelining as a concept is pretty good. Optionally you can just keep reading, and it’ll probably make more sense if it doesn’t now.
Here are three functions that we chain together. Very straightforward:
There’s no need to parallelize that. Let’s imagine that those functions perform side effects that take a while:
It takes 3.5 seconds to execute this chain of functions.
What if function ‘m’ could be executing in parallel while ‘n’ and ‘o’ are too? More work can be accomplished faster. The only thing we want to avoid is ‘m’ running more than once at a single point in time. The same constraint applies to ‘n’ and ‘o’.
And we see the output here, showing how function ‘m’ can do a lot more work, with function ‘o’ catching up at the end:
It works by making a channel for each function. We then spin up infinitely looping go-blocks that wait for something on the channel. The function is applied to the channel contents, and placed on the next channel. We return the head channel from the pipeline function so we have a reference to the channel to feed values into.
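The description above can be sketched roughly as follows. This is a minimal reconstruction under stated assumptions, not the original template: the names `stage` and `pipeline` are mine, the sketch returns both the head and tail channels so results can be read back, and it closes downstream channels when the head is closed instead of looping forever. It assumes `org.clojure/core.async` is on the classpath.

```clojure
(require '[clojure.core.async :as async])

(defn stage
  "Starts one go-loop that takes from `in`, applies `f`, and puts the
  result on `out`. Because each stage has exactly one loop, `f` never
  runs more than once at a time. Closes `out` when `in` is closed."
  [f in out]
  (async/go-loop []
    (if-some [v (async/<! in)]
      (do (async/>! out (f v))
          (recur))
      (async/close! out))))

(defn pipeline
  "Wires `fs` together with one channel per function plus a tail channel.
  Returns {:in head-channel :out tail-channel}."
  [& fs]
  (let [chans (vec (repeatedly (inc (count fs)) async/chan))]
    (doseq [[f in out] (map vector fs chans (rest chans))]
      (stage f in out))
    {:in (first chans) :out (peek chans)}))
```

A value put on `:in` flows through every function in order and comes out on `:out`, while earlier stages are already free to work on the next value. One caveat of this sketch: a stage function must not return `nil`, since `nil` cannot be put on a channel.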
Pull out this template as needed. It’s on GitHub. Tweet at @MichaelDrogalis.
Few things are more satisfying than composing two solid libraries together to bring applications to greater heights. I’m elated to announce that thanks to the work of Dylan Paris, Dire now integrates with Slingshot.
The 0.5.0 release readily allows applications to catch maps thrown by Slingshot throw+ calls. Gone are the days when one was restricted to only catching based off exception types. More interesting is the resolution of the following situation.
In a typical try/catch scenario, application code must decide, at the point of failure, which exception type to raise up the stack. This complects application logic with error handling strategies, and often conflates recovery logic too.
In previous versions of Dire, the best that one could do in Clojure was to throw an ExceptionInfo map up the stack, use a Dire hook to catch the exception type, and dispatch on a multimethod inside the handler. Sort of okay, but not great.
With 0.5.0, we can now dispatch based on predicates. This is tremendously powerful. It pulls error decision handling logic back in line with the Open/Closed principle. That is: application logic, the decision of how to react to failure, and the actual reaction to failure are decomplected. Applications can behave the Erlang way. Let it fail, they say! With enough context being raised up the stack, we can delay the decision about what the error actually is by using predicates, ultimately deciding much later in the process of call chain unwinding.
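To make the idea of predicate dispatch concrete, here is a small sketch in plain `clojure.core`. It is neither Dire's nor Slingshot's API: `handle-with` and the example `handlers` table are hypothetical names, and the sketch uses `ex-info` maps where Slingshot would use `throw+`. The failing code only raises context; which error it "is" gets decided later by the first predicate that matches.

```clojure
(defn handle-with
  "Calls the zero-argument function `f`. If it throws an ExceptionInfo,
  runs the handler paired with the first predicate that matches the
  exception's data map. Rethrows when no predicate matches."
  [handlers f]
  (try
    (f)
    (catch clojure.lang.ExceptionInfo e
      (let [data (ex-data e)]
        (if-let [[_ handler] (first (filter (fn [[pred _]] (pred data))
                                            handlers))]
          (handler data)
          (throw e))))))

;; Hypothetical handler table: pairs of [predicate handler].
(def handlers
  [[#(= :timeout (:type %))  (fn [_] :retry)]
   [#(> (:status % 0) 499)   (fn [data] [:server-error (:status data)])]])
```

Adding a new reaction to failure means adding a pair to the table, with no change to the code that throws.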
Onward we work, constructing ever more simple programs.
Software development is fundamentally about taking things apart.
I’ve been taught this concept along with a handful of other core principles by Rich Hickey and the Clojure community over the last few years. I wanted to put it all together in a medium for my own learning and as an example to others (especially non-Clojurians) about how to build genuinely simple systems.
It’s my pleasure to unveil the Rush Hour platform. Rush Hour exposes the facilities to create highly accurate vehicle traffic simulations. It ships with a rules language to describe chains of arbitrarily complex intersections with traffic lights and stop signs. Additionally, it visualizes the simulation using a dynamic heat map drawn on Google Maps.
It’s architected in a way that pays homage to the way I’ve been taught to build systems. I chose to build a traffic simulation because it’s a domain that’s highly familiar to most people, which minimizes the amount of learning you have to do to understand what it *is*, and maximizes the amount of learning you can do about its underlying principles. I’ll now describe Rush Hour’s architecture.
The big idea
Rush Hour is composed of three major components: the simulation (Sim), Asphalt, and a web service called the Triangulation service. It’s quite simple.
The Sim is a loop that transitions from values to values. This transition is computed in parallel using Clojure Reducers. A storage abstraction of a few small protocols sits in front of an in-memory data structure that acts as a datastore for the Sim to use. The schema for city streets, rules about traffic, and timing of traffic lights are housed in this data store. The Sim exposes the current state outside the loop over an agent. One observer of the agent watches and serves up changes over a websocket streaming API.
Asphalt is a client of the Sim streaming API. It receives values emitted by the Sim. Asphalt analyzes the values one at a time and uses the Triangulation web service to determine a set of coordinates that describe the location of all the cars on the road. Asphalt itself exposes a streaming API too. A ClojureScript program listens to Asphalt’s streaming API and draws a dynamic heat map of the cars on Google Maps.
And that’s it. Here’s a live demo of a few blocks in Philadelphia. Caution: the rendering of the heat map is somewhat intense. It’s not mobile friendly.
Let’s zoom in on why this is interesting.
The value of values
As usual, a system based around immutable values is the way to go.
The Sim is a process that’s about transforming one snapshot of the world into another. That’s all it does. It accomplishes this by applying a pure function to a snapshot to produce another snapshot. It’s a plain Jane infinite loop.
Since it’s all immutable, we can parallelize the computation with Clojure Reducers. That means the Sim makes good use of being hosted on beefier machines. All the logic to transition states is pure, and hence easy to reason about, easy to test, and so on. No concurrent semantics, locks, promises, deals with the devil, etc. Recreating state between components is a breeze since it’s just data - no objects, custom types, connections, or any of that yuckiness.
Just like Datomic and Simulant, we move time out of the equation.
There’s no notion of time in the code that computes the next state. Time passage is simulated at the tail end of the simulation loop with a single sleep. That value can be adjusted to control the rate at which the simulation runs as compared to ‘real time’. Each time the clock ‘ticks’, every entity in the system gets an opportunity to change state. One clock tick per second runs the simulation in ‘normal time’. Making the clock tick faster results in time moving faster within the Sim. Clocks with nonconstant properties can be used - I chose not to do this.
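A toy version of that loop might look like the sketch below. It is not Rush Hour's code: the snapshot shape (`:length`, `:cars`, `:position`, `:speed`) and the names `advance-car`, `transition`, and `run-sim!` are all invented for illustration, the transition is deliberately trivial, and the loop is bounded by a tick count so it terminates. What it keeps from the description is the structure: a pure function from snapshot to snapshot, an agent that exposes the current snapshot, and a single sleep at the tail of the loop that sets the clock rate.

```clojure
(defn advance-car
  "Pure: moves one car forward by its speed, capped at the street length."
  [street-length car]
  (update car :position #(min street-length (+ % (:speed car)))))

(defn transition
  "Pure: computes the next snapshot from the current one. No notion of time."
  [snapshot]
  (update snapshot :cars
          (fn [cars] (mapv #(advance-car (:length snapshot) %) cars))))

(defn run-sim!
  "Runs `ticks` clock ticks. After each transition, publishes the new
  snapshot through `state-agent`, then sleeps `tick-ms` milliseconds.
  Shrinking `tick-ms` makes simulated time run faster."
  [state-agent ticks tick-ms]
  (loop [snapshot @state-agent
         n ticks]
    (when (pos? n)
      (let [next-snapshot (transition snapshot)]
        (send state-agent (constantly next-snapshot))
        (Thread/sleep (long tick-ms))
        (recur next-snapshot (dec n))))))
```

Anything that wants to observe the simulation can `add-watch` the agent without the loop knowing about it.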
The Sim also pushes out another form of complexity. There is no notion of addressing or coordinates. We get much better reach by describing streets as lines of a certain length, and car position as a point on the line. This enables one to describe fictitious streets. No need to talk to Google Directions API in this component.
In order to be useful, we need a tiny amount of mutable state. The agent that holds the current snapshot takes care of this. There is a little more mutability though. I made the design choice that 3rd parties be able to “inject” traffic into the Sim at runtime. To accommodate this, each street has a j.u.c. blocking queue associated with it. Anything sitting on the queue gets pulled into the street just before the end of the transition function. I consider this very controlled mutability, though. It’s uniformly operated on and has tightly isolated scope.
Data all the way down
"Data - please! We’re programmers! We’re supposed to write data processing programs. There’s all these programs, and they don’t have any data in them. They have all these constructs you put around it, globbed ontop of data. Data is actually really simple. There’s not a tremendous number of variations in the essential nature of data." - Rich Hickey
Data is king. Rush Hour has a rather small code base for how much it accomplishes. This can mostly be credited to the aggressive use of data.
Rules are data
Descriptions of laws of traffic for each intersection are maps.
They can be created in any language, by a human or program.
Rules use unification for a declarative style of conveying the laws of traffic. This is extremely powerful, as it obviates conditionals that would otherwise run rampant throughout the program.
Schema is data
The lanes themselves and how they connect are also maps. This makes them amenable to static analysis by a tool in any language. Additionally, integrating with Rhizome to create graphical pictures of roads is a cinch.
Duration is data
Time is pushed out of the equation by transitioning purely from values to values. But - traffic lights don’t update uniformly across the city at each clock tick. This fact can be conveyed as data to keep its meaning evident.
Navigation is data
Descriptions of individual intersections are isolated to facts about themselves. To connect the streets of one intersection to another, it’s all data all the day.
Directions are data
We want to give a realistic depiction of how people drive around the city - not choosing streets at random. This gets represented as a weighted map to bias choices about where to go next.
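As an illustration of how a weighted map can bias a choice, here is a small sketch. The function name `weighted-choice`, the option keywords, and the weights are all hypothetical, and Rush Hour's actual representation may differ. The random roll is passed in as an argument so the function itself stays pure and easy to test.

```clojure
(defn weighted-choice
  "Picks an option from `weights`, a sequence of [option weight] pairs
  (a map works too), given `roll`, a number in [0, total weight).
  Options with larger weights cover more of that range."
  [weights roll]
  (loop [[[option w] & more] (seq weights)
         remaining roll]
    (if (or (< remaining w) (empty? more))
      option
      (recur more (- remaining w)))))

;; Hypothetical weights: most drivers continue straight through.
(def directions [[:left 1] [:straight 7] [:right 2]])

(defn next-direction
  "Impure wrapper: rolls a random number and delegates to weighted-choice."
  [weights]
  (weighted-choice weights (rand (reduce + (map second weights)))))
```

With these weights, seven out of every ten rolls on average land on `:straight`.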
From another angle, since it’s all data, we get terrific reuse out of all the constructs mentioned above.
Another more subtle point is the style of testing that this sort of thing allows. While I didn’t employ it, the door is wide open to generative testing because it’s all well-specified data.
Taking things apart
"I think one of the most interesting things about design is that people think it’s about generating this intricate plan - but designing is not that. Designing is fundamentally about taking things apart. It’s about taking things apart in such a way that they can be put back together." - RH
This project was an exercise in taking things apart. I sincerely believe that this is the most important concept to understand as a designer.
Just at the edge of the Sim’s boundary is the mutable agent that contains the latest snapshot. It’s here that we can add a component that does one thing well. The streaming API watches for changes to the agent and pushes values to consumers. Its involvement with the Sim ends there.
Similarly, other components can watch the agent without getting in the way. Implementing a HornetQ queueing communication protocol is an open operation. The same goes for other communication protocols - they all get grouped together in one spot. Communication protocols are another thing you can take apart.
There are more things that we make simples out of. From the domain, we take apart streets, lanes, lights, light sequences, traffic rules, directions, and intersection connections. From the solution side, we take apart storage choices with protocols, clock implementations, navigation algorithms, communication mechanisms, and visual maps. All decomplected.
To the extent that just about everything is immutable, we can cache relentlessly. Rush Hour uses ElastiCache and Clojure memoization (in-memory caching). This dramatically decreases network traffic, to the point where when all caches are hot, very little data crosses the wire at all.
"The other great thing about conveyor belts and queues is that.. What do they do? What’s their job? They move stuff. What’s their other job? There’s no other job. That’s all they do." - RH
Rich has been saying this one for years. Put queues between the major components of your architecture. The architectural agility gained from having independence in the identity and availability of communicating parties is huge. Rush Hour’s simulation walls off consumers by exposing a single agent, and opening up its streaming and (future) HornetQ API off of that. This lets developers make use of the Sim from outside Clojure. Language boundaries are transcended.
Queues clear the way for an open, pluggable system. Adding durability of snapshots via Datomic and monitoring via CloudWatch or Riemann (or both!) are tasks that require no modification to existing code. The power of queues lets us add more consumers (at runtime!) to react to change. It’s really great - Rush Hour has tons of connection points to build on, giving it a large surface area. If you want an open system, you do it like this.
Human vs. machine interfaces
"But of course, we also start to see the levels, right? If you look at the back of one of these modules, there’s another piece of design there. And these … are analogue circuits that determine what the module does. The other thing that’s interesting … is that each of these knobs has a corresponding jack. In other words - there’s a human interface, and a machine interface. … And the machine interfaces were there all the time. In fact, they were first. And then the human interfaces come. … You can always build a human interface ontop of a machine interface, but the other is often disgusting." - RH
The machine interfaces for Rush Hour are in place. Those multithousand line EDN files serve as a great machine interface, and not a bad human interface. I wrote them by hand. I didn’t particularly enjoy it, but it was easy to reason about. The pieces are in place for a human interface to be built on top of the machine interface.
"If you have designs and they specify things well, and you have some automated way to go from that specification to a test - that’s good testing. Everything else is backwards." - RH
"Test systems - not functions." - Timothy Baldridge
The Sim uses scenario based testing, and does very little unit level work. It’s the same idea that core.async uses for testing. It has hundreds of lines of code that don’t have unit tests, and instead uses overarching tests to verify behavior. We don’t care to test implementation details. It’s a brittle thing to maintain tests for. If you can write overarching tests that make it easy to figure out what’s broken when tests fail, you are golden.
Coltrane couldn’t build a website in a day
"There are people who can make music by waving their hands through the air." - RH
I think one of the more interesting anecdotes about this project is that in its 5 month development effort, the entire first month was spent on the hammock. I spent a lot of time in June lying in Rittenhouse Park working out the complexity of the problem space. Coding didn’t proceed until I was finished taking things apart, and ready with a solution to put the pieces back together.
I was sort of forced into this period of hammock time. I moved to Philadelphia at the beginning of June and didn’t have internet for a few weeks. With my evenings completely free of distractions after work, I had time to disassemble the problem space. The hammock period was by far the most difficult phase, and bordered on excruciating at some points. It’s hugely frustrating to remain patient and work past the feeling of producing no code. That is, of course, merely the voice of insecurity tempting you to act early. Take the time to hammock. It sets the stage for an incredible show.
For a few months, I was a man possessed building this system. It’s not perfect (or perhaps even useful!), but I hope it serves as an example for you to learn from and teach others.
A huge thank you to a few people that helped me create this. Timothy Baldridge gave me plenty of guidance in building the Sim. James Drogalis did the math for me to compute coordinates based off of simulation data.
Tweet at @MichaelDrogalis or email at mjd3089 at rit dot edu, or mdrogalis on #clojure IRC. There’s not much in the way of documentation, because I honestly don’t expect anyone to use this project so much as use it as a guide for learning. If you want to build on it, I’d be happy to hook you up with some docs.
When I’m debugging, I really dislike having to type println’s or prn’s of incoming arguments to functions and return values to see what data is flowing through my program. So I wrote a very lightweight library that has a function to be employed at the REPL to do it for me. I call it Night Vision:
See the GitHub page for getting set up. It’s a 20-line library, so forks to make it better are easy and welcome.
Consider the following behavior (function ‘make-key’ not shown):
Can you explain this behavior?
Debugged in about 3 minutes. First 2 minutes were making sure my eyes weren’t playing tricks on me.
I authored Dire about 8 months ago. It’s been pretty successful since then.
But, I’ve had to do some logging lately that even Dire wasn’t very good at. Consider this code snippet:
Well, when I sent it in for code review to my teammate, he responded “We’re going to want some logging here. Echo what we’re executing, and echo the result.” Sure, those things are important to know:
Yuck! Am I stuck? It appears so. I can’t write a hook around the beginning of the function, nor the end. Logging right in the center, where there should only be application logic goodness.
And then, it hit me. Even I, the author of this library, didn’t see it until now. You can hook other people’s functions! Add logging, preconditions, and exception handling to third party code without touching it:
Beware that those ‘!’ hooks alter the var root of the function. Use the supervisory functions to not alter the original function.
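To show what "altering the var root" of someone else's function amounts to, here is a sketch using only `clojure.core`. It is not Dire's implementation or API: `third-party-query` is a stand-in for a library function we cannot edit, and `wrap-logging` and `log-lines` are hypothetical names. The calling code stays free of logging while the function it calls gains it.

```clojure
(defn third-party-query
  "Stand-in for a library function whose source we do not control."
  [sql]
  (str "rows for: " sql))

(defn wrap-logging
  "Returns a function that reports its arguments and its result to `log`
  around a call to `f`, and returns the result unchanged."
  [f log]
  (fn [& args]
    (log [:calling args])
    (let [result (apply f args)]
      (log [:returned result])
      result)))

;; Collect log entries in an atom; a real system would call a logger.
(def log-lines (atom []))

;; Replace the var's root value with the wrapped function. Every caller
;; that goes through the var now gets logging, which is exactly why this
;; kind of global change deserves the warning above.
(alter-var-root #'third-party-query wrap-logging #(swap! log-lines conj %))
```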
Tweet me with your thoughts at @MichaelDrogalis.
When solving a concurrent problem in Clojure, it’s often useful to add a watch to a piece of mutable state. When the state changes, we react - typically with side effects.
More than a handful of times, I’ve wanted to watch for the first change only. After I see the first change, I don’t want to watch the piece of state anymore. An elementary-school example:
Notice how the remove-watch is the first thing to get invoked in the callback. It should do what I described above - I only want to see one change. Hmmmm.. Got’cha! Clojure dispatches watches, potentially in parallel, post-transaction.
If you fire off many changes to the teacher ref on different threads, throw-spit-ball-at will get invoked more than once. This can be a disaster for highly concurrent programs (e.g. no sleeping in between actions, maximum thread interleaving).
How do we fix this? Wooooooosh core.async to the rescue!
We use a channel to communicate between the reactive watcher and the side-effects we want to create.
I put “true” on the channel - you could put any value. In the go block below, we consume an element off the queue. Side effects are performed once, and I’m done watching the reference.
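A sketch of the approach, under some assumptions of mine: the helper name `watch-once!` is hypothetical, it assumes `org.clojure/core.async` is on the classpath, and it uses a dropping buffer of size 1 so that extra notifications from racing watch callbacks are discarded instead of piling up. The watch only signals; the single go block is the only place the side effect can run.

```clojure
(require '[clojure.core.async :as async])

(defn watch-once!
  "Watches `reference` under `key` and calls the zero-argument function
  `f` exactly once, after the first observed change. Returns the go
  block's channel, which yields the result of `f`."
  [reference key f]
  (let [signal (async/chan (async/dropping-buffer 1))]
    ;; The watch may fire many times; all it does is drop `true` on the
    ;; channel. Puts beyond the first buffered one are discarded.
    (add-watch reference key
               (fn [_ _ _ _] (async/put! signal true)))
    ;; Exactly one take happens, so `f` runs exactly once.
    (async/go
      (async/<! signal)
      (remove-watch reference key)
      (f))))
```

Because the go block takes from the channel a single time, it does not matter how many watch callbacks ran before `remove-watch` took effect.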
Boom. Problem solved.
In the middle of May 2013, I gave a talk for the Society of Software Engineers about Clojure, time, and perception. This is my attempt to condense Rich’s compelling arguments of reasoning about software to a strongly object oriented crowd. I hope it can serve the same purpose and persuade others to Do the Right Thing.
I made a small number of speaking errors during this talk. Please feel free to reply with corrections as I don’t enjoy listening to the sound of my own voice to find them.
These ideas are not necessarily 100% reflective of Rich’s, and I may have accidentally or purposefully deviated in some areas. Consult the original conference talks that I mention where ambiguity arises.
Also, apologies for the absence of video.
I’ve been doing some radically different design after creating Dire a couple of months ago. My stance on local exception handling departs from what almost anyone else is doing right now - and there have been some interesting repercussions because of it.
The style of design that I’ve been pushing is one of airtight separation of concerns. Clojure’s runtime capabilities have allowed me to transparently alter control flow for exception handlers, assertions, and hooks. The beauty is that all the machinery to perform the aforementioned logic can be grouped into their own files - and loaded only when you want them. It’s like dependency injection at a far more flexible level.
After a few weeks of applying these concepts to a larger code base, I’m noticing an interesting phenomenon at the code level. Rich Hickey noted that design is about taking the problem apart as much as possible. My conjecture is that when you’re aggressive about dismantling the problem space, the code in each module will display similar visual characteristics.
Here are excerpts of some files in the (closed source) project I’m working on. I unimaginatively changed the functions to be about cows and chickens - but you get the point. We’re concerned about structural appearance - not what the code actually does.
This first one is a file whose only concern in the world is logging. Notice how all the forms pretty much look the same.
Here’s a bit of a file whose only concern is reacting to thrown exceptions. They all look pretty similar!
Yet another instance - a file whose only job is to strip information off incoming API requests and pass them along for further destructuring:
Last example. All of the Datomic queries get their own file:
All of the functions in each file are participating at the same level of abstraction, and hence look roughly the same. There are no surprise conditionals or loops in any of them. They all flow together, and can be reasoned about with ease. This is a serious reduction in cognitive complexity.
If I could make an analogy, it’s a bit like a pipe organ. All of the pipes have roughly the same shape, but some of them are of different length and width. The groups of pipes correspond to the Clojure files. Each pipe represents a function. Very similar to those around it, but just a little different to make it useful.
Keep in mind, these were excerpts from a few files in a much larger project. I’m having a blast with this code base, and I think what I’ve pointed out is a green light that the design is solid.
To summarize, my conjecture is this: Thoughtful abstractions in design will lead to code that is visually similar in each module. If you notice this in your code base, you’re probably doing a great job with the design.
Hit me up at @MichaelDrogalis with your thoughts. (It’s a conjecture - I could be totally wrong!)
Also, give Dire a try for logging in your system. I’m a little biased since I’m the author!