Friday, 1 August 2008

The Product based build

Anyone who has worked on a project of significant size will have grappled with a build that continuously spins out of control. Targets become confused, change seemingly at random, or are repeated multiple times with only slight variations: "run", "run-without-tests", "run-with-tests", "redeploy-run" and so on. It doesn't take long for the build to become unwieldy.

People who use the xAnt family, or other tools with XML based languages such as MSBuild, will have felt the pain more than those using dynamic language libraries such as Rake. XML quickly becomes unwieldy, is difficult to refactor and is expensive to extend (basically, XML sucks as a programming language). Yet, XML or not, most build frameworks share a common flaw: they are task based. The build is defined as a set of steps and the order in which to execute them, and fairly soon you find you want subtle variations in those steps, which results in more tasks.

But builds aren't linear; we don't always want to run the same dependent steps to achieve the same end result, which is why we end up with masses of subtly different targets. One approach to reducing this is to remove dependencies from targets and have high level targets which simply list a set of steps. This can lead to flexible systems, as targets can be stitched together arbitrarily from the command line as needed, but it requires an intimate knowledge of how the build needs to be executed (which breaks encapsulation) and moves even further down the road of defining a list of steps.

So how should a build work? In this post I hope to outline what I believe to be the core metaphors and functions of an effective build system.

Product based
If we go back to basics and look at the purpose of a build, we find it is to produce specific products: a deployable artifact, verification (by running tests), a deployed application, installers and so on. Standard build tools instead promote thinking in terms of linear dependent tasks. For example, creating an installer would be defined as "clean, compile, test, create-installer": the build is described in terms of steps, most of which say nothing of the products created.

A product based build is different: the build is defined in terms of products which are built by producers (or factories to use the GoF pattern). Products represent the artifacts of the build (binaries, installers, deployed application) and producers contain the logic on how to assemble the products.
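As a minimal sketch of that split, assuming nothing about any real build tool, a product can be modelled as a named artifact that carries a reference to its producer. The `Product` class, its fields and the `compile_sources` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Product:
    """A named build artifact plus the producer that knows how to assemble it."""
    name: str
    description: str
    producer: Callable[[], str]  # hypothetical: returns a label standing in for the artifact


def compile_sources() -> str:
    # Stand-in for real compilation logic; a real producer would invoke a compiler.
    return "compiled binaries"


binaries = Product(
    name="binaries",
    description="Compiled application code",
    producer=compile_sources,
)

# The product says *what* exists; the producer holds the *how*.
artifact = binaries.producer()
```

The caller only ever names the product; the logic for assembling it stays inside the producer.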

Dependency based
Similar to task based builds, product based builds are dependency based. Where product based builds differ is that they view the build process as a factory line, where each product is produced from the products it depends on. For example, the "installer" product has a producer which packages binaries from the "verified-binaries" product, which has a producer which runs tests against the "binaries" product, which has a producer that compiles the code into the "fresh-destination" product, which has a producer which creates new, clean directories.
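The factory line in that example could be sketched roughly as follows. This is an illustration under assumed names, not a real tool: `LINE`, `produce` and the string "artifacts" are all hypothetical, and each producer simply wraps the artifact of its input.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical product line: product name -> (input product or None, producer).
# Each producer receives the artifact of its input and returns a new artifact.
LINE: Dict[str, Tuple[Optional[str], Callable[[Optional[str]], str]]] = {
    "fresh-destination": (None, lambda _: "clean dirs"),
    "binaries": ("fresh-destination", lambda dest: f"compiled into [{dest}]"),
    "verified-binaries": ("binaries", lambda b: f"tested [{b}]"),
    "installer": ("verified-binaries", lambda v: f"packaged [{v}]"),
}


def produce(name: str, log: List[str]) -> str:
    """Produce `name`, first producing the product it is made from."""
    source, producer = LINE[name]
    upstream = produce(source, log) if source is not None else None
    log.append(name)  # record the order in which products come off the line
    return producer(upstream)


log: List[str] = []
installer = produce("installer", log)
```

Asking for "installer" walks back along the line to "fresh-destination" and then produces each product in turn, without the caller listing any steps.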

Describing dependencies in terms of products rather than tasks creates a more descriptive build, where each product can express its dependencies in a specific way. For example, an "acceptance-verified-binaries" product produced from "unit-verified-binaries" expresses a different set of rules from one produced from "binaries".

Encapsulated
When we run the build we shouldn't have to understand the steps required to reach the end result, but task based builds by their nature often require an intimate knowledge of their internal workings. For example, "run" may depend on "clean, compile, test" while "run-again" may have no dependencies and instead rely on the output of a previous build. Focusing on products breaks this thinking. Consider the following: in a task based build you can retrieve a set of targets, with their descriptions, from the command line. To begin to use the build you must understand how the targets relate and what steps are executed for each target, and from the descriptions (or the task names) work out what the end result may be. A product based build would give you a list of products but not the steps required to produce them. This allows the user to quickly understand the goal of the build and how to obtain the desired product without requiring any knowledge of how it is produced.
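To make the contrast concrete, here is a small sketch of what such a product listing might look like. The `CATALOGUE` contents and the `list_products` function are assumptions made up for this example; the point is only that the listing names products and their purpose, and exposes no steps.

```python
from typing import Dict, List

# Hypothetical catalogue: product name -> what the product *is*, not how it is made.
CATALOGUE: Dict[str, str] = {
    "binaries": "Compiled application code",
    "verification": "Evidence that the tests pass",
    "installer": "Packaged, installable application",
}


def list_products() -> List[str]:
    """Return one aligned line per product, sorted by name."""
    width = max(len(name) for name in CATALOGUE)
    return [
        f"{name.ljust(width)}  {description}"
        for name, description in sorted(CATALOGUE.items())
    ]


for line in list_products():
    print(line)
```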

Non-repetitive
Task based build systems are prone to repeating already satisfied dependencies. For example, if you ran "build compile, create-installer" and the "create-installer" target had a dependency on "compile", most build systems would re-execute the "compile" target, wasting your time. To work around this you must create more tasks, which again requires an intimate knowledge of the build. By contrast, when running "produce binaries installer" the product build would be satisfied that it had already produced binaries by the time it requires them as the input of the installer.
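One way to get this behaviour, sketched here under assumed names, is for the build to remember what it has already produced during a run. The `Build` class, the `DEPENDS_ON` table and the string artifacts are hypothetical; the technique is plain memoisation.

```python
from typing import Dict, List

# Hypothetical dependency table: product -> products it is produced from.
DEPENDS_ON: Dict[str, List[str]] = {
    "binaries": [],
    "installer": ["binaries"],
}


class Build:
    def __init__(self) -> None:
        self.produced: Dict[str, str] = {}  # artifacts made during this run
        self.runs: List[str] = []           # producers that actually executed

    def produce(self, *names: str) -> None:
        for name in names:
            self._ensure(name)

    def _ensure(self, name: str) -> str:
        # Reuse the artifact if this run has already produced it.
        if name in self.produced:
            return self.produced[name]
        inputs = [self._ensure(dep) for dep in DEPENDS_ON[name]]
        self.runs.append(name)
        self.produced[name] = f"{name}({', '.join(inputs)})"
        return self.produced[name]


build = Build()
build.produce("binaries", "installer")
```

Here the "binaries" producer runs once, even though "binaries" is both requested directly and needed by "installer".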

Order independent
As task based builds execute as a series of steps, they are order dependent. Running "build compile test" has different results from "build test compile" (which would either fail or give invalid results, depending on whether a previous build had compiled). As a product based build focuses on products and not steps, it is order independent, so either of the following would give the same result:

  > produce binaries verification
  > produce verification binaries

This presumes that the "verification" product relies on the "binaries" product. In the first case the build system would reuse the binaries already produced for the "binaries" request, and in the second case it would be satisfied that the binaries had already been produced during the production of the "verification" product.
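A rough way to picture order independence is to treat the request as a set of products and let the dependency graph decide the production order. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9 or later); the `plan` function and the `DEPENDS_ON` table are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependency table: product -> products it is produced from.
DEPENDS_ON: Dict[str, Set[str]] = {
    "binaries": set(),
    "verification": {"binaries"},
}


def plan(*requested: str) -> List[str]:
    """Return the production order for a request, ignoring the order of its arguments."""
    needed: Set[str] = set()
    stack = list(requested)
    while stack:  # collect the requested products and everything they depend on
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(DEPENDS_ON[name])
    graph = {name: DEPENDS_ON[name] for name in needed}
    return list(TopologicalSorter(graph).static_order())


first = plan("binaries", "verification")
second = plan("verification", "binaries")
```

Both requests yield the same plan, because the order comes from the dependencies rather than from the command line.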

Polymorphic
By focusing on products rather than tasks, a product based build system can become polymorphic. This allows for a very flexible build, removing the need for the subtle variations on targets which plague task based builds. For example, you could define a product of "running-application" which is produced from the "binaries" product. The "binaries" product could be inherited by two other products, "verified-binaries" and "unverified-binaries". By giving the "running-application" product an input of the base "binaries" product, it can be produced from either "verified-binaries" or "unverified-binaries". This allows the user to easily control the build:

  > produce verified-binaries running-application
or
  > produce unverified-binaries running-application
or (as we are order independent):
  > produce running-application verified-binaries
or (which would use the default binaries product)
  > produce running-application

We can also extend "verified-binaries" to "unit-verified-binaries" or "acceptance-verified-binaries" and so on, creating support for a build pipeline, or create a "reused-binaries" product to avoid recompiling. By allowing a build to become polymorphic, it becomes highly flexible and its intent becomes clearer.
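One possible reading of this, sketched with ordinary class inheritance, is that "running-application" declares an input of the base type and the build substitutes whichever subtype was requested. All class names and the `produce_running_application` function are assumptions for illustration only.

```python
from typing import Dict


class Binaries:
    """Base product: compiled code, with no claim about verification."""
    label = "binaries"


class VerifiedBinaries(Binaries):
    label = "verified-binaries"


class UnverifiedBinaries(Binaries):
    label = "unverified-binaries"


class RunningApplication:
    label = "running-application"
    input_type = Binaries  # accepts any product that is-a Binaries

    def __init__(self, binaries: Binaries) -> None:
        self.binaries = binaries


PRODUCTS: Dict[str, type] = {
    cls.label: cls
    for cls in (Binaries, VerifiedBinaries, UnverifiedBinaries, RunningApplication)
}


def produce_running_application(*requested: str) -> RunningApplication:
    """Produce running-application from whichever Binaries variant was requested."""
    chosen = Binaries  # default when no specific variant is requested
    for name in requested:
        cls = PRODUCTS[name]
        if issubclass(cls, RunningApplication.input_type):
            chosen = cls
    return RunningApplication(chosen())


app = produce_running_application("running-application", "verified-binaries")
```

Because the whole request is inspected before anything is produced, the position of "verified-binaries" in the request does not matter.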

Stateful
Though not absolutely critical, state is something most build systems lack. Every time a build is run, the results of the previous build are completely discarded and any valid products are re-created. State is a difficult concept to implement in task based builds, as tasks would have the responsibility of deciding whether the artifacts are still valid. This is further complicated by the fact that multiple tasks may depend on the same artifacts and would therefore each have to make the same decision again. In a product based build each product can decide whether it is valid, and the build system would simply reproduce it if it isn't. Furthermore, the build system can analyze the product line and understand that if a product earlier in the line (say, binaries) is invalid then all products produced from it need to be reproduced.
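The invalidation rule in that last sentence can be sketched as a walk over the product line. The `DEPENDS_ON` table, the "docs" product and the `stale_products` function are hypothetical; how a product decides its own validity (timestamps, checksums and so on) is deliberately left out and passed in as a set.

```python
from typing import Dict, List, Set

# Hypothetical product line: product -> products it is produced from.
DEPENDS_ON: Dict[str, List[str]] = {
    "binaries": [],
    "verified-binaries": ["binaries"],
    "installer": ["verified-binaries"],
    "docs": [],  # an unrelated product, added only to show it is left alone
}


def stale_products(self_invalid: Set[str]) -> Set[str]:
    """Return every product that must be reproduced.

    `self_invalid` holds the products that reported themselves invalid;
    anything produced from a stale product is stale as well.
    """
    stale: Set[str] = set()

    def is_stale(name: str) -> bool:
        if name in stale:
            return True
        if name in self_invalid or any(is_stale(dep) for dep in DEPENDS_ON[name]):
            stale.add(name)
            return True
        return False

    for name in DEPENDS_ON:
        is_stale(name)
    return stale


to_rebuild = stale_products({"binaries"})
```

Each product makes its validity decision once, and the build system works out the consequences for everything produced from it.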

To conclude, by changing the build metaphor from tasks to products we can solve many of the issues around build systems and create clearer, more flexible and more meaningful builds for our applications.

1 comment:

amisai said...

Very interesting post.

As I was reading through, I was thinking in maven pom.xml as the descriptor of a product, but I believe your idea goes further. I hope you keep thinking about this and we can see results.

About Me

West Malling, Kent, United Kingdom
I am a ThoughtWorker and general Memeologist living in the UK. I have worked in IT since 2000 on many projects from public facing websites in media and e-commerce to rich-client banking applications and corporate intranets. I am passionate and committed to making IT a better world.