Monday 1 December 2008

Examples as tests

We often talk about tests being documentation, but what about using tests as usage examples for APIs? It's something I rarely (if ever) see, and yet tests are far more succinct and efficient at communicating behaviour.

To demonstrate this I am going to take a very simple example from MSDN: the C# ++ operator. Here is an extract of MS' example:

The increment operator (++) increments its operand by 1. The increment operator can appear before or after its operand:
Remarks
The first form is a prefix increment operation. The result of the operation is the value of the operand after it has been incremented.
The second form is a postfix increment operation. The result of the operation is the value of the operand before it has been incremented.
Numeric and enumeration types have predefined increment operators. User-defined types can overload the ++ operator. Operations on integral types are generally allowed on enumeration.

Example

// cs_operator_increment.cs
using System;
class MainClass
{
    static void Main()
    {
        double x;
        x = 1.5;
        Console.WriteLine(++x);
        x = 1.5;
        Console.WriteLine(x++);
        Console.WriteLine(x);
    }
}
Output
2.5
1.5
2.5

Although we are accustomed to these types of examples, have a think about how hard you have to work to understand it. The example on its own gives no indication of what behaviour it is demonstrating (which is explained in complex English in the preceding text) and the call and the result are nowhere near each other - how often do my eyes have to refocus (look at the first Console.WriteLine, then the first line of the output, then the explanation, match them up, look at the second Console.Wr... phew)? And this is just a simple example with a few behaviours.
Here is the same thing using tests (I've tried to closely match MS' own text for test names):

using NUnit.Framework;

[TestFixture]
public class IncrementOperatorBehaviour
{
    [Test]
    public void IncrementAsPostfixIncrementsValueByOne()
    {
        double x = 1.5;
        x++;
        Assert.AreEqual(2.5, x);
    }

    [Test]
    public void IncrementAsPrefixIncrementsValueByOne()
    {
        double x = 1.5;
        ++x;
        Assert.AreEqual(2.5, x);
    }

    [Test]
    public void IncrementAsPrefixGivesResultAfterIncrement()
    {
        double x = 1.5;
        Assert.AreEqual(2.5, ++x);
    }

    [Test]
    public void IncrementAsPostfixGivesResultBeforeIncrement()
    {
        double x = 1.5;
        Assert.AreEqual(1.5, x++);
    }
}

How much clearer is that? You even have the satisfaction of seeing it all go green when you run them!

Sunday 23 November 2008

Agile welcomes friendly aliens

It is a common complaint amongst agilists that teams label themselves as Agile simply because they do all the practices, such as TDD etc., yet they are not what they would call agile. They argue that the practices alone aren't enough, that there is something else, something unquantifiable, which agile is all about: the so called 'spirit' of agile, the 'people factor', or to quote, "[it] may look the same as an agile methodology, but it won't feel the same". What they are trying to describe is a lack of alienation.

Anyone who did philosophy or sociology - or was simply a sixth form socialist revolutionary - would probably have looked at Marx's theory of alienation. Whilst most people tend to think of Marx's economic theories, the problem of alienation was of primary importance to Marx; to quote Ernesto "Che" Guevara:

"One of the fundamental objectives of Marxism is to remove interest, the factor of individual interest, and gain, from people’s psychological motivations"

Marx believed that all humans are naturally motivated by the creation of value and self development, thriving on challenges, regardless of class or situation. For example, take the stereotype of a football-obsessed couch potato in a dead end job. Despite lacking motivation in his work he seeks pleasure from a study of football which is shared and developed with like-minded friends. This study is so intense and focused it would put the most dedicated of academics to shame. How come, under his own steam, he can achieve so much yet be so despondent in his employment?

Marx argues the reason is that capitalism's methods of maximizing productivity actually create a system which alienates the worker and therefore fails in its aims. Inversely an environment without alienation is productive and produces value for all.

Marx described four types of alienation in labour:
1) From our product: we are separated from what we produce and its value.
2) From our productive activity: the product is outside of our control so work simply becomes a meaningless activity.
3) From other human beings (or other workers): we are isolated from each other, made to work independently, despite the fact that productivity is social in nature.
4) From our species being (or species essence): we are forced to specialize, which removes the freedom to develop our capabilities, therefore we lose stimulation and fail to be challenged.

Agile and Lean (though I am not sure if intentionally) recognize the problems of alienation and have strategies for removing them:

From our product
The importance of our role in providing value and creating a successful product is fundamental to Agile and Lean. Many practices focus directly on reducing the gap between the team and the end product, such as keeping the customer close, delivering regularly and maintaining constant feedback. AgileLean is also focused on our role in creating value rather than simply meeting a set of personal objectives.

From our productive activity
AgileLean places us in control of the process and the system. Lean emphasizes the importance we all have in the productive activity and empowers us to improve the process and eliminate waste. In Agile we are responsible for and involved in all productive activities, from planning and estimation, to picking up stories, to delivery and improving the agile process itself.

From other human beings
Agilists have always been keen on the social nature of development, believing it to be imperative to the success of the project. Communication, knowledge sharing and working together are actively encouraged with techniques such as pairing, stand-ups etc. AgileLean environments are made up of people working closely together as a team rather than individuals in a group.

From our species being:
AgileLean is often critical of specialization. Instead AgileLean prefers cross-functional teams made up of multi-skilled (and highly skilled) generalists. This creates an environment where people are challenged and are free to develop, on a day-to-day basis, in the directions they choose, not reduced to the criteria their performance management chart dictates.

The S and M labels often make people feel nervous, in some cases to the point of rejecting ideas completely because of the connotations. Socialism isn't short of its critics but agile and lean prove how we can address some of the most important concerns Marx had with the capitalist system in a way that makes business stronger (just look at Toyota's huge success in a very aggressive capitalist market compared to its more communist-fearing cousins in the States). The success of agile and lean proves how much more productive we can be when we tackle the problems of alienation. This is why many of us find it so frustrating to see teams labelled agile who still operate in an alienated way.

Agile and lean teach us that teams who recognize alienation and address it are ultimately more successful. By explicitly seeking to address alienation we can create healthier working environments which offer greater value at a higher rate of productivity. Now what right-minded capitalist wouldn't want that?

Wednesday 19 November 2008

That's right: blame Agile

Before we start, a small warning. This is a rant and as such contains certain properties common to rants, such as emotionally overstating unimportant points, glossing over and confusing important ones, overly decorative language and taking the odd innocent as victim. Bearing this in mind, please remember that, though serious, my tongue is sometimes placed firmly in cheek for dramatic effect.

There has been a great debate happening over the airwaves of the blogosphere recently. Sparked off by James Shore's post The Decline and Fall of Agile, it has got to the point of some people declaring the death of agile and others calling for the rejection of the term altogether [link]. It's as if the Nazi party had just announced that the Third Reich was built on Scrum and suddenly everyone is trying to distance themselves: "no, we never agreed with that agile stuff, what we were talking about is something completely different".

Around the same time another debate has been picking up steam, started by Roy Osherove's post Goodbye Mocks, Farewell Stubs. Apparently people find testing too difficult to grok, so we need to reject terms like mocks and stubs ('cos though accurate they are apparently confusing) and replace them with another: Fakes. Oh the glaring irony: one side of the movement is rejecting a marketing term because too many people who use it don't know what it means, while the other side tries to introduce more marketing language to attract the people who don't know what it means.

Uncle Bob has dropped in like Batman at the Joker's party and delivered his direct, powerful blow: "the problem is you're all f*ing lazy". Although I feel that Uncle Bob is much closer to the truth than many other commentators, he's not really offering a realistic way out of the mess.

So what exactly is the problem we've got here? Both of these posts have something in common. James Shore's issue stems from teams picking up words like Scrum and Sprint and forgetting that they have to write decent code as well. Roy Osherove's post stems from the fact that getting teams to even write tests is met with constant failure. Both posts talk about a general lack of engineering skill in IT. In reaction different parties are offering up different solutions but they all fail to address the core issue.

For those jumping the agile ship and getting onto the next coolest speedboat, what exactly is that going to prove? Rather than improving the industry it leaves it helpless and sinking. This is a rotten attitude that resolves nothing: when every PHP and VB programmer converts to Ruby what you gonna do then? Oh it's OK, Ola's writing Ioke and if that doesn't work we've always got Lisp (they'll never get that). Roy Osherove's solution of lowering barriers to entry is more commendable but will achieve just as little and is guilty of another great crime: dumbing down (though he's spot on about tool support). I don't think the solution to poor engineering skills is to withdraw important technical terms: can you imagine the Bar rebranding legal terms because, y'know, lawyers just find it too difficult to understand this stuff, or doctors calling for fewer names for the organs 'cos med students these days just can't get the point of having the terms stomach and intestine and they keep operating on the wrong one. Why can't we just call it the food processor? Jay Fields' novel idea is to ask them all to politely leave (50% of them in fact). And if they don't? Well then I guess we'll all get together and go around with a big seal club and sort it out that way (it's what I refer to as "The Great Culling" - there's those Nazis marching past again).

The real problem, the root cause of all this, is simple (and on this most agree, especially Uncle Bob and Jay): the gross undervaluing of the skills which enable you to do your job. The thing is, it's nothing new. It was there before agile and TDD had the impact they do today (they just laughed then). Remember those not so long ago days when people thought the solution to delivering software wasn't skills or focus on quality but getting some glorified secretary to type up several hundred pages of requirements and then handing them to some supposed genius to design everything for the monkeys to punch in? Well we're still there people: it's just that they thought the solution to the mess that disaster caused was to do something called iterations, standing up and getting the monkeys to skip around the genius guy wrapping him in his own ribboned UML diagrams chanting "architect, architect, architect".

Nothing's changed, agile or not. The majority of the industry still doesn't value the basic, fundamental skills it takes to write software of acceptable quality. It didn't before and it still doesn't now. Instead it's obsessed with solving the problem by bringing in the right highly paid manager with the right powerpoint presented methodology. Until the industry gets that that ain't the way, it will drag every shining beacon of light (agile, Ruby, whatever) down into Hades with it.

Writing software is about people, People! That's what agile and Lean and all their friends tell you again and again. The fundamental principles are about people, not iterations, continuous integration and story boards. Go check out the Agile Manifesto and tell me where you see any of that stuff. This bastard form of agile which only talks about tools and processes is like someone got the manifesto and spun it round (People OVER Process: read it again). The practices and tools are there to actively encourage developers who are skilled, committed and know what they're doing, to make them more empowered and productive and ultimately deliver MORE value and success; they are not there to replace them.

So what is the solution? The simple truth is that those of us who understand the problem have got to keep doing a great job, we've got to help those who want to do a great job but aren't sure how, and we've got to help those who don't understand how to do a good job and maybe never will and may not even want to. Sure it's depressing sometimes, it's frustrating, but we can't just give up on the industry and jump away on the next term, or simply dumb the whole thing down so the next generation can get it a little more but ultimately still fail to get it.

Agile was never, is never, going to be a quick, faultless success forever preserving an eternal purity. Our industry is sick, real sick. It's a beer guzzling, chain smoking couch potato who's been told by his doctor to eat healthily, give up smoking and booze and get some exercise but instead just swallows the agile pill and hopes that's enough to save his sorry fat arse. And surprise surprise, when the casualties start coming in they're gonna blame those pills.

We have generation after generation brought up on bad practices and incorrect assumptions (smoking isn't bad for you, exercise gives you heart attacks) and what's worse most of them don't even know how bad things are: they think this is NORMAL (sadly because IT IS), they've adjusted to a certain degree of failure. IT projects are like lifts: sure you're pissed off they're broken down but you're not surprised, you even kinda expected it.

The mess is big and it's gonna take years and a lot of hard work to sort it out. And yes, along the way people are going to buy the agile weight loss pill from some dodgy site on the Internet without seeking professional advice, and get it wrong. And why? Because they're desperate!

It was totally delusional to think agile was going to result in everyone suddenly rejecting their bad habits, coming crawling on their knees and going XP Kosher. And when that didn't work, well, there's always the prophecy of the True agile warriors standing in the post-apocalyptic world looking down on the dying as with their last breath those Tragic Lost sigh the now infamous regretful words "we wish we'd read our Domain Driven Designs". Well don't worry, because to become agile all you have to do is close your eyes and say "please Kent, forgive us our hacks and deliver us not into Waterfall" and mean it, really mean it, and wait, above the mountain there, what's that? Is it who I think it is? It is: the heavens are open and Martin Fowler is ascending into them.

Let's get real here: developers who have never come across TDD before, who've never experienced its value first hand, who've spent years doing things a certain way, are struggling to grasp the concept and, oh sorry, that's surprising why? I guess you were doing TDD from that first Hello World in Pascal? And those people who go in and try Scrum for the first time and find they are riddled with technical debt because they gave in to pressure and didn't invest in quality. Now stand up and be counted all you who never made that mistake.

I'm not saying that we should just let these things go, hold our noses, ignore the smells and step over the shit. Quite the opposite. But we've got to accept that we're not going to walk into a room of professionals and get them to convert overnight (or in a year or possibly even a decade). How many years did it take before doctors accepted germ theory and started washing their hands?

Though there is hope: the most effective weapon we have is success. By being successful and then telling people about why and how we were successful. That's how this whole agile thing started, that's how it built up its great reputation and that's how it's going to survive and get better. The more success we have, the more people will start to realize that what they're doing now isn't as good as they think. Right now not testing, not caring about quality, high technical debt: that's the norm, that's expected. Keep being successful and people will start asking the others those awkward questions: why does this application crash when my last team were virtually bug free, why does it take a weekend to get a release out when my last team did it in a few minutes? Why does it take six months to implement a new feature when my last team took a week? What do you mean there are ramifications and complications? The more we deliver quality software the less room people will have to worm out of those questions and then the tipping point will come.

Let's not be mistaken, it's going to be hard work: but isn't that the whole point? Writing software is hard, agile doesn't change that and at first TDD certainly doesn't. Yes it's frustrating and some days you really want to scream and kick the can in, but what's the alternative: IT ghettos sucking in grads with no way out?

If, in the years to come, we want to work in a different industry then we've got to take some responsibility for helping create it.

Thursday 6 November 2008

Make me Whole

This is one of my favourite refactoring patterns and one I find I use a great deal so I thought I'd share it. The basic premise of this refactoring is to encapsulate logic into its own class using small, easy steps. The trick is that each step should always leave the code in good working order, i.e. the tests should always pass.

To help explain, I'm going to walk through the refactoring by encapsulating some simple procedural logic and making the concept explicit by giving it its own class. Here's our starting point:

def say_hello
  if @friend == "Jeff" || @friend == "Robert"
    puts "Hello my good friend #{@friend}"
  else
    puts "Hello #{@friend}"
  end
end

The first thing we notice is there is some behaviour which depends on the state of friend (is the value "Jeff" or "Robert"). This is a good sign that there's a concept called Friend which needs encapsulating. So here's how:

Step One: Extract method

The first step is easy: extract the logic into its own method like so:

def say_hello
  if is_good_friend
    # ...
  end
end

def is_good_friend
  @friend == "Jeff" || @friend == "Robert"
end

This is good, now the code is clearer to read and we've encapsulated the logic (albeit only in a method). That's already a big improvement.

Step Two: Introduce parameter

The problem is our method is dependent on its class for the @friend variable. We need to sever this connection so the method can stand on its own two feet. The simple way is to pass friend through as a parameter.

def say_hello
  if is_good_friend(@friend)
    # ...
  end
end

def is_good_friend(friend)
  friend == "Jeff" || friend == "Robert"
end

That's better: is_good_friend is decoupled from the class. It's free!

We still need to get that method onto a new class. The problem is it can't be done in one easy non-breaking step. So we're going to have to take a tiny intermediate step to get there:

Step Three: Make method static

By making the method static it severs all connections to the original class: it is explicitly non-dependent on any instance variables the original class has.

def say_hello
  if self.class.is_good_friend(@friend)
    # ...
  end
end

def self.is_good_friend(friend)
  # ...
end

Making the method static is also a good safety check to ensure that you've not got any references to the original class and it is properly isolated before moving it.

Step Four: Move method

Now the method is static move it to the new class:

def say_hello
  if Friend.is_good_friend(@friend)
    # ...
  end
end

class Friend
  def self.is_good_friend(friend)
    # ...
  end
end

Excellent, we're really getting somewhere! Though now we've got a static method sitting on a helper class. Yuk: let's get that sorted before someone copies us and that nasty anti-pattern starts to proliferate through our code base:

Step Five: Create instance

This is where the real magic starts to happen: we create an instance of our new class and use it to store the value of friend:

class Friend
  attr_reader :value

  def initialize(value)
    @value = value
  end

  def self.is_good_friend(old_friend)
    friend = Friend.new(old_friend)
    friend.value == "Jeff" || friend.value == "Robert"
  end
end

Still, this is a bit ugly. What we really need to do is get the client to do the work of creating the instance.

Step Six: Introduce parameter

Rather than have our static method new up Friend get the client to do it and pass it across:

def say_hello
  if Friend.is_good_friend(@friend, Friend.new(@friend))
    # ...
  end
end

class Friend
  # ...
  def self.is_good_friend(old_friend, friend)
    friend.value == "Jeff" || friend.value == "Robert"
  end
end

Step Seven: Remove parameter

The old_friend parameter is redundant and ugly: let's get rid of it!

def say_hello
  if Friend.is_good_friend(Friend.new(@friend))
    # ...
  end
end

class Friend
  # ...
  def self.is_good_friend(friend)
    # ...
  end
end

Excellent, this is starting to look a lot better now. Though look at that repetition: the static method on Friend is passed an instance of its own class! Only one way to sort that out:

Step Eight: Make method non-static

Now the beauty of this is that if you are using a modern IDE it will work out that the method is being passed an instance of its own type and it will magically do this:

def say_hello
  if Friend.new(@friend).is_good_friend
    # ...
  end
end

class Friend
  def is_good_friend
    @value == "Jeff" || @value == "Robert"
  end
end

Brilliant: Friend is now nicely encapsulated and ready to grow into a responsible object with a bright future. From here we can start pushing more behaviour onto the object and eventually make @friend reference an instance of the Friend class.
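
To make that last step concrete, here's a rough sketch of my own (not part of the original walkthrough) of where the code could head next once the greeting behaviour is pushed onto Friend and @friend itself holds a Friend instance:

class Friend
  def initialize(value)
    @value = value
  end

  def is_good_friend
    @value == "Jeff" || @value == "Robert"
  end

  # behaviour pushed down from say_hello
  def greeting
    is_good_friend ? "Hello my good friend #{@value}" : "Hello #{@value}"
  end
end

def say_hello
  puts @friend.greeting  # @friend is now a Friend instance, not a string
end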

Overall the pattern is very simple: extract the logic into its own method, then work to detach it from the class by making it static, then simply move the static method and reattach it to the new class by returning it to an instance. It is also possible to handle more complex behaviour with several variables involved: simply pass them through as parameters and then push them onto the new instance.

Although this example is simple, the pattern can be repeated again and again to build classes up method by method (sometimes these classes can in turn be refactored into smaller classes still).

Saturday 25 October 2008

if(conditional) push.down(to.origin)

Conditionals (ifs, switches) are a plague in many code bases and although there are campaigns to prevent their spread many developers are unclear on how to control these pests. Conditionals are one of the most prevalent, yet tolerated, sources of repetition and bugs. Why? Well let's start with one simple yet common example:

if thing.is_valid
  other_thing.do_something_with(thing)
end

This sort of code spreads like a disease because every time the client calls do_something it must ensure it checks for validity. If we look closer at the object in use we find that calling do_something in an invalid state not only requires the same conditional but also throws an error:

def do_something_with(thing)
  raise "Thing must be valid" unless thing.is_valid
  # do lots of wonderful things with a valid thing
end

The repetition is not only in the client code but in tests:

def calls_other_thing_when_thing_is_valid
  mock_thing.expects(:is_valid).returns(true)
  mock_other_thing.expects(:do_something_with).with(mock_thing)
  # ...
end

def does_nothing_when_thing_is_invalid
  mock_thing.expects(:is_valid).returns(false)
  # ...
end

The simplest solution to remove the glut of these conditionals (in terms of lines of code, not calls) is to push the conditional down off the client and onto the object that knows its state. If we go back to the tests and use BDD we find that we would rather define the interactions as such:

def tells_thing_to_do_something_when_valid
  mock_thing.expects(:when_valid)  # the behaviour is passed as a block to when_valid rather than matched as an argument
end

Thing now takes a block (in Ruby), a delegate (in C#) or an anonymous class (in Java):

thing.when_valid { other_thing.do_something_with(thing) }

def when_valid(&block)
  block.call if valid
end

Now the conditional is controlled by the object responsible for its state: we've removed an Ask and replaced it with a Tell. This is a vast improvement, all the repetition in the clients has gone and the chances of bugs due to accidental calls when in an invalid state are removed.

This presumes that Thing doesn't really do much, but chances are it's got lots of behaviour and that behaviour changes based on being valid or not. Of course all the methods could delegate to its own when_valid method, but I bet you also need a when_not_valid and, oh boy, that code's getting difficult to read. Also, the repetition in terms of lines of code may be lower, but how many times are conditionals executed at run time?

class Thing
  def when_valid(&block)
    block.call if valid
  end

  def other_method
    if valid
      # do some stuff
    else
      # do some different stuff
    end
  end
end

The conditional can still be pushed further down right to its origin: when Thing changes between a valid and invalid state. This can be achieved by using the Replace Conditional With Polymorphism refactoring to employ the state pattern:

NOTE: I want to keep this example clear for developers of other languages though there are cleverer ways of achieving this in Ruby

class Thing
  def validate
    valid = ... # check if valid
    @state_behaviour = valid ? Valid.new : Invalid.new
  end

  def when_valid(&block)
    @state_behaviour.when_valid(&block)
  end

  def other_method
    @state_behaviour.other_method
  end

  class Valid
    def when_valid(&block)
      block.call
    end

    def other_method
      # do some stuff
    end
  end

  class Invalid
    def when_valid(&block)
      # ignore
    end

    def other_method
      # do some different stuff
    end
  end
end

Now there is only one point where the conditional is called and all of the behaviour for each state (valid or invalid) is in one easy-to-read place. Of course, this is just a simple example, so just imagine what it would do for more complex solutions (and don't forget: the same can be applied to switches - see the sketch below).
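
Here's a rough illustration of that last point (my own sketch, not from the original example): a case statement on a status collapses into state objects in exactly the same way:

class Order
  def initialize(state)
    @state = state  # e.g. Pending.new or Shipped.new, set wherever the status changes
  end

  def summary
    @state.summary  # no case statement in the client
  end

  class Pending
    def summary
      "waiting to ship"
    end
  end

  class Shipped
    def summary
      "on its way"
    end
  end
end

puts Order.new(Order::Shipped.new).summary  # => "on its way"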

Friday 24 October 2008

NOJOs are no gos

Ivan Moore gives a specific example of (and term for) the Faux OOP anti-pattern: the NOJO.

Friday 17 October 2008

Pynchon's Theorem

Here's a bit of Agile wisdom from Thomas Pynchon's postmodern classic Gravity's Rainbow (page 275 in the Penguin edition):

And yet, and yet: there is Murphy's Law to consider, that brash Irish proletarian restatement of Goedel's Theorem - when everything has been taken care of, when nothing can go wrong, or even surprise us...something will. [...] when the laws of heredity are laid down, mutants will be born.

Thursday 18 September 2008

Faux OOP

It may look like OOP, be written in an OOP language and have some of the characteristics of OOP, such as classes and inheritance, but it may, in fact, be Faux OOP.

Fortunately Faux OOP is easy to spot as it is actually just procedural code organized into objects. Tell tale signs of Faux OOP are a separation of data and behaviour, with anemic data 'objects' manipulated by stateless methods (or procedures) in other classes. Faux OOP is often littered with classes named with agent nouns (*Manager, *Helper etc.) or *Service, which tend to be stateless (and often singletons or static classes) and which hold common variables (better known as global variables). The rest of the code is built from data classes which consist mainly of getters and setters and no, or rudimentary, behaviour, and which are poked and prodded by their *Manager counterparts. The end result is a program consisting of tasks which act upon data structures and use basic program flow (ifs, switches, loops) to execute the tasks in order.
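
To illustrate, here's a sketch of my own in Ruby (not an example from the original post): the Faux OOP shape, with the behaviour an Account should own sitting in an AccountManager, followed by the same idea with the behaviour where it belongs.

# Faux OOP: an anemic data 'object' plus a stateless *Manager that owns its behaviour.
class Account
  attr_accessor :balance, :overdraft_limit
end

class AccountManager
  def self.withdraw(account, amount)
    raise "Insufficient funds" if account.balance - amount < -account.overdraft_limit
    account.balance -= amount
  end
end

# The same idea with the behaviour on the object itself.
class BetterAccount
  def initialize(balance, overdraft_limit)
    @balance = balance
    @overdraft_limit = overdraft_limit
  end

  def withdraw(amount)
    raise "Insufficient funds" if @balance - amount < -@overdraft_limit
    @balance -= amount
  end
end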

Faux OOP is an anti-pattern promoted by some of the biggest players in the industry: Microsoft's N-Tier architecture, popular with VB and early .NET, promotes Faux OOP by prescribing a Data Layer, Business Layer and Presentation Layer. The presentation layer binds to data objects provided by the data layer which are then validated and actioned via the business layer. The SOA craze has also helped promote the procedural style where people have reduced their systems to procedural service calls.

Though be careful, you may think you have Faux OOP but really you've got Faux Procedural.

Faux Procedural
It may look like procedural code because the program has methods and data structures, but if you look closer it may be sequential code organized into methods.

Tell tale signs of Faux Procedural are long methods with no distinct separation into common tasks. These long methods often contain significant repetition and the heavy use of temporary variables. Common tasks are not abstracted into smaller, reusable methods and common groups of variables have not been abstracted into data structures.

Fortunately, with heavy refactoring, Faux Procedural can be refactored into Faux OOP and Faux OOP can be refactored into Real OOP(tm).

Friday 12 September 2008

Agent nouns are code smells

Class names ending in agent nouns are a code smell. An agent noun is:
any noun that denotes someone or something carrying out the verb's action, typically words ending in -er or -or
Classes with agent nouns are, the majority of the time, a sign of procedural thinking - especially agent nouns such as "manager", "helper" etc. As the definition says, it denotes the class is carrying out the verb's action. This is contrary to good OOP, where the verbs belong to the nouns themselves; classes which represent things are responsible for their own doings, not some other class. This reflects the real world where I am a Person able to do Programmer tasks: the verbs of programming are executed by me, not some CodeProgrammer object which sits next to my desk.

Agent nouns are useful for describing roles, which makes them good for interface names. A SpellChecker interface on a Dictionary class gives a clear definition of the role (to check spelling) and allows the Dictionary to implement the verbs (check_spelling "word"); likewise a SynonymProvider sits well on a Thesaurus class.
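
As a small sketch of that Dictionary example (mine, not the post's), in Ruby the role can be expressed as a module while the noun keeps its own verbs:

# The role: an agent noun describing what something can do.
module SpellChecker
  def check_spelling(word)
    raise NotImplementedError
  end
end

# The noun owns the verb; no DictionaryManager in sight.
class Dictionary
  include SpellChecker

  def initialize(words)
    @words = words
  end

  def check_spelling(word)
    @words.include?(word.downcase)
  end
end

dictionary = Dictionary.new(["cat", "dog"])
puts dictionary.check_spelling("Cat")   # => true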

So beware agent nouns; they are a language trick which fools you into believing that a class is a first class concept when really it's stealing someone's verb.

Tuesday 12 August 2008

Brain Rules

I stumbled across the Brain Rules website while browsing Presentation Zen.

It's a great site and goes through a nice little set of flash slides on how our brains work and how to maximize performance. I'm sure knowledge based workers (such as developers) could learn a few tricks from this.

Friday 1 August 2008

The Product based build

Anyone who has worked on a project of significant size will have grappled with maintaining a build which continuously spins out of control. It feels like targets become confused, randomly change or seem to be repeated multiple times with only slight variations: "run", "run-without-tests", "run-with-tests", "redeploy-run" etc. etc. It doesn't take long for the build to become unwieldy.

People who use the xAnt family or other XML based build languages such as MSBuild will have felt the pain more than those using dynamic language libraries such as Rake, as XML is predisposed to becoming quickly unwieldy, is difficult to refactor and expensive to extend (basically XML sucks as a programming language). Yet, XML or not, most build frameworks share a common flaw: they are task based. The build becomes defined in terms of a set of steps to be executed, and in which order, and fairly soon you find you want subtle variations in those steps, which results in more tasks.

But builds aren't linear; we don't always want to run the same dependent steps to achieve the same end result, which is why we end up with masses of subtly different targets. One approach to reduce this is to remove dependencies from targets and have high level targets which simply list a set of steps. This can lead to flexible systems, as targets can easily be stitched together arbitrarily from the command line as needed, but it requires an intimate knowledge of how the build needs to be executed (which breaks encapsulation) and moves even further down the road of defining a list of steps.

So how should a build work? In this post I hope to outline what I believe to be the core metaphors and functions of an effective build system.

Product based
If we go back to basics and look at the purpose of a build we find it is to produce specific products: a deployable artifact, verification (by running tests), a deployed application, installers etc. Instead, standard build tools promote thinking in terms of linear dependent tasks; for example creating an installer would be defined as "clean, compile, test, create-installer": the build is described in terms of steps, most of which say nothing of the products created.

A product based build is different: the build is defined in terms of products which are built by producers (or factories to use the GoF pattern). Products represent the artifacts of the build (binaries, installers, deployed application) and producers contain the logic on how to assemble the products.

Dependency based
Similar to task based builds, product based builds are dependency based. Where product based builds differ is that they view the build process as a factory line where each product is produced from other dependent products. For example the "installer" product has a producer which packages binaries from the "verified-binaries" product, which has a producer which runs tests from a "binaries" product, which has a producer that compiles the code from the "fresh-destination" product, which has a producer which creates new, clean directories.

Describing dependencies in terms of products rather than tasks creates a more descriptive build where each product can express its dependencies in a specific way; for example an "acceptance-verified-binaries" product produced from "unit-verified-binaries" expresses a different set of rules from one produced from "binaries".

Encapsulated
When we run the build we shouldn't have to understand the steps required to reach the end result, but because of the nature of task based builds they often require an intimate knowledge of their internal workings: for example "run" may depend on "clean, compile, test" but "run-again" may have no dependencies and instead rely on a previous build. By focusing on products this thinking is broken. As an example consider the following: in a task based build you can retrieve a set of targets, with their descriptions, from the command line. To begin to use the build you must understand how the targets relate and what steps are being executed for each target, and from the descriptions (or the task names) work out what the end result may be. A product based build would give you a list of products but not the steps required to produce them. This allows the user to quickly understand the goal of the build and how to achieve the desired product without requiring any knowledge of how it is produced.

Non-repetitive
Task based build systems are prone to repeating already satisfied dependencies; for example if you ran "build compile, create-installer" and the "create-installer" target had a dependency on "compile", most build systems would re-execute the "compile" target, wasting your time. To work around this you must create more tasks and again require an intimate knowledge of the build. Instead, when running "produce binaries installer" the product build would be satisfied that it had already produced binaries when it requires them for the input of the installer.

Order independent
As task based builds execute as a series of steps they are order dependent. Running "build compile test" has different results from "build test compile" (which would either fail or give invalid results depending on whether a previous build had compiled). As a product based build focuses on products and not steps it is order independent so specifying the following would give equal results:

  > produce binaries verification
  > produce verification binaries

Presuming that the "verification" product relies on the "binaries" product, in the first case the build system would re-use the one produced by the already executed "binaries" request, and in the second case it would be satisfied that during the production of the "verification" product it had already been produced.
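
To make the metaphor concrete, here's a purely hypothetical sketch in Ruby (every name is invented; no real build tool works exactly like this) of a product line where each product declares what it is produced from and is only ever produced once per run, whatever order it is asked for in:

# Hypothetical sketch: products, producers and once-only production.
class ProductLine
  Product = Struct.new(:name, :inputs, :producer)

  def initialize
    @products = {}
    @built = {}
  end

  def product(name, from: [], &producer)
    @products[name] = Product.new(name, Array(from), producer)
  end

  def produce(*names)
    names.each { |name| build(name) }
  end

  private

  def build(name)
    @built[name] ||= begin
      spec = @products.fetch(name)
      inputs = spec.inputs.map { |dep| build(dep) }  # produce dependencies first
      spec.producer.call(*inputs)                    # then run this product's producer
    end
  end
end

line = ProductLine.new
line.product(:binaries)                        { "compiled binaries" }
line.product(:verification, from: :binaries)   { |bin| "tests run against #{bin}" }
line.product(:installer, from: :verification)  { |v| "installer packaged after #{v}" }

# Order independent and non-repetitive: :binaries is only produced once.
line.produce(:verification, :binaries)
line.produce(:installer)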

Polymorphic
By focusing on products rather than tasks a product based build system can become polymorphic. This allows for a very flexible build removing the need for the subtle variations on targets which plague task based builds. For example you could define a product of "running-application" which is produced from the "binaries" product. The "binaries" product could be inherited by two other products "verified-binaries" and "unverified-binaries". By giving the "running-application" product an input of the base "binaries" product it can be produced from both "verified-binaries" and "unverified-binaries". This allows the user to easily control the build:

  > produce verified-binaries running-application
or
  > produce unverified-binaries running-application
or (as we are order independent):
  > produce running-application verified-binaries
or (which would use the default binaries product)
  > produce running-application

We can also extend "verified-binaries" to "unit-verified-binaries" or "acceptance-verified-binaries" etc., creating support for a build pipeline, or create a "reused-binaries" product to avoid recompiling. By allowing a build to become polymorphic it becomes highly flexible and its intent is clearer.

Stateful
Though not absolutely critical, state is something most build systems lack. Every time a build is run the results of the previous build are completely discarded and any valid products are re-created. State is a difficult concept to implement in task based builds as tasks would have the responsibility of deciding whether the artifacts are still valid; this is further complicated by the fact that multiple tasks may depend on the same artifacts and would therefore have to make the same decision again. In a product based build each product can decide whether it is valid and the build system would simply reproduce it if it isn't. Furthermore the build system can analyze the product line and understand that if a product further down the line (say binaries) is invalid then all products produced from it need to be reproduced.

To conclude: by changing the build metaphor from tasks to products we can solve many of the issues around build systems and create clearer, more flexible and more meaningful builds for our applications.

Sunday 27 July 2008

Decisions, Decisions, Decisions

During any development project thousands of decisions will be made. Every individual in a team will make large numbers of decisions whether about details such as variable names or cross cutting concerns such as architecture. The success or failure of a project is the sum of all these decisions.

There are two key factors in making decisions: information and risk. The more information we have the better informed the decision is, and every decision made - and not made - carries a risk. Empirical methodologies (such as Agile and Lean) have a contrasting approach to decision making from up-front, plan based methodologies such as waterfall. Fixed-plan methodologies attempt to manage these factors by collecting as much information as possible and then committing to decisions before implementation starts, whereas empirical methodologies promote delaying decisions until there is enough information available.

Both approaches recognize the importance of information and of managing risk yet they have contradictory views on the best way of dealing with them. Where fixed-plan methodologies believe that by committing heavily to the informed decision prior to execution you reduce risk, empirical methodologies accept that, in reality, every decision that you make ahead of time carries the greatest risk of all: it could be wrong. The chances of wrong decisions are increased because the information that fixed-plan methods derive their decisions from is essentially highly detailed speculation, which in turn is based on a limited amount of real information which carries the risk of being incorrect or out-of-date or even irrelevant. The strength of commitment increases the impact of a wrong decision even more. Agile and Lean acknowledge this fact and tackle it head on by managing risk with two techniques: they delay decisions as much as possible until the maximum amount of real information is available - what Lean calls the last responsible moment (as Lean acknowledges there is also risk in delaying a decision too much) - and they attempt to de-risk decisions by lowering commitment - essentially by allowing change.

Agile and Lean acknowledge that upfront decision making carries the need to cover a great number of hypothetical eventualities that may or may not actually occur. The crystal ball gazing of fixed-plan methods is an attempt to de-risk the unavoidable gaps in information by trying to account for any possible variation with more pre-made decisions. Essentially, imagine trying to plan out an entire chess game before you play it: every possible move your opponent does or doesn't make would have to be accounted for in the plan. The end result? An over complex and convoluted plan (which translates to over engineered, complex software). Again, because Agile doesn't rely on pre-packed decisions, instead acting on the real information, it removes the need to make the "what if" decisions - this is the foundation of the maxim "You Ain't Gonna Need It".

Another aspect that traditional methods fail to tackle is decision paralysis: having one decision to make is hard enough, but having to make all decisions infallibly and at once requires omnipotence. Although some developers believe they are gods, Agile recognizes their limitations by giving techniques that allow focus on as few decisions as possible at a time: TDD is a great example of this and so is the maxim "The simplest thing that could possibly work".

Agile and Lean promote both the overall reduction in decisions and minimizing the number of decisions to be made at one time (essentially by spreading them out across the project). Pushed to an extreme, decision making can be reduced to being purely reactive. The purely impractical "URL Driven Development" demonstrates this by only making decisions when the event which creates the need for a decision occurs: essentially, when someone hits a URL you write just enough HTML to satisfy their request. Despite being impractical it demonstrates the principles of Just In Time decisions well, and its impracticality should not be used to negate their value within the development cycle, where decisions can safely be reduced to being purely reactive. Test Driven Development uses JIT decisions extensively: you write a test, an event occurs (unsuccessfully) which forces a decision to write just enough code to satisfy that one event, rerunning the test recreates the event and validates your decision, and then you move on to add another event forcing another decision.

Aggressively minimizing the number of decisions which need to be made, by simply delaying them or eliminating them altogether, has an overall positive effect on a project. Ensuring that there are few small decisions which require minimum commitment and have the maximum amount of information increases the chances of making good decisions and reduces the risk of committing to bad ones. In short: prefer fewer good decisions to many potentially bad ones.

Thursday 24 July 2008

Legacy Index

A legacy index measures the number of releases a software product has until it becomes legacy. It can be used to measure and communicate design quality: the more releases a code base has before change becomes difficult and it gets demoted to a legacy system, the better the design.

The target should be a high legacy index or to reach Legacy Max. An application that becomes legacy after its first release would be a Legacy Zero app. The lowest index is awarded to applications which become legacy before they are even released: Legacy minus One! 

Applications can still have further releases once they've hit their legacy index: these releases would be identified by consisting mainly of bug fixes and only minor changes; this would be called "pushing the legacy index" as each release would not increase the index. Also an application may be able to reverse its legacy status by refactoring it to a point where change is again possible.

By measuring and plotting an application's legacy index against time you can clearly see the health and state of an application. If it has a steady and consistent incline then you know that the design is sound and the application is healthy, but if the index has a trend of slowing or is continuously pushing the legacy index then it is clear that there is a problem.

Though quite humorous, legacy indexes may have a genuine use in describing the state of an application: you can simply say "this is a Legacy Zero app" or "it's hit its legacy index" and all is clear.

Wednesday 11 June 2008

10 things that should get devs fired

Today is the final of The Apprentice and if this year's four finalists are representative of Britain's next generation of top business leaders then we have something serious to worry about. To acknowledge this fact I thought I'd come up with a list of the top ten things that should get developers fired (though please take this in the humorous spirit it is written in - I'm not actually advocating firing people).

So in typical hard-nosed, no messing Sir Alan style, here's the list in no particular order (and if you have any good ones please comment):

1. "We're a Microsoft shop, we only run Microsoft" (replace with any vendor)
You've limited all design and architectural decisions down to one vendor regardless of suitability, cost effectiveness or productivity, pushing up the cost of projects and potentially leaving you with an unsuitable solution.

Risking the success of a project by restricting technology choice to one vendor: you're fired!

2. Prohibiting Open Source Software
Despite the fact that an Open Source product may be the best solution available, a prohibition on OS software has prevented its use on projects.

Dismissing viable solutions with potential benefits to your business: you're fired!

3. Single language
You've limited the performance of your development by prescribing that all development be done in one single language. The fact that the language may not be suitable for the job, will perform poorly or be more expensive is ignored in favour of complying with an arbitrary 'strategic' decision to unify all development.

Running up costs because you can't choose the best tool for the job: you're fired!

4. "I'm a Java developer I don't do Ruby" (replace with any languages)
Your language defines your role and you only work in that one core language. You have no interest in other languages and believe your language is the one true language.

I have no place in my development team for one-trick pony developers: you're fired!

5. Documented not automated
You'd rather produce a 15 page document with screenshots on how to deploy your application than spend less time automating it. You place value on creating loads of documentation rather than producing things that actually work.

Wasting money on something that will be immediately out-of-date and no-one will read: you're fired!

6. No source control
Projects or critical dependencies have never been added to source control or, even worse, there's no source control at all.

This is wholly unacceptable: you're fired!

7. Artifacts built off of developer boxes
Deployment means opening up Visual Studio, pressing F5, zipping up the dlls and handing them over to IS to install (with a 15 page document).

You are a cowboy, this is simply unprofessional and amateurish: you're fired!

8. No automated tests
You never write automated tests and simply rely on the old "run and click about" or the "run the test console and check the results" methods of testing.

No way of verifying your changes, there's no room for hackers: you're fired!

9. No CI
There is no visibility of the state of the code base and as long as it runs on your machine then that's OK by you.

No way of understanding the status of the code base: you're fired!

10. You're an architect
What more can I say? You probably disagree with this whole list especially because it doesn't come with a Visio diagram and can't be orchestrated in BizTalk. You're fired!

Friday 23 May 2008

It's ladies night

More on women in IT: there's a great article in the Guardian about the Geek Girl Dinners movement.

Friday 25 April 2008

The Snake and The Emerald in The Mesh

Back in the first days of the new millennium, when .NET pamphlets first hit the desks of Microsoft shops the world over, one of the great and exciting features being sold was multilanguage development. Microsoft painted a world where the UI team could beaver away happily and productively in VB.NET while the back end team got hard core writing business logic in C#. Microsoft even talked about other vendors writing languages for .NET including Perl (which was actually one of the languages supported by the original ASP through the Windows Scripting Engine) and Microsoft Marketing painted a picture of a world blind to the prejudices of different languages.

Fairly soon the dust settled over the pamphlets and everyone forgot about the great multi-language dream; .NET was as polarized as British politics: there was Labour and Tory, C# and VB, no other language got a look in and those that existed were merely academic exercises of no importance to your MVP. There were many critics of the importance of multi-language support in .NET and they were feeling proven right.

At the same time another important trend was occurring: the rise of static languages. Java, C# and VB.NET are all strongly typed. Now that even VB was strongly typed, MS developers felt confident dismissing dynamically typed languages as toys for script kiddies, web designers and network engineers. Grown-up enterprise development required things like type safety and checking.

So now the world is strongly typed, .NET settles around VB and C# and Java is just Java, and the battle lines are clear: who is the king of the strongly typed? Then all of a sudden, out of nowhere, the alpha geeks decide that dynamic languages are the thing: Ruby, Python and Groovy. People are re-realising that dynamic languages can do very, very cool things which strongly typed languages struggle to express.

A new level starts in the arms race between Sun and Microsoft as people start trying to implement dynamic languages on top of the JVM and CLR. Java gets Jython and JRuby and a glut of projects attempt dynamic languages on the CLR, including Microsoft's own IronPython, but then all of a sudden people realize something: the CLR wasn't designed to do dynamic languages. Sure you can build languages on top of the CLR but gone is the dream of having C# assemblies seamlessly call Python assemblies without even knowing it.

Fortunately Microsoft had the guts (or rather Jim Hugunin did) to sort this problem out and get .NET back to where it should be, and the Dynamic Language Runtime was born. The DLR sits on top of the CLR and allows all the dynamic goodness of languages like Ruby and Python (and now VB 10) to work as equals alongside C# and VB.NET code. The DLR will allow a new phase of the polyglot dream to emerge: C# calling Python calling Ruby calling VB and every combination in between.

Microsoft are going to start pushing IronPython and dynamic languages hard.  It is one of the main features of Silverlight 2.0 and is being expanded to ASP.NET. Interestingly MS is positioning dynamic languages firmly in the web and UI arena (which makes sense as this is where they are traditionally strong). In reality dynamic languages aren't restricted to the UI layer and entire applications can be written in them, though I think Microsoft aren't pushing this point because they don't want to scare or alienate their core developer community.  Anyhow, the people who would want to do it will try and do it.

There are a few interesting issues arising out of the dynamic languages race:
When is Ruby/Python not Ruby/Python?
As languages running on the CLR can talk to the .NET framework and other .NET libraries, they break compatibility with other interpreters: though it's still legal Ruby/Python code it isn't going to run on any other interpreter. DLR languages can also be compiled to .NET assemblies, breaking all compatibility with anyone else. Some see a risk here of MS "restandardising", pointing to HTML as an example.
Why run IronRuby/Python when you could just run Ruby/Python?
Technically there are a few answers about integration with .NET etc. but they are pretty weak when it comes to writing apps in one language. Leveraging the power of dynamic languages from within a .NET app (i.e. polyglots) is the main selling point; otherwise, for those who are simply going to write pure Ruby/Python code, the big thing is ratification: for those .NET shops out there who still believe the MS FUD of previous years, simply putting MS in front of the language makes it acceptable.
What about open source?
Both Python and Ruby are open source languages so isn't this a bit incompatible with MS? Interestingly both IronPython and IronRuby are open source (under the Microsoft Public License) and, because the DLR came from the work on IronPython, it is also open source. IronPython is hosted on CodePlex but IronRuby - in an attempt to engage the Ruby community - is hosted on RubyForge.

Tuesday 18 March 2008

Polymonoism (AKA the legend of cross platform .NET)

After using IntelliJ for the last nine months or so I have realised how much Visual Studio sucks (even with Resharper).  The problem I have with VS is the IDE seems to be focused on everything but writing code and I feel there is only ten percent of it I use and that only does ten percent of what I can do with IntelliJ.  I, like many others, really wish there was an IntelliJ plugin for .NET (in particular C#) or even a whole IntelliC; however, the news on the street is that despite JetBrains hinting rather explicitly that such a thing was in the pipeline, they've shelved the idea (BOOO).

I can't blame JetBrains for wanting to focus their efforts on Resharper: MS developers, as a herd, generally keep to the MS straight and narrow and fear wandering from the vendor's path of least wisdom.  It makes no commercial sense to develop an alternative IDE to Visual Studio when most MS devs won't even bother to look beyond the MS brochure.  As a plugin Resharper is going to hit a bigger market.

This is a shame and a loss for choice: the Java community has healthy competition with its IDEs and plenty of innovation with it but, Borland aside, this culture doesn't exist in the mainstream MS world (hence why Visual Studio's 'refactoring' and 'testing' features are a joke). 

The other issue is that Visual Studio only runs on one platform (Windows) for one vendor (MS funnily enough); from the ground up most Java IDEs support multiple platforms and multiple vendors of the JDK and compilers.  Why is this of any relevance for .NET - who would want to develop it on anything else when it only runs on Windows anyway?  Firstly I don't want to develop on Windows: I find it an awful OS for doing development on and I am far more productive on Linux or OSX (like many I have a preference for a different OS which happens not to be Windows: why should that even be a statement?), secondly I don't necessarily want to run my .NET applications on Windows: after the luxury of working with true multi platform languages such as Java and Ruby I find it really frustrating to have to ditch everything positive about working cross-platform (and that means working on Windows too) and be shoved into a corner where my choice is simply XP or one of the 26 variations of Vista. 

Even though I may not express a preference for working on Windows or with Visual Studio, there are many things about .NET I really like: there are things about the cleanliness of the language and the SDK which trump Java (generics anyone?).  The real advantage Java has over .NET is the community around it and the way Sun has left people free to contribute to it; there is far more innovation coming from the Java community than from the .NET one (hell, the .NET community is just trying to port as many Java projects as quickly as it can).  However the .NET community is growing (especially with the ALT.NET movement) and it isn't too hard to wriggle out of the MS straitjacket.  All in all there's a real feeling of excitement around .NET at the moment despite MS's attempts to lump as many bad practices into it as they can.  As a result .NET is a serious contender to Java, except it falls flat on its face when it comes to cross-platform support.

Well, there may be some hope out there: Mono is coming along nicely (with almost full 2.0 compatibility), comes as part of Gnome (so for Linux it's already there) and there is a version for OSX.  With a bit of support and real-world proof Mono has the potential to become a real Java killer.

The problem is you're not going to be able to compile for Mono in Visual Studio, and even if you could, you can still only run VS on Windows, solving only half the problem.  The other problem is that Mono IDEs suck: MonoDevelop, a sterling effort, is like going back in time 15 years.  SharpDevelop (from which MonoDevelop was originally forked) would be cool but again it only runs on Windows (booooo to you icsharp).

A bit of research and I stumbled across a little IDE from Omniture called X-Develop.  Not only does it run on Windows, OSX and Linux but it can handle C#, J#, Visual Basic and Java, compiling to whatever framework you choose, including Mono.  The word on the blogosphere is that X-Develop is a usable, productive IDE which, despite not being as feature rich as IntelliJ (its refactoring support is weaker) or Visual Studio (UI designers), is a slick, quick, lightweight, cross-platform IDE including all the essentials: basic refactoring (one up on VS), error highlighting (another up on VS), debugging etc.  The good news for fans of other IDEs is that you can choose to map keybindings from Visual Studio, IntelliJ, Eclipse and others.

The downside is you have to pay for it, and at $499 per license it isn't the sort of thing you'd buy on a whim.  Naturally I'd prefer it to be open source, but I'm going to be realistic that people have to make money; though a cut-down open source version, without the designers etc., may be a wise move to get people hooked.  Omniture seem to be offering a serious proposition that no other vendor can match in terms of true cross-platform, cross-framework development and that is a fantastic thing.

The problem Omniture are going to have is appealing to the Java and Linux communities, who do have a preference for free software.  The problem Mono has is that without a decent cross-platform IDE it's not going to gain serious ground.  I think both sides need to reach out and touch each other to ensure Mono gets the real estate it deserves.  I for one would be seriously excited about developing true cross-platform applications in C# on my Mac or Ubuntu.  Hopefully it may be a future arriving quite soon and not just another broken pipe dream.

Monday 17 March 2008

The rise of the Nu-Geeks

At QCon last Thursday Kent Beck gave a keynote on Trends in Agile Development.  There were lots of interesting slides about the rise in tests, quick releases and lots of other agileness, but the most interesting aspect of the talk for me was the rise of the new generation of tech-savvy business professionals.  The old "wizards", detectable by their strange socially inappropriate behaviour, are out, as a generation of Nu-Geeks with social skills like listening, teamwork and emotional intelligence rise to the challenge of making businesses happy.
I've never been one for the old skool geek - I'd go as far as to say I am a NAG (Not A Geek) - and I have often found myself frustrated by people trying to tar me with the brush of demeaning stereotypes.  The most extreme example was a senior manager who put the IT department in the basement, believing we would be more comfortable far from the real world of human interaction (no clients were ever brought to the basement, by the way, except if they were techies themselves), so I identify strongly with this rise.  This is one of the many reasons I was attracted to Agile; listening to Kent Beck reminded me that what I found refreshing about Extreme Programming Explained was the focus on the social side of development, and the reason I signed the Agile Manifesto was the belief in "people over process".

This all supposes that the stereotype ever existed.  The feminist and existentialist Simone de Beauvoir (friend and influencer of Jean-Paul Sartre) argued that stereotyping is a form of subjugation and always done in societies by the group higher in the hierarchy to the group lower in the hierarchy so that the lower group became the “other” and had a false aura of mystery around it.  How accurately does that describe the IT industry with the exception that perhaps there is a bi-directional purpose to the stereotype: one from outside the group to keep the geeks in and the other from inside to keep the women out?

Hopefully as the geeky male image melts faster than the ice caps we will start to see more women join the fray.  Recent news seems to offer a strong promise of this trend.

Girls found computers too macho back before the turn of the millennium, and research blamed the strong male images and metaphors such as pirates, ships and planes as opposed to softer, feminine images (apparently teddy bears and flowers).  Now the tables have turned and research by Tesco has found that girls are more computer savvy than boys.  So as the last bastion of strictly male territory falls, so will the old stereotype of the geek-knight on a white Mac coming to the rescue of the maiden caught by the evil Word dragons.  Unfortunately this will leave a horde of male geeks without any chance of female contact.

The trend is fierce: girls are not only better with computers than boys but are also more prolific with them.  Apparently the growth of social networking and blogging is "almost entirely fuelled by girls".  Gaming is another area which women are conquering, with girl gamers being the fastest growing group in the entertainment industry.  Not only that, but there are more Nintendo DSs sitting snugly in the hands of the fairer sex (54%) than in the soft, unworked hands of the coffee-crunching, light-fearing, cellar-dwelling male geek of old (only 46%).  Nintendo's own research shows that it "continues to reach more women ... a significant percentage of all Touch Generations software buyers are female".  No wonder the new face of Nintendo is Nicole Kidman.

ThoughtWorks is taking on the challenge to help resolve the problem.  My suspicion is that Agile itself may be the greatest weapon in the battle: a development methodology with a more people-centric, reality-focused approach which values the softer skills and attempts to bridge the gulf between writing code and actually creating real things.  It is those real things I am motivated by, and I know that personally, without Agile, I would be struggling to maintain my sanity in the IT industry of old.

Monday 10 March 2008

"I'll show you mine if you show me yours" vs "You tell me and I'll tell you"

Avoiding breaking encapsulation can sometimes be difficult. The most common encapsulation-breaking habit is using field accessors (properties/getters and setters): these break encapsulation because the object loses control over how its information should be interpreted, and a shared language is never built up. Here is an example: in a banking application the system needs to know if an account is overdrawn, and it would be common to do:

if(account.balance < 0) ... do something

This code could be scattered across the system: however, what if a new business rule came in to say that if an account is frozen it shouldn't be marked as overdrawn? All those statements are incorrect and have to be changed. If behaviour had been put on the account it would have been simpler to do:

if(account.is_overdrawn) ... do something
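
For illustration, here is a minimal sketch of the Account owning that rule (the is_frozen flag and balance reader are assumptions made for the sake of the example):

class Account
  def is_overdrawn
    return false if is_frozen   # a frozen account is never reported as overdrawn
    balance < 0
  end
end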

Now the Account is in complete control over what it considers to be overdrawn. This is a simple example and easy to solve, but what if there is an exchange of information between two objects? Breaking encapsulation seems a hard thing to avoid. For example:

if(loan.amount + account.balance < 0) ... do something

This is a classic case of "I'll show you mine if you show me yours": the loan exposes its amount and the account exposes its balance; both objects have lost control of their data. In the real world this would create scenarios of ridiculous bureaucratic proportions. Let's prove this with an application which models a host offering a guest a drink. Using the non-encapsulated approach is the equivalent of having a third party ask both of you for private information and then doing the selection for you. Here it is in code:

class IllShowYouMineIfYouShowMeYours
  def main
    # a third party pulls private data out of both host and guest and decides for them
    host.drinks.eachType do |drinkType|
      if guest.drinks.containsType(drinkType)
        drink = host.drinks.get(drinkType)

        raise AlreadyConsumedException.new if drink.is_consumed

        guest.energy.add(drink.calories)
        host.drinks.remove(drink)
        drink.is_consumed = true
      end
    end
  end
end

A better way to solve this is to use "You tell me and I'll tell you". In the real world the host would ask "what would you like to drink?", the stock reply is "what have you got?", they would then tell you what drinks they have and you'd tell them which drink you want. Neither of you exposes any data: there is no case of the guest rummaging through the host's cupboards, the host can select the drinks themselves and the guest is allowed to decide what they'd like. By telling the Host to offer the drinks, the Host can tell the Guest to choose one and encapsulation isn't broken. Here is the alternative:

class YouTellMeAndIllTellYou
  def main
    drink = host.give_drinks_to(guest)

    if drink.is_consumed
      . . .
    end
  end
end

class Host
  def give_drinks_to(guest)
    drink = drinks.get(guest.which_drink(drinks.getTypes))
  end
end

class Guest
  def which_drink(drinkTypes)
    drinkTypes.each { |drinkType| return drinkType if drinks_i_like.contains(drinkType) }
  end
end

There is room for further encapsulation: the consumption of the drink still relies on outside manipulation; we have another case of "I'll show you mine", but even worse, as it's not only showing it but letting just about anyone touch it too! So let's tell the guest to take the drink and tell the drink to give nutrition to the guest.

class YouTellMeAndIllTellYou
  def main
    drink = host.give_drinks_to(guest)
  end
end

class Host
  def give_drinks_to(guest)
    drink = drinks.get(guest.which_drink(drinks.getTypes))
    guest.give(drink)
    drinks.remove(drink)
  end
end

class Guest
  . . .

  def give(drink)
    # the guest is told to take the drink; the drink decides what consuming it means
    drink.consume(self)
  end

  def increase_energy(increase_by)
    @calories += increase_by
  end
end

class Drink
  def consume(consumer)
    raise AlreadyConsumedException.new if @is_consumed

    consumer.increase_energy(calories)
    @is_consumed = true
  end
end

By encapsulating the behaviour we have given ourselves a number of advantages. The first and most obvious is testing: the encapsulated code is much easier to test as the behaviour of each class can be independently tested and verified. We also stop spread: without those getters and setters it is very difficult to add behaviour centred on those classes in other parts of the application; all behaviour to do with a class is in one place, which means less repetition, fewer bugs and less maintenance. We also support Single Responsibility: if we wanted to change the way the drinks worked (say from a list to a hash) we could do so safely without breaking any code. Lastly we have code which supports polymorphism: for example, if we wanted to add alcoholic drinks to the system, we can polymorphically add a different type of drink which causes a guest to become drunk:

class AlcoholicDrink
  def consume(consumer)
    if @is_consumed
      . . .
    end

    consumer.increase_alcohol_levels(units)
    . . .
  end
end

The guest can also be made polymorphic:

class TeetotalGuest
  def give(drink)
    raise IDontDrinkException.new if drink.is_alcoholic
  end
end

class LightWeight
  def increase_alcohol_levels(units)
    @total_units += units
    spew if @total_units > 3
  end
end

class PartyAnimal
  def increase_alcohol_levels(units)
    @total_units += units
    start_dancing if @total_units > 10
  end
end

All of the above polymorphic behaviour can be easily added without ever changing the code of any consumers; in the non-encapsulated version there would be a nightmare of nested ifs and switch statements which would make even our TeetotalGuest dizzy and want to spew. Or to quote Neal Ford:
"getters/setters != encapsulation. It is now a knee-jerk reaction to automatically create getters and setters for every field. Don't create code on auto-pilot! Create getters and setters only when you need them" (10 Ways to Improve Your Code).

As a final note to readers: please drink responsibly!

Friday 7 March 2008

Balancing Greed with Anemia

I am many things to many people. If I go to the doctor's I am a patient, at work I am an employee, and at home, with my wife, I am a husband. I cannot be all these things at the same time and I need to behave differently with different people. What's more, my doctor isn't interested in me as an employee, my employer cares not for my detailed medical history, and my wife certainly doesn't care what languages I can program in, let alone the content of my blog!

Readers of Domain Driven Design will be familiar with the concept of Aggregates: simply put, an aggregate provides a boundary around concepts which are closely related and provides an entry point to those other entities. If you were modelling people, the Person class would be an aggregate; within it would be other classes and methods to do with my medical state, the languages I can program in, and the package-private methods between my wife and me. Using a behaviour-centric domain model we can write code like this:

class Client
  def run
    john = employeeRepository.find("John")

    if john.is_not_skilled_in(:AnyJava)
      john.train(:BeginnersJavaCourse)
    end

    john.can_do_job(Job.new(:SqlServer, :DotNet))
  end
end

class Person < Skilled
  attr_reader :skills

  def is_skilled_in(skill)
    return false if skills.not_contains(skill.type)

    skills.get(skill.type).is_at_level(skill.level)
  end

  def is_not_skilled_in(skill)
    !is_skilled_in(skill)
  end

  def train(training)
    if skills.not_contains(training.skill)
      skills.add(training.skill)
    else
      skills.get(training.skill).upgrade(training.level)
    end
  end

  def can_do_job(job)
    job.matches_skills(self)
  end
end

class Job
  def matches_skills(skilled)
    skills.each do |skill|
      return false if skilled.is_not_skilled_in(skill)
    end

    true
  end
end

class Test
  def can_be_skilled_in_something
    skills = Skills.new(:BasicJava)
    testPerson = TestPerson.new(skills)

    assert_true(testPerson.is_skilled_in(:BasicJava))
  end

  def can_be_trained_in_something
    testPerson = TestPerson.new(:NoSkills)
    skill = Skill.new(:Java, :Basic)

    testPerson.train(Training.new(skill))
    assert_true(testPerson.is_skilled_in(skill))
  end

  # ... plus all the other tests
end

The problem is that if we placed all of this behaviour and data in one Person class it would quickly get on the big side. This is what I call a Greedy Aggregate: in the same way I don't try to be all things to all people, a class shouldn't try to be all things to all clients.

The biggest criticism I have heard of placing behaviour on domain objects is the problem of Greedy Aggregates: huge classes that end up with bags of methods and test classes as long as your arm. The solution often presented is to move to a Service paradigm instead. I agree with the complaint about Greedy Aggregates but not with the solution, as the service paradigm moves code away from being object orientated and towards procedural. The above in service-orientated code would be:

class Client
  def run
    john = employeeRepository.find("John")

    if employeeService.is_not_skilled_in(john, :Java)
      employeeService.train(john, :BeginnersJavaCourse)
    end

    employeeService.can_do_job(john, Job.new(:SqlServer, :DotNet))
  end
end

class EmployeeService
  attr_reader :skillsRepository

  def is_skilled_in(person, skill)
    return false if skillsRepository.not_contains(person.id, skill.type)

    skillsRepository.get(person.id, skill.type).is_at_level(skill.level)
  end

  def is_not_skilled_in(person, skill)
    !is_skilled_in(person, skill)
  end

  def train(person, training)
    if skillsRepository.not_contains(person.id, training.skill)
      skillsRepository.add(person.id, training.skill)
    else
      skillsRepository.get(person.id, training.skill).upgrade(training.level)
    end
  end

  def can_do_job(person, job)
    job.eachSkill do |skill|
      return false if is_not_skilled_in(person, skill)
    end

    true
  end
end

class Test
  def can_be_skilled_in_something
    testPerson = TestPerson.new
    skill = Skill.new(:Java, :Basic)

    MockRepository {
      expect.contains(testPerson.id, skill.type)
      will.return(true)
      expect.get(testPerson.id, skill.type)
      will.return(Skill.new(:Java, :Basic))
    }

    assert_true(employeeService.is_skilled_in(testPerson, skill))
  end

  def can_be_trained_in_something
    testPerson = TestPerson.new
    skill = Skill.new(:Java, :Basic)

    MockRepository {
      expect.contains(testPerson.id, skill.type)
      will.return(false)
      expect.add(testPerson.id, skill.type)
      expect.get(testPerson.id, skill.type)
      will.return(Skill.new(:Java, :Basic))
    }

    employeeService.train(testPerson, Training.new(skill))
    assert_true(employeeService.is_skilled_in(testPerson, skill))
  end

  # ... plus all the other tests
end

The service client code is more difficult to read: it flows less like a sentence, making the Service the subject rather than John, who has disappeared somewhere in the parentheses. The Service code itself is more difficult to read than the behaviour class, and the service test class is significantly more complex. The service also requires a more complex repository, and the Person class is reduced to being nothing more than a data holder. Another problem with the service-based approach is that you often have to break encapsulation to get it to work and, as a result, you can end up with a lot of repetition in code.


I believe the issue with Greedy Aggregates mainly stems from designing code around O/R mapping tools. To simply model our Person domain we end up with:

  • A Person object which maps to a Person table

  • A Medical History class with a table with a Person_Id foreign key

  • A Skill class with a table with a Person_Id foreign key


This makes it difficult to split the Person into smaller, more specific classes, as the O/R mapper requires a definite class to map to. In the above code we may want to split Person into two classes: one which represents the core aspects of a Person (name, age etc.) and another for their skills - let's call it PersonWithSkills. In many O/R mappers this is difficult because two classes cannot map to the same table, so we cannot create a mapping for PersonWithSkills. However we can create a repository which ties them together:

class PersonWithSkillsRepository
  def get(id)
    person = personRepository.get(id)
    skills = skillsRepository.get(person)
    PersonWithSkills.new(person, skills)
  end
end

This is how I've tended to advocate it in the past, but recently I was playing with Ruby and, using its cool dynamic powers, I approached it in a different way: essentially I merely extended the Person instance using a module, like so:

class Person
  has_many :skills

  def with_skills
    self.extend PersonWithSkills
  end
end

module PersonWithSkills
  def is_skilled_in(skill)
    . . . # all skills-based methods here
  end
end
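
Usage would then look something like this (a minimal sketch; Person.find assumes ActiveRecord is providing the persistence, as the has_many declaration above suggests, and the skill argument follows the earlier examples):

person = Person.find(1)
person.with_skills.is_skilled_in(:BasicJava)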

The only issue with this method is that all of the ActiveRecord declarations have to be on the Person class - there may be a way around this but I don't know Ruby well enough to say for sure. It also has the disadvantage of not working for any classes outside of the core package (though, knowing Ruby, there may be a way around this too). However, if you want to do more complex logic between role-based aggregates, you can. For example:

class Person
  def as_traveller
    self.extend PersonAsTraveller
  end
end

module PersonAsTraveller
  include Locatable
  . . .
end

module TravellingPersonWithSkills
  include Skilled

  def can_do_job(job)
    job.matches_location(self.as_traveller) && job.matches_skills(skills)
  end
end

class Job
  def matches_location(locatable)
    locatable.matches(location)
  end
end

Dynamic languages such as Ruby make it nice and easy to do the above, with the ability to mix methods into an instance. That doesn't mean we can't do something similar in a static language such as Java or C#; it simply means we have to jump through a few more hoops and use delegation instead.
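
For illustration, here is a rough sketch of that delegation approach (shown in Ruby for brevity - in Java or C# PersonWithSkills would be an explicit wrapper class; the method names are assumptions carried over from the examples above):

class PersonWithSkills
  def initialize(person, skills)
    @person = person
    @skills = skills
  end

  # the skills behaviour lives on the wrapper...
  def is_skilled_in(skill)
    return false if @skills.not_contains(skill.type)
    @skills.get(skill.type).is_at_level(skill.level)
  end

  # ...and anything else is explicitly delegated back to the wrapped person
  def name
    @person.name
  end
end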

So now that we are using behaviour-based domain objects, have moved away from services and have managed to cut those Greedy Aggregates down to size, when is it right to use a service? A service needs to be introduced when there is something normal domain objects cannot be trusted to deal with by themselves, requiring some co-ordination or greater authority to do it for them. The service must be stateless. A good example of this is transferring money between accounts: you don't want to leave the accounts to sort it out amongst themselves, so you need a service to deal with the interaction:

class AccountTransferService
  def transfer(account_from, account_to, amount)
    if account_from.has_funds(amount)
      account_from.debit(amount)
      account_to.credit(amount)
    end
  end
end

There are a number of ways of dealing with Greedy Aggregates in both dynamic and static languages. What is important is being able to clearly identify aggregates and not be too restricted by the O/R mapper, nor simply give up and rely on services.

Saturday 19 January 2008

The One Minute Build

The project I am working on has a build which used to take under a minute and has slowly crept up to around one minute forty five seconds. Recently I have begun to feel the pain of this longer build; it is surprising just how much a paltry forty-five seconds increases those periods of sitting and waiting.

The build is an essential part of the code base: it is what gives you, the developer, and the rest of the team feedback on the state of your code. Continuous Integration tells us that we should make changes as small as possible, verify them using the build, then check in (then verify the build on the build server). Under these rules we should probably aim for an optimum integration rate - the frequency of check-ins (per pair). I believe the optimum integration rate should be around once per hour.

A slow build can be a blocker to good CI practice: the longer a build takes to run, the less often people run it, the less they check in and the bigger the change lists they create. Martin Fowler's original article on CI states that "the XP guideline of a ten minute build is perfectly within reason", however I want to contest that guideline and bring it down to a minute. Why? Because the difference between a build that takes under a minute and one that takes over a minute is considerable when trying to keep your change list down.

On average a developer will run three builds per change: the first on the change itself - usually run in the IDE and often with some failing tests (though not always) - the second to verify the change, and the third to verify the update. So the general rhythm goes: make a change, run the tests, fix the broken tests, verify the build on the command line, update, verify the build again, commit. Of course it is possible to make a change and have no tests failing (meaning two builds per change), but even then most devs will run the test suite in the IDE and then verify the full build against the command line.

In the case of a merge conflict (either in code or failing tests on update) we add an extra two builds per conflict: the first to verify the fix, the second to verify the fresh update before check-in.

Three builds per change means that the total amount of time spent building is the build time multiplied by three: any build which takes one minute therefore consumes three minutes of developer productivity. If we check in once every hour that is 3/60, or 5%, of your devs' time spent building. Take the build up to just two minutes and we have 6/60, or 10%. Take a build of ten minutes and we have 30/60, or 50%, of devs' time spent building! If there was a merge conflict on a ten minute build it would consume 50/60, or over 80%, of that hour.
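
As a rough sketch of that arithmetic (in Ruby, using the post's own assumptions of three builds per change and one check-in per hour):

builds_per_change = 3
[1, 2, 10].each do |build_minutes|
  building = builds_per_change * build_minutes   # minutes spent building per check-in
  puts "#{build_minutes} minute build: #{building}/60 = #{(building * 100 / 60.0).round}% of the hour"
end
# prints:
#   1 minute build: 3/60 = 5% of the hour
#   2 minute build: 6/60 = 10% of the hour
#   10 minute build: 30/60 = 50% of the hour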

On a ten minute build a productivity ratio of 1:2 is not acceptable, so to compensate we adjust the optimum integration rate. This is achieved, essentially, by trying to bring the ratio back to an acceptable level: if a 10% loss (equivalent to a two minute build) is acceptable, the check-in rate becomes once every five hours. Comparatively, 10% on a one minute build means a check-in every 30 minutes versus once every five hours!

Decreasing check-ins is a dangerous game: the longer changed code is out of the repository, the more out of date it is. The difference between your repository and your code becomes greater (assuming, of course, that you are making changes over that period) and the difference between your code and your fellow developers' also becomes greater. By checking in frequently the codebases are constantly re-aligned, minimizing the impact of future changes. The most obvious side effect of diverging code bases is that the chance of conflict increases - small, frequent check-ins reduce this to the extent that it can become a non-issue - while big check-ins mean big conflicts, and big conflicts mean loss of productivity, stress, more time your code is out of sync and more times you have to run the build.

To demonstrate this, let's create the concept of a "window of opportunity": this window is the space between doing the update and checking in, and the opportunity is for another pair to check in before you do! The size of this window is equivalent to the length of the build; we all know that if we are going to check in we try to grab this window (how often do we say "hang on a minute, I've just got to run the tests and check this in"?). The reason for this is we don't want to find that someone else has checked in between our last update and the build finishing. The greater the length of the build, the greater the chance that someone else has committed (which increases with the size of the team) and the more times you have to run your build.

There are other side effects to long builds: you can't keep the change list small, and people will find ways around running the build, from only running it once out of the three times (increasing the chance of broken builds) to avoiding full runs by only running the tests they believe to be impacted (again increasing the chance of broken builds). None of these is good practice. Big builds also mean loss of attention: running the build, doing an update, running the build again, then checking in takes just two minutes on a one minute build but twenty on a ten. Two minutes won't disrupt the flow of development and probably works as a nice gap to discuss things with your pair or just give the brain a rest. The ten minute build equals a wait of twenty minutes: enough time to eat lunch! At the end of twenty minutes any focus you had has gone.

A one minute build may sound unrealistic but I believe it is possible. In some ways it is an indication of the healthiness of your code base, and by that I mean both production and test code - poorly written, badly performing test code is every bit as bad as poorly written, badly performing production code. It is very easy to be relaxed about build times, treat them as an inevitability and then watch them slowly go from one minute builds to ten.

It's all very easy to preach about one minute builds when I am in the privileged position of working on a code base that only takes one minute forty-five seconds. The speed of the build is due to the drive, early on in the project, not to tolerate long builds: several times it has been looked at when it started to get a bit overgrown and hacked back down to size. If we had been more relaxed I'm sure it would have crept up a lot higher.

About Me

West Malling, Kent, United Kingdom
I am a ThoughtWorker and general Memeologist living in the UK. I have worked in IT since 2000 on many projects, from public-facing websites in media and e-commerce to rich-client banking applications and corporate intranets. I am passionate about, and committed to, making IT a better world.