Jupiter Moonbeam & the Geeks from Cyberspace: April 2007

Friday 27 April 2007

Being more Fluent with Equals

A lot of .NET developers don't realise that there is a difference between the == operator and the Equals method. Even fewer developers realise there is a difference between the == operator and the == operator. Confused? That's because == will behave differently depending on whether you apply it to a reference type or a value type. More confusingly, some .NET classes overload == and will behave differently again. To explain, MS made this attempt:

For predefined value types, the equality operator (==) returns true if the values of its operands are equal, false otherwise. For reference types other than string, == returns true if its two operands refer to the same object. For the string type, == compares the values of the strings.

Except that's not entirely true. The == operator on string only compares values if both operands are typed as string at compile time. Confused even more? Then read this post from the C# FAQ.

So are you clear now? If not then MS sums it up quite nicely in their Guidelines for overloading:

To check for reference equality, use ReferenceEquals. To check for value equality, use Equals or Equals.
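
As a rough sketch of what that advice protects you from, here are the same two strings compared four ways. The class and method names are made up for illustration, and the strings are built at run time so that they are two separate objects rather than one shared literal.

```csharp
using System;

public static class EqualityDemo
{
    // Compares two equal but distinct strings in four different ways.
    public static (bool TypedAsString, bool TypedAsObject, bool ViaEquals, bool SameObject) Compare()
    {
        // Built at run time so s1 and s2 are separate objects on the heap.
        string s1 = new string('a', 3);
        string s2 = new string('a', 3);

        // The same two objects, but with a static type of object.
        object o1 = s1;
        object o2 = s2;

        bool typedAsString = s1 == s2;                     // string's overloaded ==: value comparison
        bool typedAsObject = o1 == o2;                     // object's ==: reference comparison
        bool viaEquals = o1.Equals(o2);                    // virtual Equals: value comparison
        bool sameObject = object.ReferenceEquals(o1, o2);  // explicit identity check

        return (typedAsString, typedAsObject, viaEquals, sameObject);
    }
}
```

Calling EqualityDemo.Compare() gives true, false, true, false: only the two == lines change their meaning with the declared type of the variables.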

So why don't developers have it drummed into them to follow the above advice and just dump ==? Because they believe that == is easier to read. Let's think about it for a moment. How is it easier to read? Using == risks making buggy code and goes against every rule about intention-revealing interfaces and maintaining encapsulation. How the hell do you know what the developer intended when they wrote x == y? Were they checking for value equality or reference equality? Or had they overloaded == to always do value equality (as string does)? Basically you don't know (breaking intent) and you'd have to open up the class to see (breaking encapsulation), and still you won't know for sure. Then of course there is just plain =: did they mean to do that or did they just miss the second =? So == is definitely not easier to read from an intent or encapsulation point of view.

So they must mean that == is better style. This I believe is flawed because sometimes you end up doing a bit of == here and a bit of Equals there, and what's more all objects have Equals but structs do not get == unless they define it themselves. So having a style which prefers == except when == doesn't do the same thing as Equals (or == doesn't even exist) throws all consistency and style out of the window, and you end up with a style guide that says "use == except when or when or when" rather than just placing a total ban on ==.
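
To make the struct point concrete, here is a small sketch with a made-up Money struct that defines no operators of its own. Equals works out of the box; == does not compile at all.

```csharp
using System;

// Hypothetical value type with no operator overloads.
public struct Money
{
    public readonly decimal Amount;

    public Money(decimal amount)
    {
        Amount = amount;
    }
}

public static class StructEqualityDemo
{
    public static bool SameAmount()
    {
        Money a = new Money(5m);
        Money b = new Money(5m);

        // bool broken = a == b;  // does not compile: Money defines no == operator

        // Inherited from ValueType: compares the two structs field by field.
        return a.Equals(b);
    }
}
```

StructEqualityDemo.SameAmount() returns true, so a rule of "always use Equals" covers this case while "always use ==" cannot.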

Then it must just be that == looks better. I think this is just habit. Let's take the following lines:

if(x == y)
{
// do something
}
if(x = y)
{
// do something
}
if(x.Equals(y))
{
// do something
}

If you took a group of people who knew little about development (or C# for that matter) and asked them what each line means, I can guarantee every one of them would get the last line right (they'd probably think that the double equals meant equals twice and that would confuse them on the single one). The Equals method is the most explicit, clear and readable of all of them (it actually reads as x equals y). To further prove my point grab that same person and ask them what this means:

if(x)
{
// do something
}
if(!x)
{
// do something
}

Then ask them what this means:

if(x.Equals(true))
{
// do something
}
if(x.Equals(false))
{
// do something
}

Those second examples look far clearer and you'd be an idiot to not know what the intent was. What's more they make their own mini fluent interfaces. Also you eliminate all those "oh there's a bang at the beginning" bugs. However I think it is fair to say that x.Equals(true) is a bit overkill, though I do find that x.Equals(false) is somewhat clearer than using the logical negation operator.

So after knowing that technically it's the right thing to do, that it's better for showing intent, that it is more consistent, that it reduces the risk of bugs and everything else, if you still need convincing because you still think that == looks better then justify it by saying you're using a fluent interface.

TDD Anti-Patterns

It's a bit old now but I picked up a piece of code from a supplier that had the Generous Leftovers anti-pattern and it reminded me that every developer should have this list printed off and study it regularly.

Check out James Carr's TDD Anti-Patterns.

The Free Ride is probably the most common one and developers need to refer to "one assertion per test" to try and avoid this.

I think The Mockery is quite possibly the hardest anti-pattern to avoid, mainly because we need a leap forward in the way mock libraries work (what I call Quite Mocks). This will allow "one expectation per example". RSpec (used in the tutorial) has its own mock library that supports Quite Mocks by passing a null_object argument. Unfortunately for us .NET developers out there such support doesn't exist. I have requested it on the NMock feature requests board but in the meantime you can create a helper class which uses reflection to stub out all the methods/properties you are not testing.
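
One way such a helper might look on a current runtime is sketched below. This is an assumption on my part rather than anything NMock provides: it leans on System.Reflection.DispatchProxy, which ships with .NET Core and later (not the .NET Framework of 2007), and QuietStub and IAuditLog are hypothetical names.

```csharp
using System;
using System.Reflection;

// Hypothetical helper: implements any interface and quietly swallows every
// call, returning the default value for whatever the member returns.
public class QuietStub : DispatchProxy
{
    public static T For<T>() where T : class
    {
        // T must be an interface; DispatchProxy generates the implementation.
        return DispatchProxy.Create<T, QuietStub>();
    }

    protected override object Invoke(MethodInfo targetMethod, object[] args)
    {
        Type returnType = targetMethod.ReturnType;

        // void methods and reference types: nothing to return.
        if (returnType == typeof(void) || !returnType.IsValueType)
        {
            return null;
        }

        // Value types: return a boxed default (0, false, and so on).
        return Activator.CreateInstance(returnType);
    }
}

// Hypothetical collaborator that the test does not care about.
public interface IAuditLog
{
    int EntryCount { get; }
    string LastMessage { get; }
    void Write(string message);
}
```

With that in place, QuietStub.For<IAuditLog>() gives you a collaborator that accepts any call without complaint, leaving the test free to set a single expectation on the one object it is really about.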

I am planning to 'translate' Dave Astels' article to C# 'cos Ruby can be a bit tough on the eyes if you aren't used to it and, as I mentioned already, you need Quite Mocks (which don't exist) to really get it to work.

In the meantime I made my own contribution to the TDD Anti-Patterns with:

The Mad Hatter’s Tea Party
This is one of those test cases that seems to test a whole party of objects without testing any specific one. This is often found in poorly designed systems that cannot use mocks or stubs and as a result end up testing the state and behaviour of every peripheral object in order to ensure the object under test is working correctly.

Friday 20 April 2007

Sponsor Me

I'm doing the London Marathon this Sunday. If anyone stumbles across this post and wishes to sponsor me and give some money to the breaks4kids charity I'm running for then please do.

Sorry: no more non-geeky activity on this blog (though it's for a good cause).

Tuesday 17 April 2007

Don't override the back button

Have Google developers forgotten number 1 of The Top Ten Web Design Mistakes: namely, don't break the back button?

I was just working on a great post, clicked on preview, found an error, clicked back to go back to the form and accidentally (honest) clicked OK, and now the post I just spent 45 minutes working on is gone because Google have messed with the way the back button works.

I know why this happened: we're in a DHTML/AJAX world now and I didn't really submit anything (as the form gives the appearance of), I just flipped a little JavaScript and switched some styles and layers about. But for the sake of the Web, are we really back to the 1999 world of DHTML abuse? Or could this be the first sign of Google's Microsoftization, and they're now making the standards?

Wednesday 11 April 2007

Whole Values

I read a post the other day that said one of the commonest mistakes made by newbie OOP programmers was to do this:

PostCode postCode = "SW1 1EQ";

Then a spurt of experienced programmers hailed down their fury on this common misconception amongst newbies and how they had to bash out of them that they shouldn't waste their lives abstracting strings.

I also heard an MS qualified trainer tell all who knew and respected him that you should avoid the implicit operator overload at all costs because "if your type were meant to be a string MS would have made it one".

Unfortunately this is one of those places where the MS fan boys have been brought up badly by nasty VB. The concept being so frowned upon by leagues of MVPs is not only a cornerstone of OOP (to mix data and behaviour) but a wonderful concept called Whole Value, and the implicit conversion operator is C#'s gift to you to make it happen.

Thom Lawrence has a lovely short post on how to do whole values with implicit operators here so I won't repeat his good work (though I will add that Martin Fowler recommends using structs) but what I will do is try and explain why whole values are a wonderful thing.

Because a PostCode isn't a Surname
One of the first and most basic things a Whole Value will give you is a level of type safety that you may never have realised existed. Have you ever had that annoying bug pop up in an application because someone accidentally did this:

FindPerson(form.PostCode, form.Surname);
// somewhere else far, far away:
public ReadOnlyCollection<Person> FindPerson(string surname, string postCode);

In this simple example it's pretty obvious you've got it round the wrong way but when you've got a few extra variables to play with it's really easy to get it wrong. Well what if I said there's a way to prevent this ever happening? Use a Whole Value like so:


FindPerson(form.PostCode, form.Surname);

interface Form
{
Surname Surname { get; }
PostCode PostCode { get; }
}
// somewhere else far, far away:
public ReadOnlyCollection<Person> FindPerson(Surname surname, PostCode postCode);

Now when you go to compile you will get an error because type PostCode cannot be converted to type Surname. You'll also find it helps when you do overloading. You can turn nasty code like this:

FindByPostCodeAndSurname(string postCode, string surname);
FindByPostCode(string postCode);
FindBySurname(string surname);

Into this:

Find(PostCode postCode, Surname surname);
Find(PostCode postCode);
Find(Surname surname);

How much cleaner is that? Of course there are other ways to skin that cat but you will still find that those ugly method names disappear (especially in factories etc.).
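
Here is a minimal sketch of the two whole values and the overloads, just enough to see the compiler doing the work. PersonFinder is a hypothetical class and its methods only report which overload was picked; a real finder would return people.

```csharp
using System;

public struct Surname
{
    private readonly string value;

    public Surname(string value) { this.value = value; }

    public override string ToString() { return value; }
}

public struct PostCode
{
    private readonly string value;

    public PostCode(string value) { this.value = value; }

    public override string ToString() { return value; }
}

// Hypothetical finder: each overload just describes itself.
public static class PersonFinder
{
    public static string Find(PostCode postCode, Surname surname)
    {
        return "by post code " + postCode + " and surname " + surname;
    }

    public static string Find(PostCode postCode)
    {
        return "by post code " + postCode;
    }

    public static string Find(Surname surname)
    {
        return "by surname " + surname;
    }
}
```

PersonFinder.Find(new Surname("Smith")) picks the surname overload purely from the argument type, and PersonFinder.Find(new Surname("Smith"), new PostCode("SW1 1EQ")) refuses to compile because no overload takes the arguments in that order.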

Because a PostCode was born of string
The other thing is PostCode will start his life out as a string of some form. It might come from a web form or a database, but somewhere he was made out of a string. This is where the implicit conversion operators come in: we can allow PostCode to easily start out as a string and be handled like a string when he needs to be (because he's gonna need to):

PostCode postCode = Form["PostCode"];

Then somewhere far away:


Parameter["PostCode"] = Address.PostCode;
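
The two assignments above rely on a pair of implicit conversion operators. A bare-bones sketch, with no validation and a private field name of my own choosing, might look like this:

```csharp
public struct PostCode
{
    private readonly string value;

    private PostCode(string value)
    {
        this.value = value;
    }

    // string -> PostCode, so a form field can be assigned straight in.
    public static implicit operator PostCode(string value)
    {
        return new PostCode(value);
    }

    // PostCode -> string, so he can be handed to anything that wants text.
    public static implicit operator string(PostCode postCode)
    {
        return postCode.value;
    }
}
```

Note that the compiler will not chain two user-defined conversions together, so even if Surname has the same pair of operators a PostCode still cannot be passed where a Surname is expected.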

Because a PostCode isn't a string
The other thing is PostCode isn't a string. Sure somewhere he starts life as a string and somewhere you've got to have a string with the real post code in it, but somewhere even further down that ain't a string at all, it's a char array, and somewhere further down... The point of OOP is to abstract real world things and encapsulate them, and if you let PostCode wander around your system as a string he's never gonna reach his full potential (and he might just wander where he shouldn't). All the other bigger grown up objects are going to have to do everything for him: deciding whether he's valid, chopping him up to find out what his area code is, comparing him to other postcodes to see if they're in the same area. The poor old postcode will never reach his potential and instead will be pushed and shoved around by all the bigger boys.

You are a cruel, cruel programmer to let this happen: you are just as bad as those parents who never give their children any responsibility and then moan at them for being incapable of doing anything for themselves. But there is still hope: give your PostCode some responsibility and start by making him a Whole Value:

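struct PostCode
{
// Property types are a guess: each part is assumed to be a whole value of its own.
public Area Area { get; }
public District District { get; }
public Sector Sector { get; }
public InwardCode InwardCode { get; }
public OutwardCode OutwardCode { get; }
}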

Doesn't that look better? Now instead of this:

string postCode = "SQ1 1EQ";
if(LondonPostCodes.Contains(postCode.Substring... yuk I can't go on!

You can do something beautiful like this:

PostCode postCode = "SQ1 1EQ";
if(LondonPostCodes.Contains(postCode.Area)) ...

This of course goes even further because Area is a whole value too and you may decide that it should know what city it is. So the code now becomes:

if(postCode.Area.City.Equals(City.London))

Now PostCode can take all of that nasty horrible code that all the bigger boys had (and probably duplicated) and deal with it himself.
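
For a feel of where that chopping-up code ends up, here is a cut-down sketch. It assumes the text arrives in the usual "OUTWARD INWARD" shape with a single space, does no validation, and returns the parts as plain strings rather than as whole values of their own to keep the example short.

```csharp
using System;

public struct PostCode
{
    private readonly string outward;
    private readonly string inward;

    private PostCode(string text)
    {
        // Assumes exactly one space between the two halves, e.g. "SW1 1EQ".
        string[] parts = text.Trim().ToUpperInvariant().Split(' ');
        outward = parts[0];
        inward = parts[1];
    }

    public static implicit operator PostCode(string text)
    {
        return new PostCode(text);
    }

    public string OutwardCode { get { return outward; } }

    public string InwardCode { get { return inward; } }

    // The area is taken to be the run of letters at the start of the outward code.
    public string Area
    {
        get
        {
            int length = 0;
            while (length < outward.Length && char.IsLetter(outward[length]))
            {
                length++;
            }
            return outward.Substring(0, length);
        }
    }
}
```

Given PostCode postCode = "SW1 1EQ", postCode.Area is "SW", postCode.OutwardCode is "SW1" and postCode.InwardCode is "1EQ", and nobody else in the system needs to know how that was worked out.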

Because a PostCode should be a legitimate PostCode
Validation is also a good responsibility of a whole value so you can make the thing blow up if you try and put something bad in it (just the same as a DateTime will). For extra safety you can add Parse and TryParse methods to your Whole Value (I have them as standard).
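
A sketch of what Parse and TryParse could look like is below. The pattern is a deliberately simplified assumption (one or two letters, a digit, an optional digit or letter, a space, a digit and two letters); the real UK rules have more cases than this.

```csharp
using System;
using System.Text.RegularExpressions;

public struct PostCode
{
    // Simplified, assumed shape of a UK post code; not the full official rules.
    private static readonly Regex Pattern =
        new Regex(@"^[A-Z]{1,2}[0-9][0-9A-Z]? [0-9][A-Z]{2}$");

    private readonly string value;

    private PostCode(string value)
    {
        this.value = value;
    }

    // Returns false instead of throwing when the text is not a post code.
    public static bool TryParse(string text, out PostCode result)
    {
        string candidate = (text ?? "").Trim().ToUpperInvariant();
        if (Pattern.IsMatch(candidate))
        {
            result = new PostCode(candidate);
            return true;
        }

        result = default(PostCode);
        return false;
    }

    // Blows up on bad input, in the same spirit as DateTime.Parse.
    public static PostCode Parse(string text)
    {
        PostCode result;
        if (!TryParse(text, out result))
        {
            throw new FormatException("Not a valid post code: " + text);
        }
        return result;
    }

    public override string ToString()
    {
        return value;
    }
}
```

PostCode.Parse("sw1 1eq") yields a PostCode that prints as SW1 1EQ, while PostCode.Parse("hello") throws a FormatException and PostCode.TryParse("hello", out postCode) simply returns false.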

So not only does your code become more type safe, more powerful and flexible but it also becomes more stable. No longer does every other object have to keep checking whether the postcode is in good shape or reference some nasty function library to find out the area; our little PostCode string has grown up into a real object at last and can now go out into the big wide world knowing he can shoulder the responsibility of keeping himself valid and answer all the questions people want to know of him.

So now whenever you see a simple type like a string or an int sitting on the outside of one of your classes take a close look at it and ask yourself what could have been if you'd only let it become a whole value.
