Wednesday, February 24, 2010

Enforce Idioms - Get Religion

In the last series of articles we talked about how consistent coding style and great documentation can help a code base stay manageable. Beyond coding style and documentation, I think the most important thing Project 2 could have done would be to clearly define the coding idioms they deemed to be important.

Idioms and coding religion are a great way to boil down the things that the designers, architects and programmers of the original code base deem to be important. Some things are about how code should be written and others are related to business aspects of the project. Often they reflect, at a code level, what the functional specifications of the project are. They all should be addressed as early into the project as possible.

Perhaps you are making a web framework? What are the important characteristics that framework should exhibit? Look at how Django and Rails have defined their design philosophies.

What if you are making an asynchronous communications library? Look at what Ice deems to be important. Compare this to the Twisted library.

What if you are writing a programming language? Then perhaps a whole other set of things is important to you. Consider The Zen of Python.

With just a little thought (and a couple of beers) I assembled this short list of things I view as important considerations when tackling a software project.

  • Low Coupling, High Cohesion
  • Less code
  • Quick development
  • Don't Repeat Yourself
  • Explicit is better than Implicit
  • Consistency
  • Prefer Stateless Classes
  • Separate Logic from Presentation
  • All Code must have Tests
  • Internationalization must be addressed in all cases
  • MVC
  • SOA (RESTful?)
  • Getting Real - https://gettingreal.37signals.com/
  • Convention over Configuration - No XML Situps
  • Principle of Least Surprise
  • Principle of Least Knowledge
  • What are the accepted licenses
  • What are the accepted libraries
  • What are the accepted languages
  • No commented out code
  • Remove dead classes
  • Compile Clean
  • Prefer Data Driven Design
  • Don't treat Exceptions like Booleans
  • Don't Mask Exceptions
  • Know your Exception Hierarchy
  • Prefer Aggregation over Inheritance
  • Prefer Event-driven over Multi-threaded (or vice versa)
  • Prefer Reaping over Explicit Deletion
  • Prefer Immutable Data
  • Prefer Finite State Machines to Boolean Nests
  • No Switch Statements
  • No Singletons (includes Globals)
  • No “Regions” in C#
  • Avoid state / Offload state to the caller
  • Cache everything / Cache nothing
  • Prefer Lazy-loading / Lazy initialization
  • Don't make up new names for accepted practices
Now I'm sure some of you are looking at this list and going “Well, duh!” but don't laugh too soon. You would not believe how many of these common sense coding philosophies are violated daily. If anything, these rules can serve as a training tool for new developers to learn the difference between good and bad code. Reviewing one or two of these during lunch sessions is a great way to ramp up your team's overall coding skills.
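To make one entry from the list concrete, here is a rough sketch of "Prefer Finite State Machines to Boolean Nests". The connection states, the events and the `next_state` function are all hypothetical names invented for this illustration. Instead of juggling flags like `isOpen`, `isConnecting` and `isClosing` in nested ifs, the legal transitions live in one table:

```cpp
#include <map>
#include <utility>

// Hypothetical states and events, invented for this sketch.
enum class State { Disconnected, Connecting, Connected, Closing };
enum class Event { Open, Established, Close, Closed };

// Returns the state that follows `current` when `event` arrives.
// Events that are not legal in the current state leave it unchanged.
State next_state(State current, Event event)
{
    // Every legal transition is listed once, in one place.
    static const std::map<std::pair<State, Event>, State> transitions = {
        {{State::Disconnected, Event::Open},        State::Connecting},
        {{State::Connecting,   Event::Established}, State::Connected},
        {{State::Connecting,   Event::Close},       State::Closing},
        {{State::Connected,    Event::Close},       State::Closing},
        {{State::Closing,      Event::Closed},      State::Disconnected},
    };

    const auto found = transitions.find({current, event});
    return found == transitions.end() ? current : found->second;
}
```

A table like this also leans toward "Prefer Data Driven Design" and avoids a switch statement, so one small idiom can end up supporting several others.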

I'm sure, even without a few beers, you could add a ton of your own rules to this list. The c2.com website is a great place for ideas and discussion.

The question is … how many rules should you set?

The answer is, of course, it depends. If you are running an open source project and are looking to recruit more developers onto your project, then perhaps you don't want too many barriers to adoption. If you are in a corporate environment and give RCS commit privileges immediately to new hires (bad), you might want to set more ground rules.

Just be aware of one important thing: When you add or change a rule, you instantly incur technical debt! If it was previously acceptable to mask exceptions and then you set a project rule that exceptions cannot be masked, you just took on a chunk of technical debt. Your first step in adopting this rule has to be cleaning up the debt. Don't start building your church with broken windows.
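As a minimal sketch of what that cleanup can look like, assume a hypothetical helper that parses a port number. Neither function below comes from a real project; they only contrast the "before" and "after" of adopting a no-masking rule:

```cpp
#include <stdexcept>
#include <string>

// "Before": every failure is masked. The caller cannot tell bad input
// apart from a value that really parsed to 0.
int parse_port_masked(const std::string& text)
{
    try {
        return std::stoi(text);
    } catch (...) {
        return 0;  // exception swallowed; the error is now invisible
    }
}

// "After": the failure reaches the caller with a message that says what
// went wrong, instead of being hidden behind a magic value.
int parse_port(const std::string& text)
{
    try {
        const int port = std::stoi(text);
        if (port < 1 || port > 65535) {
            throw std::out_of_range("port out of range: " + text);
        }
        return port;
    } catch (const std::invalid_argument&) {
        throw std::invalid_argument("not a port number: " + text);
    }
}
```

Every call site that quietly relied on the masked version returning 0 is part of the debt the new rule creates.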

If your programmers can follow and maintain an agreed-upon set of programming idioms, you are likely well on your way to producing a beautiful body of code.

Knowing this sort of thing comes from experience. Next time we will talk about developer skills and what the good ones seem to have in common.

Tuesday, February 09, 2010

Self Documenting Code ...

I hope you read Jacob Kaplan-Moss's article "What to Write" ... because I'm not going to talk a lot more about that topic. He does an amazing job of explaining what users are looking for when they approach a code base as well as what a waste of time embedded documentation is.

My perspective on documenting code builds on Jacob's observations. As I've mentioned previously in Your Code Is The Other Team Member, code should read like a book. Your eyes should be able to scan the source unencumbered to glean the meaning. Don Knuth says code should be written primarily to communicate its purpose to humans [not to make the compilers happy].

In order to read like a book, code should be Self-Documenting. And by self-documenting I do not mean Doxygen or Javadoc. I mean, does the code clearly express its purpose in and of itself?

When you write code you need to constantly ask yourself "Am I sabotaging the readability of this code?"

How can we sabotage the readability of our code?

Don't document the literal code, but rather, the intention of the code

Who should I believe in this snippet of code?
// Do foo if check is True ...
if ( ! check_thing_has_occurred())
{
    foo();
}
You just broke my brain. Now I've got to drill through the logic to find out if the comment or the code is right.
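For contrast, here is a sketch of a comment that records intention and leaves the literal logic to the code. The `ThumbnailCache` type and both function names are hypothetical, made up for this example:

```cpp
#include <string>
#include <vector>

// Hypothetical cache type, used only for illustration.
struct ThumbnailCache {
    std::vector<std::string> entries;
    bool warmed = false;
};

// The name already says what the check means, so it needs no comment.
bool cache_is_warm(const ThumbnailCache& cache)
{
    return cache.warmed;
}

void ensure_cache_ready(ThumbnailCache& cache)
{
    // The first request after startup would otherwise pay the full load
    // cost, so populate the cache before anyone asks for an entry.
    if (!cache_is_warm(cache)) {
        cache.entries.push_back("placeholder");
        cache.warmed = true;
    }
}
```

The comment explains why the branch exists. If someone later flips the condition, the comment does not silently become a lie about what the code does.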

IDE Code Snippet Macros are an invention of the Devil

What value does this comment bring to this method?

/*********************
Detail: Constructor
**********************/
CCPasswordDlg::CCPasswordDlg(CWnd* pParent)
{
...

How about this?

/*********************
Detail: Sets the quality level based on a 1-10 variant
Pram: iQuality - quality setting (1-10)
Date: Sunday, April 06, 2008
By: Joe Developer
Note: Returns VIDEO_CODEC_SUCCESS if successful, VIDEO_CODEC_INCORRECT_PARAMETER otherwise.
**********************/
int CVideoEncoder::SetQuality( int iQuality )
...

So much of this fluff is auto-generated by some keystroke macro from the IDE. It's a complete waste of space. It breaks the flow of the code and creates more places for readers to second-guess themselves. Even worse, it's not even metadata for a doc generator. It's double junk.

Why do we need a date and time stamp in the source? Why do we need the developer's name? All that stuff is in the RCS. Don't duplicate it. Why do we need the /***...***/? Modern IDEs highlight method names perfectly. Isn't the bounds check more useful in the code? Can't we look at the source to see the return types? Sure, if this is a public-facing API, then perhaps the summary, variable information and return types are valid ... but not for internal code. Again, I would much prefer to see a working unit test or tutorial that explains how this method should be called in the context of a proper use case. That gives me far more benefit than an auto-generated html page will ever provide.
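A rough sketch of that alternative, using the SetQuality example from above. The numeric values of the return codes, the `kMinQuality`/`kMaxQuality` constants, the `GetQuality` accessor and the test function are all assumptions added for illustration; the original snippet only shows the method signature and the names of the two return codes.

```cpp
#include <cassert>

// Assumed values: the original only names these two return codes.
enum VideoCodecResult {
    VIDEO_CODEC_SUCCESS = 0,
    VIDEO_CODEC_INCORRECT_PARAMETER = 1
};

class CVideoEncoder {
public:
    // Named limits replace the "(1-10)" note in the comment banner.
    static constexpr int kMinQuality = 1;
    static constexpr int kMaxQuality = 10;

    int SetQuality(int iQuality)
    {
        // The bounds check is the documentation of the valid range.
        if (iQuality < kMinQuality || iQuality > kMaxQuality) {
            return VIDEO_CODEC_INCORRECT_PARAMETER;
        }
        m_iQuality = iQuality;
        return VIDEO_CODEC_SUCCESS;
    }

    int GetQuality() const { return m_iQuality; }

private:
    int m_iQuality = kMinQuality;
};

// A test that doubles as a usage example for the next reader.
void TestSetQualityRejectsOutOfRangeValues()
{
    CVideoEncoder encoder;
    assert(encoder.SetQuality(7) == VIDEO_CODEC_SUCCESS);
    assert(encoder.GetQuality() == 7);
    assert(encoder.SetQuality(0) == VIDEO_CODEC_INCORRECT_PARAMETER);
    assert(encoder.SetQuality(11) == VIDEO_CODEC_INCORRECT_PARAMETER);
    assert(encoder.GetQuality() == 7);  // a rejected value leaves the setting alone
}
```

Everything the banner comment claimed is now either enforced by the code or exercised by the test, so it cannot drift out of date unnoticed.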

This is just the low-hanging fruit. The C2 wiki entry on Self Documenting Code gives an excellent summary of other things you can do to make your code more readable. Think about all of these points before you start sprinkling doc sugar all over your source.

If you focus on a consistent coding style and follow these rules for making your code base self-documenting, you are well on your way to creating a code base that will outlive your team or process. There is one more thing that really makes it timeless ... establish a religion. We'll discuss this next time.

Friday, February 05, 2010

Your Code is the Other Team Member ...

One of my big complaints with the code base of Project 2 was the complete lack of a consistent coding style guide.

I'm of the belief that code should read like a book. You should be able to look at a function, class or method and read it from top to bottom like a story. It should have a clear beginning, middle and end. How would you like to read a book where the formatting was completely inconsistent and you had to struggle with the typesetting to get to the content? That's what happens when source doesn't follow a style guide. Your brain can't move up the semantic ladder because it's mired in syntactic noise.

In order to get to the goal of Collective Code Ownership, everyone needs to see things the same way. Formatting and style is the first step to this goal.

There are many different code style guides already available. Most are for specific languages: C, C++, C#, Java, Python. Some are variations on these for specific libraries or packages: Gnome, Django, Rails, Linux. Others are company-specific style guides like the Google C++ guide and Microsoft Hungarian Notation.

A good coding style should cover more than just brace style, comment form and indent conventions. It should also address things such as:
  • Directory naming
  • File naming
  • Package/Module naming
  • Class, Function and Variable naming (CamelCase? all_lower? AbbrAllowed?)
  • Member variable referencing (m_ vs. _ vs. this.?)
  • Member variable ordering (public, protected, private? Top of class vs. End of class?)
  • Method ordering (alphabetical? public, protected, private?)
  • Import rules and organization (Should all external references be fully qualified? Are blanket includes allowed?)
  • Variable shadowing rules?
  • Versioning standards
  • Copyright and IP ownership notifications
These all need to be addressed! It's very hard to find one standard that will address all of these points for your specific project, so you can expect to roll up your sleeves. Put it on a wiki and have all the developers get notified whenever it changes. Try and pick one that is as close as possible to an existing, widely enforced standard.

With so many definitions it can be a little daunting to pick one. There is one rule you must follow: BE CONSISTENT. While we all have preferences on whether braces should go at the end of the line or the next line, it's far more important to pick one and stick with it.

I saw a great tweet the other day from @sbastn: "I am never going to use the word team unless I really mean it. The word group seems my best replacement"

Your code is your other group member. I don't think you can even think about transitioning from a "group" to a "team" until your code is playing the game too. So, how can you get your group to start considering your code as the other team member?

Many places I've worked won't permit code reviews to start until the code meets the style guide. "Fix it and I'll come back." Personally I like that rule, but it can frustrate some developers. Enforcing code style can be a daunting, laborious, thankless task ... trust me. Your group is on its way to becoming a team when there is no enforcement required ... everyone just does it because they see the value in it. So, following the mantra of "Automate Everything You Can" (and knowing you can't automate everything) ... there are things you can do with respect to code.

Automated Reformatting
Some groups perform a nightly automated reformat of the code base using pretty printers like PyIndent, Jalopy and AStyle. I don't like this approach. If the code changes after you have checked it in, it will look foreign to you when you next look at it. These tools are best used initially to get your legacy code to a good base state.

No Commit Until Compliant
With most revision control systems you can install pre-commit hooks that verify code compliance before letting the code into production. I prefer this approach because it makes the developer think about the code style as part of the check-in checklist ... right up there with making sure the tests are written.
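As a minimal sketch of the kind of check such a hook might run, the function below scans source text for three rules. The rules themselves (no tabs, no trailing whitespace, a maximum line length) and the name `find_style_violations` are assumptions chosen for illustration; a real project would encode whatever its own style guide says.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical rules for illustration: no tab characters, no trailing
// whitespace, and no line longer than max_length characters.
// Returns one message per violation, or an empty list if the text is clean.
std::vector<std::string> find_style_violations(const std::string& source,
                                               std::size_t max_length = 100)
{
    std::vector<std::string> violations;
    std::istringstream lines(source);
    std::string line;
    std::size_t number = 0;

    while (std::getline(lines, line)) {
        ++number;
        const std::string where = "line " + std::to_string(number) + ": ";
        if (line.find('\t') != std::string::npos) {
            violations.push_back(where + "tab character");
        }
        if (!line.empty() && (line.back() == ' ' || line.back() == '\t')) {
            violations.push_back(where + "trailing whitespace");
        }
        if (line.size() > max_length) {
            violations.push_back(where + "longer than " +
                                 std::to_string(max_length) + " characters");
        }
    }
    return violations;
}
```

A hook built around this would run the check over each file in the commit and exit with a non-zero status if any violations come back, which is how a pre-commit hook typically signals that the commit should be rejected.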

Try to Convert the Legacy Codebase in One Fell Swoop
Picking away at a code base to make it match your new style can take forever. You may need to write some custom scripts to help with this (especially when renaming files), but formatting changes are one of the few modifications you can make to a legacy code base where behavior should not change. You have to get into the new code style mindset as quickly as possible and doing this piecemeal only makes it harder.

In summary, establishing and enforcing a coding style on your project should be one of your first steps to fixing a legacy code base. Use automated tools or scripts to help with this project. If you can't automate the effort, ask developers not to make the problem worse and to help clean up the mess when they're in fixing something else.

Another thing your programming style guide should address is the documentation style. We'll go over this next time, but in preparation you should read this great series of articles on the topic by Jacob Kaplan-Moss: What to Write, Technical Style and, for large doc efforts, You Need an Editor.


Tuesday, February 02, 2010

Before sitting down to Eat the Elephant ...

In 'A Tale of Two Code Bases' I described two legacy code bases and how one (Project 1) was a pleasure to work on and the other (Project 2) was a nightmare. I promised to follow up with some hints for what Project 2 could have done to make their code more manageable.

About a year ago I was looking into wind generators and solar devices for my house ... it can get cold in Nova Scotia. The store owner told me "get your house fully insulated first, then we can talk about efficiency". Great advice. The same applies for Code Management. I think "The Joel Test" is a great litmus test for whether your house is insulated enough. You shouldn't be struggling with bad source control/development tools, slow development PCs or perpetually looming deadlines before you go into this. You should also have some degree of buy-in from your superiors that you can regularly dedicate a little time to making your code base better. The intention is not to upset the apple-cart, boil the ocean or reinvent the wheel (choose your analogy) ... but rather Eat the Elephant.

I don't want to rehash topics that have already been exhaustively covered, such as TDD or the particulars of writing effective code in a given programming language. Well, I don't want to rehash them in any great detail. My intention is to make these posts about Effective Code Management and not about software development methodologies. That is, not about requirements gathering, feature definition, scheduling or estimating. I feel strongly that without a solid code management foundation whatever development methodology you use will fail over time.

So, I'll assume you've read one of the hundreds of TDD books out there and at least one of the "Effective [Foo]" books that apply to the programming language you use daily: C++/STL, C#, Java, etc. And, because I'm going to assume you will be working with code that is already in the field, I'll also recommend Michael Feathers' 2004 book "Working Effectively with Legacy Code", which gives some nice guidelines for anyone faced with an elephant-meat diet.

Next time, we'll take our first nibble with a look at coding style guides ...