Hungarian Notation, the Original

If you’ve been programming on Windows for some time, you’ll know about Hungarian Notation. It’s a convention where you prepend a few characters to every variable name to indicate the variable’s type. For instance, the name of an int variable would be prefixed with “i”, a pointer with “p”, a pointer to an integer with “pi”, etc. Today this practice is widely frowned upon, even at Microsoft, because it’s mostly useless and error-prone. I was surprised, however, to learn that this usage is a corruption of Hungarian Notation’s original purpose.

Here’s a nice excerpt from Making Wrong Code Look Wrong (Joel on Software):

Apps Hungarian was extremely valuable, especially in the days of C programming where the compiler didn’t provide a very useful type system.

But then something kind of wrong happened.

The dark side took over Hungarian Notation.

Nobody seems to know why or how, but it appears that the documentation writers on the Windows team inadvertently invented what came to be known as Systems Hungarian.

Somebody, somewhere, read Simonyi’s paper, where he used the word “type,” and thought he meant type, like class, like in a type system, like the type checking that the compiler does. He did not. He explained very carefully exactly what he meant by the word “type,” but it didn’t help. The damage was done.

Apps Hungarian had very useful, meaningful prefixes like “ix” to mean an index into an array, “c” to mean a count, “d” to mean the difference between two numbers (for example “dx” meant “width”), and so forth.

Systems Hungarian had far less useful prefixes like “l” for long and “ul” for “unsigned long” and “dw” for double word, which is, actually, uh, an unsigned long. In Systems Hungarian, the only thing that the prefix told you was the actual data type of the variable.

In short, Apps Hungarian helps you tell apart data that the compiler lets you mix, because it has the same type, but that you shouldn’t mix. It works much like writing the units next to the numbers in a formula: a mismatch becomes obvious.
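To make this concrete, here is a small PHP sketch using the prefixes from the excerpt. The functions last_item() and width() are hypothetical, and the code assumes PHP 8 (for the mixed return type).

```php
<?php
// Apps Hungarian sketch: the prefix records what an int means, not its type.
//   c  = a count of items
//   ix = an index into an array
//   dx = a difference between two x coordinates (a width)
// PHP sees all of these as plain ints, so only the names tell them apart.

// Assumes $items is a list (sequential integer keys starting at 0).
function last_item(array $items): mixed
{
    $cItems = count($items);
    if ($cItems === 0) {
        return null;
    }
    // Turning a count into an index: the "- 1" is expected here.
    $ixLast = $cItems - 1;
    // Writing `$ixLast = $cItems;` would look wrong: a count assigned to an index.
    return $items[$ixLast];
}

function width(int $xLeft, int $xRight): int
{
    // Subtracting two x coordinates yields a dx (a width).
    // Assigning a plain x value to $dx would look wrong at a glance.
    $dx = $xRight - $xLeft;
    return $dx;
}
```

Both mistakes hinted at in the comments would run without complaint; the naming convention is the only thing that makes them stand out.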

It’s not just for math though.

PHP Markdown uses a similar approach

While I’m not sure a suffix counts as Hungarian Notation, many string variables inside PHP Markdown get one: the _re suffix on a variable denotes a whole or partial regular expression. This helps ensure that no character passes unescaped to the regular expression parser.

The net result is that errors are easy to spot. For instance, if I see this in PHP Markdown, I know there is a problem with the $token variable, which gets embedded in a regular expression without being converted to one (notice the absence of a _re suffix):

preg_match('/^(.*?[^`])' . $token . '(?!`)(.*)$/sm', $str, $matches);

In this case, the solution is to call preg_quote on the token, which escapes any regular expression special characters so the result can safely be embedded in the pattern.
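A minimal sketch of the fix, wrapped in a hypothetical split_on_token() function so it can run on its own. The regular expression is the one quoted above; the function name and its return shape are assumptions for illustration, not PHP Markdown’s actual code.

```php
<?php
// Hypothetical helper: find the earliest $token that is preceded by a
// non-backtick character and not followed by a backtick, and return the
// text before and after it as [before, after], or null if there is no match.
// Naming convention: variables ending in _re hold regular expression source;
// all other strings are plain text and must be escaped before embedding.
function split_on_token(string $str, string $token): ?array
{
    // preg_quote() escapes regex special characters; passing '/' also
    // escapes the pattern delimiter used below.
    $token_re = preg_quote($token, '/');

    if (preg_match('/^(.*?[^`])' . $token_re . '(?!`)(.*)$/sm', $str, $matches)) {
        return [$matches[1], $matches[2]];
    }
    return null;
}
```

With the escaping in place, split_on_token('xay', '.') returns null because the dot only matches a literal dot; the unescaped pattern would have matched, treating the dot as a wildcard.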

It’s interesting to note that in languages that let you define your own types and overload operators, such as C++, C# and D, you could instead create distinct types and have the compiler flag any mismatch for you. However, the effort of defining new types (and all their interactions) may not always be worth it compared to a simple naming convention.
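PHP has no operator overloading and enforces class type declarations when a function is called rather than at compile time, so it can only approximate this. Still, a hypothetical RegexFragment class shows the idea: plain text can only become regular expression source by going through preg_quote(). The class and function names are illustrative, and the code assumes PHP 8 (for constructor property promotion).

```php
<?php
// Hypothetical wrapper: a string known to be valid regular expression source.
final class RegexFragment
{
    // Private constructor: fragments can only be built through the factories below.
    private function __construct(private string $source) {}

    // Plain text is escaped, so none of its characters act as regex syntax.
    public static function fromText(string $text): self
    {
        return new self(preg_quote($text, '/'));
    }

    // Hand-written regex source is trusted as-is (the caller vouches for it).
    public static function fromPattern(string $pattern): self
    {
        return new self($pattern);
    }

    // Concatenating two fragments yields another fragment.
    public function concat(RegexFragment $other): self
    {
        return new self($this->source . $other->source);
    }

    public function source(): string
    {
        return $this->source;
    }
}

// Accepts only a RegexFragment: passing a raw string throws a TypeError.
function fragment_matches(RegexFragment $pattern_re, string $subject): bool
{
    return preg_match('/' . $pattern_re->source() . '/s', $subject) === 1;
}
```

For example, fragment_matches('a.b', 'axb') fails with a TypeError instead of silently treating the dot as a wildcard. Whether that safety is worth the extra class is exactly the trade-off described above.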


Comments

Ronald Landheer-Cieslak

I can only disagree with you on this: Hungarian notation is dangerous and makes your code lie to you. Joel, who has some interesting things to say, is regrettably misguided about this and, if you’re starting to use some form of Hungarian, so are you.

I’ve seen (a lot of) your code and I haven’t known you to use Hungarian. I’d advise you not to start now.

See http://landheer-cieslak.com/wordpress/?p=158

Michel Fortin

Hi Ronald.

I can’t say I fancy the one- or two-letter prefixes. I tend to keep things more readable than that.

The real problem with Apps Hungarian is that names can sometimes become cryptic. But a variable’s name should tell what the variable contains; whether it’s colSelected or selectedColumn isn’t very important to me, as long as it’s clear and the notation is used consistently.

The example from Joel that you deconstruct on your blog, using “us” for “unsafe”, is a really bad one if you ask me. Defining a variable by what it is not (“safe”), not to mention the fuzziness of that definition (safe for what?), is certainly not a good idea. It’s better to prefix the so-called “safe” variables with “html” to express that they contain HTML-formatted strings. And if you really want to be safe, build your own “html string” type.

Ronald Landheer-Cieslak

Hi Michel,

Once again, we find ourselves agreeing with each other; ain’t life grand. Code should be easy to read: it should be as close to spoken English as possible, but no closer. That’s also why I’d prefer selected_column to col_selected, though the latter would be easier to read for a francophone (colonne_selectionnee).

I also agree that Joel’s “safe” vs. “unsafe” leaves a lot to be desired and that, if you want real safety, you should build it into your type system, so an “html string” would be the way to go if at all possible. That would be very similar to the “sanitized” string container I spoke about on my blog, but those are details.

Like I said, I’ve seen your code before and I know you tend to keep it readable. Someone might read your blog and think otherwise, though :)

Have fun!

rlc


© 2003–2024 Michel Fortin.