About thirty minutes after I finished the last post on the SHA-1 implementation I got to the part that required an MD4 implementation and started to get right to work.

Coding Like it’s 1999

If you’re not already aware, MD4 is a Message Digest algorithm. The way they work is that they take in a set of bytes and return a constant amount of different bytes (the hash). When two sets of bytes result in the same hash, that’s referred to as a collision. It’s expected that there will be collisions to some degree, especially when you’re hashing large files down to 16-20 bytes. The problem is when collisions become predictable and controllable. Even with it’s known issues, SHA-1 still gets around. MD4 less so. Known and efficient attacks on MD4 have been around long enough that the algorithm tends to be avoided. Unfortunately that also means there are a lot less reference implementations.

The false start

While implementing the SHA-1 digest, it was relatively easy to take the pseudocode from wikipedia and get it running with minimal conversion changes. So with no basic pseudocode representation I started with the C reference included in the RFC. This was a terrible idea. Pointers, global variables, assumptions on the processor’s endianness. By the time I got it compiled it was a giant nightmarish mess that was not working and I couldn’t understand enough to debug it.

At that point it was a lot easier to wipe everything out and just implement it from the actual RFC description. Coming from it at that angle, it was a lot clearer what code could be cribbed directly from the previous implementation. There were a few typos by the time I finished, but by swapping in reference code from the Apache Mina project I was able to get a working version without much further anguish.

Differences in implementation

This version didn’t make use of the Bytes class from the SHA-1 implementation as I’d been using Arrays all over the place to try to keep in line with the C version at first.

This is also my first time having a genuine use for a partial function (roundX), although that was added after everything was already working.