Eric Rahm

Minimum alignment of allocation across platforms

In Firefox we use a custom allocator, mozjemalloc, based on a rather ancient version of jemalloc. The motivation for using a custom allocator is that it potentially gives us both performance and memory wins. I don’t know the full history, so I’ll let someone else write that up. What I do know is that we use it and it behaves a bit differently than system malloc implementations in a rather significant way: minimum alignment.

Why does this matter? Well it turns out C runtime implementations and/or compilers make some assumptions based on what the minimum allocation size and alignment is. For example in bug 1181142 we’re looking at a crash on Windows that happens in strcmp. The CRT decided to walk off the end of a page because it was comparing 4 bytes at a time.

Crossing the page boundary.

Why was it doing that? Because the minimum allocation size is at least 4-bytes, so why not? If you head over to MSDN it’s spelled out somewhat clearly (although older versions of that page lack the specific byte sizes):

A fundamental alignment is an alignment that’s less than or equal to the largest alignment that’s supported by the implementation without an alignment specification. (In Visual C++, this is the alignment that’s required for a double, or 8 bytes. In code that targets 64-bit platforms, it’s 16 bytes.)

We’ve had similar issues on Linux (and maybe OS X), see bug 691003 for more historical details.

As it turns out we’re still not exactly in compliance in Linux which seems to stipulate 8-byte alignment on 32-bit and 16-byte alignment on 64-bit:

The address of a block returned by malloc or realloc in GNU systems is always a multiple of eight (or sixteen on 64-bit systems).

We haven’t seen a compelling reason to go up to a 8-byte alignment on 32-bit platforms (in the form of crashes) but perhaps that’s due to Linux being such a small percentage of our users.

And lets not forget about OS X, which as far as I can tell has always had a 16-byte alignment minimum. I can’t find where that’s spelled out in bytes, but go bang on malloc and you’ll always get a 16-byte aligned thing. My guess is this is a leftover from the PPC days and altivec. From the malloc man page for OS X:

The allocated memory is aligned such that it can be used for any data type, including AltiVec- and SSE-related types.

Again we haven’t seen crashes pointing to the lack of 16-byte alignment, again perhaps that’s because OS X is also a small percentage of our users. On the other hand maybe this is just an optimization but not an outright requirement.

So what happens when we do the right thing? Odds are less crashes which is good. Maybe more memory usage (you ask for a 1-byte thing on 64-bit Windows you’re going to get a 16-byte thing back), although early testing hasn’t shown a huge impact. Perf-wise there might be a win, with guaranteed minimum sizes we can compare things a bit quicker (4, 8, 16 bytes at a time).