In Firefox we use a custom allocator, mozjemalloc, based on a rather ancient version of jemalloc. The motivation for using a custom allocator is that it potentially gives us both performance and memory wins. I don’t know the full history, so I’ll let someone else write that up. What I do know is that we use it, and that it behaves differently from system malloc implementations in one rather significant way: minimum alignment.
Why does this matter? Well, it turns out C runtime implementations and/or compilers make some assumptions based on what the minimum allocation size and alignment are. For example, in bug 1181142 we’re looking at a crash on Windows that happens in strcmp. The CRT decided to walk off the end of a page because it was comparing 4 bytes at a time.
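To make the failure mode concrete, here’s a rough sketch of a word-at-a-time strcmp, in the spirit of (but not copied from) what a CRT might do; the function and helper names are mine. An aligned 4-byte load never straddles a page boundary, so over-reading the word that holds the terminating NUL is harmless as long as the string’s storage is at least 4-byte aligned. Hand it a less-aligned allocation sitting at the very end of a page and the load can wander onto an unmapped page:

```c
#include <stdint.h>

/* Sketch of a strcmp that trusts the platform's minimum allocation
 * alignment: it reads 4 bytes at a time without first checking that
 * the pointers really are 4-byte aligned.  (The cast below is also
 * technically a strict-aliasing violation; real implementations do
 * this in assembly or with compiler blessing.) */

static int word_has_zero_byte(uint32_t w)
{
    /* Classic bit trick: true if any byte of w is zero. */
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

int strcmp_assumes_alignment(const char *a, const char *b)
{
    for (;;) {
        uint32_t wa = *(const uint32_t *)a;   /* assumes a is 4-byte aligned */
        uint32_t wb = *(const uint32_t *)b;   /* assumes b is 4-byte aligned */
        if (wa != wb || word_has_zero_byte(wa))
            break;                            /* difference or NUL in this word */
        a += 4;
        b += 4;
    }
    /* Resolve the final word a byte at a time. */
    while (*a == *b && *a != '\0') {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}
```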
Why was it doing that? Because the minimum allocation size is at least 4 bytes, so why not? If you head over to MSDN it’s spelled out somewhat clearly (although older versions of that page lack the specific byte sizes):
A fundamental alignment is an alignment that’s less than or equal to the largest alignment that’s supported by the implementation without an alignment specification. (In Visual C++, this is the alignment that’s required for a double, or 8 bytes. In code that targets 64-bit platforms, it’s 16 bytes.)
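A quick way to see what your toolchain considers the fundamental alignment is C11’s max_align_t; this little probe (my own check, nothing from the Firefox tree) prints the value the quote above is talking about:

```c
#include <stdio.h>
#include <stdalign.h>
#include <stddef.h>

int main(void)
{
    /* alignof(max_align_t) is the strictest alignment malloc must honor
     * without an explicit alignment request; typically 8 on 32-bit
     * targets and 16 on 64-bit targets. */
    printf("fundamental alignment: %zu bytes\n", alignof(max_align_t));
    return 0;
}
```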
We’ve had similar issues on Linux (and maybe OS X); see bug 691003 for more historical details.
As it turns out we’re still not exactly in compliance on Linux, which seems to stipulate 8-byte alignment on 32-bit and 16-byte alignment on 64-bit:
The address of a block returned by malloc or realloc in GNU systems is always a multiple of eight (or sixteen on 64-bit systems).
We haven’t seen a compelling reason to go up to an 8-byte alignment on 32-bit platforms (in the form of crashes), but perhaps that’s due to Linux being such a small percentage of our users.
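If you’re curious what an allocator actually hands back, a throwaway probe like this (test code, nothing more) allocates a few 1-byte blocks and reports the largest power-of-two alignment each returned pointer satisfies:

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void)
{
    void *blocks[8];

    /* Allocate a handful of 1-byte blocks and report how aligned each
     * returned pointer is (lowest set bit of the address). */
    for (int i = 0; i < 8; i++) {
        blocks[i] = malloc(1);
        uintptr_t addr = (uintptr_t)blocks[i];
        uintptr_t align = 1;
        while (addr != 0 && (addr & align) == 0)
            align <<= 1;
        printf("%p -> at least %zu-byte aligned\n", blocks[i], (size_t)align);
    }

    for (int i = 0; i < 8; i++)
        free(blocks[i]);
    return 0;
}
```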
And let’s not forget about OS X, which as far as I can tell has always had a 16-byte alignment minimum. I can’t find where that’s spelled out in bytes, but go bang on malloc and you’ll always get a 16-byte aligned thing. My guess is this is a leftover from the PPC days and AltiVec. From the malloc man page for OS X:
The allocated memory is aligned such that it can be used for any data type, including AltiVec- and SSE-related types.
Again, we haven’t seen crashes pointing to the lack of 16-byte alignment; perhaps that’s because OS X is also a small percentage of our users. On the other hand, maybe this is just an optimization and not an outright requirement.
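The practical consequence of that guarantee is that code which feeds malloc’d memory to 16-byte-aligned vector loads just works. Here’s a contrived x86 SSE example (my own illustration, not anything from the Firefox tree): if malloc only guaranteed 8-byte alignment, the aligned load below could fault.

```c
#include <stdlib.h>
#include <xmmintrin.h>  /* SSE intrinsics (x86 only) */

int main(void)
{
    /* _mm_load_ps/_mm_store_ps require a 16-byte aligned pointer.
     * Using them on plain malloc'd memory is only safe if the
     * allocator's minimum alignment is 16 bytes. */
    float *data = malloc(4 * sizeof(float));
    if (!data)
        return 1;
    data[0] = data[1] = data[2] = data[3] = 1.0f;

    __m128 v = _mm_load_ps(data);   /* aligned load */
    __m128 sum = _mm_add_ps(v, v);
    _mm_store_ps(data, sum);        /* aligned store */

    int ok = (data[0] == 2.0f);
    free(data);
    return ok ? 0 : 1;
}
```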
So what happens when we do the right thing? Odds are fewer crashes, which is good. Maybe more memory usage (ask for a 1-byte thing on 64-bit Windows and you’re going to get a 16-byte thing back), although early testing hasn’t shown a huge impact. Perf-wise there might be a win: with guaranteed minimum sizes we can compare things a bit more quickly (4, 8, or 16 bytes at a time).
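The memory cost is just the usual size-class rounding. A minimal sketch, assuming a flat 16-byte quantum (the real mozjemalloc size classes are more involved than this):

```c
#include <stddef.h>
#include <stdio.h>

/* Every request gets bumped up to the next multiple of the quantum,
 * so a 1-byte request really costs 16 bytes. */
#define QUANTUM ((size_t)16)

static size_t round_to_quantum(size_t size)
{
    if (size == 0)
        size = 1;
    return (size + QUANTUM - 1) & ~(QUANTUM - 1);
}

int main(void)
{
    size_t requests[] = { 1, 8, 16, 17, 100 };

    /* Prints: 1 -> 16, 8 -> 16, 16 -> 16, 17 -> 32, 100 -> 112 */
    for (size_t i = 0; i < sizeof(requests) / sizeof(requests[0]); i++)
        printf("request %3zu -> allocate %3zu\n",
               requests[i], round_to_quantum(requests[i]));
    return 0;
}
```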
Code generally shouldn’t be relying on these facts; I’d consider such a CRT operation a bug in the CRT – nothing requires that a string passed to a CRT function was allocated in the first place. Or even that it’s in the heap. What if the compiler packed it into the last bytes of read-only data space in a page? Or if it came from a driver? Or from a binary library?
Maybe the reason we’re not seeing them on Mac or Linux is that it’s not a bug… which isn’t to say you can’t accidentally write code that’s sensitive to allocation alignments; you certainly can (with some trouble). But in that case the code is the bug, not the allocation minimum size or alignment.
I think the hypothesis is that the compiler knew this was heap-allocated memory and inlined a “fast” version of strcmp because of alignment guarantees. FWIW we did see crashes on Linux (and possibly OS X) and bumped the minimum allocation size in bug 691003.