Custom Memory Allocator
Building a vector using static memory.
When Leetcoding, I always have to make decision on whether to use
std::vector
orstd::array
. For starters,std::vector
uses heap (dynamic memory allocation) whilestd::array
uses stack (static memory allocation). Accessing static memory is significantly faster than accessing dynamic memory. The drawback is that static memory needs to be pre-initialized, and there is a limit to how much stack memory we can use.“When to use
std::vector
vsstd::array
in LeetCode?” is a question for another post. – The short answer is, we preferstd::array
whenever we would want to use a vector and storing a “small” amount of data.For this post, we focus on how to build a custom memory allocator.
With a custom allocator, we will have a
std::vector
with automatic balancing between stack and heap memory allocation so that I can always usestd::vector
’s interface without having to work with preallocatedstd::array
.
std::vector
always uses heap allocation by default.
See the complete code example for this section here.
Take a look at this example, how many bytes have we allocated and deallocated in total?
std::vector<int> v; //0
v.push_back(1); //1
v.push_back(1); //2
v.push_back(1); //3
v.push_back(1); //4
v.push_back(1); //5
v.push_back(1); //6
At a first glance, we created a vector and pushed 6 int
s into it. It should be 6*sizeof(int)==24
bytes right?
- No, the exact allocation is more complicated because of the vector resize behavior. However, if call
vector::reserve(6)
at the beginning we can avoid the resizing atpush_back
and find that we allocated exactly24 bytes
.- Assuming the vector itself is created on the stack, so
line 0
should takes 0 allocation. - By default,
v
has 0 capacity, so vector has to make space for anint
online 1
, which issizeof(int)==4
bytes. - After
line 2
, we’ll need 8 bytes to store 2int
’s so resize kicks in and expanded the size to be 2. –vector::resize
will first create a new container of size4 bytes
than copy the original2 bytes
in there, and destroy the original2 bytes
. Hence, an allocation of4 bytes
and a deallocation happens here. - Exactly how much the vector resize expands its capacity depends on its resize factor, which is
2
for the implementations I’m testing. - I’ll skip the explanation for
line 3-6
here as they behave accordingly.v.push_back(1); //1 - allocate 4 bytes v.push_back(1); //2 - resize: deallocate 4 bytes and allocate 8 bytes v.push_back(1); //3 - resize: deallocate 8 bytes and allocate 16 bytes v.push_back(1); //4 - 4 ints (16 bytes) can fit in 16 bytes, nothing happens v.push_back(1); //5 - resize: deallocate 16 bytes, and allocate 32 bytes v.push_back(1); //6 - 6 ints (24 bytes) can fit in 32 bytes, nothing happens
To observe the exact allocation, we can override the global
new()
anddelete()
operator like this:void * operator new(size_t size) { cout<< "Allocating " << size << "bytes" << endl; void * p = malloc(size); return p; } void operator delete(void * p) { cout<< "Deallocating" << endl; free(p); }
- Assuming the vector itself is created on the stack, so
Creating a custom allocator.
See the complete code example for this section here.
To create a custom allocator, we need to satisfy the Allocator Completeness requirements.
The most minimum functions we need to provide for a template T
are:
allocate -> T*
deallocate -> void
- copy, copy-assign, move, move-assign constructors do not throw.
operator==
andoperator!=
Obviously, the most important parts are allocate/deallocate
.
In my example
of ShortAlloc
, we called allocate/deallocate
functions from Arena
which manages the memory resource.
- Arena maintains a internal stack
buffer_
a plain old byte array allocated on the stack.- a pointer
ptr_
to the internal stack to note where to allocate new objects into - If the stack memory runs out, we simply use the global
new
operator to allocate from heap. All the buzz about alignment is absolutely required, but they just try to align the allocation so that each new allocation is aligned tostd::max_align_t
for faster cpu cache access.
- a pointer
Prefer std::pmr::polymorphic_allocator
over custom allocator mentioned above.
See the complete code example for this section here.
ShortAlloc
and Arena
can be used for custom memory allocation because they provide the functions required by Allocator Completeness requirements.
The drawback to this – the custom allocator type name becomes part of the vector using it, so we cannot easily use it without knowing the exact allocator type at compile time.
- e.g. if a functions takes in a
std::vector
as input, we cannot pass an input of typevector<DataType, ShortAlloc<...>>
to it asstd::vector
with default allocator is a different type. - The solution is to use
std::pmr::vector
, which achieves run time polymorphism by allowingstd::pmr::vector
using different allocators.
To create a custom type ofstd::pmr::memory_resource
, we implement these interfaces in our implementationMyStackResource
void* do_allocate(std::size_t bytes, std::size_t alignment) override; void do_deallocate(void *p, size_t bytes, size_t alignment) override; bool do_is_equal(const memory_resource &other) const noexcept override;
The implementation detail of
MyStackResource
is almost the same as that ofArena
’s implementation, except for that this time we are aligning objects with it’s actual size instead ofstd::max_align_t
.
Testing performance.
See here to compile and run my entire test.
As mentioned in the Notes section, performance is NOT a most important reason for using custom allocator. – I wanted to create a test scenario to illustrate the point.
Here’s the scenario,
We want to maintain a cache of MANY vectors and each of them is guaranteed to have a SMALL size.
With my custom allocators that uses stack memory, we can avoid having to allocate to heap at runtime, as long as all the vectors and their data can fit into the stack memory.
Here are the results for creating 1562500
vectors with size of 5
.
Allocator Types | Init | Iterate | Destruct |
---|---|---|---|
ShortAlloc | 1 ms | 1 ms | 0 ms |
MyStackResource (pmr) | 2 ms | 1 ms | 0 ms |
std default heap | 39 ms | 2 ms | 25 ms |
Init
is measuring initializing all the vectors.Iterate
is looping through each element in each vector.Destruct
is destroying the vectors when scope ends. Obviously, the std heap allocation had much overhead over static allocation, but again, the scenario is not that realistic.
Notes
- While researching for this blog, I found that performance is not the most important reason for using custom memory allocator. Taming dynamic memory - An introduction to custom allocators in C++ - Andreas Weis - code::dive 2018
- We use custom memory allocators to avoid the common problems of default
std::allocator
like:- Memory coalescing
- Memory fragmentation
- Shared global state – all allocation uses the same allocator.
- e.g. a potential lock on a global allocator.
- For performance reasons, we have to go case by case to evaluate whether a custom allocator (or custom
pmr::memory_resource
) is indeed needed.
- We use custom memory allocators to avoid the common problems of default
- My example of a “Short Vector” is a solution looking for a problem. From this discussion, there are existing non-standard solution like
boost::container::small_vector
that works better if you really need a Small Vector Optimization solution.