What is the difference between Stack and Heap in terms of memory allocation? When should we use which to allocate our memory?

This is an extremely important distinction to understand. We first discuss the difference while ignoring virtual memory, then discuss how the answer changes in the presence of virtual memory.

Stack allocation occurs when the programmer defines local variables. For instance:

void f() {
    int a;  // a could be allocated on the stack
    char b[65536];  // b is allocated on the stack
    ....
}

In this case, the variable a is allocated on the stack. However, unless the programmer takes the address of a, as in &a, it's likely that no stack space is used for this variable - its value is kept in a register, where it can be accessed the fastest.

However, the compiler reserves space for the array b on the stack frame of the function. When the function is called, this stack frame is allocated. Allocating a stack frame is done by ``claiming'' the space, which involves decrementing the stack pointer:

	subq	$65424, %rsp

Note that's it - no function call, not even an instruction that writes anything to memory - stack allocation happens entirely inside the CPU by adjusting the value of a register.

When the function returns, the compiler inserts code to deallocate the memory so allocated by adjusting the stack pointer upward:

	addq	$65424, %rsp
	ret

By contrast, heap allocation involves calling into the user-level memory allocator:

void f() {
    char *b = malloc(65536);    // b refers to heap-allocated space
    ....
}

which results in

	movl	$65536, %edi
	call	malloc

Then, the user-level allocator's logic is executed as you are implementing it in your project: the allocator will try to find a suitable block on the heap, possibly extending the heap if necessary, update its data structures, etc. etc. This is a cost that can easily be tens or hundreds of cycles, depending on which path is taken.

From the discussion so far, it follows that stack allocation/deallocation is vastly faster in terms of throughput than heap allocation/deallocation.

Because stack-allocated memory does not survive when the function performing the allocation returns, pointers to such memory cannot be stored in any places where they could be dereferenced later. Hence some situations do require heap allocation.

Another drawback of stack allocation is that it is strictly nested. Local variables are not deallocated until a function returns, which does not happen until after all the functions called by that function have returned. This can introduce lag which can be fixed by splitting the code into separate functions.

Adding Virtual Memory§

Except in small embedded devices, most commonly used environments exploit demand-paged virtual memory. In these environments, the OS provides each process with its own virtual address space. Address spaces consist of typically 32 to about 1000 consecutive ranges of virtual addresses, each of which is divided into pages. Page sizes are typically 4KB, but where appropriate 2MB or 1GB pages may be used.

Each page is either present or not. If a page is present, then the OS has allocated physical memory, and there's an active page table entry available to the MMU. When the page is accessed, this PTE is cached in the TLB. Only memory in present pages can be accessed.

Physical pages are allocated lazily by the OS. In the case of stack allocated pages, the allocation step addq $65424, %rsp does not allocate any memory. Only subsequent accesses do (and they cause page faults). Similarly, in the case of heap allocation, the first access to a page will cause a page fault - typically, this may be an access to update free lists, or read (follow) list pointers.

For both stack and heap allocation, when large, multi-page ranges are allocated (such as the 64KB array of chars in the example), the pages spanning the payload will not be paged in until elements of the array are being accessed.

Therefore, from a virtual memory perspective, the underlying cost for both is very similar.

Other Considerations§

To defend against inadvertent infinite recursion, most common environments impose a limit on how much virtual address space can be allocated on the stack. Run ulimit -s to see the limit in KB; typical values are 8192, corresponding to about 8MB for the main thread. When writing code that makes extensive use of stack allocation you may need to increase this limit.

Finally, for applications where you have a choice between using recursive approaches (which consume more stack space) and iterative approaches (which consume less stack space but may require more heap accesses), careful benchmarking and engineering is necessary, rather than assuming on is better than the other. See Engineering DFS-based Graph Algorithms by Mehlhorn, Naeher, and Sanders for a comparison for DFS on graphs.