Memory Map of a Linux Process

In this post we will explore how memory addresses are mapped for a process. As we know, a process is a running program: a set of instructions grouped together to perform a certain task. The task could be anything from multiplying two numbers to running face recognition software.

Let’s take a simple program which prints a string to the console.

#include <iostream>
using namespace std;
int main() {
    cout<<"welcome to codeexpose.com"<<endl;
}

Let’s compile the code using g++ (the C++ driver of GCC) and try to understand the file it generates. The output file starts with an ELF header; ELF stands for Executable and Linkable Format. The header provides metadata about the machine code, such as whether it targets a 32-bit or 64-bit architecture, whether its data is little or big endian, and so on.

How to see the ELF header?

We can use the readelf tool, which is part of binutils and is available on most Ubuntu machines. It dumps information about the binary such as the ELF header, program headers, section headers, symbol table, etc.

Run readelf -a <binary> and check the top section of its output (readelf -h <binary> prints only the ELF header). The screenshot below shows the header of my simple program.

Tools such as readelf, and the kernel's program loader, work out fields like class and data from the identification bytes at the start of the file. readelf labels these 16 bytes "Magic". In fact, if we dump the binary using hexdump, we can see them in the first 16 bytes of the file. So let’s understand what the leading bytes represent.

“7f” is the first byte of the file; together with the next three bytes it forms the 4-byte ELF magic number.

“45 4c 46” converted from hexadecimal to ASCII gives “ELF”.

The next byte is the class: “02” means a 64-bit object and “01” means a 32-bit object.

The next byte is the data encoding: “01” means little endian and “02” means big endian. The loader and tools use it to interpret the multi-byte fields in the rest of the file.
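
The byte-by-byte reading above can be sketched in a few lines of C++. This is only an illustration: the names ElfIdent, decode_ident and read_ident are made up for this post, and the code looks at the first six identification bytes only (magic, class, data encoding).

```cpp
#include <array>
#include <cstddef>
#include <fstream>
#include <string>

// Human-readable summary of the first ELF identification bytes.
struct ElfIdent {
    bool is_elf = false;        // true when the file starts with 7f 45 4c 46
    std::string elf_class;      // "32-bit", "64-bit" or "unknown"
    std::string data_encoding;  // "little endian", "big endian" or "unknown"
};

// Decode bytes 0..5 of an ELF file; `n` is how many bytes `bytes` holds.
ElfIdent decode_ident(const unsigned char* bytes, std::size_t n) {
    ElfIdent id;
    if (n < 6) return id;
    id.is_elf = bytes[0] == 0x7f && bytes[1] == 'E' &&
                bytes[2] == 'L' && bytes[3] == 'F';
    if (!id.is_elf) return id;
    id.elf_class = bytes[4] == 1 ? "32-bit"
                 : bytes[4] == 2 ? "64-bit" : "unknown";
    id.data_encoding = bytes[5] == 1 ? "little endian"
                     : bytes[5] == 2 ? "big endian" : "unknown";
    return id;
}

// Read up to 16 bytes from the start of the file at `path` and decode them.
// A file that cannot be opened yields is_elf == false.
ElfIdent read_ident(const std::string& path) {
    std::array<unsigned char, 16> buf{};
    std::ifstream in(path, std::ios::binary);
    in.read(reinterpret_cast<char*>(buf.data()), buf.size());
    return decode_ident(buf.data(), static_cast<std::size_t>(in.gcount()));
}
```

Calling read_ident on the compiled binary should agree with the Class and Data lines that readelf prints for the same file.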


Enough about the header; let’s see how memory is mapped for a process. Since the simple program exits quickly, we won’t be able to view its memory maps, so let’s add a sleep to the program to keep it alive in memory.

sleep(500); // keeps the process alive for 500 seconds; needs #include <unistd.h>

Once the binary is running, we need to find the pid of the process. We can use either ps or pgrep. pgrep is easier to use and prints only the pid, so run pgrep <binary> to find it. Using the pid, we can see the memory maps of the process. On Linux they are available at /proc/<pid>/maps. The screenshot below shows the memory map of my simple program.
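
The same file can also be read from code. The sketch below is a minimal illustration, not part of the original program; maps_path and read_lines are hypothetical helper names, and the only thing taken from the text above is the /proc/<pid>/maps location.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Build the path of the maps file for a pid, e.g. /proc/1234/maps.
std::string maps_path(long pid) {
    return "/proc/" + std::to_string(pid) + "/maps";
}

// Return every line of a text file; empty if the file cannot be opened.
std::vector<std::string> read_lines(const std::string& path) {
    std::vector<std::string> lines;
    std::ifstream in(path);
    for (std::string line; std::getline(in, line);) {
        lines.push_back(line);
    }
    return lines;
}
```

With the pid reported by pgrep, read_lines(maps_path(pid)) returns one string per mapping. A process can inspect its own map by reading /proc/self/maps instead.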

As you can see, the address ranges mapped for each section are:

Heap                              01d12000-01d44000

Stack                             7ffcd025b000-7ffcd027c000

Text (code) section               00400000-00401000

Read-only data section            00600000-00601000

Writable (global) data section    00601000-00602000
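
Each range is written as start-end in hexadecimal, and the end address is one past the last byte, so the size of a region is simply end minus start. The sketch below works that out; Range and parse_range are names invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

struct Range {
    std::uint64_t start = 0;
    std::uint64_t end = 0;   // one past the last byte of the region
    std::uint64_t size() const { return end - start; }
};

// Parse "start-end" where both numbers are hexadecimal without a 0x prefix.
// Returns an empty range when there is no dash in the text.
Range parse_range(const std::string& text) {
    Range r;
    std::size_t dash = text.find('-');
    if (dash == std::string::npos) return r;
    r.start = std::stoull(text.substr(0, dash), nullptr, 16);
    r.end = std::stoull(text.substr(dash + 1), nullptr, 16);
    return r;
}
```

For the ranges listed above this gives 0x32000 bytes (200 KiB) for the heap, 0x21000 bytes (132 KiB) for the stack and 0x1000 bytes (4 KiB, one page) for the text section.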

You may be wondering how to tell the text section from the data sections. This can be deduced from the permissions of each mapping. If it has read and execute (r-xp), it is the text (code) section. If it has only read (r--p), it is read-only data such as constants. If it has read and write (rw-p), it is the modifiable global data section. The trailing p marks the mapping as private (copy-on-write) rather than shared.
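
The permission rule can be written down as a small function. This is a sketch of the rule of thumb described above, with a made-up name (classify). The heap and stack are also mapped rw-p, so a real tool would look at the pathname column as well.

```cpp
#include <string>

// Map a permission string from /proc/<pid>/maps to the labels used in this post.
std::string classify(const std::string& perms) {
    if (perms.size() < 3) return "unknown";
    bool r = perms[0] == 'r';
    bool w = perms[1] == 'w';
    bool x = perms[2] == 'x';
    if (r && x && !w) return "text";            // r-x: code
    if (r && w && !x) return "writable data";   // rw-: modifiable data
    if (r && !w && !x) return "read-only data"; // r--: constants
    return "other";
}
```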

The memory map also has additional address ranges for dynamic libraries and kernel helpers such as [vdso]. A shared library (.so file) is built from position-independent code, compiled with the -fPIC option, so it can be loaded at whatever address is free in each process. Its read-only pages, such as the code, are backed by the same physical memory in every process that loads the library, while its writable data is private to each process. A static library, by contrast, is copied into the executable at link time and does not get a mapping of its own.

Now that we know the addresses of each section, let’s visualize how the memory is organized for a process.
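
The last column of each maps line tells these mappings apart. The function below is a rough sketch with an invented name (mapping_kind). Treating any path that contains ".so" as a shared library is a heuristic assumed for this example, not something the kernel reports.

```cpp
#include <string>

// Rough label for the pathname column of a /proc/<pid>/maps line.
std::string mapping_kind(const std::string& path) {
    if (path.empty()) return "anonymous";              // no backing file
    if (path == "[heap]") return "heap";
    if (path == "[stack]") return "stack";
    if (path.front() == '[') return "kernel helper";   // e.g. [vdso], [vvar]
    if (path.find(".so") != std::string::npos) return "shared library";
    return "file";                                     // e.g. the executable itself
}
```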

7ffcd027c000
                 ################## STACK SECTION ####################
7ffcd025b000


                 ############## BOTH GROW TO MEET ####################


01d44000
                 ################## HEAP SECTION #####################
01d12000

00602000
                 ############## GLOBAL DATA SECTION ##################
00601000
                 ############## CONSTANT DATA SECTION ################
00600000

00401000
                 ############## CODE SECTION #########################
00400000

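As you can see, the stack and the heap grow towards each other: the heap grows upwards to higher addresses and the stack grows downwards to lower addresses. You may wonder why there is such a large gap between the code section and the data section. It comes from segment alignment rather than from any attempt to hide the data address. The linker starts the data segment on its own aligned boundary, here 0x200000 bytes (2 MiB) above the start of the code at 0x400000, which matches the maximum page size GNU ld has traditionally assumed on x86-64, so that pages with different permissions never overlap. Making addresses hard to predict is the job of address space layout randomization (ASLR), which is why the heap and stack addresses change from run to run.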
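
To see the layout from inside a program, we can take the address of one object in each region. The sketch below is illustrative only: Sample, sample_addresses and global_counter are names invented here. The actual values depend on ASLR and on whether the binary is built as a position-independent executable.

```cpp
#include <cstdint>
#include <cstdlib>

int global_counter = 1;   // initialised global: lives in the writable data section

// Addresses sampled from four regions of the running process.
struct Sample {
    std::uintptr_t code;
    std::uintptr_t data;
    std::uintptr_t heap;
    std::uintptr_t stack;
};

Sample sample_addresses() {
    int local = 0;                    // lives on the stack
    void* block = std::malloc(16);    // lives on the heap
    Sample s{
        reinterpret_cast<std::uintptr_t>(&sample_addresses),
        reinterpret_cast<std::uintptr_t>(&global_counter),
        reinterpret_cast<std::uintptr_t>(block),
        reinterpret_cast<std::uintptr_t>(&local),
    };
    std::free(block);
    return s;
}
```

Printing the four fields in hexadecimal and comparing them with the ranges in /proc/<pid>/maps shows which mapping each address falls into. On a typical x86-64 Linux system the stack address is the highest of the four.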

Further reading: https://linux-audit.com/elf-binaries-on-linux-understanding-and-analysis/

Feel free to share your suggestions or thoughts in the comment section.
