1. Intel(R) MPX Overview
========================

Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new capability
introduced into Intel Architecture. Intel MPX provides hardware features
that can be used in conjunction with compiler changes to check memory
references whose intended compile-time bounds are violated at runtime by
a buffer overflow or underflow.

For more information, please refer to Intel(R) Architecture Instruction
Set Extensions Programming Reference, Chapter 9: Intel(R) Memory Protection
Extensions.

Note: Currently no hardware with MPX ISA is available, but it is always
possible to use SDE (Intel(R) Software Development Emulator) instead, which
can be downloaded from
http://software.intel.com/en-us/articles/intel-software-development-emulator


2. How to get the advantage of MPX
==================================

For MPX to work, changes are required in the kernel, binutils and compiler.
No source changes are required for applications, just a recompile.

Many parts have to cooperate for all of this to work. The following
is how we expect the compiler, application and kernel to work together.

1) Application developer compiles with -fmpx. The compiler will add the
   instrumentation as well as some setup code called early after the app
   starts. New instruction prefixes are noops for old CPUs.
2) That setup code allocates (virtual) space for the "bounds directory",
   points the "bndcfgu" register to the directory (must also set the valid
   bit) and notifies the kernel (via the new prctl(PR_MPX_ENABLE_MANAGEMENT))
   that the app will be using MPX. The app must be careful not to access
   the bounds tables between the time when it populates "bndcfgu" and
   when it calls the prctl(). This might be hard to guarantee if the app
   is compiled with MPX. You can add "__attribute__((bnd_legacy))" to
   the function to disable MPX instrumentation to help guarantee this.
   Also be careful not to call out to any other code which might be
   MPX-instrumented.
3) The kernel detects that the CPU has MPX, allows the new prctl() to
   succeed, and notes the location of the bounds directory. Userspace is
   expected to keep the bounds directory at that location. We note it
   instead of reading it each time because the 'xsave' operation needed
   to access the bounds directory register is an expensive operation.
4) If the application needs to spill bounds out of the 4 registers, it
   issues a bndstx instruction. Since the bounds directory is empty at
   this point, a bounds fault (#BR) is raised, the kernel allocates a
   bounds table (in the user address space) and makes the relevant entry
   in the bounds directory point to the new table.
5) If the application violates the bounds specified in the bounds registers,
   a separate kind of #BR is raised which will deliver a signal with
   information about the violation in the 'struct siginfo'.
6) Whenever memory is freed, we know that it can no longer contain valid
   pointers, and we attempt to free the associated space in the bounds
   tables. If an entire table becomes unused, we will attempt to free
   the table and remove the entry in the directory.

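The notification at the end of step 2 can be sketched as follows. This is
only a minimal illustration, not the actual runtime code: the helper name
mpx_enable_management() is hypothetical, the prctl value is the one listed
later in this document, and the sketch assumes the bounds directory has
already been allocated and "bndcfgu" already points at it. A real runtime
would also mark this function with "__attribute__((bnd_legacy))" as
described in step 2.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Value taken from this document; the system headers may not define it. */
#ifndef PR_MPX_ENABLE_MANAGEMENT
#define PR_MPX_ENABLE_MANAGEMENT 43
#endif

/*
 * Hypothetical helper: ask the kernel to start managing bounds tables.
 *
 * Assumes the bounds directory is already allocated and that "bndcfgu"
 * already points at it with the valid bit set; that part of step 2 is
 * not shown here.
 *
 * Returns 0 on success, or the errno value reported by prctl() when the
 * kernel or CPU does not support MPX.
 */
int mpx_enable_management(void)
{
	if (prctl(PR_MPX_ENABLE_MANAGEMENT, 0, 0, 0, 0) == 0)
		return 0;
	return errno;
}
```

A caller would treat any non-zero return value as "MPX management is not
available" and continue without it.
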
To summarize, there are essentially three things interacting here:

GCC with -fmpx:
 * enables annotation of code with MPX instructions and prefixes
 * inserts code early in the application to call in to the "gcc runtime"
GCC MPX runtime:
 * checks for hardware MPX support in cpuid leaf
 * allocates virtual space for the bounds directory (malloc() essentially)
 * points the hardware BNDCFGU register at the directory
 * calls a new prctl(PR_MPX_ENABLE_MANAGEMENT) to notify the kernel to
   start managing the bounds directories
Kernel MPX code:
 * checks for hardware MPX support in cpuid leaf
 * handles #BR exceptions and sends SIGSEGV to the app when it violates
   bounds, like during a buffer overflow
 * when bounds are spilled in to an unallocated bounds table, the kernel
   notices in the #BR exception, allocates the virtual space, then
   updates the bounds directory to point to the new table. It keeps
   special track of the memory with a VM_MPX flag.
 * frees unused bounds tables at the time that the memory they described
   is unmapped


3. How does MPX kernel code work
================================

Handling #BR faults caused by MPX
---------------------------------

When MPX is enabled, there are 2 new situations that can generate
#BR faults:
  * new bounds tables (BT) need to be allocated to save bounds.
  * bounds violation caused by MPX instructions.

We hook the #BR handler to handle these two new situations.

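The distinction between the two situations can be sketched as a small
classifier. This is an illustration only, not the kernel's handler: the
names below are hypothetical, and the sketch assumes that the low two bits
of the BNDSTATUS register carry the #BR error code (0 for "not an MPX
fault", 1 for a bounds violation, 2 for an invalid bounds directory entry),
which should be checked against the Intel programming reference.

```c
#include <stdint.h>

/* Assumption: the low two bits of BNDSTATUS hold the #BR error code. */
#define BNDSTATUS_ERROR_CODE_MASK 0x3ULL

/* Hypothetical names for the cases the #BR handler has to tell apart. */
enum br_cause {
	BR_NOT_MPX = 0,		/* #BR was not raised by MPX */
	BR_BOUNDS_VIOLATION,	/* an MPX bounds check failed */
	BR_TABLE_MISSING,	/* bounds directory entry is invalid */
	BR_UNKNOWN		/* unexpected error code */
};

/* Map a raw BNDSTATUS value to the situation that caused the #BR. */
enum br_cause classify_br(uint64_t bndstatus)
{
	switch (bndstatus & BNDSTATUS_ERROR_CODE_MASK) {
	case 0:
		return BR_NOT_MPX;
	case 1:
		return BR_BOUNDS_VIOLATION;
	case 2:
		return BR_TABLE_MISSING;
	default:
		return BR_UNKNOWN;
	}
}
```

In this sketch BR_TABLE_MISSING would lead to the on-demand allocation
described next, and BR_BOUNDS_VIOLATION to signal delivery.
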
On-demand kernel allocation of bounds tables
--------------------------------------------

MPX only has 4 hardware registers for storing bounds information. If
MPX-enabled code needs more than these 4 registers, it needs to spill
them somewhere. It has two special instructions for this which allow
the bounds to be moved between the bounds registers and some new "bounds
tables".

The #BR exceptions raised by MPX are conceptually similar to a page
fault. The MPX hardware raises them both for bounds violations and when
the tables are not present. The kernel handles those #BR exceptions for
not-present tables by carving the space out of the normal process's
address space and then pointing the bounds directory over to it.

The tables need to be accessed and controlled by userspace because
the instructions for moving bounds in and out of them are extremely
frequent. They potentially happen every time a register points to
memory. Any direct kernel involvement (like a syscall) to access the
tables would obviously destroy performance.

Why not do this in userspace? MPX does not strictly require anything in
the kernel. It can theoretically be done completely from userspace. Here
are a few ways this could be done. We don't think any of them are practical
in the real world, but here they are.

Q: Can virtual space simply be reserved for the bounds tables so that we
   never have to allocate them?
A: An MPX-enabled application may create a lot of bounds tables in its
   process address space to save bounds information. These tables can take
   up huge swaths of memory (as much as 80% of the memory on the system)
   even if we clean them up aggressively. In the worst-case scenario, the
   tables can be 4x the size of the data structure being tracked. In other
   words, a 1-page structure can require 4 bounds-table pages. An X-GB
   virtual area needs 4*X GB of virtual space, plus 2GB for the bounds
   directory. If we were to preallocate them for the 128TB of user virtual
   address space, we would need to reserve 512TB+2GB, which is larger than
   the entire virtual address space today. This means they can not be
   reserved ahead of time. Also, a single process's pre-populated bounds
   directory consumes 2GB of virtual *AND* physical memory. In other words,
   it's completely infeasible to prepopulate bounds directories.

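The arithmetic in the answer above can be checked with a few lines of C.
The function and macro names are made up for this illustration; the 4x
factor and the 2GB directory size are the worst-case figures quoted in
the text.

```c
#include <stdint.h>

/* Worst-case figures quoted in the text above. */
#define BT_OVERHEAD_FACTOR	4ULL	/* tables can be 4x the tracked data */
#define BD_SIZE_GB		2ULL	/* size of the bounds directory */

/*
 * Hypothetical helper: virtual space, in GB, that would have to be
 * reserved up front to cover data_gb GB of tracked address space.
 */
uint64_t mpx_prealloc_gb(uint64_t data_gb)
{
	return BT_OVERHEAD_FACTOR * data_gb + BD_SIZE_GB;
}
```

For the 128TB user address space, mpx_prealloc_gb(128 * 1024) evaluates
to 512 * 1024 + 2 GB, i.e. the 512TB+2GB figure given above.
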

Q: Can we preallocate bounds table space at the same time memory is
   allocated which might contain pointers that might eventually need
   bounds tables?
A: This would work if we could hook the site of each and every memory
   allocation syscall. This can be done for small, constrained applications.
   But, it isn't practical at a larger scale since a given app has no
   way of controlling how all the parts of the app might allocate memory
   (think libraries). The kernel is really the only place to intercept
   these calls.

Q: Could a bounds fault be handed to userspace and the tables allocated
   there in a signal handler instead of in the kernel?
A: mmap() is not on the list of safe async handler functions and even
   if mmap() would work it still requires locking or nasty tricks to
   keep track of the allocation state there.

Having ruled out all of the userspace-only approaches for managing
bounds tables that we could think of, we create them on demand in
the kernel.

Decoding MPX instructions
-------------------------

If a #BR is generated due to a bounds violation caused by MPX, we need
to decode the MPX instruction to get the violation address and set this
address in the extended struct siginfo.

The _sigfault field of struct siginfo is extended as follows:

	/* SIGILL, SIGFPE, SIGSEGV, SIGBUS */
	struct {
		void __user *_addr; /* faulting insn/memory ref. */
#ifdef __ARCH_SI_TRAPNO
		int _trapno;	/* TRAP # which caused the signal */
#endif
		short _addr_lsb; /* LSB of the reported address */
		struct {
			void __user *_lower;
			void __user *_upper;
		} _addr_bnd;
	} _sigfault;

The '_addr' field refers to the violation address, and the new '_addr_bnd'
field refers to the upper/lower bounds in effect when the #BR was raised.

Glibc will also be updated to support this new siginfo, so users can
get the violation address and bounds when bounds violations occur.

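A userspace handler that reports these fields might look like the sketch
below. It is an illustration under stated assumptions rather than
reference code: the helper names (hex_ptr, put_ptr, bnd_handler,
install_bnd_handler) are hypothetical, and it assumes the C library
exposes the new fields as si_lower and si_upper together with a
SEGV_BNDERR si_code, which is why those uses are guarded by #if.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/*
 * Format v as "0x..." into buf (at least 2 + 2 * sizeof(v) + 1 bytes)
 * and return the string length. printf() is avoided because it is not
 * async-signal-safe.
 */
size_t hex_ptr(uintptr_t v, char *buf)
{
	static const char digits[] = "0123456789abcdef";
	char tmp[2 * sizeof(v)];
	size_t n = 0, len = 0;

	do {
		tmp[n++] = digits[v & 0xf];
		v >>= 4;
	} while (v != 0);

	buf[len++] = '0';
	buf[len++] = 'x';
	while (n > 0)
		buf[len++] = tmp[--n];
	buf[len] = '\0';
	return len;
}

/* Write "<label><pointer>\n" to stderr using only write(). */
static void put_ptr(const char *label, void *p)
{
	char buf[2 + 2 * sizeof(uintptr_t) + 2];
	size_t len = hex_ptr((uintptr_t)p, buf);
	ssize_t r;

	buf[len++] = '\n';
	r = write(STDERR_FILENO, label, strlen(label));
	r = write(STDERR_FILENO, buf, len);
	(void)r;
}

/* Report the violation and exit; returning would re-run the faulting insn. */
static void bnd_handler(int sig, siginfo_t *si, void *ctx)
{
	(void)sig;
	(void)ctx;
	put_ptr("violation address: ", si->si_addr);
#if defined(SEGV_BNDERR) && defined(si_lower) && defined(si_upper)
	/* Assumption: libc exposes _addr_bnd as si_lower / si_upper. */
	if (si->si_code == SEGV_BNDERR) {
		put_ptr("lower bound:       ", si->si_lower);
		put_ptr("upper bound:       ", si->si_upper);
	}
#endif
	_exit(1);
}

/* Install the handler for SIGSEGV; returns 0 on success, -1 on error. */
int install_bnd_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = bnd_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGSEGV, &sa, NULL);
}
```

An application would call install_bnd_handler() once at startup; the
handler then prints the violation address and, where available, the
bounds that were exceeded.
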
Cleanup unused bounds tables
----------------------------

When a BNDSTX instruction attempts to save bounds to a bounds directory
entry marked as invalid, a #BR is generated. This is an indication that
no bounds table exists for this entry. In this case the fault handler
will allocate a new bounds table on demand.

Since the kernel allocated those tables on demand without userspace
knowledge, it is also responsible for freeing them when the associated
mappings are unmapped.

The solution is to hook do_munmap() and check whether the process is
MPX-enabled. If it is, the bounds tables covering the virtual address
region being unmapped are freed as well.

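As a rough illustration of which tables such a hook would look at, the
sketch below computes the bounds directory entries whose entire covered
region lies inside an unmapped range. It is a simplification, not the
kernel's algorithm: the names are hypothetical, and it assumes the 64-bit
layout in which each directory entry (and so each bounds table) covers
1MB of virtual address space. A table that is only partly covered would
need its remaining entries checked before it could be freed, which is
not shown.

```c
#include <stdint.h>

/* Assumption: each bounds directory entry covers 1MB of address space. */
#define BD_ENTRY_SPAN_SHIFT	20
#define BD_ENTRY_SPAN		(1ULL << BD_ENTRY_SPAN_SHIFT)

/* Hypothetical result type: a run of consecutive directory indexes. */
struct bd_range {
	uint64_t first;	/* index of the first fully covered entry */
	uint64_t count;	/* number of fully covered entries, may be 0 */
};

/*
 * Return the directory entries whose whole 1MB region lies inside the
 * unmapped range [start, end). Addresses are assumed to be user
 * addresses, so start + BD_ENTRY_SPAN cannot overflow.
 */
struct bd_range bd_entries_fully_covered(uint64_t start, uint64_t end)
{
	struct bd_range r = { 0, 0 };
	/* Round start up and end down to a 1MB boundary. */
	uint64_t first = (start + BD_ENTRY_SPAN - 1) >> BD_ENTRY_SPAN_SHIFT;
	uint64_t last = end >> BD_ENTRY_SPAN_SHIFT;	/* exclusive */

	if (last > first) {
		r.first = first;
		r.count = last - first;
	}
	return r;
}
```

For example, unmapping [1MB, 3MB) fully covers directory entries 1 and 2,
while unmapping [1.5MB, 2.5MB) fully covers none.
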
Adding new prctl commands
-------------------------

Two new prctl commands are added to enable and disable MPX bounds table
management in the kernel.

#define PR_MPX_ENABLE_MANAGEMENT	43
#define PR_MPX_DISABLE_MANAGEMENT	44

The runtime library in userspace is responsible for allocating the bounds
directory, so the kernel has to use the XSAVE instruction to get the base
of the bounds directory from the BNDCFGU register.

But XSAVE is expected to be very expensive. As a performance
optimization, the kernel reads the base of the bounds directory once,
while handling PR_MPX_ENABLE_MANAGEMENT, and saves it in struct mm_struct
for later use.

1) If userspace is requesting help from the kernel to do the management
of bounds tables, it may not create or modify entries in the bounds directory.

Certainly users can allocate bounds tables and forcibly point the bounds
directory at them through the XSAVE instruction, and then set the valid
bit of the bounds directory entry to make the entry valid. But the kernel
will decline to assist in managing these tables.

2) Userspace may not take multiple bounds directory entries and point
them at the same bounds table.

This is allowed architecturally. For more information, see "Intel(R)
Architecture Instruction Set Extensions Programming Reference" (9.3.4).

However, if users did this, the kernel might be fooled into unmapping an
in-use bounds table since it does not recognize sharing.