Hacked Brain/Reverse Engineering

PE, OEP, IAT in Wiki

Portable Executable

The Portable Executable (PE) format is a file format for executables, object code, and DLLs, used in 32-bit and 64-bit versions of Windows operating systems. The term "portable" refers to the format's versatility in numerous environments of operating system software architecture. The PE format is basically a data structure that encapsulates the information necessary for the Windows OS loader to manage the wrapped executable code. This includes dynamic library references for linking, API export and import tables, resource management data and thread-local storage (TLS) data. On NT operating systems, the PE format is used for EXE, DLL, OBJ, SYS (device driver), and other file types. The Extensible Firmware Interface specification states that PE is the standard executable format in EFI environments.

PE is a modified version of the Unix COFF file format. PE/COFF is an alternative term in Windows development.

On Windows NT operating systems, PE supports the IA-32, IA-64 and x86-64 (AMD64/Intel64) instruction set architectures, and later Windows releases added 32-bit ARM and ARM64; the Machine field of the COFF header identifies the target. Prior to Windows 2000, Windows NT (and thus PE) supported the MIPS, DEC Alpha, and PowerPC instruction set architectures. Because PE is used on Windows CE, it continues to support several variants of the MIPS architecture, ARM (including Thumb), and SuperH instruction set architectures.

A PE file consists of a number of headers and sections that tell the loader how to map the file into memory. An executable image consists of several different regions, each of which requires different memory protection, so in memory the start of each section is aligned to SectionAlignment, normally a page boundary. For instance, the .text section (which holds program code) is typically mapped as execute/read-only, and the .data section (holding global variables) as no-execute/read-write. To avoid wasting space, sections are not page aligned on disk; they are packed at the smaller FileAlignment (typically 512 bytes), which is why file offsets and virtual addresses of the same bytes differ. Part of the loader's job is to map each section individually, zero-fill any part of its virtual size that has no raw data behind it, and assign the permissions recorded in the Characteristics field of each section header. A section flagged both writable and executable (common in packed samples) defeats that separation and is worth flagging during triage.
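Reading those headers is the first step of any PE triage, so here is an added example in Python (my own code, not from the page or from Microsoft): it parses the DOS header, COFF header, optional header (PE32 or PE32+) and section table with struct, turns each section's Characteristics into the r/w/x protection the loader will apply, and checks the alignment rules described above. The input is a constructed two-section PE32+ built inline, not a real binary; the field offsets follow the documented IMAGE_OPTIONAL_HEADER32/64 layouts.

```python
import struct

# Section characteristics and DllCharacteristics bits as defined in winnt.h.
SCN_CODE, SCN_INITIALIZED_DATA = 0x00000020, 0x00000040
SCN_MEM_EXECUTE, SCN_MEM_READ, SCN_MEM_WRITE = 0x20000000, 0x40000000, 0x80000000
DLLCHAR_HIGH_ENTROPY_VA, DLLCHAR_DYNAMIC_BASE, DLLCHAR_NX_COMPAT = 0x0020, 0x0040, 0x0100

def protection(chars):
    """The rwx string the loader will apply to a section, derived from its IMAGE_SCN_MEM_* bits."""
    return "".join(c if chars & bit else "-" for c, bit in
                   (("r", SCN_MEM_READ), ("w", SCN_MEM_WRITE), ("x", SCN_MEM_EXECUTE)))

def parse_pe(data):
    """Parse the DOS, COFF and optional headers plus the section table of a PE32 or PE32+ file."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    e_lfanew, = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    machine, nsect, _, _, _, opt_size, _ = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    opt = e_lfanew + 24                          # optional header follows the 20-byte COFF header
    magic, = struct.unpack_from("<H", data, opt)
    if magic not in (0x10B, 0x20B):
        raise ValueError("unknown optional header magic 0x%x" % magic)
    pe32plus = magic == 0x20B
    entry, = struct.unpack_from("<I", data, opt + 16)
    if pe32plus:                                 # 64-bit ImageBase and no BaseOfData field shift the layout
        image_base, = struct.unpack_from("<Q", data, opt + 24)
        ndirs_off, dir_off = opt + 108, opt + 112
    else:
        image_base, = struct.unpack_from("<I", data, opt + 28)
        ndirs_off, dir_off = opt + 92, opt + 96
    sect_align, file_align = struct.unpack_from("<II", data, opt + 32)
    dll_chars, = struct.unpack_from("<H", data, opt + 70)
    ndirs, = struct.unpack_from("<I", data, ndirs_off)
    dirs = [struct.unpack_from("<II", data, dir_off + 8 * i) for i in range(min(ndirs, 16))]
    sections = []
    for i in range(nsect):
        name, vsize, va, rawsize, raw, _, _, _, _, chars = struct.unpack_from(
            "<8sIIIIIIHHI", data, opt + opt_size + 40 * i)
        sections.append({"name": name.rstrip(b"\0").decode(), "va": va, "vsize": vsize, "raw": raw,
                         "rawsize": rawsize, "chars": chars, "prot": protection(chars)})
    return {"machine": machine, "pe32plus": pe32plus, "entry": entry, "image_base": image_base,
            "sect_align": sect_align, "file_align": file_align, "dirs": dirs, "sections": sections,
            "aslr": bool(dll_chars & DLLCHAR_DYNAMIC_BASE), "nx": bool(dll_chars & DLLCHAR_NX_COMPAT)}

def layout_issues(pe):
    """Rules the loader relies on: in memory a section starts on a SectionAlignment (page) boundary, on disk
    only on a FileAlignment boundary.  A section mapped both writable and executable is flagged as well."""
    out = []
    for s in pe["sections"]:
        if s["va"] % pe["sect_align"]:
            out.append("%s: VirtualAddress 0x%x not section-aligned" % (s["name"], s["va"]))
        if s["raw"] % pe["file_align"]:
            out.append("%s: PointerToRawData 0x%x not file-aligned" % (s["name"], s["raw"]))
        if s["chars"] & SCN_MEM_WRITE and s["chars"] & SCN_MEM_EXECUTE:
            out.append("%s: section is writable and executable" % s["name"])
    return out

def build_sample_image():
    """Constructed sample, not a real binary: a two-section PE32+ (.text r-x, .data rw-) with ASLR and NX set."""
    dos = bytearray(0x40); dos[:2] = b"MZ"; struct.pack_into("<I", dos, 0x3C, 0x40)   # e_lfanew = 0x40
    coff = struct.pack("<HHIIIHH", 0x8664, 2, 0, 0, 0, 240, 0x22)                     # AMD64, 2 sections, EXE
    opt = struct.pack("<HBBIIIIIQIIHHHHHHIIIIHHQQQQII", 0x20B, 14, 0, 0x200, 0x200, 0, 0x1000, 0x1000,
                      0x140000000, 0x1000, 0x200, 6, 0, 0, 0, 6, 0, 0, 0x3000, 0x200, 0, 3,
                      DLLCHAR_HIGH_ENTROPY_VA | DLLCHAR_DYNAMIC_BASE | DLLCHAR_NX_COMPAT,
                      0x100000, 0x1000, 0x100000, 0x1000, 0, 16) + b"\0" * 128          # 16 empty data directories
    sh = struct.pack("<8sIIIIIIHHI", b".text", 0x200, 0x1000, 0x200, 0x200, 0, 0, 0, 0,
                     SCN_CODE | SCN_MEM_EXECUTE | SCN_MEM_READ)
    sh += struct.pack("<8sIIIIIIHHI", b".data", 0x1000, 0x2000, 0x200, 0x400, 0, 0, 0, 0,   # 0x200 on disk, 0x1000 in memory
                      SCN_INITIALIZED_DATA | SCN_MEM_READ | SCN_MEM_WRITE)
    text = b"\xc3".ljust(0x200, b"\xcc")                                               # ret, padded with int3
    data = b"global\0".ljust(0x200, b"\0")
    return (bytes(dos) + b"PE\0\0" + coff + opt + sh).ljust(0x200, b"\0") + text + data

if __name__ == "__main__":
    pe = parse_pe(build_sample_image())
    print("ImageBase 0x%x  entry RVA 0x%x  ASLR=%s NX=%s" % (pe["image_base"], pe["entry"], pe["aslr"], pe["nx"]))
    for s in pe["sections"]:
        print("%-6s VA 0x%05x vsize 0x%04x  file 0x%04x rawsize 0x%04x  %s"
              % (s["name"], s["va"], s["vsize"], s["raw"], s["rawsize"], s["prot"]))
    print("layout issues:", layout_issues(pe) or "none")
```

The .data section in the sample is 0x200 bytes on disk but 0x1000 bytes in memory, which is the on-disk versus in-memory alignment difference the paragraph describes; the tail is zero-filled by the loader.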

One structure of note is the import address table (IAT), an array of function pointers used when the application calls a function in another module, such as a Windows API function. Because a compiled program cannot know where the libraries it depends upon will be loaded, calls to imported functions are made indirectly through the IAT. As the loader brings in each module, it resolves every imported name (or ordinal) against that module's export table and writes the resulting absolute address into the corresponding IAT slot; the import lookup table keeps the original name references. A call then goes through a small stub (jmp [slot]), which adds an extra jump over the cost of an intra-module call, though the performance hit is mostly negligible and easily worth the flexibility of dynamic libraries. If the compiler knows ahead of time that a call will be inter-module (via a dllimport attribute) it can produce more optimised code that is simply an indirect call through the slot, with no stub. Because the filled-in IAT is ordinary writable data, it is also a standard place to hook API calls, and packers typically destroy it and rebuild it themselves before jumping to the original entry point, which is why IAT reconstruction is a routine unpacking step.
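To make the mechanism concrete, the following added example (my own code, not the page's) builds the in-memory layout of a PE32+ import table for a constructed KERNEL32.dll stand-in (the addresses in FAKE_EXPORTS are made up), performs the loader's binding step of writing resolved addresses into the IAT slots, and then follows both call forms the paragraph mentions, the direct `call [rip+disp]` a dllimport declaration produces and the `call thunk; jmp [slot]` pair, to the slot they read. Descriptor, table and string RVAs (0x2000, 0x2030, 0x2050, ...) are chosen for the sample.

```python
import struct

IMAGE_ORDINAL_FLAG64 = 1 << 63       # top bit of a PE32+ thunk: import by ordinal, not by name

# Constructed stand-in for the export tables of already-loaded DLLs: name or ordinal -> absolute address.
FAKE_EXPORTS = {
    "KERNEL32.dll": {"ExitProcess": 0x7FF812345000, "WriteFile": 0x7FF812346000, 0x14: 0x7FF812347000},
}

def read_cstr(buf, off):
    return buf[off:buf.index(b"\0", off)].decode("ascii")

def build_sample_image():
    """Constructed sample, not a real binary: the in-memory layout (index == RVA) of a PE32+ that imports
    three symbols from KERNEL32.dll.  Import descriptor at RVA 0x2000, ILT at 0x2030, IAT at 0x2050."""
    img = bytearray(0x3000)
    def put(rva, b): img[rva:rva + len(b)] = b
    put(0x2070, b"KERNEL32.dll\0")
    put(0x2080, b"\0\0ExitProcess\0"); put(0x2090, b"\0\0WriteFile\0")   # IMAGE_IMPORT_BY_NAME: hint, name
    thunks = struct.pack("<QQQQ", 0x2080, 0x2090, IMAGE_ORDINAL_FLAG64 | 0x14, 0)
    put(0x2030, thunks)       # import lookup table: keeps the name/ordinal references
    put(0x2050, thunks)       # IAT: identical on disk, overwritten with addresses by the loader
    put(0x2000, struct.pack("<IIIII", 0x2030, 0, 0, 0x2070, 0x2050) + b"\0" * 20)   # descriptor + terminator
    # .text at RVA 0x1000:  call qword ptr [rip+0x104a]  (-> IAT slot 0x2050, dllimport style)
    #                       call 0x1010                 (-> thunk)   ;  ret
    put(0x1000, bytes.fromhex("ff154a100000" "e805000000" "c3"))
    put(0x1010, bytes.fromhex("ff2542100000"))   # thunk: jmp qword ptr [rip+0x1042] (-> IAT slot 0x2058)
    return img

def bind_imports(img, exports, import_dir_rva=0x2000):
    """Do the loader's job for a PE32+ import table already mapped in memory: for every descriptor, resolve
    each lookup-table entry against the DLL's exports and write the absolute address into the matching IAT slot.
    Returns [(iat_slot_rva, dll, symbol, address), ...]."""
    bound, off = [], import_dir_rva
    while True:
        ilt, _, _, name_rva, iat = struct.unpack_from("<IIIII", img, off)
        if ilt == 0 and name_rva == 0 and iat == 0:       # all-zero descriptor terminates the list
            break
        dll = read_cstr(img, name_rva)
        table = exports[dll]                              # KeyError here is the "DLL not found" load failure
        entry, slot = ilt or iat, iat                     # images without an ILT carry the names in the IAT itself
        while True:
            val, = struct.unpack_from("<Q", img, entry)
            if val == 0:
                break
            sym = (val & 0xFFFF) if val & IMAGE_ORDINAL_FLAG64 else read_cstr(img, (val & 0x7FFFFFFF) + 2)
            addr = table[sym]                             # KeyError here is "entry point not found"
            struct.pack_into("<Q", img, slot, addr)       # the write that makes the indirect call work
            bound.append((slot, dll, sym, addr))
            entry, slot = entry + 8, slot + 8
        off += 20
    return bound

def resolve_call(img, rva):
    """Follow a call at `rva` to the IAT slot it ends up reading: either `FF 15 disp32` (call [rip+disp]) or
    `E8 rel32` into a thunk `FF 25 disp32` (jmp [rip+disp]).  Returns (slot_rva, address stored in the slot)."""
    if img[rva] == 0xE8:                                  # call rel32 -> land on the thunk
        rel, = struct.unpack_from("<i", img, rva + 1)
        rva = rva + 5 + rel
    if img[rva:rva + 2] not in (b"\xff\x15", b"\xff\x25"):
        raise ValueError("no IAT-based call/jmp at RVA 0x%x" % rva)
    disp, = struct.unpack_from("<i", img, rva + 2)
    slot = rva + 6 + disp                                 # RIP-relative: measured from the end of the instruction
    return slot, struct.unpack_from("<Q", img, slot)[0]

if __name__ == "__main__":
    img = build_sample_image()
    for slot, dll, sym, addr in bind_imports(img, FAKE_EXPORTS):
        print("IAT[0x%x] = 0x%x   (%s!%s)" % (slot, addr, dll, sym))
    for rva in (0x1000, 0x1006):
        slot, target = resolve_call(img, rva)
        print("call at RVA 0x%x reads IAT slot 0x%x -> 0x%x" % (rva, slot, target))
```

Before bind_imports runs, the IAT and the lookup table hold the same name RVAs, which is what an unpacker sees in a dumped image whose IAT was already overwritten with real pointers: the names are gone and must be recovered by matching the addresses back to exports.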

PE files do not contain position-independent code. Instead they are linked against a preferred base address (ImageBase in the optional header), and the absolute addresses emitted by the compiler/linker are fixed ahead of time. If a PE file cannot be loaded at its preferred address (because that range is already taken), the loader rebases it: it computes a delta between the actual and preferred load addresses and adds that delta to every absolute address listed in the base relocation directory (the .reloc section), which records the RVA of each such location and its width (HIGHLOW for 32-bit, DIR64 for 64-bit pointers). The pages that are written become private copy-on-write copies and can no longer be shared between processes, so much of the memory-saving benefit of DLLs is lost in this scenario, and loading the module is slower. For this reason the DLLs shipped by Microsoft were historically given pre-computed, non-overlapping base addresses so that rebasing was rare. Since Windows Vista the picture is reversed: images linked with /DYNAMICBASE are deliberately loaded at a randomised base (ASLR), so relocation is the normal case, and a module whose relocation directory has been stripped cannot be randomised. In the no-rebase case PE therefore has the advantage of very efficient code, but in the presence of rebasing the memory usage hit can be expensive. Contrast this with ELF, whose shared objects are compiled as position-independent code and reach globals through a global offset table, trading execution time for memory sharing.
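An added example (my own code, not from the page) of the rebasing step in Python: parse_reloc_blocks walks the IMAGE_BASE_RELOCATION blocks of a .reloc section, and rebase applies the delta to each HIGHLOW or DIR64 entry exactly as the loader does, wrapping correctly when the delta is negative. The image fragment and its relocation data are constructed inline for a preferred base of 0x00400000 and are not from a real binary.

```python
import struct

# Relocation entry types from winnt.h; each 16-bit entry is type << 12 | offset-within-page.
REL_ABSOLUTE, REL_HIGHLOW, REL_DIR64 = 0, 3, 10

def parse_reloc_blocks(reloc):
    """Return [(page_rva, [(type, offset_in_page), ...]), ...] for the IMAGE_BASE_RELOCATION blocks in `reloc`."""
    blocks, off = [], 0
    while off + 8 <= len(reloc):
        page_rva, size = struct.unpack_from("<II", reloc, off)
        if size < 8 or off + size > len(reloc):
            raise ValueError("malformed relocation block at offset 0x%x" % off)
        n = (size - 8) // 2
        entries = [(e >> 12, e & 0xFFF) for e in struct.unpack_from("<%dH" % n, reloc, off + 8)]
        blocks.append((page_rva, entries))
        off += size
    return blocks

def rebase(image, preferred_base, actual_base, reloc):
    """What the loader does when the preferred base is unavailable: add delta = actual - preferred to every
    absolute address the relocation directory points at.  `image` is laid out by RVA (index 0 == RVA 0).
    Returns (patched image, [(rva, type, old_value, new_value), ...])."""
    delta = actual_base - preferred_base
    out, applied = bytearray(image), []
    for page_rva, entries in parse_reloc_blocks(reloc):
        for typ, offset in entries:
            rva = page_rva + offset
            if typ == REL_ABSOLUTE:                      # padding entry that 4-byte-aligns the block; skip
                continue
            if typ == REL_HIGHLOW:
                fmt, mask = "<I", 0xFFFFFFFF
            elif typ == REL_DIR64:
                fmt, mask = "<Q", 0xFFFFFFFFFFFFFFFF
            else:
                raise ValueError("unsupported relocation type %d at RVA 0x%x" % (typ, rva))
            old, = struct.unpack_from(fmt, out, rva)
            new = (old + delta) & mask                   # masking also handles a negative delta
            struct.pack_into(fmt, out, rva, new)
            applied.append((rva, typ, old, new))
    return bytes(out), applied

def touched_pages(applied, page_size=0x1000):
    """Pages the loader had to write; with copy-on-write mapping these become private to the process."""
    return sorted({rva & ~(page_size - 1) for rva, _, _, _ in applied})

def build_sample():
    """Constructed PE32 fragment, not a real binary: a 3-page image whose absolute addresses the linker fixed
    for a preferred ImageBase of 0x00400000, plus the .reloc data that says where those addresses live."""
    base, image = 0x00400000, bytearray(0x3000)
    # RVA 0x1000 (.text):  push 0x402000 ; call dword ptr [0x402010]   -> imm32 at +1 and disp32 at +7 are absolute
    struct.pack_into("<BI", image, 0x1000, 0x68, base + 0x2000)
    struct.pack_into("<BBI", image, 0x1005, 0xFF, 0x15, base + 0x2010)
    # RVA 0x2020 (.data): a table of three absolute pointers
    struct.pack_into("<III", image, 0x2020, base + 0x1000, base + 0x1005, base + 0x2000)
    reloc = struct.pack("<IIHH", 0x1000, 12, (REL_HIGHLOW << 12) | 0x001, (REL_HIGHLOW << 12) | 0x007)
    reloc += struct.pack("<IIHHHH", 0x2000, 16, (REL_HIGHLOW << 12) | 0x020, (REL_HIGHLOW << 12) | 0x024,
                         (REL_HIGHLOW << 12) | 0x028, REL_ABSOLUTE)   # odd entry count -> ABSOLUTE padding
    return bytes(image), reloc

if __name__ == "__main__":
    image, reloc = build_sample()
    patched, applied = rebase(image, 0x00400000, 0x00560000, reloc)   # preferred range taken; loaded 0x160000 higher
    for rva, typ, old, new in applied:
        print("RVA 0x%05x type %2d: 0x%08x -> 0x%08x" % (rva, typ, old, new))
    print("pages made private by rebasing:", ["0x%x" % p for p in touched_pages(applied)])
```

Note that only the operand bytes change; the opcodes around them (0x68, FF 15) are untouched, which is what makes relocation cheap per entry but expensive per page.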