
Data Models

In 32-bit programs, pointers and data types such as integers generally have the same length. This is not necessarily true on 64-bit machines. Mixing data types in programming languages such as C and its descendants such as C++ and Objective-C may thus work on 32-bit implementations but not on 64-bit implementations.
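
As a minimal sketch of the kind of mixing that breaks (assuming a C99 compiler that provides uintptr_t in <stdint.h>; both helper names are hypothetical and not part of any library), compare a pointer round-trip through unsigned int with one through uintptr_t:

```c
#include <stdint.h>

/* Hypothetical helper: round-trips a pointer through unsigned int.
 * Harmless where unsigned int is as wide as a pointer (ILP32), but on
 * LP64/LLP64 the upper 32 bits of the address are discarded. */
static void *roundtrip_via_uint(void *p) {
    unsigned int narrow = (unsigned int)(uintptr_t)p; /* may truncate */
    return (void *)(uintptr_t)narrow;
}

/* Portable variant: uintptr_t can hold any object pointer value. */
static void *roundtrip_via_uintptr(void *p) {
    uintptr_t wide = (uintptr_t)p;
    return (void *)wide;
}
```

On an ILP32 target both helpers return the original pointer. On LP64 or LLP64 the first one may return a different address, because unsigned int keeps only the low 32 bits.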

Preamble

For the macros predefined by the GCC/Clang preprocessor, Dump Compiler Options describes how to dump the compiler's predefined macros.

Run the preprocessor command cpp -dM or gcc -E -dM to dump the macros:

dump predefined macros
$ echo | cpp -dM
$ echo | gcc -x c -E -dM -
$ echo | g++ -x c++ -E -dM -

# IA-32 / x86-64
$ llvm-gcc -x c -E -dM -arch i386 /dev/null
$ llvm-gcc -x c -E -dM -arch x86_64 /dev/null

# ARM32/ARM64
$ llvm-gcc -x c -E -dM -arch armv7s /dev/null
$ llvm-gcc -x c -E -dM -arch arm64 /dev/null

For example, Numeric limits - <limits.h> defines CHAR_BIT:

  • CHAR_BIT: number of bits in a byte (macro constant)

Running the preprocessor command on macOS and rpi4b-ubuntu and filtering the output with grep prints __CHAR_BIT__ (its value is 8 in every case).

# macOS
llvm-gcc -x c -E -dM -arch i386 /dev/null | grep "__CHAR_BIT__"
llvm-gcc -x c -E -dM -arch x86_64 /dev/null | grep "__CHAR_BIT__"
llvm-gcc -x c -E -dM -arch armv7s /dev/null | grep "__CHAR_BIT__"
llvm-gcc -x c -E -dM -arch arm64 /dev/null | grep "__CHAR_BIT__"

# rpi4b-ubuntu
echo | cpp -dM | grep "__CHAR_BIT__"
gcc -x c -E -dM /dev/null | grep "__CHAR_BIT__"

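On 32-bit platforms (IA-32/ARM32), the macros _ILP32 and __ILP32__ are usually predefined with the value 1;
on 64-bit platforms (x86-64/ARM64), the macros _LP64 and __LP64__ are usually predefined with the value 1.

ILP32 and LP64 are two different data models. Which one applies goes hand in hand with whether the platform is 32-bit or 64-bit, and it determines the widths of long and pointers (long long is 64 bits wide in both).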
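
The macro check can be sketched in C as follows. This is only an illustration: the function name is hypothetical, treating _WIN64 as LLP64 is an assumption based on the Windows convention quoted later on this page, and some toolchains may define none of these macros, hence the "unknown" fallback.

```c
/* Hypothetical helper: names the data model the compiler targets,
 * judged only from predefined macros. */
static const char *data_model_from_macros(void) {
#if defined(_LP64) || defined(__LP64__)
    return "LP64";
#elif defined(_WIN64)
    return "LLP64";   /* assumption: 64-bit Windows targets use LLP64 */
#elif defined(_ILP32) || defined(__ILP32__)
    return "ILP32";
#else
    return "unknown"; /* no data-model macro predefined */
#endif
}
```

Printing the result from main with puts(data_model_from_macros()) should agree with the macro dumps for the corresponding target.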

Macros that relate to the machine word size and determine the data model:

  • __SIZEOF_LONG__ @intel
  • __SIZEOF_POINTER__ / __POINTER_WIDTH__

Macros related to the four Standard Variants:

  1. __WCHAR_TYPE__, __WCHAR_WIDTH__, __SIZEOF_WCHAR_T__
  2. __SIZE_TYPE__, __SIZE_WIDTH__, __SIZEOF_SIZE_T__
  3. __INTPTR_TYPE__, __INTPTR_WIDTH__; __UINTPTR_TYPE__, __UINTPTR_WIDTH__
  4. __PTRDIFF_TYPE__, __PTRDIFF_WIDTH__

The following are the relevant macros dumped and filtered on mbpa2991-macOS/arm64 and rpi4b-ubuntu/aarch64.

macros varying by data model
$ llvm-gcc -x c -E -dM -arch i386 /dev/null | grep -E "LP32|LP64|__SIZEOF_LONG__|__SIZEOF_POINTER__|__WCHAR_TYPE__|__SIZE_TYPE__|__INTPTR_TYPE__|__UINTPTR_TYPE__|__PTRDIFF_TYPE__"
#define _ILP32 1
#define __ILP32__ 1
#define __INTPTR_TYPE__ long int
#define __PTRDIFF_TYPE__ int
#define __SIZEOF_LONG__ 4
#define __SIZEOF_POINTER__ 4
#define __SIZE_TYPE__ long unsigned int
#define __UINTPTR_TYPE__ long unsigned int
#define __WCHAR_TYPE__ int

$ llvm-gcc -x c -E -dM -arch x86_64 /dev/null | grep -E "LP32|LP64|__SIZEOF_LONG__|__SIZEOF_POINTER__|__WCHAR_TYPE__|__SIZE_TYPE__|__INTPTR_TYPE__|__UINTPTR_TYPE__|__PTRDIFF_TYPE__"
#define _LP64 1
#define __INTPTR_TYPE__ long int
#define __LP64__ 1
#define __PTRDIFF_TYPE__ long int
#define __SIZEOF_LONG__ 8
#define __SIZEOF_POINTER__ 8
#define __SIZE_TYPE__ long unsigned int
#define __UINTPTR_TYPE__ long unsigned int
#define __WCHAR_TYPE__ int

$ llvm-gcc -x c -E -dM -arch armv7s /dev/null | grep -E "LP32|LP64|__SIZEOF_LONG__|__SIZEOF_POINTER__|__WCHAR_TYPE__|__SIZE_TYPE__|__INTPTR_TYPE__|__UINTPTR_TYPE__|__PTRDIFF_TYPE__"
#define _ILP32 1
#define __ILP32__ 1
#define __INTPTR_TYPE__ long int
#define __PTRDIFF_TYPE__ int
#define __SIZEOF_LONG__ 4
#define __SIZEOF_POINTER__ 4
#define __SIZE_TYPE__ long unsigned int
#define __UINTPTR_TYPE__ long unsigned int
#define __WCHAR_TYPE__ int

$ llvm-gcc -x c -E -dM -arch arm64 /dev/null | grep -E "LP32|LP64|__SIZEOF_LONG__|__SIZEOF_POINTER__|__WCHAR_TYPE__|__SIZE_TYPE__|__INTPTR_TYPE__|__UINTPTR_TYPE__|__PTRDIFF_TYPE__"
#define _LP64 1
#define __INTPTR_TYPE__ long int
#define __LP64__ 1
#define __PTRDIFF_TYPE__ long int
#define __SIZEOF_LONG__ 8
#define __SIZEOF_POINTER__ 8
#define __SIZE_TYPE__ long unsigned int
#define __UINTPTR_TYPE__ long unsigned int
#define __WCHAR_TYPE__ int

# rpi4b-ubuntu/aarch64
$ gcc -x c -E -dM /dev/null | grep -E "LP32|LP64|__SIZEOF_LONG__|__SIZEOF_POINTER__|__WCHAR_TYPE__|__SIZE_TYPE__|__INTPTR_TYPE__|__UINTPTR_TYPE__|__PTRDIFF_TYPE__"
#define __SIZEOF_LONG__ 8
#define __SIZEOF_POINTER__ 8
#define __LP64__ 1
#define __SIZE_TYPE__ long unsigned int
#define __INTPTR_TYPE__ long int
#define __WCHAR_TYPE__ unsigned int
#define _LP64 1
#define __PTRDIFF_TYPE__ long int
#define __UINTPTR_TYPE__ long unsigned int
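
The dumps above can be cross-checked from C itself. The following is a rough sketch, assuming a C99 library with %zu support; the function name is hypothetical. It prints the size in bytes of each type whose underlying type the macros report.

```c
#include <stddef.h>  /* size_t, ptrdiff_t, wchar_t */
#include <stdint.h>  /* intptr_t, uintptr_t */
#include <stdio.h>

/* Hypothetical helper: writes one "name size" line per type and
 * returns the number of lines successfully written. */
static int print_variant_sizes(FILE *out) {
    int lines = 0;
    lines += fprintf(out, "long      %zu\n", sizeof(long)) > 0;
    lines += fprintf(out, "pointer   %zu\n", sizeof(void *)) > 0;
    lines += fprintf(out, "wchar_t   %zu\n", sizeof(wchar_t)) > 0;
    lines += fprintf(out, "size_t    %zu\n", sizeof(size_t)) > 0;
    lines += fprintf(out, "intptr_t  %zu\n", sizeof(intptr_t)) > 0;
    lines += fprintf(out, "uintptr_t %zu\n", sizeof(uintptr_t)) > 0;
    lines += fprintf(out, "ptrdiff_t %zu\n", sizeof(ptrdiff_t)) > 0;
    return lines;
}
```

Calling print_variant_sizes(stdout) on an LP64 target should report 8 for long and pointer, matching __SIZEOF_LONG__ and __SIZEOF_POINTER__ in the dump.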

Concept

64-bit data models

In many programming environments for C and C-derived languages on 64-bit machines, int variables are still 32 bits wide, but long integers and pointers are 64 bits wide. These are described as having an LP64 data model, which is an abbreviation of "Long, Pointer, 64". Other models are the ILP64 data model, in which all three data types are 64 bits wide, and even the SILP64 model, where short integers are also 64 bits wide. Moving code between these models can require changes; however, in most cases the modifications are relatively minor and straightforward, and many well-written programs can simply be recompiled for the new environment with no changes. Another alternative is the LLP64 model, which maintains compatibility with 32-bit code by leaving both int and long as 32-bit. LL refers to the long long integer type, which is at least 64 bits on all platforms, including 32-bit environments.

There are also systems with 64-bit processors using an ILP32 data model, with the addition of 64-bit long long integers; this is also used on many platforms with 32-bit processors. This model reduces code size and the size of data structures containing pointers, at the cost of a much smaller address space, which makes it a good choice for some embedded systems.

GCC Internals | Effective-Target Keywords | Data type sizes

  • ilp32: Target has 32-bit int, long, and pointers.
  • lp64: Target has 32-bit int, 64-bit long and pointers.
  • llp64: Target has 32-bit int and long, 64-bit long long and pointers.

aapcs64 - 2.2 Terms and abbreviations:

  • ILP32: SysV-like data model where int, long int and pointer are 32-bit.
  • LP64: SysV-like data model where int is 32-bit, but long int and pointer are 64-bit.
  • LLP64: Windows-like data model where int and long int are 32-bit, but long long int and pointer are 64-bit.

Fundamental types - Data Models:

The choices made by each implementation about the sizes of the fundamental types are collectively known as data model. Four data models found wide acceptance:

data model of 32 bit systems

LP32 or 2/4/4 (int is 16-bit, long and pointer are 32-bit)

  • Win16 API

ILP32 or 4/4/4 (int, long, and pointer are 32-bit);

  • Win32 API
  • Unix and Unix-like systems (Linux, macOS)

data model of 64 bit systems

LLP64 or 4/4/8 (int and long are 32-bit, pointer is 64-bit)

  • Win32 API (also called the Windows API) with compilation target 64-bit ARM (AArch64) or x86-64 (a.k.a. x64)

LP64 or 4/8/8 (int is 32-bit, long and pointer are 64-bit)

  • Unix and Unix-like systems (Linux, macOS)

Refer to the table of type widths in bits for each data model.

Convention

ARM/Keil : Basic data types in ARM C and C++

  • Size and alignment of basic data types

AIX - Data models for 32-bit and 64-bit processes
z/OS - LP64 | ILP32, ILP32 and LP64 data models and data type sizes

  • ILP32, acronym for integer, long, and pointer 32
  • LP64, acronym for long, and pointer 64

ILP32 and LP64 data models.PDF - HP-UX 64-bit data model

  • hp C/HP-UX 32-bit and 64-bit base data types
  • ILP32 and LP64 data alignment

Writing 64-bit Intel code for Apple Platforms

Apple platforms typically follow the data representation and procedure call rules in the standard System V psABI for AMD64, using the LP64 programming model.

Programming Guide for 64-bit Windows
Getting Ready for 64-bit Windows

In the 32-bit programming model (known as the ILP32 model), integer, long, and pointer data types are 32 bits in length.
In the LLP64 data model, only pointers expand to 64 bits; all other basic data types (integer and long) remain 32 bits in length.
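
Because long stays 32 bits wide under LLP64, code that needs a 64-bit quantity on every model is usually written with a fixed-width type instead of long. A small sketch, assuming <stdint.h> is available; the helper and its 4 KiB page size are purely illustrative:

```c
#include <stdint.h>

/* Hypothetical helper: total size in bytes of `count` 4 KiB pages.
 * With a 32-bit long (ILP32, LLP64) the product would overflow once
 * count reaches 2^19, so int64_t is used on every data model. */
static int64_t pages_to_bytes(int64_t count) {
    return count * INT64_C(4096);
}
```

For example, pages_to_bytes(1048576) yields 4294967296 (2^32), a value that a 32-bit long cannot represent.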

GCC | IA-64 Options

-milp32 | -mlp64: Generate code for a 32-bit or 64-bit environment. The 32-bit environment sets int, long and pointer to 32 bits. The 64-bit environment sets int to 32 bits and long and pointer to 64 bits. These are HP-UX specific flags.

GCC | AArch64 Options

-mabi=name: Generate code for the specified data model. Permissible values are ‘ilp32’ for SysV-like data model where int, long int and pointers are 32 bits, and ‘lp64’ for SysV-like data model where int is 32 bits, but long int and pointers are 64 bits.

The default depends on the specific target configuration. Note that the LP64 and ILP32 ABIs are not link-compatible; you must compile your entire program with the same ABI, and link with a compatible set of libraries.

数据模型 (Data model) | 資料模型 (Data model) | 64-bit data models

TYPE       LP32  ILP32  LP64  ILP64  LLP64
CHAR          8      8     8      8      8
SHORT        16     16    16     16     16
INT          16     32    32     64     32
LONG         32     32    64     64     32
LONG LONG    64     64    64     64     64
POINTER      32     32    64     64     64

  • LP32: long and pointer are 32 bits wide
  • ILP32: int, long, and pointer are 32 bits wide
  • LP64: long and pointer are 64 bits wide
  • LLP64: long long and pointer are 64 bits wide
  • ILP64: int, long, and pointer are 64 bits wide

    • The ILP64 model is very rare; it appeared only in some early 64-bit Unix systems (e.g. UNICOS on Cray).
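
The table can be read as a lookup from the (int, long, pointer) widths to a model name. The sketch below encodes exactly those rows; the function names are hypothetical, and any combination not in the table is reported as "unknown".

```c
#include <limits.h>  /* CHAR_BIT */

/* Hypothetical helper: maps widths in bits to the model name used in
 * the table above. */
static const char *classify_model(unsigned int_bits, unsigned long_bits,
                                  unsigned ptr_bits) {
    if (int_bits == 16 && long_bits == 32 && ptr_bits == 32) return "LP32";
    if (int_bits == 32 && long_bits == 32 && ptr_bits == 32) return "ILP32";
    if (int_bits == 32 && long_bits == 32 && ptr_bits == 64) return "LLP64";
    if (int_bits == 32 && long_bits == 64 && ptr_bits == 64) return "LP64";
    if (int_bits == 64 && long_bits == 64 && ptr_bits == 64) return "ILP64";
    return "unknown";
}

/* Classifies the target this translation unit is compiled for. */
static const char *host_model(void) {
    return classify_model((unsigned)(sizeof(int) * CHAR_BIT),
                          (unsigned)(sizeof(long) * CHAR_BIT),
                          (unsigned)(sizeof(void *) * CHAR_BIT));
}
```

Unlike a macro check, host_model() relies only on sizeof, so it also works with compilers that predefine no data-model macro.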

aapcs64 - 7 The standard variants:

10.1.2 Types varying by data model:

C/C++ Type     ILP32 (Beta)         LP64                  LLP64
[signed] long  signed word          signed double-word    signed word
unsigned long  unsigned word        unsigned double-word  unsigned word
wchar_t        unsigned word        unsigned word         unsigned halfword
T *            32-bit data pointer  64-bit data pointer   64-bit data pointer

10.1.4 Additional types:

Typedef    ILP32 (Beta)   LP64           LLP64
size_t     unsigned long  unsigned long  unsigned long long
ptrdiff_t  signed long    signed long    signed long long

ARM Cortex-A Series Programmer's Guide for ARMv8-A - 5.1 The ARMv8 instruction sets - 5.1.3 Registers - Table 5-1 Variable width

Type       ILP32  LP64  LLP64
char           8     8      8
short         16    16     16
int           32    32     32
long          32    64     32
long long     64    64     64
size_t        32    64     64
pointer       32    64     64

Under the LP data models (where long and pointer share one width):

  • __SIZEOF_POINTER__ = __SIZEOF_LONG__
  • __POINTER_WIDTH__ = LONG_BIT = __WORDSIZE

For the machine-word-size macros LONG_BIT and __WORDSIZE, see Machine Word.
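
These equalities can be turned into compile-time checks. The following is a sketch under stated assumptions: a C11 compiler (for _Static_assert), GCC/Clang-style __SIZEOF_* macros, and LONG_BIT only where <limits.h> happens to expose it (it is a POSIX macro, not ISO C). The function name is hypothetical.

```c
#include <limits.h>  /* CHAR_BIT, and LONG_BIT on some POSIX systems */

/* Hypothetical helper: 1 when long and pointers share one width
 * (ILP32, LP64), 0 otherwise (for example LLP64). */
static int long_is_pointer_sized(void) {
    return sizeof(long) == sizeof(void *);
}

#if defined(__SIZEOF_LONG__) && defined(__SIZEOF_POINTER__)
/* The predefined macros should agree with sizeof on any data model. */
_Static_assert(__SIZEOF_LONG__ == sizeof(long), "__SIZEOF_LONG__ mismatch");
_Static_assert(__SIZEOF_POINTER__ == sizeof(void *), "__SIZEOF_POINTER__ mismatch");
#endif

#if defined(_LP64) || defined(_ILP32)
/* Only asserted when an LP-style model is announced by the compiler. */
_Static_assert(sizeof(long) == sizeof(void *), "long and pointer widths differ");
#endif

#if defined(LONG_BIT)
_Static_assert(LONG_BIT == sizeof(long) * CHAR_BIT, "LONG_BIT mismatch");
#endif
```

The checks are guarded so that the file still compiles on an LLP64 target, where long_is_pointer_sized() simply returns 0.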

Evolution

The Evolution of Computing: From 8-bit to 64-bit
The Evolution of CPUs: Exploring the Dominance of 32-bit and 64-bit Architectures
The 64-bit Evolution – Computerworld

The Long Road to 64 Bits - PDF - TABLE 1 Common C Data Types

Data Models and Word Size

64-bit and Data Size Neutrality


opengroup - 64-Bit Programming Models: Why LP64?.PDF - 1997

Major 64-Bit Changes - Apple Developer - 20121213

Why did the Win64 team choose the LLP64 model?
What is the bit size of long on 64-bit Windows?

All modern 64-bit Unix systems use LP64. MacOS X and Linux are both modern 64-bit systems.
Microsoft uses a different scheme for transitioning to 64-bit: LLP64 ('long long, pointers are 64-bit').
This has the merit of meaning that 32-bit software can be recompiled without change.

The Tools - 64-bit Compiler Switches and Warnings

Wireshark Development/Win64

32-bit UN*X platforms, and 32-bit Windows, use the ILP32 data model.
64-bit UN*X platforms use the LP64 data model; however, 64-bit Windows uses the LLP64 data model.
