C preprocessor



 
 


The C preprocessor (cpp) is the preprocessor for the C programming language. In many C implementations, it is a separate program invoked by the compiler as the first part of translation. The preprocessor handles directives for source file inclusion (#include), macro definitions (#define), and conditional inclusion (#if). The language of preprocessor directives is not strictly specific to the grammar of C, so the C preprocessor can also be used independently to process other types of files.

The transformations it makes on its input form the first four of C's so-called Phases of Translation. Though an implementation may choose to perform some or all phases simultaneously, it must behave as if it performed them one-by-one in order.

Phases

The following are the first four (of eight) phases of translation specified in the C Standard:

  1. Trigraph Replacement - The preprocessor replaces trigraph sequences with the characters they represent.
  2. Line Splicing - Physical source lines that are continued with escaped newline sequences are spliced to form logical lines.
  3. Tokenization - The preprocessor breaks the result into preprocessing tokens and whitespace. It replaces comments with whitespace.
  4. Macro Expansion and Directive Handling - Preprocessing directive lines, including file inclusion and conditional compilation, are executed. The preprocessor simultaneously expands macros and, in the 1999 version of the C standard, handles _Pragma operators.
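
As a rough illustration of phases 2 to 4, the following sketch depends on each of them once. The names GREETING, answer and greeting are invented for this example and do not come from the standard.

```c
/* Phase 2 (line splicing): each backslash-newline pair is deleted, so the
   definition below reaches the preprocessor as one logical line. */
#define GREETING "hello, " \
                 "world"

/* Phase 3 (tokenization): the comment is replaced by one space, so `int`
   and `answer` remain two separate tokens. */
int/* a comment between two tokens */answer = 42;

/* Phase 4 (macro expansion): GREETING is replaced by its two string
   literals; a later translation phase joins adjacent literals into one. */
const char *greeting(void)
{
    return GREETING;
}
```

Calling greeting() should yield the single string "hello, world", and answer is an ordinary int initialised to 42.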


Including files

The most common use of the preprocessor is to include another file:
    #include <stdio.h>

    int main(void)
    {
        printf("Hello, world!\n");
        return 0;
    }

The preprocessor replaces the line #include <stdio.h> with the system header file of that name, which declares the printf function amongst other things. More precisely, the entire text of the file 'stdio.h' replaces the #include directive.

This can also be written using double quotes, e.g. #include "stdio.h". If the filename is enclosed within angle brackets, the file is searched for in the standard compiler include paths. If the filename is enclosed within double quotes, the search path is expanded to include the current source directory. C compilers and programming environments all have a facility which allows the programmer to define where include files can be found. This can be introduced through a command line flag, which can be parameterized using a makefile, so that a different set of include files can be swapped in for different operating systems, for instance.

By convention, include files are given a .h extension, and files not included by others are given a .c extension. However, there is no requirement that this be observed. Files with other extensions are occasionally included; in particular, a .def extension may denote a file designed to be included multiple times, each time expanding the same repetitive content.

#include often compels the use of #include guards or #pragma once to prevent double inclusion.
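
A minimal sketch of an #include guard follows. To keep it in one file, the contents of a hypothetical header point.h are pasted twice, which is what two #include lines for the same header would produce; POINT_H, struct point and point_sum are names chosen for this illustration.

```c
/* First copy of the hypothetical header point.h. */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif

/* Second copy, as if point.h were included again through another header.
   POINT_H is already defined, so the body is skipped and struct point is
   not redefined. */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif

int point_sum(struct point p)
{
    return p.x + p.y;
}
```

Without the guard, the second definition of struct point would be a redefinition error.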

Conditional compilation

The #if, #ifdef, #ifndef, #else, #elif and #endif directives can be used for conditional compilation.

    #ifdef WIN32 // WIN32 is commonly defined when building for 32-bit Windows, but not otherwise.
    #include <windows.h>
    #else
    #include <unistd.h>
    #endif

    #if VERBOSE >= 2
      print("trace message");
    #endif


The macro WIN32 could be defined implicitly by the compiler, or specified on the compiler's command line, perhaps to control compilation of the program from a makefile.

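The first example tests whether the macro WIN32 is defined. If it is, the file <windows.h> is included; otherwise <unistd.h> is included. The second example compiles the trace call only if VERBOSE expands to an integer that is at least 2.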
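
The sketch below combines both styles of test in one self-contained file. VERBOSE, TRACE_ENABLED and PLATFORM_NAME are hypothetical names for this example; _WIN32, __unix__ and __APPLE__ are macros that common compilers predefine for their respective targets.

```c
/* VERBOSE would normally come from the build system; default it here so
   the file stands alone. */
#ifndef VERBOSE
#define VERBOSE 1
#endif

#if VERBOSE >= 2
#define TRACE_ENABLED 1
#else
#define TRACE_ENABLED 0
#endif

#if defined(_WIN32)
#define PLATFORM_NAME "windows"
#elif defined(__unix__) || defined(__APPLE__)
#define PLATFORM_NAME "unix-like"
#else
#define PLATFORM_NAME "unknown"
#endif

int trace_enabled(void)
{
    return TRACE_ENABLED;
}

const char *platform_name(void)
{
    return PLATFORM_NAME;
}
```

With the default of 1, trace_enabled() returns 0; defining VERBOSE as 2 or more before this code flips it to 1 without touching the source.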

Macro definition and expansion

There are two types of macros, object-like and function-like. Object-like macros do not take parameters; function-like macros do. The generic syntax for declaring an identifier as a macro of each type is, respectively,
    #define <identifier> <replacement token list>
    #define <identifier>(<parameter list>) <replacement token list>
Note that the function-like macro declaration must not have any whitespace between the identifier and the first, opening, parenthesis. If whitespace is present, the macro will be interpreted as object-like with everything starting from the first parenthesis added to the token list.

Whenever the identifier appears in the source code it is replaced with the replacement token list, which can be empty. For an identifier declared to be a function-like macro, it is only replaced when the following token is also a left parenthesis that begins the argument list of the macro invocation. The exact procedure followed for expansion of function-like macros with arguments is subtle.
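
The rule that a function-like macro is only replaced when a left parenthesis follows its name can be seen in this small sketch. The function and the macro share the name twice on purpose, and they compute different values so the two can be told apart; all names are invented for the illustration.

```c
static int twice(int v)
{
    return v + v;
}

/* Deliberately different from the function so the two can be told apart. */
#define twice(v) ((v) * 2 + 1000)

int via_macro(void)
{
    return twice(1);        /* followed by '(': the macro is expanded */
}

int via_function(void)
{
    return (twice)(1);      /* followed by ')': the function is called */
}
```

Here via_macro() evaluates ((1) * 2 + 1000), while via_function() calls the real function.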

Object-like macros were conventionally used as part of good programming practice to create symbolic names for constants, e.g.

    #define PI 3.14159

... instead of hard-coding those numbers throughout one's code. However, both C and C++ provide the const qualifier, which offers another way to avoid hard-coding constants throughout the code.

An example of a function-like macro is:

    #define RADTODEG(x) ((x) * 57.29578)


This defines a radians to degrees conversion which can be written subsequently, e.g. RADTODEG(34) or RADTODEG (34). This is expanded in-place, so the caller does not need to litter copies of the multiplication constant all over the code. The macro here is written as all uppercase to emphasize that it is a macro, not a compiled function.

Precedence

Note that the example macro RADTODEG(x) given above uses normally superfluous parentheses both around the argument and around the entire expression. Omitting either of these can lead to unexpected results. For example:

  • Macro defined as
    #define RADTODEG(x) (x * 57.29578)
    will expand RADTODEG(a + b) to (a + b * 57.29578)

  • Macro defined as
    #define RADTODEG(x) (x) * 57.29578
    will expand 1 / RADTODEG(a) to 1 / (a) * 57.29578

Neither of these gives the intended result.
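
The difference can be checked numerically. In this sketch the two faulty variants are given the hypothetical names RADTODEG_BAD1 and RADTODEG_BAD2 so that all three definitions can coexist.

```c
#define RADTODEG(x)      ((x) * 57.29578)   /* fully parenthesised */
#define RADTODEG_BAD1(x) (x * 57.29578)     /* argument not parenthesised */
#define RADTODEG_BAD2(x) (x) * 57.29578     /* result not parenthesised */

double good_sum(double a, double b) { return RADTODEG(a + b); }

/* Expands to (a + b * 57.29578): only b is converted. */
double bad_sum(double a, double b) { return RADTODEG_BAD1(a + b); }

double good_recip(double a) { return 1 / RADTODEG(a); }

/* Expands to 1 / (a) * 57.29578, which is (1 / a) * 57.29578. */
double bad_recip(double a) { return 1 / RADTODEG_BAD2(a); }
```

For a = b = 1, good_sum gives about 114.59 while bad_sum gives about 58.30; for a = 2, good_recip gives about 0.0087 while bad_recip gives about 28.65.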

Multiple lines

A macro can be extended over as many lines as required using a backslash escape character at the end of each line. The macro ends after the first line which does not end in a backslash.
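
For instance, a longer expression can be spread over several lines. CLAMP and clamp_percent are illustrative names only, and as with any function-like macro the arguments may be evaluated more than once.

```c
/* Every line except the last ends in a backslash, so the four physical
   lines form a single macro definition. */
#define CLAMP(v, lo, hi)    \
    ((v) < (lo) ? (lo)      \
     : (v) > (hi) ? (hi)    \
     : (v))

int clamp_percent(int v)
{
    return CLAMP(v, 0, 100);
}
```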

The extent to which multi-line macros enhance or reduce the size and complexity of the source of a C program, or its readability and maintainability, is open to debate (there is no experimental evidence on this issue). Techniques such as "supermacros" are occasionally used to address these potential issues.

Multiple evaluation of side effects

Another example of a function-like macro is:
    #define MIN(a,b) ((a)>(b)?(b):(a))

Notice the use of the ?: operator. This illustrates one of the dangers of using function-like macros. One of the arguments, a or b, will be evaluated twice when this "function" is called. So, if the expression MIN(++firstnum,secondnum) is evaluated, then firstnum may be incremented twice, not once as would be expected.
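
The double evaluation can be made visible by passing an argument that counts how often it is evaluated. The helper next_value and the counter calls are hypothetical names used only for this sketch.

```c
#define MIN(a, b) ((a) > (b) ? (b) : (a))

static int calls;

/* Side effect: every evaluation bumps the counter. */
static int next_value(void)
{
    return ++calls;
}

/* Returns how many times the first argument of MIN was evaluated. */
int calls_made_by_min(void)
{
    calls = 0;
    /* Expands to ((next_value()) > (100) ? (100) : (next_value())). */
    (void)MIN(next_value(), 100);
    return calls;
}
```

Because the first value (1) is not greater than 100, the conditional selects its third operand and next_value() runs a second time.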

A safer way to achieve the same would be to use a typeof-construct:
    #define max(a,b) \
      ({ typeof (a) _a = (a); \
         typeof (b) _b = (b); \
         _a > _b ? _a : _b; })


This will cause the arguments to be evaluated only once, and it will not be type-specific anymore. This construct is not legal ANSI C; both the typeof keyword, and the construct of placing a compound statement within parentheses, are non-standard extensions implemented in the popular GNU C compiler (GCC). If you are using GCC, or any compiler that supports the C99 inline keyword, the same general problem can also be solved using a static inline function, which is typically as efficient as a #define. The inline function allows the compiler to check/coerce parameter types. In this particular example this appears to be a disadvantage, since the 'max' macro as shown works equally well with different parameter types, but in general having the type coercion is often an advantage.
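
A minimal sketch of the inline-function alternative, assuming a compiler that accepts the C99 inline keyword. max_int, next_value and calls are illustrative names; the function is fixed to int, which is the type-specific trade-off described above.

```c
static inline int max_int(int a, int b)
{
    return a > b ? a : b;
}

static int calls;

/* Side effect: every evaluation bumps the counter. */
static int next_value(void)
{
    return ++calls;
}

/* Returns how many times the first argument of max_int was evaluated. */
int calls_made_by_max(void)
{
    calls = 0;
    (void)max_int(next_value(), 100);
    return calls;
}
```

Function arguments are evaluated exactly once before the call, so the counter ends at 1 regardless of which branch is taken.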

Within ANSI C, there is no reliable general solution to the issue of side-effects in macro arguments.

Token concatenation

Token concatenation, also called token pasting, is one of the most subtle, and most easily abused, features of the C macro preprocessor. Two arguments can be 'glued' together using the ## preprocessor operator; this allows two tokens to be concatenated in the preprocessed code. This can be used to construct elaborate macros which act like a crude version of C++ templates.

For instance:

    #define MYCASE(item,id) \
    case id: \
      item##_##id = id;\
    break

    switch(x) {
        MYCASE(widget,23);
    }

The line MYCASE(widget,23); gets expanded here into case 23: widget_23 = 23; break; (The semicolon following the invocation of MYCASE becomes the semicolon that completes the break statement.)
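
Token pasting can also generate whole families of identifiers. In this sketch the hypothetical macro DEFINE_COUNTER builds a variable and a function whose names are derived from its argument.

```c
/* For a given name, defines name_count and name_increment(). */
#define DEFINE_COUNTER(name)                               \
    static int name##_count;                               \
    int name##_increment(void) { return ++name##_count; }

DEFINE_COUNTER(apple)   /* defines apple_count and apple_increment() */
DEFINE_COUNTER(pear)    /* defines pear_count and pear_increment() */
```

Each invocation produces an independent counter, so incrementing one does not affect the other.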

Semicolons

One stylistic note about the above macro is that the semicolon on the last line of the macro definition is omitted so that the macro looks 'natural' when written. It could be included in the macro definition, but then there would be lines in the code without semicolons at the end which would throw off the casual reader. Worse, the user could be tempted to include semicolons anyway; in most cases this would be harmless (an extra semicolon denotes an empty statement) but it would cause errors in control flow blocks:
    #define PRETTY_PRINT(msg) \
      printf ("Message: '%s'\n", msg);

    if (n < 10)
      PRETTY_PRINT("n is less than 10");
    else
      PRETTY_PRINT("n is at least 10");

This expands to give two statements (the intended printf and an empty statement) in each branch of the if/else construct, which will cause the compiler to report a syntax error at the else.

Multiple statements
Inconsistent use of multiple-statement macros can result in unintended behaviour. The code
    #define CMDS \
      a = b; \
      c = d

    if (var == 13)
      CMDS;
    else
      return;

will expand to

    if (var == 13)
      a = b;
    c = d;
    else
      return;

which is a syntax error (the else is lacking a matching if).

The macro can be made safe by replacing the internal semicolon with the comma operator, since two operands connected by a comma form a single expression. The comma operator is the lowest precedence operator. In particular, its precedence is lower than the assignment operator's, so that a = b, c = d does not parse as a = (b,c) = d. Therefore,

    #define CMDS a = b, c = d

    if (var == 13)
      CMDS;
    else
      return;

will expand to

    if (var == 13)
      a = b, c = d;
    else
      return;

The problem can also be fixed without using the comma operator:
    #define CMDS \
      do { \
        a = b; \
        c = d; \
      } while (0)

With this definition,

    if (var == 13)
      CMDS;
    else
      return;

expands to

    if (var == 13)
      do { a = b; c = d; } while (0);
    else
      return;

The do and while (0) are needed to allow the macro invocation to be followed by a semicolon; if they were omitted, leaving only the braces, the resulting expansion would be

    if (var == 13)
      { a = b; c = d; };
    else
      return;

The semicolon in the macro's invocation above becomes an empty statement, causing a syntax error at the else by preventing it matching up with the preceding if.
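
A compilable sketch of the do/while (0) idiom inside an if/else follows. RESET_PAIR and reset_if_flag are hypothetical names chosen for the illustration.

```c
/* Two statements wrapped so that the invocation plus its trailing
   semicolon forms exactly one statement. */
#define RESET_PAIR(a, b) \
    do {                 \
        (a) = 0;         \
        (b) = 0;         \
    } while (0)

int reset_if_flag(int flag, int *x, int *y)
{
    if (flag)
        RESET_PAIR(*x, *y);   /* the semicolon completes the do statement */
    else
        return 0;
    return 1;
}
```

The else still pairs with the if because the macro invocation and its semicolon together make a single statement.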

Quoting macro arguments

Although macro expansion does not occur within a quoted string, the text of the macro arguments can be quoted and treated as a string literal by using the "#" operator (also known as the "stringizing operator"). For example, with the macro

    #define QUOTEME(x) #x

the code

    printf("%s\n", QUOTEME(1+2));

will expand to

    printf("%s\n", "1+2");

This capability can be used with automatic string literal concatenation to make debugging macros. For example, the macro in

    #define dumpme(x, fmt) printf("%s:%d: %s=" fmt, __FILE__, __LINE__, #x, x)

    int some_function(void)
    {
        int foo = 42;
        dumpme(foo, "%d");
        return foo;
    }

would print the name of an expression and its value, along with the file name and the line number.
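
The same idea can be written so that the result lands in a buffer instead of on standard output, which makes it easy to inspect. DUMP_INT, quoted and dump_sum are hypothetical names; snprintf is the standard C99 function.

```c
#include <stdio.h>

#define QUOTEME(x) #x

/* Writes "<expression text>=<value>" into buf. */
#define DUMP_INT(buf, size, expr) \
    snprintf((buf), (size), "%s=%d", #expr, (expr))

const char *quoted(void)
{
    return QUOTEME(1+2);            /* the text "1+2", not the value 3 */
}

int dump_sum(char *buf, size_t size)
{
    int a = 2, b = 3;
    return DUMP_INT(buf, size, a + b);
}
```

The # operator captures the spelling of the argument, so dump_sum records the text a + b next to the computed value 5.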

Indirectly quoting macro arguments
The "#" operator can also be used indirectly. For example, with the macros:

    #define FOO bar
    #define _QUOTEME(x) #x
    #define QUOTEME(x) _QUOTEME(x)

the code

    printf("FOO=%s\n", QUOTEME(FOO));

will expand to

    printf("FOO=%s\n", "bar");

One common use for this technique is to convert the __LINE__ macro to a string. For example, QUOTEME(__LINE__) is converted to "34" if __LINE__ happens to have the value 34 where QUOTEME is invoked. On the other hand, _QUOTEME(__LINE__) will expand to "__LINE__".
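
The two behaviours can be compared side by side. This sketch uses the hypothetical names STR_DIRECT and STR_EXPANDED rather than identifiers with a leading underscore.

```c
#define FOO bar
#define STR_DIRECT(x)   #x              /* stringizes the argument as written */
#define STR_EXPANDED(x) STR_DIRECT(x)   /* expands the argument first */

const char *direct_name(void)   { return STR_DIRECT(FOO); }
const char *expanded_name(void) { return STR_EXPANDED(FOO); }
const char *line_text(void)     { return STR_EXPANDED(__LINE__); }
```

An argument that is an operand of # is not macro-expanded, which is why the extra level of indirection is needed: direct_name() yields the text FOO, expanded_name() yields bar, and line_text() yields the digits of the line it appears on.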

Variadic macros

Macros that can take a varying number of arguments (variadic macros) are not allowed in C89, but were introduced by a number of compilers and standardised in C99. Variadic macros are particularly useful when writing wrappers to variable parameter number functions, such as printf, for example when logging warnings and errors.
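
A small sketch of a C99 variadic macro that wraps snprintf. FORMAT_WARNING and warn_disk are hypothetical names, and the macro assumes that its first variable argument is a string literal so that it can be joined to the "warning: " prefix.

```c
#include <stdio.h>

/* __VA_ARGS__ stands for everything passed after `size`. */
#define FORMAT_WARNING(buf, size, ...) \
    snprintf((buf), (size), "warning: " __VA_ARGS__)

int warn_disk(char *buf, size_t size, int percent)
{
    return FORMAT_WARNING(buf, size, "disk %d%% full", percent);
}
```

Calling warn_disk with 93 writes "warning: disk 93% full" into the buffer.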

X-Macros

One little-known usage-pattern of the C preprocessor is known as "X-Macros". An X-Macro is an #include file (commonly using a ".def" extension instead of the traditional ".h") that contains a list of similar macro calls (which can be referred to as "component macros"). The include file is then referenced repeatedly in the following pattern:

(Given that the include file is "xmacro.def" and it contains a list of component macros of the style "foo(x, y, z)")

    #define foo(x, y, z) doSomethingWith(x, y, z);
    #include "xmacro.def"
    #undef foo

    #define foo(x, y, z) doSomethingElseWith(x, y, z);
    #include "xmacro.def"
    #undef foo


(etc...)

The most common usage of X-Macros is to establish a list of C objects and then automatically generate code for each of them. Some implementations also perform any #undefs they need inside the X-Macro, as opposed to expecting the caller to undef them.

Common sets of objects are a set of global configuration settings, a set of members of a structure, a list of possible XML tags for converting an XML file to a quickly traversable tree or the body of an enum declaration, although other lists are possible.

Once the X-Macro has been processed to create the list of objects, the component macros can be redefined to generate, for instance, accessor and/or mutator functions. Structure serializing and deserializing are also commonly done.

Here is an example of an X-Macro that establishes a struct and automatically creates serialize/deserialize functions:

(Note: for simplicity, we don't account for endianness or buffer overflows)

File: object.def

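    struct_member( x, int );
    struct_member( y, int );
    struct_member( z, int );
    struct_member( radius, double );

File: star_table.c

    #include <string.h>

    typedef struct {
      #define struct_member( name, type ) type name
      #include "object.def"
      #undef struct_member
    } star;

    void serialize_star( const star *_star, unsigned char *buffer )
    {
      #define struct_member( name, type ) \
        memcpy( buffer, &(_star->name), sizeof(_star->name) ); \
        buffer += sizeof(_star->name)
      #include "object.def"
      #undef struct_member
    }

    void deserialize_star( star *_star, const unsigned char *buffer )
    {
      #define struct_member( name, type ) \
        memcpy( &(_star->name), buffer, sizeof(_star->name) ); \
        buffer += sizeof(_star->name)
      #include "object.def"
      #undef struct_member
    }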



Often, handlers for individual data types are created and accessed using the token concatenation ("##") and quoting ("#") operators. For instance, the following might be added to the above code:



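    #include <stdio.h>

    void print_int( int val )
    {
      printf( "%d", val );
    }

    void print_double( double val )
    {
      printf( "%g", val );
    }

    void print_star( const star *_star )
    {
      /* print_##type becomes print_int or print_double */
      #define struct_member( name, type ) \
        printf( "%s: ", #name ); \
        print_##type( _star->name ); \
        printf( "\n" )
      #include "object.def"
      #undef struct_member
    }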



The creation of a separate header file can be avoided by creating a single macro containing what would be the contents of the file. For instance, the above defined "object.def" could be replaced with this macro:

    #define object_def \
      struct_member( x, int ); \
      struct_member( y, int ); \
      struct_member( z, int ); \
      struct_member( radius, double );



and then all calls to '#include "object.def"' could be replaced with a simple object_def statement. The above function would become:



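    void print_star( const star *_star )
    {
      /* print_##type becomes print_int or print_double */
      #define struct_member( name, type ) \
        printf( "%s: ", #name ); \
        print_##type( _star->name ); \
        printf( "\n" )
      object_def
      #undef struct_member
    }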



A variant which avoids needing to know the names of any expanded sub-macros is to accept the operators as an argument to the list macro:

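    #define object_def(_) \
      _( x, int ) \
      _( y, int ) \
      _( z, int ) \
      _( radius, double )

    void print_star( const star *_star )
    {
      /* print_##type becomes print_int or print_double */
      #define print_member( name, type ) \
        printf( "%s: ", #name ); \
        print_##type( _star->name ); \
        printf( "\n" );
      object_def( print_member )
      #undef print_member
    }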



This approach can be dangerous in that the entire macro set is always interpreted as if it was on a single source line, which could encounter compiler limits with complex component macros and/or long member lists.
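
As a compact, self-contained illustration of the list-macro variant, the sketch below generates the body of an enum and a matching table of names from a single list. COLOR_LIST, AS_ENUM, AS_STRING and color_name are all names invented for this example.

```c
/* The single source of truth: one entry per colour. */
#define COLOR_LIST(X) \
    X(RED)            \
    X(GREEN)          \
    X(BLUE)

/* First expansion: the enumerators COLOR_RED, COLOR_GREEN, COLOR_BLUE. */
#define AS_ENUM(name) COLOR_##name,
enum color { COLOR_LIST(AS_ENUM) COLOR_COUNT };
#undef AS_ENUM

/* Second expansion: the strings "RED", "GREEN", "BLUE" in the same order. */
#define AS_STRING(name) #name,
static const char *const color_names[] = { COLOR_LIST(AS_STRING) };
#undef AS_STRING

const char *color_name(enum color c)
{
    return (unsigned)c < COLOR_COUNT ? color_names[c] : "?";
}
```

Adding a colour to COLOR_LIST updates the enum and the name table together, which is the main attraction of the pattern.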

User-defined compilation errors and warnings

The #error directive inserts an error message into the compiler output.

    #error "Gaah!"

This prints "Gaah!" in the compiler output and halts the compilation at that point. This is extremely useful for determining whether a given line is being compiled or not. It is also useful if you have a heavily parameterized body of code and want to make sure a particular #define has been introduced from the makefile, e.g.:

    #ifdef WINDOWS
    ... /* Windows specific code */
    #elif defined(UNIX)
    ... /* Unix specific code */
    #else
    #error "What's your operating system?"
    #endif
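
#error is also handy for turning an unstated assumption into a build failure. This sketch checks the standard CHAR_BIT macro from <limits.h>; the function name bits_in_four_bytes is made up for the illustration.

```c
#include <limits.h>

/* Refuse to compile on a platform where the assumption does not hold. */
#if CHAR_BIT != 8
#error "This example assumes 8-bit bytes."
#endif

int bits_in_four_bytes(void)
{
    return 4 * CHAR_BIT;
}
```

On a platform with 8-bit bytes the file compiles normally; anywhere else the build stops with the given message instead of silently producing wrong results.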


Some implementations provide a non-standard #warning directive to print out a warning message in the compiler output, but not stop the compilation process. A typical use is to warn about the usage of some old code, which is now unfavored and only included for compatibility reasons, e.g.:

    #warning "Do not use ABC, which is deprecated. Use XYZ instead."


Although the text following the #error or #warning directive does not have to be quoted, it is good practice to do so. Otherwise, there may be problems with apostrophes and other characters that the preprocessor tries to interpret. Microsoft C uses #pragma message ( "text" ) instead of #warning.

Compiler-specific preprocessor features

The #pragma directive is a compiler specific directive which compiler vendors may use for their own purposes. For instance, a #pragma is often used to allow suppression of specific error messages, manage heap and stack debugging, etc.

C99 introduced a few standard #pragma directives, taking the form #pragma STDC …, which are used to control the floating-point implementation.

Standard positioning macros

Certain symbols are predefined in ANSI C. Two useful ones are __FILE__ and __LINE__, which expand into the current file and line number. For instance:

// debugging macros so we can pin down message provenance at a glance
    #define WHERESTR "[file %s, line %d] "
    #define WHEREARG __FILE__,__LINE__


printf(WHERESTR ": hey, x=%d\n", WHEREARG, x);

This prints the value of x, preceded by the file and line number, allowing quick access to which line the message was produced on. Note that the WHERESTR argument is concatenated with the string following it.
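
A variation that formats into a buffer, so the result can be examined, might look as follows. WHERE_FMT, WHERE_ARGS, format_here and current_line are hypothetical names; the file name in the output depends on how the compiler was invoked.

```c
#include <stdio.h>

#define WHERE_FMT  "[file %s, line %d] "
#define WHERE_ARGS __FILE__, __LINE__

/* Writes "[file <name>, line <n>] x=<value>" into buf. */
int format_here(char *buf, size_t size, int x)
{
    return snprintf(buf, size, WHERE_FMT "x=%d", WHERE_ARGS, x);
}

/* __LINE__ expands to the number of the line it appears on. */
int current_line(void)
{
    return __LINE__;
}
```

Because WHERE_FMT is a string literal, it is joined to the literal that follows it, giving a single format string.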

Compiler-specific predefined macros

Compiler-specific predefined macros are usually listed in the compiler documentation, although this is often incomplete. Independent catalogues also exist that list "various pre-defined compiler macros that can be used to identify standards, compilers, operating systems, hardware architectures, and even basic run-time libraries at compile-time".

Some compilers can be made to dump at least some of their useful predefined macros, for example:

  • GNU C Compiler:
    gcc -dM -E - < /dev/null
  • HP-UX ANSI C compiler:
    cc -v fred.c (where fred.c is a simple test file)
  • SCO OpenServer C compiler:
    cc -## fred.c (where fred.c is a simple test file)
  • Sun Studio C/C++ compiler:
    cc -## fred.c (where fred.c is a simple test file)


As a general-purpose preprocessor

Since the C preprocessor can be invoked independently to process files other than those containing to-be-compiled source code, it can also be used as a "general purpose preprocessor" for other types of text processing. One particularly notable example is the now-deprecated imake system; more examples are listed at General purpose preprocessor.

See also

  • C syntax
  • Make
  • Preprocessor

