GCC is the cornerstone of development in both the open source and closed source worlds. It's the enabler of architectures and operating systems. When a new processor appears, its success depends on a version of GCC that will support it (a back end that can generate code for it). GCC is also the enabler of Linux®. Linux as an operating system is widely successful because it is run on so many different architectures. Once again, a port of GCC to the target environment enables Linux to be ported and run on it. Without trying to put too fine a point on it, GCC paves the way for Linux and embedded development.
But GCC can't just sit still. New processor architectures continue to appear, and new research finds better ways to optimize and generate code. So GCC moves forward and has now matured into its fourth major release. This article explores the fundamental changes in GCC version 4 to show you why, if you haven't switched yet, the time has come to use the standard compiler.
Short history
GCC originally stood for GNU C Compiler when it was first released by Richard Stallman in 1987. (A historical timeline of GCC is shown in Figure 1.) Richard started the project in 1984 with the desire to build a free C compiler that could be used, modified, and evolved. GCC originally ran on early Sun and DEC VAX systems.
As an open compiler (that is, the source was freely available), others began to provide fixes and—more importantly—updates for new languages and target architectures. Not long after, its acronym was changed to mean GNU Compiler Collection, as it supported numerous languages targeted to the most popular (and esoteric) architectures.
Figure 1. A modern history of GCC releases
Today, GCC is the most popular compiler toolchain available. The same source base can be used to build compilers for Ada, Fortran, the Java™ language, and variants of C (C++ and Objective-C), and it covers the largest number of target processor architectures of any compiler (30 supported processor families). The source base is also completely portable and runs on more than 60 platforms. The compilers are highly tunable, with a large number of options for tweaking the generated code. GCC, in short, is the Swiss Army knife of compilers and redefines the meaning of flexibility. It is also the most complex open source system that exists: Today, GCC is made up of almost 1.5 million lines of source code.
Wow! With all of that, you'd think I was truly enamored with GCC. Let's just say that when I'm developing software with GCC and my wife walks into the room, I feel a little uncomfortable.
Before you start
Compilers are constructed in a pipeline architecture made up of several stages that communicate different forms of data (see Figure 2). The front end of a compiler is language specific and includes a parser for the given language that results in parsed trees and the intermediate representation (the Register Transfer Language, or RTL). The back end is then responsible for using this language-independent representation and producing instructions for the particular target architecture. To do this, the optimizer uses the RTL to create fast or more compact code (or both, when possible). The optimized RTL is then fed to the code generator, which produces target code.
Figure 2. Simplified view of the compiler stages
Core changes for GCC 4
GCC 4 brings many changes to the standard compiler suite, the biggest of which is around support for optimizations with the introduction of the tree Static Single Assignment (SSA) form. But in general, the compiler is faster in some optimization modes and provides many new enhancements, including new target support. GCC 4 is also much more thorough when it comes to warnings and errors (in fact, certain warnings may now show up as errors with GCC 4). One drawback to GCC 4 is that it is not binary-compatible with objects built with the GCC 3 compilers (which means that source must be recompiled with GCC 4)—unfortunate, but it's the price to be paid to move forward.
Let's look at some of the key advancements introduced with the new GCC 4.
The 4.0 release series
The 4.0 release (4.0.4 being the last in the series) is the first step into GCC 4. As such, it was not recommended for production development until a stabilization process could be completed. This release included a large number of changes—two in particular being the introduction of a new optimization framework (Tree SSA) and support for autovectorization.
Prior to GCC 4, the intermediate representation used was called Register Transfer Language (RTL). RTL is a low-level representation very close to assembly language (inspired by LISP S-expressions). The problem with RTL is that the optimizations it enables are those close to the target. Optimizations that require higher-level information about the program may not be possible, because their expression is lost in RTL. Tree SSA is designed to be both language independent and target independent while supporting improved analysis and richer optimizations.
Tree SSA introduces two new intermediate representations. The first is called GENERIC, which is a generic tree representation that's formed from the language front-end trees. The GENERIC trees are converted into GIMPLE form and a subsequent control flow graph to support SSA-based optimizations. Finally, the SSA trees are converted into RTL, which the back end uses for target code generation. An overly simplified description, but the result is a new intermediate form better suited for high- and low-level optimizations. (See Resources for more details on this process.)
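To make the SSA idea concrete, here is a minimal sketch in C. The function name `clamp_sum` is hypothetical, and the SSA names in the comments (`x_1`, `x_2`, `x_3`) are a hand-written illustration of the concept, not actual compiler output. The real intermediate forms can be inspected with GCC's `-fdump-tree-gimple` and `-fdump-tree-ssa` options, and their exact naming will differ from what is shown here.

```c
/* Hypothetical example: each assignment to x becomes a new SSA version. */
int clamp_sum(int a, int b, int limit)
{
    int x = a + b;   /* conceptually: x_1 = a + b            */
    if (x > limit)   /* conceptually: if (x_1 > limit)       */
        x = limit;   /* conceptually: x_2 = limit            */
    return x;        /* conceptually: x_3 = PHI <x_1, x_2>   */
}
```

Because every SSA name is assigned exactly once, an optimizer can reason about where each value comes from without tracking later reassignments, which is what makes analyses such as constant propagation simpler to express in this form.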
As the changes really represent a new framework, it's possible to define new optimizations. Several new optimizations have been implemented to date, but more work is obviously ahead to ensure that GCC generates the most compact and efficient code possible.
Another interesting change for GCC 4 is the addition of a loop vectorizer (based on the Tree SSA framework). Autovectorization is a feature that allows the compiler to identify scalar processing loops within code that can benefit from vector instructions available in the target processor. The result is tighter and more efficient target code. Another loop-based optimization is Swing Modulo Scheduling (SMS), which is used to construct instruction pipelines with the goal of minimizing cycle counts by exploiting instruction-level parallelism. More information on each of these new approaches is available in Resources.
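As a rough illustration of the kind of scalar loop a vectorizer looks for, consider the sketch below. The function name `saxpy` and its parameters are assumptions chosen for the example. The `restrict` qualifiers (C99) promise the compiler that the two arrays do not overlap, which is typically one of the things it needs to know before it can process several elements per instruction. Whether the loop is actually vectorized depends on the compiler version, the options used, and the target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for n elements.
 * Each iteration is independent of the others, so the loop is a
 * candidate for vector instructions on targets that provide them. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

One plausible way to try this with GCC 4 is `gcc -std=c99 -O2 -ftree-vectorize -c saxpy.c`, where the file name is an assumption and `-ftree-vectorize` is the option that enables the loop vectorizer.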
Finally, the 4.0 series also introduced (in addition to many C and C++ changes) a new Fortran front end that supports Fortran 90 and 95 (rather than the older Fortran 77, which was supported in GCC 3). New Ada 2005 features can also be found as well as support for Ada features on many more target architectures.
The 4.1 release series
With the new optimization framework in place, the 4.1 release series introduced a larger number of optimizations, such as improved profiling support and more accurate branch probability estimation. Two of the more useful optimizations are better inline support and the ability to exploit the instruction cache locality. When functions are to be inlined, the compiler no longer inlines functions that are not executed frequently. Instead, hot call sites are more likely to be inlined to keep the code size small while still getting inline function benefits. GCC can also help to partition functions into hot and cold sections. Keeping hot functions together (that is, those functions that are used more often) results in better instruction cache use compared to polluting the cache with cold functions.
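The hot and cold distinction can also be expressed by hand. The sketch below is not the profile-driven mechanism described above; it assumes the manual hints `__builtin_expect` and the `cold` function attribute (the latter is available in GCC 4.3 and later), and the names `parse_digit` and `report_error` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Rarely executed error path: marked cold so the compiler may place it
 * away from frequently executed code, and noinline so it is not merged
 * back into the hot caller. */
__attribute__((cold, noinline))
static int report_error(const char *msg)
{
    assert(msg != NULL);
    fprintf(stderr, "error: %s\n", msg);
    return -1;
}

/* Hot path: the branch hint says the error case is unlikely. */
int parse_digit(char c)
{
    if (__builtin_expect(c < '0' || c > '9', 0))
        return report_error("not a digit");
    return c - '0';
}
```

With profile feedback (`-fprofile-generate` followed by `-fprofile-use`), the compiler can derive similar information from measured behavior instead of from annotations.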
The front end saw a number of updates, including support for Objective-C++. There were also a very large number of updates for the Java core library (libgcj). The back end saw the introduction of support for the IBM® System z™ 9-109 processor, including 128-bit Institute of Electrical and Electronics Engineers (IEEE) floating point numbers and atomic memory access built in. If that weren't enough, the back end can now emit code to protect against stack-smashing attacks (that is, buffer overflow detection and reordering to protect against pointer corruption). Some built-in functions have also been updated to protect against buffer overruns with a minimal amount of overhead.
The 4.2 release series
The 4.2 release series continued with new optimizations and enhancements that covered both languages and processor architectures. The back end was updated to include support for Sun's UltraSPARC T1 processor (codenamed Niagara) as well as Broadcom's SB-1A MIPS core.
The front end also saw changes in version 4.2 with the overhauling of C++ visibility handling and support for Fortran 2003 streaming input/output (I/O) extensions. But one of the most interesting changes in the 4.2 release was the addition of OpenMP for the C, C++, and Fortran compilers. OpenMP is a multi-threading implementation that allows the compiler to generate code for task and data parallelism.
Using one aspect of OpenMP, code is annotated with areas in which parallelism should occur using preprocessor directives. The code is converted into a multi-threaded program for the duration of the block, then joined back together as each thread within the block finishes.
Figure 3 provides a look at how this process works in practice. OpenMP provides not only a set of pragmas (that is, preprocessor directives) but also functions for C, C++, and Fortran. In Figure 3, you see a simple program that directs elements of code into multiple threads (parallelizing the for block). The effect is shown graphically in Figure 3: A traditional program would execute the loop sequentially, whereas the OpenMP implementation creates threads to parallelize the for block. You can learn more about OpenMP in Resources.
Figure 3. Simple example of OpenMP support
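A minimal sketch of an annotated for block of the kind described above might look like the following. The function names `sum_of_squares` and `max_threads` are hypothetical. The `reduction(+:total)` clause gives each thread a private partial sum and combines the partial sums when the threads join. When the file is compiled without `-fopenmp`, the pragma is ignored and the loop simply runs sequentially with the same result.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>   /* OpenMP runtime functions, only with -fopenmp */
#endif

/* Sum of v[i] * v[i]; iterations are split across threads under OpenMP. */
long sum_of_squares(const int *v, int n)
{
    long total = 0;
    int i;

    assert(v != NULL || n == 0);
    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++)
        total += (long)v[i] * v[i];
    return total;
}

/* Upper bound on threads a parallel region may use; 1 without OpenMP. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Assuming the code is saved as `squares.c`, building with `gcc -fopenmp -c squares.c` enables the threaded version, and the same source still builds as an ordinary sequential program without that option.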
The 4.3 release series
The current release series in GCC 4 is 4.3. This release series shows an acceleration of features and supported architectures (as well as unsupported architectures, as many obsolete architectures and ports have been removed). New language support was added for Fortran 2003 as well as a host of general optimizer improvements.
New processors supported in this release include several in the Coldfire processor family, the IBM System z9 EC/BC processor, the Cell broadband engine architecture's Synergistic Processor Unit (SPU), support for SmartMIPS, and numerous others. You'll also find compiler and library support for Thumb2 (compressed ARM instructions) and for the ARMv7 architecture as well as tuning support for Core2 processors and the Geode processor family.
In the front end of the compiler, the internal representation for GIMPLE was redefined, meaning that the compiler consumes less memory.
Beyond the 4.3 release
Work has already begun on the 4.4 release series, and it's moving toward a general release. In version 4.4, you'll find numerous bug fixes and more general optimizer improvements. Version 3.0 of the OpenMP specification has also been integrated for C, C++, and Fortran.
The compiler will also now allow you to define an optimization level at the function level (instead of at the file level, which was the previous default). This functionality is provided by the optimize attribute, which also allows you to specify the individual options for the optimizer.
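A minimal sketch of per-function optimization settings, assuming GCC 4.4 or later, is shown below. The function names `dot_fast` and `dot_debug` are hypothetical. Strings that begin with `O` are treated as optimization levels, and other strings are treated as `-f` options (so `"unroll-loops"` stands for `-funroll-loops`). Compilers that do not know the attribute may ignore it with a warning.

```c
#include <assert.h>
#include <stddef.h>

/* Compiled as if with -O3 -funroll-loops, whatever the file-level options. */
__attribute__((optimize("O3", "unroll-loops")))
long dot_fast(const int *a, const int *b, int n)
{
    long sum = 0;
    int i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < n; i++)
        sum += (long)a[i] * b[i];
    return sum;
}

/* Same computation compiled as if with -O0, for example to make
 * single-stepping in a debugger easier. */
__attribute__((optimize("O0")))
long dot_debug(const int *a, const int *b, int n)
{
    long sum = 0;
    int i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < n; i++)
        sum += (long)a[i] * b[i];
    return sum;
}
```

Both functions return the same result; only the code the compiler generates for each one differs.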
Finally, processor support was added for the Picochip, which is a 16-bit multi-core processor. What's interesting about the Picochip is that each core can be programmed independently, with communication provided in a mesh.
What's ahead?
The future is obviously bright for GCC. The toolchain continues to evolve, both architecturally and incrementally, to support the latest in processor architectures. You'll also find that the language landscape is well covered by GCC. Under development is support for a number of different languages, such as Mercury, GHDL (a GCC front end for VHDL), and the Unified Parallel C language (UPC).
In addition to GCC's bright future, its continued improvement means benefits for all types of software (from Linux and Berkeley Software Distribution [BSD] to Apache and everything in between). Software compiled with GCC 4 will be generally more compact and faster, meaning software industry goodness all around.
By M. Tim Jones, Consultant Engineer, Emulex Corp.