## Introduction to Compiler Principles (Part 1)


This is the first part of my notes on compiler theory. References: Shao Bing's classroom courseware and lectures, Zhang Li's “Compiler Theory and Compiler Construction”, “Compiler Theory: Study Guide and Analysis of Typical Problems” (National Defense Industry Press), AlvinZH's study notes, and my own understanding.

The current version contains the complete content. A Lite version and review notes will follow.

If you spot an error or mistake, suggestions are welcome in the comments, or contact me on QQ: 847590417

Contents

1.1 Basic concepts

1.2 The overall compilation process

1.3 Compiler structure

1.4 Compiler preprocessors and postprocessors

1.5 Applications of compiler technology


1.1 Basic concepts

Low-level languages

– Bit code, machine language, assembly language

– Features: machine-dependent and efficient, but complex, tedious, time-consuming, and error-prone to use.

High-level languages

– Fortran, Pascal, C, etc.

– Features: machine-independent and portable, undemanding of the user, easy to use, and easy to maintain.

Source program: a program written in assembly language or a high-level language.

Target program (object program): a program expressed in the target language. The target language is not fixed. It may be the assembly language or machine language of some machine, or an “intermediate language” that lies between the source language and machine language.

Translator: a program that converts a source program into a target program. It is a general term for language translators of all kinds, including assemblers, compilers, and various conversion programs.

The relationship among the three: the source program is the input of the translator, and the target program is its output.


When the source program is written in assembly language and the translator turns it into a target program expressed in machine language, the translator is called an assembler, and the translation process is called “assembly”.

When the source program is written in a high-level language and the translator processes it into a target program, the translator is called a compiler, and the translation process is called “compilation”.

Assemblers and compilers are both translators. They differ only in the language they process. Assembly-language instructions correspond one-to-one with machine-language instruction formats, so an assembler's translation work is much simpler than a compiler's.
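The following minimal sketch of an assembler for an imaginary machine illustrates that one-to-one correspondence. The opcode values, the `assemble` function, and the symbol addresses are all hypothetical. A real assembler also handles labels, directives, and actual instruction encodings.

```python
# Hypothetical opcode table for an imaginary machine: each mnemonic
# corresponds to exactly one machine opcode.
OPCODES = {"LD": 0x01, "ADD": 0x02, "MUL": 0x03, "ST": 0x04}


def assemble(lines, symbols):
    """Translate 'MNEMONIC operand' lines into (opcode, address) pairs.

    `symbols` maps operand names to memory addresses (also hypothetical).
    """
    machine_code = []
    for line in lines:
        mnemonic, operand = line.split()
        # One assembly instruction becomes exactly one machine instruction.
        machine_code.append((OPCODES[mnemonic], symbols[operand]))
    return machine_code


print(assemble(["LD A", "ADD B", "ST C"], {"A": 100, "B": 101, "C": 102}))
# → [(1, 100), (2, 101), (4, 102)]
```

A compiler has no such table. A single high-level statement may expand into many instructions, which is part of what makes its job harder.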


Running a program written in a source language generally takes two phases: compilation and execution.

Compilation (or assembly) phase: a translator such as a compiler or assembler converts the source program into a target program.

Run phase: the target program runs together with the runtime subroutines it needs. It takes the input data and produces the output data.
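Python's built-in `compile()` and `exec()` happen to separate these two phases, so they make a convenient analogy. `compile()` translates source text into a code object (CPython bytecode) once, and `exec()` runs it later on different input data. This is only an illustration. CPython compiles to bytecode for its own virtual machine, not to machine code, and the names `source`, `run`, `data`, and `factor` are made up for the example.

```python
# Compilation phase: translate the source text once into a code object.
source = "total = sum(data) * factor"
code_object = compile(source, "<example>", "exec")


def run(data, factor):
    """Run phase: execute the already-compiled program on input data."""
    namespace = {"data": data, "factor": factor}  # input data
    exec(code_object, namespace)  # built-ins such as sum() act as runtime support
    return namespace["total"]  # output data


print(run([1, 2, 3], 10))  # → 60
print(run([4, 5], 2))  # → 18
```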


Interpreter: a program that executes the source program by interpreting it. It interprets the source either directly or after first converting it into some intermediate language.
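For contrast, an interpreter produces no target program. It evaluates the program as it goes. The minimal sketch below assumes the source has already been turned into a simple tree form (nested tuples) that plays the role of the intermediate language. The tree format and the `interpret` and `run` names are hypothetical.

```python
import operator

# Hypothetical tree form: a number, a variable name (str), or (op, left, right).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def interpret(node, env):
    """Evaluate one expression node immediately; nothing is emitted."""
    if isinstance(node, (int, float)):
        return node
    if isinstance(node, str):
        return env[node]  # current value of a variable
    op, left, right = node
    return OPS[op](interpret(left, env), interpret(right, env))


def run(statements, env):
    """Execute (target, expression) assignments one after another."""
    for target, expr in statements:
        env[target] = interpret(expr, env)
    return env


program = [("y", ("+", 2, 3)), ("z", ("*", "y", "x"))]
print(run(program, {"x": 4})["z"])  # → 20
```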


1.2 The overall compilation process

Translating a high-level language program into an equivalent target program is generally divided into five basic stages:

lexical analysis, syntax analysis, semantic analysis and intermediate code generation, code optimization, and target program generation.

1. Lexical analysis:

Analyzing and recognizing words.

The lexer scans the source program (a character string), analyzes and identifies the words of the language according to its word-formation rules, and outputs them in some encoded form.

Words: a word is the basic syntactic unit of a language and its smallest meaningful unit. Languages generally have four categories of words: reserved words or keywords defined by the language, identifiers (such as variable names), constants, and delimiters (operators and special symbols).

In the example assignment statement, nine words can be identified.
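The following minimal lexical-analysis sketch uses Python's `re` module. The categories follow the four classes above, but the keyword set and patterns are assumptions for a small Pascal-like language. The original example statement is not reproduced here, so the sketch assumes the statement `X1 := (2.0 + 0.8) * C1`, which also contains nine words.

```python
import re

# Assumed word classes for a small Pascal-like language (illustrative only).
KEYWORDS = {"begin", "end", "if", "then", "while", "do"}
TOKEN_SPEC = [
    ("CONST", r"\d+(?:\.\d+)?"),         # integer or real constant
    ("IDENT", r"[A-Za-z][A-Za-z0-9]*"),  # identifier (keywords filtered below)
    ("DELIM", r":=|[-+*/();]"),          # operators and special symbols
    ("SKIP", r"\s+"),                    # white space between words
    ("ERROR", r"."),                     # any other character is illegal
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Scan the character string and return (category, word) pairs."""
    words = []
    for match in MASTER.finditer(text):
        kind, word = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"illegal character {word!r} at offset {match.start()}")
        if kind == "IDENT" and word in KEYWORDS:
            kind = "KEYWORD"
        words.append((kind, word))
    return words


# Assumed example statement (the original figure is not reproduced).
words = tokenize("X1 := (2.0 + 0.8) * C1")
print(len(words))  # → 9
print(words[:3])  # → [('IDENT', 'X1'), ('DELIM', ':='), ('DELIM', '(')]
```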


2. Syntax analysis (parsing):

Based on the grammar of the language, syntax analysis identifies the various syntactic components, such as expressions, statements of various kinds, and functions, and checks their syntactic correctness.

Following this grammar, syntax analysis recognizes the contents of the statement and checks its syntax. If it finds an error, it outputs an error message.
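A recursive-descent parser is one common way to implement this step. The sketch below assumes a small grammar for assignment statements that is not taken from the course material. It takes the word list produced by lexical analysis and builds a nested-tuple tree. When the words do not fit the grammar, it raises `SyntaxError` with a position.

```python
# Assumed grammar (illustrative only):
#   assign ::= IDENT ":=" expr
#   expr   ::= term { ("+" | "-") term }
#   term   ::= factor { ("*" | "/") factor }
#   factor ::= IDENT | CONST | "(" expr ")"


class Parser:
    def __init__(self, words):
        self.words = words
        self.pos = 0

    def peek(self):
        return self.words[self.pos] if self.pos < len(self.words) else None

    def advance(self):
        word = self.peek()
        self.pos += 1
        return word

    def error(self, expected):
        raise SyntaxError(f"word {self.pos}: expected {expected}, got {self.peek()!r}")

    def assign(self):
        word = self.peek()
        if word is None or not word[0].isalpha():
            self.error("identifier")
        target = self.advance()
        if self.peek() != ":=":
            self.error("':='")
        self.advance()
        tree = (":=", target, self.expr())
        if self.peek() is not None:
            self.error("end of statement")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        word = self.peek()
        if word == "(":
            self.advance()
            node = self.expr()
            if self.peek() != ")":
                self.error("')'")
            self.advance()
            return node
        if word is not None and word[0].isalnum():
            return self.advance()  # identifier or constant
        self.error("identifier, constant or '('")


words = ["X1", ":=", "(", "2.0", "+", "0.8", ")", "*", "C1"]
print(Parser(words).assign())
# → (':=', 'X1', ('*', ('+', '2.0', '0.8'), 'C1'))
```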


3. Semantic analysis and intermediate code generation:

This stage performs semantic analysis on the syntactic components already identified and generates the corresponding intermediate code.

Intermediate code is a form that lies between the source language and the target language. It serves two purposes: (1) it makes optimization easier, and (2) it makes the compiler easier to port, because it does not depend on the target computer and can be converted into other forms.

Form of intermediate code: compiler writers can design their own. Commonly used forms are quadruples, triples, reverse Polish notation, and so on.
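Reverse Polish notation, one of the forms just listed, can be produced from an infix expression with Dijkstra's shunting-yard algorithm. This sketch assumes a balanced, already-tokenized expression that uses only the four left-associative binary operators. It performs no error checking.

```python
# Binary operators and their precedence (all assumed left-associative).
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_rpn(words):
    """Shunting-yard: reorder infix words into reverse Polish (postfix) order."""
    output, stack = [], []
    for word in words:
        if word in PRECEDENCE:
            # Pop operators that bind at least as tightly as this one.
            while stack and stack[-1] in PRECEDENCE and PRECEDENCE[stack[-1]] >= PRECEDENCE[word]:
                output.append(stack.pop())
            stack.append(word)
        elif word == "(":
            stack.append(word)
        elif word == ")":
            while stack[-1] != "(":  # assumes parentheses are balanced
                output.append(stack.pop())
            stack.pop()  # discard the matching "("
        else:
            output.append(word)  # operand goes straight to the output
    while stack:
        output.append(stack.pop())
    return output


print(" ".join(to_rpn(["(", "2.0", "+", "0.8", ")", "*", "C1"])))  # → 2.0 0.8 + C1 *
```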


First the assignment statement is recognized. The analysis above then checks its correctness, and once the statement is found to be correct, the intermediate code is generated.

Here the intermediate code takes the form of quadruples (three-address instructions).
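The following minimal sketch generates quadruples from a syntax tree. It assumes the nested-tuple tree shape `(op, left, right)` and the example statement `X1 := (2.0 + 0.8) * C1`. The original figure is not reproduced, so this statement is an assumption. Each quadruple is `(op, arg1, arg2, result)`, with `-` marking an unused field and `T1`, `T2`, ... as compiler-generated temporaries, a common textbook convention.

```python
def gen_quads(tree):
    """Generate quadruples (op, arg1, arg2, result) for (":=", target, expr)."""
    quads = []
    count = 0

    def new_temp():
        nonlocal count
        count += 1
        return f"T{count}"

    def expr(node):
        if isinstance(node, str):
            return node  # identifier or constant is used directly
        op, left, right = node
        arg1, arg2 = expr(left), expr(right)  # operands first (post-order)
        temp = new_temp()
        quads.append((op, arg1, arg2, temp))
        return temp

    _, target, value = tree
    quads.append((":=", expr(value), "-", target))
    return quads


# Tree for the assumed example X1 := (2.0 + 0.8) * C1.
for quad in gen_quads((":=", "X1", ("*", ("+", "2.0", "0.8"), "C1"))):
    print(quad)
# → ('+', '2.0', '0.8', 'T1')
#   ('*', 'T1', 'C1', 'T2')
#   (':=', 'T2', '-', 'X1')
```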



4. Code optimization:

In the quadruple form of the intermediate code, the first quadruple is a computation on constants only. The compiler can evaluate it at compile time and keep the result in a work unit, so the target program does not need instructions that repeat this computation every time it runs.
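Constant folding can be sketched as a single scan over the quadruples. The sketch assumes the quadruples `(+, 2.0, 0.8, T1)`, `(*, T1, C1, T2)`, `(:=, T2, -, X1)` for the assumed example `X1 := (2.0 + 0.8) * C1`, and it assumes each temporary is assigned only once. It uses `decimal.Decimal` so that `2.0 + 0.8` prints as `2.8` rather than a binary floating-point approximation. Division is left unfolded to keep the sketch short.

```python
import re
from decimal import Decimal

FOLDABLE = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def is_const(x):
    """Numeric literals such as 2 or 0.8."""
    return re.fullmatch(r"\d+(?:\.\d+)?", x) is not None


def fold_constants(quads):
    """Compute constant-only quadruples at compile time and drop them."""
    known = {}  # temporary -> constant value computed at compile time
    result = []
    for op, a1, a2, res in quads:
        a1, a2 = known.get(a1, a1), known.get(a2, a2)  # substitute folded temps
        if op in FOLDABLE and is_const(a1) and is_const(a2):
            known[res] = str(FOLDABLE[op](Decimal(a1), Decimal(a2)))
        else:
            result.append((op, a1, a2, res))
    return result


quads = [("+", "2.0", "0.8", "T1"), ("*", "T1", "C1", "T2"), (":=", "T2", "-", "X1")]
print(fold_constants(quads))
# → [('*', '2.8', 'C1', 'T2'), (':=', 'T2', '-', 'X1')]
```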


Next, the number of temporary work units can be optimized: T2 is replaced by T1, which saves one unit.
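The temporary-unit optimization can be sketched as renaming temporaries so that a unit is reused as soon as the value it holds is no longer needed. This is a very small version of liveness-based allocation. The sketch assumes temporaries are named `T` followed by digits, that no user variable uses that pattern, and that each temporary is defined before it is used. Applied to the folded quadruples of the assumed example `X1 := (2.0 + 0.8) * C1`, it turns T2 into T1.

```python
def is_temp(name):
    return name.startswith("T") and name[1:].isdigit()


def reuse_temps(quads):
    """Rename temporaries so that a unit is reused once its value is dead."""
    last_use = {}  # temporary -> index of the last quadruple that reads it
    for i, (_, a1, a2, _) in enumerate(quads):
        for arg in (a1, a2):
            if is_temp(arg):
                last_use[arg] = i

    mapping, free, next_id, result = {}, [], 1, []
    for i, (op, a1, a2, res) in enumerate(quads):
        new_a1, new_a2 = mapping.get(a1, a1), mapping.get(a2, a2)
        # Operands are read before the result is written, so a temporary
        # whose last use is this quadruple can already hold the result.
        for arg in (a1, a2):
            if is_temp(arg) and last_use[arg] == i and mapping[arg] not in free:
                free.append(mapping[arg])
        if is_temp(res):
            if free:
                mapping[res] = free.pop()
            else:
                mapping[res] = f"T{next_id}"
                next_id += 1
            res = mapping[res]
        result.append((op, new_a1, new_a2, res))
    return result


folded = [("*", "2.8", "C1", "T2"), (":=", "T2", "-", "X1")]
print(reuse_temps(folded))
# → [('*', '2.8', 'C1', 'T1'), (':=', 'T1', '-', 'X1')]
```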


5. Generating the target program:

Once the intermediate code has been generated, generating the target program (a sequence of machine instructions with addresses) is straightforward. This work is closely tied to the machine and must be done separately for each specific machine. Care should be taken to make full use of the accumulator, and further optimization can be applied during generation. (This process must preserve semantic equivalence.)
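The following sketch generates code for a single-accumulator machine from quadruples. The instruction set (`LOAD`, `ADD`, `SUB`, `MUL`, `DIV`, `STORE`) is hypothetical. To make good use of the accumulator, the generator skips a `LOAD` when the needed value is already there, and it stores a temporary only if a later quadruple still reads it. The sketch assumes arithmetic results are temporaries, as in quadruples generated for the assumed example `X1 := (2.0 + 0.8) * C1`.

```python
# Hypothetical single-accumulator instructions:
#   LOAD x: ACC <- x    ADD/SUB/MUL/DIV x: ACC <- ACC op x    STORE x: x <- ACC
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def gen_target(quads):
    """Translate quadruples into accumulator-machine instructions."""
    code = []
    acc = None  # name whose value the accumulator currently holds
    unsaved = False  # True if acc holds a temporary not yet stored in memory

    def used_from(name, start):
        return any(name in (a1, a2) for _, a1, a2, _ in quads[start:])

    def load(name, i):
        nonlocal acc, unsaved
        if acc == name:
            return  # value already in the accumulator: no LOAD needed
        if unsaved and used_from(acc, i):
            code.append(f"STORE {acc}")  # save a temporary that is still needed
        code.append(f"LOAD {name}")
        acc, unsaved = name, False

    for i, (op, a1, a2, res) in enumerate(quads):
        load(a1, i)
        if op == ":=":
            code.append(f"STORE {res}")  # ACC still holds a1's value
        else:
            code.append(f"{MNEMONIC[op]} {a2}")
            acc, unsaved = res, True  # result stays in ACC for now
    return code


# Quadruples for the assumed example after constant folding and temp reuse.
print(gen_target([("*", "2.8", "C1", "T1"), (":=", "T1", "-", "X1")]))
# → ['LOAD 2.8', 'MUL C1', 'STORE X1']
```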


1.3 Compiler structure

1.3.1 Logical structure

By logical function, the compilation process can be divided into five basic stages. Correspondingly, a compiler that carries out the entire compilation process can be divided into five logical parts.


All five stages also involve two tasks: building and looking up tables, and handling errors. A compiler therefore also includes a table-management part and an error-handling part.

Table management (building and looking up tables):

Information from the source program and information produced during compilation is recorded in tables as soon as it becomes available. Later stages keep looking up information in these tables. Building and looking up tables therefore runs through the whole compilation process.
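Table management can be sketched with a symbol table that registers each declared name once and looks names up later. This version uses a stack of scopes, so inner declarations hide outer ones. The recorded fields (type and a sequentially assigned address) and the class and method names are assumptions. Real compilers record much more, such as kind, size, and nesting level.

```python
class SymbolTable:
    """A minimal symbol table with nested scopes (illustrative sketch)."""

    def __init__(self):
        self.scopes = [{}]  # the last dict is the innermost scope
        self.next_address = 0  # hypothetical sequential storage allocation

    def open_scope(self):
        self.scopes.append({})

    def close_scope(self):
        self.scopes.pop()

    def enter(self, name, type_, size=1):
        """Register a declaration; a duplicate in the same scope is an error."""
        scope = self.scopes[-1]
        if name in scope:
            raise ValueError(f"duplicate declaration of {name!r}")
        scope[name] = {"type": type_, "address": self.next_address}
        self.next_address += size
        return scope[name]

    def lookup(self, name):
        """Search from the innermost scope outward; None if undeclared."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.enter("X1", "real")
table.enter("C1", "real")
print(table.lookup("C1"))  # → {'type': 'real', 'address': 1}
```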

Error handling:

Large source programs inevitably contain many errors, so a compiler must be able to handle them. It diagnoses errors and reports their nature and location so that the user can correct the source program. The quality of its error handling is an important indicator of a compiler's overall quality.

This gives the typical seven logical parts of a compiler: the five logical stages plus the two tasks needed throughout them.
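The following minimal sketch shows the error-reporting side of error handling. It scans the whole source, records the nature and location (line and column) of every illegal character, and keeps going instead of stopping at the first error. The allowed character set and the `Diagnostic` format are assumptions for illustration.

```python
import string
from dataclasses import dataclass

# Assumed character set of a small Pascal-like language.
ALLOWED = set(string.ascii_letters + string.digits + " :=+-*/();.")


@dataclass
class Diagnostic:
    line: int
    column: int
    message: str

    def __str__(self):
        return f"{self.line}:{self.column}: error: {self.message}"


def check_characters(source):
    """Report every illegal character with its position, then keep scanning."""
    errors = []
    for line_no, text in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(text, start=1):
            if ch not in ALLOWED:
                errors.append(Diagnostic(line_no, col, f"illegal character {ch!r}"))
    return errors


for diagnostic in check_characters("X1 := 2 ? 3\nY := 4 # 5"):
    print(diagnostic)
# → 1:9: error: illegal character '?'
#   2:8: error: illegal character '#'
```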


1.3.2 Passes

Pass: one pass consists of scanning the source program (or an intermediate form of it) once from beginning to end, doing the relevant processing, and producing a new intermediate form or the target program.

Completing the work of the five basic stages generally takes several such scans.

A compiler that completes compilation in a single scan is called a “one-pass compiler”. A one-pass compiler is organized around its parser.

Dividing compilation into multiple passes makes the compiler easier to port. The main drawback is the extra, repetitive work of scanning again.

Its structure runs from a start pass (SP) to a final pass (OP).
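A toy multi-pass driver makes the trade-off concrete. Each pass is a separate function that scans the complete output of the previous pass once. Swapping or adding a pass is easy, but every pass rescans the whole intermediate form. The three passes, the `{ ... }` comment syntax, and the word patterns are hypothetical.

```python
import re


def strip_comments(text):
    """Pass 1: scan the source once, replacing { ... } comments with spaces."""
    return re.sub(r"\{[^}]*\}", " ", text)


def split_words(text):
    """Pass 2: scan the comment-free text once, producing a list of words."""
    return re.findall(r":=|\d+(?:\.\d+)?|[A-Za-z]\w*|\S", text)


def classify(words):
    """Pass 3: scan the word list once, tagging each word with a category."""
    def category(word):
        if word[0].isdigit():
            return "CONST"
        if word[0].isalpha():
            return "IDENT"
        return "DELIM"
    return [(category(w), w) for w in words]


def compile_in_passes(source, passes):
    form = source
    for compiler_pass in passes:
        form = compiler_pass(form)  # one complete scan of the current form
    return form


print(compile_in_passes("X1 := {scale} 2 * C1", [strip_comments, split_words, classify]))
# → [('IDENT', 'X1'), ('DELIM', ':='), ('CONST', '2'), ('DELIM', '*'), ('IDENT', 'C1')]
```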


1.3.3 Front end and back end

By function, the parts of a compiler can be divided into a front end and a back end.

Front end: the parts related to the source language. These include lexical analysis, syntax analysis, semantic analysis and intermediate code generation, and the machine-independent part of code optimization.

Back end: the parts related to the target machine. These include target program generation and machine-dependent optimization.

Reasons for the division:

This is a common approach. Keeping the same front end and rewriting only the back end yields compilers for the same source language on different target machines. Back-end work can also proceed in parallel with front-end work.
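The following sketch illustrates the front-end/back-end split. One machine-independent intermediate form feeds two interchangeable back ends. Here that form is a list of quadruples written out by hand for the assumed statement `X1 := (2.0 + 0.8) * C1`. Both target machines and their instruction names are hypothetical. Supporting a new machine means writing only a new back-end function.

```python
# Hypothetical front-end output for X1 := (2.0 + 0.8) * C1:
# machine-independent quadruples (op, arg1, arg2, result).
IR = [("+", "2.0", "0.8", "T1"), ("*", "T1", "C1", "T2"), (":=", "T2", "-", "X1")]
OPNAMES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def stack_backend(quads):
    """Back end for a hypothetical stack machine."""
    code = []
    for op, a1, a2, res in quads:
        if op == ":=":
            code += [f"PUSH {a1}", f"POP {res}"]
        else:
            code += [f"PUSH {a1}", f"PUSH {a2}", OPNAMES[op], f"POP {res}"]
    return code


def register_backend(quads):
    """Back end for a hypothetical machine with a register R0."""
    code = []
    for op, a1, a2, res in quads:
        code.append(f"MOV R0, {a1}")
        if op != ":=":
            code.append(f"{OPNAMES[op]} R0, {a2}")
        code.append(f"MOV {res}, R0")
    return code


# Same front-end output, different targets: only the back end changes.
BACKENDS = {"stack": stack_backend, "register": register_backend}
for machine, backend in BACKENDS.items():
    print(machine, backend(IR))
```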


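1.4 Compiler preprocessors and postprocessors

The preprocessor runs before the compiler, and the postprocessor runs after it.

Source program: may consist of multiple files and may contain macro definitions, macro calls, and file inclusions.

Target program: generally assembly code or relocatable machine code.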
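The following minimal preprocessor sketch handles two of the features mentioned above, file inclusion and object-like macro definitions, before the compiler proper sees the text. It assumes C-style `#include "file"` and `#define NAME text` directives and uses an in-memory dictionary in place of the file system. It does not rescan macro bodies or support macros with parameters.

```python
import re


def preprocess(name, files, macros=None, depth=0):
    """Expand #include "file" lines and object-like #define NAME text macros.

    `files` maps file names to their contents (an in-memory stand-in).
    """
    if macros is None:
        macros = {}
    if depth > 10:
        raise RecursionError("include nesting too deep")
    out = []
    for line in files[name].splitlines():
        include = re.match(r'\s*#include\s+"([^"]+)"', line)
        define = re.match(r"\s*#define\s+(\w+)\s+(.*)", line)
        if include:
            out.append(preprocess(include.group(1), files, macros, depth + 1))
        elif define:
            macros[define.group(1)] = define.group(2).strip()
        else:
            # Replace whole-word macro names with their bodies.
            out.append(re.sub(r"\b\w+\b", lambda m: macros.get(m.group(), m.group()), line))
    return "\n".join(out)


files = {"main.c": '#include "defs.h"\narea = PI * r * r;', "defs.h": "#define PI 3.14"}
print(preprocess("main.c", files).strip())  # → area = 3.14 * r * r;
```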



1.5 Applications of compiler technology

Syntax structure editors, program formatting tools, software testing tools, program understanding tools, and high-level language translation tools.
