written 8.5 years ago by | • modified 8.5 years ago |
- Code generator phase generates the target code taking input as intermediate code.
- The output of intermediate code generator may be given directly to code generation or may pass through code optimization before generating code.
Code generator main tasks:
- Instruction selection
- Register allocation and assignment
- Instruction ordering
Issues in Design of Code generation:
Target code mainly depends on available instruction set and efficient usage of registers. The main issues in design of code generation are
1. Intermediate representation
- Linear representation like postfix and three address code or quadruples and graphical representation like Syntax tree or DAG.
- Assume type checking is done and input in free of errors.
2. Target Code
- The code generator has to be aware of the nature of the target language for which the code is to be transformed.
- That language may facilitate some machine-specific instructions to help the compiler generate the code in a more convenient way.
- The target machine can have either CISC or RISC processor architecture.
- The target code may be absolute code, re-locatable machine code or assembly language code.
- Absolute code can be executed immediately as the addresses are fixed.
- But in case of re-locatable it requires linker and loader to place the code in appropriate location and map (link) the required library functions.
- If it generates assembly level code then assemblers are needed to convert it into machine level code before execution.
Re-locatable code provides great deal of flexibilities as the functions can be compiled separately before generation of object code.
1.1 Absolute machine language:
- Producing an absolute machine language program as output has the advantage that it can be placed in a fixed location in memory and immediately executed.
1.2 Relocatable machine language:
Producing a relocatable machine language program as output allows subprograms to be compiled separately. A set of relocatable object modules can be linked together and loaded for execution by a linking loader.
If the target machine does not handle relocation automatically, the compiler must provide explicit relocation information to the loader, to link the separately compiled program segments.
3. Assembly language:
Producing an assembly language program as output makes the process of code generation somewhat easier.
Example
MOV R0, R1
ADD R1, R2
Target Machine supports for the following addressing modes
a. Absolute addressing mode
Example: MOV R0, M where M is the address of memory location of one of the operands. MOV R0, M moves the contents of register R0 to memory location M.
b. Register addressing mode where both the operands are in register.
Example: ADD R0, R1
c. Immediate addressing mode – The operand value appears in the instruction.
Example: ADD # 1, R0
d. Index addressing mode- this is of the form C(R) where the address of operand is at the location C+Contents(R)
Example: MOV 4(R0), M the operand is located at address = contents (4+contents (R0)) Cost of instruction is defined as cost of execution plus the number of memory access.
3. Address mapping
- Address mapping defines the mapping between intermediate representations to address in the target code.
- These addresses are based on the runtime environment used like static, stack or heap.
- The identifiers are stored in symbol table during declaration of variables or functions, along with type.
- Each identifier can be accessed in symbol table based on width of each identifier and offset. The address of the specific instruction (in three address code) can be generated using back patching
4. Instruction Set
- The instruction set should be complete in the sense that all operations can be implemented.
- Sometimes a single operation may be implemented using many instruction (many set of instructions).
- The code generator should choose the most appropriate instruction.
- The instruction should be chosen in such a way that speed is of execution is minimum or other machine related resource utilization should be minimum.
- The code generator takes Intermediate Representation as input and converts (maps) it into target machine’s instruction set.
- One representation can have many ways (instructions) to convert it, so it becomes the responsibility of the code generator to choose the appropriate instructions wisely.
Example
The factors to be considered during instruction selection are:
The uniformity and completeness of the instruction set.
Instruction speed and machine idioms.
Size of the instruction set.
Eg., for the following address code is:
a := b + c
d := a + e
inefficient assembly code is:
MOV b, R0 R0 ← b ADD c, R0 R0 ← c + R0 MOV R0, a a ← R0 MOV a, R0 R0 ← a ADD e, R0 R0 ← e + R0 MOV R0 , d d ← R0
Here the fourth statement is redundant, and so is the third statement if
'a' is not subsequently used.
5. Register Allocation
- A program has a number of values to be maintained during the execution.
- The target machine’s architecture may not allow all of the values to be kept in the CPU memory or registers.
- Code generator decides what values to keep in the registers.
- Also, it decides the registers to be used to keep these values.
- Instructions involving register operands are usually shorter and faster than those involving operands in memory.
- Therefore efficient utilization of registers is particularly important in generating good code.
- During register allocation we select the set of variables that will reside in registers at each point in the program.
- During a subsequent register assignment phase, we pick the specific register that a variable will reside in.
6. Memory Management
- Mapping names in the source program to addresses of data objects in run-time memory is done cooperatively by the front end and the code generator.
- A name in a three- address statement refers to a symbol-table entry for the name.
- From the symbol-table information, a relative address can be determined for the name in a data area for the procedure.
7. Evaluation of order
- At last, the code generator decides the order in which the instruction will be executed.
- It creates schedules for instructions to execute them.
- The order in which computations are performed can affect the efficiency of the target code.
- Some computation orders require fewer registers to hold intermediate results than others.