Most processors are “synchronous”: they use a clock to time when instructions occur. A synchronous processor carries out the same number of instructions per clock cycle, so we can get more processing power simply by increasing the clock frequency. Increasing the clock rate is one of the most effective ways to increase processing power.
How can you make a processor faster without increasing the clock speed?
We know that,
Performance = clock speed × instructions per clock pulse
To make a processor faster without increasing the clock speed, we have to increase the amount of work done per clock pulse. Over time, processors have supported more and more instructions. More, bigger and more complex instructions let the processor do more processing per clock pulse. Early processors supported only addition and subtraction instructions, so for multiplication you had to write a small program using the logic of addition and subtraction. Nowadays processors have hardware that performs high-precision arithmetic operations such as multiplication in one cycle.
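The relation above can be tried out with a few lines of Python. This is only a minimal sketch: the function name `performance` and the sample figures are illustrative assumptions, not measurements of any real processor.

```python
def performance(clock_hz, instructions_per_cycle):
    """Instructions per second = clock speed x instructions per clock pulse."""
    return clock_hz * instructions_per_cycle

# Two hypothetical processors, both clocked at 100 MHz.
baseline = performance(100e6, 1.0)   # one instruction per clock pulse
improved = performance(100e6, 2.0)   # twice the work per clock pulse

# Same clock, double the work per pulse: performance doubles.
print(improved / baseline)
```

The clock frequency is identical in both calls, so the whole gain comes from the second factor.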
4.1 The RISC and the CISC design Philosophy
4.1.1 RISC (Reduced Instruction Set Computer)
RISC, or Reduced Instruction Set Computer, is a type of microprocessor architecture that uses a small, highly optimized set of instructions, in contrast to the more specialized instruction sets used in other architectures. RISC means simple but powerful instructions that execute within a single cycle at a high clock speed. There are four major design rules:
– Instructions: a reduced set of fixed-length instructions, each executing in a single cycle.
– Pipeline: instructions are decoded in one pipeline stage, so there is no need for microcode.
– Registers: a large set of general-purpose registers.
– Load/store architecture: data processing instructions operate on registers only; separate load and store instructions transfer data between memory and registers.
This results in a simple design and a fast clock rate.
Examples of RISC processors:
• IBM RS6000, MC88100
• DEC’s Alpha 21064, 21164 and 21264 processors
Features of RISC Processors: The standard features of RISC processors are listed below:
1. RISC processors use a small and limited number of instructions: they support only a small number of essential instructions.
2. RISC machines mostly use a hardwired control unit. In a hardwired control unit, fixed logic circuits interpret instructions and generate control signals from them. It is faster than its microprogrammed counterpart.
3. RISC processors consume less power and have high performance. RISC processors are heavily pipelined, which ensures that the hardware resources of the processor are utilized to the maximum, giving higher throughput while consuming less power.
4. Each instruction is very simple. Most instructions in a RISC instruction set are very simple and execute in one clock cycle.
5. RISC processors use simple addressing modes. RISC processors have few addressing modes, and those they have are very simple. Most of the addressing modes are for register operations and do not refer to memory.
6. RISC instructions are of uniform, fixed length. The decision of RISC processor designers to provide simple addressing modes leads to uniform-length instructions.
7. Large number of registers. The RISC design philosophy generally incorporates a larger number of registers to avoid large amounts of interaction with memory.
4.1.2 CISC (Complex Instruction Set Computer)
CISC stands for Complex Instruction Set Computer. The control unit contains micro-electronic circuitry that generates a set of control signals, and each micro-circuit is activated by a microcode; this design approach is called CISC design. The primary goal of the CISC architecture is to complete a task in as few lines of assembly code as possible.
Examples of CISC processors are:
• Intel 386, 486, Pentium, Pentium Pro, Pentium II, Pentium III
• Motorola’s 68000, 68020, 68040, etc.
Features of CISC Processors: The standard features of CISC processors are listed below:
1. CISC chips have complex instructions. A CISC processor comes prepared with a specific instruction (e.g. "MULT"). When executed, this instruction loads the two values into separate registers, multiplies the operands in the execution unit, and then stores the product in the appropriate register. Thus, the entire task of multiplying two numbers (2, 3) can be completed with one instruction:
MULT 2, 3
MULT is what is known as a "complex instruction". It operates directly on the computer's memory banks and does not require the programmer to explicitly call any loading or storing functions. It closely resembles a command in a higher-level language.
2. CISC processors have a variety of instructions. Many of these instructions are complex, which makes for shorter assembly code and therefore very low RAM consumption.
3. CISC machines generally make use of complex addressing modes. CISC processors have a variety of addressing modes in which the operands can be addressed from memory as well as located in the different registers of the CPU. Many instructions refer to memory, as opposed to the RISC architecture.
4. CISC processors have variable-length instructions. The decision of CISC processor designers to provide a variety of addressing modes leads to variable-length instructions.
5. Easier compiler design. Compilers have very little to do when generating code for a CISC architecture. The complex instruction set and shorter assembly code mean little work for the compiler, which eases compiler design.
6. CISC machines use a microprogrammed control unit. Microprograms are series of microinstructions that control the CPU at a very fundamental level of hardware circuitry. They are stored in a control memory such as ROM, from where the CPU accesses them and generates control signals.
7. CISC processors have a limited number of registers. CISC processors normally have only a single set of registers. Since the addressing modes provide for memory operands, a limited number of registers is sufficient.
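The difference between one complex memory-to-memory instruction and a RISC-style load/store sequence can be sketched on a toy machine. Everything here is a simplifying assumption made for illustration: the memory is a Python dictionary, and the mnemonics LOAD, MUL and STORE are hypothetical and do not belong to any real instruction set.

```python
def cisc_mult(memory, dst, src):
    """One complex instruction: reads both operands from memory,
    multiplies them and writes the product back to memory."""
    memory[dst] = memory[dst] * memory[src]
    return 1  # number of instructions issued

def risc_mult(memory, dst, src):
    """Load/store style: only LOAD and STORE touch memory,
    the multiply itself works on registers."""
    regs = {"r0": 0, "r1": 0}
    program = [
        ("LOAD", "r0", dst),    # r0 <- memory[dst]
        ("LOAD", "r1", src),    # r1 <- memory[src]
        ("MUL", "r0", "r1"),    # r0 <- r0 * r1
        ("STORE", dst, "r0"),   # memory[dst] <- r0
    ]
    for op, x, y in program:
        if op == "LOAD":
            regs[x] = memory[y]
        elif op == "MUL":
            regs[x] = regs[x] * regs[y]
        elif op == "STORE":
            memory[x] = regs[y]
    return len(program)

mem_cisc = {"A": 2, "B": 3}
mem_risc = {"A": 2, "B": 3}
print(cisc_mult(mem_cisc, "A", "B"), mem_cisc["A"])
print(risc_mult(mem_risc, "A", "B"), mem_risc["A"])
```

Both versions leave the same product in memory. The CISC form needs fewer instructions, while each step of the RISC form is simple enough to fit a fixed-length, single-cycle instruction.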
4.2 Concepts of Cortex-A, the Cortex-R and the Cortex-M
The ARM® Cortex® series of cores offers a very wide range of scalable performance options, giving designers the choice of the best-fit core for their application. The Cortex portfolio is split broadly into three main categories:
• Cortex-A – application processor cores for performance-intensive systems
• Cortex-R – high-performance cores for real-time applications
• Cortex-M – microcontroller cores for a wide range of embedded applications.
Cortex-A: Cortex-A processors provide a range of solutions for devices that use a rich operating system such as Linux or Android. They are used in a wide range of applications, from low-cost handsets to smartphones, tablet computers, set-top boxes and enterprise networking equipment. The first range of Cortex-A processors (A5, A7, A8, A9, A12, A15 and A17) is based on the ARMv7-A architecture.
Cortex-R: The Cortex-R processors target high-performance real-time applications such as hard disk controllers (or solid state drive controllers), networking equipment and printers in the enterprise segment, consumer devices such as Blu-ray players and media players, and automotive applications such as airbags, braking systems and engine management. The Cortex-R series is similar in some respects to a high-end microcontroller (MCU) but targets larger systems than those for which you would typically use a standard MCU.
Cortex-M: Cortex-M is designed specifically to target the already very crowded MCU market. The Cortex-M3 and Cortex-M4 are built on the ARMv7-M architecture, and the smaller Cortex-M0+ is built on the ARMv6-M architecture. The Cortex-M series can be implemented as a soft core in an FPGA, for example, but more commonly the cores are implemented as MCUs with integrated memories, clocks and peripherals.
4.3 Features of ARM Microcontroller
4.3.1 ARM history: The ARM processor core originated within a British computer company called Acorn. In the mid-1980s Acorn was looking for a replacement for the 6502 processor used in its BBC computer range, which was widely used in UK schools. None of the 16-bit architectures becoming available at that time met the requirements, so Acorn designed its own 32-bit processor, using the RISC philosophy to create an embedded processor. The first ARM processor was developed at Acorn Computers Limited. Initially ARM stood for Acorn RISC Machine; later it was renamed Advanced RISC Machine. ARM Ltd now designs the ARM family of RISC processor cores together with a range of other supporting technologies. ARM does not fabricate silicon itself, but only produces the design. It is an Intellectual Property (IP) company.
4.3.2 Why ARM? ARM is the most licensed and thus the most widespread family of processor cores in the world. It is used in PDAs, cell phones, multimedia players, handheld game consoles, digital TVs and cameras. It is used especially in portable devices because of its low power consumption and reasonable performance.
4.3.3 ARM design philosophy:
• Small processor for lower power consumption (for embedded system)
• High code density for limited memory and physical size restrictions
• The ability to use slow and low-cost memory
• Reduced die size to lower manufacturing cost and accommodate more peripherals
4.3.4 ARM Architecture: An ARM processor is based on either the Von Neumann architecture or the Harvard architecture. In the Von Neumann architecture, data items and instructions share the same bus, whereas in the Harvard architecture data items and instructions use separate buses. ARM7 is based on the Von Neumann architecture. ARM processors from ARM9 onwards are based on the Harvard architecture.
The ARM components are as follows:
• The register bank (r0-r15).
• The barrel shifter, which can shift or rotate an operand by a number of bits in a single operation.
• The Arithmetic Logic Unit (ALU), which performs arithmetic and logic operations.
• The address register, memory data registers and Program Counter (PC) incrementer.
• The instruction decoder, which decodes instructions before they are executed.
4.3.5 ARM Core Data Flow Model: The data flow model describes how an instruction, once decoded inside the ARM core, is executed by interacting with the internal register file, and how the result is then written out of the registers.
• In the Von Neumann architecture, what arrives over the bus is either an instruction or a data item (same memory).
• The sign-extend hardware converts signed 8-bit and 16-bit numbers to 32-bit values as they are read from memory and placed in a register; unsigned values are zero-filled instead.
• Source operands (Rn and Rm) are read from the register file using the internal buses A and B respectively, and the result Rd is written back.
• The PC value is held in the address register, which is fed into the incrementer; the incremented value is then copied back into r15. It is also written into the address register to be used as the address for the next instruction fetch.
• ALU: the Arithmetic and Logic Unit or the MAC (multiply and accumulate unit) takes the register values Rn and Rm from the A and B buses and computes a result.
• Data processing instructions write the result in Rd directly to the register file.
• Load and store instructions use the ALU to generate an address, which is held in the address register and broadcast on the address bus.
• Barrel shifter: one important feature is that register Rm can be pre-processed in the barrel shifter before it enters the ALU (left shift, right shift, rotate etc.). Depending on the instruction, the barrel shifter may be used or bypassed. Together, the barrel shifter and the ALU can calculate a wide range of expressions and addresses in the same cycle.
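The sign-extend and barrel shifter steps of the data flow can be modelled in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, registers are modelled as Python integers masked to 32 bits, and shift amounts are assumed to lie in the range 0 to 31.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Extend a signed `bits`-wide value (8 or 16) to 32 bits."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):              # sign bit set: fill upper bits with ones
        value |= MASK32 & ~((1 << bits) - 1)
    return value

def barrel_shift(rm, kind, amount):
    """Pre-process operand Rm before it enters the ALU (amount in 0..31)."""
    rm &= MASK32
    if kind == "LSL":                          # logical shift left
        return (rm << amount) & MASK32
    if kind == "LSR":                          # logical shift right
        return rm >> amount
    if kind == "ASR":                          # arithmetic shift right keeps the sign
        fill = (MASK32 << (32 - amount)) & MASK32 if rm & 0x80000000 else 0
        return (rm >> amount) | fill
    if kind == "ROR":                          # rotate right
        return ((rm >> amount) | (rm << (32 - amount))) & MASK32
    raise ValueError(kind)

def add_shifted(rn, rm, kind="LSL", amount=0):
    """Model of ADD Rd, Rn, Rm, <shift> #amount: shift and add in one step."""
    return (rn + barrel_shift(rm, kind, amount)) & MASK32

print(hex(sign_extend(0x80, 8)))
print(hex(add_shifted(0x1000, 3, "LSL", 2)))
```

The last call computes a base address plus an index scaled by four, which is the kind of address expression the shifter and ALU can produce together.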
4.3.6 ARM7 Programmer's Model: The Programmer's Model can be split into two elements.
1) Processor modes
2) Processor registers.
The ARM Processor Mode:
The ARM architecture supports seven basic operating modes: one user mode and six privileged modes.
User : unprivileged mode under which most tasks run
FIQ : entered when a high priority (fast) interrupt is raised
IRQ : entered when a low priority (normal) interrupt is raised
Supervisor : entered on reset and when a Software Interrupt instruction is executed
Abort : used to handle memory access violations
Undef : used to handle undefined instructions
System : privileged mode using the same registers as user mode
The user mode is the normal program execution mode. The processor mode can be changed either by a program running in a privileged mode that writes directly to the current program status register (CPSR), or by the core itself when an exception occurs. The privileged modes are used to handle exceptions and software interrupts, which suspend the normal execution of sequential instructions and jump to a specific location. The six privileged modes are abort, fast interrupt request, interrupt request, supervisor, system and undefined. Processor registers will be discussed later in the chapter.
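The current mode is recorded in the lowest five bits of the CPSR. The lookup below is a small sketch of how those bits map to the seven modes listed above. The bit encodings come from the ARM architecture documentation rather than from this text, and the helper name `decode_mode` is hypothetical.

```python
# CPSR[4:0] mode encodings of the classic ARM architecture:
# mode bits -> (mode name, privileged?)
MODES = {
    0b10000: ("User", False),
    0b10001: ("FIQ", True),
    0b10010: ("IRQ", True),
    0b10011: ("Supervisor", True),
    0b10111: ("Abort", True),
    0b11011: ("Undef", True),
    0b11111: ("System", True),
}

def decode_mode(cpsr):
    """Return (mode name, privileged?) for a 32-bit CPSR value."""
    return MODES[cpsr & 0x1F]

print(decode_mode(0x000000D3))   # low five bits are 0b10011
print(sum(1 for _, privileged in MODES.values() if privileged))
```

The second line counts the privileged entries in the table, matching the one user mode plus six privileged modes described above.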
4.3.7 ARM features: Following are the Features of ARM:
• 32/16-bit RISC architecture. 32-bit ARM instruction set for maximum performance and flexibility. 16-bit Thumb instruction set for increased code density.
• Unified bus interface, 32-bit data bus carries both instructions and data.
• Three-stage pipeline.
• 32-bit ALU.
• Very small die size and low power consumption.
• Fully static operation.
• Coprocessor interface.
The principal feature of the ARM7 microcontroller is that it is a register-based load-and-store architecture with a number of operating modes. ARM7 is a 32-bit microcontroller; it is also capable of running a 16-bit instruction set known as “THUMB”. This helps it achieve greater code density and enhanced power saving. While all of the register-to-register data processing instructions are single-cycle, other instructions, such as data transfer instructions, are multi-cycle.
To assist the developer, the ARM core has a built-in JTAG debug port and on-chip “embedded ICE” that allow programs to be downloaded and fully debugged in-system.
In order to keep the ARM7 both simple and cost-effective, the code and data regions are accessed via a single data bus. Thus, while the ARM7 is capable of single-cycle execution of all data processing instructions, data transfer instructions may take several cycles, since they require at least two accesses to the bus (one for the instruction and one for the data). In order to improve performance, a three-stage pipeline is used that allows multiple instructions to be processed simultaneously. The pipeline has three stages: FETCH, DECODE and EXECUTE. The hardware of each stage is designed to be independent, so up to three instructions can be processed simultaneously. The pipeline is most effective in speeding up sequential code.
4.4 Pipeline Architecture
Pipelining enables the CPU to achieve some degree of parallelism. Every instruction in the ARM architecture goes through various stages (e.g. Fetch, Decode, Execute). Pipelining allows the CPU to start processing further instructions while the previous ones are still passing through the later stages. This delivers a higher throughput, since each stage performs only a small part of the processing per cycle. Different generations of the ARM core have different numbers of pipeline stages (also called the depth of the pipeline). For example, the ARM7TDMI has a 3-stage pipeline, whereas the more modern Cortex-A9 has 8 stages. Let’s have a look at the 3-stage ARM7TDMI pipeline:
During the Fetch stage, the instruction is fetched from memory. During Decode, the instruction is decoded and the data path control signals are prepared for the next cycle. During Execute, the operands are read from the register bank, shifted and combined in the ALU, and the result is written back. If we look at cycle 3, we see that the processor is executing instruction 1 whilst decoding instruction 2 and fetching instruction 3.
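The overlap of the three stages can be reproduced with a short simulation. This is a minimal sketch of an ideal pipeline, assuming no stalls, no branches and single-cycle stages; the function name `pipeline_schedule` is hypothetical.

```python
STAGES = ["Fetch", "Decode", "Execute"]

def pipeline_schedule(n_instructions):
    """Return, for each clock cycle, which instruction (numbered from 1)
    occupies each stage of an ideal three-stage pipeline."""
    cycles = []
    total = n_instructions + len(STAGES) - 1   # cycles until the pipeline drains
    for cycle in range(1, total + 1):
        row = {}
        for depth, stage in enumerate(STAGES):
            instr = cycle - depth              # later stages hold older instructions
            row[stage] = instr if 1 <= instr <= n_instructions else None
        cycles.append(row)
    return cycles

for number, row in enumerate(pipeline_schedule(4), start=1):
    print(number, row)
```

In the row for cycle 3, instruction 3 is in Fetch, instruction 2 in Decode and instruction 1 in Execute. Four instructions finish in six cycles instead of the twelve they would need if each ran through all three stages alone.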
The ARM9 core uses a five-stage pipeline, shown below together with its effect on the Program Counter:
4.5 Registers
In ARM, each register is 32 bits in size. A byte means 8 bits, a half word means 16 bits and a word means 32 bits. In ARM state all instructions are 32 bits wide; in Thumb state all instructions are 16 bits wide. In total there are 37 registers, of which 17 are visible and 20 are banked registers.
ARM has:
• 1 dedicated program counter
• 1 dedicated current program status register
• 5 dedicated saved program status registers
• 30 general purpose registers
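The totals quoted above can be cross-checked with simple arithmetic. The per-mode banking used here (FIQ banks r8-r14, the other four exception modes bank only r13 and r14, and each exception mode has its own SPSR) is taken from the ARM architecture documentation and is an assumption as far as this text is concerned.

```python
# Registers visible in User/System mode: r0-r15 plus the CPSR.
visible = 16 + 1

# Banked general-purpose registers per exception mode.
banked_general = {
    "FIQ": 7,          # r8-r14
    "IRQ": 2,          # r13, r14
    "Supervisor": 2,   # r13, r14
    "Abort": 2,        # r13, r14
    "Undef": 2,        # r13, r14
}
banked_spsr = len(banked_general)        # one SPSR per exception mode

banked = sum(banked_general.values()) + banked_spsr
total = visible + banked
general_purpose = 15 + sum(banked_general.values())   # r0-r14 plus banked copies

print(visible, banked, total, general_purpose)
```

The four numbers correspond to the 17 visible registers, 20 banked registers, 37 registers in total and 30 general purpose registers.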
The registers are roughly divided into:
30 General Purpose Registers: Only 15 GPRs are visible at any one time, depending on the mode of operation. They are numbered R0-R12, plus the Stack Pointer and the Link Register.
Stack pointer (r13): stores the top of the stack in the current processor mode.
Link register (r14): stores return addresses for subroutines or exceptions, depending on the mode of operation.
Program Counter (r15): holds the address of the next instruction to fetch. It is loaded with the destination address on branch operations and may be set manually when making subroutine calls.
Application Program Status Register (APSR): contains a copy of the condition flags from the ALU, which conditional instructions test to decide whether they execute.
Current Program Status Register (CPSR): holds the condition flags (the APSR bits), the current processor mode, interrupt flags, execution state bits etc.
Saved Program Status Register (SPSR): when an exception is taken, this register holds the previous value of the CPSR.
Suppose the processor is in User mode and an FIQ request arrives, so that the processor has to switch to FIQ mode. First the CPSR is copied to the SPSR of FIQ mode, then the processor services the FIQ. In FIQ mode the processor uses its own banked copies of registers r8-r12 together with sp (r13) and lr (r14). After servicing the FIQ, the processor should return to User mode and resume its work, so the SPSR is copied back into the CPSR.
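The User to FIQ round trip can be modelled as follows. This is a deliberately reduced sketch: the class `Core` and its method names are hypothetical, only r8-r14 and the status registers are modelled, and details of real hardware such as saving the return address in lr or masking further interrupts are left out.

```python
MODE_USER, MODE_FIQ = 0b10000, 0b10001

class Core:
    def __init__(self):
        self.cpsr = MODE_USER
        self.spsr_fiq = 0
        # Separate physical copies of r8-r14 for User and FIQ mode.
        self.user_bank = {f"r{i}": 0 for i in range(8, 15)}
        self.fiq_bank = {f"r{i}": 0 for i in range(8, 15)}

    def bank(self):
        """Registers r8-r14 as seen in the current mode."""
        return self.fiq_bank if (self.cpsr & 0x1F) == MODE_FIQ else self.user_bank

    def enter_fiq(self):
        self.spsr_fiq = self.cpsr                     # save CPSR in SPSR_fiq
        self.cpsr = (self.cpsr & ~0x1F) | MODE_FIQ    # switch the mode bits

    def return_from_fiq(self):
        self.cpsr = self.spsr_fiq                     # restore CPSR, back to User mode

core = Core()
core.bank()["r8"] = 111      # value used by the user program
core.enter_fiq()
core.bank()["r8"] = 222      # the handler writes its own banked r8
core.return_from_fiq()
print(core.bank()["r8"])     # the user's r8 is untouched
```

Because the handler works on banked copies, it does not have to save and restore r8-r14 on the stack, which is part of what makes FIQ handling fast.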