Elec 374

Digital Systems Engineering

 

 

 

Design of a Simple RISC Processor

Final Report

 

 

 

Mike Wood:  426-4464

Fadi Yared:  432-0222

 

 

 

Friday April 5th 2002

4:00pm

 

 

 

 

Abstract

 

This report contains the design, specification, analysis and simulation of a simple RISC processor (mini SRC).  The processor design project was divided into 4 phases, each building on the previous one.  Within this report are specifications and simulations of the major modules of the mini SRC.  These modules include: Data Path, Grxlogic, Branch Logic, Register Set, ALU, and IRQ.  As part of the testing and analysis, conclusions about the CPI, MIPS rating, and maximum frequency of operation are also included in this report.

 

The final product meets all required specifications and includes additional features such as interrupts, a stack, and subroutine calls.  After analysis of the design it was concluded that a 3-BUS architecture would yield a significant reduction in CPI, and it was therefore employed.  This report concludes that further upgrades to the mini SRC, such as 16-bit operand support and input/output ports, would significantly increase its usefulness in modern embedded systems applications.


Table Of Contents

 

Title Page
Abstract
Table Of Contents
Objective
Project Specification
Design Specification
-- instruction set and opcode specification
-- data path
-- control unit
-- grx logic
-- branch logic
-- register set
-- alu
-- irq
Results and Comparison
Functional Simulation Results
-- proof of validity of instruction sequences
-- demonstration of irq sequence
Discussion
-- evolution of the design (phases 1 through 4)
-- extra features
Conclusion
Future work

Appendix A:  Schematic Drawings
Appendix B:  VHDL of Control Unit
Appendix C1:  Program Source for Instruction Set Testing
Appendix C2:  Simulation Waveforms from Instruction Testing


Objective

 

The purpose of the project is to design and implement a fully functional processor with an assortment of common processor instructions.  The design was to be completed and simulated using the Altera Max+PlusII CAD software system.  The processor (also called the Mini SRC) is to be implemented on a FLEX 10k FPGA chip.  The instruction set is of the RISC type and is scaled down to use two general-purpose registers, named R0 and R1.  A further objective is to create a 3-bus architecture in which each bus is 8 bits wide.  This is done to reduce the number of cycles per instruction (CPI) and thereby improve system performance.  Additional functionality beyond the common ALU and memory access instructions includes subroutine calls, a system stack, and interrupt support.  For a detailed description of the Mini SRC instruction set, see the opcode specifications included in this report.


Project Specification

 

The Project Specification was given by the Department of Electrical and Computer Engineering, Queen’s University, for ELEC-374 Digital Systems Engineering.  An abridged version is reproduced below.

 

Designing a Simple Processor (Mini SRC):

-- design, simulate, implement, and verify a small processor.

-- design is to be made using the Altera Max+PlusII CAD software system.

-- processor should be implemented on the FLEX 10k FPGA chip. 

 

Properties of the Design:

-- 8-bit machine

-- two general-purpose registers named R0 and R1

-- 8-bit data paths

-- minimum goal is a 1-bus architecture

-- capable of addressing up to 256 bytes of memory

-- all instructions are 8 bits long

-- Arithmetic and Logic Unit (ALU) that performs 5 operations:  Add, Subtract, Increment by 1, Shift Right 1 bit, and Logical AND

-- support 12 instructions:  Load, Store, Load Immediate/extended, Store extended, Add, Subtract, Branch, Shift, AND, No-operation, and Stop.

-- opcodes encoded in a 4-bit field at the high-order end of each instruction

 

Details of the instruction format for each instruction were given in the original spec.  Because each instruction and opcode is described in detail in the Design Specification, they are omitted here.

 

Processor State:

-- PC<7..0>: 8-bit register named Program Counter (PC)

-- IR<7..0>: 8-bit register named Instruction Register (IR)

-- R[0..1]<7..0>: two 8-bit general purpose registers named R[0] and R[1]

-- Run: 1-bit run/halt indicator

-- Start: Start signal

-- Reset: Reset signal

 

Memory State:

-- M[0..255]<7..0>: 256 1-byte words of memory

 

Additional Features:

--  multi-bus architecture

--  new instructions (NEG, OR, INPUT, OUTPUT, CALL, RETURN, etc.)

--  stack

--  support for interrupt handling

 

 

Phases of Design:

-- 1:  Design and test the Data Path and the ALU using Functional Simulation.

-- 2:  Add logic for selecting R0 and R1 from the ra, rb, rc fields in the instructions and add logic for evaluating whether or not to follow a branch.  Also implement the memory interface design.  Test using Functional Simulation.

-- 3:  Design and test the Control Unit using Functional Simulation.

-- 4:  Integrate the Data Path and Control Unit into a single design and test it using both Functional Simulation and Timing Simulation for an implementation in a FLEX 10k FPGA chip.


Design Specification

 

Instruction Set and Opcode Specification

 

Note 1:  In many opcodes, bit 0 is used to distinguish between two actions such as Add and Subtract.  The notation 1/0 is used, e.g. a/s means that 1 indicates Add and 0 indicates Subtract.

 

Note 2:  Bits 7 to 4 form the opcode proper.  This value is also shown at the top right of each opcode description in both binary and hex.

 

Note 3:  Fields:  ra, rb, rc each indicate a register.  0 indicates R0, 1 indicates R1.

 

Note 4:  “--” indicates that the field is unused.
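As an illustration of Notes 1 to 4, the Python sketch below splits an 8-bit instruction into its fields.  It is not part of the Mini SRC design files; the names `OPCODES` and `decode` are hypothetical, and which fields are meaningful depends on the opcode (for example, bits 1-0 are c2 for Load/Store but the condition c for Branch).

```python
# Opcode (bits 7-4) -> instruction name, following the opcode specification.
OPCODES = {
    0x0: "Load", 0x1: "Store", 0x2: "Load Immediate/Extended",
    0x3: "Store Extended", 0x4: "Add/Subtract", 0x5: "Enable/Disable IRQ",
    0x6: "And/Or", 0x7: "Branch", 0x8: "Shift Right/Left",
    0x9: "No Operation", 0xA: "Stop", 0xB: "Return From ISR",
    0xC: "Negate", 0xD: "Increment/Decrement",
    0xE: "Call Sub/Return From Sub", 0xF: "Push/Pull",
}


def decode(ir: int) -> dict:
    """Split an 8-bit instruction into the fields named in Notes 1-4."""
    ir &= 0xFF
    opcode = ir >> 4
    return {
        "opcode": opcode,
        "name": OPCODES[opcode],
        "ra": (ir >> 3) & 1,   # 0 selects R0, 1 selects R1
        "rb": (ir >> 2) & 1,
        "rc": (ir >> 1) & 1,
        "bit0": ir & 1,        # a/s, i/x, e/d, ... selector (Note 1)
        "c2": ir & 0b11,       # 2-bit constant or branch condition field
    }


print(decode(0x29))
```

For example, `decode(0x29)` reports opcode 2 with ra = 1 and bit 0 = 1, i.e. Load Immediate into R1, which matches byte 29 in the sample interrupt program later in this report.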

 

 

Load                                                    (0000b) (0h)

Bits 7-4: 0000 | bit 3: ra | bit 2: rb | bits 1-0: c2

If rb=1 then                  // Indexed/Indirect
            R[ra] ← M[R1+c2]
Else                          // rb=0: Direct
            R[ra] ← M[c2]
End If

c2 is a sign-extended 2’s complement number, i.e. it can have the values +1, 0, -1, -2.

 

 

Store                                                    (0001b) (1h)

Bits 7-4: 0001 | bit 3: ra | bit 2: rb | bits 1-0: c2

If rb=1 then                  // Indexed/Indirect
            M[R1+c2] ← R[ra]
Else                          // rb=0: Direct
            M[c2] ← R[ra]
End If

c2 is a sign-extended 2’s complement number, i.e. it can have the values +1, 0, -1, -2.
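The address arithmetic shared by Load and Store can be sketched as follows.  This is a behavioural model under our own naming (`sign_extend_c2`, `effective_address`), not the schematic; it assumes 8-bit wrap-around when R1 + c2 crosses FF or 00, which the specification does not state explicitly.

```python
def sign_extend_c2(ir: int) -> int:
    """Return IR[1..0] sign-extended to an 8-bit value (00, 01, FE or FF)."""
    c2 = ir & 0b11
    return c2 | 0xFC if c2 & 0b10 else c2   # bit 1 is the sign bit


def effective_address(ir: int, r1: int) -> int:
    """Memory address used by Load (0h) and Store (1h)."""
    rb = (ir >> 2) & 1
    offset = sign_extend_c2(ir)
    if rb:                                   # Indexed/Indirect: R1 + c2
        return (r1 + offset) & 0xFF          # assumed 8-bit wrap-around
    return offset                            # Direct: c2 alone


# Direct addresses reachable with c2 = 00, 01, 10, 11.
print([hex(effective_address(c2, 0)) for c2 in range(4)])  # ['0x0', '0x1', '0xfe', '0xff']
```

In Direct mode c2 alone forms the address, so only 00, 01, FE and FF are reachable; these are the "four possible addresses" exercised in the functional simulation.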

 

 

 

Load Immediate / Load Extended         (0010b) (2h)

Bits 7-4: 0010 | bit 3: ra | bits 2-1: -- | bit 0: i/x

If i/x=1 then                 // Immediate
            R[ra] ← M[PC+1]
Else                          // i/x=0: Extended
            R[ra] ← M[M[PC+1]]
End If

 

 

Store Extended                         (0011b) (3h)

Bits 7-4: 0011 | bit 3: ra | bits 2-0: --

M[M[PC+1]] ← R[ra]

 

 

Add / Subtract                                      (0100b) (4h)

Bits 7-4: 0100 | bit 3: ra | bit 2: rb | bit 1: rc | bit 0: a/s

If a/s=1 then                 // Add
            R[ra] ← R[rb] + R[rc]
Else                          // a/s=0: Subtract
            R[ra] ← R[rb] - R[rc]
End If

Addition and subtraction are performed in 2’s complement.

 

 

Enable IRQ / Disable IRQ                    (0101b) (5h)

Bits 7-4: 0101 | bit 3: ra | bits 2-1: -- | bit 0: e/d

If e/d=1 then                 // Enable IRQ and set the value of the Period register
            // IRQ is enabled
            Period ← R[ra]
Else                          // e/d=0: Disable IRQ
            // IRQ is disabled
End If


And / Or                                              (0110b) (6h)

Bits 7-4: 0110 | bit 3: ra | bit 2: rb | bit 1: rc | bit 0: a/o

If a/o=1 then                 // Logical Bitwise And
            R[ra] ← R[rb] and R[rc]
Else                          // a/o=0: Logical Bitwise Or
            R[ra] ← R[rb] or R[rc]
End If

 

 

Branch                                                 (0111b) (7h)

Bits 7-4: 0111 | bit 3: ra | bit 2: rb | bits 1-0: c

PC ← R[ra] if R[rb] meets the condition c:

c = 00    Always     Branch always.
c = 01    Zero       Branch if the contents of R[rb] are zero.
c = 10    Nonzero    Branch if the contents of R[rb] are nonzero.
c = 11    Minus      Branch if the contents of R[rb] are negative.

 

 

Shift Right / Shift Left                (1000b) (8h)

Bits 7-4: 1000 | bit 3: ra | bit 2: rb | bit 1: c | bit 0: r/l

If r/l=1 then                 // Logical Shift Right by 1 bit
            R[ra] ← 0 # R[rb]<7..1>
Else                          // r/l=0: Logical Shift Left by 1 bit
            R[ra] ← R[rb]<6..0> # 0
End If

# means concatenate.
R[rb]<x..y> means bits x to y of R[rb].

 

 

No Operation                                       (1001b) (9h)

Bits 7-4: 1001 | bits 3-0: --

Waste one cycle.

 

 

Stop                                                     (1010b) (Ah)

Bits 7-4: 1010 | bits 3-0: --

Stop processing instructions.

Return From ISR                                  (1011b) (Bh)

Bits 7-4: 1011 | bits 3-0: --

R1 ← M[SP + 1]
R0 ← M[SP + 2]
PC ← M[SP + 3]
SP ← SP + 3

When an IRQ is received, the system automatically stacks PC, R0 and R1 (in that order) and disables IRQ.
Hence, when the Return From ISR opcode is read, these are unstacked in reverse order.
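The interrupt entry and Return From ISR sequences can be modelled as below.  This is a sketch under stated assumptions: the class and method names are hypothetical, the push order PC, R0, R1 is inferred from the SP + 3, SP + 2, SP + 1 offsets in the RTL above, and the ISR vector is read from location E0 as described under Data Path.  Whether Return From ISR re-enables IRQ is not specified, so the sketch leaves `irq_enabled` unchanged.

```python
class MiniSrcState:
    """Hypothetical model of the state saved and restored around an ISR."""

    def __init__(self):
        self.mem = [0] * 256
        self.pc = 0
        self.r = [0, 0]          # R0, R1
        self.sp = 0xFF           # SP is initialized to FF on reset
        self.irq_enabled = False

    def _push(self, value: int) -> None:
        self.mem[self.sp] = value & 0xFF   # SP points to the next empty cell
        self.sp = (self.sp - 1) & 0xFF     # the stack grows downwards

    def enter_isr(self) -> None:
        """Stack PC, R0, R1 (in that order), disable IRQ, jump via vector E0."""
        self._push(self.pc)
        self._push(self.r[0])
        self._push(self.r[1])
        self.irq_enabled = False
        self.pc = self.mem[0xE0]

    def return_from_isr(self) -> None:
        """RTL of opcode Bh: unstack R1, R0, PC, then release the 3 cells."""
        self.r[1] = self.mem[(self.sp + 1) & 0xFF]
        self.r[0] = self.mem[(self.sp + 2) & 0xFF]
        self.pc = self.mem[(self.sp + 3) & 0xFF]
        self.sp = (self.sp + 3) & 0xFF


cpu = MiniSrcState()
cpu.mem[0xE0] = 0xD0                 # ISR vector, as in the sample program
cpu.pc, cpu.r = 0x0C, [0x22, 0x29]   # illustrative values
cpu.enter_isr()                      # PC -> D0, SP -> FC
cpu.r = [0x99, 0x99]                 # the ISR overwrites both registers
cpu.return_from_isr()                # R0, R1 and PC restored, SP -> FF
```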

 

 

Negate                                                 (1100b) (Ch)

Bits 7-4: 1100 | bit 3: ra | bit 2: rb | bits 1-0: --

R[ra] ← not(R[rb])

where not(x) is the bitwise negation (one’s complement) of x.

 

 

Increment / Decrement              (1101b) (Dh)

Bits 7-4: 1101 | bit 3: ra | bit 2: rb | bit 1: -- | bit 0: i/d

If i/d=1 then                 // Unsigned Increment
            R[ra] ← R[rb] + 1
Else                          // i/d=0: Unsigned Decrement
            R[ra] ← R[rb] - 1
End If

Operands are treated as unsigned numbers, hence FF - 01 = FE.

 

 

Call Sub / Return From Sub                  (1110b) (Eh)

Bits 7-4: 1110 | bit 3: ra | bits 2-1: -- | bit 0: c/r

If c/r=1 then                 // Call Subroutine
            M[SP] ← PC;  SP ← SP - 1
            PC ← R[ra]
Else                          // c/r=0: Return From Subroutine
            PC ← M[SP + 1];  SP ← SP + 1
End If

When calling a subroutine, the system automatically stacks only the PC.
SP points to the next empty cell in memory.
The stack grows downwards in memory, i.e. FF, then FE, then FD, etc.
SP is automatically initialized to FF on reset.

 

Push / Pull                                            (1111b) (Fh)

Bits 7-4: 1111 | bit 3: ra | bits 2-1: -- | bit 0: h/l

If h/l=1 then                 // Push onto Stack
            M[SP] ← R[ra]
            SP ← SP - 1
Else                          // h/l=0: Pull from Stack
            R[ra] ← M[SP + 1]
            SP ← SP + 1
End If
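A minimal model of the stack instructions follows.  It assumes, consistent with the stack notes given for Call Sub / Return From Sub and with the SP behaviour observed in simulation, that SP is decremented after each push and incremented before each pull, and that the PC has already been advanced past the Call opcode when it is stacked.  The class and method names are hypothetical.

```python
class StackModel:
    """Hypothetical model of SP handling for Push/Pull and Call/Return."""

    def __init__(self):
        self.mem = [0] * 256
        self.sp = 0xFF           # SP is initialized to FF on reset
        self.pc = 0              # assumed to already point past the current opcode
        self.r = [0, 0]          # R0, R1

    def push(self, ra: int) -> None:       # M[SP] <- R[ra]; SP <- SP - 1
        self.mem[self.sp] = self.r[ra]
        self.sp = (self.sp - 1) & 0xFF

    def pull(self, ra: int) -> None:       # R[ra] <- M[SP + 1]; SP <- SP + 1
        self.r[ra] = self.mem[(self.sp + 1) & 0xFF]
        self.sp = (self.sp + 1) & 0xFF

    def call(self, ra: int) -> None:       # stack only the PC, then PC <- R[ra]
        self.mem[self.sp] = self.pc
        self.sp = (self.sp - 1) & 0xFF
        self.pc = self.r[ra]

    def ret(self) -> None:                 # PC <- M[SP + 1]; SP <- SP + 1
        self.pc = self.mem[(self.sp + 1) & 0xFF]
        self.sp = (self.sp + 1) & 0xFF


s = StackModel()
s.r = [0x12, 0x34]
s.push(0)
s.push(1)                # M[FF] = 12h, M[FE] = 34h, SP = FDh
s.pull(0)                # last in, first out: R0 = 34h
s.pull(1)                # R1 = 12h, SP back to FFh
```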

 


Data Path  (The overall interconnection of components)  See Appendix A, page 1.

 

The data path connects the main components of the system.  Also in this schematic, the memory interface is specified.  The design uses a three bus system, labeled A, B and C.  The register set takes data input from all three of these busses, and nowhere else.  Values from Registers and Ram can be put only on Busses A and B.  The schematic for Busses A and B is given on page 2 of Appendix A.  Bus C is the output of the ALU.  System Registers are synchronous to the falling edge of the clock, but the Control Unit is synchronous to the rising edge of the clock.  Hence, control signals are generated half a cycle before registers clock in new values.  Busses A and B and the ALU are asynchronous, so they respond to control signals immediately.  As a result of this setup, values can be passed between registers and/or through the ALU in one clock cycle, without the necessity of latching values.

 

An example of a one-cycle sequence is addition.  At the rising edge of the cycle, control signals are generated that tell Bus A to carry the value of R0, Bus B to carry the value of R1, the ALU to add its two inputs, and R0 to input the value from Bus C.  The asynchronous buses and ALU respond immediately, and within a very short time the result of the addition propagates through to Bus C.  On the falling edge of the cycle, R0 clocks in the value on Bus C, which is the correct result.
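The one-cycle addition can be expressed as a small behavioural model.  It is a sketch, not a timing simulation: the function name and dictionary keys are hypothetical, and the combinational settling time between the two clock edges is represented simply by computing Bus C before the register update.

```python
def one_cycle_add(regs: dict, a_src: str, b_src: str, dest: str) -> dict:
    """Model one clock cycle of the 3-bus add described above."""
    # Rising edge: the control unit selects the bus sources and the destination.
    control = {"bus_a": a_src, "bus_b": b_src, "load": dest}

    # Between the edges: Bus A, Bus B and the ALU are combinational, so Bus C
    # settles to the sum while every register still holds its old value.
    bus_a = regs[control["bus_a"]]
    bus_b = regs[control["bus_b"]]
    bus_c = (bus_b + bus_a) & 0xFF          # 8-bit result, carry discarded

    # Falling edge: only the register selected by "load" clocks in Bus C.
    after = dict(regs)
    after[control["load"]] = bus_c
    return after


print(one_cycle_add({"R0": 0x67, "R1": 0x44}, "R0", "R1", "R0"))
```

Because registers are written only at the falling edge, R0 can be both a source and the destination within the same cycle.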

 

The RAM is synchronous to the rising edge of the clock.  It communicates with the rest of the system through registers MD and MA and through Bus A.  By default, the RAM is set to Read (i.e. memWrite=0); therefore, on every clock cycle, its output presents the value stored at the address held in MA.

 

A memory read can be accomplished by generating control signals at the rising edge of clock 1 such that MA latches the address at the falling edge of clock 1.  Ram will then be outputting the requested data at the rising edge of clock 2 which can be latched into the desired register on the falling edge of clock 2.  Therefore a read takes 2 cycles.

 

A memory write can be accomplished by generating control signals at the rising edge of clock 1 such that MA latches the address and MD latches the data to be written at the falling edge of clock 1.  Ram will then write the specified data to the specified address at the rising edge of clock 2 (i.e. the end of clock 1).  Therefore a write takes 1 cycle.

 

Special logic was necessary for the memWrite control signal.  Because control signals change on the rising edge, memWrite would be de-asserted at the same rising edge at which the RAM (also rising-edge triggered) was supposed to sample it.  Consequently, a flip-flop was added to delay this signal by half a clock cycle.  This delay flip-flop is labeled on the schematic.
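The read and write timing, including the memWrite delay flip-flop, can be sketched clock by clock.  The model below is hypothetical: each cycle's signal dictionary carries the value to latch (for example `{"MAin": 0x10}`) instead of modelling the bus that supplies it, and `load_from_ram` stands in for whichever register enable captures the RAM output.

```python
def run_memory_cycles(cycles, mem):
    """Clock-by-clock model of the RAM interface (256 x 8-bit locations).

    Each entry of `cycles` holds the control signals asserted at that
    cycle's rising edge, e.g. {"MAin": 0x10} or {"load_from_ram": "R0"}.
    """
    ma = md = 0
    mem_write_delayed = 0          # output of the half-cycle delay flip-flop
    regs = {}
    for ctrl in list(cycles) + [{}]:       # one idle cycle to finish a write
        # Rising edge: the synchronous RAM sees MA, MD and the *delayed*
        # memWrite, performs any write, then presents M[MA] at its output.
        if mem_write_delayed:
            mem[ma] = md
        ram_out = mem[ma]
        # Falling edge: MA, MD, the destination register and the delay
        # flip-flop clock in the values selected during this cycle.
        ma = ctrl.get("MAin", ma)
        md = ctrl.get("MDin", md)
        if "load_from_ram" in ctrl:
            regs[ctrl["load_from_ram"]] = ram_out
        mem_write_delayed = ctrl.get("memWrite", 0)
    return regs


mem = [0] * 256
mem[0x10] = 0x5A
# Read: cycle 1 latches the address, cycle 2 latches the data (2 cycles).
print(run_memory_cycles([{"MAin": 0x10}, {"load_from_ram": "R0"}], mem))
# Write: one cycle latches MA, MD and memWrite; the RAM writes at the next rising edge.
run_memory_cycles([{"MAin": 0x20, "MDin": 0x77, "memWrite": 1}], mem)
print(hex(mem[0x20]))
```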

 

For Interrupt Service Routine support, the user must be able to specify the address of their ISR.  This system requires them to write the address of their ISR to memory location E0.  The system consequently needs to be able to read specifically from that address, hence the constant E0 and MA are multiplexed to the Address input of the Ram.

Control Unit  (Generation of all system control signals)  See Appendix B.

 

An external decoder translates the four-bit opcode from IR[7..4] into 16 signals, one for each opcode.  The inputs to the Control Unit are:

-         the above 16 signals

-         IR[0] and IR[2] used for distinguishing between actions within a given op code.  For example:  Add and Sub have the same op code and IR[0] distinguishes them.

-         CON which is the result of decision logic for branching.

-         IRQ which is the interrupt service request signal.

-         Clock which is the system clock.

-         Reset which resets the system.

The control unit uses these inputs to generate all system control signals.  It is written as a single VHDL process.  Within this process is a conditional: if the reset signal is received, the unit goes to the reset state; otherwise, on a rising clock edge, it first clears all control signals, then asserts the signals for its current state and determines its next state.

 

The Reset State:  (Rset)

In this state, the system zeros all registers except SP, which it sets to FF.  It also initializes the Run signal and disables IRQ.  It sets the next state to T0, the beginning of Opcode Fetch.  Hence, all programs must start at address 00, because that address is expected to hold the first opcode after reset.

 

Opcode Fetch:  (states T0 and T1)

Every operation except the servicing of an ISR begins with the opcode fetch carried out in states T0 and T1.  It is only in state T0 that the system checks the IRQ signal.  If there is an Interrupt Service Request then the system stacks the system state and services the interrupt.  Otherwise it fetches the next opcode and carries out the instruction sequence to completion.

 

State T2 is the most complex because it is in this state that the control unit considers the opcode held in IR.  Most instructions are completed in this state and set the next state to T0.  Some instructions require more clock cycles and, consequently, have extra states.

 

Format of a State:

In any state, the control unit first sets the appropriate signals and then sets the next state.  It may decide which signals to set or which state comes next based on the contents of the IR, but often this is inherent to the state and no decision is necessary.
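The structure of the control unit can be illustrated with a simplified state machine.  This is not the VHDL of Appendix B: the signal names, the fetch RTL assumed for T0 and T1, the state names beyond T2, and the single-state placeholder for interrupt servicing are all our assumptions.  Only the cycle counts in `CYCLES` are taken from the Results and Comparison table.

```python
# Total cycles (fetch included) for a few opcodes, from the Results table.
CYCLES = {"add": 3, "nop": 3, "load": 4, "branch": 4, "ldx": 5}


class ControlUnitSketch:
    """Simplified model of the single-process control unit."""

    def __init__(self):
        self.state = "Rset"
        self.signals = set()
        self.remaining = 0          # execute states still needed after T2

    def clock(self, opcode, irq=False, reset=False):
        if reset:                   # reset has priority over the clock
            self.state = "Rset"
            return
        self.signals = set()        # rising edge: clear every control signal
        if self.state == "Rset":
            self.signals = {"clear_registers", "SP<-FF", "disable_IRQ"}
            self.state = "T0"
        elif self.state == "T0":    # the only state that samples IRQ
            if irq:
                self.state = "ISR"
            else:
                self.signals = {"MA<-PC", "PC<-PC+1"}
                self.state = "T1"
        elif self.state == "T1":
            self.signals = {"IR<-M[MA]"}
            self.state = "T2"
        elif self.state == "T2":    # decode; 3-cycle instructions finish here
            self.remaining = CYCLES[opcode] - 3
            self.state = "T3" if self.remaining else "T0"
        elif self.state == "ISR":   # placeholder for the stacking sequence
            self.state = "T0"
        else:                       # T3, T4, ...: extra execute states
            self.remaining -= 1
            self.state = f"T{int(self.state[1:]) + 1}" if self.remaining else "T0"


def cycles_for(opcode):
    """Count the clock cycles from T0 back to T0 for one instruction."""
    cu = ControlUnitSketch()
    cu.clock(opcode)                # Rset -> T0
    count = 0
    while True:
        cu.clock(opcode)
        count += 1
        if cu.state == "T0":
            return count


print({op: cycles_for(op) for op in CYCLES})
```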

 

 

 

 

Grx Logic  (Generation of control signals for R0 & R1)  See Appendix A, page 3.

 

The Grxlogic module of the mini SRC generates the control signals for R0 and R1 (see Appendix A, page 3).  More specifically, it maps the instruction register fields ra, rb and rc onto the actual registers.  This is done with sum-of-products logic.  The inputs to the Grxlogic module are IR[3..1] and the Grx signals generated by the Control Unit.  Its outputs are the control signals for the register to be activated.
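A behavioural sketch of the sum-of-products select logic follows.  The Gra/Grb/Grc and Rin/Rout names follow the single-bus SRC register-select logic referred to in the Phase 2 discussion; in the 3-bus design the outputs would instead be per-bus enables, so the exact signal set here is an assumption.  The sketch also assumes only one G signal is asserted at a time.

```python
def grx_logic(ir: int, gra: int, grb: int, grc: int, rin: int, rout: int) -> dict:
    """Sum-of-products register select in the style of the SRC Grx logic."""
    ra, rb, rc = (ir >> 3) & 1, (ir >> 2) & 1, (ir >> 1) & 1
    # Each product term pairs a G signal with the IR bit naming the register.
    sel_r0 = (gra and not ra) or (grb and not rb) or (grc and not rc)
    sel_r1 = (gra and ra) or (grb and rb) or (grc and rc)
    return {
        "R0in": bool(rin and sel_r0),
        "R1in": bool(rin and sel_r1),
        "R0out": bool(rout and sel_r0),
        "R1out": bool(rout and sel_r1),
    }


# IR = 46h is an Add/Subtract with ra = R0, rb = R1, rc = R1.
print(grx_logic(0x46, gra=1, grb=0, grc=0, rin=1, rout=0))
```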

 

Branch Logic  (Decision logic for branches)  See Appendix A, page 4.

 

This module is the interface between the IR branch fields and the control unit.  It interprets the condition field of the instruction register and determines whether the branch condition is met.  If it is, the CON signal is asserted and processed by the control unit.  The inputs to this module are IR[1..0], BUSA, clock, CONclear and CONin.  The output is the CON signal, which is sent to the control unit.
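A behavioural sketch of the CON decision follows.  It models only the combinational condition test, assuming R[rb] is on Bus A during evaluation (Bus A is the module's data input); the CONin/CONclear flip-flop that holds CON for the control unit is omitted, and the function name is hypothetical.

```python
def branch_con(ir: int, bus_a: int) -> bool:
    """CON for the Branch opcode (7h): condition c is IR[1..0], R[rb] on Bus A."""
    c = ir & 0b11
    value = bus_a & 0xFF
    if c == 0b00:
        return True                  # Always
    if c == 0b01:
        return value == 0            # Zero
    if c == 0b10:
        return value != 0            # Nonzero
    return bool(value & 0x80)        # Minus: sign bit (bit 7) set


# brnz R0, R1 from the hardware test program encodes as 0111 0 1 10 = 76h.
print(branch_con(0x76, 0x01), branch_con(0x76, 0x00))  # True False
```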

 

Register Set  (All system registers)  See Appendix A, page 5.

 

The register set is the collection of registers available in the system.  For simplicity, they were grouped together in one module.  Some of the registers in the module can accept input from more than one bus (reg2input); the interface logic for this is shown in Appendix A, page 6.  The inputs to the Register Set module are the control signals that enable the individual registers, the three buses (A, B, C), clock and clear.  The outputs of the module are the outputs of the individual registers.


ALU  (Arithmetic and Logic Unit)  See Appendix A, page 7.

 

The ALU can perform 11 functions:

 

  1. Logical Bit-wise OR of Bus A and Bus B
  2. Logical Bit-wise AND of Bus A and Bus B
  3. Logical Bit-wise NEGATE of Bus B
  4. Logical Shift Right (by one bit) of Bus B
  5. Logical Shift Left (by one bit) of Bus B
  6. Addition:  Bus B + Bus A
  7. Subtraction:  Bus B – Bus A
  8. Increment Bus B by 1
  9. Decrement Bus B by 1
  10. Sign Extend IR[1..0]
  11. Addition:  Bus A + Sign Extended IR[1..0]

 

Most of the implementation is straightforward and apparent from the schematic; however, the interface to the adder/subtracter is somewhat complicated.  Each of its inputs is driven by a two-input multiplexer.  This allows four useful combinations:

 

  1. Bus A and Bus B.  This is useful for addition
  2. Bus B and Zero.  This is useful for incrementing and decrementing.
  3. Sign Extended IR[1..0] and Zero.  This is necessary for Direct Load and Store instructions.
  4. Sign Extended IR[1..0] and Bus A.  This is necessary for Indexed/Indirect Load and Store instructions.
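The eleven functions and the adder input multiplexers can be summarized in a behavioural sketch.  The function names (`alu`, `adder_inputs`) and the short operation codes are hypothetical.  The report does not say how the adder produces +1 and -1 for Increment and Decrement, so the sketch assumes a carry-in of 1 (or a borrow of 1) when the second adder input is zero.

```python
def sign_extend_c2(ir: int) -> int:
    """IR[1..0] as an 8-bit two's-complement value (00, 01, FE or FF)."""
    c2 = ir & 0b11
    return c2 | 0xFC if c2 & 0b10 else c2


def adder_inputs(bus_a: int, bus_b: int, ir: int, use_ir: bool, use_zero: bool):
    """The two 2-input multiplexers in front of the adder/subtracter."""
    x = sign_extend_c2(ir) if use_ir else bus_b   # Bus B or sign-extended IR[1..0]
    y = 0 if use_zero else bus_a                  # Bus A or zero
    return x, y


def alu(func: str, bus_a: int, bus_b: int, ir: int = 0) -> int:
    """Return Bus C (8 bits) for one of the eleven ALU functions."""
    if func == "or":
        result = bus_a | bus_b
    elif func == "and":
        result = bus_a & bus_b
    elif func == "neg":
        result = ~bus_b                  # bitwise NOT of Bus B
    elif func == "shr":
        result = bus_b >> 1              # logical: a 0 enters bit 7
    elif func == "shl":
        result = bus_b << 1              # bit 7 is dropped by the final mask
    elif func in ("add", "sub"):         # combination 1: Bus A and Bus B
        x, y = adder_inputs(bus_a, bus_b, ir, use_ir=False, use_zero=False)
        result = x + y if func == "add" else x - y
    elif func in ("inc", "dec"):         # combination 2: Bus B and zero
        x, y = adder_inputs(bus_a, bus_b, ir, use_ir=False, use_zero=True)
        result = x + y + 1 if func == "inc" else x - y - 1   # assumed carry/borrow
    elif func == "sext":                 # combination 3: IR[1..0] and zero
        x, y = adder_inputs(bus_a, bus_b, ir, use_ir=True, use_zero=True)
        result = x + y
    elif func == "addr":                 # combination 4: IR[1..0] and Bus A
        x, y = adder_inputs(bus_a, bus_b, ir, use_ir=True, use_zero=False)
        result = x + y
    else:
        raise ValueError(f"unknown ALU function: {func}")
    return result & 0xFF


print(hex(alu("add", 0x44, 0x67)), hex(alu("sub", 0x01, 0xAB)))
```

Subtraction follows the ordering listed above (Bus B - Bus A), so the minuend is passed as `bus_b`.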


IRQ  (Interrupt Support)  See Appendix A, page 8.

 

Description:

 

The mini SRC processor has a timer interrupt system.  The interrupt signal is generated by a free-running 8-bit counter clocked by the processor clock.  Interrupt support is configured through reserved opcodes that are hardwired into the system.  A detailed description of the interrupt system follows:

 

Implementation (see IRQ.GDF spec):

 

As the mini SRC clock runs, a free-running 8-bit counter is incremented by 1 on each clock cycle.  When the counter reaches FF, it rolls over and begins counting again at 00.  On each clock cycle the counter value is compared with a user-specified 8-bit value stored in the Period register.  The comparison is done with a simple 8-bit compare circuit available in the MEGA_LPM package.  If the comparison yields a match and the user has enabled interrupts, the interrupt circuitry generates an IRQ signal and the control unit begins to process the interrupt service routine (ISR).  By design, the address of the ISR is stored at location E0 in memory.  This address is reserved for the ISR jump vector, and the user must make sure that E0 contains the desired starting address of the ISR.  When the system breaks to process an ISR, it stores the current state of the system (PC, R0 and R1) on the system stack and begins executing the ISR at the user-specified address.  While the ISR is being processed, further interrupts are ignored so that the routine can complete.  It is the user's responsibility to place the Return From ISR opcode at the end of the ISR.  When the control unit detects this opcode, it generates the signals that pull the saved state from the stack back into the appropriate registers.
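The counter-and-compare behaviour can be sketched as follows.  The class is hypothetical and assumes the counter starts at 00 and is incremented before the comparison on each clock, which reproduces the "interrupt on the 3rd clock cycle" seen in the independent IRQ test; the real circuit may differ by a cycle.  In the integrated system the control unit also disables IRQ on entry to the ISR, which this isolated model does not do.

```python
class TimerIrq:
    """Hypothetical behavioural model of the timer interrupt circuit."""

    def __init__(self):
        self.counter = 0        # free-running 8-bit counter
        self.period = 0         # Period register, loaded by Enable IRQ
        self.enabled = False
        self.irq = False

    def enable(self, period: int) -> None:
        """Enable IRQ opcode: Period <- R[ra] and interrupts enabled."""
        self.period = period & 0xFF
        self.enabled = True

    def disable(self) -> None:
        self.enabled = False

    def clock(self) -> None:
        """One clock cycle: count (rolling over FF -> 00), then compare."""
        self.counter = (self.counter + 1) & 0xFF
        if self.enabled and self.counter == self.period:
            self.irq = True

    def clear(self) -> None:
        """Clear IRQ, as the control unit does once it accepts the request."""
        self.irq = False


timer = TimerIrq()
timer.enable(0x03)
fired = []
for cycle in range(1, 600):
    timer.clock()
    if timer.irq:
        fired.append(cycle)
        timer.clear()
print(fired)  # [3, 259, 515]
```

Because the counter rolls over, a match recurs every 256 cycles for as long as interrupts remain enabled.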

 

User Guide:

 

The following guidelines describe how to use interrupt support:

 

Steps:

 

1 - Store the desired ISR starting address (jump vector) at address E0 in memory.  Remember that E0 is a RESERVED address and should contain the starting address of the ISR if interrupt functionality is desired.

 

2 - Enable interrupts using the Enable IRQ opcode (see opcode spec).  This opcode requires the user to specify the register (R0 or R1) that holds the interrupt period.  The value in that register is the counter value, i.e. the clock cycle, at which an interrupt is generated.

 

3 - At the end of the interrupt service routine, the user should use the Return From ISR opcode (see opcode spec) to complete the process.

 

The following page contains a sample program and timing diagram for the interrupt process.

 

The following program is a guide to using interrupts with the Mini SRC:

 

The interrupt is set to fire after 34 (22 hex) clock cycles, timed so that it occurs during a sequence of NOP instructions.  The user specifies the ISR starting address to be D0.  Within the ISR, registers R0 and R1 are both loaded with 99 (hex) to demonstrate that the ISR is being serviced.

 

DEPTH = 256;            % Memory depth and width are required %
WIDTH = 8;              % Enter a decimal number %

ADDRESS_RADIX = HEX;
DATA_RADIX = HEX;

CONTENT
BEGIN

    01 : 21;    % ldi R0 <= 22 %
    02 : 22;    % Fire interrupt after 22 (hex) clock cycles %
    03 : 29;    % ldi R1 <= D0 %
    04 : D0;    % ISR address D0 %
    05 : 38;    % stx E0 <= R1 (D0) %
    06 : E0;    % Write D0 to reserved address E0 %
    07 : 51;    % Enable interrupts, get interrupt period (22) from R0 %
    08 : 28;    % ldx R1 %
    09 : 03;    % addr %
    0A : 90;    % NOP %
    0B : 90;    % NOP  <--- expect interrupt to fire in here %
    0C : 90;    % NOP %
    0D : 90;    % NOP %
    0E : 42;    % Add: R0 = R0 + R1 = 43 %

    % ===== THE ISR ===== %

    D0 : 21;    % ldi R0 <= 99 %
    D1 : 99;
    D2 : 29;    % ldi R1 <= 99 %
    D3 : 99;
    D4 : B0;    % return from interrupt %
END;

 

The following is a simulation of the above program.  Note that the IRQ signal is a pulse that occurs after 34 (22 hex) clock cycles, within the NOP instruction sequence (NOP is opcode 90).

 


Results and Comparison

 

Instruction                             Number of Cycles
Service Interrupt Request               7
load                                    4
store                                   4
ldi                                     4
ldx                                     5
stx                                     4
add                                     3
sub                                     3
enable irq                              3
disable irq                             4
and                                     3
or                                      3
branch                                  4
shift R                                 3
shift L                                 3
NOP                                     3
STOP                                    3
Return from ISR                         8
Neg                                     3
INC                                     3
DEC                                     3
Call Subroutine                         4
Return from Subroutine                  4
Push                                    4
Pull                                    4
Average Cycles per Instruction (CPI)    96 / 25 = 3.84

 

-- maximum frequency of operation in the simulator:  15 MHz

-- maximum frequency of operation on the chip:  15 MHz

-- average cycles per instruction (CPI):  3.84 (the unweighted average of the 25 entries in the table above)

-- MIPS rating (for the 15 MHz physical test run):  15 MHz / 3.84 CPI ≈ 3.9 MIPS

-- percentage of chip area:  memory utilized 16 %, LCs utilized 40 % (464 LCs)
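The CPI and MIPS figures above can be reproduced from the table, and the same arithmetic gives a program-specific (weighted) CPI when the dynamic instruction mix is known.  The helper names are hypothetical; the loop mix used in the example is the `nop`/`brnz` loop of the hardware test program listed under Phase 4.

```python
# Cycle counts copied from the Results and Comparison table.
CYCLES = {
    "Service Interrupt Request": 7, "load": 4, "store": 4, "ldi": 4,
    "ldx": 5, "stx": 4, "add": 3, "sub": 3, "enable irq": 3,
    "disable irq": 4, "and": 3, "or": 3, "branch": 4, "shift R": 3,
    "shift L": 3, "NOP": 3, "STOP": 3, "Return from ISR": 8, "Neg": 3,
    "INC": 3, "DEC": 3, "Call Subroutine": 4, "Return from Subroutine": 4,
    "Push": 4, "Pull": 4,
}


def weighted_cpi(mix: dict) -> float:
    """CPI for an instruction mix given as {table entry: dynamic count}."""
    total_instructions = sum(mix.values())
    total_cycles = sum(CYCLES[name] * n for name, n in mix.items())
    return total_cycles / total_instructions


def mips(clock_hz: float, cpi: float) -> float:
    return clock_hz / cpi / 1e6


# Unweighted average over the 25 table entries (the figure reported above).
unweighted = weighted_cpi({name: 1 for name in CYCLES})
print(unweighted, mips(15e6, unweighted))
# Steady-state hardware test loop: one NOP and one branch per iteration.
loop = weighted_cpi({"NOP": 1, "branch": 1})
print(loop, mips(15e6, loop))
```

The unweighted figure treats every instruction type as equally frequent; for the test loop the weighted CPI is 3.5, which illustrates how the rating depends on the program being run.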

 

 

Functional Simulation Results

 

Proof of Validity of Instruction Sequences

 

All instructions (except the interrupt-related instructions) were rigorously tested with a single program.  The program source is given in Appendix C1 and the resulting simulation waveform in Appendix C2.  Appendix C2 is heavily documented, and each instruction setup and test is labeled and explained.  The following is a summary of those results.

 

Load Immediate and Load Extended are both used extensively throughout the program and are easily verified as accurate. 

 

Addition and Subtraction treat the operands as 2’s complement numbers.  Addition of two positive numbers, addition of a positive and a negative number, and subtraction of two positive numbers yielding a negative result were all successful.  Near the end of the program, subtraction of two negative numbers was also found to be successful.

 

Logical bit-wise And, logical bit-wise Or, Shift Right one bit position, Shift Left one bit position, NOP and logical bit-wise Negate were all tested and found to be successful.

 

Increment and Decrement consider the operands to be unsigned.  Both were successfully tested on high ($CC) and low ($15) numbers.

 

Regarding stack support and Subroutines: 

Instructions Push and Pull were successfully tested.  In the waveform it can be observed that SP properly decrements and increments and that the correct values are written to and read from the memory locations addressed by SP.

Subroutine Call and Return were successfully tested.  In the waveform it can be observed that the PC is properly pushed onto the stack and that program flow jumps to the subroutine’s address.  The Return instruction properly pulls the PC from the stack and resumes program flow immediately after the original Call instruction.

 

All four Branch conditions were successfully tested.  For each a test value was loaded into R1 and when that value met the condition specified in the Branch instruction, program flow properly skipped to the branch’s target address.

 

Store Direct and Load Direct were successfully tested.  Values were stored to the four possible addresses and were subsequently successfully loaded from the four possible addresses.

 

Store Indexed/Indirect and Load Indexed/Indirect were successfully tested.  An address was loaded into R1 and values were stored at the four possible offsets from that address.  The values were subsequently loaded successfully from the same four addresses, thereby proving the validity of the instruction sequences.

 

At the end of the program, the Stop instruction successfully halts program flow.

 

 

 

 

 

Demonstration of IRQ Sequence

 

The IRQ signal-generating module was tested and simulated both independently of the data path and integrated into it.  The simulation result on the next page of this report is from the independent test (see the IRQ design spec for the integrated simulation result).

 

As shown in the simulation results, the value 03 was placed on Bus A and latched into the Period register.  This is the interrupt period value, so an interrupt is generated on the 3rd clock cycle.  After the interrupt is asserted, a clear IRQ signal is asserted (normally by the Control Unit), which sets the interrupt signal low again.  All functional simulation signals behave as expected and confirm proper operation of the interrupt module.

 


Discussion

 

Evolution of the Design: 

 

Phase 1

 

The objective of Phase 1 was to design and implement the data path for the mini SRC.  This data path was a 1-BUS, 8-bit-wide system.  Once the 1-BUS system was complete, a 3-BUS implementation was expected to be straightforward.  In anticipation of the switch to a 3-BUS system, the registers were all placed in a single design schematic (see Appendix A) that would lend itself to modularity.  An abstract view of the data path is as follows:

 

 

                                                Mini SRC data Path (figure from ELEC 374 course material)

 

The bus was implemented using a simple 8-input multiplexer with 3 select lines.  Some components along the data path were not implemented in the first phase, so the signals they would have produced were generated manually during simulation.  The final product of the first phase was a functional data path with limited ALU functionality.  All opcode decoding was implemented in a temporary fashion that was sufficient for testing and simulating the data path.

 

Phase 2

 

During Phase 2, much of the instruction decoding logic was added to the mini SRC.  Outputs from the instruction register were passed through combinational logic so that opcodes could be decoded.  A memory interface was also added to the data path to allow memory accesses.  The RAM used was synchronous RAM with 256 addressable locations of 8 bits each.  The ALU design was further modified to incorporate the instruction-decoding scheme.  At the end of the phase, the memory access instructions, ALU instructions and Branch instruction were all tested and simulated against expected results.

 

Key Schematics for phase 2:

                        Memory Interface Design (figure from ELEC 374 course material)

 

 

Register Select Decoding Logic (figure from ELEC 374 course material)

 

Phase 3

 

The major task of Phase 3 was the addition of a control unit.  The control unit would contain the cycle-by-cycle instruction sequence of all opcodes.  Implementation of the control unit was done through a finite state machine that was written in VHDL.  After the control unit was written in VHDL it was placed within the data path, tested and simulated.  Furthermore, the entire processor was tested by writing a small program into RAM and verifying simulation results.

 

A general schematic of the control unit and its decoding logic:

 

 

 

Control Unit and Instruction Register interface logic (figure from ELEC 374 course material)


Phase 4

 

The major upgrades to the processor in this phase were a 3-BUS implementation, system stack support, subroutine call support and interrupt support. The move to a 3-BUS architecture required a thorough reassessment of many system components. The control unit needed major changes, because the additional buses allowed some cycles to be eliminated from certain instructions; every instruction sequence in the control unit had to be re-coded for multi-bus operation. Temporary registers A and C were removed because they were no longer needed: operands could now travel in parallel on the additional buses. After all modifications and upgrades were complete, a full timing and functional simulation was performed, and the processor was then programmed onto a FLEX EPF10K20 chip and tested at clock frequencies from 1 to 15 MHz. Correct execution of a program stored in memory was verified by observing the values shown on the LEDs of the Altera evaluation board.
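To illustrate why the extra buses shorten instructions and make temporaries A and C unnecessary, the Python sketch below compares an illustrative `add Ra, Rb, Rc` execute sequence on one bus and on three buses. The signal names, the one-step 3-BUS sequence and the assumption that fetch takes the same 3 cycles in both designs are hypothetical; these are not cycle counts taken from our control unit.

```python
# Illustrative execute-phase steps for "add Ra, Rb, Rc". Signal names and the
# equal fetch length are assumptions, not figures taken from our control unit.

ONE_BUS_ADD = [
    {"Rbout", "Ain"},          # Rb -> temporary register A (only one bus)
    {"Rcout", "ADD", "Cin"},   # A + Rc -> temporary register C
    {"Cout", "Rain"},          # C -> Ra
]

THREE_BUS_ADD = [
    # Rb on bus A, Rc on bus B, ALU result on bus C straight into Ra
    {"Rbout_busA", "Rcout_busB", "ADD", "Rain_busC"},
]

FETCH_CYCLES = 3  # assumed identical for both designs in this sketch

def drivers(step: set[str]) -> int:
    """Count the sources driving a bus in one step (signals containing 'out')."""
    return sum("out" in signal for signal in step)

def total_cycles(execute_steps: list[set[str]]) -> int:
    """Clock cycles for one instruction: fetch plus execute steps."""
    return FETCH_CYCLES + len(execute_steps)

# With one bus, each step may have only one driver, which forces the temporaries.
assert all(drivers(step) == 1 for step in ONE_BUS_ADD)
print(total_cycles(ONE_BUS_ADD), total_cycles(THREE_BUS_ADD))  # 6 4
```

The single-bus constraint of one driver per cycle is what forces the intermediate copies into A and C; with separate source and destination buses, both operands and the result move in the same cycle.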

 

The program that was tested on the hardware is shown below:

 

NOTE: The value of R0 was observed on the LEDs.

 

 

ORG 00

ldi  R0, $67       ; R0 = $67
ldx  R1, $80       ; R1 = ($80) = $44
add  R0, R0, R1    ; R0 = $67 + $44 = $AB
and  R1, R0, R1    ; R1 = $AB AND $44 = $00
ldx  R1, $81       ; R1 = ($81) = $01
sub  R0, R0, R1    ; R0 = $AB - $01 = $AA
shr  R0, R0, 0     ; R0 = $AA >> 1 = $55
stx  R0, $80       ; ($80) = $55
bral R0, R1        ; PC = R0 = $55

ORG $55

loop nop
     brnz R0, R1   ; R1 = $01 is non-zero, so branch to loop forever
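The expected values in the comments above can be checked with a short register-level replay in Python. This is a sketch under stated assumptions: 8-bit wrap-around arithmetic, `shr` treated as a logical shift right by one bit, the data at $80 and $81 preloaded as the comments indicate, and `brnz R0, R1` branching to the address in R0 whenever R1 is non-zero. The function name and the four-register file are hypothetical.

```python
MASK = 0xFF  # 8-bit registers and memory

def replay_test_program(loop_iterations: int = 3):
    """Replay the hardware test program; return (trace, registers, memory)."""
    mem = [0] * 256
    mem[0x80], mem[0x81] = 0x44, 0x01   # data assumed preloaded, per the comments
    r = [0] * 4                         # R0..R3 (register count assumed)
    trace = []

    def step(name: str, value: int) -> None:
        trace.append((name, value))

    r[0] = 0x67
    step("ldi R0, $67", r[0])
    r[1] = mem[0x80]
    step("ldx R1, $80", r[1])
    r[0] = (r[0] + r[1]) & MASK
    step("add R0, R0, R1", r[0])
    r[1] = r[0] & r[1]
    step("and R1, R0, R1", r[1])
    r[1] = mem[0x81]
    step("ldx R1, $81", r[1])
    r[0] = (r[0] - r[1]) & MASK
    step("sub R0, R0, R1", r[0])
    r[0] = r[0] >> 1                    # assumed: logical shift right by one bit
    step("shr R0, R0, 0", r[0])
    mem[0x80] = r[0]
    step("stx R0, $80", mem[0x80])
    pc = r[0]                           # bral: branch always to the address in R0
    step("bral R0, R1", pc)

    for _ in range(loop_iterations):
        # nop does nothing; brnz jumps back to R0 ($55) while R1 is non-zero
        if r[1] != 0:
            pc = r[0]
        step("brnz R0, R1", pc)
    return trace, r, mem

trace, regs, memory = replay_test_program()
for name, value in trace:
    print(f"{name:16s} ${value:02X}")
```

The final value of R0 ($55) is what should appear on the LEDs.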


Extra Features

 

In addition to the functionality required by the official specification sheet, the features below were added. Load and store were not strictly necessary, given ldi, ldx and stx, but they have also been implemented.

 

1- Load – loads from one of 4 specified addresses, or from address + offset

2- Store – stores to one of 4 specified addresses, or to address + offset

3- OR – bit-wise logical OR of the specified values

4- ShiftL – shifts the specified register value to the left

5- NEG – negates the specified register value

6- INC / DEC – increments or decrements the specified register value by 1

7- 3-BUS architecture

8- Timer interrupt – the user can enable/disable an interrupt that occurs at a given clock count

9- Stack (push, pull) – the user has access to a stack

10- Subroutines (call routine, return) – the user can jump to a specified routine and return from it
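The stack and subroutine features listed above work together: a call saves the return address on the stack and a return pulls it back into the PC. The Python sketch below assumes a stack pointer starting at $FF, a stack that grows downward, push as store-then-decrement and pull as increment-then-load, and a PC that already points past the call instruction when the call executes. All of these conventions, and the class and method names, are hypothetical rather than taken from our design.

```python
class StackMachine:
    """Hypothetical 8-bit model of push/pull and call/return."""

    def __init__(self, sp_init: int = 0xFF) -> None:
        self.mem = [0] * 256
        self.sp = sp_init   # assumed: SP points at the next free location
        self.pc = 0

    def push(self, value: int) -> None:
        self.mem[self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF      # store, then decrement

    def pull(self) -> int:
        self.sp = (self.sp + 1) & 0xFF      # increment, then load
        return self.mem[self.sp]

    def call(self, target: int) -> None:
        # Assumes PC was already incremented during fetch, so it holds the
        # address of the instruction after the call (the return address).
        self.push(self.pc)
        self.pc = target & 0xFF

    def ret(self) -> None:
        self.pc = self.pull()

m = StackMachine()
m.pc = 0x11
m.call(0x40)        # outer call: return address $11 saved at $FF
m.pc = 0x45
m.call(0x60)        # nested call: return address $45 saved at $FE
m.ret()             # back to $45
m.ret()             # back to $11, SP restored to $FF
print(hex(m.pc), hex(m.sp))  # 0x11 0xff
```

Because return addresses are stacked rather than held in a single link register, nested subroutine calls unwind correctly in last-in, first-out order.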

 

Conclusion

 

After completing the mini SRC, we found that the most significant performance improvement came from implementing a 3-BUS architecture. It greatly reduced our CPI and therefore increased performance. Unfortunately, the 3-BUS architecture also introduced some timing issues, which may have reduced our maximum frequency of operation.
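The trade-off between a lower CPI and a possibly lower maximum clock can be made concrete with the native MIPS rating, MIPS = f / (CPI x 10^6). The Python sketch below computes it, together with the clock frequency at which a lower-CPI design merely breaks even with the original. The frequencies and CPIs in the example are hypothetical round numbers chosen for illustration, not our measured results.

```python
def mips(clock_hz: float, cpi: float) -> float:
    """Native MIPS rating: clock frequency divided by (CPI x 10^6)."""
    return clock_hz / (cpi * 1e6)

def break_even_clock(old_clock_hz: float, old_cpi: float, new_cpi: float) -> float:
    """Lowest clock at which the new design is no slower than the old one."""
    return old_clock_hz * new_cpi / old_cpi

# Hypothetical numbers for illustration only, not measured results:
print(mips(12e6, 6.0))                    # 2.0 (single bus)
print(mips(10e6, 4.0))                    # 2.5 (three buses, slower clock)
print(break_even_clock(12e6, 6.0, 4.0))   # 8000000.0
```

In this example the 3-BUS design stays ahead as long as its clock does not fall below 2/3 of the single-bus clock, which is the ratio of the two CPIs.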

 

The mini SRC that was produced can perform many of the common instructions typically required of processors used in small embedded systems. The addition of interrupt handling widens its range of applications further, to uses such as timers and more complex user-feedback systems.

 

Future work

 

Examples of future work that could be undertaken on the mini SRC include:

 

1- The addition of input and output ports with handshaking

2- Additional types of interrupts (input capture / output compare)

3- 16-bit operand support instead of 8-bit

4- Additional ALU operations such as multiply / divide

5- A condition code register whose bits (Negative, Overflow, Zero, Non-zero) are set after register contents change
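As a sketch of how the proposed condition bits could be derived, the Python function below performs an 8-bit add and computes Negative, Overflow, Zero and Non-zero from the result. The function name and the use of two's-complement signed overflow (operands share a sign that the result does not) are assumptions about how such a register might be specified, not an existing part of the mini SRC.

```python
def add_with_flags(a: int, b: int) -> tuple[int, dict[str, bool]]:
    """Hypothetical 8-bit add that also computes the proposed condition bits."""
    a &= 0xFF
    b &= 0xFF
    result = (a + b) & 0xFF
    negative = bool(result & 0x80)                      # sign bit of the result
    # Signed overflow: both operands share a sign that the result does not.
    overflow = bool(~(a ^ b) & (a ^ result) & 0x80)
    zero = result == 0
    return result, {"N": negative, "V": overflow, "Z": zero, "NZ": not zero}

print(add_with_flags(0x67, 0x44))
# $67 + $44 = $AB: negative, and signed overflow (103 + 68 > 127)
```

Non-zero is simply the complement of Zero, so in hardware it would need no extra logic beyond an inverter.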