Digital Systems Engineering
Design of a Simple RISC Processor
Final Report
Mike Wood: 426-4464
Abstract
This report contains the design, specification, analysis and simulation of a simple RISC processor (mini SRC). The design project was divided into four phases, each building on the previous one. Within this report are specifications and simulations of the major modules of the mini SRC: Data Path, Grxlogic, Branch Logic, Register Set, ALU, and IRQ. As part of the testing and analysis, conclusions about the CPI, MIPS rating, and maximum frequency of operation are also included. The final product meets all required specifications and includes additional features such as interrupts, a stack, and subroutine calls. After analysis of the design it was concluded that a 3-bus architecture would yield a significant reduction in CPI, and it was therefore employed. This report concludes that further upgrades to the mini SRC, such as 16-bit operand support and input/output ports, would significantly increase its usefulness in modern embedded systems applications.
Table of Contents
Title Page
Abstract
Table of Contents
Objective
Project Specification
Design Specification
-- instruction set and opcode specification
-- data path
-- control unit
-- grx logic
-- branch logic
-- register set
-- alu
-- irq
Results and Comparison
Functional Simulation Results
-- proof of validity of instruction sequences
-- demonstration of irq sequence
Discussion
-- evolution of the design (phases 1 through 4)
-- extra features
Conclusion
Future work
Appendix A: Schematic Drawings
Appendix B: VHDL of Control Unit
Appendix C1: Program Source for Instruction Set Testing
Appendix C2: Simulation Waveforms from Instruction Testing
Objective
The purpose of the project is to design and implement a fully functional processor with an assortment of common processor instructions. The design is completed and simulated using the Altera Max+PlusII CAD software system. The processor (also called Mini SRC) is implemented on a FLEX 10k FPGA chip. The instruction set is of the RISC type and is scaled down to use two general-purpose registers named R0 and R1. A further objective is to create a 3-bus architecture, with each bus 8 bits wide, in order to reduce the number of cycles per instruction (CPI) and improve system performance. Functionality beyond the common ALU and memory access instructions includes subroutine calls, a system stack, and interrupt support. For a detailed description of the instruction set of the Mini SRC, see the opcode specifications included in this report.
Project Specification
The Project Specification was given by the Department of Electrical and Computer Engineering, Queen's University: ELEC-374 Digital Systems Engineering. An abridged version is reproduced below.
Designing a Simple Processor (Mini SRC):
-- design, simulate, implement, and verify a small processor.
-- design is to be made using the Altera Max+PlusII CAD software system.
-- processor should be implemented on the FLEX 10k FPGA chip.
Properties of the Design:
-- 8-bit machine
-- two general-purpose registers named R0 and R1
-- 8-bit data paths
-- minimum goal is a 1-bus architecture
-- capable of addressing up to 256 bytes of memory
-- all instructions are 8 bits long
-- Arithmetic and Logic Unit (ALU) that performs 5 operations: Add, Subtract, Increment by 1, Shift Right 1 bit, and Logical AND
-- support 12 instructions: Load, Store, Load Immediate/extended, Store extended, Add, Subtract, Branch, Shift, AND, No-operation, and Stop.
-- instructions encoded into a 4-bit field at the higher-order end of an instruction
Details of the instruction format for each instruction were given in the original specification. Because a detailed description of each instruction and opcode is given in the design specification, it is omitted here.
Processor State:
-- PC<7..0>: 8-bit register named Program Counter (PC)
-- IR<7..0>: 8-bit register named Instruction Register (IR)
-- R[0..1]<7..0>: two 8-bit general purpose registers named R[0] and R[1]
-- Run: 1-bit run/halt indicator
-- Start: Start signal
-- Reset: Reset signal
Memory State:
-- M[0..255]<7..0>: 256 1-byte words of memory
Additional Features:
-- multi-bus architecture
-- new instructions (NEG, OR, INPUT, OUTPUT, CALL, RETURN, etc.)
-- stack
-- support for interrupt handling
Phases of Design:
-- 1: Design and test the Data Path and the ALU using Functional Simulation.
-- 2: Add logic for selecting R0 and R1 from the ra, rb, rc fields in the instructions and add logic for evaluating whether or not to follow a branch. Also implement the memory interface design. Test using Functional Simulation.
-- 3: Design and test the Control Unit using Functional Simulation.
-- 4: Integrate the Data Path and Control Unit into a single design and test using both Functional Simulation and Timing Simulation for an implementation in a FLEX 10k FPGA chip.
Design Specification
Instruction Set and Opcode Specification
Note 1: In many opcodes, bit 0 is used to distinguish between two actions, such as Add and Subtract. The notation x/y is used for this bit: 1 selects the first action and 0 selects the second. For example, a/s means that 1 indicates Add and 0 indicates Subtract.
Note 2: Bits 7 to 4 are the opcodes proper. This value is also shown at the top right of each opcode description in both binary and hex.
Note 3: Fields: ra, rb, rc each indicate a register. 0 indicates R0, 1 indicates R1.
Note 4: -- indicates that the field is unused.
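The field conventions in Notes 1 to 3 can be sketched in software as a small decoder. This is only an illustration, not part of the hardware design: the function name `decode` and the dictionary keys are assumptions, and the opcode names are taken from the instruction headings in this section.

```python
# Opcode names follow the instruction headings in this specification.
OPCODES = {
    0x0: "Load", 0x1: "Store", 0x2: "Load Immediate / Load Extended",
    0x3: "Store Extended", 0x4: "Add / Subtract",
    0x5: "Enable IRQ / Disable IRQ", 0x6: "And / Or", 0x7: "Branch",
    0x8: "Shift Right / Shift Left", 0x9: "No Operation", 0xA: "Stop",
    0xB: "Return From ISR", 0xC: "Negate", 0xD: "Increment / Decrement",
    0xE: "Call Sub / Return From Sub", 0xF: "Push / Pull",
}


def decode(byte):
    """Split an 8-bit instruction into the fields described in Notes 1 to 3.

    The key names are illustrative; not every field is meaningful for
    every opcode (unused fields are marked -- in the tables).
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("instruction must fit in 8 bits")
    return {
        "opcode": byte >> 4,        # bits 7..4, the opcode proper
        "name": OPCODES[byte >> 4],
        "ra": (byte >> 3) & 1,      # bit 3: 0 = R0, 1 = R1
        "rb": (byte >> 2) & 1,      # bit 2
        "rc": (byte >> 1) & 1,      # bit 1
        "select": byte & 1,         # bit 0: picks one of two actions
    }
```

For example, `decode(0x43)` yields opcode 4 (Add / Subtract) with ra=0, rb=0, rc=1 and select=1, i.e. an Add of R0 and R1 into R0.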
Load (0000b) (0h)
| 7 | 6 | 5 | 4 | 3 | 2 | 1..0 |
| 0 | 0 | 0 | 0 | ra | rb | c2 |
If rb=1 then // Indexed/Indirect
    R[ra] ← M[R1+c2]
Else // rb=0: Direct
    R[ra] ← M[c2]
End If
c2 is a sign-extended 2's complement number, i.e. it can have the values +1, 0, -1, -2.
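The sign extension of c2 and the two addressing modes of Load can be sketched as follows. This is a minimal illustration, assuming that the sign-extended offset wraps within the 8-bit address space (so a direct access with c2 = -1 addresses FF); the function names are hypothetical.

```python
def sign_extend_c2(c2):
    """Interpret a 2-bit field as a two's-complement value in -2..+1."""
    c2 &= 0b11
    return c2 - 4 if c2 & 0b10 else c2


def effective_address(ir, r1):
    """Address used by Load: direct (rb=0) or indexed through R1 (rb=1).

    Assumption: the sum wraps modulo 256, matching an 8-bit adder.
    """
    offset = sign_extend_c2(ir & 0b11)   # bits 1..0 of the instruction
    base = r1 if (ir >> 2) & 1 else 0    # bit 2 is rb
    return (base + offset) & 0xFF
```

With R1 = 50 (hex), an indexed Load with c2 = 11b reads from address 4F, while the same c2 in direct mode reads from address FF.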
Store (0001b) (1h)
| 7 | 6 | 5 | 4 | 3 | 2 | 1..0 |
| 0 | 0 | 0 | 1 | ra | rb | c2 |
If rb=1 then // Indexed/Indirect
    M[R1+c2] ← R[ra]
Else // rb=0: Direct
    M[c2] ← R[ra]
End If
c2 is a sign-extended 2's complement number, i.e. it can have the values +1, 0, -1, -2.
Load Immediate / Load Extended (0010b) (2h)
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 0 | 0 | 1 | 0 | ra | -- | -- | i/x |
If i/x=1 then // Immediate
    R[ra] ← M[PC+1]
Else // i/x=0: Extended
    R[ra] ← M[M[PC+1]]
End If
Store Extended (0011b) (3h)
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 0 | 0 | 1 | 1 | ra | -- | -- | -- |
M[M[PC+1]] ← R[ra]
Add / Subtract (0100b) (4h)
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 0 | 1 | 0 | 0 | ra | rb | rc | a/s |
If a/s=1 then // Add
    R[ra] ← R[rb] + R[rc]
Else // a/s=0: Subtract
    R[ra] ← R[rb] - R[rc]
End If
Addition and subtraction are in 2's complement.
Enable IRQ / Disable IRQ (0101b) (5h)
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 0 | 1 | 0 | 1 | ra | -- | -- | e/d |
If e/d=1 then // Enable IRQ and set value of Period Register
    // IRQ is enabled
    Period ← R[ra]
Else // e/d=0: Disable IRQ
    // IRQ is disabled
End If
And / Or (0110b) (6h)
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 0 | 1 | 1 | 0 | ra | rb | rc | a/o |
If a/o=1 then // Logical Bitwise And
    R[ra] ← R[rb] and R[rc]
Else // a/o=0: Logical Bitwise Or
    R[ra] ← R[rb] or R[rc]
End If
Branch (0111b) (7h)
| 7 | 6 | 5 | 4 | 3 | 2 | 1..0 |
| 0 | 1 | 1 | 1 | ra | rb | c |
PC ← R[ra] if R[rb] meets the condition c
| c | Name | Action |
| 00 | Always | Branch always. |
| 01 | Zero | Branch if the contents of R[rb] are zero. |
| 10 | Nonzero | Branch if the contents of R[rb] are nonzero. |
| 11 | Minus | Branch if the contents of R[rb] are negative. |
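The branch condition table can be expressed as a short predicate. This is a sketch rather than the schematic's gate-level logic; it assumes that "negative" means bit 7 of R[rb] is set, as in 2's complement, and the function name is hypothetical.

```python
def branch_taken(cond, value):
    """Evaluate the 2-bit branch condition c against the 8-bit value of R[rb]."""
    value &= 0xFF
    if cond == 0b00:
        return True                  # Always
    if cond == 0b01:
        return value == 0            # Zero
    if cond == 0b10:
        return value != 0            # Nonzero
    if cond == 0b11:
        return bool(value & 0x80)    # Minus: sign bit set (assumed meaning)
    raise ValueError("condition is a 2-bit field")
```

When the predicate is true the processor loads PC from R[ra]; otherwise execution continues with the next instruction.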
Shift Right / Shift Left (1000b) (8h)
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 1 | 0 | 0 | 0 | ra | rb | c | r/l |
If r/l=1 then // Logical Shift Right by 1 bit
    R[ra] ← 0 # R[rb]<7..1>
Else // r/l=0: Logical Shift Left by 1 bit
    R[ra] ← R[rb]<6..0> # 0
End If
# means concatenate
R[rb]<x..y> means bits x to y of R[rb]
No Operation (1001b) (9h)
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 1 | 0 | 0 | 1 | -- | -- | -- | -- |
Waste one cycle.
Stop (1010b) (Ah)
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 1 | 0 | 1 | 0 | -- | -- | -- | -- |
Stop processing instructions.
Return From ISR (1011b) (Bh)
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 1 | 0 | 1 | 1 | -- | -- | -- | -- |
R1 ← M[SP + 1]
R0 ← M[SP + 2]
PC ← M[SP + 3]
When an IRQ is received, the system automatically stacks PC, R0 and R1 and disables IRQ. Hence, when the Return from ISR opcode is read, these are unstacked in reverse order.
Negate (1100b) (Ch)
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 1 | 1 | 0 | 0 | ra | rb | -- | -- |
R[ra] ← not(R[rb])
Where not(x) is a bitwise negation of x.
Increment / Decrement (1101b) (Dh)
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 1 | 1 | 0 | 1 | ra | rb | -- | i/d |
If i/d=1 then // Unsigned Increment
    R[ra] ← R[rb] + 1
Else // i/d=0: Unsigned Decrement
    R[ra] ← R[rb] - 1
End If
Numbers are considered to be unsigned; hence FF - 01 = FE.
Call Sub / Return From Sub (1110b) (Eh)
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 1 | 1 | 1 | 0 | ra | -- | -- | c/r |
If c/r=1 then // Call Subroutine
    PC ← R[ra]
Else // c/r=0: Return From Subroutine
    PC ← M[SP + 1]
End If
When calling a subroutine, the system automatically stacks only the PC.
SP points to the next empty cell in memory.
The stack grows downwards in memory, i.e. FF, then FE, then FD, etc.
SP is automatically initialized to FF.
Push / Pull (1111b) (Fh)
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 1 | 1 | 1 | 1 | ra | -- | -- | h/l |
If h/l=1 then // Push onto Stack
    M[SP] ← R[ra]
Else // h/l=0: Pull from Stack
    R[ra] ← M[SP + 1]
End If
When calling a subroutine, the system automatically stacks only the PC.
SP points to the next empty cell in memory.
The stack grows downwards in memory, i.e. FF, then FE, then FD, etc.
SP is automatically initialized to FF.
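The stack conventions above (SP points to the next empty cell, the stack grows downwards, SP starts at FF) can be modelled in a few lines. This is a behavioural sketch only; the class name `Stack` and its method names are illustrative and do not correspond to signals in the design.

```python
class Stack:
    """Descending stack in a 256-byte memory; SP names the next empty cell."""

    def __init__(self):
        self.mem = [0] * 256
        self.sp = 0xFF                      # reset value given in the spec

    def push(self, value):
        self.mem[self.sp] = value & 0xFF    # M[SP] <- value
        self.sp = (self.sp - 1) & 0xFF      # stack grows downwards

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[self.sp]            # value <- M[SP + 1]
```

Under this model, servicing an interrupt corresponds to pushing PC, then R0, then R1, and Return From ISR corresponds to three pulls that restore R1, R0 and PC in that order, which matches the M[SP + 1], M[SP + 2], M[SP + 3] offsets in the Return From ISR description.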
Data Path (The overall interconnection of components) See Appendix A, page 1.
The data path connects the main components of the system. Also in this schematic, the memory interface is specified. The design uses a three bus system, labeled A, B and C. The register set takes data input from all three of these busses, and nowhere else. Values from Registers and Ram can be put only on Busses A and B. The schematic for Busses A and B is given on page 2 of Appendix A. Bus C is the output of the ALU. System Registers are synchronous to the falling edge of the clock, but the Control Unit is synchronous to the rising edge of the clock. Hence, control signals are generated half a cycle before registers clock in new values. Busses A and B and the ALU are asynchronous, so they respond to control signals immediately. As a result of this setup, values can be passed between registers and/or through the ALU in one clock cycle, without the necessity of latching values.
An example of a one-cycle sequence is addition. At the rising edge of the cycle, control signals are generated telling Bus A to carry the value of R0, telling Bus B to carry the value of R1, telling the ALU to add its two inputs, and telling R0 to input the value from Bus C. The asynchronous Busses and ALU respond immediately and, in a very short time, the result of the addition propagates through to Bus C. On the falling edge of the cycle, R0 clocks in the value on Bus C, which is the proper result.
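The one-cycle addition can be illustrated with a small behavioural model of the two clock edges. This is a sketch under simplifying assumptions: propagation delay is ignored, and the control-signal names in the dictionary are hypothetical rather than taken from the schematic.

```python
def one_cycle_add(regs):
    """Model one clock cycle that performs R0 <- R0 + R1 on a 3-bus path."""
    # Rising edge: the control unit asserts its signals for this cycle.
    control = {"bus_a": "R0", "bus_b": "R1", "alu": "add", "c_dest": "R0"}

    # Between the edges: Busses A and B and the ALU are combinational,
    # so the result settles on Bus C without any intermediate latch.
    bus_a = regs[control["bus_a"]]
    bus_b = regs[control["bus_b"]]
    bus_c = (bus_a + bus_b) & 0xFF          # 8-bit adder, carry discarded

    # Falling edge: the destination register clocks in Bus C.
    updated = dict(regs)
    updated[control["c_dest"]] = bus_c
    return updated
```

Calling `one_cycle_add({"R0": 0x67, "R1": 0x44})` leaves R1 unchanged and sets R0 to AB (hex), all within a single modelled cycle.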
The RAM is synchronous to the rising edge of the clock. It communicates with the system through registers MD and MA and Bus A. By default, the RAM is set to read (i.e. memWrite=0); therefore, on every clock cycle, it outputs the value at the address specified in MA.
A memory read can be accomplished by generating control signals at the rising edge of clock 1 such that MA latches the address at the falling edge of clock 1. The RAM will then output the requested data at the rising edge of clock 2, and the data can be latched into the desired register on the falling edge of clock 2. Therefore a read takes 2 cycles.
A memory write can be accomplished by generating control signals at the rising edge of clock 1 such that MA latches the address and MD latches the data to be written at the falling edge of clock 1. The RAM will then write the specified data to the specified address at the rising edge of clock 2 (i.e. the end of clock 1). Therefore a write takes 1 cycle.
Special logic was necessary for the memWrite control signal. Because control signals change on the rising edge, memWrite would be de-asserted just as the RAM (also rising-edge triggered) was supposed to read it. It was consequently necessary to add a flip-flop to delay this signal by half a clock cycle. This delay flip-flop is labeled on the schematic.
For Interrupt Service Routine support, the user must be able to specify the address of their ISR. This system requires them to write the address of their ISR to memory location E0. The system consequently needs to be able to read specifically from that address; hence the constant E0 and MA are multiplexed to the address input of the RAM.
Control Unit (Generation of all system control signals) See Appendix B.
An external decoder translates the four-bit opcode from IR[7..4] into 16 signals, one for each opcode. The inputs to the Control Unit are:
- the above 16 signals
- IR[0] and IR[2] used for distinguishing between actions within a given op code. For example: Add and Sub have the same op code and IR[0] distinguishes them.
- CON which is the result of decision logic for branching.
- IRQ which is the interrupt service request signal.
- Clock which is the system clock.
- Reset which resets the system.
The control unit uses these inputs to generate all system control signals. It is written as a single process containing one conditional: if the reset signal is received, the unit goes to the reset state; otherwise, on each rising clock edge, it first clears all control signals, then sets the control signals for its current state and determines its next state.
The Reset State: (Rset)
In this state, the system zeros all registers, except for SP which it sets to FF. It also initializes run, and disables IRQ. It sets the next state to T0 which is the beginning of Opcode Fetch. Hence, all programs must start at address 00 because that address is expected to hold the first opcode after reset.
Opcode Fetch: (states T0 and T1)
Every operation except the servicing of an ISR begins with the opcode fetch carried out in states T0 and T1. It is only in state T0 that the system checks the IRQ signal. If there is an Interrupt Service Request then the system stacks the system state and services the interrupt. Otherwise it fetches the next opcode and carries out the instruction sequence to completion.
State T2 is the most complex because it is in this state that the control unit considers the opcode held in IR. Most instructions are completed in this state and set the next state to T0. Some instructions require more clock cycles and, consequently, have extra states.
Format of a State:
In any state, the control unit first sets the appropriate signals, then sets the next state. It may make a decision based on the contents of the IR as to which signals to set or which state should be next, but this is often inherent to the state, in which case no decisions are necessary.
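The state sequencing described above can be sketched as a next-state function. This is not the VHDL of Appendix B: only Rset, T0, T1 and T2 are named in this report, so the later states (T3 onwards), the ISR state names and the `done` flag are assumptions made for illustration.

```python
def next_state(state, irq=False, done=False):
    """Return the state that follows `state`.

    A state is a pair such as ("Rset", 0), ("T", 2) or ("ISR", 1).
    The ("ISR", n) names and the `done` flag are hypothetical.
    """
    kind, step = state
    if kind == "Rset":
        return ("T", 0)                          # reset leads to opcode fetch
    if kind == "T" and step == 0:
        # The IRQ signal is checked only in T0.
        return ("ISR", 1) if irq else ("T", 1)
    if kind == "T" and step == 1:
        return ("T", 2)                          # fetch done, decode next
    if done:
        return ("T", 0)                          # instruction complete
    return (kind, step + 1)                      # extra state for long sequences


def trace(cycles, irq=False):
    """List the states visited by one sequence that needs `cycles` states."""
    state, visited = ("T", 0), []
    for i in range(cycles):
        visited.append(state)
        state = next_state(state, irq=irq, done=(i == cycles - 1))
    return visited
```

A three-cycle instruction such as Add visits T0, T1 and T2 and then returns to T0; a longer instruction simply passes through additional states before returning.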
Grx Logic (Generation of control signals for R0 & R1) See Appendix A, page 3.
The Grxlogic module for the mini SRC is responsible for the generation of the control signals for R0 and R1 (see Appendix A, page 3). More specifically, this module is responsible for interfacing the Instruction Register fields ra, rb and rc with the actual registers. This is done through a grouping of sum-of-products logic gates. The inputs to the Grxlogic module are IR fields [3..1] and the Grx signals, which are generated by the Control Unit. The outputs of the Grxlogic module are the control signals that activate the desired register.
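The sum-of-products selection can be sketched as below. The signal names `gra`, `grb` and `grc` (one Grx signal per instruction field) are assumptions for illustration; the actual signal names and gating are those in the Appendix A schematic.

```python
def grx_select(ir, gra=False, grb=False, grc=False):
    """Return (select_r0, select_r1) for the given IR and Grx signals.

    Each Grx signal asks for the register named by one IR field:
    gra -> bit 3 (ra), grb -> bit 2 (rb), grc -> bit 1 (rc).
    """
    ra = (ir >> 3) & 1
    rb = (ir >> 2) & 1
    rc = (ir >> 1) & 1
    # Sum of products: a register is selected when any asserted Grx
    # signal is paired with a field value naming that register.
    select_r1 = (gra and ra == 1) or (grb and rb == 1) or (grc and rc == 1)
    select_r0 = (gra and ra == 0) or (grb and rb == 0) or (grc and rc == 0)
    return bool(select_r0), bool(select_r1)
```

For the instruction byte 4B (hex), where ra=1, rb=0 and rc=1, asserting `gra` selects R1, while asserting `grb` selects R0.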
Branch Logic (Decision logic for branches) See Appendix A, page 4.
This module is responsible for the interface between the IR branch fields and the control unit. This module interprets the appropriate field within the instruction register and determines if the branch condition is met. If the branch condition is met a CON signal is asserted and is processed by the control unit. The inputs to this module are IR fields [1..0] , BUSA, clock, CONclear, and CONin. The output is simply the CON signal, which is sent to the control unit.
Register Set (All system registers) See Appendix A, page 5.
The register set is the collection of registers that are available in the system. For simplicity they were grouped together within a module. Some of the registers within the module can accept input from multiple buses (reg2input); the interface logic for this can be seen in Appendix A (page 6). The inputs to the Register Set module are the control signals that enable the specific registers, the three bus lines (A, B, C), clock, and clear. The outputs of the Register Set module are the outputs of the individual registers within the module.
ALU (Arithmetic and Logic Unit) See Appendix A, page 7.
The ALU can perform 10 functions:
Most of the implementation is straightforward and apparent from the schematic; however, the interface to the adder/subtractor is somewhat complicated. Each of its inputs is multiplexed with a two-input multiplexer. This allows us to consider four useful combinations:
IRQ (Interrupt Support) See Appendix A, page 8.
Description:
The mini SRC processor has a timer interrupt system. The interrupt signal is generated by a free-running 8-bit counter that is clocked by the processor clock. Configuring the system for interrupt support is done through the use of reserved opcodes that have been hardwired into the system. A detailed description of the interrupt system is as follows:
Implementation (see IRQ.GDF spec):
As the mini SRC clock runs, a free-running 8-bit counter is incremented by 1 on each clock cycle. When the counter reaches FF it simply rolls over and begins counting again at 00. On each clock cycle the value in the free-running counter is compared to a user-specified 8-bit value stored in a register. The comparison is done with a simple 8-bit compare circuit available in the MEGA_LPM package. If the comparison yields a match and the user has enabled interrupts, the interrupt circuitry generates an IRQ signal and the control unit begins to process the interrupt service routine (ISR). By design, the address of the ISR is stored at location E0 in memory. This memory address is reserved for the ISR jump vector, and the user should ensure that E0 contains the desired starting address of the ISR. When the system breaks to process an ISR, it stores the current state of the system (i.e. the register values and the program counter) onto the system stack and begins processing the ISR at the user-specified address. While the system is processing the ISR, it ignores interrupts so that processing can complete. It is the user's responsibility to ensure that the Return from ISR opcode is placed at the end of the ISR. Once the control unit detects the Return from ISR opcode, it generates the signals that pull the system state information from the stack back into the appropriate registers.
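The counter-and-compare behaviour can be sketched as a short simulation. It assumes the counter starts at 00 and is incremented before each comparison, so that a period of 3 produces a match on the 3rd clock cycle; it also ignores the masking of interrupts while an ISR is running. The function name is hypothetical.

```python
def irq_cycles(period, enabled, n_cycles):
    """Return the 1-based clock cycles at which the comparator raises IRQ."""
    counter = 0
    hits = []
    for cycle in range(1, n_cycles + 1):
        counter = (counter + 1) & 0xFF      # free-running, rolls over FF -> 00
        if enabled and counter == period:
            hits.append(cycle)              # match with interrupts enabled
    return hits
```

With a period of 22 (hex) the first match occurs on cycle 34, and because the counter rolls over, the next match follows 256 cycles later.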
User Guide:
The following is a guide line for interrupt support:
Steps:
1 - Store the desired ISR starting address (jump vector) at address E0 in memory. Remember that E0 is a RESERVED address and should contain the starting address of the ISR if interrupt functionality is desired.
2 - Enable interrupts using the Enable IRQ opcode (see opcode spec). The opcode requires the user to specify the register (R0 or R1) that holds the interrupt period. The value in the specified register is the clock count at which an interrupt is generated.
3 - At the end of the interrupt service routine the user should use the Return from ISR opcode (see opcode spec) to complete the process.
The following page contains a sample program and Timing Diagram for the Interrupt process
The following program is a guide to using interrupts with the Mini SRC:
The interrupt is set to fire after 34 clock cycles (22 hex). It was chosen that this interrupt would happen during a sequence of NOP instructions. The user specifies the ISR starting address to be D0. Finally, within the ISR, registers R0 and R1 are both loaded with 99 (hex) to demonstrate that the ISR is being serviced.
DEPTH = 256; % Memory depth and width are required %
WIDTH = 8; % Enter a decimal number %
ADDRESS_RADIX = HEX;
DATA_RADIX = HEX;
CONTENT
BEGIN
01 : 21; % ldi R0 <= 22 %
02 : 22; % Fire interrupt after 22 (hex) clock cycles %
03 : 29; % ldi R1 <= D0 %
04 : D0; % ISR address D0 %
05 : 38; % stx E0 <= R1 (D0) %
06 : E0; % Write D0 to reserved address E0 %
07 : 51; % enable interrupts, get interrupt period (22) from R0 %
08 : 28; % ldx R1 %
09 : 03; % addr %
0A : 90; % NOP %
0B : 90; % NOP <--- Expect interrupt to fire in here %
0C : 90; % NOP %
0D : 90; % NOP %
0E : 42; % Add: R0 = R0 + R1 = 43 %
% THE ISR %
D0 : 21; % ldi R0 <= 99 %
D1 : 99;
D2 : 29; % ldi R1 <= 99 %
D3 : 99;
D4 : B0; % return from interrupt %
END;
The following is a simulation of the above program. Note that the IRQ signal is a pulse that occurs after 34 (22 hex) clock cycles, somewhere within the NOP instruction sequence (NOP is opcode 90).

Results and Comparison
| Instruction | Number of Cycles |
| Service Interrupt Request | 7 |
| load | 4 |
| store | 4 |
| ldi | 4 |
| ldx | 5 |
| stx | 4 |
| add | 3 |
| sub | 3 |
| enable irq | 3 |
| disable irq | 4 |
| and | 3 |
| or | 3 |
| branch | 4 |
| Shift R | 3 |
| Shift L | 3 |
| NOP | 3 |
| STOP | 3 |
| Return from ISR | 8 |
| Neg | 3 |
| INC | 3 |
| DEC | 3 |
| Call Subroutine | 4 |
| Return from Subroutine | 4 |
| Push | 4 |
| Pull | 4 |
| Average Cycles per Instruction (CPI) | 96 / 25 = 3.84 |
-- maximum frequency of operation in the simulator: 15 MHz
-- maximum frequency of operation on the chip: 15 MHz
-- average Cycles Per Instruction (CPI) for the program: 3.84
-- MIPS rating (for the 15 MHz physical test run): 15 MHz / 3.84 CPI = 3.9 MIPS
-- percentage of chip area: memory utilized 16 %, LCs utilized 40 %
-- number of LCs: 464
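The CPI and MIPS figures can be reproduced from the cycle counts in the table with a few lines of arithmetic. The dictionary keys below are shortened labels chosen for this sketch; the average is an unweighted mean over the 25 table entries, as in the report, not an average weighted by how often each instruction runs in a real program.

```python
# Cycle counts copied from the results table (labels shortened).
CYCLES = {
    "service irq": 7, "load": 4, "store": 4, "ldi": 4, "ldx": 5, "stx": 4,
    "add": 3, "sub": 3, "enable irq": 3, "disable irq": 4, "and": 3, "or": 3,
    "branch": 4, "shift r": 3, "shift l": 3, "nop": 3, "stop": 3,
    "return from isr": 8, "neg": 3, "inc": 3, "dec": 3,
    "call": 4, "return": 4, "push": 4, "pull": 4,
}
CLOCK_MHZ = 15.0                     # maximum tested clock frequency

total = sum(CYCLES.values())         # total cycles over all table entries
cpi = total / len(CYCLES)            # unweighted average cycles per instruction
mips = CLOCK_MHZ / cpi               # millions of instructions per second

print(f"{total} / {len(CYCLES)} = {cpi:.2f} CPI, {mips:.1f} MIPS")
# prints "96 / 25 = 3.84 CPI, 3.9 MIPS"
```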
Functional Simulation Results
All instructions (except interrupt-related instructions) were rigorously tested with a single program. The program source is given in Appendix C1 and the resulting simulation waveform is given in Appendix C2. Appendix C2 is heavily documented, and each instruction setup and test is labeled and explained. The following is an abstracted summary of those results.
Load Immediate and Load Extended are both used extensively throughout the program and are easily verified as accurate.
Addition and Subtraction consider the operands to be in 2's complement. Addition of two positive numbers was successful. Addition of a positive and a negative number was successful. Subtraction of two positive numbers yielding a negative result was successful. Also, near the end of the program, Subtraction of two negative numbers was found to be successful.
Logical bit-wise And, logical bit-wise Or, Shift Right one bit position, Shift Left one bit position, NOP and logical bit-wise Negate were all tested and found to be successful.
Increment and Decrement consider the operands to be unsigned. Both were successfully tested on high ($CC) and low ($15) numbers.
Regarding stack support and Subroutines:
Instructions Push and Pull were successfully tested. In the waveform it can be observed that the SP properly decrements and increments and that the proper values are written to and read from the proper (SP) memory locations.
Subroutine Call and Return were successfully tested. In the waveform it can be observed that the PC is properly pushed onto the stack and program flow does skip to the subroutine's address. The Return instruction properly pulls the PC from the stack and resumes program flow immediately after the original Call command.
All four Branch conditions were successfully tested. For each, a test value was loaded into R1, and when that value met the condition specified in the Branch instruction, program flow properly skipped to the branch's target address.
Store Direct and Load Direct were successfully tested. Values were stored to the four possible addresses and were subsequently successfully loaded from the four possible addresses.
Store Indexed/Indirect and Load Indexed/Indirect were successfully tested. An address was loaded into R1 and values were stored to the four possible offsets from that address. The values were subsequently successfully loaded from the same four addresses, thereby proving the validity of the instruction sequences.
At the end of the program, the Stop instruction successfully halts program flow.
The IRQ signal-generating module was tested and simulated both independently of the data path and integrated into the data path. The following simulation result is of the independent test; it can be seen on the next page of this report (see the IRQ design spec for the integrated simulation result).
As shown in the simulation results, Bus A had the value 03 placed onto it and this value was latched into the Period register. This value is the interrupt period, and thus an interrupt is generated on the 3rd clock cycle. After the interrupt is asserted, a clear IRQ signal is asserted (normally by the Control Unit), causing the interrupt signal to be set low again. All functional simulation signals behave as expected and confirm proper operation of the interrupt module.
The objective of Phase 1 was to design and implement the Data path for the mini SRC. This data path consisted of a 1-BUS 8 bit wide system. After completion of the 1-BUS system a 3-BUS implementation would be straightforward. In anticipation of a switch to a 3-BUS system the registers were all placed into a single Design schematic (see Appendix A) that would lend itself to modularity. An abstract view of the data path is as follows:

Mini SRC data path (figure from ELEC 374 course material)
The Bus was implemented using a simple 8-input multiplexer with 3 select lines. Some of the components along the data path were not implemented within the first phase, and thus the signals needed were manually generated during simulation. The final product at the end of the first phase was a functional data path along with limited ALU functionality. All opcode decoding was implemented in a temporary fashion that would suffice for the testing and simulation of the data path.
During Phase 2 much of the instruction decoding logic was added to the mini SRC. Outputs from the instruction register were passed through combinational logic so that opcodes could be decoded. Furthermore, the memory interface design was added to the data path to allow for memory accesses. The type of RAM used was synchronous RAM with 256 accessible memory locations, each containing 8 bits of data. Further modification was made to the ALU design in order to incorporate the instruction-decoding scheme employed. At the end of the phase, memory access instructions, ALU instructions and the Branch instruction were all tested and simulated for expected results.
Key Schematics for phase 2:

Memory Interface Design (figure from ELEC 374 course material)

Register Select Decoding Logic (figure from ELEC 374 course material)
The major task of Phase 3 was the addition of a control unit. The control unit would contain the cycle-by-cycle instruction sequence of all opcodes. Implementation of the control unit was done through a finite state machine that was written in VHDL. After the control unit was written in VHDL it was placed within the data path, tested and simulated. Furthermore, the entire processor was tested by writing a small program into RAM and verifying simulation results.
A general Schematic of the Control unit and decoding information:

Control Unit and Instruction Register interface logic (figure from ELEC 374 course material)
The major upgrades to the processor in Phase 4 were as follows: 3-bus implementation, system stack support, subroutine call support, and finally interrupt support. The move to a 3-bus architecture required a thorough reassessment of many system components. Major changes had to be applied to the control unit, since the additional buses meant that some cycles in certain instructions could be eliminated. Furthermore, every instruction sequence in the control unit had to be recoded to incorporate multi-bus support. Components such as Temp registers A and C were removed since they were no longer needed; operations could be performed in parallel on the additional buses. After all modifications and upgrades were complete, a full timing and functional simulation was performed, and the processor was then uploaded onto a FLEX EPF10K20 chip and tested from 1 to 15 MHz. A program that had been written into memory was checked by observing the values that appeared on the LEDs of the Altera evaluation board.
The program that was tested on the hardware is shown below:
ORG 00
ldi R0, $67 ; R0 = $67
ldx R1, $80 ; R1 = ($80) = $44
add R0, R0, R1 ; R0 = $AB
and R1, R0, R1 ; R1 = 00
ldx R1, $81 ; R1 = ($81) = 01
sub R0, R0, R1 ; R0 = $AB - 1 = $AA
shr R0, R0, 0 ; R0 = $55
stx R0, $80 ; ($80) = $55
bral R0, R1 ; PC = $55
ORG $55
loop nop
brnz R0, R1 ; branch to loop forever
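The register values noted in the program comments can be checked by replaying the arithmetic in software. This is only a sketch of the expected results, not a simulation of the processor: the initial memory contents at $80 and $81 are taken from the program comments, and the function name is hypothetical.

```python
def run_test_program():
    """Replay the hardware test program and return (R0, R1, PC, M[$80])."""
    mem = {0x80: 0x44, 0x81: 0x01}   # initial data, per the program comments
    r0 = 0x67                        # ldi R0, $67
    r1 = mem[0x80]                   # ldx R1, $80  -> $44
    r0 = (r0 + r1) & 0xFF            # add R0, R0, R1 -> $AB
    r1 = r0 & r1                     # and R1, R0, R1 -> $00
    r1 = mem[0x81]                   # ldx R1, $81  -> $01
    r0 = (r0 - r1) & 0xFF            # sub R0, R0, R1 -> $AA
    r0 = r0 >> 1                     # shr R0, R0    -> $55
    mem[0x80] = r0                   # stx R0, $80
    pc = r0                          # bral R0, R1: PC <- R0
    return r0, r1, pc, mem[0x80]
```

The function returns R0 = $55, R1 = $01, PC = $55 and M[$80] = $55. Since R1 is nonzero, the brnz at the loop label keeps branching back to $55, which is the expected steady state on the evaluation board.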
In addition to the functionality required by the official specification sheet, the features below were added. Note that even though load and store were not necessary (in light of Ldi, Ldx, Stx), they also have been implemented.
1- Load: loads from 1 of 4 specified addresses or from address + offset
2- Store: stores to 1 of 4 specified addresses or to address + offset
3- OR: bit-wise logical OR of specified values
4- ShiftL: shifts specified register value to the left
5- NEG: negates specified register value
6- INC / DEC: increments or decrements specified register value by 1
7- 3-Bus architecture
8- Timer Interrupt: user can enable/disable interrupts to occur at a given clock #
9- Stack (push, pull): user has access to a stack
10- Subroutines (Call Routine, Return): user has the ability to jump to a specified routine and return from it
Conclusion
After completing the mini SRC it was found that the most significant improvement in performance was gained by implementing a 3-bus architecture. This greatly reduced our CPI and resulted in performance increases. Unfortunately, the 3-bus architecture introduced some timing issues which may have reduced our maximum frequency of operation.
The mini SRC that was produced is capable of performing many common instructions that are typically required by processors used in small embedded systems. The addition of interrupt handling gives the mini SRC even more applications, such as timers and more complicated user feedback systems.
Future work
Examples of future work that can be undertaken for the mini SRC include
1- The addition of Input and Output ports with handshaking
2- Different Types of interrupts (Input compare / output compare)
3- 16 bit Operand support instead of 8 bit
4- Additional ALU operations such as Multiply / Divide
5- Condition Code register which sets bits after register contents have changed
Condition bits : Negative, Overflow, ZERO, Non-ZERO