ELEC 498 Final Report:
Design, Verification and Implementation of a Floating Point
Coprocessor.

Submitted By: Group #2
Hock Lee Ooi (424-9575)
Michael Wood (426-4464)
Fadi Yared (432-0222)
Submitted To:
Course Instructor: Dr. Peter J. McLane
Course Instructor: Pawel A. Dmochowski
Faculty Supervisor: Dr. Ahmad Afsahi
Teaching Assistant: Jim Reed
April 17, 2003
EXECUTIVE SUMMARY
A floating point
unit (FPU) was designed using MAX+plus II 10.2 software and implemented on the
Altera UP1 Education Board with a chip from the FLEX10K family. The FPU was designed to interface via an
eight bit parallel communications handshaking protocol with a microcontroller
from the MC68HC11 family, hosted on an evaluation board offered by
Technological Arts. This arrangement was
used to demonstrate and verify the design.
The FPU was designed to carry out addition,
subtraction, multiplication and division according to the IEEE-754 floating
point standard. The addition,
subtraction and multiplication designs were implemented following standard
circuit algorithms. Division was
implemented according to the non-restoring division algorithm, using 32 bit
registers for the mantissas.
The communications protocol was designed according to
handshaking as defined by the documentation for MC68HC11 family
microcontrollers. The design uses two
dedicated ports for send and receive.
The user interface designed in software for the
MC68HC11 was meant for debugging and demonstration purposes. The MC68HC11 connected to a terminal window
via the serial port of a PC, and accepted commands of the form 8E45F985+85A38C5E,
where each character was a hex value representing four bits of the 32 bit
operand encoded according to the IEEE-754 floating point standard. It then returned the result to the terminal
window in the same format.
This project was chiefly important as an educational
tool. All designs were implemented on
platforms used in Queen’s ECE curriculum; therefore, Queen’s ECE students have
free access to the same environments.
Future development may add new functions to the FPU such as integration
or convolution. Finally, members of
the Queen’s community may use this design in some of their applications.
TABLE OF CONTENTS
2.2.1 Communications Specifications
3 DESIGN AND PRODUCTION APPROACH
3.4.2 MC68HC11 Side Communications
4.3 Verification of Arithmetic Results
4.3.2 Rigorous Physical Testing
7.1 Appendix A: Test Numbers Analysis
7.2 Appendix B: Design Files and Simulation Results
This report was intended to summarize the work and
results of the ELEC 498 project (design, verification and implementation of a
floating-point coprocessor) undertaken by Hock Lee Ooi, Michael Wood and Fadi
Yared. The intended audience was Dr.
Peter J. McLane, Pawel A. Dmochowski, Dr. Ahmad Afsahi, Jim Reed and future
ELEC 498 project groups.
In many computing applications, the importance of floating point arithmetic is paramount. Applications such as integration, convolution, and many signal processing operations require floating point arithmetic precision. The goal of this project was to provide microcontrollers from the MC68HC11 family with floating-point arithmetic capabilities modeled after the IEEE-754 standard. This goal required the design, verification and implementation of a floating-point coprocessor that allows users to perform simple arithmetic functions (addition, subtraction, multiplication and division) with floating point precision.
The main objective of this project was subdivided into two distinct goals. The first was the development of a communications protocol between the FPU and the host microcontroller. The second was the design of the four arithmetic functions in the FPU.
The proposed
solution was to design the FPU using MAX+plus II 10.2 software and implement it
on the Altera UP1 Education Board with a chip from the FLEX10K family. The FPU was intended to interface via an
eight bit parallel communications handshaking protocol with a microcontroller
from the MC68HC11 family, hosted on an evaluation board offered by
Technological Arts. Such an arrangement
would be used to demonstrate and verify the design.
This project was
particularly relevant to the Queen’s ECE department because the hardware
involved was used in many of the undergraduate courses. Thesis adviser Dr. Afsahi was also instructor
of ELEC 374 which had a large project component using the Altera UP1 Education
Board. Dr. Afsahi previously supervised
a Sci-02 498 group who attempted this same project, but was not satisfied with
the results. Dr. Afsahi requested that
this group proceed independently of the previous design; consequently the
previous year’s project was used only in the initial phases of this project to
guide the organization of the components required. The major development from this was the
decision to carry out the project incrementally between the two goals
(communications and arithmetic) rather than one and then the other.
The project had both software and hardware components, and met all its goals. The result was a working model in a presentation/debugging format. A figure of the final model is given below.

The FPU was designed using MAX+plus II 10.2 software
and implemented on the Altera UP1 Education Board with a chip from the FLEX10K
family. The FPU was used by a
microcontroller from the MC68HC11 family, hosted on an evaluation board offered
by Technological Arts. This arrangement
was used to demonstrate and verify the design.
A communications protocol was successfully designed to interface the two
devices. The FPU successfully carried
out addition, subtraction, multiplication and division modeled after the
IEEE-754 floating point standard. The
user interface designed in software for the MC68HC11 was meant for debugging
and demonstration purposes. It connected
to a PC terminal window via a serial port, and accepted keyboard commands. Furthermore, a VB application was developed
in order to automate the verification of results generated by the prototype
model.
In order to
implement communication between the two devices, hardware in the FPU had to be
designed to mirror existing communications protocols in the MC68HC11. The 8-bit parallel ports of the MC68HC11 were
chosen as the communication medium and full handshaking was chosen as the
desired protocol. In order to design
floating point arithmetic functions in the FPU, the IEEE-754 standard had to be
followed. Details are given in the
following sections.
Input handshaking for the MC68HC11 is given below. Output handshaking is essentially the same in reverse. Since the clock of the FLEX10K is considerably faster than the clock of the MC68HC11, its responses can be considered instantaneous in most cases. In order to establish the interface between the devices, receiving-end designs were created in the FPU for both input and output handshaking.

Each floating point value is 32 bits; therefore, when the MC68HC11 requests a floating point operation from the FPU it must send nine packets and receive four packets. Since the 8-bit parallel ports were chosen as the communications medium, each 32-bit operand can be sent as four 8-bit packets. The ninth sending packet represents the op-code. As a debugging interface for the prototype device, the MC68HC11 can be connected to a PC terminal (via a COM port) and can receive keyboard commands.
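The packet framing described above can be sketched in a few lines of Python. This is only a host-side illustration, not the MC68HC11 driver: the function names are hypothetical, the op-code values are taken from Appendix A, and the packet order (op-code first, then each operand most significant byte first) is an assumption based on the PIPE IN listings in Appendix A.

```python
# Op-code bytes as listed under OPCODES in Appendix A.
OPCODES = {"+": 0x08, "-": 0x09, "*": 0x0A, "/": 0x0B}


def build_request(x: int, op: str, y: int) -> bytes:
    """Split one request into nine 8-bit packets.

    Assumed order (taken from the PIPE IN listings): op-code, then the
    four bytes of X and the four bytes of Y, most significant byte first.
    """
    return bytes([OPCODES[op]]) + x.to_bytes(4, "big") + y.to_bytes(4, "big")


def parse_reply(packets: bytes) -> int:
    """Reassemble the four result packets into one 32-bit word."""
    if len(packets) != 4:
        raise ValueError("a result is exactly four packets")
    return int.from_bytes(packets, "big")
```

For the addition test in Appendix A, `build_request(0x18360000, "+", 0x1C120000)` yields the nine bytes 08 18 36 00 00 1C 12 00 00.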
The IEEE-754 Standard is outlined in the figure below. The first bit represents the sign; the next 8 bits represent the exponent in excess-127 format; and the trailing 23 bits represent the bits of the mantissa following an implicit leading one.

The above figure represents a simplified view of the format. The following table gives some example numbers. It is important to note the special representation of certain values such as zero and infinity; furthermore, guard bits are generally used to prevent precision loss in rounding. Guard bits are extra hidden bits kept at the end of the mantissa. Representation of special numbers, guard bits and rounding are beyond the scope of this project; trapping overflow when doing floating point operations is also beyond the scope of this project.
Table: Example Numbers from the IEEE-754 Format
| Value | Sign | Exponent | Fraction |
| +1.101 x 2^5 | 0 | 1000 0100 | 101 0000 0000 0000 0000 0000 |
| -1.01011 x 2^-126 | 1 | 0000 0001 | 010 1100 0000 0000 0000 0000 |
| +1.0 x 2^127 | 0 | 1111 1110 | 000 0000 0000 0000 0000 0000 |
| +0 | 0 | 0000 0000 | 000 0000 0000 0000 0000 0000 |
| -0 | 1 | 0000 0000 | 000 0000 0000 0000 0000 0000 |
| +∞ | 0 | 1111 1111 | 000 0000 0000 0000 0000 0000 |
| +1.0 x 2^-128 | 0 | 0000 0000 | 010 0000 0000 0000 0000 0000 |
| +NaN | 0 | 1111 1111 | 011 0111 0000 0000 0000 0000 |
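As a rough illustration of the field layout, the following Python sketch splits a 32-bit word into its sign, exponent and fraction fields and rebuilds the value of a normal number. The function names are hypothetical; `struct` is Python's standard library module for binary packing. Special values (zero, infinity, NaN, denormals) are deliberately not handled, in keeping with the scope of the project.

```python
import struct


def decode_fields(word: int) -> tuple[int, int, int]:
    """Return (sign, excess-127 exponent, 23-bit fraction) of a 32-bit word."""
    sign = (word >> 31) & 0x1
    exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    return sign, exponent, fraction


def normal_value(sign: int, exponent: int, fraction: int) -> float:
    """Value of a normal number: (-1)^sign * 1.fraction * 2^(exponent - 127)."""
    mantissa = 1 + fraction / 2**23  # restore the implicit leading one
    return (-1) ** sign * mantissa * 2.0 ** (exponent - 127)


def encode_float(value: float) -> int:
    """Bit pattern Python itself produces for a single-precision value."""
    return struct.unpack(">I", struct.pack(">f", value))[0]
```

The first table row, +1.101 x 2^5, is 52.0 in decimal and corresponds to the word 0x42500000, which decodes to sign 0, exponent 1000 0100 and a fraction beginning 101.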
The following table lists the specific milestones for this project and the duration spent on each milestone. Note that all milestones were completed as specified.
| Milestone | Duration |
| Research Addition/Subtraction | October 1 – November 1 |
| Research Communications | October 1 – November 1 |
| Research Multiplication/Division | October 1 – November 1 |
| Design Addition/Subtraction | November 1 – January 15 |
| Design Altera Communications | November 1 – January 15 |
| Design Motorola Communications | November 1 – January 15 |
| Design Motorola User Interface | November 1 – January 15 |
| Verify Addition/Subtraction with software simulation | November 1 – January 15 |
| Verify Addition/Subtraction with hardware simulation | November 1 – January 15 |
| Verify Altera Communications with software simulation | November 1 – January 15 |
| Verify Altera Communications with hardware simulation | November 1 – January 15 |
| Verify Motorola Communications with software simulation | November 1 – January 15 |
| Verify Motorola Communications with hardware simulation | November 1 – January 15 |
| Design Multiplication/Division | January 15 – February 26 |
| Verify Multiplication/Division with software simulation | January 15 – February 26 |
| Verify Multiplication/Division with hardware simulation | January 15 – February 26 |
| Resolve Timing Between Boards | February 26 – March 12 |
| Full System Verification | March 12 – March 26 |
| Improve Components | March 26 – April 2 |
The following
table lists the division of labor. An x
represents partial responsibility.
| Milestone | Lee | Mike | Fadi |
| Research Addition/Subtraction | x | x | |
| Research Communications | | x | x |
| Research Multiplication/Division | x | x | |
| Design Addition/Subtraction | x | x | |
| Design Altera Communications | | x | x |
| Design Motorola Communications | | x | x |
| Design Motorola User Interface | | x | |
| Verify Addition/Subtraction with software simulation | x | x | |
| Verify Addition/Subtraction with hardware simulation | x | x | |
| Verify Altera Communications with software simulation | | x | x |
| Verify Altera Communications with hardware simulation | | x | x |
| Verify Motorola Communications with software simulation | | x | x |
| Verify Motorola Communications with hardware simulation | | x | x |
| Design Multiplication/Division | x | x | |
| Verify Multiplication/Division with software simulation | x | x | |
| Verify Multiplication/Division with hardware simulation | x | x | |
| Resolve Timing Between Boards | | x | x |
| Full System Verification | | x | x |
| Improve Components | x | x | x |
The debugging
user interface was written in assembly code for the MC68HC11. It was designed to allow the user to input two
32-bit operands and an op-code. The
operands must be represented by eight hexadecimal digits, each representing
four bits of the 32-bit value. The interface
was designed to only accept valid keyboard input (0-9,a-f,A-F) for the operands and (+, -, *, /) for the
op-codes. A separate function had to be
created to convert between ASCII and binary values. Values received from user input or as an FPU
result are stored in memory on the chip at specified addresses. The following figure is a screen capture of
the user interface after a user has keyed in their input and the result has
been received.
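The ASCII-to-binary conversion mentioned above was written in MC68HC11 assembly; the Python sketch below only mirrors the idea, one keystroke at a time, under the assumption that each accepted character contributes one 4-bit nibble, most significant first. All function names are hypothetical.

```python
def ascii_to_nibble(ch: str) -> int:
    """Convert one accepted key (0-9, a-f, A-F) to its 4-bit value."""
    code = ord(ch)
    if 0x30 <= code <= 0x39:        # '0'..'9'
        return code - 0x30
    if 0x41 <= code <= 0x46:        # 'A'..'F'
        return code - 0x41 + 10
    if 0x61 <= code <= 0x66:        # 'a'..'f'
        return code - 0x61 + 10
    raise ValueError(f"not a hexadecimal digit: {ch!r}")


def nibble_to_ascii(value: int) -> str:
    """Convert a 4-bit value back to an upper-case hexadecimal character."""
    return "0123456789ABCDEF"[value & 0xF]


def parse_operand(text: str) -> int:
    """Build a 32-bit operand from exactly eight hexadecimal characters."""
    if len(text) != 8:
        raise ValueError("an operand is exactly eight hexadecimal digits")
    word = 0
    for ch in text:
        word = (word << 4) | ascii_to_nibble(ch)  # shift in one nibble per key
    return word
```

For example, `parse_operand("8E45F985")` returns 0x8E45F985, and invalid keys such as `g` are rejected rather than silently converted.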

The goal of the communications requirement was the successful transfer of data packets between the MC68HC11 and the FPU. The 8-bit parallel ports of the MC68HC11 were chosen as the communication medium and level sensitive, full handshaking was chosen as the desired protocol. Although the MC68HC11 offers less rigid handshaking protocols such as pulse mode handshaking, employing the rigid level sensitive protocol allowed for easier debugging of communications problems. This was due to the fact that pulses were difficult to detect when debugging using the instruments at the team’s disposal. Testing requirements were also considered in the two port implementation of the communications design. Separate dedicated input and output ports simplified the testing by removing any ambiguities about whether the data being observed was input or output data. This also eliminated any concerns about damage to the devices due to competition on the communication lines.
Since the handshaking protocol
specifications were clearly outlined in the MC68HC11 reference manuals, the FPU
component of the design consisted of implementing
those clearly defined requirements.
The FPU communications design used a combination of register and control logic to accept the data that had been transferred from the MC68HC11 and send the appropriate acknowledgement signals. A finite state machine, coded in VHDL, was designed to keep track of the state of the FPU and respond to the MC68HC11 with appropriate handshaking signals. It was also made to generate appropriate control signals such that the values on the input lines were clocked into sign, exponent and mantissa registers for each operand. Similarly, when outputting to the MC68HC11, the finite state machine was made to generate appropriate control signals such that the correct bits were driven onto the output port.
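The receive side of such a state machine can be modelled in software. The sketch below is a generic level-sensitive, four-phase handshake written in Python, not the project's VHDL and not the exact MC68HC11 timing: the signal names `strobe` and `ack`, the two states, and the class name are all assumptions made for illustration.

```python
from enum import Enum, auto


class State(Enum):
    WAIT_STROBE = auto()   # waiting for the sender to present a packet
    WAIT_RELEASE = auto()  # packet latched, ack held until strobe drops


class Receiver:
    """Receiving end of a level-sensitive handshake (hypothetical model)."""

    def __init__(self, expected: int) -> None:
        self.state = State.WAIT_STROBE
        self.expected = expected      # number of packets in one transfer
        self.packets: list[int] = []
        self.ack = False

    def clock(self, strobe: bool, data: int) -> None:
        """Advance one receiver clock with the current line levels."""
        if self.state is State.WAIT_STROBE:
            if strobe and len(self.packets) < self.expected:
                self.packets.append(data & 0xFF)  # latch exactly once
                self.ack = True
                self.state = State.WAIT_RELEASE
        elif not strobe:
            self.ack = False                      # sender released the line
            self.state = State.WAIT_STROBE

    @property
    def done(self) -> bool:
        return len(self.packets) == self.expected
```

Because the receiver waits for the strobe level to drop before accepting another packet, a strobe that stays asserted for many fast receiver clocks still latches only one byte, which is the property that matters when the FLEX10K clock is much faster than the MC68HC11 clock.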
The MC68HC11
side of the communications component of the project involved writing a
communications driver in assembly code that was custom fit to the FPU
design. As previously discussed, the
driver must configure the MC68HC11 appropriately and perform the correct
handshaking algorithms. The final
working algorithm that was used in the project was modeled after the provided
specifications as listed in the Motorola reference manuals and was further
adjusted to the project’s particular requirements.
The
communication driver had two main component routines: send and receive. Both routines were programmed with polling
methodology rather than interrupts.
Initially, the MC68HC11 is driven by the send routine. When all the data has been sent and confirmed by
the FPU, the MC68HC11 then switches to the receive routine. At this point the driver waits for all
the required data. After all
transactions are complete, the debugging user interface routine displays the
result on the screen.
It is important to note that design of the MC68HC11 communications drivers was an iterative process that was conducted in conjunction with the FPU communications design. This was done to ensure a smooth convergence of the two communication components of the project (MC68HC11 & FPU).
Floating point addition can be carried out on any combination of positive and/or negative operands; therefore, once an addition module exists, subtraction can be achieved simply by changing the sign of the second operand and passing it through the addition module.
The addition/subtraction module was designed as an asynchronous device with inputs and outputs for operands and result grouped as separate sign, exponent and mantissa lines. After the design of the other components, the addition/subtraction module was made synchronous solely for the purpose of outputting a ‘done’ signal two clock cycles after its inputs were made valid. This was required only to standardize the addition/subtraction component with the other components which were designed as synchronous machines.
The addition/subtraction design uses the following standard sequential algorithm. The operation (addition/subtraction) to carry out on the mantissas is determined by the XNOR of the input signs. The mantissa of the operand with the lesser exponent is shifted right by the difference between the exponents. The operation is then carried out on the aligned mantissas. The resultant exponent is set equal to the larger input exponent minus the number of leading zeros of the resultant mantissa. The resultant mantissa is shifted left until there are no leading zeros, and this is used as the final mantissa. The resultant sign is set to positive or negative based on the magnitudes of the aligned input mantissas and the determined operation.
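The steps above can be sketched as a small software model. This is an illustration in Python, not the project's circuit: operands are assumed to be normal, nonzero numbers given as (sign, excess-127 exponent, 24-bit mantissa with the leading one explicit), results are truncated rather than rounded, and the handling of a carry out of the mantissa sum and of an exactly zero result are assumptions added here because the description does not cover them.

```python
def fp_add(x, y, subtract=False):
    """Add or subtract two (sign, exponent, 24-bit mantissa) operands."""
    xs, xe, xm = x
    ys, ye, ym = y
    if subtract:
        ys ^= 1                      # subtraction flips the second sign

    # Align: shift the mantissa with the lesser exponent to the right.
    if xe >= ye:
        exp, ym = xe, ym >> (xe - ye)
    else:
        exp, xm = ye, xm >> (ye - xe)

    if xs == ys:                     # XNOR of signs is 1: add magnitudes
        sign, mant = xs, xm + ym
    elif xm >= ym:                   # otherwise subtract the smaller one
        sign, mant = xs, xm - ym
    else:
        sign, mant = ys, ym - xm

    if mant == 0:
        return (0, 0, 0)             # assumption: zero is out of scope
    if mant >> 24:                   # assumption: carry out of bit 23
        mant >>= 1
        exp += 1

    # Normalize: remove leading zeros and lower the exponent to match.
    leading_zeros = 24 - mant.bit_length()
    return (sign, exp - leading_zeros, mant << leading_zeros)
```

With the Appendix A inputs X = (0, 0x30, 0xB60000) and Y = (0, 0x38, 0x920000), this model gives mantissa 92B600 for X + Y and sign 1 with mantissa 914A00 for X - Y, matching the listed values A and S.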
The design of
floating-point multiplication was more complicated because it must perform many
additions. A design implementing a
single addition of all the mantissa results was made, but the hardware would not
support such a device due to its complexity and the number of logic cells
required; therefore, the following sequential synchronous algorithm was
implemented instead. The resultant sign
is set equal to the XOR of the input signs.
The resultant exponent is set equal to the sum of the input exponents
minus 127. The design must then deal
with the mantissa. The first mantissa is
used as the multiplicand and the second is used as the multiplier. Each bit of the multiplier is checked in turn to see if it
is a one or a zero. If the bit is
a one, then the multiplicand is added to the product and then shifted one space
to the left. If the bit is a
zero, the multiplicand is simply shifted and a row of zeros is added to the product. This design requires twenty-four cycles
because there are twenty-four checks and additions in the mantissa and each
requires one clock cycle. The design
then gives the mantissa result, the exponent result and the sign bit. As in addition/subtraction, the resultant
mantissa and exponent are normalized based on leading zeros in the mantissa.
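A minimal software model of this shift-and-add scheme is given below in Python, with one loop iteration standing in for one clock cycle. It assumes normal, nonzero operands as (sign, excess-127 exponent, 24-bit explicit mantissa), keeps the full 48-bit product before truncating to 24 bits, and does not trap exponent overflow; the register widths in the actual design may differ.

```python
def fp_mul(x, y):
    """Multiply two (sign, exponent, 24-bit mantissa) operands."""
    xs, xe, xm = x
    ys, ye, ym = y
    sign = xs ^ ys                   # XOR of the input signs
    exp = xe + ye - 127              # remove the doubled excess-127 bias

    product = 0
    multiplicand, multiplier = xm, ym
    for _ in range(24):              # one check and one addition per cycle
        if multiplier & 1:
            product += multiplicand  # bit is one: add the multiplicand
        multiplicand <<= 1           # shift for the next bit position
        multiplier >>= 1

    # Two mantissas in [1, 2) give a product in [1, 4): 47 or 48 bits.
    extra = product.bit_length() - 47
    return (sign, exp + extra, product >> (23 + extra))
```

For the Appendix A operands X and Z this exact model gives exponent 0x30 and mantissa CF9800, the hexadecimal mantissa listed for M; the trailing 1 that appears in the binary and pipe listings for M is not reproduced here.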
As the FPU
design became more complicated, it was apparent that the design was too large
for the hardware that was supplied. The
group advisor, Dr. Afsahi, immediately addressed this problem by providing the
group with a larger chip capable of handling the growing design. The new chip met all the requirements for the
design.
The design of
floating point division was more complicated than that of floating point
multiplication. First, the design shifts
the first mantissa twelve spaces to the right and adds twelve to the
corresponding exponent. The design then
performs an XOR function on the sign bits.
It then subtracts the exponents and adds one hundred and
twenty-seven. The division of
the mantissas follows the non-restoring division algorithm. First, a row of 32 zero bits is named the
dividend. The second mantissa, with zeros extended to the right, is named
the divisor.
The dividend is shifted one space to the left and the twenty-fourth bit
of the first mantissa is added to the zero bit of the dividend. The first mantissa is then shifted one space
to the left. The twenty-fifth bit of the
dividend is checked to see if it is a zero or a one. If it is a zero, the dividend is added to the
divisor. If it is a one, the divisor is
subtracted from the dividend. This forms a
temporary answer. The first bit of the
temporary answer is checked to see if it is a zero or a one. If it is a zero, a one is added to the last
bit of the quotient. If it is a one, a
zero is added to the last bit of the quotient.
The quotient is then shifted one space to the left, ready for the next
bit. The temporary answer now becomes
the dividend. The process is then
repeated 32 times. This process requires
many more clock cycles because each cycle requires an input from the previous
cycle. After the process
completes, the mantissa result, the exponent result and the sign
bit are output.
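For comparison, the textbook form of non-restoring division can be written out in Python as below. This is a sketch under stated assumptions, not the register arrangement described above: it uses signed Python integers in place of a sign bit, subtracts the divisor when the partial remainder is non-negative and adds it otherwise (the polarity in the hardware description may differ depending on how the registers are stored), and scales the dividend by 2^23 instead of pre-shifting by twelve places. Operands are assumed normal and nonzero, and the quotient is truncated.

```python
def nonrestoring_divide(dividend: int, divisor: int, bits: int):
    """Divide unsigned integers; one quotient bit per iteration."""
    rem, quot = 0, 0
    for i in reversed(range(bits)):
        bit = (dividend >> i) & 1
        if rem >= 0:
            rem = 2 * rem + bit - divisor   # non-negative: subtract
        else:
            rem = 2 * rem + bit + divisor   # negative: add, never restore
        quot = (quot << 1) | (1 if rem >= 0 else 0)
    if rem < 0:
        rem += divisor                      # single final correction
    return quot, rem


def fp_div(x, y):
    """Divide two (sign, exponent, 24-bit mantissa) operands."""
    xs, xe, xm = x
    ys, ye, ym = y
    sign = xs ^ ys
    exp = xe - ye + 127                     # restore the excess-127 bias
    if xm < ym:                             # keep the quotient in [1, 2)
        xm <<= 1
        exp -= 1
    quot, _ = nonrestoring_divide(xm << 23, ym, 48)
    return (sign, exp, quot)
```

With Q = (0, 0x33, 0xFF0000) and R = (0, 0x30, 0x810000) from Appendix A, this model returns exponent 0x82 and mantissa FD05F4, the values listed for D.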
Very minimal
testing was required for the debugging user interface. The debugging interface was designed to only
accept valid keyboard inputs. This was
manually verified, as were the conversion algorithms between ASCII and binary. This ensured that the screen output was an
actual representation of the data sent and received on the port lines.
More detailed testing was required for the communications protocol. The communications protocol was verified during design first by software simulation, then by hardware simulation, with the two devices tested independently. In order to facilitate hardware simulation, the device clocks were slowed and DIP switches were used as inputs while LEDs were used as outputs. Only minor changes were required when the devices were actually linked in hardware. The nature of the communications protocol was such that it either functioned correctly or not at all; therefore, after initial success, only minor testing was carried out to verify that it operated correctly in all expected environments.
As listed in Appendix A, several test numbers were generated and analyzed by hand. These were shown in several representations, including those that would be observed traveling over the communication lines. During the design of each arithmetic operation, these values were used as the test numbers, and success was assumed when the simulation results matched the expected results. Testing of each arithmetic operation was carried out only against one such dataset due to the difficulty of calculating expected results by hand. These simulations were carried out using only Max+plus II simulation software, but were expected to be an accurate representation of the hardware’s behavior.
An independent Visual Basic 6.0 application was designed in order to facilitate rigorous physical testing of the final hardware model. Its graphical user interface is shown in the figure below.

This application has many features and is capable of automatically running the debugging interface of the MC68HC11; however, it requires specific initialization as follows:
- create a new HyperTerminal session of the name “fpu”
- connect to the MC68HC11
- load the MC68HC11 software interface
- begin execution of the MC68HC11 program
The Generate button creates a random input string (operand op-code operand) in the upper left text box, and shows its representation in the boxes at the bottom of the window. The Calculate button does the same but uses the manually entered string from the boxes directly below it. The Generate button only uses the operations specified in the box to its right and also outputs the expected result directly below the generated string. Both buttons are associated with a Send Keys button. This automatically sends the operation to the MC68HC11, which in turn outputs the result calculated by the FPU. The Turn button converts a hex string (in floating point format) into the decimal equivalent. The results from the FPU can be put into the box below the Turn button and converted in order to see if they match the expected results. The Generate button also has a Rolling Send option, which simply repeats the function the number of times specified in the box to its right. The Stop button stops the rolling send.
This application
was used to rigorously test the hardware model for all four arithmetic
operations. This revealed some timing
problems with multiplication and division, and a precision error with the
division algorithm. After these design
errors were corrected the testing verified all expected behavior. It is important to note that some test cases
gave imprecise or incorrect results.
This was expected as a result of the limited scope of this project. These inconsistencies were the result of the
lack of rounding and overflow control.
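The comparison performed during this testing can be illustrated with a short Python sketch. It is not the Visual Basic application: the function names and the one-unit tolerance are assumptions, chosen because a design without rounding can differ from a correctly rounded reference in the last mantissa bit. Operands are assumed to be normal numbers whose result neither overflows nor divides by zero.

```python
import struct


def word_to_float(word: int) -> float:
    """Interpret a 32-bit word as a single-precision value."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]


def float_to_word(value: float) -> int:
    """Round a Python float to single precision and return its bits."""
    return int.from_bytes(struct.pack(">f", value), "big")


def expected_word(x: int, op: str, y: int) -> int:
    """Reference result computed on the PC (rounded to nearest)."""
    a, b = word_to_float(x), word_to_float(y)
    result = {"+": a + b, "-": a - b, "*": a * b, "/": a / b}[op]
    return float_to_word(result)


def matches(actual: int, expected: int, max_ulps: int = 1) -> bool:
    """Accept results within max_ulps units in the last place."""
    return abs(actual - expected) <= max_ulps
```

Applied to the Appendix A test vectors, the addition, subtraction and division outputs equal the reference exactly, while the multiplication output 184F9801 differs from the reference 184F9800 by one unit in the last place and is accepted only because of the tolerance.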
Thorough testing
revealed that the final product was fully operational and functioning as
required. As outlined in this report, it
was the group’s goal to produce a product that had clear and modular design in
hopes that it could serve as a building block for future groups. Not only were the arithmetic and
communication designs completed, but also, a debugging user interface was
created on the MC68HC11 for use with a PC, and a separate PC application was
created for independent rigorous verification of the arithmetic results.
Although the testing verified the project design, there remained several issues worth consideration. The IEEE-754 standard calls for the use of guard bits, which were not within the scope of this project. Guard bits are used when performing rounding, which is also beyond the scope of this project. As a result, there were small discrepancies with certain numbers in the testing; these were attributed to the limited scope of this project and did not reflect flaws in the design. It is also important to note that overflow was beyond the scope of this project and test numbers resulting in overflow do not imply design flaws.
There are
numerous promising possible expansions on this project such as: overflow
trapping, guard bit compatibility, implementation of special numbers such as
infinity and NaN, refinement of division and multiplication algorithms for
faster response, convolution, integration or other complex functions.
The current
project design was implemented using both the Motorola and Altera evaluation
packages. Typically, a marketable
version of this project would include the smaller industrial versions of both
chips, along with one hard-wired bi-directional communications port instead of
the two port structure used in this design.
Such marketability adjustments are also recommended as future goals.
In conclusion,
the project was completed on time and remained within the projected budget constraints
as originally outlined. Despite some
understandable difficulties in the testing and design phases, the group managed
to adapt to these issues. It is the group’s hope that future work will
use the current design to expand the FPU and MC68HC11 with some
of the many promising upgrades described above.
- Computer Arithmetic: Algorithms and Hardware Designs, Behrooz Parhami, Oxford University Press, 2000.
- Altera University Program Design Laboratory Package, http://www.ece.queensu.ca/hpages/courses/ELEC374/upds.pdf
- Microprocessor Systems using the Motorola 68HC11, Naraig Manjikian, Campus Bookstore 1813, 2001.
- Motorola 68HC11 Technical Manual.
- Motorola 68HC11 Reference Guide.
- Dr. Ahmad Afsahi (consultation).
- ELEC 374 course material.
- ELEC 371 course material.
OPCODES
ADD 08 0000 1000
SUBTRACT 09 0000 1001
MULTIPLY 0A 0000 1010
DIVIDE 0B 0000 1011
INPUT NUMBERS
X_sign 0 0
X_exponent_excess127 30 00110000
X_mantessa_explicit B60000 10110110 0000 0000 0000 0000
X_pipe 18360000 0001100000110110 0000 0000 0000 0000
Y_sign 0 0
Y_exponent_excess127 38 00111000
Y_mantessa_explicit 920000 10010010 0000 0000 0000 0000
Y_pipe 1C120000 0001110000010010 0000 0000 0000 0000
Z_sign 0 0
Z_exponent_excess127 7F 01111111
Z_mantessa_explicit 920000 10010010 0000 0000 0000 0000
Z_pipe 3F920000 0011111110010010 0000 0000 0000 0000
Q_sign 0 0
Q_exponent_excess127 33 00110011
Q_mantessa_explicit FF0000 11111111 0000 0000 0000 0000
Q_pipe 19FF0000 0001100111111111 0000 0000 0000 0000
R_sign 0 0
R_exponent_excess127 30 00110000
R_mantessa_explicit 810000 10000001 0000 0000 0000 0000
R_pipe 18010000 0001100000000001 0000 0000 0000 0000
DEFINE
A = X + Y
S = X - Y
M = X * Z
D = Q / R
A_sign 0 0
A_exponent_excess127 38 00111000
A_mantessa_explicit 92B600 10010010 1011 0110 0000 0000
A_pipe 1C12B600 0001110000010010 1011 0110 0000 0000
S_sign 1 1
S_exponent_excess127 38 00111000
S_mantessa_explicit 914A00 10010001 0100 1010 0000 0000
S_pipe 9C114A00 1001110000010001 0100 1010 0000 0000
M_sign 0 0
M_exponent_excess127 30 00110000
M_mantessa_explicit CF9800 11001111 1001 1000 0000 0001
M_pipe 184F9801 0001100001001111 1001 1000 0000 0001
D_sign 0 0
D_exponent_excess127 82 10000010
D_mantessa_explicit FD05F4 11111101 0000 0101 1111 0100
D_pipe 417D05F4 0100000101111101 0000 0101 1111 0100
TESTS
ADD PIPE IN 08 18 36 00 00 1C 12 00 00
PIPE OUT 1C 12 B6 00
SUBTRACT PIPE IN 09 18 36 00 00 1C 12 00 00
PIPE OUT 9C 11 4A 00
MULTIPLY PIPE IN 0A 18 36 00 00 3F 92 00 00
PIPE OUT 18 4F 98 01
DIVIDE PIPE IN 0B 19 FF 00 00 18 01 00 00
PIPE OUT 41 7D 05 F4