ECE 715/815

FALL 2005

 

INTRODUCTION TO VLSI

PROJECT ASSIGNMENT: FAST FOURIER TRANSFORM COMPONENTS

 


PROJECT GOAL

This project is a continuation of an effort initiated a few years ago. In the figure below (left) is a hardware part of the pattern recognition application developed by FPR Corp. The typical pattern recognition task involves pattern collection preprocessing and matching. Those tasks are computationally complex and often it is necessary to perform them fast. Speed is especially critical in real time applications. The ISA card below represents an outdated technology that was implemented on a single FPGA - Field Programmable Gate Array chip (right) as a part of a project conducted at UNH. This ISA card although bulky provided superior matching speed thanks to parallel processing and other innovative approach involving fast comparison of images.

Figure 1. Technology Migration FPR ISA card (left) and Virtex-II FPGA (right).

An FPGA is a programmable technology that allows for implementation of digital functions and since recently even complex systems. Systems implemented in FPGA use configurable logic blocks, which for many designs are not utilized in 100%. An alternative to FPGA is a custom technology – ASIC. An ASIC – Application Specific Integrated Circuit – is a more efficient way to use silicon area but its design takes longer and is also more expensive than an FPGA approach. ASIC’s are typically used for high volume application, in which case the so-called nonrecurring engineering costs are dissipated over a large production volume.

The 4 design stages that were followed during development at UNH are depicted in the figure below.

Figure 2. Microsystem Development Stages.

This project is designed to support the stage 4 – building an ASIC version of a fingerprint authenticator, which requires a number of complex operations involving a frequency spectrum domain filtering. The frequency domain filtering is an efficient way of filtering digital signals; a convolution in time domain is replaced by a multiplication in a frequency spectrum. In order to ensure that this approach results in performance gain compared to time domain filtering, an efficient FFT algorithm needs to be used, such as the one in the scope of this project.

The purpose of the project is to develop a library of components for a 16 – point, radix 4 FFT algorithm. The detailed specifications are presented below:

  • suggested clock speed: 100 MHz (note: if the speed of 100 MHz can not be achieved in AMI05 technology then a lower frequency is acceptable)
  • algorithm radix-4
  • arithmetic: 16 bit fixed point, 2’s complement
  • complex arithmetic operations should be divided into smaller tasks (pipelining) if the assumed target frequency could not be achieved otherwise

All control and status signals should follow the same standard; i.e. if an active low reset is chosen then it should be active low for all components. All components should have a clock input (CLK), reset (RST). Arithmetic operators should contain a status line (OVFL) indicating whether an overflow has occurred. The cooperation among the design teams is encouraged.

The library of components will allow for building FFT algorithms of the sizes of 4*N, where N – transform length. Prior to implementing an FFT ASIC it is required to estimate its power consumption, size and cost to make sure the design constraints are met given the technology at hand. Such an estimate can be done using dedicated tools, such as InCyte [2] from GigaIC Inc..

FAST FOURIER TRANSFORM

Fast Fourier Transform (FFT) is an efficient algorithm for calculation of Discrete Fourier Transform (DFT); it is an efficient way of calculating DFT. Below are well known equations for calculating DFT:

DFT:

DFT-1:

N – number of samples of the transformed sequence

fk – time domain samples, k=0, 1…N-1

Fn – frequency domain coefficients, n=0, 1…N-1

i = Ö -1

The amount of operations needed to compute DFT using equations above requires 4*N-1 additions and 4*N multiplications to calculate 1 output of N-point transform. This number can significantly be reduced by using an FFT algorithm. Cooley – Tukey [1] algorithm requires only 3*N*log2N additions and 2*N*log2N multiplications for the entire sequence of length N. For the case of 1024-point sequence it is equally computationally complex to calculate 6.25 frequency outputs using standard DFT equations as it is to calculate the whole 1024 points using Cooley-Tukey FFT.

Figure 3. A 16-point radix-4 Fast Fourier Transform algorithm diagram [1].

The figure above is meant to give an idea about functioning of FFT. FFT requires the entire sequence to be provided on its input port; it can be seen that all 16 points are supplied in parallel to the inputs of the FFT. The inputs need to be rearranged [1] before they are sent to the basic radix-4 building blocks. Butterflies perform operations of complex additions, subtractions and multiplications. After the series of arithmetic operations the output – frequency components need to be rearranged again to restore the correct order. For details on operation of 16 point radix-4 FFT refer to [1].

PROJECT ASSIGNMENTS

  1. Multiplier
  2. A 16 bit complex multiplier; a synchronous multiplier triggered on a rising edge of the system clock. Different multiplier algorithms are acceptable provided that they obey the general specifications listed under the Project Goal.

  3. Adder
  4. A 16 bit adder; see requirements for multiplier

  5. Butterfly
  6. Radix – 4 butterflies – a basic building block for an FFT algorithm - consists of multipliers, adders and a twiddle factor memory [3].

    Figure 4. A butterfly -- a basic FFT building block [ref].

  7. Input/ Output Interface
FFT algorithm requires the entire sequence of input samples to be provided on its input port in order to initiate the operation. The input interface should capture all input samples and order it in a way specified for a chosen FFT algorithm. After the entire sequence is collected and properly ordered, the input interface should generate a START signal indicate that data are ready for processing. Similar (but reversed) operation should be performed on the output of FFT. DONE signal indicating completion of processing should be followed by data on the following 16 rising edges of the clock.

RECOMMENDED READING

[1] W.W. Smith, J.M. Smith, Handbook of Real-Time Fast Fourier Transform. Algorithms to Product Testing, JohnWiley and Sons, 1995

[2] online: www.gigaic.com

[3] D. Bjoerkland, H. Enqvist, Parallel Radix-4 DIF FFT Hardware Implementation Using VHDL, 2002

[4] S. Pandey, M. Glesner, M. Muehlheuser, Architecture Level Design Space Exploration and Mapping of Hardware


ECE 715 Home