Hanaa M. Hussain, Khaled Benkrid
School of Engineering
University of Edinburgh, Edinburgh
Scotland, U.K.
{h.hussain, k.benkrid}@ed.ac.uk

Huseyin Seker
Bio-Health Informatics Research Group
De Montfort University, Leicester
England, U.K.
[email protected]

Presented by: Dunia Jamma, PhD Student
Course Instructor: Prof. Shawki Areibi
School of Engineering, University of Guelph
Outline
Introduction
Background of KNN
KNN and FPGA
The proposed architectures
Dynamic Partial Reconfigurable (DPR) part
The achievements
Advantages and Disadvantages
Conclusion
Introduction
K-nearest neighbour (KNN) is a supervised classification technique
Applications of KNN: data mining, image processing of satellite and medical images, etc.
KNN is known to be robust and simple to implement when dealing with small data sets
KNN performs slowly when the data set is large and has high dimensionality
The KNN classifier is sensitive to the parameter K, the number of nearest neighbours
The label of a new query is selected by majority voting among those K points (e.g., with K = 3, if two of the three nearest neighbours carry label A, the query is labelled A)
[Figure: classification example with 1-Nearest Neighbour vs. 3-Nearest Neighbour]
KNN Distance Methods
To calculate the distance between the new query and the stored training points, the Manhattan distance is used:

D(X, Y) = \sum_{i=1}^{K} |X_i - Y_i|

where
X_i: the new query's matrix
Y_i: the trained sample's matrix
K: # of samples

The Manhattan distance is chosen in this work for its simplicity and lower cost compared to the Euclidean distance (see the comparison below).
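As a hedged illustration of the cost argument (the vectors here are made up for the example):

```latex
D_{\text{Manhattan}}(X, Y) = \sum_{i=1}^{K} \lvert X_i - Y_i \rvert
\qquad
D_{\text{Euclidean}}(X, Y) = \sqrt{\sum_{i=1}^{K} (X_i - Y_i)^2}
```

For X = (3, 4) and Y = (0, 0), the Manhattan distance is 3 + 4 = 7 and needs only subtractors and adders, whereas the Euclidean distance is sqrt(9 + 16) = 5 and needs multipliers and a square root, both of which are far more expensive in FPGA logic.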
KNN and FPGA
KNN classifiers can benefit from the parallelism offered by FPGAs
Distance computation is the time-consuming part, so the authors parallelize it
They propose two adaptive FPGA architectures (A1 and A2) of the KNN classifier and compare each of them with an equivalent implementation running on a general purpose processor (GPP)
They also propose a novel dynamic partial reconfiguration (DPR) architecture of the KNN classifier in which the parameter K can be changed at run time
Tools used
Hardware implementation:
The hardware implementation targeted the ML403 platform board, which carries a Xilinx XC4VFX12 FPGA chip
JTAG cable
Xilinx PlanAhead 12.2 along with the Xilinx partial reconfiguration (DPR) flow
Verilog used as the hardware description language (HDL)
Software implementation:
Matlab (R2009b) Bioinformatics Toolbox
A workstation with an Intel Pentium Dual-Core E5300 running at 2.60 GHz and 3 GB of RAM
The data used
[Figure: layout of the trained data matrix Y (dimensions M and N, with a label column L) and the new query vector X]
Factors:
M: number of training samples
N: number of training vectors
L: class label
Y: trained data
X: new query
The proposed architectures
The KNN classifier has been divided into three modular blocks
(Distance computation, KNN finder, and Query label finder) + FIFO
A1 Architecture: M Dist PEs and K KNN PEs (total PEs = M + K + 1, the +1 being the label PE)
A2 Architecture: N Dist PEs and N KNN PEs (total PEs = 2N + 1)
The functionality of PEs
[Figure: PE functionality. The Dist PE adds each training sample Yi's contribution to the previous accumulative distance. The KNN PE compares two distance/label pairs, Dist1/L1 and Dist2/L2, keeping the Min and passing the Max on.]
Distance computation
The distance computations are made in parallel every clock cycle
The latency of a Dist PE is M cycles
A1: the throughput is one distance result every clock cycle
A2: the throughput is one distance result every M clock cycles, i.e., once the complete training stream has been processed
Dist PE inner architecture
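The slide shows the block diagram only; as a minimal Verilog sketch of such a PE under the deck's definitions (B-bit unsigned samples, Manhattan accumulation), with illustrative port names rather than the authors' code:

```verilog
// Illustrative sketch only -- not the authors' implementation.
// Accumulates the Manhattan distance |x - y| one sample per clock cycle.
module dist_pe #(
    parameter B  = 8,          // sample wordlength (the deck's B)
    parameter DW = 16          // accumulated-distance width
)(
    input  wire          clk,
    input  wire          rst,      // clears the accumulator for a new query
    input  wire [B-1:0]  x,        // query sample, held in a register upstream
    input  wire [B-1:0]  y,        // training sample streamed from the FIFO
    input  wire [DW-1:0] dist_in,  // previous accumulative distance
    output reg  [DW-1:0] dist_out  // updated accumulative distance
);
    wire [B-1:0] absdiff = (x > y) ? (x - y) : (y - x);  // |x - y|, unsigned

    always @(posedge clk) begin
        if (rst)
            dist_out <= {DW{1'b0}};
        else
            dist_out <= dist_in + absdiff;  // add this sample's contribution
    end
endmodule
```

In A1 the dist_in/dist_out ports would chain M such PEs into a pipeline, matching the "previous accumulative distance" input in the PE figure; in A2, feeding dist_out back into dist_in lets one PE accumulate a whole vector serially over M cycles.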
K-Nearest Neighbour Finder
This block becomes active after M clock cycles
The function of this block is completed after M + N clock cycles (a compare-and-swap sketch of one KNN PE follows)
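The Min/Max behaviour in the PE figure suggests a compare-and-swap cell: each PE keeps the nearer distance/label pair it has seen so far and forwards the farther one, so a chain of K such PEs ends up holding the K nearest neighbours. A hedged Verilog sketch, with assumed names and widths rather than the paper's code:

```verilog
// Illustrative compare-and-swap KNN PE -- an assumption, not the authors' code.
// Keeps the nearer (Min) distance/label pair, forwards the farther (Max) pair.
module knn_pe #(
    parameter DW = 16,  // distance width
    parameter LW = 4    // label width
)(
    input  wire          clk,
    input  wire          rst,         // re-initializes the stored pair
    input  wire [DW-1:0] dist_in,     // incoming distance
    input  wire [LW-1:0] label_in,    // incoming label
    output reg  [DW-1:0] dist_out,    // evicted (farther) distance to next PE
    output reg  [LW-1:0] label_out,
    output reg  [DW-1:0] dist_keep,   // stored (nearer) distance
    output reg  [LW-1:0] label_keep
);
    always @(posedge clk) begin
        if (rst) begin
            dist_keep  <= {DW{1'b1}};  // "infinity": any real distance is nearer
            label_keep <= {LW{1'b0}};
            dist_out   <= {DW{1'b1}};
            label_out  <= {LW{1'b0}};
        end else if (dist_in < dist_keep) begin
            dist_out   <= dist_keep;   // evict the stored pair down the chain
            label_out  <= label_keep;
            dist_keep  <= dist_in;     // keep the nearer incoming pair
            label_keep <= label_in;
        end else begin
            dist_out  <= dist_in;      // stored pair stays; pass the incoming one
            label_out <= label_in;
        end
    end
endmodule
```

Once all N distances have streamed through, the K stored pairs are the K nearest neighbours, and their labels feed the class label finder.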
Dynamic Partial Reconfigurable part (DPR)
The value of the parameter K is dynamically reconfigured, while N, M, B, and C stay fixed for a given classification problem
Two cores (A1):
Distance computation core - static
KNN core (KNN PE, Label PE) - dynamic
The size of the reconfigurable partition (RP) is made large enough to accommodate the logic resources required by the largest K
Advantages: savings in reconfiguration time and power
Difficulties:
Resource limitations, the cost, and the verification of the interfaces between the static region and the RP for all reconfigurable modules (RMs)
The achievements
This DPR implementation offers a 5x speed-up in the reconfiguration time of the KNN classifier on the FPGA
Advantages
Flexibility, allowing the user to select the most appropriate architecture for the targeted application (available resources, performance, cost)
Enhancement in performance:
Parallelism - speed-up
DPR - reconfiguration time
Efficiency in terms of KNN performance - the DPR for K
Use of the Manhattan distance (simplicity and lower cost)
Disadvantages
The amount of resources used
The modest speed-up (5x) achieved by the DPR part relative to the amount of resources and effort it requires
Area constraints in the A2 architecture and the DPR
The latency due to the pipelined manner of producing the results
Conclusion
An efficient design for different KNN classifier applications
Two architectures, A1 and A2, from which the user can choose
A1 can be used to target applications where N >> M, whereas A2 targets applications where N << M
DPR part (could be improved by using ICAP)
Achievements compared to the GPP:
Speed-up of 76x for A1 and 68x for A2
Speed-up of 5x in reconfiguration time with DPR
Any questions?
Extra Slides
Memory
Each FIFO is associated with one distance PE
The query vector gets streamed to the PEs and stored in registers, since it is required every clock cycle
The total FIFO storage therefore amounts to roughly M × N × B bits (the whole training set), where:
B is the sample wordlength
M is the number of samples
N is the number of training vectors
(an illustrative FIFO sketch follows)
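A minimal synchronous-FIFO sketch sized by the deck's parameters (width B, depth M); the real design presumably uses Xilinx FIFO/BRAM primitives, and every name here is an assumption:

```verilog
// Illustrative training-stream FIFO: width B, depth M (assumes M = 2**AW so
// the pointers wrap naturally). Not the authors' implementation.
module train_fifo #(
    parameter B  = 8,   // sample wordlength
    parameter M  = 64,  // depth: samples per stream
    parameter AW = 6    // address width, log2(M)
)(
    input  wire         clk,
    input  wire         wr_en,
    input  wire [B-1:0] din,    // training sample in
    input  wire         rd_en,
    output reg  [B-1:0] dout    // training sample out, one per clock cycle
);
    reg [B-1:0]  mem [0:M-1];
    reg [AW-1:0] wptr = {AW{1'b0}};
    reg [AW-1:0] rptr = {AW{1'b0}};

    always @(posedge clk) begin
        if (wr_en) begin
            mem[wptr] <= din;
            wptr <= wptr + 1'b1;
        end
        if (rd_en) begin
            dout <= mem[rptr];
            rptr <= rptr + 1'b1;
        end
    end
endmodule
```

Each distance PE drains one such FIFO at one sample per clock cycle.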
Class Label Finder
The block consists mainly of C counters, each associated with one of the class labels
The hardware resources depend on the user-defined parameters K and C
The architecture of this block is identical in both A1 and A2 (a voting sketch follows)
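A hedged Verilog sketch of such a voting block, with one counter per class and a running majority; the update scheme and all names are illustrative assumptions, not the paper's design:

```verilog
// Illustrative class-label finder. Each of the K nearest labels increments
// its class counter; the label whose counter is highest so far becomes the
// query label (ties keep the earlier winner).
module label_finder #(
    parameter C  = 4,  // number of class labels (labels assumed in 0..C-1)
    parameter LW = 4,  // label width
    parameter CW = 8   // counter width, wide enough to hold K
)(
    input  wire          clk,
    input  wire          rst,         // clears all counters
    input  wire          valid,       // one of the K nearest labels is presented
    input  wire [LW-1:0] label,
    output reg  [LW-1:0] query_label  // majority label so far
);
    reg [CW-1:0] count [0:C-1];
    integer i;

    always @(posedge clk) begin
        if (rst) begin
            for (i = 0; i < C; i = i + 1)
                count[i] <= {CW{1'b0}};
            query_label <= {LW{1'b0}};
        end else if (valid) begin
            count[label] <= count[label] + 1'b1;
            if (count[label] + 1'b1 > count[query_label])
                query_label <= label;  // new running majority
        end
    end
endmodule
```

After the K labels from the KNN finder have been counted, query_label holds the voted class for the new query.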
A2 Architecture
N FIFOs are used to store the training set, each of them having a depth of M
The class labels get streamed and stored in registers within the distance PEs
A2 requires more CLB slices than A1 when N, M, and K are the same
The first distance result becomes ready after all samples are processed, i.e., after M clock cycles (a structural sketch of the parallel distance stage follows)
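To make the N-way parallelism concrete, a hedged top-level sketch of A2's distance stage that reuses the dist_pe sketch shown earlier; the generate loop, feedback wiring, and names are all illustrative assumptions:

```verilog
// Illustrative A2 distance stage: N dist_pe instances in parallel, one per
// training vector, each accumulating its own distance over M cycles. The
// query sample x is broadcast while the N FIFOs stream in lockstep.
module a2_distance_stage #(
    parameter B  = 8,   // sample wordlength
    parameter DW = 16,  // distance width
    parameter N  = 32   // number of training vectors (one PE each)
)(
    input  wire            clk,
    input  wire            rst,       // start of a new query
    input  wire [B-1:0]    x,         // current query sample (broadcast)
    input  wire [N*B-1:0]  y_flat,    // one training sample per PE, from the FIFOs
    output wire [N*DW-1:0] dist_flat  // N accumulated distances
);
    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : pe
            // dist_out fed back into dist_in turns each PE into a serial accumulator
            dist_pe #(.B(B), .DW(DW)) u_pe (
                .clk     (clk),
                .rst     (rst),
                .x       (x),
                .y       (y_flat[i*B +: B]),
                .dist_in (dist_flat[i*DW +: DW]),
                .dist_out(dist_flat[i*DW +: DW])
            );
        end
    endgenerate
endmodule
```

After M cycles all N distances are ready at once, matching the slide's point that the first result appears only after the whole sample stream has been processed.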
DPR for K
Maximum bandwidth (BW) for JTAG is 66 Mbps
Maximum BW for ICAP is 3.2 Gbps
ICAP is therefore about 48x faster than JTAG (a worked example follows)
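The reconfiguration time is simply the partial bitstream size divided by the configuration bandwidth; as a worked example with an invented 200 KB bitstream (the size is illustrative, not from the paper):

```latex
t_{\text{config}} = \frac{\text{bitstream size}}{BW},
\qquad
\frac{BW_{\text{ICAP}}}{BW_{\text{JTAG}}} = \frac{3.2\,\text{Gbps}}{66\,\text{Mbps}} \approx 48
```

A hypothetical 200 KB partial bitstream is about 1.6 Mbit, giving roughly 1.6 Mbit / 66 Mbps ≈ 25 ms over JTAG versus 1.6 Mbit / 3.2 Gbps ≈ 0.5 ms over ICAP.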
Dynamic Partial Reconfigurable part (DPR)
The JTAG was used (BW = 66 Mbps)
Using ICAP instead would decrease the configuration time (BW = 3.2 Gbps)