TESTING DIGITAL INTEGRATED CIRCUITS WITH
NOVEL LOW POWER HIGH FAULT COVERAGE
TECHNIQUES AND A NEW SCAN ARCHITECTURE

JĘDRZEJ SOLECKI

POZNAŃ UNIVERSITY OF TECHNOLOGY
FACULTY OF ELECTRONICS AND TELECOMMUNICATIONS

Ph.D. Thesis
Supervisor: prof. dr hab. inż. Jerzy Tyszer
Abstract
The growing complexity of VLSI integrated circuits is driven by the advances
in the manufacturing processes. Together with these innovations, however, new
challenges for manufacturing test emerge. Recently, a number of new fault models have been proposed that target the new types of defects introduced by the new
technology nodes. The resulting elevated pattern count inflates test application
time and test data volume. Thus, new testing techniques are required to counterbalance these phenomena. Moreover, reliable testing must consider power consumption during test application so that the functional power limits are not exceeded. As more and more electronic devices are now embraced by various safety-critical applications, their lifetime reliability is of crucial importance. The present thesis introduces new methods that address current and future requirements for high-quality, reliable, and robust test.
First, a low power programmable generator capable of producing pseudorandom test patterns with desired toggling levels and enhanced fault coverage gradient compared to the best-to-date BIST-based PRPGs is presented. It consists
of a linear finite state machine (a linear feedback shift register or a ring generator) driving an appropriate phase shifter, and it comes with a number of features
allowing this device to produce binary sequences with preselected toggling
(PRESTO) activity. A method to automatically select several controls of the generator is introduced that offers easy and precise tuning. The same technique is subsequently employed to deterministically guide the generator toward test sequences with improved fault-coverage-to-pattern-count ratios. Furthermore, a
low power test compression method is presented that allows shaping of the test
power envelope in a fully predictable, accurate, and flexible fashion by adapting
the PRESTO-based logic BIST infrastructure. The proposed hybrid scheme efficiently combines test compression with logic BIST, where both techniques can
work synergistically to deliver high quality tests.
Next, a novel deterministic built-in self-test (BIST) scheme is shown that allows
elevating compression ratios to values typically unachievable through conventional reseeding-based solutions. The solution seamlessly integrates with on-chip EDT-based decompression logic and takes advantage of clustering of specified positions within ATPG-produced test cubes and similarities between test cubes that detect faults within a given subcircuit of the tested design. The presented approach offers either a reduced memory footprint while preserving full test coverage, or a faster test coverage ramp-up for a restricted memory capacity.
Finally, this work presents TestExpress – a novel scan-based DFT paradigm.
Compared to the conventional scan, the presented approach either significantly
reduces test application time while preserving high fault coverage, or allows applying a much larger number of vectors within the same time interval. Power dissipation with TestExpress remains similar to that of the mission mode. Several techniques are introduced that allow easy integration of the proposed scheme with the
state-of-the-art test generation and application methods. In particular, the new
scheme uses redesigned scan cells to dynamically configure scan chains into different modes of operation for use with the underlying test-per-clock principle. All
of the presented solutions are verified with experimental results obtained for
large and complex industrial ASIC designs.
Streszczenie
The unprecedented progress in the manufacturing technology of very large scale integration digital circuits means, in particular, an exponential growth in the complexity of the produced devices, a phenomenon commonly known as Moore's law. At this degree of miniaturization, even minor imperfections of the production process or environmental stress lead to defects and incorrect operation of the circuits. Moreover, technologies below 32 nm exhibit new types of defects that in many cases remain undetectable by test methods based on classical fault models. The use of new and more elaborate fault models, accounting, for example, for the transistor-level structure of a circuit, usually implies a substantial increase in the amount of test data and, consequently, a longer test time, not infrequently beyond acceptable limits. Additionally, due to the significantly elevated power demand during testing, controlling the operation of a circuit in the test mode becomes a separate and complex problem, both theoretical and practical. Indeed, test stimuli can, on average, increase power consumption two- or threefold, with instantaneous increases of up to 30 times. In such cases, the tested circuits may degrade, correct circuits may be misclassified as faulty, or the devices may even be irreversibly damaged. The requirements for new test technologies are also shaped by the ever larger share of digital electronics in devices of critical importance for health or safety. The dependability of digital technology in medicine, defense, in the safety systems of air, rail, and automotive transport, or in wireless communication unquestionably depends on the quality of testing and on continuous monitoring of the correct operation of a circuit. This thesis proposes new methods of testing digital circuits and systems that, in the author's view, help to solve many of the above problems.

The first part of the thesis proposes a programmable hardware pseudorandom test generator that makes it possible to control the number of transitions in test vectors. The generator operates by appropriately controlling registers placed between its linear register and its phase shifter. All control data of the generator are determined automatically based on the desired circuit activity. A method of deterministically generating the control data in order to improve fault coverage is also presented. One possible realization of the presented device not only generates pseudorandom test vectors but also supports test data compression while preserving a reduced power demand. Thanks to this, the same module combines the advantages of deterministic testing and self-test.

The next part of the thesis presents a deterministic self-test method that uses test data reduced to a degree unattainable for the compression techniques currently used for test procedures. The developed method extends a conventional test data decompressor with programmable inversion of test vectors in an arbitrarily selected operating cycle. Proper exploitation of the distribution of specified bits in the scan chains allowed a very high compression ratio to be obtained. To the author's knowledge, the proposed approach is the first solution in which deterministic stimuli can be applied in a self-test mode that flexibly manages the amount of test data.

A significant obstacle to the further development of efficient test methods is the paradigm, in force for nearly 40 years, of designing easily testable circuits around scan chains. Although a conventional scan chain guarantees simple access to all state variables in a circuit, its use in systems containing tens of thousands of memory elements leads to a rapid growth of test time and, therefore, of test cost. The last part of the thesis proposes a solution that substantially improves the efficiency of test time utilization compared to commonly used methods. The new approach enables the application of deterministic test vectors and the capture of circuit responses in every clock cycle, which in turn allows many times more test vectors to be used than was previously possible under given time constraints. Moreover, the presented solution keeps most of the circuit in a near-functional mode, so that the test power remains close to the nominal level. The developed algorithms for automatic test generation, scan chain construction, and selection of effective configurations make the new technology applicable to contemporary very large scale integration circuits. This claim, as well as all other detailed solutions presented in the thesis, has been verified and confirmed in the course of an extensive program of experimental research carried out on currently manufactured digital circuits and systems, using original software developed by the author as a non-trivial extension of existing commercial tools.
Acknowledgments
First and foremost, I would like to thank my Ph.D. supervisors, Professors Janusz Rajski and Jerzy Tyszer. Professor Rajski has taught me endlessly how good
research and engineering are done. I appreciate all his contributions of ideas,
time, and funding to make my Ph.D. experience productive and stimulating. The
joy and enthusiasm he has for research were contagious and motivational for me,
even during tough times in the Ph.D. pursuit. I am also thankful for the excellent
example he has provided as a successful scientist and industrial practitioner. I am
also very grateful to Professor Tyszer for his scientific advice, knowledge, and
many insightful suggestions and discussions we had. He was my primary resource for getting all my questions answered and was instrumental in helping me
crank out this thesis, everything in a very short period of time.
Many solutions and experimental results presented in my thesis would not
have been possible without productive discussions, help and assistance offered
by Dr. Grzegorz Mrugalski of Mentor Graphics Poland. He has been extremely
helpful and patient in providing advice virtually every day during my Ph.D. studies.
I would like to extend my thanks to all colleagues from Mentor Graphics Corporation who supported me in pursuing my doctoral degree. Some of them have
contributed immensely to my professional and personal time at Mentor Graphics.
In particular, I very much appreciate the insightful suggestions and support received
from Dr. Chen Wang, Łukasz Rybak, Dr. Wu-Tung Cheng, and Dr. Xijiang Lin. They
all have been a source of friendship as well as good advice and collaboration. Finally, I gratefully acknowledge the funding that made my Ph.D. work possible. The
Mentor Graphics scholarship I have been receiving for four years allowed me to
attend many international conferences and to successfully join the international
test community. This work was also partially supported by the Semiconductor
Research Corporation.
Lastly, I would like to thank my family: my parents who supported me in all
my pursuits, and most of all, my loving, supportive, encouraging, and patient wife
Barbara whose faithful support during all stages of this Ph.D. process is so appreciated. Thank you.
Contents

Chapter 1  Introduction
  1.1  Preamble
  1.2  Design for testability
  1.3  Power consumption during test
  1.4  DFT prospects
  1.5  Motivation
Chapter 2  Low-power hybrid BIST
  2.1  State of the art
  2.2  Low power random test generator
    2.2.1  The basic architecture
    2.2.2  Fully operational generator
    2.2.3  Experimental results
    2.2.4  Automatic selection of control values
    2.2.5  Validating experiments
  2.3  Improving fault coverage gradient
    2.3.1  Guiding PRESTO with on-chip deterministic data
    2.3.2  Experimental results
  2.4  Hybrid BIST
    2.4.1  PRESTO-based low power decompressor
    2.4.2  Encoding algorithm
    2.4.3  Experimental results
  2.5  Silicon area requirements
Chapter 3  Deterministic BIST
  3.1  Motivation - compression-based BIST
  3.2  The new deterministic BIST architecture
  3.3  Implementation flow
    3.3.1  Comprehensive search
    3.3.2  Iterative approach
  3.4  Experimental results
  3.5  Remarks
Chapter 4  Deterministic test-per-clock scan-based scheme
  4.1  Introduction
  4.2  TestExpress architecture
  4.3  Scan stitching
  4.4  Selecting test configurations
  4.5  Test generation
  4.6  Experimental results
    4.6.1  Applicability of test data compression
    4.6.2  Power consumption
Chapter 5  Conclusions
References
List of Abbreviations

Abbreviation   Description
ATE            Automatic test equipment
ATPG           Automatic test pattern generator
BIST           Built-in self-test
CAD            Computer-aided design
CMOS           Complementary metal-oxide semiconductor
CUT            Circuit under test
DFT            Design for testability
DPM            Defects per million
DUT            Design under test
EDA            Electronic design automation
EDT            Embedded deterministic test
HC             Hold cycle
IC             Integrated circuit
LBIST          Logic built-in self-test
LFSR           Linear feedback shift register
LP             Low-power
LPT            Low-power template
MISR           Multiple-input signature register
PRESTO         PRESelected TOggling
PRPG           Pseudo-random pattern generator
ROM            Read-only memory
SCOAP          Sandia Controllability/Observability Analysis Program
SoC            System on a chip
STUMPS         Self-testing using MISR and parallel shift register sequence generator
TPG            Test pattern generator
TC             Toggle cycle
VLSI           Very-large-scale integration
WSA            Weighted switching activity
WTM            Weighted transition metric
The most important conference names:

ATS      IEEE Asian Test Symposium
DAC      ACM/IEEE Design Automation Conference
DATE     Design Automation and Test in Europe
ETS      IEEE European Test Symposium
ICCAD    ACM/IEEE International Conference on Computer-Aided Design
ICCD     IEEE International Conference on Computer Design
ITC      IEEE International Test Conference
VTS      IEEE VLSI Test Symposium
Chapter 1
Introduction
1.1 Preamble
Through the years, electronic devices have been increasingly affecting every aspect
of our lives, changing the way we work, communicate, travel, and entertain. Recently, the number of devices connected to the Internet outnumbered the human
population and is still growing [19]. To keep up with the ongoing Digital Revolution
and satisfy the demand for storage, computing and networking resources, integrated circuits (IC) undergo continuous shrinkage, performance improvement and
cost reduction. As observed by Gordon E. Moore [70], the complexity of integrated
circuits doubles every two years. The major enabling factors for this trend, known
as Moore’s Law, are novel techniques continually introduced into the silicon manufacturing processes.
While continuous miniaturization enables production of ever more sophisticated
devices, it also increases the probability of manufacturing defects, and consequently
the cost of the semiconductor test. In a contemporary production process of 22nm
and smaller, the feature size is measured in tens of atoms, while the transistor isolation layers may be a few atoms thick [17]. Although various techniques enabled
the use of the typical 193-nm wavelength argon fluoride laser to achieve these dimensions, the resulting shapes are deformed in a probabilistic manner. Therefore,
even subtle manufacturing errors or imperfections may cause the process variability margins to be exceeded, resulting in a faulty circuit. Additionally, environmental
variations in supply voltage or temperature during device operation or test can
cause uncertainty in electrical characteristics of the integrated circuit. Moreover,
the new production techniques introduce defects that are not detectable by using
the fault models that were successfully applied for the preceding technology nodes.
Thus, more sophisticated test methods are required to guarantee identification of
faulty circuits as soon as possible. It is crucial since the cost of detecting a fault increases by an order of magnitude at each stage of manufacturing (as stated by the
rule of ten [107]). All these factors contribute to the total cost of test which over the
years has become the single most costly component of the production process of
high performance very large scale integration (VLSI) designs. Although test methods
have so far kept pace with the advances in production processes, current testing techniques may quickly prove inadequate for future technology nodes.
As our dependence on ubiquitous electronic devices grows, the reliability and robustness over their expected lifetime become an important concern. These
aspects are especially pronounced if we consider the safety-critical applications including automotive, medical, public infrastructure or home automation. Thus, faults
introduced by the material aging and wear-out need to be monitored during the device lifespan. This is commonly achieved by using on-chip logic built-in self-test
(BIST) infrastructure. However, the quality of typical BIST schemes is often insufficient for safety-critical applications. Both issues, the growing manufacturing cost and the lifetime reliability, are crucial aspects that need to be addressed by
new testing technologies.
1.2 Design for testability
The first reliable methods for testing combinational logic were introduced in the 1960s [93]. However, with the growth of integrated circuit complexity, and in particular with the increased number of sequential elements, controlling and observing
the internal states became very cumbersome. Indeed, the sequential test generation
algorithms were unable to efficiently handle the large and continually growing state
space. As a matter of fact, only the integration of both the design and test phases of
an integrated circuit development process made the final VLSI devices testable. Initially, the ad hoc design for testability (DFT) methods were introduced to assist in
testing parts of the integrated circuit that are difficult to control or observe. Typically, they constituted a set of design rules and a modest amount of design modifications, e.g., test point insertion for direct access to the internal nodes. The selection
of a particular method was directly dependent on the design.
Figure 1.1: D-multiplexed scan cell.
On the other hand, the structural DFT techniques that emerged in the late 1960s
are universally applicable to a wide range of digital integrated circuits. The most
prominent representative of this trend is scan [28], [31], [56]. It enables direct controllability and observability of all sequential elements within the design under test
(DUT) by converting them into a shift register accessible from outside the circuit
during a test mode. In order to implement such functionality, every flip-flop needs
to be redesigned into a scan cell. A typical D-multiplexed scan cell is shown in
Figure 1.1. It supports two modes of operation. In the scan shift mode, when a scan
enable (SE) signal is asserted, every flip-flop of the sequential logic is transformed
into a stage of a shift register connected to the preceding scan cell or a primary input
(if it is the first flip-flop in the scan chain) through the SI input. On the other hand,
during the capture mode (SE=0), the scan cell captures the next state of the device
through the D input, the same way as during the functional operation. Access to the
internal state through a scan enables testing sequential circuits using much more
mature and efficient techniques developed for combinational circuits. The high controllability and observability of internal nodes enables automatic generation of high-quality tests. It also provides good diagnostic capabilities, thus enabling fast defect localization crucial for improving the production yield. Scan is also helpful in verification and debugging of silicon prototypes. Moreover, the structured architecture of scan chains simplifies their automated stitching and insertion, and over the years scan
has become the foundation for many other advanced DFT techniques.
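To make the two operating modes concrete, the following Python sketch gives a purely behavioral model of a D-multiplexed scan cell and of shifting a pattern into a short chain. It is an illustration only (the class and helper names are invented here, and the model ignores timing); with SE asserted a cell takes its value from SI, and with SE deasserted it captures the functional D input.

```python
class ScanCell:
    """Behavioral model of a D-multiplexed scan cell (cf. Figure 1.1)."""
    def __init__(self):
        self.q = 0                      # current flip-flop state

    def clock(self, d, si, se):
        # SE = 1: scan shift mode - take the value arriving on SI.
        # SE = 0: capture mode    - take the next functional state from D.
        self.q = si if se else d
        return self.q


def shift_in(chain, pattern):
    """Shift a test pattern into the chain, one bit per clock, first bit first."""
    for bit in pattern:
        # On a clock edge every cell takes its predecessor's *old* value,
        # so update from the far end of the chain back toward the scan input.
        for i in range(len(chain) - 1, 0, -1):
            chain[i].clock(d=0, si=chain[i - 1].q, se=1)
        chain[0].clock(d=0, si=bit, se=1)


chain = [ScanCell() for _ in range(4)]
shift_in(chain, [1, 0, 1, 1])           # the first bit ends up deepest in the chain
print([cell.q for cell in chain])       # -> [1, 1, 0, 1]
```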
Despite scan’s inherent simplicity and benefits, its adoption was initially hesitant, as scan cells introduce area overhead (about 7-20% of real estate) as well as performance degradation due to the additional multiplexers included in the design’s functional paths.
Initially, in the mid-1970s only a few mainframe and minicomputer manufacturers,
including IBM, NEC, and Fujitsu [109], employed the structured DFT. For many silicon industry companies the return on investment of scan was not viable. Instead,
they chose to use the ad hoc techniques together with functional test.
In order to restrict the overhead of full-scan techniques, partial scan [3] was considered an alternative. Various methods proposed for effective selection of flip-flops
to include in the scan chain ensured decent test quality while keeping the costs low.
However, the test flow with non-structured techniques, functional test or partial
scan was increasingly laborious and did not meet the quality and time to market
requirements. Eventually, with the advent of deep submicron technologies, the cost of silicon real estate decreased, and the shift to full-scan design became evident.
Augmented by various electronic design automation (EDA) tools, scan design became one of the most influential DFT techniques, and it provides a way to produce
testable and reliable semiconductor devices. It is worth noting that the non-structured approaches, such as test point insertion, are still used to support the scan operation. Also, partial-scan is sometimes applied to the most performance-sensitive
blocks of integrated circuits.
As the silicon manufacturing process progresses, the number of flip-flops in the
ever more complex designs increases exponentially, posing new challenges to the
scan-based test. Typically, scan chains are interfaced with an external tester which
supplies them with test data. It is worth noting that the cost of automatic test equipment (ATE) depends on its memory capacity. Since every scan cell is assigned a certain value by test vectors, the test data volume expands at the same pace or even
faster due to additional fault models. Moreover, the number of scan chain inputs and
outputs needs to be kept within the design limits. As a result, the growing scan
chain length significantly elevated the test application time. These two factors had
a profound impact on the economics of integrated circuit manufacturing. The
roadmap published by the Semiconductor Industry Association [32] in 1997 reported that the test capital costs, normalized per transistor, had been relatively flat
for almost two decades, after the adoption of scan test. At the same time, the cost of
fabricating a transistor was steadily decreasing, following Moore’s Law [70]. The
alarming conclusion was that in the following years it might cost more to test a transistor than to manufacture it. The discussed trends are shown in Figure 1.2.
Figure 1.2: Cost of manufacturing and test according to 1997 SIA Roadmap (silicon capital cost per transistor and test capital cost per transistor, in US cents on a logarithmic scale, plotted over the years 1982-2012).
One method to solve the testing cost issues was to test the integrated circuit by
means of additional facilities embedded into the same chip, without the need for a
sophisticated external tester. Such an approach, called built-in self-test (BIST), employs on-chip pattern generation, clocking, test response evaluation and diagnostic
test functionality. Applied initially for the board-level testing, it was embraced in
various forms for the manufacturing and in-field testing of integrated circuits. The
most commonly deployed self-testing using multiple-input signature register
(MISR) and parallel shift register sequence generator (STUMPS) [7] architecture is
shown in Figure 1.3. It comprises a pseudo-random pattern generator (PRPG), typically implemented as a linear feedback shift register (LFSR) or a ring generator [80],
that provides stimuli to a number of scan chains. The test responses are then compacted in a MISR. At the end of the test session the signature is verified. The design
under test often needs to be specifically prepared for BIST. It involves insertion of
test points to target random pattern resistant faults and application of X-bounding logic to prevent propagation of unknown states to the MISR. BIST was
expected to eliminate the need for test vector development, to allow for the use of
inexpensive ATE, and to enable robust at-speed testing. However, with the development of advanced automatic test pattern generation (ATPG) methods, test quality
offered by BIST was often unable to meet the growing requirements. Despite many
research efforts aimed at reducing the shortcomings of BIST, either area overhead,
fault coverage or diagnostic capabilities were unacceptable to fully replace deterministic test. Still, however, BIST is a vital tool for in-field testing of mission-critical applications. Moreover, it is very well suited for testing well-structured devices, hence its wide adoption in memory test.
Figure 1.3: STUMPS architecture.
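As an illustration of the pseudo-random source itself, the sketch below models a small Fibonacci LFSR in Python; the 16-bit feedback polynomial (x^16 + x^14 + x^13 + x^11 + 1) and the seed are assumptions made only for this example and are not taken from the thesis.

```python
class LFSR:
    """Minimal Fibonacci LFSR used as a stand-in PRPG."""
    def __init__(self, seed=0xACE1, taps=(16, 14, 13, 11)):
        assert seed != 0, "an all-zero state would lock the LFSR"
        self.state = seed
        self.taps = taps
        self.width = max(taps)

    def next_bit(self):
        # XOR the tapped stages to form the feedback bit, then shift it in.
        fb = 0
        for t in self.taps:
            fb ^= (self.state >> (t - 1)) & 1
        self.state = ((self.state << 1) | fb) & ((1 << self.width) - 1)
        return fb


prpg = LFSR()
stimulus = [prpg.next_bit() for _ in range(32)]   # bits feeding one scan chain
print(stimulus)
```

In a STUMPS-like setup, one such generator would feed the parallel scan chains, while a MISR compacts the responses.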
A new emerging technology that defused the test cost crisis in the early 2000s
was test data compression [9], [87], [96]. It leverages the low fill rates of ATPG test
patterns to achieve over 100x reduction of test data volume. A general architecture of a typical compression scheme is shown in Figure 1.4. It uses additional on-chip hardware to decompress the ATE-stored test data before they are fed to scan chains, and to compact test responses [10], [69], [72] before they are sent back to the ATE for verification. As the number of decompressor input channels can be much smaller than the number of its outputs, the scan chain length can be decreased, thus reducing test application time.

Figure 1.4: Test data compression and compaction.

In contrast to the typical BIST approach, test compression applies ATPG-generated deterministic test patterns and thus achieves very high test quality. Moreover, since test compression is typically built on top of a standard scan-based infrastructure, its adoption for a wide range of integrated circuits was relatively straightforward.
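As a toy illustration of the principle (not the particular decompressor used later in this thesis), the sketch below expands two tester channel bits per shift cycle onto six scan chain inputs through a fixed XOR network; the spreading matrix is an arbitrary assumption. Because ATPG test cubes specify only a small fraction of scan cells, the specified positions translate into a sparse set of linear constraints on the channel data that is usually easy to satisfy, which is what enables the large compression ratios; practical decompressors also contain sequential state so that the constraints of one cube can be spread over many shift cycles.

```python
# Toy combinational decompressor: each scan chain input is the XOR of a fixed
# subset of the c = 2 tester channel bits (the mapping is illustrative only).
SPREAD = [(0,), (1,), (0, 1), (0,), (1,), (0, 1)]

def expand(channel_bits):
    """One shift cycle: map c compressed bits onto len(SPREAD) scan chain inputs."""
    return [sum(channel_bits[j] for j in taps) % 2 for taps in SPREAD]

print(expand([1, 0]))   # -> [1, 0, 1, 1, 0, 1]: six chain bits from two channel bits
```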
The economic impact of test data compression is shown in Figure 1.5 as a comparison of two averaged trends represented by a revenue per transistor and an ATE
cost per transistor in US Dollars [91]. The technological progress is expressed here
as a cumulative number of manufactured transistors in the integrated circuits. The
realignment of both curves after introduction of test compression is clearly visible.
Since then, the relation between manufacturing cost and test cost has remained stable
for the following decade.
1.3 Power consumption during test
Test data volume and test application time are the major contributors to the cost of
test. However, reliable testing must also consider power consumption during test.
In complementary metal-oxide semiconductor (CMOS) devices, power dissipation
comprises the static and dynamic components. Static consumption occurs when all
circuit states and inputs are held at the same level. It includes subthreshold conduction between the transistor source and drain, tunneling currents through the gate
insulator and leakage current through the reverse-biased parasitic diodes between
diffusion regions and the substrate.

Figure 1.5: Impact of test data compression on test cost (IC revenue per transistor and ATE cost per transistor, in USD on a logarithmic scale, plotted against the cumulative number of manufactured IC transistors).

It is, however, the dynamic component, related to state changes, that dominates the overall consumption of a typical CMOS
device. The state switching has an impact on the short-circuit current which is present when two complementary transistors turn on briefly at the same time, as well
as the switching current required to charge the capacitances of the internal or external nodes. As a result, the number of state transitions within a circuit is directly
proportional to the power consumption.
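This proportionality is usually summarized by the textbook first-order expression for dynamic power (quoted here only as a reminder; the formula is not part of the original text):

\[
P_{\mathrm{dyn}} \approx \alpha \, C_L \, V_{DD}^{2} \, f ,
\]

where \(\alpha\) is the switching activity factor (the average number of transitions per node per clock cycle), \(C_L\) the switched load capacitance, \(V_{DD}\) the supply voltage, and \(f\) the clock frequency.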
The emphasis on power consumption of electronic products, driven by the large number of battery-powered mobile devices, wireless sensor networks, and biomedical applications, but also by high-performance server processors, has led to the development of various low-power IC design techniques. During test, however, the pseudo-random nature of BIST, as well as the commonly used random fill of unspecified values in deterministic test vectors, significantly elevates the switching activity.
Full-toggle scan patterns may dissipate several times higher power when compared
to functional operation (up to threefold increase was observed in [117]), and this
trend continues to grow. As a result, test power exceeding the mission mode limits has been a concern for years. The test-induced
power issues are a result of multiple phenomena. Power consumption may rise as
clock gating or clock and voltage scaling facilities are often disabled to improve the
quality of test. The uncorrelated structural test vectors often ignore the functional
power constraints and apply the non-functional clocking procedures. Moreover,
since the essence of the DFT techniques is to stress the circuit and stimulate as many nodes as possible in the shortest time, very often multiple system on a chip (SoC)
cores are tested in parallel, while the circuit under test (CUT) may not be designed
to function properly in such scenarios. Also, as structural test vector sets can be unaware of the legal states of the tested circuit, they can cause, in extreme cases,
short circuits between the power supply and ground as a result of, for example, a
bus contention. This in turn can permanently damage a device.
Another source of issues related to power consumption during test is the interface between the CUT and the tester power supply, which might provide limited power
availability and quality. Various types of power noise (power droop) are associated
with the dynamics of the power demands and are caused by the parasitic components of interconnects and power delivery networks. They may lead to voltage drop
or rise in the whole circuit or in some specific areas. In at-speed testing the IR-drops
on the power grid increase path delays. Therefore, power delivery issues can cause
a functional device to act faulty, thus producing a yield loss.
The most common metrics that describe the power dissipation during test are
energy, peak power, and average power. Energy is the total switching activity recorded throughout the test session and is important for battery-powered self-test
scenarios. Peak power is the highest power value generated at a given instant. Elevated peak power over a number of cycles can lead to permanent damage, severe
reliability issues or product lifetime degradation. Average power describes the power dissipated on average over the test session. High average power increases the temperature of the CUT, which, when the heat is not properly dissipated, causes hot spots and structural
damage in the silicon or bonding wires. Moreover, the increased and non-uniformly
distributed die temperature also affects the carrier mobility, which can manifest itself as delay faults observable only during test.
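Given a per-cycle trace of switching activity (for example, weighted switching activity reported by a simulator), the three metrics can be computed as in the following sketch; the function name and the trace values are invented for illustration.

```python
def test_power_metrics(activity_per_cycle):
    """Energy, peak power, and average power of a test session derived from a
    per-cycle switching activity trace (arbitrary activity units)."""
    energy = sum(activity_per_cycle)               # total activity of the session
    peak = max(activity_per_cycle)                 # highest instantaneous value
    average = energy / len(activity_per_cycle)     # mean over the whole session
    return energy, peak, average


# Illustrative trace: low shift activity with a brief burst around capture.
trace = [12, 14, 11, 13, 55, 12, 10, 14]
print(test_power_metrics(trace))                   # -> (141, 55, 17.625)
```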
1.4 DFT prospects
The advances in manufacturing processes enable successful deployment of subsequent technology nodes. Recent innovations include the shift to three dimensional
transistors (e.g., FinFET) to overcome the limitations of electrostatics and the short-channel effect, the use of low-k inter-metal dielectrics or air-gapped interconnects
to reduce parasitic capacitances, novel doping techniques and many more [82]. This
evolution introduces, however, new test challenges. In particular, the dramatic reduction of feature sizes leads to the increase of defect types that are not targeted by
the common fault models. As a result, many defective parts escape testing. One of
the solutions is to shift away from the abstract fault models to the more fine-grained
techniques. For example, cell aware test [43] enables post-layout transistor-level
ATPG. The new fault modeling methods are essential to detect defects in the contemporary process nodes. However, they come at the price of elevated pattern
counts which directly translates into the ATE volume and test application time. As
the demand for the increased tester memory is reduced due to test data compression and advances in the ATE industry, the test application time becomes the main
contributor to the overall test cost. For example, the number of cell-aware defects
is, on the average, four times greater than the number of stuck-at faults, which translates into over 50% more test patterns.
The requirements for the new silicon test technologies are also shaped by the
safety-critical applications, such as automotive or medical. They often call for a 0 defects per million (DPM) quality goal, which requires virtually 100% fault coverage. It is worth noting that in order to cover the final few percent of the faults, the number of test patterns often needs to be doubled or even tripled. Obviously, for non-critical applications, going beyond the point of diminishing returns is often not necessary. However, with 2.5D or 3D integrated circuits, where silicon dies are interconnected through an interposer or through-silicon vias, the DPM levels need to
be very low. It is crucial to reject faulty dies before they are subject to an expensive
integration and packaging.
With the large number of low-power integrated circuits produced worldwide, the
power consumption during test is another significant issue. Multiple power-management techniques are employed to meet power margins during the operation of
the device. During scan-based test, however, as described in the previous section, the
switching activity associated with shifting and applying the test data often exceeds
the limits which the circuit under test was designed to function under. The resulting
power delivery and thermal issues may lead to malfunctioning of the device during
test, thus reducing yield. Moreover, the device reliability can degrade, causing a shorter lifetime. Eventually, testing-induced silicon hot spots can permanently
damage the device.
1.5 Motivation
There are a number of challenges for testing that emerge with current and future
technology nodes. The new fault models that are developed to sustain the low defect
per million (DPM) rates of modern VLSI circuits can greatly increase the number of
test patterns and consequently the costly tester time. Moreover, lifetime reliability
is a key requirement for the increasing number of critical applications. Also, power
consumption during test requires thorough control to prevent test-induced yield
loss and device degradation. As a result, new DFT techniques need to reduce the ATE
time, provide high-quality in-field testing capabilities, and allow accurate power control during test application.
One approach to tackle these requirements is a synergistic merge of BIST with
test compression. With both capabilities available, the tradeoff between cost and
quality can be easily adjusted depending on a product type. For example, BIST can
be used for burn-in testing or at the selected stages of the manufacturing process.
Since there is no need for sophisticated ATE when BIST is applied, the testing cost is
reduced. On the other hand, when the highest quality is needed, compressed top-up patterns can be applied from an ATE or on-chip memory. Moreover, the BIST
module can be used in-field for online or offline testing in order to ensure correct
functioning throughout the chip lifetime. If both capabilities – BIST and compression
– could be realized by shared hardware, the area overhead would be minimal. Supported by dedicated software, this flow can become easily adoptable.
Recently, various solutions addressing the aforementioned problems were proposed by the author [30], [71], [73], [74], [75], [76], [77], [78], [79], and are gathered
herein with a comprehensive description and analysis. The thesis is organized as
follows. Chapter 2 describes the new low-power PRESTO scheme. Depending on the
requirements for test quality it can operate in one of three modes: a standalone
PRPG, a PRPG supported by a minimal amount of deterministic test data to improve
fault coverage gradient, and a hybrid approach that combines the advantages of
BIST and test data compression. The scheme is accompanied by a number of dedicated algorithms for automatic selection of setup values, generation of deterministic
control to improve fault coverage, and compaction of the ATPG-generated test vectors. In all operational modes it allows for precise selection of toggling rates. Chapter
3 is devoted to a fully deterministic BIST solution that stores compressed test data
on-chip. The approach extends the typical compression techniques to minimize test
9
data volume while preserving the complete fault coverage. Alternatively, given a
memory usage it allows for faster test coverage ramp-up and returns higher test
coverage compared to the reference compression scheme. Finally, a novel testing
paradigm called TestExpress is presented in Chapter 4. It is a scan-based approach
that allows for time-efficient application of deterministic tests in a test-per-clock
fashion. Every scan chain can be dynamically configured into one of three operational modes: stimuli, compaction or functional. Specialized algorithms for test generation, selection of scan chain modes, and scan chain stitching are implemented to
enable comprehensive application of the proposed technique. Additionally, TestExpress offers inherent low-power capabilities by setting most of the state elements of
a DUT to the functional mode. The chapter includes discussion of various practical
aspects regarding the proposed scheme. The thesis concludes with Chapter 5. Every
solution proposed in this thesis is thoroughly verified with experimental results obtained for large and complex industrial ASIC designs.
Chapter 2
Low-power hybrid BIST
In the following sections, a PRPG for low-power BIST applications is proposed. The
generator primarily aims at reducing the switching activity during scan loading due
to its PRESelected TOggling (PRESTO) levels. It can assume a variety of configurations that allow a given scan chain to be driven either by the PRPG itself or by a constant value fixed for a given period of time. Not only does the PRESTO generator allow loading scan chains with patterns having low transition counts, and thus significantly reduced power dissipation, but it also enables fully automated selection
of its controls such that the resultant test patterns feature desired, user-defined toggling rates. This flexible programming can be further used to produce tests superior
to conventional pseudorandom vectors with respect to the resultant fault-coverage-to-test-pattern-count ratio. Then, it is shown that the PRESTO generator can also
successfully act as a test data decompressor, thus allowing one to implement a hybrid test methodology that combines Logic BIST (LBIST) and ATPG-based embedded test compression. This is the first low-power (LP) test compression scheme that
is integrated in every way with the BIST environment and lets designers shape the
power envelope in a fully predictable, accurate, and flexible fashion. As a result, it
creates an environment that can be used to arrive at an efficient hybrid solution
combining advantages of scan compression and logic BIST. Moreover, both techniques can complement each other to address, for example, a voltage drop caused
by a high switching activity during scan testing, constraints of at-speed ATPG-produced test patterns, or new fault models.
2.1 State of the art
Power reduction techniques for scan-based test have become a vital research area
[38]. The ad-hoc solutions implemented by the industry include oversizing the
power supply networks or packages, as well as decreasing the shift frequency or
introducing breaks in the test procedure. Numerous solutions were devised for general scan-based testing or specifically for BIST applications in order to keep the
switching activity below a given threshold. One of the most significant contributors
to the power consumption during test is shifting pseudorandom data through the scan chains. The multitude of state transitions propagate not only within scan chains but also through the combinational logic. Preventing such propagation, by modifying
scan cell behavior, leads to reduction of test power. This can be achieved by inserting
gating logic between scan cell outputs and logic they drive [45], [84], [115]. A more
area-friendly design has been proposed in [11], where a single power supply gating
transistor is added to the first-level gates at the output of scan cells. A test vector
inhibiting scheme of [36], as well as the pattern suppression technique of [33], mask
test patterns generated by an LFSR, as not all vectors of the often very lengthy produced sequences
detect faults. Elimination of such tests can reduce switching activity with no impact
on fault coverage. For an increased granularity, a synergistic test power reduction
method of [116] uses available on-chip clock gating circuitry to selectively disable
scan chains while employing test scheduling and planning to further decrease BIST
power in the Cell processor. Supported by a test planning technique that manages
the configuration of the scan clock gating logic, a pseudorandom BIST can obtain 40-60% power reduction [49]. Also, partitioning the scan path into a number of segments which are shifted one after the other while the inactive chains are not clocked
can keep the power consumption low [12], [108].
The advent of low-transition test pattern generators (TPG) has added a new dimension to power aware BIST solutions [95]. The hybrid cellular automata based
generators can be optimized for power consumption by selection of their non-linear
versions [22]. For example, a device presented in [106] employs an LFSR to feed
scan chains through biasing logic and a T-type flip-flop. Since this flip-flop holds the
previous value until its input is asserted, the same value is repeatedly scanned into
scan chains until the value at the output of biasing logic (e.g., a k-input AND gate)
becomes 1. Depending on k, one can significantly reduce the number of transitions
occurring at the scan chain inputs. The adaptive version of this approach, with an
additional feedback signal from scan chains to the TPG, has been proposed in [68].
A dual-speed LFSR of [105] consists of two LFSRs driven by normal and slow clocks,
respectively. The switching activity is reduced at the circuit inputs connected to the
slow-speed LFSR, while the whole scheme still ensures satisfactory fault coverage.
Another approach uses mask patterns to mitigate the switching activity in LFSR-produced patterns, as shown in [92], whereas the bit swapping of [2] achieves the same
goal by equipping the LFSR with additional multiplexers. A gated LFSR clock of [37]
allows activating only half of LFSR stages at a time. It cuts power consumption as
only half of the circuit inputs change every cycle. Combining the low transition generator of [106] (handling easy-to-detect faults) with a 3-weight PRPG (detecting
random pattern resistant faults) can reduce BIST switching activity and provide
good fault coverage, as demonstrated in [104]. The scheme of [54] monitors the
number of LFSR transitions and suppresses them before feeding the scan chain
whenever a specified transition limit is exceeded. Combining a bipartite LFSR technique with bit injection [99] can be used to generate highly correlated, and thus low-toggling, test patterns for testing both combinational and sequential circuits. Finally,
a random single-input change generator can produce low power patterns in a parallel BIST environment, as shown in [13].
2.2 Low power random test generator
2.2.1 The basic architecture
Figure 2.1 illustrates the basic structure of a PRESTO generator. An n-bit PRPG connected with a phase shifter feeding scan chains forms a kernel of the generator producing the actual pseudorandom test patterns. A linear feedback shift register or a
ring generator can implement a PRPG. More importantly, however, n hold latches
are placed between the PRPG and the phase shifter. Each hold latch is individually
controlled via a corresponding stage of an n-bit toggle control register. As long as its
enable input is asserted, the given latch is transparent for data going from the PRPG
to the phase shifter, and it is said to be in the toggle mode. When the latch is disabled,
it captures and saves, for a number of clock cycles, the corresponding bit of PRPG,
thus feeding the phase shifter (and possibly some scan chains) with a constant value.
Figure 2.1: The basic architecture of PRESTO generator.
It is now in the hold mode. It is worth noting that each phase shifter output is obtained by XOR-ing outputs of three different hold latches. Therefore, every scan
chain remains in a low-power mode provided only disabled hold latches drive the
corresponding phase shifter output.
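A behavioral sketch of this data path is given below; the register sizes, the control register contents, and the phase shifter taps are arbitrary assumptions chosen only to illustrate the mechanism, and a pseudorandom source stands in for the ring generator.

```python
import random

n = 8                                           # number of PRPG stages / hold latches
control = [1, 0, 0, 1, 0, 0, 1, 0]              # toggle control register: 1 = toggle mode
latches = [0] * n                               # current hold latch contents
phase_shifter = [(0, 3, 6), (1, 4, 7), (2, 5, 6), (0, 2, 4)]   # 3 latches per output

def shift_cycle():
    prpg = [random.getrandbits(1) for _ in range(n)]    # stand-in for the ring generator
    for i in range(n):
        if control[i]:                          # transparent latch passes PRPG data
            latches[i] = prpg[i]
        # a disabled latch keeps feeding its last captured value
    return [latches[a] ^ latches[b] ^ latches[c] for a, b, c in phase_shifter]

# Scan chain 1 is driven by latches 1, 4, and 7, which are all disabled here,
# so it receives a constant value for the whole pattern (a low-power chain).
stimuli = [shift_cycle() for _ in range(6)]
print([cycle[1] for cycle in stimuli])          # the same bit in every cycle
```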
As mentioned above, the toggle control register supervises the hold latches. Its
content comprises 0s and 1s, where 1s indicate latches in the toggle mode, thus
transparent for data arriving from the PRPG. Their fraction determines the scan
switching activity. The control register is reloaded once per pattern with the content
of an additional shift register. The enable signals injected into the shift register are
produced in a probabilistic fashion by using the original PRPG with a programmable
set of weights. The weights are determined by four AND gates producing 1s with the
probability of 0.5, 0.25, 0.125, and 0.0625, respectively. The OR gate allows choosing
probabilities beyond simple powers of 2. A 4-bit register Switching is employed to
activate AND gates, and allows selecting a user-defined level of switching activity.
For example, the switching code 2 (or 0010 in binary representation) will set to 1,
on the average, 25% of the control register stages, and thus 25% of hold latches will
be enabled. Given n weights determined by n AND gates, the probabilities of injecting 1 into the shift register can be described using an inclusion-exclusion principle
formula for the probabilities:

\[
P_S \;=\; \sum_{k=1}^{n} (-1)^{k-1} \sum_{\substack{I \subseteq \{1,\dots,n\} \\ |I| = k}} \frac{\prod_{i \in I} \rho_i(S)}{2^{\sum_{i \in I} i}} \,, \qquad (1)
\]
where the internal sum iterates over all subsets of the set of indices of size k, and ρ_i(S) equals 1 if the digit at index i in the binary representation of S is 1, and 0 otherwise. The resulting probabilities for the 4-AND-gate scheme shown in Figure 2.1 are gathered in Table 2.1. Given the phase shifter structure, one can then assess the
number of scan chains receiving constant values, and thus the expected toggling ratio.
Table 2.1: The probabilities of injecting 1 into the shift register.

Switching register value    Probability of injecting a 1
 1    0.5
 2    0.25
 3    0.625
 4    0.125
 5    0.5625
 6    0.34375
 7    0.671875
 8    0.0625
 9    0.53125
10    0.296875
11    0.6484375
12    0.1796875
13    0.58984375
14    0.384765625
15    0.69238281
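The entries of Table 2.1 follow directly from formula (1). The short Python sketch below evaluates it by enumerating the subsets I; it is only an illustration of the formula, not software used in the thesis.

```python
from itertools import combinations

def injection_probability(s, n=4):
    """Formula (1): probability of injecting a 1 into the shift register for
    switching code s, with n weight-generating AND gates."""
    rho = [(s >> (i - 1)) & 1 for i in range(1, n + 1)]      # rho_i(S) for i = 1..n
    p = 0.0
    for k in range(1, n + 1):
        for subset in combinations(range(1, n + 1), k):
            term = 1
            for i in subset:
                term *= rho[i - 1]                            # product of rho_i(S), i in I
            p += (-1) ** (k - 1) * term / 2 ** sum(subset)
    return p


for s in range(1, 16):
    print(s, injection_probability(s))    # reproduces Table 2.1, e.g. 3 -> 0.625
```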
An additional 4-input NOR gate (Figure 2.1) detects the switching code 0000
which is used to switch the LP functionality off. It is worth noting that when working
in the weighted random mode, the switching level selector ensures statistically stable content of the control register in terms of the number of 1s it carries. As a result,
roughly the same fraction of scan chains will stay in the LP mode, though a set of
actual low toggling chains will keep changing from one test pattern to another. It
will correspond to a certain level of toggling in the scan chains. With only 15 different switching codes, however, the available toggling granularity may render this solution too coarse to be always acceptable. Moreover, with low toggling rates selected, the majority of scan chains would hold a constant value throughout the whole pattern, which can result in an excessive reduction of fault coverage. The following section presents additional features that make the PRESTO generator fully operational over a wide range of desired switching rates.
2.2.2 Fully operational generator
Much higher flexibility in forming low-toggling test patterns can be achieved by deploying the scheme presented in Figure 2.2. Essentially, while preserving the operational principles of the basic solution of Figure 2.1, this approach splits up the shifting period of every test pattern into a sequence of alternating hold and toggle intervals. To move the generator back and forth between these two states, a T-type flip-flop is used that switches whenever there is a 1 on its data input. If it is set to 0, the
generator enters the hold period with all latches temporarily disabled regardless of
the control register content. This is accomplished by placing AND gates on the control register outputs to allow freezing of all phase shifter inputs. This property can
be crucial in SoC designs where only a single scan chain crosses a given core, and its
abnormal toggling may cause locally unacceptable heat dissipation that can only be
reduced by temporary hold periods. If the T flip-flop is set to 1 (the toggle period), then the latches enabled through the toggle control register can pass test data
moving from the PRPG to the scan chains.
Two additional parameters kept in 4-bit Hold and Toggle registers determine
how long the entire generator remains either in the hold mode or in the toggle mode,
respectively. To terminate either mode, a 1 must occur on the T flip-flop input. This
weighted pseudorandom signal is produced in a manner similar to that of weighted
logic used to feed the shift register. The T flip-flop also controls four 2-input multiplexers routing data from the Toggle and Hold registers. This allows selecting the source
of control data that will be used in the next cycle to possibly change the operational
mode of the generator. For example, when in the toggle mode, the input multiplexers
observe the Toggle register. Once the weighted logic outputs 1, the flip-flop toggles,
and as a result all hold latches freeze in the last recorded state. They will remain in this state until another 1 occurs on the weighted logic output. The random occurrence of this event is now related to the content of the Hold register, which determines when to terminate the hold mode.

Figure 2.2: Fully operational version of PRESTO.
A scan switching profile when deploying the PRESTO generator in a hypothetical
environment with 15 scan chains is illustrated in Figure 2.3 for two test patterns.
Blue (0s) and red (1s) stripes make up the low-power (low-toggling) portions of the pattern, while gray
areas correspond to periods of toggling. All-blue and all-red scan chains are fed by
the constant values only. Note that their quantity does not change between patterns
though they are not exactly the same in each case. As can be seen, test patterns are
divided into hold and toggle intervals of random length, while LP scan chains remain
still for the entire duration of a single test pattern.
Figure 2.3: Switching activity in scan chains.
In scan-based testing, the overall power demand can be assessed by the number
of transitions passing through a scan chain while shifting. The switching activity
during the shift-in operation depends on the relative position of the transition in a
test pattern. Clearly, the transition that needs to propagate throughout the whole
scan chain invokes many more toggles than a transition at the scan cells close to the input. The Weighted Transition Metric (WTM) counts the transitions in scan cells taking into account their relative positions [94], and is defined as follows:

\[
P \;=\; \frac{2}{d(d-1)} \sum_{i=1}^{d-1} (d - i)\,(b_i \oplus b_{i+1}) \,, \qquad (2)
\]

where d denotes the length of the scan chain, and T = b_d ... b_2 b_1 represents a test vector with bit b_k scanned in before b_{k+1}. The power dissipated during test application is the average of the above results over all scan chains. Typically, a randomly generated test pattern results in a 50% WTM. Although the above formula describes the shift-in operation only, it has been shown to have good correlation with the power consumption during capture and test response shift-out.
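Equation (2) translates directly into the short function below (an illustrative sketch only; vectors are listed in scan-in order, so element 0 corresponds to b_1, the bit scanned in first).

```python
import random

def wtm(bits):
    """Normalized weighted transition metric of one scan-in vector, equation (2)."""
    d = len(bits)
    weighted = sum((d - i) * (bits[i - 1] ^ bits[i]) for i in range(1, d))
    return 2.0 * weighted / (d * (d - 1))


print(wtm([0, 0, 0, 0]))                                   # 0.0 - no transitions
print(wtm([0, 1, 0, 1]))                                   # 1.0 - a transition at every position
print(wtm([random.getrandbits(1) for _ in range(128)]))    # about 0.5 on average
```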
The actual switching activity results are presented in Figure 2.4. They were obtained from Monte Carlo simulations run for a PRESTO generator employing a 32-bit ring generator feeding 128 scan chains, each 128 bits long, and using the switching value S=1, i.e., injecting 1s into the shift register with the probability of 0.5. In other words, the average number of 1s within the shift and control register was 16. The diagram consists of 7 individual curves corresponding to 7 different values of the toggle (T) parameter. Finally, the switching activity is plotted against the hold (H) parameter, ordered by the probability of injecting 1 into the T flip-flop (as shown
in Table 2.1). As can be seen, the PRESTO generator offers a comprehensive range
of switching rates one can choose from. For example, if T = H, i.e., when the durations
of Toggle and Hold periods are stochastically equal, then the switching activity codes
determine certain toggling levels that can be regarded as reference values (here
roughly 22%). As can be seen, for a given T curve, increasing the probability associated with the H value, i.e., the probability of ending the Hold phase, results in increased average switching activity. Thus, with the help of parameters H and T one
can migrate either up or down from these thresholds to arrive with a desired
amount of toggling, while trading-off other performance-related characteristics
such as state coverage, average and peak power consumption, fault coverage, and
test length, as discussed in the next sections.
When using the PRESTO generator with an existing DFT flow, all LP registers are
either loaded once per test or every test pattern. The registers loaded only once act
as test data registers or are parts of an IEEE P1687 [48] network, and are initialized
by the test setup procedure. They are triggered using a slow scan shift clock and
operate at a very low speed thereby imposing no timing constraints. Although the
remaining registers are loaded once per test pattern (also at the scan shift speed),
timing is not compromised because of shallow logic generating bits to be loaded serially into the registers. With the help of shadow registers, values remain unchanged
during capture. Clearly, it suits LBIST applications where the shift speeds are quite
high.
Figure 2.4: Switching activity for S=1 (WTM [%] plotted against the hold code H, one curve per toggle code T).
2.2.3 Experimental results
This section presents experimental results obtained for the PRESTO generator shown in Figure 2.2 and eight industrial designs whose characteristics are given in Table 2.2. For each test case, the table provides the number of gates, the number of scan chains, and the size of the longest scan chain. Furthermore, the column TC reports the resultant test coverage after applying 128k purely pseudorandom test patterns produced by the PRESTO generator with all its low power features disabled. The next column (EP) lists the corresponding number of test patterns that effectively contributed to that level of fault coverage. Finally, the last two columns provide the weighted transition metric (WTM load) for scan shift-in operations and the weighted switching activity (WSA) during the capture operation.
Table 2.2: Circuit characteristics – 128k patterns.

Design   Gates   # scans   Longest chain   TC [%]   EP       WTM load   WSA [%]
D1       590k    35        686             90.59    7,583    49.84      21.80
D2       830k    84        416             91.07    13,161   49.75      25.04
D3       500k    128       353             85.36    9,362    49.71      18.94
D4       1.4M    160       541             93.06    10,688   49.84      15.05
D5       1.3M    203       300             91.18    17,066   49.67      22.51
D6       220k    122       104             92.63    3,450    49.06      15.27
D7       1.9M    524       258             85.89    19,929   49.60      28.42
D8       3.6M    104       3,218           84.51    16,458   49.98      11.98
As can be seen, WTM remains close to 50%, as typically observed in scan vectors produced in a pseudorandom fashion.
The primary objective of the experiments was to measure test coverage as a function of several parameters, including:
• the number of test patterns,
• the switching code,
• the duration of the Toggle (T) period, and
• the duration of the Hold (H) period.
The actual results are presented in Table 2.3 and Table 2.4 for the industrial designs of Table 2.2. In all experiments reported in the remaining part of this section, the PRESTO generator with a 32-bit ring generator producing 128k pseudorandom test patterns in a low power mode was used. Table 2.3 is vertically partitioned into three sections corresponding to the following target toggling rates: 5%, 10%, and 15%. Four different switching activity codes as well as appropriate parameters H and T decide on approximate values of switching rates, as shown in the previous sections. The actual toggling observed during the experiments ended up within the following intervals: (3.92 – 5.88) for the 5% target, (9.12 – 10.56) for the 10% target (with a few exceptions going down to 7%), and (14.16 – 16.51) for the 15% target. All the PRESTO setup data deployed during the experiments are displayed in the table header. The columns of Table 2.3 list the fault coverage for successive test cases. As can be seen, the resultant fault coverage remains close to the reference coverage reported in Table 2.2, while the switching activity is significantly reduced in all examined cases.
Note that several experimental results indicate higher fault coverage if the scan
chains receive the low toggling patterns rather than conventional pseudorandom
vectors. Even if this is a circuit-specific feature, it nevertheless appears to be the case
across several designs.
The objective of the analysis summarized in Table 2.4 was to determine the impact of the low power test generator performance on the pattern count. In other words, we would like to assess how long it takes to match the fault coverage of purely pseudorandom test patterns (shown in the TC column of Table 2.2) with vectors produced by the PRESTO generator. Let L(p) and R(p) denote fault coverage obtained
by applying p low toggling and purely random test patterns, respectively. Clearly,
there are two possible scenarios: either L(p) < R(p) or L(p) > R(p). In the first case,
we can assess a pseudorandom test length q to get fault coverage L(p), where q < p.
The other case is symmetrical: we need to find the number of low power test patterns r that suffice to match fault coverage R(p), where r < p. The entries of Table
2.4, corresponding directly to those of Table 2.3, are ratios v which (depending on
one of the above scenarios) are either equal to p/q or r/p. Clearly, v < 1 indicates
cases where a low power test is shorter than its random counterpart. If v > 1, then
the presented values are indicative of how many additional low power test patterns
must be applied to obtain R(p). The two horizontal segments of Table 2.4 present results
for two values of p: 16k and 128k. As an example, the entry 3.01 for design D6, 16k
vectors, and WTM = 10% indicates that the resultant fault coverage due to 16k low
toggling test patterns can be reached three times faster by using pseudorandom
tests. On the other hand, the entry 0.52 for design D2 and otherwise similar conditions indicates that low power tests can offer the same fault coverage as that of 16k
Table 2.3: Fault coverage – 128k low toggling test patterns.

         WTM ≈ 5%                              WTM ≈ 10%                             WTM ≈ 15%
      S:0001   S:0010   S:0100   S:1000   S:0001   S:0010   S:0100   S:1000   S:0001   S:0010   S:0100   S:1000
      H:4 T:1  H:6 T:3  H:5 T:3  H:6 T:5  H:4 T:2  H:6 T:4  H:4 T:4  H:2 T:3  H:3 T:2  H:3 T:3  H:3 T:5  H:1 T:1
D1    88.63    87.45    88.36    87.37    89.72    88.83    89.87    90       90.35    90.41    90.22    89.96
D2    90.05    89.92    90.34    89.74    91.49    91.14    91.66    91.47    91.96    91.97    91.87    91.52
D3    85.62    84.67    85.46    84.5     86.23    85.31    86.29    86.11    86.12    86.17    86.47    85.6
D4    85.97    85.68    86.44    85.63    89.91    88.35    90.3     89.82    91.84    92.07    91.41    89.35
D5    85.57    85.51    86.34    86.32    87.96    87.46    88.88    88.7     89.43    89.85    89.63    88.21
D6    89.95    90       89.99    89.98    90.98    90.71    91.2     91.08    91.62    91.68    91.76    90.98
D7    81.91    81.4     82.04    81.65    84.07    83.33    84.55    84.55    85.24    85.57    85.39    84.07
D8    83.38    82.56    83.09    82.54    84.39    83.58    84.49    84.43    84.89    85.04    84.91    84.48
Table 2.4: Test pattern count vs random vectors.

         WTM ≈ 5%                              WTM ≈ 10%                             WTM ≈ 15%
      S:0001   S:0010   S:0100   S:1000   S:0001   S:0010   S:0100   S:1000   S:0001   S:0010   S:0100   S:1000
      H:4 T:1  H:6 T:3  H:5 T:3  H:6 T:5  H:4 T:2  H:6 T:4  H:4 T:4  H:2 T:3  H:3 T:2  H:3 T:3  H:3 T:5  H:1 T:1
After 16k test patterns
D1    13.16    27.78    17.86    27.78    6.41     15.63    6.41     4.81     2.23     2.17     4.1      3.97
D2    1.08     1.59     1.34     2.29     0.46     0.85     0.58     0.52     0.29     0.28     0.46     0.57
D3    1.45     2.4      1.48     2.78     1.04     1.69     0.96     0.94     0.85     0.76     0.83     1.01
D4    5.81     8.33     7.35     8.93     3.33     5.43     3.33     3.73     2.12     2.02     2.48     3.33
D5    13.16    17.86    13.16    14.71    6.41     9.62     4.63     4.9      3.62     2.84     3.05     5.1
D6    4.72     6.25     5.81     6.58     2.94     4.31     2.98     3.01     2.23     1.91     1.79     2.63
D7    6.58     8.62     7.35     8.93     3.42     5.21     3.25     3.25     1.84     1.77     1.95     2.81
D8    2.23     3.38     2.81     4.03     1.58     2.55     1.66     1.53     1.14     1.12     1.28     1.32
After 128k test patterns
D1    25.97    51.28    32.26    54.05    9.13     22.22    7.35     5.88     2.24     1.98     3.44     6.49
D2    2.28     2.44     1.81     2.79     0.59     0.96     0.47     0.55     0.22     0.18     0.26     0.51
D3    0.9      1.47     0.95     1.62     0.65     1.04     0.64     0.69     0.67     0.63     0.57     0.86
D4    10.87    11.49    9.8      11.63    4.27     6.37     3.75     4.41     2.1      1.86     2.53     4.96
D5    24.39    28.57    19.61    19.61    9.01     12.2     5.45     5.83     4.03     2.75     3.34     7.87
D6    6.39     6.35     6.39     6.39     3.19     3.8      2.59     2.84     1.91     1.8      1.66     3.19
D7    11.05    13.51    10.47    12.35    3.59     5.54     2.69     2.69     1.64     1.25     1.45     3.59
D8    2.42     3.95     2.91     4.00     1.12     2.11     1.02     1.08     0.77     0.71     0.78     1.03
random patterns in roughly half the test time. One may also observe that for some test cases the ratio v is quite large. This occurs for aggressively low toggling rates (such as 5% in the table) and in some designs where certain groups of faults are much more difficult to detect by means of test patterns with relatively low diversity of binary sequences.
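A small Python sketch (with hypothetical helper names and coverage-curve inputs) showing how the ratios v of Table 2.4 can be derived from two coverage-versus-pattern-count curves:

    def patterns_needed(coverage, target):
        # coverage[i] = fault coverage after i + 1 patterns (non-decreasing);
        # returns the smallest pattern count reaching `target`, or None
        for i, c in enumerate(coverage):
            if c >= target:
                return i + 1
        return None

    def ratio_v(lp_coverage, random_coverage, p):
        # ratio v of Table 2.4 for a pattern budget p (e.g. 16k or 128k)
        lp, rnd = lp_coverage[p - 1], random_coverage[p - 1]
        if lp < rnd:
            q = patterns_needed(random_coverage, lp)   # q < p random patterns
            return p / q
        r = patterns_needed(lp_coverage, rnd)          # r < p low power patterns
        return r / p

Values above 1 therefore indicate that the low power test needs more patterns than its random counterpart, while values below 1 indicate a shorter low power test.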
2.2.4 Automatic selection of control values
As demonstrated by the experimental results of the previous section, PRESTO offers a wide range of control over the toggling rates of the generated test patterns. The actual parameters required to arrive at a preselected switching activity depend on the scan chain configuration and the size of the generator. Moreover, there may exist multiple configurations that result in a similar switching activity. In order to keep the application of the proposed scheme convenient for a wide range of designs, an automatic selection of all control values is necessary. As shown in the previous sections, performance of the PRESTO generator depends primarily on the following three factors (note that in the BIST mode they are delivered only once, at the very beginning of the entire test session): the switching code, the hold duty cycle (HC), and the toggle duty cycle (TC).
Given the size of the PRPG, the number of scan chains, and the corresponding phase shifter, the switching code as well as the HC and TC values can be selected automatically in such a way that the entire generator will produce pseudorandom test patterns having a desired level of toggling T, provided the scan chains are balanced. The procedure of selecting these parameters consists of the following steps.
1. For each switching code S_k, k = 1, …, 15, determine the corresponding probability P_k of injecting a 1 into the shift register (Table 2.1).
2. As can be seen in Figure 2.2, the values P_k obtained in step 1 also determine the probability of asserting the T flip-flop input for each hold (toggle) code S_k, and then the corresponding duration h_k (t_k) of the hold (toggle) duty cycle. Clearly, h_k = t_k = 1 / P_k.
3. Given the size n of the PRPG, determine, for each switching code S_k, the average number n_k of 1s occurring in the control register. As can be easily verified, n_k = P_k × n.
4. For each value of n_k (the number of enabled hold latches), find the average number a_k of active scan chains, i.e., scan chains that are not in the LP mode. This number is determined by the phase shifter architecture, and it also depends on the actual locations of 1s in the control register. Therefore, 1,000 n-bit random combinations having exactly n_k 1s are generated to obtain the number of active scan chains in each case, and finally the number a_k of active scan chains is averaged over all 1,000 samples.
5. Given a desired level of toggling T (%), determine the resultant (hypothetical) number A of active scan chains from the following equation:

   A = (T × C) / 50,   (3)

   where C is the total number of scan chains. The above proportion assumes that if all C scan chains are active, then the resultant toggling is about 50%.
6. For each switching code S_k, and thus the resulting number a_k of active scan chains, determine how many additional scan chains should be disabled. In each case, this quantity is given by d_k = a_k − A. If d_k ≤ 0, then disregard the next steps, as the switching code k does not guarantee even the smallest (required) number of active scan chains.
7. Since disabling extra scan chains cannot be implemented through the control register, this action is carried out by equivalent disabling – with the help of hold duty cycles – of selected cells belonging to active scan chains. The value of d_k is therefore converted into the number of corresponding cells in active scan chains. This approach is depicted in Figure 2.5, where a rectangular scan architecture is considered with S scan chains of length L. The grey area in the top figure represents the d_k scan chains that would need to be disabled, whereas the equivalent number of scan cells in hold phases that cover the same area is shown in the bottom figure. Clearly, we have:

   d_k × L = (a_k − A) × L = a_k × h_k × v,   (h_k + t_k) × v = L,   (4)

   where v accounts for the number of hold (toggle) duty cycles. From the above formulas we get:

   r = h_k / t_k = (a_k / A) − 1.   (5)

8. Ratio r is now evaluated for each value of h_k and t_k (in total 15 × 15 = 225 combinations) to find the best match between the actual value of r and the theoretical value of the expression (a_k / A) − 1.
9. The values of the switching, hold, and toggle codes that yield a ratio r with the smallest deviation from the theoretical value are selected as the PRESTO setup parameters.
Results presented in the next section demonstrate that, despite certain simplifications (in particular, it is assumed that hold and toggle duty cycles are of the same constant size; furthermore, certain cell locations, typically used in computing weighted transition metrics, are not taken into account when looking for active scan cells to be disabled), the above procedure yields controls that allow producing pseudorandom patterns with switching activities tracking very closely the desired levels of toggling.
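To make the procedure concrete, the following minimal Python sketch walks through steps 1–9 under simplifying assumptions: `prob` stands in for Table 2.1, `chain_taps` describes which control-register stages feed each scan chain through the phase shifter, and the function name is hypothetical rather than part of the actual tool flow.

    import random

    def select_presto_controls(prob, chain_taps, n, C, T_target, samples=1000):
        # prob       : {code k: P_k} - probability of injecting a 1 (Table 2.1)
        # chain_taps : list of sets; chain_taps[c] = control-register stages
        #              (phase shifter inputs) feeding scan chain c
        # n          : PRPG / control register size, C : total number of scan chains
        # T_target   : desired toggling level in percent
        A = T_target * C / 50.0                     # step 5: required active chains
        best = None
        for k, Pk in prob.items():                  # steps 1-4: estimate a_k
            nk = round(Pk * n)
            active = 0
            for _ in range(samples):
                ones = set(random.sample(range(n), nk))
                active += sum(1 for taps in chain_taps if taps & ones)
            ak = active / samples
            if ak - A <= 0:                         # step 6: d_k <= 0, code rejected
                continue
            r_theory = ak / A - 1                   # formula (5)
            for kh, Ph in prob.items():             # steps 8-9: all hold/toggle pairs
                for kt, Pt in prob.items():
                    r = (1 / Ph) / (1 / Pt)         # h_k / t_k with h_k = 1/P_h
                    if best is None or abs(r - r_theory) < best[0]:
                        best = (abs(r - r_theory), k, kh, kt)
        return best[1:] if best else None           # (switching, hold, toggle) codes

The sketch mirrors the simplifications noted above: duty cycles of constant average size and active chains estimated by random sampling of control register states.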
Figure 2.5: Computing hold and toggle duty cycles.
2.2.5 Validating experiments
The approach presented in the previous section was validated by experiments run on five different scan architectures (number of scan chains × scan chain length): 203 × 300, 122 × 104, 84 × 416, 128 × 353, and 160 × 541, used in five industrial designs, and with a 33-bit ring generator implementing a primitive polynomial x^33 + x^25 + x^16 + x^8 + 1 and feeding a 33-input phase shifter for 10,000 pseudorandom test patterns. The average toggling rates measured by means of the
weighted transition metric (WTM) are plotted in Figure 2.6 against successive values
of a desired switching activity (the requested toggling rate). The standard deviation
is used to assess a possible dispersion from the average toggling values. Clearly, the
lower the values of the standard deviation, the smaller the spread of toggling activity
with respect to the desired level of switching activity. The plot of Figure 2.6 consists
of four different curves. The central red line represents the average value of the toggling ratio computed over all examined designs and all test patterns for successive
values of the desired (user-selected) toggling rate varying from 1% to 45% in steps
of 1%. Two black lines correspond to standard deviation bounding the average value
curve from the top and the bottom. The last (blue) curve represents maximal values
(averaged over maximal values obtained for all examined designs) recorded for each
toggling rate. As can be seen, the resultant switching activity follows closely, with
small values of standard deviation, the requested rates.
Figure 2.6: Toggling (WTM) for five designs and 33-bit PRPG.
Figure 2.7: Toggling (WTM) for five designs with 32- and 33-bit PRPGs.
Figure 2.7 gathers experimental results similar to those of Figure 2.6 but obtained in a slightly different way. Before plotting the actual values of toggling rates and the remaining statistics, experiments for every single toggling rate were performed for 32- and 33-bit PRPGs (the 32-bit ring generator uses a primitive polynomial x^32 + x^25 + x^15 + x^7 + 1). Note that phase shifters are separately synthesized in each case.
The resultant toggling rates were compared, and switching activity with a smaller
absolute dispersion from the expected value was chosen as the final result. It appears that in certain cases it is preferable to pick a 32-bit PRPG rather than a 33-bit
one, or vice versa. This strategy yields virtually a straight line with respect to toggling rates, as shown in Figure 2.7, hence offering accurate mapping between the
user-selected values of switching activity and the actual circuit response. One can
also observe reduced maximal values and smaller standard deviations in this case.
Table 2.5: Fault coverage – 128k low toggling test patterns.

        Requested WTM
        5%      10%     15%     20%     25%
D1      83.13   84.08   84.29   84.58   84.74
D2      89.13   89.76   90.03   90.10   90.14
D3      85.55   86.21   86.07   86.16   86.52
D4      86.37   88.50   90.20   92.37   92.63
D5      85.61   87.41   88.16   89.64   89.45
D6      89.68   90.97   91.26   91.73   92.07
D7      81.78   83.73   84.56   85.59   85.80
D8      83.53   84.27   84.47   85.25   85.12
The objective of the second group of experiments was to evaluate tests produced
by a 32-bit PRESTO and determine their fault coverage for various requested toggling
rates. The actual results are presented in Table 2.5. Switching activity codes as well
as parameters H and T were selected automatically, according to the procedure
shown in Section 2.2.4. The columns of Table 2.5 list the fault coverage for successive
test cases. Again (similarly to the manually selected settings – Section 2.2.3), the resultant fault coverage remains close to the reference coverage reported in Table 2.2,
while the switching activity is reduced to the desired levels of toggling.
The detailed fault coverage results for one of the industrial designs (D5), used for the following experiments in the thesis, are presented in Figure 2.8a. Similar outcomes
for a design with introduced test points and X-bounding logic (later referred to as
BIST-ready design) are shown in Figure 2.8b. The curves correspond to (requested)
toggling rates from 5% to 25% in steps of 5%. In each test case, an additional red
curve reports a reference fault coverage obtained by applying purely pseudorandom
test patterns with the effective toggling rates around 50%. As can be seen, performance of the PRESTO generator remains highly predictable. In particular, with the
increasing switching activity single stuck-at fault coverage increases as well. In fact,
in some designs (Figure 2.8b) fault coverage of certain LP tests can be higher than
that of conventional pseudorandom patterns. Typically, however, one may observe a
gap between PRESTO-produced tests and their random counterparts. Fortunately,
PRESTO has the ability to reduce this gap by a proper selection of the control register
content as will be demonstrated in the following section.
Figure 2.8: Fault coverage for different toggling rates: a) typical designs, b) BIST-ready designs.
2.3 Improving fault coverage gradient
The results presented in the previous section show that the test coverage tends to
decrease as the power limit is tightened, which is an expected behavior. However,
many applications, including safety-critical and high reliability devices, require
higher quality test at a cost of additional test data stored on-chip. On the other hand,
the same data can be used to arrive with satisfactory test coverage in shorter time.
A quest to achieve higher BIST fault coverage with shorter test application time generated an immense amount of research in the past. In BIST-ready designs, it is a
common practice to employ test points which are inserted into the circuit under test
to increase observability and controllability of critical areas of the circuit so that
faults in these areas can be detected by pseudorandom test patterns. Other approaches, which do not involve changes in the CUT, address the test generators. Typically, LFSR-based pseudorandom test sequences were modified either by placing a
mapping logic between the PRPG outputs and inputs of a circuit under test or by
adjusting the probabilities of outputting 0s and 1s so that the resultant vectors capture characteristics of test patterns for hard-to-detect faults [112]. In [16] the pattern mapping circuit is separately synthesized for every output of a generalized
LFSR based on a minimal-overhead mapping between the PRPG patterns and target
patterns. The technique of [102] incrementally adds the cube mappings to transform
the ineffective patterns into patterns that detect new faults, until the test length and
fault coverage goals are satisfied. Both techniques guarantee that the faults targeted
by the introduced mappings are detected. On the other hand, the weighted random
patterns increase the probability of detecting the random pattern resistant faults.
The weights can be derived from a number of test cubes that contain only the specified bits instrumental in detecting the associated faults, as shown in [81]. A statistical analysis of such test cubes is used to guide the insertion of weighting logic between scan cells. The 3-weight pseudorandom test generation scheme of [86] assigns every LFSR output a programmable weight of 0, 0.5 or 1. The concept of [85]
derives correlation-based input biasing probabilities from a deterministic test set.
In order to reduce the memory requirements for storing the weight sets, an additional look-up table can be employed [50]. Along the same lines, we will demonstrate that
PRESTO-produced LP test patterns, supported by the deterministic data, are also
capable of visibly improving a fault-coverage-to-pattern-count ratio.
2.3.1 Guiding PRESTO with on-chip deterministic data
Assuming that the toggle control register can also be driven by deterministic test data (by means of an additional multiplexer placed in front of the shift register), test patterns can be produced with better-than-average fault coverage. The proposed method begins by computing the PRESTO parameters, as described in Section 2.2.4. Subsequently, ATPG is repeatedly invoked until either a desired PRESTO pattern count or a fault coverage limit is reached. ATPG produces test cubes in a one-per-fault fashion.
The number of generated test cubes is limited (in each iteration) for performance
reasons. As confirmed by many additional experiments, increasing the limit beyond
a certain point has a negligible impact on test quality. The obtained test cubes are
now deployed to arrive with the content of the control register, as described by the
following pseudocode:
while configuration limit is not reached do:
    run ATPG to fill the test cube pool
    for every test cube find its base
    remove the bases that exceed the limit of active phase shifter inputs
    assign every base its weight w
    move the maximum-weight base to an initially empty set C
    while active phase shifter inputs < limit and the set of bases is not empty do:
        for each remaining base b:
            assign b the cost equal to the number of active phase shifter inputs that cover {C ∪ b}
        add the minimum-cost (maximum-weight when costs are equal) base to C
    use the set of active phase shifter inputs that cover C to set the PRESTO toggle control register
    fault-simulate p PRESTO-generated test patterns
    remove the detected faults from the fault list f
Given the PRESTO switching code, our goal is now to find the corresponding distribution of 1s in the control register that maximizes the fault detection probability.
The procedure starts by reducing each ATPG-produced test cube to a set of scan
chains containing more than one specified bit. This set will be further referred to as
a base. For example, let a test cube feature the following specified scan cells: {(s, c):
(4, 13), (4, 2), (13, 34), (13, 31), (45, 11)}, where s is a scan chain, and c is a cell
location within the scan chain. The base is thus given by {4, 13}; note that chain 45
is not included as it features only one specified scan cell. A good chance (50%) of producing a given logic value in a purely pseudorandom fashion is the rationale behind excluding from any base those scan chains that host a single specified bit. As a result, more bases can subsequently be combined together to produce a single control setting.
Given the phase shifter architecture, one can determine, for each base, the minimal number of phase shifter inputs – or equivalently the number of 1s in the toggle
control register – required to activate the specified scan chains. These inputs are
obtained by solving the minimum hitting set problem, where we find, in a greedy
fashion, the minimal set of phase shifter inputs that intersects all subsets of phase
shifter inputs capable of activating specified scan chains of a given base. Recall that
the number of such inputs (and thus the number of 1s in the control register) is further constrained by the preselected switching code. For example, the switching code
0100 sets the limit on the number of 1s in the 32-bit control register to 8. Hence, if
a base exceeds the limit, it is excluded from subsequent steps of the procedure. Finally, each base is assigned weight w which is simply the number of specified bits in
the corresponding test cube. It is worth noting that a reciprocal of w can be regarded
as the likelihood of yielding the test pattern by a generator of purely pseudorandom
vectors.
Let C be an initially empty set of bases. Once all weights are determined, we add
a maximum-weight base to C. Next, every remaining base B is assigned a cost value
which is equal to the smallest number of 1s in the control register that would be
required to activate all scan chains in {C ∪ B}. A minimum-cost base (or a maximum-weight base if there are two or more bases with the same minimal cost) is then added to C, and costs associated with the remaining bases are recomputed accordingly. The procedure continues until either the limit of 1s in the control register is reached or all bases are already in C. The control register content that activates all scan chains from C is then provided to PRESTO.
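A minimal Python sketch of the base extraction and the greedy control-register selection described above; `chain_taps` (mapping each scan chain to the phase shifter inputs feeding it), the cube representation, and all function names are illustrative assumptions rather than the actual implementation.

    from collections import Counter

    def base_of(cube):
        # cube: dict {(chain, cell): value} of specified scan cells;
        # the base keeps only chains hosting more than one specified bit
        counts = Counter(chain for chain, _ in cube)
        return {chain for chain, cnt in counts.items() if cnt > 1}

    def greedy_hitting_set(chains, chain_taps):
        # greedy minimum hitting set: control-register stages activating every
        # chain in `chains` (each chain is assumed to have at least one tap)
        uncovered, chosen = set(chains), set()
        while uncovered:
            candidates = {s for c in uncovered for s in chain_taps[c]}
            stage = max(candidates,
                        key=lambda s: sum(1 for c in uncovered if s in chain_taps[c]))
            chosen.add(stage)
            uncovered = {c for c in uncovered if stage not in chain_taps[c]}
        return chosen

    def select_control_setting(cubes, chain_taps, ones_limit):
        # weight w = number of specified bits; bases needing more 1s than the
        # switching code allows are dropped up front
        bases = []
        for cube in cubes:
            b = base_of(cube)
            if b and len(greedy_hitting_set(b, chain_taps)) <= ones_limit:
                bases.append((len(cube), b))
        if not bases:
            return set()
        bases.sort(key=lambda wb: wb[0], reverse=True)
        C = set(bases.pop(0)[1])                     # start with a max-weight base
        while bases:
            cost, _, idx = min((len(greedy_hitting_set(C | b, chain_taps)), -w, i)
                               for i, (w, b) in enumerate(bases))
            if cost > ones_limit:                    # 1s limit would be exceeded
                break
            C |= bases.pop(idx)[1]
        return greedy_hitting_set(C, chain_taps)     # stages set to 1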
For each control register setting, PRESTO is run to produce a certain number of
pseudorandom test patterns. These patterns are subsequently fault-simulated, and
detected faults are dropped from the list. The objective of the following experiments
is to assess effectiveness of the scheme, i.e., to measure the degree of test time reduction that one can achieve when using a pre-computed deterministic content of
the control register as compared to application of pseudorandom patterns with otherwise similar power constraints. The experimental results are presented for industrial designs D1 – D6 whose characteristics are given in Table 2.2.
2.3.2 Experimental results
All experiments were conducted using a 32-bit PRESTO generator producing 1k test
patterns for each of 128 predetermined control register settings. Hence, the total
amount of control data is limited to 32 × 128 = 4,096 bits for 128k patterns. The
number of test cubes generated in each iteration was set to 1,000, typically resulting in 3 different control register settings per iteration (see Section 2.3.1). Moreover,
in order to minimize the average number of specified bits occurring in test cubes,
ATPG used a SCOAP-based decision order [39].
Figure 2.9: Pattern count savings for 5%, 10%, and 15% WTM.
The experimental results for 5%, 10%, and 15% toggle rates represented by the
weighted transition metric are shown in Figure 2.9. The presented curves correspond to the designs of Table 2.2 as follows: individual curves are depicted for the BIST-ready designs D1 and D2, while a bold red line averages the results over test cases D3, D4, D5, and D6 (whose individual curves are also plotted). Given a number t of
LP pseudorandom PRESTO-generated test patterns (and hence the corresponding
fault coverage C not shown in the figure), a single entry in these plots demonstrates
a difference (or equivalently a gain) t – g, where g is the number of test patterns
applied by a deterministically controlled PRESTO to arrive at the fault coverage C.
For example, consider circuit D2 and its gain curve. As can be seen, one needs
roughly 70k fewer vectors to reach the same fault coverage as that of 100k PRESTO-produced pseudorandom test patterns with the same switching activity. Clearly, the
test application time is reduced in this case by more than half. In the large majority
of test cases, the deterministic control data allowed reduction of the number of test
patterns, and thus test application time, in a similar fashion. In particular, BIST-ready designs with a moderate number of scan chains exhibit considerably steep gain curves. On the other hand, only little improvement in test time was noticed for a few non-BIST-ready circuits. It appears that these designs feature a large number of scan
chains driven by a relatively small phase shifter.
Figure 2.10: Fault coverage for two BIST-ready designs.
Figure 2.10 plots fault coverage results obtained for two BIST-ready designs D1
and D2 while choosing different toggling rates and sweeping the number of applied
test patterns. As can be seen, in all examined cases fault coverage of test patterns
generated by a deterministically controlled PRESTO (solid lines) is visibly improved
over the baseline results (dashed lines) obtained for PRESTO-produced pseudorandom patterns with a similar switching activity. The improvement in fault coverage
occurs systematically across all toggling rates, and the deterministically controlled
PRESTO outperforms its conventional counterpart for virtually all examined test durations.
2.4 Hybrid BIST
The scheme described in the previous section allows one to reduce test application
time or increase the fault coverage by embedding a small amount of additional data
to support the random pattern generator. However, for manufacturing test, where
test equipment with large memory is available, fully deterministic data can be supplied to guarantee very high fault coverage. Interestingly, logic BIST is nowadays indeed gaining acceptance for production test and is used increasingly often with
test compression to provide very robust DFT. This hybrid approach seems to be the
next logical evolutionary step in DFT. It has potential for improved test quality; it may
augment the abilities to run at-speed power aware tests, and it can reduce the cost of
manufacturing test while preserving all LBIST and scan compression advantages.
Attempts to overcome the bottleneck of test data bandwidth between the tester
and the chip have made the concept of combining logic BIST and test data compression a vital research and development area. In particular, several hybrid schemes
support the existing BIST hardware with the compressed deterministic data. For example, a multiplexer placed at the input of every scan chain can be used to determine
whether the scan data comes from a tester or PRPG. Partitioning the scan chains into
groups, and selecting whether a specific group is driven with pseudo-random or deterministic data can lead to overall reduction of test data volume [24]. A partially
rotational scan proposed in [47] partitions every scan chain into a number of blocks
that can be configured either as a regular shift register or a circular buffer. The test
data targeting the random-pattern resistant faults is supplied through the scan input
every time the contents of each block is fully rotated. A conventional STUMPS architecture with an additional external input connected to the feedback loop of the LFSR
through an XOR gate can be used to guide the generator with the deterministic data
[62]. The RESPIN architecture [25] augmented with constrained ATPG [26] reuses
the internal cores of a design for testing purposes. In particular, the scan chains in
one of the cores (embedded tester core) are used to decompress the data provided
by a narrow interface from a tester.
If BIST logic is used to deliver compressed test data, then underlying encoding
schemes typically take advantage of low fill rates, as originally proposed in LFSR
coding [24]. The technique subsequently evolved into many forms of static LFSR reseeding including multiple-polynomial LFSR reseeding [44], reseeding with additional seed compression [63], [66], X-tolerant DBIST [110], [111], reseeding combined with dictionary-based methods or test embedding [41], [42], and finally, the
dynamic LFSR reseeding [58], [61], [87]. Thorough surveys of relevant test compression techniques can be found, for example, in [53] and [100]. When test application
is based on a mixed-mode approach, where a number of random patterns is applied first, followed by the deterministic test, the balance between the two approaches
can be found as described in [52]. As shown in the following section, the straightforward modifications of the proposed scheme presented in Section 2.2.2 can enable
the continuous decompression flow, thus providing a complete, low-power DFT solution for manufacturing test.
2.4.1 PRESTO-based low power decompressor
In order to facilitate test data decompression while preserving the original PRESTO
functionality, the circuitry of Figure 2.2 has to be re-architected. This is illustrated
in Figure 2.11. The core principle of the decompressor is to disable both weighted
logic blocks (V and H) and to deploy deterministic control data instead. In particular,
the content of the toggle control register can now be selected in a deterministic manner due to a multiplexer placed in front of the shift register, the same way as the
deterministic data are supplied in the scheme from Section 2.3. Furthermore, the
Toggle and Hold registers are employed to alternately preset a 4-bit binary down
counter, and thus to determine durations of the hold and toggle phases. When this
circuit reaches the value of zero, it causes a dedicated signal to go high in order to
toggle the T flip-flop. The same signal allows the counter to have the input data kept
in the Toggle or Hold register entered as the next state.
Both the down counter and the T flip-flop need to be initialized every test pattern.
The initial value of the T flip-flop decides whether the decompressor will begin to
operate either in the toggle or in the hold mode, while the initial value of the counter,
further referred to as an offset, determines that mode's duration.

Figure 2.11: LP decompressor – modules in gray are disabled. The red items have been added.

As can be seen, the functionality of the T flip-flop remains the same as that of the LP PRPG (see Section 2.2.2) except for two cases. First of all, the encoding procedure (Section 2.4.2) may completely disable the hold phase (when all hold latches are blocked) by loading the Hold register with an appropriate code, for example, 0000. If detected (the No Hold signal in the figure), it overrides the output of the T flip-flop by using an additional OR gate, as shown in Figure 2.11. As a result, the entire test pattern is going to be encoded within the toggle mode exclusively. Moreover, all hold latches have to be initialized properly. Hence, a control signal First cycle, produced at the end of the ring generator initialization phase, reloads all latches with the current content of this part of the decompressor.
Finally, external ATE channels (feeding the original PRPG) allow one to implement a continuous flow test data decompression paradigm such as the dynamic
LFSR reseeding. Given the size of the PRPG, the number of scan chains and the corresponding phase shifter, the switching code, the offset, as well as the values kept in the Toggle and Hold registers, the entire decompressor will produce deterministic
(decompressed) test patterns having a desired level of toggling provided the scan
chains are balanced. The corresponding encoding procedure, including an appropriate selection of the aforementioned parameters, consists of steps described in the
next section.
2.4.2 Encoding algorithm
The decompressor architecture presented in Figure 2.11 is tightly coupled with the
compression procedure. It partitions a given test pattern into several blocks corresponding alternately to hold and toggle periods. Recall that in the hold mode all
phase shifter inputs are frozen due to disabled hold latches, whereas the toggle
mode allows certain inputs of the phase shifter to receive data from the ring generator provided the corresponding bits of the toggle control register are asserted.
Since this register is updated once per pattern, scan chains driven only by disabled
hold latches are loaded with constant values, and thus remain in the LP mode for the
entire pattern. The remaining chains receive either constant values (in the Hold
mode) or results of XOR-ing certain outputs of PRPG among which at least one is
enabled (during the Toggle mode). The actual toggle rate (TR) percentage, measured
as a weighted transition metric, is given by:

TR = 50 × (n / S) × (T / (T + H)),   (6)
where n is the number of scan chains driven by at least one enabled phase shifter
input, S is the total number of scan chains, T and H correspond to the durations of
toggle and hold periods, respectively. It is also assumed that switching at the level
of 50% corresponds to an LP mode turned off. The values of T and H, the offset cycles, as well as the content of the toggle control register form LP templates (LPTs).
They are determined prior to further encoding steps based on the analysis of test
cubes forming a cube pool. As a result, they allow merging and encoding successive
test cubes in an incremental fashion, with no repetitions in a flow, as explained in
the following.
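Before turning to the template construction, formula (6) can be illustrated with a one-line Python helper (hypothetical name) and a small numeric example:

    def toggle_rate(n, S, T, H):
        # formula (6): n active scan chains out of S, toggle period T, hold period H
        return 50.0 * (n / S) * (T / (T + H))

    print(toggle_rate(40, 128, 10, 30))   # about 3.9 (percent)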
Figure 2.12: Transitions (arrows) in a test cube.
First, c test cubes from the cube pool are used to initialize c LPTs. We begin by
mapping the test cubes into lists of transitions. Each transition is determined by two
successive specified bits of the opposite logic values located in the same scan chain.
In addition to its flanking bits x and y, each transition is characterized by a span, i.e.,
the number of clock cycles separating x from y. It is worth noting that some specified
bits contribute to two transitions, whereas other bits are not involved in forming
any transitions, as shown in Figure 2.12. Having instantiated a given empty template, the corresponding list of transitions is used to arrive with the initial durations of the toggle (T), hold (H), and offset (O) periods.

Figure 2.13: Steps to determine H, T, and O.

These values are chosen conservatively so that the ratio T/H is minimal, and there are no transitions within a single
hold period. The former condition ensures that the template can still accommodate
some of newly produced test cubes. The latter condition can be rephrased as follows: for each transition either its span is greater than H or at least one of its flanking
bits lies within a toggle period. The actual algorithm to yield the desired values of T,
H, and O can be summarized as follows (see Figure 2.13):
1. Given a test cube and its transitions, find the earliest transition ending point e (a
black triangle in the figure) and assign a single bit toggle phase (T = 1) to cycle e.
2. Mark all transitions crossing e, as they will not end up within a single hold period.
3. Increase the toggle period by extending it up to the next unmarked transition
starting point. Repeat this step as long as the duration of the toggle period does
not exceed a certain threshold (in this thesis – 10 cycles).
4. Find the next unmarked transition ending point e' – it determines the duration H of the hold period unless H is larger than a certain threshold. If H does not exceed the threshold, go to step 6; otherwise invoke step 5.
5. Find the value of H that minimizes the ratio T/H and, by adding new hold and
toggle phases, keeps the cycle e’ within a toggle period.
6. Set the offset period O to e mod (T + H) – H, if we begin with a partial toggle period,
and O = e mod (T + H), otherwise.
7. Adjust the values of H, T, and O if some of the remaining unmarked transitions lie entirely within a single hold period (Figure 2.13 illustrates this phenomenon for a newly added red transition that must not stay within the hold period). Ensure that the sum T + H remains unchanged. The ratio T/H, on the other hand, may vary, thus minimizing it can guide this step towards an optimal solution. Note that, for example, enlarging the toggle period reduces the length of the hold period and may also impact the number of offset cycles.
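A minimal Python sketch of the transition extraction that precedes the above steps; the cube representation and the function name are illustrative assumptions only.

    from collections import defaultdict

    def transitions(cube):
        # cube: dict {(chain, cell): value}; a transition is a pair of successive
        # specified bits of opposite values in the same chain, with its span
        per_chain = defaultdict(list)
        for (chain, cell), value in cube.items():
            per_chain[chain].append((cell, value))
        result = []
        for chain, cells in per_chain.items():
            cells.sort()
            for (c1, v1), (c2, v2) in zip(cells, cells[1:]):
                if v1 != v2:
                    result.append({"chain": chain, "start": c1, "end": c2,
                                   "span": c2 - c1})
        return result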
Once the above procedure completes, one has to make sure that all scan chains
hosting transitions are enabled. This can be achieved as long as there is at least one
enabled phase shifter input that feeds a given scan through an XOR gate within the
phase shifter. Finding the minimal subset of the control register stages needed to
activate the required scan chains is equivalent to solving the minimum hitting set
problem. Furthermore, the switching activity associated with the template is
checked by using formula (6) and compared against the desired toggling ratio τ. If the resultant toggling is below τ, then the test cube can be finally accepted as a part of the template. Otherwise, the test cube is not compressible given the power constraints and is discarded. The template then returns to its initial status.
When all templates have been initialized, we attempt to link them with the remaining (new) test cubes. If a template cannot accommodate certain transitions featured by a newly picked test cube, then the durations of the toggle, hold, and offset periods can be further adjusted in a fashion similar to that of step 7 of the algorithm presented above. If the cube fits the template, and the new active scan chains are known, then we recalculate both the content of the toggle control register and the toggling rate. Again, if the toggling is above τ, then the template returns to its previous form while the test cube is passed to the next template. Moreover, if none of the
existing templates can accommodate the cube, it remains in the pool until another
set of templates is generated such that this particular cube can be eventually assigned to its designated LPT.
The compression of test cubes treats the external test data as Boolean variables
used to create linear expressions filling conceptually all scan cells. However, an
equation assigned to a given scan cell depends not only on what is yielded by the
ring generator but also on whether a given phase shifter input is enabled or not. If a
scan chain is disabled, then a single expression, produced during the first shift-in
cycle, represents all of its cells. On the other hand, if a cell belongs to an active scan
chain, then its equation is formed by XOR-ing (1) the corresponding outputs of the
ring generator if they are enabled through the hold latches, and (2) expressions produced during the first shift-in cycle on the disabled ring generator outputs. This expression will be used provided a scan cell is in the toggle mode. If it enters the hold
mode, then its equation is going to be the same as that of the preceding and nearest
cell which is in the toggle mode and belongs to the same chain. Since we only use 3-input XOR gates to create a phase shifter, there are 7 different scenarios with at least
one XOR tap enabled. Consequently, prior to any compression actions and to save
the CPU time, we prepare all possible equations for each scan cell, and subsequently
select an appropriate expression when working with a particular low power template.
Having prepared all necessary equations, one can proceed with the test cube encoding. This is carried out in a manner similar to that of the conventional EDT flow.
It is worth noting, however, that participation of a given test cube in a template does
not guarantee its actual merging and compression because of either conflicts on certain specified bits with other test cubes or limited encoding capabilities. Another
notable difference between the presented approach and the traditional EDT scheme
is the way compression aborts are reported. Typically, a test cube is regarded uncompressible if it cannot be encoded when merged as the first component of a test
pattern. Here, the test cube is first employed, with other cubes, to form a template,
which in turn modifies equations. Hence, an abort is reported only if the cube is used
to make up a low power template, is then chosen as the first component of a test
pattern, and its encoding fails. All compressed test cubes are removed from the cube
pool, which is subsequently refilled. The algorithm continues by creating a new set
of templates as long as the pool is not empty.
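Conceptually, checking whether the merged test cubes of a template can be encoded amounts to testing the consistency of a system of linear equations over GF(2). A minimal sketch of such a consistency check is given below; the bitmask representation and the function name are illustrative assumptions, and the production EDT solver is incremental and considerably more involved.

    def gf2_consistent(equations):
        # Each equation is (mask, rhs): the set bits of `mask` select the seed
        # variables XOR-ed by a specified scan cell, `rhs` is its required value.
        # Returns True when the whole system has at least one solution.
        pivots = []                                    # reduced rows (mask, rhs)
        for mask, rhs in equations:
            for p_mask, p_rhs in pivots:
                if mask & (p_mask & -p_mask):          # eliminate that pivot's bit
                    mask ^= p_mask
                    rhs ^= p_rhs
            if mask:
                pivots.append((mask, rhs))             # new independent equation
            elif rhs:                                  # 0 = 1 -> not encodable
                return False
        return True

    # e.g. x0^x1 = 1, x1 = 0 is consistent; adding x0 = 0 makes it inconsistent
    print(gf2_consistent([(0b011, 1), (0b010, 0)]))              # True
    print(gf2_consistent([(0b011, 1), (0b010, 0), (0b001, 0)]))  # False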
2.4.3 Experimental results
In this section, we experimentally assess performance of the compression-enabled
version of the PRESTO hardware. Experiments are run on industrial designs whose
characteristics are given in Table 2.6. Table 2.7 presents results of experiments conducted with 64-bit decompressors and the desired scan shift-in switching level set to
5, 10, and 15%. Again, the average WTM estimates the resultant switching activity
for scan shift operations, while the average WSA measures toggling in the capture
mode by observing the switching activity at each gate in the circuit. All experiments
are conducted in such a way that the original EDT-based test coverage is always preserved.
Table 2.6: Circuit characteristics.

Design   Gates   Scan cells   Scan chains   Longest chain   EDT inputs
C1       1.4M    86.4k        160           541             2
C2       2.0M    127k         523           256             2
C3       3.6M    297k         104           3,488           10
C4       1.3M    60.5k        203           300             2
C5       1.0M    75k          160           470             2
C6       226k    16.8k        122           138             2
C7       1.1M    110k         861           128             2
As can be seen, in all examined test cases the resultant scan shift-in switching
activity (WTM load) remains very close to the requested one. We have also observed
a similar trend for other switching rates, for which results are not reported in Table
2.7. It is worth noting that reducing the load switching has a positive impact on the
switching activity during capture and unloading of scan chains. The corresponding
two figures of merit are included in the table as “Capture WSA” and “WTM unload”.
It is also worth observing that the proposed solution is the first LP compression
scheme that offers a mechanism to shape the power envelope in such a flexible and
accurate fashion.
The last column reports the ratio Vp/Ve, where Vp is the volume of test data used
to control the proposed scheme and Ve is the corresponding amount of data used by
a regular EDT scheme without the low power capabilities. In addition to the actual
seed variables, Vp comprises bits employed to feed the toggle control register, the
Hold and Toggle registers and the offset. The low power capabilities require on average 3.2, 2, 1.5 times more data than the full-toggle EDT patterns for 5%, 10%, and
15% target WTM, respectively. Compared to another EDT-based low-power solution proposed in [29], the proposed scheme increases the test data volume by 1.05 times
on average for otherwise similar test coverage and switching activity. At the same
time the proposed technique delivers substantial functionality gains as it is inherently capable of working as a programmable LP PRPG.
Table 2.7: Experimental results.

Design   WTM load [%]   Capture WSA [%]   WTM unload [%]   Data volume vs full-toggle EDT
Target WTM: 5%
C1       5.16           15.16             26.12            3.17
C2       7.24           12.29             6.13             3.48
C3       5.69           25.96             5.97             3.22
C4       6.41           11.22             18.30            3.98
C5       4.94           11.72             16.42            2.41
C6       5.72           10.66             28.69            3.78
C7       5.63           21.73             27.03            2.39
Target WTM: 10%
C1       9.84           14.33             26.97            1.59
C2       11.64          16.39             6.65             2.33
C3       9.48           26.28             9.60             1.97
C4       9.59           14.25             18.97            2.51
C5       8.80           13.00             21.04            1.86
C6       10.07          14.63             29.51            2.46
C7       9.48           21.28             28.56            1.50
Target WTM: 15%
C1       15.05          13.66             29.14            1.06
C2       15.55          20.64             6.28             1.46
C3       14.21          26.53             14.07            1.25
C4       14.60          16.97             19.31            1.65
C5       13.61          12.04             25.11            1.75
C6       14.98          18.93             29.77            1.80
C7       14.52          20.76             30.47            1.16
2.5 Silicon area requirements
The silicon real estate taken up by the proposed test logic, expressed as an equivalent number of 2-input NAND gates, is shown in Table 2.8. The table provides the actual area cost computed with a commercial synthesis tool for the three architectures shown in Figure 2.1, Figure 2.2, and Figure 2.11, using 32- and 64-bit ring generators (denoted in the table as F1-32, F2-64, F11-32, and so on) feeding either n = 100 (the upper part) or n = 500 (the lower part) scan chains. All components of the test logic were synthesized using a 90 nm CMOS standard cell library under a 3.5 ns timing constraint.
The table reports the resultant silicon area with respect to combinational and non-
combinational devices. The total area is then compared with the corresponding area
occupied by a conventional PRPG (typically, the XOR network of a phase shifter consists of n 3-input gates in addition to m flip-flops forming the ring generator – this
reference area is reported in the rows labeled as PRPG). For example, a 64-bit LP
generator of Figure 2.2 is 4.57 times larger than its standard counterpart, whereas
it offers exceptional LP features. Consequently, the numbers of Table 2.8 make the
proposed scheme attractive as far as its silicon cost is concerned.
Table 2.8: Area overhead.

100 scan chains
Scheme    Combinational   Non-combinational   Total    Ratio
PRPG-32   621             307                 928      1.00
F1-32     3,708           1,335               5,043    5.43
F2-32     3,703           1,567               5,270    5.68
F11-32    3,790           1,661               5,451    5.87
PRPG-64   825             613                 1,438    1.00
F1-64     4,189           1,902               6,091    4.24
F2-64     4,246           2,324               6,570    4.57
F11-64    4,333           2,418               6,751    4.69

500 scan chains
Scheme    Combinational   Non-combinational   Total    Ratio
PRPG-64   2,746           613                 3,359    1.00
F1-64     10,854          2,473               13,327   3.97
F2-64     11,096          2,894               13,990   4.16
F11-64    11,177          2,985               14,162   4.22
Chapter 3 Deterministic BIST
The hybrid solution combining deterministic test and BIST proposed in the previous
chapter can work in one of three modes – autonomous pseudorandom test generator, pseudorandom test generator supported by additional control data, and test data decompressor. The first mode requires no additional test data, apart from the PRESTO settings, but offers somewhat limited fault coverage, common to pseudorandom testing. In the second mode, the additional deterministic toggle control register settings can improve the fault coverage and test application time. Finally, the hybrid approach provides the highest test quality in the shortest time. Compared with the other modes, however, it requires a significant amount of test data that is typically stored on a tester. Overall, the PRESTO scheme is a complete low-power solution that allows for trading off test coverage, data volume, test time and
toggling rates in a very flexible manner. There are, however, applications where test
coverage loss in either production or in-field test is unacceptable. Such applications
include the mission critical components in the automotive, avionics, space, military,
healthcare systems, or various automation and robotics systems. For example, the
standard ISO 26262 defining functional safety features for automotive equipment
applicable throughout the lifecycle of all electronic and electrical safety-related systems clearly (in its fifth part related to the product development at the hardware
level) calls for at least 90% single point fault coverage attainable in a very short period of test application time [46]. Furthermore, as RF transceivers are smaller than
ever and consume less power, the semiconductor test industry is also embracing the
opportunity to incorporate wireless communications into on-chip LBIST solutions
that ensure highly reliable device operation throughout its lifespan.
Since LBIST fault coverage can be unacceptably low for a feasible pattern count as
compared to deterministic test sets, early BIST schemes employed weighted random
patterns or mapping logic, as discussed in Section 2.3. However, these approaches
may not meet the requirements either for fault coverage or test application time for
the latest integrated circuits. In order to arrive with a near-perfect test coverage
while testing the contemporary integrated circuits, a certain amount of on-chip deterministic data is usually required. Such an approach will be called deterministic
BIST throughout this thesis. The simplest way of handling the patterns is to store
them on an on-chip read-only memory (ROM). Clearly, the test data volume is often
prohibitive for contemporary devices, even with the use of ROM compression techniques proposed in [1], [23], [27]. The bit-flipping and bit-fixing approaches [35],
[101], [113], alter the output of LFSR by means of additional logic in order to embed
the deterministic test cubes into the pseudorandom sequences. Another approach
[67] uses a reconfigurable interconnection network for a similar purpose. There also
exist hybrid BIST schemes (see Section 2.4) that can store the deterministic patterns on-chip
in a compressed form and then apply them using the existing BIST infrastructure.
However, most of these solutions are designed to work with an external tester which
makes them inapplicable for high quality in-field test.
A solution that is worth noting, especially in the context of the scheme proposed
later in this chapter, works with clusters of patterns comprising a deterministic parent central vector and its derivatives produced in a random or a deterministic fashion. This approach was pioneered by the Star-BIST scheme of [103], where parent
ATPG patterns were subject to selective random flipping of their bits. Although effective, the scheme requires complex on-chip test logic whose implementation makes
use of scan order, polarity between the neighboring scan cells, control points inserted between them, and a waveform generator. With these features the scan may
behave like a ROM capable of encoding several deterministic test vectors.
Contrary to the above solution, a new deterministic BIST scheme [76] proposed
in this chapter deploys much simpler hardware with no scan modifications and uses
both the EDT-based compression and the deterministic diffraction of decompressed
test patterns to outperform the state-of-the-art on-chip test compression solutions,
and to offer superior performance in terms of data reduction and flexible tradeoffs
between test coverage and volume of stored test data while maintaining test time
within reasonable limits.
3.1 Motivation – compression-based BIST
The generic architecture of a conventional deterministic BIST scheme is shown in
Figure 3.1. Typically, it operates in two steps. First, a pseudorandom pattern generator (PRPG) is run to produce a predetermined number of random stimuli that essentially detect randomly testable (or easy-to-detect) faults. The duration of this phase depends
on how long it takes to reach certain test coverage or when a test coverage curve
enters the diminishing returns area. In order to avoid applying a prohibitively large
number of random test patterns, the second step targets the remaining, random resistant, faults by using ATPG patterns whose compressed forms are stored on chip
in a dedicated seeds memory. It is worth noting that the same hardware – PRPG and
the associated phase shifter – can now be reused to decompress test cubes and to
feed scan chains provided some sequential test compression, such as the LFSR reseeding [58], [61], [87], is deployed. These tests are often reordered to reduce their
quantity and to enable faster test coverage ramp-up.
The scheme of Figure 3.1 can offer tests of good quality with fast test coverage
ramp-up and attractive trade-offs between test application time and test data volume (or alternatively on-chip test logic area). However, in ultra-large-scale designs,
these two factors become serious constraints for the deterministic BIST applicability. This is because decreasing the on-chip test data memory may lead to a long test
time due to the number of random stimuli visibly elevated above what would be
considered acceptable. On the other hand, stringent requirements regarding the test
time inevitably result in a sizeable test memory. A new BIST scheme presented in
this chapter avoids these fundamental drawbacks by shifting the operational paradigm from the deterministic BIST to the exclusive use of ATPG test patterns whose
application is paralleled by their deterministic derivatives produced by simple on-chip test logic.
Figure 3.1: Example of deterministic BIST architecture.
There are two key findings that the proposed deterministic built-in self-test rests
upon. First of all, as already observed in [103], it appears that some particular clusters of test vectors can detect many random pattern resistant faults. Typically, a
cluster consists of a parent pattern and several children vectors derived from it
through simple transformations. Consider a test pattern T that detects a stuck-at-0
fault at the output of an n-input AND gate (tree-like structures whose functionality
is equivalent to AND or OR gates with a large fan-in are commonly employed in real
and complex designs). This test sets all gate inputs to 1 (for large n, it would be difficult to do it randomly). Now, test patterns that detect stuck-at-1 faults at the same
gate inputs have the Hamming distance of 1 to T. Clearly, T is a good parent pattern
as it allows one to derive n additional tests through single-bit flipping of its successive components.
Another crucial observation is that a single scan chain typically hosts the vast
majority of specified bits of a given test cube. Although this phenomenon is to a
large extent dependent on the scan chain stitching method and the actual scan architecture, it nevertheless appears to be the case across many designs since forming scan chains takes into account the actual design layout as well as clock and
power distribution networks. Figure 3.2 illustrates the distribution of specified bits within test cubes for one of the industrial designs used in this chapter. The first bar (1)
indicates the fraction, averaged over all test cubes, of specified bits that end up in
the most populated scan chain. The next bar (2) denotes the percentage of specified bits hosted by the second-most populated scan chain, and so on. As can be
seen, on average, a single scan chain hosts 45% of all specified bits of a given test
cube, whereas the second-most popular chain, for a single test cube, provides locations for less than half of this quantity. All presented data were collected for test
cubes containing at least 4 specified bits.
Figure 3.2: Specified bits locations.
3.2 The new deterministic BIST architecture
The observations reported in the previous section lay the foundation for the proposed deterministic BIST scheme. In particular, having a significant fraction of specified bits in a single scan chain allows one to complement all bits of a single time
frame, i.e., a single scan shift cycle, so that only a single specified bit of a given test
cube is most likely to be affected. After merging compatible test cubes into a test
pattern that can be encoded, one can arrive at a selection of shift cycles during
which complementing the content of the scan chains – once per pattern – may produce a group of additional and valuable stimuli. These vectors, due to a slight departure from the original parent sequence, detect many faults that otherwise would
need to be targeted separately (see Section 3.1).
The new BIST scheme architecture is presented in Figure 3.3. As can be seen, an
n-bit ring generator and a phase shifter make up a sequential test data decompressor feeding scan chains. The decompressor receives compressed test data from the
on-chip seeds memory through a number of input injectors. Furthermore, n two-input XOR gates are placed between the ring generator and the phase shifter. All
XOR gates are controlled by a single complement signal. When it is asserted, the ring
generator outputs are inverted before reaching the phase shifter. Typically, the
phase shifter feeds every scan chain by means of a 3-input XOR gate [87]. Hence,
inverting all phase shifter inputs complements all phase shifter outputs. As a result,
setting the complement to 1 while decompressing a test pattern causes all bits of a
given cycle to flip.
Figure 3.3: The proposed BIST architecture.
It is the BIST controller that decides, based on the content of an additional frame
register, when to enable the complement signal. Basically, whenever the content of
the frame register matches the shift counter (the inherent part of any BIST circuitry),
the complement is set to 1 for a single cycle period. Clearly, this is only done if the
resultant test pattern is capable of detecting additional faults. Therefore, parent pattern seeds are assigned individual lists of effective frames (cycles), so that the BIST
controller can apply every parent pattern repeatedly, every time complementing –
once per pattern application – all positions of a single shift cycle. Recall that this is
done in a deterministic manner by using the frame register storing a cycle to be complemented for the current test pattern application. The frame register receives a
new content from the list of effective cycles kept in another on-chip memory.
Consider, for example, a seed s that produces a test pattern p. Let pk denote the
same test pattern p with all scan cells of time frame k complemented (see Figure 3.4
for a 5x5 scan configuration example of a parent pattern and the derived p2 pattern
with column 2 being inverted). If pa, pb, and pc are found to be effective enough (details of this process are presented in Section 3.3), then the seeds memory stores the
original seed s, while the cycles memory contains the binary-coded values of a, b,
and c. The operation of the BIST controller starts with loading scan chains with the unmodified test pattern p. This is easily accomplished by disabling the complement signal. Next, the value of a is loaded to the frame register, seed s is decompressed again,
and the resultant test pattern pa is applied to the circuit under test. The same steps
are repeated for b and c. Subsequently, the whole procedure continues for the remaining seeds.
Figure 3.4: Example of a parent and derived pattern.
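The controller's behavior can be summarized by a minimal sketch (Python); the decompressor, the pattern application, and all names below are hypothetical placeholders for the ring-generator-based hardware described above, not an actual tool interface.

def run_bist_session(seeds, frame_lists, decompress, apply_to_cut):
    """Replays every seed once per effective frame, mimicking the BIST controller."""
    for seed, frames in zip(seeds, frame_lists):
        # First pass: the unmodified parent pattern (complement signal kept at 0).
        apply_to_cut(decompress(seed, complement_frame=None))
        # Further passes: the same seed with one shift cycle complemented each time.
        for frame in frames:                 # frame register loaded from the cycles memory
            apply_to_cut(decompress(seed, complement_frame=frame))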
It is worth noting that the way the test logic memory is architected may impact
test application time. For example, if the seeds and cycles memories are separate
modules, the frame register can be loaded in parallel with initialization of the ring
generator. Also, prior to reading the list of effective frames, the list size must be communicated to the hardware, for example by pre-loading its length or using a sentinel-based approach. The memory requirements for each of the n clusters, expressed in bits, can be calculated as follows:
Vi = S + Ei ∗ ⌈lg(N)⌉ + F,     (7)
where S is the size of a regular EDT seed, Ei is the number of effective slices recorded for cluster i, and N is the length of the longest scan chain. Finally, F is the size of the #Frames register and is equal to F = ⌈lg(max1≤i≤n Ei)⌉. The final volume of test data is thus equal to the cumulative size of all clusters:
V = ∑i=1…n Vi.     (8)
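To make the bookkeeping of formulas (7) and (8) concrete, the following minimal sketch (Python) computes the per-cluster and total memory volume for a handful of hypothetical clusters; the seed size, chain length, and Ei values are illustrative assumptions rather than data from any of the designs, and lg is assumed to denote the base-2 logarithm.

from math import ceil, log2

def cluster_bits(seed_bits, effective_slices, longest_chain, frame_reg_bits):
    # Formula (7): Vi = S + Ei * ceil(lg N) + F
    return seed_bits + effective_slices * ceil(log2(longest_chain)) + frame_reg_bits

seed_bits, longest_chain = 100, 200            # hypothetical S and N
effective = [5, 3, 7]                          # hypothetical Ei, one entry per cluster
frame_reg = ceil(log2(max(effective)))         # F = ceil(lg(max Ei)) = 3 bits here

# Formula (8): the total volume is the sum of the per-cluster volumes.
total = sum(cluster_bits(seed_bits, e, longest_chain, frame_reg) for e in effective)
print(total)                                   # 3*100 + (5+3+7)*8 + 3*3 = 429 bits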
3.3 Implementation flow
3.3.1 Comprehensive search
By storing only the most critical seed patterns and the associated complement cycle coordinates, a considerable amount of chip area can be saved. The reduction in on-chip test memory size comes at the price of customizing test generation and test
ordering processes. The savings in silicon real estate with respect to the on-chip test
logic, however, are the key factor that allows us to implement the proposed BIST
scheme. In the following, therefore, the basic test flow and its variant are proposed
that result in the highest compression ratios, the most aggressive test coverage
ramp-ups, and the most efficient usage of the on-chip memories.
The new approach begins by fault simulating a certain number ϑ of pseudorandom stimuli in order to identify easy-to-detect faults. It is important to observe that these random patterns will eventually not be applied during the actual BIST sessions. The remaining random-resistant faults are then subjected to ATPG in order to
come up with a complete set of deterministic and compressible test patterns. Each
of these ATPG-produced stimuli is subsequently used to form a test cluster which,
besides the central pattern, comprises all of its children patterns obtained by systematically complementing the content of scan chains in successive time frames.
Consequently, at this stage, the test set consists of all test clusters, that is, all parent
patterns and all of their derivatives. The test clusters are now fault simulated (with
fault dropping disabled) by using the entire list of random-resistant faults. The primary objective of this step is to determine which faults can be detected by which
test clusters. Let Fc be a list of faults covered by test cluster c. These lists guide a
greedy algorithm that orders the test clusters as follows. First, the largest test cluster c, i.e., the one detecting the largest number of faults, is chosen, and faults occurring on list Fc are removed from lists associated with the remaining test clusters. In
the steps to follow, a similar process repeatedly selects the largest remaining cluster
as long as there are non-empty fault lists and there are still unpicked test clusters.
Finally, all selected clusters are fault simulated again in accordance with the generated order and with fault dropping enabled. This time children patterns that do not
cover faults from the corresponding list are filtered out. As a result, the test set will
comprise an ordered list of the most efficient test clusters whose contents are refined in such a way that only the most efficient derivatives are kept.
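The greedy cluster selection can be captured by a short sketch (Python), with fault lists represented as sets; it illustrates the ordering logic only and leaves fault simulation itself out.

def order_clusters(fault_lists):
    """fault_lists: cluster id -> set of random-resistant faults it detects (list Fc)."""
    remaining = {c: set(f) for c, f in fault_lists.items()}
    order = []
    # Repeatedly pick the cluster that detects the most still-uncovered faults.
    while remaining:
        best = max(remaining, key=lambda c: len(remaining[c]))
        if not remaining[best]:              # every remaining fault list is empty
            break
        covered = remaining.pop(best)
        order.append(best)
        for c in remaining:                  # drop newly covered faults everywhere else
            remaining[c] -= covered
    return order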
Table 3.1: Test coverage [%].
          EDT     Random patterns   New scheme
D1       98.87        92.17            96.13
D2       98.67        81.97            96.79
D3       97.81        91.01            96.71
D4       92.30        87.46            85.20
Recall that the above procedure works exclusively with the random-resistant
faults. Typically, however, patterns targeting this type of fault appear to be equally
suitable for the remaining ones as they can detect a significant majority of faults that
have been eliminated in the preliminary phase through the application of pseudorandom stimuli. This is best illustrated in Table 3.1 for four industrial designs
used in this chapter. The second column reports the reference single stuck-at fault
coverage that is obtained by using the EDT-based test compression flow. The next
column lists the resultant test coverage after applying 32,000 purely pseudorandom
test patterns. The last column demonstrates coverage yielded at this stage by the
new deterministic BIST scheme with ϑ = 32,000. As can be seen, the difference between the reference point and the actual coverage (these are faults that require additional patterns) is usually marginal. Only design D4 will require more test generation efforts to close a 7% gap comprising the uncovered faults.
For the reasons stated above, the test set is now fault simulated by using the list
of easy-to-detect faults to determine a small fraction of defects that still escape detection at this point. For this group of faults, the procedure presented in the preceding paragraphs is repeated until the complete test coverage is reached. As a result,
no random patterns are needed, and test application time is strictly dependent on
the number of test clusters and the number of individual components they feature.
It is worth noting that the presented algorithm produces a virtually minimal collection of test clusters in a manner that allows one to flexibly trade off test coverage,
test logic complexity (here represented by the size of the on-chip test memories),
and the resultant test application time.
3.3.2 Iterative approach
Given the comprehensive approach presented above, one can easily observe that the number of test clusters depends linearly on the deterministic pattern count targeting
random-resistant faults. Moreover, the children pattern count for each test cluster
initially matches the size of the longest scan chain (or the number of scan shift cycles). Needless to say, a random pattern resistant fault list may easily contain millions of faults. The implications of these facts become apparent as we begin to consider relevant memory and CPU time requirements. Clearly, with the increasing size
of designs, the presented flow may turn out to be less practical as the memory footprint and time needed to complete the ordering process can be prohibitive for large
circuits. Consequently, to alleviate these problems, a more pragmatic and scalable
solution is proposed below.
Essentially, the revised scheme is an iterative variant of the original algorithm
introduced in Section 3.3.1. It produces and verifies successive parent patterns and
their derivatives iteratively, one at a time. We first create a random pattern resistant
fault list f by fault simulating random test patterns, and then iterate the following
process. Given list f, we generate a single parent pattern and all of its derivatives.
Typically, this pattern is going to be a result of merging of test cubes produced for
several faults picked from f. This cluster is fault simulated with list f, and all effective
patterns are stored. These patterns are again simulated, this time for all faults, including those recognized as easy to test. Results of this step are subsequently used
to update list f and the list of all faults. Once list f becomes empty, it is reloaded with
leftovers from the main fault list, i.e., with faults that have not yet been detected, and
the procedure repeats. Assuming that g is initially a list of all faults, the algorithm
can be summarized as follows:
simulate random patterns to get list f of hard faults
repeat
    while list f is not empty do
        generate a parent test pattern
        produce all children patterns
        fault simulate the current test cluster on list f
        save effective patterns
        fault simulate effective patterns on list g
        update lists f and g by removing detected faults
    if g is empty break
    reload f with g
sort test clusters
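Read as a program, the pseudocode above corresponds to roughly the following sketch (Python); ATPG, pattern derivation, and fault simulation are abstracted into callables, and all names are placeholders rather than actual tool interfaces.

def iterative_flow(all_faults, easy_faults, generate_parent, derive_children, detects):
    """detects(pattern, faults) returns the subset of `faults` the pattern detects."""
    g = set(all_faults)                        # main fault list
    f = set(all_faults) - set(easy_faults)     # hard (random-resistant) faults
    clusters = []
    while True:
        while f:
            parent = generate_parent(f)        # merge cubes for several faults from f
            cluster = [parent] + derive_children(parent)
            effective = [p for p in cluster if detects(p, f)]
            clusters.append(effective)
            hit = set().union(*(detects(p, g) for p in effective))
            f -= hit
            g -= hit                           # easy faults get credited here as well
        if not g:
            break
        f = set(g)                             # reload f with the leftover faults
    return clusters                            # sorted afterwards if desired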
The resulting test clusters can be reordered so as to reflect their impact on the
test coverage. The sorting method is similar to the one presented earlier. However,
only the effective children patterns are fault simulated against the fault list that contains either all faults or a randomly selected sample. Additional experimental results
indicated that fault sampling has a negligible impact on the sorting results. Furthermore, as will be demonstrated in the following section, the difference between
both presented flows (in terms of results they produce) is negligible.
3.4 Experimental results
The new deterministic BIST scheme was verified by conducting experiments with
large and complex industrial designs that feature on-chip EDT-based test compression. The experimental results of this section are presented for 6 such circuits. They
represent different design styles and scan methodologies. The basic data regarding
the designs, such as the number of gates, number of scan cells, scan architecture, and
the EDT input interface, are listed in Table 3.2. All results were obtained for stuck-at test patterns. The last column of Table 3.2 reports the EDT-based test coverage
for each design.
Table 3.2: Circuit characteristics.
Design   Gates   Scan cells   Scan architecture   EDT size   Input injectors   Test coverage
  1      220k       13k          122 x 138           17             1             98.87%
  2      450k       45k          226 x 200           32             4             98.67%
  3      1.5M      145k          700 x 207           42             8             97.81%
  4      1.2M       85k          427 x 200           32             4             92.30%
  5      600k       20k           35 x 686           32             1             91.22%
  6      840k       34k           84 x 416           32             1             92.30%
Figure 3.5 illustrates one of the advantages of the new BIST scheme – the ability to significantly reduce the test data volume, here represented by the size of the on-chip test memory (in bits), while preserving the attainable test coverage. The three
curves correspond to the conventional EDT-based compression (black), the most
comprehensive flow, as introduced in Section 3.3.1 (red) and, finally, its iterative
version of Section 3.3.2 (blue). Both BIST schemes use ϑ set to 32,000. All results
were obtained for design D1, but the described properties were observed throughout all test cases.
Figure 3.5: Test coverage vs. test memory size – different schemes for the same fault list after 32k random patterns.
Figure 3.6: Test coverage (left axis) vs. test memory size obtained with the iterative scheme for different fault lists and the corresponding test times (right axis).
As can be seen, the difference between the BIST scenarios represented by the red and blue curves is insignificant. Consequently, the remaining results presented in this section were obtained by running the more practical iterative
version of the BIST scheme, only.
As can also be noticed, the new BIST scheme requires approximately 2.25 times
less test data than EDT to reach the reference 98.87% test coverage (see Table 3.2).
The same relation can also be examined for different sets of random-resistant faults
determined by using different quantities of random patterns. Figure 3.6 shows four
coverage curves corresponding to the following values of ϑ = 0, 128, 5,000, and
32,000, and obtained within the framework of the iterative BIST. The observed
trend clearly indicates that the relation between the test coverage and the test data
remains virtually independent of the number of targeted faults, including the case
where there is no prior fault dropping. The number of targeted faults does impact,
however, the test application time (reported as the number of test patterns), as
shown in the lower part of Figure 3.6 (the corresponding axis of ordinates is shown
on the right-hand side of the figure). As can be observed, the smaller the random-resistant fault list, the shorter the resultant test time (and the smaller the memory) one can expect for otherwise the same coverage of all detectable faults. Clearly, applying purely EDT patterns takes less time than applying BIST patterns.
How much of the on-chip test memory is actually needed to achieve a given level
of test coverage is shown in Figure 3.7 for three thresholds: 90%, 95%, and 99%
fractions of the reference EDT-induced test coverage. The presented results were
obtained for design D1. Here, the deterministic BIST (blue bars) with ϑ = 5,000 is
compared against the conventional EDT (grey bars). As can be seen, the proposed
scheme requires visibly less memory than EDT to achieve the same level of test coverage. Similar results for other circuits, organized into a tabular form, are gathered in Table 3.3.
Figure 3.7: Test memory vs. desired test coverage.
Note that ϑ = 5,000 random patterns are used in the remaining experiments described in this section. The last column of the table (Comp.) reports the
effective test data compression over EDT. The reported values were averaged over
individual compression ratios in every row of the table.
Table 3.3: Memory requirements [× 10³ bits].
            90%             95%             99%
        EDT   BIST      EDT   BIST      EDT   BIST     Comp.
D1       24     12       51     23      146     57     2.26x
D2      294     64      648    140    1,491    388     4.36x
D3      378     57    1,017    140    2,960    490     6.65x
D4    1,181    105    2,511    228    6,385    645    10.72x
D5      100     63      178    101      367    187     1.77x
D6       44     37      121     79      410    210     1.56x
Given the previous results, it is of interest how the available on-chip test memory
delimits the resultant test coverage. This is illustrated in Figure 3.8 where memory
units are EDT seeds. That is, n seeds represent a data volume (memory bits) which
would be occupied by n EDT-based compressed test patterns in the same design.
Data, collected for design D1, are shown here for 64, 128, and 256 EDT-equivalent-seed memory chunks. Clearly, given a pre-specified amount of test data, the new
BIST scheme (blue bars) yields higher test coverage than that of EDT (gray bars).
The accompanying Table 3.4 reports similar results for all designs employed in this
section. The last two bars and the last segment (max) of the table correspond to a
memory space which ensures the maximal attainable test coverage when using the proposed BIST scheme.
Figure 3.8: Test coverage vs. test memory.
The EDT column then reports the EDT-based test coverage
achievable with the same test data volume. For example, design D1 gets to the maximal test coverage – 98.87% (with the new scheme) by deploying 107,544 memory
bits. This is equivalent to 636 EDT seeds that yield 97.1% test coverage using the
conventional EDT flow.
Table 3.4: Test coverage [%].
         64 seeds         128 seeds        256 seeds           max
        EDT    BIST      EDT    BIST      EDT    BIST      EDT    BIST
D1     82.75  88.43     88.45  93.53     93.10  96.93     97.10  98.87
D2     74.77  87.86     81.25  92.49     86.71  95.98     95.34  98.67
D3     80.66  91.91     85.18  94.96     89.16  96.74     93.79  97.81
D4     55.09  78.63     62.90  83.63     69.61  87.66     83.78  92.30
D5     74.80  78.40     81.56  85.99     87.01  90.30     89.73  91.22
D6     81.26  81.16     84.68  85.84     87.50  89.35     91.30  92.11
Finally, Figure 3.9 illustrates how the test coverage rises with the increasing number of test patterns. The results are for design D1. The gray curve represents 32k
random test stimuli, whereas the red one represents the new BIST scheme, and the blue one corresponds to the conventional EDT flow. In this test case, the EDT
memory requirements were 2.25 times bigger than those of the proposed scheme.
The comparison of the standard BIST test coverage curve with that of the deterministic BIST clearly demonstrates the superiority of the latter both in terms of the test
time and the actual coverage of faults. In particular, given decent memory requirements of the deterministic BIST, the test time reduction is a spectacular gain, often
by almost an order of magnitude compared to the standard BIST.
Figure 3.9: Test coverage vs. test patterns.
It is worth noting, however, that typically it takes longer to apply new BIST patterns than EDT stimuli for otherwise the same test coverage. For example, as shown in Figure 3.9, the
new BIST scheme applies 4,665 patterns, whereas its EDT counterpart uses 1,647
stimuli, only. This phenomenon is detailed in Table 3.5, where the numbers of test
patterns that need to be applied by both schemes to get complete test coverage are
reported for all examined designs. The average test time increase over all designs is
equal to 7.92x.
Table 3.5: Test time [patterns].
          D1       D2       D3       D4       D5       D6
EDT     1,647    2,368    2,540   11,968      760    1,853
BIST    4,665   19,491   28,807   37,537   11,271   13,209
3.5 Remarks
As shown in this chapter, the new deterministic BIST scheme can offer high quality
in-field or manufacturing test. Its comparison with the schemes presented in Chapter 2 is shown in the web chart of Figure 3.10, where power consumption, test application time, test data volume, and test quality are rated from 1 to 5 (the higher the
better). Recall that there are three applications of the PRESTO-based scheme: a
standalone PRPG, a PRPG supported by a minimal amount of deterministic test data,
and a hybrid approach.
Figure 3.10: Tradeoffs between the proposed schemes (axes: Power, Coverage, Volume, Time; series: PRESTO, PRESTO + Deterministic Settings, PRESTO Hybrid, Deterministic BIST).
Clearly, the basic PRESTO architecture (blue line) requires
no additional test data but offers limited fault coverage as typically observed when
pseudorandom test patterns are applied. Adding the deterministic toggle control
register settings (orange line) improves fault coverage and test application time.
However, the test data volume is increased, hence the lower data volume component rating, as can be seen in Figure 3.10. In the presence of an external tester, the hybrid
approach offers near-perfect fault coverage achievable in only slightly increased test
time (attributed to the low-power capabilities) when compared with EDT. This scenario requires, however, application of a number of deterministic test patterns, thus
the volume rating is the lowest among the presented solutions. Finally, the deterministic BIST scheme (green line), due to the increased encoding efficiency, requires
much less test data to arrive at full test coverage. As a result, it is feasible to store
all seeds together with their inversion positions on-chip. The test application time
is increased, however, due to reapplication of the derived test patterns. Clearly,
since PRESTO-based approaches focus on power reduction, they all excel in this aspect. Although the new BIST scheme presented in this chapter is not designed to
reduce power consumption, it can be coupled with ATPG-based low power techniques, such as the one presented in [29]. Analysis of such combinations goes beyond the
scope of this thesis.
Chapter 4
Deterministic test-per-clock scan-based scheme
The solutions proposed earlier in the thesis address vital aspects of testing the current digital integrated circuits, including test power consumption, the cost of test
and the reliability throughout the expected lifespan. As verified by numerous experiments, the proposed concepts can be successfully applied to modern multi-million
gate designs. However, as noted in Section 1.4, the cutting-edge technology nodes become increasingly challenging to test reliably. The main contributors to the arising difficulties are the new types of defects that become more and more prevalent as the
feature sizes shrink. Very often these defects remain undetectable when using the
most common fault models. Consequently, more sophisticated models that consider
the transistor-level of design cells become indispensable to guarantee high quality
test. However, the increased number of targeted fault models directly impacts the
size of test sets, which may soon become unmanageable for the state-of-the-art testing techniques. In order to prevent a test cost crisis, similar to the pre-compression one of the early 2000s, a radical departure needs to be found. An appropriate paradigm shift may involve changes in such fundamental techniques as scan design. In
this chapter a patented [73] deterministic test-per-clock solution is presented that
allows for very efficient application of test patterns with inherent low-power features.
4.1 Introduction
Since its debut in the late 1960s [28], [31], [56], scan has gained wide acceptance as a structured design for test (DFT) methodology. Virtually all of its variants, including partial scan [3], [4], [18], [40], [64], share a key feature – they allow direct access
to memory elements of a circuit under test by forming shift registers in the test
mode. The resultant high controllability and observability of internal nodes made it
possible to (1) automatically generate high quality tests for large industrial designs,
and (2) use verification techniques to debug the first silicon. Scan, supported by
many electronic design automation tools, offers a systematic way to produce testable and reliable semiconductor devices. As shown in the previous chapters, scan is a
key part of many more advanced DFT technologies that aim to reduce test complexity and cost, including BIST and compression. Overall, due to its advantages, it has
become one of the most influential DFT techniques. This paradigm was firmly in
place for the following four decades.
Scan-based testing requires noticeable efforts to control power consumption.
There is a significant difference between the scan-induced shift mode and the functional (mission) mode of operation. As it turns out, shifting test stimuli and test responses may lead to excessive power dissipation, not observed in the mission mode.
Moreover, the transition between shift and capture modes further worsens the power
issues. Consequently, several scan-based low power test solutions were proposed
as shown in Section 2.1.
Clearly, scan is a mature and industry-proven technology, well documented in the
literature. Augmented by various techniques, it offers a complete and reliable test
solution. Yet scan has latent abilities that may be developed even further and lead
to spectacular future breakthroughs. This potential can allow scan to apply test patterns within much shorter durations than done today, or alternatively reach higher
defect coverage for otherwise similar test application conditions. Consider, for example, a design with 100,000 scan cells divided into 500 scan chains, each 200 cells
long. Assume the shift and the operating frequencies of 50MHz and 500MHz, respectively. Applying 20,000 double-capture test patterns will require 4,000,000 shift cycles at 50MHz and 40,000 capture cycles at 500MHz. Thus, as low as 1% of cycles,
or just 0.1% of time, is actually spent on testing. If the same design used logic BIST,
then the test time efficiency would be even lower. With 100,000 single-capture test
patterns, 20,000,000 cycles are needed for scan shifting, while only 100,000 cycles
are deployed to capture test responses. Again, given the clock frequencies, 99.95%
of test time is spent on scan shifting. This simple example indicates how much potential still remains to be explored as far as the scan methodology is concerned. This
observation is the driving factor for the approach presented in this chapter. It tries
to overcome some of the issues recalled above and utilize test application time in a
much more efficient manner.
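The arithmetic behind this example is easy to reproduce (Python); the numbers below are exactly those quoted in the text and carry no additional assumptions.

shift_mhz, capture_mhz = 50.0, 500.0
patterns, chain_length, captures_per_pattern = 20_000, 200, 2

shift_cycles = patterns * chain_length                 # 4,000,000 shift cycles
capture_cycles = patterns * captures_per_pattern       # 40,000 capture cycles

cycle_share = capture_cycles / (shift_cycles + capture_cycles)
time_share = (capture_cycles / capture_mhz) / (
    shift_cycles / shift_mhz + capture_cycles / capture_mhz)
print(f"{cycle_share:.2%} of cycles and {time_share:.2%} of time spent capturing")
# prints roughly: 0.99% of cycles and 0.10% of time spent capturing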
Drawbacks of scan-based testing, as shown above, are mainly related to the fact
that all scan chains are typically filled with a test pattern before it is applied, as done
in the test-per-scan schemes. In alternative test-per-clock approaches, test stimuli
are applied every clock cycle, with no shift-based time overhead. In particular, one
of the first test-per-clock BIST schemes was designed for circuits with several independent combinational blocks. Input and output registers of each block were converted into built-in logic block observers (BILBO) [59] working either as test generators or MISR-based compactors, whenever needed. Another test-per-clock approach – a circular self-test path (CSTP) – was proposed in [60]. It forms a feedback
shift register by serially connecting selected circuit state elements. Contrary to the
scan chain, however, every clock cycle the circular register produces test stimuli and
compacts test responses. At the end of a test session, the resultant content of the
register is shifted out and checked for its consistency. Since on-chip resources are
used to generate test patterns, the test mode resembles the circuit functional operations. However, as all register stages shift and capture as well, it leads to elevated
power dissipation, which becomes unacceptable with tens of thousands of toggling flip-flops.
Another concern is the time-consuming fault grading that CSTP requires. This is because fault detection can only be declared once all clock cycles are applied. In fact,
running fault simulation for large designs with no fault dropping becomes infeasible
and is typically replaced with a less accurate statistical analysis. Moreover, the fault
coverage tends to level off in a manner similar to other BIST schemes. Some works
address this issue by applying state skipping logic and using an appropriate selection of the CSTP initial state [21]. A technique similar to CSTP was also proposed in
[98], with additional partial scan chain features to handle deterministic test patterns. Furthermore, the E-BIST architecture [97], by building on the scheme of [8],
converts scan chains into MISRs using additional NOR and XOR gates associated
with all scan cells. Every clock cycle, these modified scan chains then provide test
stimuli and compact test responses.
Unlike BIST-centric solutions, an approach proposed in [51] allows one to apply
deterministic scan patterns as well as weighted random test-per-clock vectors during scan shifting. However, scan chains need to be modified to enable both their parallel load and PRPG capabilities. A tester then feeds the scan chains with the deterministic test patterns through a serial-to-parallel converter, while the scan chains apply pseudorandom test patterns to the CUT until they receive a complete deterministic vector. Another approach [83] allows deterministic data to be applied in a test-per-clock manner while all primary outputs and scan cell outputs are being compacted. Both methods require routing all internal state variables to a compactor
which is basically infeasible for large designs. Moreover, the resulting toggling rates
are not controlled and can become unacceptably high. The technique of [90] can be
placed somewhere between the test-per-clock and the test-per-scan schemes. It
uses a part of the former test response to form a subsequent test pattern. A test-per-clock access to a scan chain can also be accomplished by transforming the scan chain
architecture to make scan cells randomly addressable, similarly to storage elements
in random-access memories [5], [6].
To overcome the limitations of both test-per-scan and test-per-clock schemes, a
new technology is needed. The key requirement is to achieve the high quality test offered by conventional scan in a much shorter time. This can be fulfilled by applying
test patterns every clock cycle, thus increasing the percentage of the test time used
for actual testing as opposed to the shift activity. As scan-related over-testing and
power dissipation have been of concern for years, the new approach should mimic
the mission mode during test, in particular, by reducing toggling to acceptable levels.
If power levels were kept low enough, the IR drop and di/dt issues associated with
changing from shift to capture mode (no abrupt changes of a scan enable signal)
could be successfully resolved as well. The same new technology should allow simple integration with BIST as well as with deterministic test methods such as on-chip
test data compression.
In the remaining part of this chapter a novel scan-based test paradigm – TestExpress [79] – is presented. TestExpress differs from the conventional scan in a fundamental way: it does not shift all scan chains. Nor does it capture test responses in all
scan chains. Instead, it dynamically reconfigures scan cells to work in different operational modes where they act as either mission memory elements, sources of test
stimuli, or test response compactors. In principle, the new scheme behaves in accordance with the test-per-cycle paradigm. Since test patterns are applied every
clock cycle, the scheme is extremely time-efficient and allows one to achieve highquality test for any fault model used today, as well as for any new fault models that
might be employed in the future. Alternatively, given a target defect coverage, it is
possible to reduce test application time drastically, thus decreasing the manufacturing cost. Since the proposed scheme maintains the majority of scan cells in the mission
mode, its power consumption remains similar to that of functional operations. Furthermore, as scan chains are seldom reconfigured, the negative impact of the IR drop
and di/dt issues is notably reduced compared to the conventional scan. Finally, the
proposed solution allows testing either in a BIST mode, or in a deterministic manner, or by combining both techniques. We will present a number of methods that are
needed to automate instantiation and application of the TestExpress architecture,
including customized scan stitching, selection of scan chain configurations, and a
discussion of simulation and test generation algorithms.
4.2 TestExpress architecture
The operational paradigm of the generic scan architecture introduced almost 50
years ago is to employ ATE, a test data decompressor driven by ATE, or a PRPG as a
source of test stimuli feeding serial inputs of the scan chains, with the same ATE or
a test response compactor capturing test responses that leave the scan chains
through their serial outputs. Typically, all scan cells are controlled by a single scan
enable signal. Thus, all scan chains are functionally indistinguishable, i.e., they all
either shift data in and out or capture test results. In sharp contrast to this paradigm,
TestExpress dynamically partitions scan cells into three disjoint groups, with every
cell executing in one of the following three operational modes (as shown in Figure
4.1): shifting test data in (a stimuli mode, S), compacting test responses (a compaction mode, C), or performing regular functions (a mission mode, M). Clearly, in the
first two cases, scan cells form the actual scan chains.
Scan chains in the stimuli mode (the blue chains in Figure 4.1) receive data from
a stimuli source. They resemble the conventional scan chains in the shift mode with
the asserted scan enable signal. However, the key difference here is that test data is
applied to the CUT every clock cycle. Moreover, scan chains in the stimuli mode do
not capture any test responses, thus they are only responsible for controlling the
internal states of a circuit. These are scan chains in the compaction mode (the yellow
chains in Figure 4.1) that capture and compact test responses, thus monitoring the
internal states of a CUT. Working synchronously with the scan chains that deliver
test patterns, the scan chains in the compaction mode accumulate test responses
every clock cycle using XOR gates interspersed between their successive scan cells.
At the same time, a single bit (per chain) of the resultant signature is always shifted out. It is worth noting that the scan chains in the compaction mode are also meant
to drive the CUT. If needed, this functionality can be disabled, as discussed later in
the chapter. The remaining scan cells (the white ones in Figure 4.1) are kept in the
mission mode, performing the functional operations. Clearly, test results propagating through the combinational part of the circuit can also reach the scan cells in the
mission mode. These responses further circulate within the circuit and eventually
reach the observation scan chains during the subsequent clock cycles.
Figure 4.1: TestExpress architecture.
The TestExpress logic is inserted outside the design core and consists of three
main blocks. An on-chip decompressor or – alternatively – a switching network is
located between the external scan channel inputs and the internal scan chain inputs.
Essentially, both devices reroute test data provided by a few external ATE channels
to designated scan chains. Note that the decompressor effectively feeds scan chains
in both the S and C modes (scan cells in the mission mode do not accept data from
the decompressor), whereas the switching network can supply scan chains only in
the S mode at the cost of additional control data. An on-chip compactor is inserted
between the internal scan chain outputs and the external scan channel outputs. It
receives data from scan chains in both the C and S modes, which allows not only to
capture actual faults but also to detect failures within the scan chains. An additional
configuration register controls the mode selection. Each scan cell is assigned a control value that determines its functionality. The content of the configuration register
is typically reloaded during a test session. The circuitry providing subsequent configurations can be implemented depending on various requirements. In particular,
it can be driven by deterministic test data either stored on-chip or provided by an
external device.
As can be observed, TestExpress fulfills all requirements stated in the previous
section. First of all, the per-cycle test application allows one to efficiently use test
time, as the lengthy scan shift-in phase is not needed. Deploying every clock cycle
for the purpose of actual testing allows applying more test patterns in the same amount of time.
As a result, this approach makes it possible to improve fault coverage, to detect various unmodeled faults, and to deploy a multiple-detection framework [34]. Alternatively, one can choose to apply the same number of test patterns as that of conventional scan, yet in much shorter time, thus significantly reducing the test cost without compromising the original test quality.
TestExpress also addresses power issues as the majority of scan cells are sustained in the mission mode, and the operational modes are seldom changed. Indeed,
logic states assumed by the scan cells kept in the functional mode closely resemble
those causing the switching activity to be limited by the margins the circuit is designed for. Note that the configuration signals remain static after a given configuration is established. They can be, therefore, placed and routed with no rigid timing
constraints similar to those of scan enable signals, whose distribution and delivery, especially for the at-speed test purpose, must meet non-flexible timing closure conditions. More importantly, the di/dt problem that occurs when using conventional at-speed scan tests, caused by the sudden current changes within a few nanoseconds while applying the capture clock burst, becomes much less aggravating.
Figure 4.2: TestExpress scan cell.
Finally, the inherent low-power capabilities of TestExpress allow applying test patterns at a
higher frequency than in other state-of-the-art schemes. Thus, TestExpress-based
at-speed testing can be easily adapted to further increase fault coverage metrics.
To implement multimode TestExpress scan chains, one can use a scan cell design,
as shown in Figure 4.2. It consists of two 2-input multiplexers that select the data
source for a D-type flip-flop. If the control input M1 is set to 0, then the scan cell
enters the mission mode. Consequently, it captures, every clock cycle, data arriving
from the CUT. It is important to observe that the performance overhead introduced
by the scan cell of Figure 4.2 remains the same as that of the conventional scan chain,
i.e., it adds a single 2-input multiplexer delay to the functional path. The control inputs M1M2 = 10 connect the flip-flop with the previous scan cell or a stimuli source,
if it acts as the first element of a given scan chain. Clearly, the shift register (scan
chain) formed this way is now in the stimuli mode. Finally, the compaction mode is
in place when the control inputs M1M2 are equal to 11. Before reaching the flip-flop,
data from the CUT is XOR-ed with the signal provided by the previous scan cell (or
the stimuli source).
Figure 4.3: Alternative design of TestExpress scan cell.
A scan chain in the compaction mode may generate above-the-average switching
activity due to its XOR gates. To alleviate this problem, the TestExpress scan cell deploys an optional AND gate to prevent the output signal from propagating back to
the CUT. Note that setting the control inputs M1M2 to 11 enables this functionality, provided the gating logic is implemented. The corresponding operation mode will be further
referred to as a compaction with blocking. Its performance cost remains the same as
that of conventional low power scan-based solutions [45], [84], [115]. It is worth
noting, however, that this is just an auxiliary technique as toggling reduction
achieved by TestExpress is primarily attributed to putting a significant majority of
scan cells into a functional mode.
Figure 4.3 illustrates an alternative design of the TestExpress scan cell. Its operation modes are defined in Table 4.1. This solution allows one to trade-off silicon
area and performance cost. As can be seen, the cell of Figure 4.3 requires only two
2-input NAND gates, whereas the cell of Figure 4.2 works with two 2-input multiplexers. However, an XOR gate placed in front of the flip-flop may slow down a
circuit in the functional mode a bit more than a multiplexer does.
Table 4.1: Operation modes for scan cell of Figure 4.3.
Mode                         CE   SE
Stimuli                       0    1
Mission                       1    0
Compaction with blocking      1    1
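A minimal cycle-level model of the cell of Figure 4.2 is sketched below (Python); it only mirrors the mode decoding described in the text (M1 = 0: mission, M1M2 = 10: stimuli, M1M2 = 11: compaction), and treating the optional output gating as a simple flag is a modeling assumption.

def scan_cell_step(m1, m2, d_from_cut, prev_cell_out, q, blocking=True):
    """One clock cycle of the Figure 4.2 cell: returns (next flip-flop state, value driving the CUT)."""
    # The optional AND gate masks the cell output in the compaction-with-blocking mode.
    to_cut = 0 if (m1 == 1 and m2 == 1 and blocking) else q
    if m1 == 0:                       # mission mode: capture functional data every cycle
        next_q = d_from_cut
    elif m2 == 0:                     # M1M2 = 10, stimuli mode: shift from the previous cell
        next_q = prev_cell_out
    else:                             # M1M2 = 11, compaction mode: accumulate the response
        next_q = d_from_cut ^ prev_cell_out
    return next_q, to_cut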
The scan cell designs of Figure 4.2 or Figure 4.3 are employed to make up the
TestExpress scan chains as shown in Figure 4.4. Bold lines denote the data flow in
each mode. As can be seen, no scan chain is actually formed in the mission mode
(Figure 4.4a) where all flip-flops preserve their original functionality. Most of the
scan cells during test will stay in this mode to reduce the toggle rate and to mimic
functional switching characteristics. In the stimuli mode (Figure 4.4b) the scan cells
form a shift register. Test data are serially fed into the register through the scan serial input, while the register content drives the CUT. Note that after every single shift
cycle there is a test pattern presented to the CUT. Hence, one can stagger ATPG patterns based on this functionality. Only a small fraction of scan chains will be in the
stimuli mode during test. Finally, the scan cells become a finite memory test response evaluator in the compaction mode (Figure 4.4c).
Figure 4.4: Scan configurations. (a) Mission. (b) Stimuli. (c) Compaction.
Test results are captured from the CUT, XOR-ed with the current content of the scan chain, and a single signature bit is shifted out every clock cycle. Again, only a small percentage of scan chains
will stay in the compaction mode during test. As mentioned earlier, scan chains may
generate much switching activity in this mode, and, therefore, their parallel outputs
can be gated to reduce toggling. Interestingly, the same mechanism allows blocking
unrestricted fault propagation back to the CUT, thus preventing undesired fault
masking.
One of the pragmatic aspects of introducing a new DFT scheme is its test logic
silicon real estate. As shown in this section, TestExpress requires one XOR gate and
two 2-input multiplexers per scan cell. The TestExpress scan cell of Figure 4.3 is
equivalent to 9 two-input NAND gates, whereas a conventional scan cell is equivalent to 7 NAND gates. Since typically flip-flops use roughly 25% of the silicon space,
the area increase compared to a non-scan design is 10% for a conventional scan and
20% for TestExpress. This estimate may vary depending on the design and the manufacturing technology. Also, a 2s-bit control register, where s is the number of scan
chains, must be deployed together with two control signals routed to every scan cell.
However, the resultant area is comparable to other scan design-based DFT methods.
Indeed, in at-speed testing and especially with power constraints there is typically
some additional hardware required to: (1) activate a high-speed scan enable signal
either globally or locally, (2) moderate di/dt through a scan burst capability, (3) gate
scan cell outputs to reduce power dissipation during shift, and (4) gate the scan-out signal to reduce power consumption during normal operations. It is worth noting
that TestExpress does not require a high-speed scan enable signal.
4.3 Scan stitching
The TestExpress approach may only succeed in breaking the barriers of conventional scan if memory elements of the design are properly assigned to successive
scan chains. Deploying a method to automate the scan insertion process is therefore
of crucial importance. Commonly used techniques, typically working with the layout
information to minimize the interconnection complexity [114], may not suffice to
meet the requirements of the new scheme. Since scan chains in the stimuli mode do
not capture test responses, only a clear separation of cells with good controlling and
observing abilities ensures high quality tests.
Consider the circuit shown in Figure 4.5. It consists of two scan chains driving two
gates. To detect a stuck-at-0 fault at line g, one can use a test pattern that assigns
scan cells A and C the value of 1, leaving initially other cells unspecified. Since cells
A and C are hosted by scan chains s1 and s2, respectively, both scan chains must be
set to the stimuli mode. However, to observe the fault in either cell D or E, either
scan chain s2 or s1 must be set to the compaction mode, which clearly contradicts the
earlier step. Consequently, the only viable test is the one that uses cells B = 1 and
C = 1 as stimuli and cell E as an observation point. On the other hand, test pattern
ABC = 001 detects exclusively a stuck-at-1 fault at line f. In this case, there is no scan
mode assignment enabling fault observation. Nevertheless, if cell A were part of scan
chain s2, this fault would be detected.
Figure 4.5: TestExpress-based testability.
In this section a new scan stitching method is proposed. It creates and configures
scan chains based on a sequential graph coloring. The algorithm consists of two major steps. During the first one, a structure graph (S-graph), which has already been
used in the partial scan synthesis [15], is created. In this graph, vertices represent
flip-flops. A directed edge from node f1 to node f2 indicates that there exists a combinational path between flip-flops f1 and f2, where f1 is a driver and f2 is a receiver.
Flip-flops f1 and f2 are said to be adjacent. The S-graph is subsequently modified in
such a way that it becomes undirected and contains no self-loops.
Figure 4.6: Circuit and its S-graph. (a) Design. (b) Initial S-graph. (c) Modified
S-graph.
The next step basically runs the sequential graph-coloring algorithm [20] on the
S-graph, so that no two adjacent vertices share the same color. Consequently, flip-flops having the same color are not directly connected through combinational logic.
This feature ensures that flip-flops controlling and observing the same combinational logic do not end up within the same scan chain. Stated differently: arranging
scan in such a way that every scan chain comprises only single-color flip-flops ensures that control and observation points for a given fault are hosted by disjoint scan
chains. Consider the circuit of Figure 4.6a. Figure 4.6b illustrates its initial S-graph
which is subsequently modified and colored, as shown in Figure 4.6c. As can be seen,
flip-flops of the same color now form groups corresponding to successive scan
chains with no functional constraints. Furthermore, a colored graph for the circuit
of Figure 4.5 appears in Figure 4.7. Clearly, assigning scan cells E and D to one scan
chain, and cells A, B, and C to another chain solves the problem of providing tests for
faults g0 and f1 in the circuit.
Figure 4.7: Example of structural graph coloring.
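The two steps, building the undirected S-graph and coloring it sequentially, can be sketched as follows (Python); extracting the adjacency relation from the netlist is abstracted away, and visiting denser vertices first is merely one common heuristic.

def color_s_graph(adjacency):
    """adjacency: flip-flop -> set of flip-flops it is combinationally connected to
    (already made undirected, self-loops removed). Returns flip-flop -> color."""
    colors = {}
    # Sequential (greedy) coloring: denser vertices first keeps the color count low.
    for ff in sorted(adjacency, key=lambda v: len(adjacency[v]), reverse=True):
        used = {colors[n] for n in adjacency[ff] if n in colors}
        colors[ff] = next(c for c in range(len(adjacency) + 1) if c not in used)
    return colors

# Flip-flops that share a color are never combinationally adjacent, so every color
# class is an initial scan chain candidate with disjoint control and observation sites.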
The process described above yields subsets (represented by colors) of flip-flops
associated with initial scan chain candidates. Typically, the resultant scan chains are
of diverse sizes, from single flip-flops to values going beyond acceptable limits.
Thus, a method is required to spread flip-flops more uniformly over the desired
number of balanced scan chains. An approach to solve this problem is presented
next.
First, dividing the total number of scan cells by the requested number of scan
chains gives the desired scan chain size. All chains with a higher number of scan cells
than the preferable length are cropped. They are used to create as many scan chains
as possible. For example, if the preferable scan chain length is 100, and the actual size of a given chain (after graph coloring) is 550, 5 new scan chains, each 100 cells long, are created out of the original one. The remaining 50 cells, together with all other leftover scan cells, now participate in the process of scan balancing, starting from scan cells hosted by
the shortest chains. Every existing scan chain s is assigned its individual cost λs. It measures the degree of self-adjacency of chain s, i.e., the number of propagation paths
through combinational logic due to the adjacent scan cells hosted by the very same
scan chain s. Initially, all scan chains have their costs set to 0 since the graph coloring
run earlier does not allow adjacent cells to be merged into a single scan chain.
Suppose we are now looking for a scan chain that could accommodate cell c. The
cost C of adding cell c to scan chain s is then given by the following formula:
C = Δ + λs,     (9)
where Δ is the expected increase in the self-adjacency of scan chain s if cell c is appended to it (in particular, it accounts for the number of combinational paths – back
and forth – between c and s, as shown by the corresponding S-graph). Scan cell c is
eventually moved to a scan chain with the lowest cost C, and with a user-defined
padding factor taken into account. Thus, the cost of adding a cell to a chain, which as
a result becomes too long, is set to infinity. If more than one scan chain shares the
lowest value of C, then we choose the shortest scan chain to keep the scan chains
balanced. Finally, the cost of the selected chain is updated, accordingly. Note that
having highly coupled cells in different scan chains is highly desirable as they can be
used to test each other by assuming different operational modes. As can be seen,
scan chains are becoming gradually populated by adjacent scan cells, and, thus, their
costs are rising. However, the algorithm, in a greedy manner, tries to create the lowest cost scan chains by putting loosely coupled cells into the same scan chains. At the
same time, it does not allow any scan chain to get longer or significantly shorter than
the preferable limits. The whole procedure terminates when a satisfactory balance
of scan chains is achieved, or some of the chains become too costly. Experimental
evidence indicates that one can often balance scan chains in such a way that all
chains have their self-adjacency costs equal to 0. Thus, some final adjustments are
possible with respect to, for example, an order of scan cells within the same scan
chains or layout requirements. The ordering may impact such figures of merit as the
fault masking probability (aliasing) or physical design complexity.
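The balancing step built on formula (9) may be sketched as follows (Python); paths(cell, chain_cells) stands for the number of combinational paths, back and forth, between the cell and the cells already in a chain (the Δ term), and the length limit plays the role of the padding factor. All names are illustrative.

import math

def place_cell(cell, chains, chain_costs, paths, max_len):
    """Appends `cell` to the scan chain minimizing C = delta + lambda_s (formula (9))."""
    best, best_cost = None, math.inf
    for s in chains:
        # Chains that would become too long get an infinite cost (padding factor).
        delta = paths(cell, chains[s]) if len(chains[s]) < max_len else math.inf
        cost = delta + chain_costs[s]
        # Keep the lowest-cost chain; on a tie, prefer the shorter one to stay balanced.
        if cost < best_cost or (best is not None and cost == best_cost
                                and len(chains[s]) < len(chains[best])):
            best, best_cost = s, cost
    assert best is not None and best_cost < math.inf, "no chain can accept this cell"
    chain_costs[best] = best_cost        # new lambda_s = old lambda_s + delta
    chains[best].append(cell)
    return best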
It is worth noting that the engineering change orders may or may not force the
scan stitching to rerun. Since the scan stitching does not require the optimal (minimal) coloring of the S-graph, there is a substantial degree of freedom in moving scan
cells between scan chains as guided by layout requirements. The design changes
may also lead to local scan stitching adjustments or may require additional ATPG
effort to detect faults in the modified area.
4.4 Selecting test configurations
Recall that TestExpress assigns every scan chain one of three operational modes:
mission (M), stimuli (S), or compaction (C) with or without gating logic. For a set of
n scan chains, this can be represented by a vector {x0, x1, …, xn-1}, where xk ∈ {M, S, C},
with M being a default option for a given scan chain. The actual assignment is then
uploaded to the configuration register. To ensure high quality test, it can be reloaded
multiple times during a test session. In accordance with low power requirements, the majority of scan chains should remain in the mission mode. Hence, a careful selection
of scan configurations is a key factor for minimizing the number of different assignments required to obtain high fault coverage. In this section a fault-driven method
of selecting TestExpress configurations, further referred to as setups, is proposed.
The algorithm employs ATPG to arrive at suitable scan configurations. First of
all, ATPG is run for a given fault list to produce resultant test cubes. Note that if certain scan cells get specified values, their host scan chains cannot comprise any observation sites. The TestExpress-constrained ATPG works this way by propagating
fault effects to observation sites that belong to scan chains with no specified bits. If
a fault propagates exclusively to the stimuli scan chains, it becomes an ATPG abort.
During fault simulation, all detected faults and their propagation sites are recorded
for each test cube. When processing test cubes, scan chains with at least one specified bit are assigned the stimuli mode. Furthermore, at least one observation site per
fault must belong to a scan chain in the compaction mode. Typically, after fault simulation, there are several observation sites we can choose from. The algorithm,
therefore, postpones decisions on whether to assign the compaction mode to particular scan chains until merging of setups when scan chains can be assigned this mode
by taking into account degrees of compatibility between different setups (two setups are compatible if in every position where one of the setups has S or C, the other
one either features the same symbol or M). To reduce the number of specified bits
in test cubes, a SCOAP-based [39] decision order is used during test generation.
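The compatibility relation and the corresponding merge can be expressed in a couple of lines (Python); setups are lists of 'M', 'S', and 'C' symbols, one per scan chain, as defined above.

def compatible(a, b):
    """Wherever one setup has S or C, the other must carry the same symbol or M."""
    return all(x == y or 'M' in (x, y) for x, y in zip(a, b))

def merge(a, b):
    """Merge two compatible setups, keeping every non-mission assignment."""
    return [x if x != 'M' else y for x, y in zip(a, b)]

# Example: ['S','M','C','M'] and ['S','M','M','C'] are compatible and merge to ['S','M','C','C'].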
Once individual scan setups are created for each test cube, the algorithm attempts
to merge the setups in such a way that the number of resultant configurations is
acceptably low. This is accomplished by using the simulated annealing technique
[55]. It provides a means to escape local optima by allowing moves that temporarily worsen the objective function value, in the hope of finding a global optimum. A
starting point in the proposed approach is a solution obtained by a greedy merging
of some of the existing compatible setups. In fact, the initial setup is selected after a
few trials, where the greedy merging is run on different combinations of setups. Subsequently, the algorithm moves from one solution to its neighbor by randomly selecting a scan chain in the M mode, and by putting this chain into either the S or C
mode (both modes are equally likely). Moreover, another scan chain already in the
S or C mode is chosen at random, and its mode is brought back to M. For example, a
single move can be carried out as shown below:
M M S S M M C C M
M S S S M M C M M
We define the fitness function Fk to measure the value of a scan configuration obtained in iteration k as follows:
Fk = w Ck – H,     (10)
where Ck counts the number of initial setups covered by a new solution, w is a user-defined weight (w = 2,000 in this chapter), and H determines an average “distance”
between the new solution and the remaining setups. H is computed over these setups by counting the number of positions on which the new solution and a given
setup differ in terms of the assigned modes. As can be seen, the greater the value of
H and the more significant the migration from setups still awaiting a merge with others, the lesser the chance of merging with them in the future and, hence, the lower the probability of accepting such a scan configuration right now. Clearly, if Fk > Fk–1, the new setup is accepted unconditionally. Otherwise, the new solution is accepted with the probability p = e^v, where v = (Fk – Fk–1)/T, and T is the current value of “temperature”
decreasing at the “cooling” rate of 0.0001. The higher the temperature, the more
likely it is that a debilitating change of the current configuration will be accepted.
The whole procedure stops if there is no progress reported for a number of iterations. In such a case, all covered configurations are removed from the list, while the
final configuration is passed, as pin and cell constraints, to ATPG generating staggered test patterns. Once they are fault-simulated, the algorithm begins a new iteration with all test cubes corresponding to detected faults removed. Clearly, it obeys
the limit on non-mission scan chains.
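The annealing-based merging described above can be summarized in the following sketch (Python; the data structures, the additive cooling schedule, and the fixed iteration budget are illustrative assumptions rather than a reproduction of the prototype code):

    import math, random

    def covers(sol, setup):
        # A setup is covered if the solution features the same symbol
        # wherever the setup has S or C.
        return all(y == 'M' or x == y for x, y in zip(sol, setup))

    def fitness(sol, setups, w=2000):
        covered = sum(covers(sol, s) for s in setups)
        remaining = [s for s in setups if not covers(sol, s)]
        # H: average number of positions on which the solution and a
        # setup still awaiting a merge differ in terms of assigned modes
        H = (sum(sum(x != y for x, y in zip(sol, s)) for s in remaining) / len(remaining)
             if remaining else 0.0)
        return w * covered - H

    def move(sol):
        # Promote a random M chain to S or C and demote a random S/C chain to M.
        sol = list(sol)
        m_pos = [i for i, x in enumerate(sol) if x == 'M']
        sc_pos = [i for i, x in enumerate(sol) if x != 'M']
        if m_pos and sc_pos:
            sol[random.choice(m_pos)] = random.choice('SC')
            sol[random.choice(sc_pos)] = 'M'
        return ''.join(sol)

    def anneal(start, setups, T=1.0, cooling=0.0001, iterations=20000):
        sol, f = start, fitness(start, setups)
        for _ in range(iterations):
            cand = move(sol)
            fc = fitness(cand, setups)
            # accept improvements unconditionally, worse candidates
            # with probability p = e^((Fk - Fk-1)/T)
            if fc > f or random.random() < math.exp((fc - f) / T):
                sol, f = cand, fc
            T = max(T - cooling, 1e-9)
        return sol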
4.5 Test generation
The state-of-the-art test pattern generation techniques can be easily adapted to
work with the TestExpress flow, including its second round when scan configurations are already defined. ATPG is then constrained to comply with every configuration mode. First, all mission mode scan cells are forced to operate as non-scan
memory elements. Next, since scan chains in the stimuli mode do not observe data
arriving from the CUT, the observation value of each scan cell is set to unknown.
ATPG is only allowed to specify stimuli and justification data within these scan
chains. Moreover, the control signals of each scan cell are constrained to appropriate
values. This prevents data produced by the circuit from being captured during subsequent cycles of sequential fault simulation. Finally, ATPG is not allowed to specify
bits (scan cells) hosted by scan chains in the compaction mode, i.e., they only capture
test responses. Whenever a fault propagates to a compacting scan chain, it is temporarily marked as detected. Clearly, a fault may not make it to the scan serial output
in the unlikely event of aliasing or X-masking. In such a case, the fault remains an
ATPG target. The probability of fault masking, however, is extremely small as a scan
chain put into the compaction mode forms a finite memory device [89], where after
several clock cycles (depending on a fault injection site) an error is shifted out. This
observation is fully supported by deferred fault crediting that indicates only highly
incidental cases of such events. This is also a fundamental difference between TestExpress and test-per-clock scenarios, e.g., [60], where fault detection can only be
determined at the end of test.
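Purely for illustration, the per-mode restrictions described above can be captured as a small mode-to-constraint mapping (the constraint names below are hypothetical; a commercial ATPG expresses them through its own cell and pin constraint commands):

    def chain_constraints(mode):
        # Translate a scan chain mode into illustrative ATPG constraints.
        if mode == 'M':   # mission: operates as a non-scan memory element
            return {'scan': False, 'may_specify_stimuli': False, 'observable': True}
        if mode == 'S':   # stimuli: delivers values, its observation value is unknown
            return {'scan': True, 'may_specify_stimuli': True, 'observable': False}
        if mode == 'C':   # compaction: only captures responses, never driven by ATPG
            return {'scan': True, 'may_specify_stimuli': False, 'observable': True}
        raise ValueError(f"unknown mode: {mode}")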
TestExpress runs a limited sequential depth ATPG on a pipeline structure whose
stages consist of the stimuli scan chains, the mission-mode flip-flops, and finally the compacting chains. Several commercial tools are perfectly capable of working in this fashion. Even complex full-scan designs, such as microprocessors, adopt this flow routinely because of their large sequential depth. Moreover, the use of sequential ATPG allows TestExpress to handle inherently more complex and time-related failures, including transition faults and various types of delay faults.
Figure 4.8: Test cube merging.
The TestExpress-constrained ATPG produces staggered test patterns by deploying a test cube merging technique that exploits the test-per-cycle application paradigm. In principle, the algorithm initially places a group of test cubes in a buffer.
Subsequently, every cycle it tries to merge the largest number of non-conflicting test
cubes from the buffer with the current content of scan chains. Consider the example
shown in Figure 4.8 with specified scan cells denoted in colors. In cycle c0, two test
cubes – t0 and t1 – are merged into an initially empty buffer. During the next clock
cycle (c1) the buffer content is shifted right, and, therefore, test cube t2 becomes now
compatible with the earlier result of merging t0 and t1. Note that the first specified
bit (the red one) leaves the buffer, and, thus, it becomes the first element of the final
7-bit test pattern. Finally, test cube t3 is merged in cycle c2. Whenever there are unspecified bits in a test sequence, dynamic compaction is used. Clearly, the test cube merging technique can increase the pattern fill rate even to the point where, for example, test data compression is hardly possible anymore. This issue is further discussed
in the next section.
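The per-cycle merging can be sketched as follows (Python; a scan chain is modeled here as a single vector with None denoting an unspecified bit, and the greedy order in which buffered cubes are tried is an assumption):

    def conflict(chain, cube):
        # Two vectors conflict if some position is specified to opposite values.
        return any(a is not None and b is not None and a != b
                   for a, b in zip(chain, cube))

    def overlay(chain, cube):
        return [b if b is not None else a for a, b in zip(chain, cube)]

    def staggered_pattern(buffered_cubes, length, cycles):
        chain, pattern = [None] * length, []
        for _ in range(cycles):
            # greedily merge every buffered cube that does not conflict
            # with the current content of the scan chain
            for cube in list(buffered_cubes):
                if not conflict(chain, cube):
                    chain = overlay(chain, cube)
                    buffered_cubes.remove(cube)
            # shift right: the bit leaving the chain becomes the next
            # element of the final staggered test pattern
            pattern.append(chain[-1])
            chain = [None] + chain[:-1]
        return pattern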
Figure 4.9: Fault in a self-loop.
It is worth noting that some faults might be harder to detect with a conventional
ATPG due to TestExpress-related separation of control and observation roles. Consider a circuit in Figure 4.9, where A and B represent either primary inputs or scan
cells in the stimuli mode, while scan cell C belongs to a scan chain in the compaction
mode. For simplicity’s sake, scan enable signals and multiplexers are not shown in
the figure. Since it is impossible to assign stimuli values to scan cells in the compaction mode, a 2-cycle test is required to detect the stuck-at-1 fault of Figure 4.9. Its
first cycle resets C, whereas the second one excites the fault and captures its effect.
The initialization of C, however, requires prohibitively complex justification of all
preceding scan cells. These types of faults can be addressed in various ways, including test point insertion or customized ATPG with the ability to trace scan chains in
the compaction mode. Deploying the compaction-with-blocking mode (as described
in Section 4.2) is, therefore, the simplest way to reduce the negative impact of such
faults. Recall that the stitching algorithm breaks all combinational loops that involve
more than a single scan cell.
4.6 Experimental results
All algorithms presented in the preceding sections were implemented as prototype code that was subsequently incorporated into commercial EDA tools. In particular, the staggered ATPG was integrated with FastScan – Mentor Graphics’ commercial
ATPG tool. Fault simulation was performed with the help of a prototype sequential
fault simulator. Finally, scan chain insertion, S-graph algorithms, and scan stitching
were carried out by means of originally developed procedures interleaved with the
source code of Mentor’s DFTAdvisor.
In this section, five large industrial designs are used to demonstrate feasibility,
effectiveness, and substantial gains one may expect when deploying the TestExpress
methodology. The circuit characteristics are provided in Table 4.2. For each test
case, the table lists the number of gates, the number of scan cells, the number of scan
chains, and the size of the longest scan chain. The second-to-last column gives the size of a collapsed stuck-at fault list comprising faults detected by deploying a conventional scan infrastructure, excluding faults affecting clock trees and those detected by implication. The last column lists the percentage of scan cells having a connection through combinational logic back to themselves. All designs feature X-bounding logic (to prevent unknown values from propagating to scan chains in the compaction mode) and
test points. The number of test points is approximately equal to 1% – 2% of the entire scan cell population. As a result, the number of fortuitous detections increases,
and test cubes are less specified. All designs have a global reset capability and are
driven by a single clock.
Table 4.2: Design characteristics.

        Gates   Scan cells   Scan chains   The longest chain   Faults   Self-loops
  D1    2.6M      165k         1,142             158            4.5M       17%
  D2    1.6M      145k           735             200            1.9M       15%
  D3    1.1M       85k           433             200            2.0M       71%
  D4    450k       45k           230             200            0.7M       38%
  D5    475k       30k           195             150            0.6M       73%

4.6.1 Applicability of test data compression
All experiments reported in this chapter were carried out within the framework
of test compression to arrive at a realistic assessment of such factors as the number of external test pins, the complexity of the interface between ATE and the scan chains, or the volume of involved test data. Clearly, one first has to consider the feasibility of deploying a test data compression scheme in conjunction with TestExpress.
For simplicity’s sake, the following discussion refers to design D1, but the conclusions remain valid for all test cases. The experiments indicate that the number of
specified positions in TestExpress test patterns may vary significantly depending on
a shift cycle as well as on a scan configuration. Figure 4.10 shows that the fill rates
typically spike during the first two hundred clock cycles and drop gradually afterward.

Figure 4.10: Fill rates vs clock cycles for selected test configurations.

Furthermore, following the highly specified tests of the first few scan configurations are many low-fill-rate test patterns associated with scan configurations activated later in the process. This trend – also illustrated in Table 4.3 for selected scan
configurations and the corresponding average and maximum fill rates – indicates
that these patterns could be the subject of effective compression, thus reducing the
test data volume. Moreover, having a test data decompressor feeding scan chains
can ease the problem of connecting ATE channels with scan chains in the S mode.
Due to its flexibility, a decompressor can drive any combination of scan chains in a
much simpler manner than any other switching device interfacing ATE with the scan
chains. To be successfully compressed, though, certain test patterns must have their
fill rates lowered. Consequently, the cube merging producing staggered patterns
(Section 4.5) needs to verify continuously whether the resultant test pattern remains compressible. This can be accomplished either by using an incremental solver
provided by the EDT technology, or by monitoring the relationship between the
amount of injected test data and the number of specified bits. If a test pattern in its
current form cannot be encoded, the newly added test cube is temporarily discarded, and the algorithm tries to accommodate another test cube. Note that the experimental results presented in the following part of the section were obtained for
the EDT-based compression platform with 10 external input channels employed to
deliver test data and feed the scan chains through a 32-bit decompressor.
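The second, simpler check amounts to a necessary (though not sufficient) condition for encodability: the number of specified bits accumulated so far must not exceed the number of free variables injected by the ATE channels. The margin used below is an assumed illustrative value:

    def probably_encodable(specified_bits, shift_cycles, channels=10, margin=0.95):
        # A linear decompressor cannot encode more specified bits than the
        # variables injected so far; a margin keeps the solver comfortable.
        injected_variables = shift_cycles * channels
        return specified_bits <= margin * injected_variables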
Table 4.3: Fill rates.

        Configuration   Average [%]   Maximum [%]
        1st                31.04         79.00
        5th                27.24         65.69
        10th               17.58         40.60
        20th                4.78         16.78
        30th                2.66         10.56

4.6.2 Power consumption
To fully appreciate the TestExpress ability to reduce test time, the presentation
of experimental results begins with switching activity numbers. Table 4.4 illustrates
a breakdown of scan cells in designs of Table 4.2 into three operational modes with
the corresponding toggling rates for each group. The results are averaged over all
scan configurations used for the examined designs. For each configuration, roughly 80% of scan cells remain in the mission mode. The toggling rate presented in each column of Table 4.4 is the percentage of transitions actually invoked in scan cells of a given operational mode with respect to the scenario in which every scan cell switches in every cycle. Hence, in the worst case, the numbers reported in columns
Stimuli, Compaction and Mission would have the value of 100%. It is worth noting
that all transition counts were obtained with the help of a gate-level logic simulator.
However, a conventional weighted transition metric (WTM), traditionally used to
measure scan-based toggling, was replaced with per-cycle counting, as the conventional WTM reporting cannot be applied to a test-per-clock approach.
Table 4.4: Power dissipation.

        Stimuli   Compaction   Mission   Weighted average
  D1    48.27%      49.17%     12.95%          20%
  D2    49.74%      48.54%     12.75%          20%
  D3    49.64%      48.58%      6.15%          15%
  D4    49.37%      48.96%     12.37%          20%
  D5    49.56%      48.10%      3.32%          12%
The last column of Table 4.4 reports the weighted average switching values.
These numbers were obtained as follows. Consider, for example, design D1 with 165k memory elements and one of its test configurations, where the numbers of stimuli, compacting, and mission cells are 22k (13%), 13k (8%), and 130k (79%), respectively. Let a test pattern consist of 2,000 shift clock cycles. Consequently, the maximum possible number of transitions in each class of cells is equal to 44M, 26M, and 260M (these numbers are products of the number of cells and the test length). The actually observed numbers of transitions were 20M (45%), 13M (50%), and 33M (13%). The weighted switching w is therefore given by

w = 13% · 45% + 8% · 50% + 79% · 13% ≈ 20%.
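The same arithmetic can be reproduced directly; the short Python check below merely recomputes the figure quoted above:

    # Per-mode shares of scan cells in the D1 example and their toggling rates.
    shares = {'stimuli': 0.13, 'compaction': 0.08, 'mission': 0.79}
    toggling = {'stimuli': 0.45, 'compaction': 0.50, 'mission': 0.13}

    w = sum(shares[m] * toggling[m] for m in shares)
    print(f"weighted switching = {w:.0%}")   # prints: weighted switching = 20%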
As can be seen, the overall switching activity reported in the table is less than or equal
to 20%. Since random values are typically used to fill unspecified positions in test
cubes, the toggling rate for a conventional scan-based test usually remains close to
50%. Consequently, TestExpress yields over 2-fold reduction of test power, thus alleviating problems related to average and peak power dissipation. By reducing the
degree of toggling during scan shifting and capture, other issues such as increased
temperature or voltage drop may become easier to handle in the test mode. Moreover, as the average power dissipated during test is proportional to the shift-clock frequency, reduction of the test power envelope creates a margin that allows for acceleration of scan shifting at a rate that maintains the same power consumption as that of conventional test solutions. Interestingly, the quasi-static nature of the scan chain mode selection signals alleviates the harmful di/dt issues common to transitions from shift to capture mode in conventional scan chains. Clearly, both phenomena can translate into halving the test application time. Furthermore, test power reduction
techniques based on filling the unknown values in test patterns with constant values
[14] can be used to further reduce the switching activity.
Figure 4.11 illustrates test coverage curves obtained for the test cases where
TestExpress stimuli scan chains receive test patterns through EDT compression
logic. Curve (1) plots reference test coverage as a function of time. For example, the
conventional test reaches the target coverage for design D1 after applying 1,141 patterns in 3,611 μs, which is 181,577 cycles (1,142 × 158 shift cycles at 50 MHz plus 1,141 capture cycles at 500 MHz).

Figure 4.11: Test coverage for compressed patterns (one panel per design, D1–D5).

The remaining curves represent the following
TestExpress-based scenarios:
– test application with the same shift frequency as that of the conventional scan-based test (2);
– test application with overlapping durations of two successive scan configurations (3); this scheme, after adding a shadow configuration register, allows unloading a given scan chain in the mode C in parallel with uploading a test stimulus into the very same chain, which moves to the mode S in the next configuration;
– test application with the shift frequency increased twice due to the test power margins (4), as described above.
All results are reported for the target test coverage of 99%. The leveled off curves
of Figure 4.11, and, thus, the incomplete test coverage, are attributed to faults affecting scan cell self-loops, as detailed in Section 4.5 (see Figure 4.9).
Returning to design D1, the target test coverage is reached by curve (2) in 1,878 μs, i.e., in 93,912 shift cycles at 50 MHz. The actual test time for scenarios (3) and (4) is further reduced due to faster clocks and parallel application of some test data. Consequently, the same test coverage can be ultimately reached in 863 μs (86,328 shift cycles at 100 MHz), i.e., the overall speedup is 4.18x. Furthermore, the test data volume decreases 1.86 times. Results for the remaining circuits are gathered in Table 4.5, where the test time needed to achieve the target test coverage is reported for scenarios (1) to (4), together with the resultant speedup. The next two columns
give the number of setups, and the reduction of the total test data volume relative to
the EDT scheme.
Table 4.5: Test time and test data reduction relative to EDT.

        Test time [μs] and speedup                                      # setups   Data reduction
        (1)             (2)             (3)             (4)
  D1    3,611 (1.00x)   1,878 (1.92x)   1,726 (2.09x)    863 (4.18x)      49           1.86x
  D2    1,156 (1.00x)   1,065 (1.09x)     829 (1.39x)    414 (2.79x)      60           1.15x
  D3    2,641 (1.00x)   1,520 (1.95x)   1,252 (2.44x)    626 (4.88x)      68           1.92x
  D4    1,284 (1.00x)     889 (1.44x)     654 (1.96x)    327 (3.93x)      62           1.80x
  D5      732 (1.00x)     773 (0.95x)     614 (1.19x)    307 (2.38x)      54           1.11x
As mentioned earlier, integrating test data compression with the TestExpress environment is a key factor for its successful application. This is because of additional
control data that one needs to deliver for each scan configuration. As shown in Section 4.2, the configuration register stores 2 bits per scan chain. In design D1, it
amounts to 1,426 × 2 = 2,852 bits per scan configuration. As the number of scan configurations is equal to 49, the total amount of control data that needs to be supplied during test equals 139,748 bits. This is roughly 15% of the total volume of test data (86,328 cycles multiplied by 10 channels) used to test the circuit.
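For reference, the control-data overhead quoted above follows from a one-line calculation; the figures below are taken directly from the text:

    chains, bits_per_chain, configurations = 1426, 2, 49
    control_bits = chains * bits_per_chain * configurations   # 139,748 bits
    test_data_bits = 86328 * 10                                # shift cycles x ATE channels
    print(control_bits / test_data_bits)                       # ~0.16 of the test data volume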
Numerous features that are essential to making TestExpress practical for deployment have been identified. It appears, however, that the single largest factor shaping
the computational complexity of TestExpress is its ATPG and fault simulation engines. In fact, the scan stitching, configuration selection, and other processes have a negligible impact on TestExpress run times, while being helpful in minimizing potential ATPG aborts. In addition to ATPG, whose processing time scales linearly with the pattern count, the sequential fault simulation is what determines the actual run time in each test case. An un-optimized prototype sequential fault simulator was deployed. Hence, while the coverage numbers are exact, the resultant processing times cannot be considered definite and final outcomes. As a first step, deferred fault crediting was applied to obtain virtually the same results in a much shorter period of time. Consequently, there are very
good reasons to believe that the ultimate TestExpress run time is going to be roughly
10 times longer than that of conventional scan-based schemes.
The silicon real estate taken up by the TestExpress test logic is reported in Table 4.6 as an equivalent number of 2-input NAND gates (the same area is also given in parentheses in terms of mm²). The presented numbers were computed with a commercial synthesis tool for the designs of Table 4.2. All components of the new test logic were synthesized using a 65 nm CMOS standard cell library under a 2.5 ns timing constraint. The table reports the following quantities: the resultant silicon area with respect to combinational and sequential devices for non-scan designs (the first three columns), the total area taken by conventional scan-based circuits, and then the percentage area increase (ΔN). Subsequently, the total TestExpress-based area is presented and compared with the corresponding areas occupied by a non-scan design (ΔN) and a conventional scan-based EDT (ΔC). As can be seen, the data reported in the table remain consistent with the area assessment presented at the end
of Section 4.2.
Table 4.6: Area overhead – 2-input NAND equivalent (and mm²).

        Non-scan architecture                                  Conventional scan         TestExpress
        Sequential       Combinational    Total                Total             ΔN      Total             ΔN     ΔC
  D1    852,225 (1.70)   438,300 (0.88)   1,290,525 (2.58)     1,418,359 (2.84)  10%     1,563,237 (3.13)  21%    10%
  D2    749,925 (1.50)   261,900 (0.52)   1,011,825 (2.02)     1,123,164 (2.25)  11%     1,250,481 (2.50)  24%    11%
  D3    439,025 (0.88)   182,700 (0.37)     621,725 (1.24)       687,579 (1.38)  11%       762,213 (1.52)  23%    11%
  D4    232,425 (0.46)    73,900 (0.15)     305,325 (0.61)       340,189 (0.68)  11%       379,701 (0.76)  24%    12%
  D5    155,950 (0.31)    80,100 (0.16)     235,050 (0.47)       258,293 (0.52)  10%       284,634 (0.57)  21%    10%
Chapter 5 Conclusions
The miniaturization of integrated circuits and the associated innovations introduced
to the manufacturing process challenge the limits of widely accepted test schemes.
With the variety of emerging defect types, the volume of test data and, more importantly, the test application time needed to arrive at a high quality test become increasingly expensive to maintain. Moreover, observing the power consumption constraints during test is vital for its reliability. As electronic devices have made their way into safety-critical applications, their condition needs constant and thorough monitoring. It is not clear whether the mainstream test solutions, including on-chip test compression, will be able to keep up with the rapid rate of technological changes over the next decade. This thesis has presented a number of novel
techniques designed to address the demands of present-day test. Moreover, as the
near future may require disruptive paradigm shifts in testing of digital circuits, a
new time-efficient scheme able to significantly reduce the cost of test was shown to
address the emerging challenges.
In Chapter 2 a complete low-power hybrid of test data compression and BIST is
proposed which combines the advantages of both techniques in common hardware.
In principle, PRESTO is a low-power test generator able to produce pseudorandom test patterns with precisely selected scan shift-in switching activity. An automated selection of settings is proposed to guarantee the requested toggling rate. The same features can be used to control the generator with a small amount of deterministic data, so that the resultant test vectors can either yield a desired fault coverage faster than the conventional pseudorandom patterns while still reducing toggling rates down to desired levels, or they can offer visibly higher coverage numbers
if run for comparable test times. This LP PRPG is also capable of acting as a test data
decompressor with the ability to control scan shift-in switching activity through the
process of encoding. The proposed hybrid solution allows one to efficiently combine
test compression with logic BIST, where both techniques can work synergistically
to deliver high quality test. It is, therefore, an attractive LP test scheme that allows
for trading-off test coverage, pattern counts and toggling rates in a very flexible
manner.
A fully deterministic BIST that allows maintaining highly reliable operation of a
device throughout its lifespan is presented in Chapter 3. It reuses on-chip decompression logic that, in particular, makes it possible to apply pseudorandom test patterns. The new scheme, however, exclusively deploys compressed ATPG-produced
test patterns and their derivatives that are generated on chip in a deterministic fashion, as well. As a result, the presented solution outperforms earlier state-of-the-art
81
sequential test compression techniques in many aspects. For example, it allows a
faster test coverage ramp-up, requires less on-chip memory to achieve otherwise
similar test coverage (which is indicative of a higher test data compression and the
enhanced encoding efficiency), or, alternatively, it returns higher test coverage
numbers for a comparable memory usage. Furthermore, it enables larger portions of complex
designs to be tested in a given session, thus complementing design-partitioning
schemes. Besides simple diffraction logic, an on-chip test memory is its major hardware component. Fortunately, the new scheme allows one to trade off the memory
size and the resultant test coverage in a very flexible manner. It makes the proposed
BIST scheme relevant for a wide range of applications as it offers a smooth migration
path to the next-generation compression solutions while minimizing design and
manufacturing cost impacts.
Finally, a new scan-based test paradigm called TestExpress is proposed in Chapter 4 to significantly reduce the manufacturing test cost by providing time-efficient
scan-based test operations. TestExpress has inherent abilities to offer higher performance and lower power consumption compared with the conventional scan. In
particular, as it approaches the system speed in the test mode, from 100 to 500 times
more (depending on the scan size) test patterns can be applied during the time of a
conventional scan-based test. Clearly, TestExpress enables at-speed tests and can
achieve better coverage of un-modeled defects. Moreover, special clocking schemes,
where scan chains in the mission and compaction modes are clocked at-speed while
the regular shift clock is applied to the stimuli scans, can be used to target transition
faults. The scan enable signals do not have to be routed as a clock. Furthermore, the
negative impact of the IR drop and di/dt issues is reduced notably. High quality
test is guaranteed as the new approach can work with all traditional fault models
used today, as well as with any new fault models of the future. In particular, time-related defects should be relatively easy to handle due to inherent features of the
limited depth sequential ATPG that TestExpress is using. To simplify fault diagnosis,
the quiet cell logic in the compaction mode can prevent the erroneous responses
from propagating back to the circuit. Thus, the existing partial-scan-based diagnostic techniques turn out to be applicable. Finally, the proposed solution allows testing
either in a deterministic manner (including test compression), in a BIST mode, or by
combining both techniques. Consequently, TestExpress can be regarded as a disruptive innovation which is ideally suited as the next generation DFT technology for
low-cost high-quality manufacturing test targeting deep sub-micron devices.
The thesis proposes comprehensive solutions for the current testing challenges
that help to maintain high quality test in all stages of a design lifecycle, from early production to wear-out and aging monitoring, while respecting the power-consumption constraints. Moreover, it introduces a new deterministic test-per-cycle technology which one day may become the only viable solution to provide the highest quality test.
References
[1]
M. E. Aboulhamid and E. Cerny, "A class of test generators for built-in
testing," IEEE Trans. Comput., vol. C-32, no. 10, pp. 957–959, Oct. 1983.
[2]
A. S. Abu-Issa and S. F. Quigley, "Bit-swapping LFSR for low-power
BIST," Electronics Letters, vol. 44, pp. 401–402, Mar. 2008.
[3]
V. D. Agrawal, "Special issue on partial scan methods," Journal of
Electronic Testing, vol. 7, no. 1/2, Aug./Oct. 1995.
[4]
V. D. Agrawal, K.-T. Cheng, D. D. Johnson, and T. S. Lin, "Designing
circuits with partial scan," IEEE Design & Test Comput., vol. 5, no. 2, pp. 8-15, Apr. 1988.
[5]
H. Ando, "Testing VLSI with random access scan," in Proc. COMPCON,
1980, pp. 50-52.
[6]
D. H. Baik and K. K. Saluja, "Progressive random access scan: a
simultaneous solution to test power, test data volume and test time," in
Proc. ITC, 2005, pp. 359-368.
[7]
P. H. Bardell and W. H. McAnney, "Self-testing of multiple logic
modules," in Proc. ITC, 1982, pp. 200-204.
[8]
P. H. Bardell and W. H. McAnney, "Simultaneous self-testing system," US
Patent 4513418, 1985.
[9]
C. Barnhart, V. Burnkhorst, F. Distler, O. Farnsworth, B. Keller, and B.
Konemann, "OPMISR: The foundation for compressed ATPG vectors," in
Proc. ITC, 2001, pp. 748-757.
[10]
B. Benware, G. Mrugalski, A. Pogiel, J. Rajski, J. Solecki, and J. Tyszer,
"Fault diagnosis with orthogonal compactors in scan-based designs,"
Journal of Electronic Testing, vol. 27, no. 5, pp. 599-609, Oct. 2011.
[11]
S. Bhunia, H. Mahmoodi, D. Ghosh, and S. Mukhopadhyay, "Low-power
scan design using first-level supply gating," IEEE Trans. VLSI Systems,
vol. 13, no. 3, pp. 384-395, Mar. 2005.
[12]
Y. Bonhomme, P. Girard, L. Guiller, C. Landrault, and S.
Pravossoudovitch, "A gated clock scheme for low power scan testing of
logic ICs or embedded cores," in Proc. ATS, 2001, pp. 253-258.
[13]
Y. Bonhomme, P. Girard, L. Guiller, C. Landrault, and S.
Pravossoudovitch, "Efficient scan chain design for power minimization
during scan testing under routing constraint," in Proc. ITC, 2003, pp.
488–493.
[14]
K. M. Butler, J. Saxena, T. Fryars, G. Hetherington, A. Jain, and J. Lewis,
"Minimizing power consumption in scan testing: pattern generation
and DFT techniques," in Proc. ITC, 2004, pp. 355-364.
[15]
S. T. Chakradhar, A. Balakrishnan, and V. D. Agrawal, "An exact
algorithm for selecting partial scan flip-flops," in Proc. DAC, 1994, pp.
81-86.
[16]
M. Chatterjee and D. Pradhan, "A novel pattern generator for near-perfect fault-coverage," in Proc. VTS, 1995, pp. 417-425.
[17]
R. Chau and R. Arghavani, "A method for making a semiconductor
device having a high-k gate dielectric," US Patent 6617210, 2005.
[18]
K.-T. Cheng and V. D. Agrawal, "A partial scan method for sequential
circuits with feedback," IEEE Trans. Comput., vol. 39, no. 4, pp. 544-548,
Apr. 1990.
[19]
"Cisco visual networking index: global mobile data traffic forecast
update 2014–2019 White Paper," Cisco Systems, Inc., 2015.
[20]
T. F. Coleman and J. J. More, "Estimation of sparse Jacobian matrices and
graph coloring problems," SIAM J. Numer. Anal., vol. 20, no. 1, pp. 187-209, Feb. 1983.
[21]
F. Corno, P. Prinetto, and M. Sonza Reorda, "Making the circular self-test
path technique effective for real circuits," in Proc. ITC, 1994, pp. 949-957.
[22]
F. Corno, M. Rebaudengo, M. Sonza Reorda, and G. Squillero, "Low
power BIST via non-linear hybrid cellular automata," in Proc. VTS, 2000,
pp. 29-34.
[23]
R. Dandapani, J. Patel, and J. Abraham, "Design of test pattern
generators for built-in test," in Proc ITC, 1984, pp. 315-319.
[24]
D. Das and N. A. Touba, "Reducing test data volume using
external/LBIST hybrid test patterns," in Proc. ITC, 2000, pp. 115–122.
[25]
R. Dorsch and H.-J. Wunderlich, "Reusing scan chains for test pattern
decompression," in Proc. ETW, 2001, pp. 24-32.
[26]
R. Dorsch and H.-J. Wunderlich, "Tailoring ATPG for embedded testing,"
in Proc. ITC, 2001, pp. 530-537.
[27]
G. Edirisooriya and J. P. Robinson, "Design of low cost ROM based test
generators," in Proc. VTS, 1992, pp. 61-66.
[28]
E. B. Eichelberger and T. W. Williams, "A logic design structure for LSI
testability," in Proc. DAC, 1977, pp. 462-468.
[29]
M. Filipek, Y. Fukui, H. Iwata, G. Mrugalski, J. Rajski, M. Takakura, and J.
Tyszer, "Low power decompressor and PRPG with constant value
broadcast," in Proc. ATS, 2011, pp. 84–89.
[30]
M. Filipek, G. Mrugalski, N. Mukherjee, B. Nadeau-Dostie, J. Rajski, J.
Solecki, and J. Tyszer, "Low-power programmable PRPG with test
compression capabilities," IEEE Trans. VLSI Systems, vol. 23, no. 6, pp.
1063-1076, June 2015.
[31]
S. Funatsu, N. Wakatsuki, and T. Arima, "Test generation systems in
Japan," in Proc. DAC, 1975, pp. 114-122.
[32]
P. Gelsinger, "Discontinuities driven by a billion interconnected," IEEE
Design & Test of Computers, vol. 17, no. 1, pp. 7-15, Jan.-Mar. 2000.
[33]
S. Gerstendorfer and H.-J. Wunderlich, "Minimized power
consumption for scan-based BIST," in Proc. ITC, 1999, pp. 77-84.
[34]
J. Geuzebroek, E. J. Marinissen, A. Majhi, A. Glowatz, and F. Hapke,
"Embedded multi-detect ATPG and its effect on the detec-tion of
unmodeled defects," in Proc. ITC, 2007, paper 30.3.
[35]
V. Gherman, H.-J. Wunderlich, H. Vranken, F. Hapke, M. Wittke, and M.
Garbers, "Efficient pattern mapping for deterministic logic BIST," in
Proc. ITC, 2004, pp. 48–56.
[36]
P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch, "A test vector
inhibiting technique for low energy BIST design," in Proc. VTS, 1999, pp.
407–412.
[37]
P. Girard, L. Guiller, C. Landrault, S. Pravossoudovitch, and H.-J.
Wunderlich, "A modified clock scheme for a low power BIST test
pattern generator," in Proc. VTS, 2001, pp. 306–311.
[38]
P. Girard, N. Nicolici, and X. Wen (ed.), Power-aware testing and test
strategies for low power devices. New York: Springer, 2010.
[39]
L. H. Goldstein, "Controllability/observability analysis of digital
circuits," IEEE Trans. Circuits Syst., vol. 26, no. 9, pp. 685-693, Jan. 1979.
[40]
R. Gupta, R. Gupta, and M. A. Breuer, "The ballast methodology for
structured partial scan design," IEEE Trans. Comput., vol. 39, no. 4, pp.
538-544, Aug. 1990.
[41]
A.-W. Hakmi, S. Holst, H.-J. Wunderlich, J. Schloffel, F. Hapke, and A.
Glowatz, "Restrict encoding for mixed-mode BIST," in Proc. VTS, 2009,
pp. 179–184.
[42]
A.-W. Hakmi, H.-J. Wunderlich, C. G. Zoellin, A. Glowatz, F. Hapke, J.
Schloeffel, and L. Souef, "Programmable deterministic built-in self-test,"
in Proc. ITC, 2007, paper 18.1.
[43]
F. Hapke, M. Reese, J. Rivers, A. Over, V. Ravikumar, W. Redemund, A.
Glowatz, J. Schloeffel, and J Rajski, "Cell-aware production test results
from a 32-nm notebook processor," in Proc. ITC, 2012, paper 1.1.
[44]
S. Hellebrand, J. Rajski, S. Tarnick, S. Venkataraman, and B. Courtois,
"Built-in test for circuits with scan based on reseeding of multiplepolynomial linear feedback shift registers," IEEE Trans. Comput., vol. 44,
no. 2, pp. 223-233, Feb. 1995.
[45]
A. Hertwig and H.-J. Wunderlich, "Low power serial built-in self-test," in
Proc. ETS, 1998, pp. 49-53.
[46]
C. Hobbs and P. Lee. (2013, Jul.) Electronic Design. [Online].
www.electronicdesign.com/embedded/understanding-iso-26262asils
[47]
K. Ichino, T. Asakawa, S. Fukumoto, K. Iwasaki, and S. Kajihara, "Hybrid
BIST using partially rotational scan," in Proc. ATS, 2001, pp. 379-384.
[48]
"IEEE draft standard for access and control of instrumentation
embedded within a semiconductor device," IEEE Association, Sept.
2014.
[49]
M. E. Imhof, C. G. Zoellin, H.-J. Wunderlich, N. Mading, and J. Leenstra,
"Scan test planning for power reduction," in Proc. DAC, 2007, pp. 521526.
[50]
A. Jas., C.V. Krishna, and N.A. Touba, "Hybrid BIST based on weighted
pseudo-random testing: a new test resource partitioning scheme," in
Proc. VTS, 2001, pp. 2-8.
[51]
A. Jas, K. Mohanram, and N.A. Touba, "An embedded core DFT scheme
to obtain highly compressed test sets," in Proc. ATS, 1999, pp. 275-280.
[52]
G. Jervan, Z. Peng, and R. Ubar, "Test cost minimization for hybrid BIST,"
in Proc. Defect and Fault Tolerance in VLSI Systems, 2000, pp. 25-27.
[53]
R. Kapur, S. Mitra, and T.W. Williams, "Historical perspective on scan
compression," IEEE Design & Test, vol. 25, no. 2, pp. 114–120, Apr. 2008.
[54]
Y. Kim, M.-H. Yang, Y. Lee, and S. Kang, "A new low power test pattern
generator using a transition monitoring window based on BIST
architecture," in Proc. ATS, 2005, pp. 230–235.
[55]
S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi, "Optimization by
simulated annealing," Science, vol. 220, no. 4598, pp. 671-680, May
1983.
[56]
T. Kobayashi, T. Matsue, and H. Shiba, "Flip-Flop circuit with FLT (Fault-Location-Technique) capability," in Proc. IECEO Conf., 1968, p. 962.
[57]
B. Koenemann, "LFSR-coded test patterns for scan designs," in Proc.
ETC, 1991, pp. 237–242.
[58]
B. Koenemann, C. Barnhart, B. Keller, and T. Snethen, "A SmartBIST
variant with guaranteed encoding," in Proc. ATS, 2001, pp. 325-330.
[59]
B. Konemann, J. Mucha, and G. Zwiehoff, "Built-in logic block
observation techniques," in Proc. ITC, 1979, pp. 37-41.
[60]
A. Krasniewski and S. Pilarski, "Circular self-test path: a low cost BIST
technique for VLSI circuits," IEEE Trans. CAD, vol. 8, no. 1, pp. 46-55, Jan.
1989.
[61]
C. V. Krishna, A. Jas, and N. A. Touba, "Test vector encoding using partial
LFSR reseeding," in Proc. ITC, 2001, pp. 885-893.
[62]
C.V. Krishna and Nur A. Touba, "Hybrid BIST using an incrementally
guided LFSR," in Proc. Int. Symp. on Defect and Fault Tolerance in VLSI
Systems, 2003, pp. 217-224.
[63]
C. V. Krishna and N. A. Touba, "Reducing test data volume using LFSR
reseeding with seed compression," in Proc. ITC, 2002, pp. 321–330.
[64]
D. H. Lee and S. M. Reddy, "On determining scan flip-flops in partial-scan
designs," in Proc. ICCAD, 1990, pp. 322–325.
[65]
L. Lei and K. Chakrabarty, "Hybrid BIST based on repeating sequences
and cluster analysis," in Proc. DATE, 2005, pp. 1142-1147.
[66]
H.-G. Liang, S. Hellebrand, and H.-J. Wunderlich, "Two-dimensional test
data compression for scan-based deterministic BIST," in Proc. ITC,
2001, pp. 894–902.
[67]
L. Li and K. Chakrabarty, "Test set embedding for deterministic BIST
using a reconfigurable interconnection network," IEEE Trans. CAD, vol.
23, no. 9, pp. 1289–1305, Sept. 2004.
[68]
X. Lin and J. Rajski, "Adaptive low shift power test pattern generator,"
in Proc. ATS, 2010, pp. 355–360.
[69]
S. Mitra and K. S. Kim, "X-Compact: an efficient response compaction
technique," IEEE Trans. CAD, vol. 23, no. 4, pp. 421-432, Mar. 2004.
[70]
G. E. Moore, "Cramming more components onto integrated circuits,"
Electronics, vol. 38, no. 8, pp. 114-117, Apr. 1965.
[71]
G. Mrugalski, N. Mukherjee, J. Rajski, J. Solecki, and J. Tyszer, "Low
power programmable PRPG with enhanced fault coverage gradient," in
Proc. ITC, 2012, paper 9.3.
[72]
G. Mrugalski, A. Pogiel, J. Rajski, J. Tyszer, and C. Wang, "Fault diagnosis
with convolutional compactors," IEEE Trans. CAD, vol. 26, no. 8, pp.
1478-1494, Aug. 2007.
[73]
G. Mrugalski, J. Rajski, J. Solecki, and J. Tyszer, "Scan chain configuration
for test-per-clock based on circuit topology," US Patent 9009553, 2015.
[74]
G. Mrugalski, J. Rajski, J. Solecki, and J. Tyszer, "Test generation for test-per-clock," US Patent Application No. 20140372824, 2014.
[75]
G. Mrugalski, J. Rajski, J. Solecki, and J. Tyszer, "Test-per-clock based on
dynamically-partitioned reconfigurable scan chains," US Patent
Application No. 20140372818, 2014.
[76]
G. Mrugalski, J. Rajski, Ł. Rybak, J. Solecki, and J. Tyszer, "A deterministic
BIST scheme based on EDT-compressed test patterns," Accepted for
IEEE ITC, 2015.
[77]
G. Mrugalski, J. Rajski, J. Solecki, and J. Tyszer, "Fault-driven scan chain
configuration for test-per-clock," US Patent 9003248, 2015.
[78]
G. Mrugalski, J. Rajski, J. Solecki, and J. Tyszer, "Scan chain stitching for
test-per-clock," US Patent Application No. 20140372821, 2014.
[79]
G. Mrugalski, J. Rajski, J. Solecki, J. Tyszer, and C. Wang, "TestExpress new time-effective scan-based deterministic test paradigm," Submitted
to IEEE Trans. CAD, 2015.
[80]
N. Mukherjee, J. Rajski, G. Mrugalski, A. Pogiel, and J Tyszer, "Ring
generator: an ultimate linear feedback shift register," IEEE Computer,
vol. 44, no. 6, pp. 64-71, June 2011.
[81]
F. Muradali, V. K. Agarwal, and B. Nadeau-Dostie, "A new procedure for
weighted random built-in self-test," in Proc. ITC, 1990, pp. 660–669.
[82]
S. Natarajan, M. Agostinelli, S. Akbar, M. Bost, A. Bowonder, V.
Chikarmane, S. Chouksey, and A Dasgupta, "A 14nm logic technology
featuring 2nd-generation FinFET, air-gapped interconnects, self-aligned double patterning and a 0.0588 µm² SRAM cell size," in
Technical Digest of the 2014 International Electron Devices Meeting,
2014, pp. 3.7.1- 3.7.3.
[83]
O. Novak and J. Nosek, "Test-per-clock testing of the circuits with scan,"
in Proc. Int. On-Line Testing Workshop, 2001, pp. 90-92.
[84]
N. Parimi and X. Sun, "Toggle-masking for test-per-scan VLSI circuits,"
in Proc. IEEE Int. Symp. on Defect and Fault-Tolerance in VLSI Systems,
2004, pp. 332–338.
[85]
S. Pateras and J. Rajski, "Generation of correlated random patterns for
the complete testing of synthesized multi-level circuits," in Proc. DAC,
1991, pp. 347–352.
[86]
I. Pomeranz and S. M. Reddy, "3-weight pseudo-random test generation based on a deterministic test set for combinational and
sequential circuits," IEEE Trans. CAD, vol. 12, no. 7, pp. 1050–1058, July
1993.
[87]
J. Rajski, J. Tyszer, M. Kassab, and N. Mukherjee, "Embedded
deterministic test," IEEE Trans. CAD, vol. 23, no. 5, pp. 776-792, May
2004.
[88]
J. Rajski, J. Tyszer, G. Mrugalski, and B. Nadeau-Dostie, "Test generator
with preselected toggling for low power built-in self-test," in Proc. VTS,
2012, pp. 1-6.
[89]
J. Rajski, J. Tyszer, C. Wang, and S. M. Reddy, "Finite memory test
response compactors for embedded test applications," IEEE Trans. CAD,
vol. 24, no. 4, pp. 622-634, Apr. 2005.
[90]
W. Rao and A. Orailoglu, "Virtual compression through test vector
stitching for scan based designs," in Proc. DATE, 2003, pp. 104-109.
[91]
W. Rhines, "Keynote presentation: 3 discontinuities in design for test,"
ETS, 2014.
[92]
P. M. Rosinger, B. M. Al-Hashimi, and N. Nicolici, "Low power mixed-mode BIST based on mask pattern generation using dual LFSR reseeding," in Proc. ICCD, 2002, pp. 474–479.
[93]
J. P. Roth, "Diagnosis of automata failures: a calculus and a method," IBM
Journal of Research and Development, vol. 10, no. 4, pp. 278-291, July
1966.
[94]
R. Sankaralingam, R. Oruganti, and N. Touba, "Static compaction
techniques to control scan vector power dissipation," in Proc. VTS,
2000, pp. 35-40.
[95]
T. Saraswathi, K. Ragini, and C. G. Reddy, "A review on power
optimization of linear feedback shift register (LFSR) for low power built
in self test (BIST)," in Proc. ICECT, 2011, pp. 172–176.
[96]
N. Sitchinava, S. Samaranayake, R. Kapur, E. Gizdarski, F. Neuveux, and
T.W. Williams, "Changing the scan enable during shift," in Proc. VTS,
2004, pp. 73-78.
[97]
Y. Son, J. Chong, and G. Russell, "E-BIST: enhanced test-per-clock BIST
architecture," IEE Proc. Comput. Digit. Techn., vol. 149, no. 1, pp. 9-15,
Jan. 2002.
[98]
C. Stroud, "An automated BIST approach for general sequential logic
synthesis," in Proc. DAC, 1988, pp. 3-8.
[99]
M. Tehranipoor, M. Nourani, and N. Ahmed, "Low transition LFSR for
BIST-based applications," in Proc. ATS, 2005, pp. 138-143.
[100]
N. A. Touba, "Survey of test vector compression techniques," IEEE
Design & Test, vol. 23, no. 4, pp. 294–303, Apr. 2006.
[101]
N. A. Touba and E. J. McCluskey, "Bit-fixing in pseudo-random
sequences for scan BIST," IEEE Trans. CAD, vol. 20, no. 4, pp. 545–555,
Apr. 2001.
[102]
N. A. Touba and E. J. McCluskey, "Transformed pseudo-random patterns
for BIST," in Proc. VTS, 1995, pp. 2-8.
[103]
K.-H. Tsai, J. Rajski, and M. Marek-Sadowska, "Star test: the theory and
its applications," IEEE Trans. CAD, vol. 19, no. 9, pp. 1052-1064, Sept.
2000.
[104]
S. Wang, "Generation of low power dissipation and high fault coverage
patterns for scan-based BIST," in Proc. ITC, 2002, pp. 834–843.
[105]
S. Wang and S.K. Gupta, "DS-LFSR: A BIST TPG for low switching
activity," IEEE Trans. CAD, vol. 21, no. 7, pp. 842–851, July 2002.
[106]
S. Wang and S. K. Gupta, "LT-RTPG: A new test-per-scan BIST TPG for
low switching activity," IEEE Trans. CAD, vol. 25, no. 8, pp. 1565–1574,
Aug. 2006.
[107]
L.-T. Wang, C.-W. Wu, and X. Wen, VLSI test principles and architectures:
design for testability. San Francisco, CA: Morgan Kaufmann Publishers
Inc., 2006.
[108]
L. Whetsel, "Adapting scan architectures for low power operation," in
Proc. ITC, 2000, pp. 863–872.
[109]
T. W. Williams and K. P. Parker, "Design for testability - a survey," IEEE
Trans. Comput., vol. 31, no. 1, pp. 2-15, Jan. 1982.
[110]
P. Wohl, J. A. Waicukauski, S. Patel, and M. Amin, "X-tolerant
compression and applications of scan-ATPG patterns in a BIST
architecture," in Proc. ITC, 2003, pp. 727–736.
[111]
P. Wohl, J. A. Waicukauski, S. Patel, F. DaSilva, T. W. Williams, and R.
Kapur, "Efficient compression of deterministic patterns into multiple
PRPG seeds," in Proc. ITC, 2005, pp. 916-925.
[112]
H.-J. Wunderlich, "Multiple distributions for biased random test
patterns," IEEE Trans. CAD, vol. 9, no. 6, pp. 584–593, June 1990.
[113]
H.-J. Wunderlich and G. Kiefer, "Bit-flipping BIST," in Proc. ICCAD, 1996,
pp. 337–343.
[114]
J. Ye, Y. Huang, Y. Hu, W.-T. Cheng, R. Guo, L. Lai, T.-P. Tai, X. Li, W.
Changchien, D.-M. Lee, J.-J. Chen, S. C. Eruvathi, K. K. Kumara, C. Liu, and
S. Pan, "Diagnosis and layout aware (DLA) scan chain stitching," in Proc.
ITC, 2013, paper 15.1.
[115]
X. Zhang and K. Roy, "Power reduction in test-per-scan BIST," in IEEE
International On-Line Testing Workshop, 2000, pp. 133–138.
[116]
C. Zoellin, H.-J. Wunderlich, N. Maeding, and J. Leenstra, "BIST power
reduction using scan-chain disable in the Cell processor," in Proc. ITC,
2006, paper 32.3.
[117]
Y. Zorian, "A distributed BIST control scheme for complex VLSI devices,"
in Proc. VTS, 1993, pp. 4-9.