Harnessing Big Data to Simplify Debugging

Asi Lifshitz, CTO
www.thevtool.com
May 9, 2016
Agenda
• Introduction
• What is Big Data, Anyway?
• Simulation Log Files
• Graphical Representation of a Log File
• Summary
RTL Debugging
• Verification is one of the major bottlenecks on the way to
tape-out
• Debugging failing tests is complex and time-consuming
Source: Wilson Research & Mentor Graphics, 2014
Debugging Today
• Iterating between the waveforms and the simulation
log file
• Simulation log files can reach several GB
Debugging Tomorrow
• Big Data tools will quickly and efficiently extract data
from huge log files
• Extracting and manipulating data gets simpler
• Data can be presented in a graphical way
• Shortening the debug time will shorten the project schedule
and increase the engineer’s productivity
Big Data
• Big data is a term for data sets that are so large or
complex that traditional data processing applications
are inadequate
• The term often refers simply to the usage of
advanced methods for extracting value from data,
and seldom to a particular size of data set
Big Data – Cont.
• For some organizations, facing a few gigabytes of data
for the first time may trigger a need to reconsider
data management options
• For others, it may take tens or hundreds of terabytes
before data size becomes a significant consideration
Database
• A database is an organized collection of data
• The data is typically organized in a way that supports
processes that require information
• A database management system (DBMS) is a
computer software application that interacts with
the user, other applications, and the database itself
to capture and analyze data
Database for Log Files
• A database can be used to query a specific record, i.e.,
a specific message
• However, if some computation is required, a
database search engine is needed
– A concrete example that goes beyond a simple record
lookup is when the DV engineer would like to see all
messages from time point tp1 to time point tp2
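The time-range request above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool described in these slides: the records, their time points, and the helper name messages_between are hypothetical, and the records are assumed to be already sorted by time point.

```python
from bisect import bisect_left, bisect_right

# Hypothetical records: (time point, message text), sorted by time point.
records = [
    (1000, "reset released"),
    (2500, "APB write to sflash_reg.enable"),
    (4498000, "WRITE_MODE_SPI_DATA_ERR"),
    (5000000, "test finished"),
]

def messages_between(records, tp1, tp2):
    """Return all records whose time point lies in the closed range [tp1, tp2]."""
    times = [t for t, _ in records]
    lo = bisect_left(times, tp1)    # first record with time >= tp1
    hi = bisect_right(times, tp2)   # one past the last record with time <= tp2
    return records[lo:hi]

window = messages_between(records, 2000, 4498000)
```

Because the time points are sorted, the two binary searches find the boundaries of the range without scanning every message, which is the kind of computation a plain single-record lookup does not provide.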
Database Search Engine
• A search engine allows the user to search for
information using simple keywords
Lucene
• A free and open-source search engine library,
originally written in Java
• Has been ported to Delphi, Perl, C#, C++, Python,
Ruby, and PHP
• Suitable for any application that requires full text
indexing and searching capability
• The core of its logical architecture is the idea of a
document containing fields of text
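The idea of a document containing fields of text can be illustrated with a toy inverted index. The sketch below is plain Python and is not Lucene's API: the class TinyIndex, its methods, and the sample documents are all hypothetical, and tokenization is assumed to be a simple lowercase split on whitespace.

```python
from collections import defaultdict

class TinyIndex:
    """Toy inverted index over documents made of named text fields.

    Illustration only; this does not mirror Lucene's real classes.
    """

    def __init__(self):
        self.docs = []
        self.postings = defaultdict(set)  # (field, term) -> set of document ids

    def add(self, doc):
        """Store a document (dict of field name -> text) and index its terms."""
        doc_id = len(self.docs)
        self.docs.append(doc)
        for field, text in doc.items():
            for term in text.lower().split():
                self.postings[(field, term)].add(doc_id)
        return doc_id

    def search(self, field, term):
        """Return the documents whose given field contains the term."""
        ids = sorted(self.postings.get((field, term.lower()), ()))
        return [self.docs[i] for i in ids]

index = TinyIndex()
index.add({"title": "Reset sequence", "body": "reset released after 10 cycles"})
index.add({"title": "Data check", "body": "sent data packet mismatch"})
hits = index.search("body", "packet")
```

Each term points back to the documents that contain it, so a keyword search is a dictionary lookup rather than a scan of the full text.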
Lucene for Verification
• A simulation log file is a structured textual file, and as
such it can be indexed
• Once indexed, the Lucene API can be used to search for
all the “interesting” events that are needed for
debugging a failing test
UVM
• The Universal Verification Methodology (UVM) is a
standardized methodology for verifying integrated
circuit designs
• More than 70% of the industry has adopted UVM, and
adoption is expected to keep growing
Source: Wilson Research & Mentor Graphics, 2014
UVM Messages
• A UVM-based simulation log contains UVM messages that
usually have the following format, in this order:
– Verbosity
– Filename(line)
– Timepoint
– Emitter
– Message
UVM Message Example
• UVM_ERROR /project/sflash/verification/SFLASH_controller_ENV/src/sflash_controller_env_sb.sv(1863) @ 4498000: uvm_test_top.env.sb [WRITE_MODE_SPI_DATA_ERR] Sent data packet contains 0x532e4000, but expected 0x532e4cb3
• UVM_ERROR is the verbosity (or severity)
• /project/sflash/verification/SFLASH_controller_ENV/src/sflash_controller_env_sb.sv(1863) is the filename(line)
• @ 4498000 is the time point
• uvm_test_top.env.sb is the emitter of the message
• [WRITE_MODE_SPI_DATA_ERR] Sent data packet contains
0x532e4000, but expected 0x532e4cb3 is the message
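The breakdown of the example message into its five elements can be sketched with a regular expression. This is a minimal sketch, assuming every message sits on a single line and follows exactly the layout of the example on this slide; the pattern, the function name parse_uvm_message, and the field names are illustrative choices, and real simulators may print the time point with units or use a different layout.

```python
import re

# Pattern assumed from the single example on the slide:
# VERBOSITY FILENAME(LINE) @ TIME: EMITTER MESSAGE
UVM_MESSAGE = re.compile(
    r"^(?P<verbosity>UVM_\w+)\s+"
    r"(?P<filename>\S+)\((?P<line>\d+)\)\s+"
    r"@\s*(?P<time>\d+):\s+"
    r"(?P<emitter>\S+)\s+"
    r"(?P<message>.*)$"
)

def parse_uvm_message(text):
    """Split one UVM message into a dict of fields, or return None if it does not match."""
    m = UVM_MESSAGE.match(text.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["line"] = int(rec["line"])   # source line number as an integer
    rec["time"] = int(rec["time"])   # simulation time point as an integer
    return rec

EXAMPLE = (
    "UVM_ERROR /project/sflash/verification/SFLASH_controller_ENV/src/"
    "sflash_controller_env_sb.sv(1863) @ 4498000: uvm_test_top.env.sb "
    "[WRITE_MODE_SPI_DATA_ERR] Sent data packet contains 0x532e4000, "
    "but expected 0x532e4cb3"
)
record = parse_uvm_message(EXAMPLE)
```

Converting the line number and time point to integers at parse time makes later numeric comparisons, such as time-range filters, straightforward.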
Using Lucene for UVM Messages
• Parse the log file so that every message is broken
into the five elements above and stored as a record
in the Lucene index
• The user can now use the efficient Lucene API to
extract information
Extracting UVM Records
• Designed to handle huge numbers of records, Lucene
returns the requested records in negligible time. Example queries:
– All messages of a specific verbosity, optionally
within some time range
– All messages containing a specific string
– All messages emitted from the APB UVC that write 0x1 to
register sflash_reg.enable
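The kinds of queries listed above can be sketched as a filter over parsed records. This is a plain-Python linear scan written for illustration, not a Lucene query and not an indexed search; the UvmRecord class, the query function, the emitter paths, and the first two sample messages are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class UvmRecord:
    verbosity: str
    time: int
    emitter: str
    message: str

# Hypothetical records for illustration; only the last one mirrors the slide example.
RECORDS = [
    UvmRecord("UVM_INFO", 1200, "uvm_test_top.env.apb_agent.driver",
              "write 0x1 to sflash_reg.enable"),
    UvmRecord("UVM_INFO", 3000, "uvm_test_top.env.spi_agent.monitor",
              "collected data packet"),
    UvmRecord("UVM_ERROR", 4498000, "uvm_test_top.env.sb",
              "[WRITE_MODE_SPI_DATA_ERR] Sent data packet contains 0x532e4000, "
              "but expected 0x532e4cb3"),
]

def query(records, verbosity=None, tp1=None, tp2=None, emitter=None, text=None):
    """Return the records matching every criterion that is not None."""
    hits = []
    for r in records:
        if verbosity is not None and r.verbosity != verbosity:
            continue
        if tp1 is not None and r.time < tp1:
            continue
        if tp2 is not None and r.time > tp2:
            continue
        if emitter is not None and emitter not in r.emitter:
            continue
        if text is not None and text not in r.message:
            continue
        hits.append(r)
    return hits

errors = query(RECORDS, verbosity="UVM_ERROR")
early_info = query(RECORDS, verbosity="UVM_INFO", tp1=0, tp2=2000)
enable_writes = query(RECORDS, emitter="apb", text="sflash_reg.enable")
```

A search engine answers the same questions from its index instead of visiting every record, which is what keeps the response time low on multi-gigabyte logs.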
Why Graphical Representation?
• It is extremely hard to navigate through the log file
in search of the necessary information without
being overwhelmed or missing important information
• Graphical representation of data is more natural and
much easier to analyze
Graphical Representation of a Log File
Graphical Debugging
• The transition from debugging a textual file to a
graphical representation is intuitive
• Problems are traced much faster.
The engineer can quickly see what is wrong, when
the pattern changes, or when some unexpected
event has occurred
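One simple way to make a change of pattern visible is to count messages per time window and draw the counts as bars. The slides do not specify how the graphical view is built, so the sketch below is only an assumption-laden illustration: the time points, the bucket size, and the function names bucket_counts and render are hypothetical, and the output is a text chart rather than a real GUI.

```python
from collections import Counter

def bucket_counts(times, bucket_size):
    """Count how many messages fall into each consecutive time bucket."""
    counts = Counter(t // bucket_size for t in times)
    last = max(counts) if counts else -1
    return [counts[b] for b in range(last + 1)]

def render(counts, bucket_size):
    """Draw one text row per bucket: start time, then one '#' per message."""
    rows = []
    for b, n in enumerate(counts):
        rows.append(f"{b * bucket_size:>8} | {'#' * n}")
    return "\n".join(rows)

# Hypothetical message time points.
times = [100, 150, 900, 1100, 1150, 1190, 1900]
counts = bucket_counts(times, 500)
chart = render(counts, 500)
```

A burst of messages in one window, or a window that is unexpectedly empty, stands out in such a chart much sooner than in the raw text.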
Summary
• The complexity and size of today's designs require
new techniques, as the traditional ones lead to very
long debugging times
• Harnessing tools that are used for processing Big
Data can simplify and shorten the debugging of
failing tests
• We hope that this work will encourage more
research on bringing these strong capabilities into
existing and new EDA tools
Thank You