Practical and Secure PCM-Based Main-Memory System
via Online Attack Detection
Moinuddin K. Qureshi Luis A. Lastras-Montaño Michele M. Franceschini John P. Karidis
IBM T. J. Watson Research Center, Yorktown Heights NY
{moinqureshi, lastrasl, franceschini, karidis}@us.ibm.com
Abstract
We recently proposed Start-Gap as a simple low-overhead
mechanism to do near-perfect wear leveling in phase change
memories. This method was extended to handle a malicious
attack, Repeat Address Attack (RAA), and provided a lifetime of several months under such an attack. A recent study
has argued that an attacker can use Birthday Paradox Attack
(BPA) to cause much earlier failure of the Start-Gap method.
The objective of this report is twofold. We first analyze
the vulnerability of Start-Gap to BPA. We show that tuning
the region size in our solution to handle both BPA and RAA
reduces lifetime by at most 2x compared to a system that is tuned to handle only RAA.
We then propose a practical framework that can guarantee year(s) of lifetime under attacks while still incurring negligible (<1%) write overhead for typical applications. It uses
a simple and novel Online Attack Detector (OAD) circuit to
adapt the wear leveling algorithm depending on the properties
of the memory reference stream. The OAD circuit requires a
hardware overhead of a few tens of bytes and is quite effective
at detecting a large family of attacks.
1 Introduction and Background
Phase Change Memory (PCM) is emerging as a promising technology for building main memory systems. While
PCM has several desirable attributes such as high density
and good scalability, it suffers from the drawback of limited write endurance. Each PCM cell is projected to endure
a maximum of about $10^7$ to $10^8$ writes. While this range of
write endurance may be sufficient for a typical memory system, the actual lifetime is reduced because the write traffic is
non-uniform across memory space, causing some lines to fail
earlier than others. Wear leveling is a commonly used technique that tries to make write traffic uniform, by remapping
heavily written lines to less heavily written lines.
Traditional wear leveling algorithms are table based,
which requires significant storage overhead and indirection
latency (especially given that these tables are made in slower
technologies such as EDRAM or DRAM). In our MICRO’09
paper [3], we proposed Start-Gap wear leveling as a means
to obviate the storage and latency overhead of table-based
methods and still provide near-perfect lifetime. We showed,
using both experimental data and analytical models, that for
typical workloads the lifetime with Start-Gap is approximately 97% of the lifetime under uniform writes. However,
lifetime limited memories are vulnerable to attacks, when an
adversary knows about the wear leveling algorithm and tries
to cause line failures by writing repeatedly to a few lines.
In [3], we specifically analyzed an attack that generates
repeated writes to the same line to cause failure. We call such
an attack a Repeat Address Attack (RAA). RAA can cause
line failure quickly as the line may not get relocated before
wear-out. To tolerate RAA, we proposed that the memory be
divided into regions, with each region performing its own
Randomized Start-Gap wear leveling. In this scheme, Region
Based Start-Gap (RBSG), each region has a small enough number of lines
that a line is guaranteed to move before the write limit
is reached. We showed that under RAA, the lifetime with
RBSG is approximately 3-4 months.
2 Birthday Paradox Attack
A recent study [4] has now suggested that our RBSG
scheme is vulnerable to attacks that are inspired by the birthday paradox. Simply stated, the birthday paradox captures
the fact that the number of random trials required to find an
item twice in a collection of items is quite low. For example,
a group of only 23 randomly picked individuals is more likely than not
to contain a pair sharing a birthday. Similarly, an
attacker can randomly stress lines in the memory, and in a
few thousand trials, the attack is likely to find a line that has
been attacked before, and cause line failure under RBSG. For
example, the number of random trials that are required to
find a line that has already been attacked is approximately
10240 for a memory containing 64M lines. The study in [4]
suggests that such a Birthday Paradox Attack (BPA) would
significantly reduce the lifetime with RBSG to under a few
hours. The study in [4] also points out that it is possible to
handle BPA with RBSG by reducing the region size, but such
Sandbagging of RBSG is impractical and makes the system
significantly vulnerable to RAA.
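The two birthday-paradox figures quoted above can be checked with a short calculation. The sketch below is an illustration only: it assumes uniformly random, independent trials, uses the standard asymptotic estimate sqrt(pi * N / 2) for the expected number of trials until the first repeat, and its function names are hypothetical rather than taken from [4].

```python
import math


def prob_repeat(n_items: int, draws: int) -> float:
    """Exact probability that `draws` uniform random draws contain a repeat."""
    p_all_distinct = 1.0
    for i in range(draws):
        p_all_distinct *= (n_items - i) / n_items
    return 1.0 - p_all_distinct


def expected_trials_first_repeat(n_items: int) -> float:
    """Asymptotic expected number of draws until some item is drawn twice."""
    return math.sqrt(math.pi * n_items / 2.0)


if __name__ == "__main__":
    # Classic birthday setting: 365 days, 23 people.
    print("P(shared birthday among 23):", prob_repeat(365, 23))
    # Memory with 64M lines, as in the example in the text.
    print("Trials for 64M lines:", expected_trials_first_repeat(64 * 2 ** 20))
```

For a memory of 64M lines the estimate comes out at roughly 10,270 trials, in line with the approximate figure of 10240 quoted above.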
We first attempt to analyze the conjecture about the impracticality of RBSG to handle both BPA and RAA. In general, we need the number of lines in the region of RBSG to be
as small as possible to handle BPA. For example, if the number of lines is such that in each attack 10% of the lifetime is
eaten away, then BPA will need to hit the line 10 times, which
can be shown to make BPA an extremely ineffective attack.
On the other hand, for RAA, we need the number of lines in each RBSG region
to be as large as possible (subject to the constraint that
NumLines < Endurance/WritesPerGapMovement).
Thus, there exists a trade-off in lifetime between the two attacks, and the actual lifetime is the worse of the two lifetimes.
The region size that maximizes this worst case lifetime is the
optimal point of operation for RBSG under both attacks.
We use a region size such that a given line is guaranteed
to move once every Endurance/4 writes are performed to
the region. Therefore, the line must be attacked four times
in random trials to cause failure under BPA. The next section
analyzes the lifetime under BPA and RAA for such a system.
3 Analytical Model for BPA vs. RAA
This section compares BPA and RAA using a simple analytical model. The expected number of independent random
trials $T_K$ required to touch a line K times for a memory consisting of N lines is given by [2]:
$$T_K \approx \sqrt[K]{K!} \cdot \Gamma(1 + 1/K) \cdot N^{(1-1/K)} \quad \text{as } N \to \infty \qquad (1)$$
For K = 4, the expected number of trials is given by:
$$T_4 \approx 2 \cdot N^{3/4} \qquad (2)$$
We cross-validated our experimental setup and this equation by experimenting with 9 different values of N (doubling
from 1 million to 256 million). We evaluated each data-point
one thousand times and used the average to ensure very high
confidence. We found that the number of trials from the
equation matched very well (within ±2%) with the data obtained experimentally. Therefore, we will use this equation to
derive the lifetime under BPA. In the following we will provide analytical expressions for the analysis as well as numbers
obtained using reasonable assumptions.
Definitions:
Let N be the number of lines in memory.
Let E be the endurance of each line; we assume $E = 2^{25}$.
Let ψ be the number of writes per gap movement;
we assume ψ = 64 (a power-of-two ψ simplifies analysis).
Let n be the number of lines in each region of RBSG. To
handle only RAA, n < E/ψ, say $n_{RAA} = E/128$.
3.1 Simplified Model
We will first assume that both attacks (RAA and BPA)
happen at the same speed. The following subsections incorporate effects such as the difference in speed and spare lines.
To handle BPA, the region size is reduced by 2x compared to
$n_{RAA}$, therefore n = E/(4ψ) = E/256.
Under RAA, the number of writes ($W_{RAA}$) it takes to
cause failure is given by:
$$W_{RAA} = nE = \frac{E^2}{256} \qquad (3)$$
Under BPA, a line must be attacked K = 4 times, where
each such attempt performs E/4 writes to the line. Although
an attacker will need to perform more than E/4 writes per
trial to ensure that the attacking line does get E/4 writes, we
will pessimistically assume that each trial only needs E/4
writes. From Equation 2, the number of writes ($W_{BPA}$) it
takes for BPA to cause failure of one line is:
$$W_{BPA} \approx 2 \cdot N^{3/4} \cdot \frac{E}{4} \qquad (4)$$
For BPA to take more writes than RAA, we have:
$$W_{BPA} > W_{RAA} \qquad (5)$$
$$2 \cdot N^{3/4} \cdot \frac{E}{4} > \frac{E^2}{256} \qquad (6)$$
which implies
$$N > 2^{24} \quad \text{for } E = 2^{25}. \qquad (7)$$
So, for a memory with $2^{24}$ or more lines, BPA will
require more writes than RAA to cause failure.
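The simplified model can be checked numerically. The following is a minimal sketch, not the authors' evaluation code: it evaluates the leading constant of Equation 1, then compares the write counts of Equations 3 and 4 for a few memory sizes. The function names are hypothetical, and the values of E and ψ are the ones assumed in the definitions above.

```python
import math

E = 2 ** 25   # endurance per line (value assumed in the text)
PSI = 64      # writes per gap movement (value assumed in the text)


def trials_coefficient(k: int) -> float:
    """Leading constant (K!)^(1/K) * Gamma(1 + 1/K) of Equation 1."""
    return math.factorial(k) ** (1.0 / k) * math.gamma(1.0 + 1.0 / k)


def expected_trials(n_lines: int, k: int) -> float:
    """Asymptotic expected trials to touch some line k times (Equation 1)."""
    return trials_coefficient(k) * n_lines ** (1.0 - 1.0 / k)


def writes_raa(endurance: int = E, psi: int = PSI, k: int = 4) -> float:
    """Writes needed by RAA when a line moves every E/k region writes (Equation 3)."""
    region_lines = endurance / (k * psi)   # n = E / (4 * psi) = E / 256 for k = 4
    return region_lines * endurance


def writes_bpa(n_lines: int, endurance: int = E, k: int = 4) -> float:
    """Writes needed by BPA, assuming each trial costs exactly E/k writes (Equation 4)."""
    return expected_trials(n_lines, k) * endurance / k


if __name__ == "__main__":
    print("coefficient for K=4:", trials_coefficient(4))
    for log2_n in range(22, 27):
        n = 2 ** log2_n
        print(f"N=2^{log2_n}: BPA needs more writes than RAA:",
              writes_bpa(n) > writes_raa())
```

The constant for K = 4 evaluates to slightly above 2, which is why Equation 2 rounds it to 2, and the comparison flips between N = 2^23 and N = 2^24, consistent with Equation 7.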
3.2 Difference in Attack Speed
BPA can exploit the excess memory bandwidth to attack
several lines in parallel, so it can cause more writes per unit
time than RAA. Let the writes per unit time for BPA be F
times higher than for RAA. Then for BPA to take more time
than RAA attack, we have:
$$\frac{W_{BPA}}{F} > W_{RAA} \qquad (8)$$
3.3 Effect of Spare Lines
Under RAA the whole region of RBSG fails at a similar time. If the number of spare lines is less than the number of lines
in an RBSG region, then the system will fail. However, under
BPA, line failures happen in a discontinuous fashion. So,
a proper analysis of BPA must take into account the spare
lines, as systems are typically provisioned with a few spare
lines. For a system with L spare lines, BPA must cause at
least L + 1 failures to be successful. We used 64K spare lines
in [3]. We experimentally found that with 64K spare lines,
the number of trials required for BPA to succeed increased
by a factor of 19x compared to that with no spare lines. Let
BPA require S times more trials when the effect of
spare lines is incorporated, compared to when there are no
spare lines. Then, for BPA to take more time than RAA:
$$W_{BPA} \cdot S > W_{RAA} \qquad (9)$$
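The effect of spare lines on the number of BPA trials can be explored with a small Monte Carlo experiment. The sketch below is only an illustration under simplifying assumptions that are not from the paper: trials pick lines uniformly at random, a line fails exactly when it receives its K-th hit, a failed line is never counted again, and the memory is far smaller than a real system so that it runs quickly. It does not attempt to reproduce the 19x figure, and all names are hypothetical.

```python
import random


def bpa_trials_until_failure(n_lines: int, k: int, spares: int,
                             rng: random.Random) -> int:
    """Count random trials until more than `spares` lines have been hit k times."""
    if spares >= n_lines:
        raise ValueError("spares must be smaller than the number of lines")
    hits = [0] * n_lines
    failed_lines = 0
    trials = 0
    while failed_lines <= spares:      # the system survives up to `spares` failures
        line = rng.randrange(n_lines)
        trials += 1
        hits[line] += 1
        if hits[line] == k:            # k-th hit wears the line out
            failed_lines += 1
    return trials


def spare_factor(n_lines: int, k: int, spares: int,
                 runs: int = 200, seed: int = 1) -> float:
    """Estimate S: average trials with spares divided by average trials without."""
    rng = random.Random(seed)
    without = sum(bpa_trials_until_failure(n_lines, k, 0, rng) for _ in range(runs))
    with_spares = sum(bpa_trials_until_failure(n_lines, k, spares, rng)
                      for _ in range(runs))
    return with_spares / without


if __name__ == "__main__":
    # Toy configuration: 4096 lines, K = 4, 64 spare lines.
    print("estimated S:", spare_factor(4096, 4, 64))
```

The estimated ratio plays the role of S in Equation 9; the value for a real configuration depends on N and on the number of spares.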
[Figure 1: bar chart. Legend entries: NO SPARE LINES, 64K SPARES, S=1 F=1, S=19 F=32, S=1 F=16, S=1 F=32, MICRO'09. X-axis: number of lines in memory (N) beyond which BPA takes more time to cause failure than RAA, from $2^{20}$ to $2^{32}$.]
Figure 1. Number of lines in memory beyond which the expected time to attack under BPA is greater
than under RAA (for this analysis the region size of RBSG is reduced by 2x compared to $n_{RAA}$ and ψ = 64).
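3.4 Tying it All Together
For BPA to take more time than RAA, incorporating both
the effect of difference in speed and the effect of spare lines:
$$W_{BPA} \cdot \frac{S}{F} > W_{RAA} \qquad (10)$$
$$\left(2 \cdot N^{3/4} \cdot \frac{E}{4}\right) \cdot \frac{S}{F} > \frac{E^2}{256} \qquad (11)$$
$$N > 2^{24} \cdot \left(\frac{F}{S}\right)^{4/3} \quad \text{for } E = 2^{25}. \qquad (12)$$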
Thus, if BPA occurs at a rate 32 times faster (F=32; see footnote 1) than
RAA, but the effect of spare lines is not taken into account, then
the system would need $N > 2^{30.67}$ lines for BPA to take
more time than RAA. But if the spare line effect (S=19x) is incorporated, then even with F=32, $N > 2^{25}$ is sufficient,
which is less than the number of lines used in [3]. As shown
in Figure 1, for our system in [3], BPA does not reduce lifetime by more than a factor of 2x compared to a system that
is tuned to handle only RAA (see footnote 2).
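The break-even memory sizes quoted in this subsection follow directly from Equation 12. As a hedged illustration (the function name is hypothetical, and the base exponent of 24 assumes $E = 2^{25}$, ψ = 64 and K = 4 as in the text), the exponents can be computed as follows:

```python
import math


def break_even_log2_lines(speedup_f: float, spare_factor_s: float,
                          log2_base: float = 24.0) -> float:
    """log2 of the smallest N for which BPA takes longer than RAA (Equation 12)."""
    return log2_base + (4.0 / 3.0) * math.log2(speedup_f / spare_factor_s)


if __name__ == "__main__":
    for f, s in [(1, 1), (16, 1), (32, 1), (32, 19)]:
        print(f"F={f:>2}, S={s:>2}: N > 2^{break_even_log2_lines(f, s):.2f}")
```

With F=32 and S=1 the exponent is about 30.67, and with F=32 and S=19 it drops to about 25, matching the values discussed above.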
4 Towards A Practical and Secure Approach
Ideally, we would like to have a system with year(s) of
lifetime even under attacks. A typical way to implement secure wear leveling is to use probabilistic randomized swapping in a table based scheme. Such schemes have been extensively studied in the Flash domain. For example, [1] (see
Section 2.4 for theoretical analysis) analyzed a scheme that
performs random swap of memory regions with a small probability p. They showed that for small values of p, the system
can have near perfect lifetime (90%+) even under attacks.
Table based methods incur huge storage overhead. The
scheme proposed in [4] is essentially a table based scheme
that performs wear leveling by swapping a location with another random location in memory. The design uses a hierarchical structure to reduce storage overhead. However, it requires swapping of large memory regions, which incurs significant hardware complexity. Furthermore, it incurs an overhead of 12.5% extra writes, which can significantly increase
power consumption, adversely affect the performance of
even typical applications, and reduce memory lifetime by 12.5% even under normal operation. Ideally, we want these
overheads to be less than 1%.
Footnote 1: Note that with 32 banks, BPA can potentially attack 32 lines in parallel.
However, in reality, each such attack attempt needs to perform more than E/4
writes to ensure that the attacking line receives E/4 writes; therefore
the effective value of F will range between 16 and 32.
Footnote 2: We also analyzed a system where the region size is further reduced by 2x,
such that BPA would need to hit the line eight times to cause line failure.
The break-even points for such a system are: $N > 2^{20}$ for (S=1,F=1),
$N > 2^{25.3}$ for (S=1,F=32), and $N > 2^{22.2}$ for (S=5,F=32).
Table-based methods make the common-case application pay
a high cost (in terms of area, write power, and write bandwidth).
A more practical approach to designing a robust wear leveling
algorithm is to keep these overheads to a minimum for typical applications, and pay a higher overhead only for attack-like applications. Based on this insight, we propose Adaptive Wear Leveling (AWL). Figure 2 shows the architecture of
AWL. AWL consists of an Online Attack Detector (OAD)
that analyzes the memory reference stream to detect attack-like patterns. Based on this information, it increases the frequency at which lines move under attack-like scenarios, thus
providing more robustness and security.
[Figure 2: block diagram with the blocks LINE ADDR, WEAR LEVELING ALGORITHM, PCM MEMORY, and OAD (ONLINE ATTACK DETECTOR).]
Figure 2. Architecture of Adaptive Wear Leveling
We explain the concept of AWL using Start-Gap. To aid
our discussion, we first define a few terms. Let Line Vulnerability Factor (LVF) be the maximum number of writes to
the memory done between consecutive movements of a given
line. As one line moves every ψ writes, for a memory with
N lines, we have:
$$LVF = N \cdot \psi \qquad (13)$$
If the LVF is larger than the endurance (E), then RAA will
succeed easily. In general, we want the LVF to be much
smaller than E, as it would reduce the amount of lifetime
lost for a line under one round of attack. Let us say that the
desired value of LVF is around E/8. This would mean that ψ
must be around E/(8N). Given that E is in the range of N, this
would result in a value of ψ much less than one, indicating
several gap movements for each demand write to memory.
However, we would like to have ψ ≥ 100 to limit the write
overhead to less than 1%. To balance these contradictory requirements, we begin by pointing out that ψ does not
have to be a static value; it can be changed dynamically depending on the behavior of the memory write stream.
In general, there are several writes to other locations of
memory between consecutive writes to a given line. Let Intra Write Distance (IWD) be the number of writes between
two consecutive writes to a given line. We denote IWD simply by the variable d. In general, d is on the order of several
tens of thousands, given that the DRAM cache in our system [3] has storage for several hundred thousand lines. If
there are writes to other lines in memory in between consecutive writes to a given line, this reduces the Effective LVF
(ELVF) perceived by the line, which is given by:
$$ELVF_d = \frac{N \cdot \psi}{d} = N \cdot \left(\frac{\psi}{d}\right) \qquad (14)$$
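To make Equations 13 and 14 concrete, the sketch below evaluates LVF and ELVF for one configuration. It is an illustration only: the function names are hypothetical, and the values $N = 2^{26}$ and $E = 2^{25}$ are the ones assumed elsewhere in this report.

```python
N_LINES = 2 ** 26     # lines in memory (value assumed elsewhere in the text)
ENDURANCE = 2 ** 25   # writes per line (value assumed elsewhere in the text)


def lvf(n_lines: int, psi: float) -> float:
    """Line Vulnerability Factor: writes between movements of a line (Equation 13)."""
    return n_lines * psi


def elvf(n_lines: int, psi: float, d: float) -> float:
    """Effective LVF seen by a line written once every d writes (Equation 14)."""
    return n_lines * psi / d


def static_psi_for_target(endurance: int, n_lines: int, fraction: int = 8) -> float:
    """Static psi that makes LVF equal to endurance / fraction."""
    return endurance / (fraction * n_lines)


if __name__ == "__main__":
    print("LVF at psi=100:", lvf(N_LINES, 100), "vs endurance", ENDURANCE)
    print("static psi for LVF = E/8:", static_psi_for_target(ENDURANCE, N_LINES))
    for d in (1, 64, 2048, 32768):
        print(f"ELVF at psi=128, d={d}:", elvf(N_LINES, 128, d))
```

With these values a static ψ would have to be 1/16 to keep LVF at E/8, while with ψ = 128 the effective LVF only falls to E/8 once d reaches 2048.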
If we want to limit the lifetime lost under attack for a given
line to E/8, then we can calculate the desired value of dynamic ψ, which we denote by $\psi_d$, as follows:
$$ELVF_d \le \frac{E}{8} \;\Rightarrow\; \frac{N \cdot \psi_d}{d} \le \frac{E}{8} \;\Rightarrow\; \psi_d \le \frac{E \cdot d}{8N} \qquad (15)$$
In general, d is much larger than 1K for typical applications.
To validate this intuition we conducted an experiment where
the last level cache was set to 32MB, 8-way. We
kept track of the most recent w writes to memory. Table 1
shows the fraction of writes that hit in a window of
1K, 2K, and 4K writes for the three database workloads.
Table 1. Prob. of hit in most recent “w” writes (all values ×$10^{-6}$)
Workload   w=1024   w=2048   w=4096
db1        0.67     1.45     10.2
db2        0.01     0.35     13.2
oltp       0.02     1.19     24.7
Thus, for typical applications d >> 1K. The likelihood
of a hit in a window of 1K writes is less than one in a million. Therefore, we can safely have $\psi_d$ = 128 for typical
applications. However, under attacks the value of d is much
lower than 1K, which means $\psi_d$ must be reduced. The key
insight here is that the value of d can clearly separate typical
applications from attacks. If there is a hardware circuit that
can detect attack-like patterns and provide the worst-possible
value of d at runtime, then $\psi_d$ in the Start-Gap algorithm can be
regulated to tolerate attacks. The next section describes a simple, novel, and effective attack detection circuit.
5 Online Attack Detection
We can create a hardware circuit that keeps track of the most
recent 1K write addresses to measure the value of d. On each
write access, all the recent 1K addresses are checked, and the
hit count in this window is tracked. If the hit count in the window is greater than a certain threshold then the application is
likely to be an attack. The value of d can be calculated by
the distance from the most recent write address in the list at
which the address hits. However, such a circuit would incur
impractically large area, power, and latency overhead. To
develop a low-cost, practical, yet accurate attack detection
circuit we begin by analyzing some basic attacks.
5.1 Anatomy of an Attack
For an attack to successfully cause failure in lifetime limited memories in a short time, it has to write to a few lines
repeatedly and at a sufficiently high write bandwidth. All
three requirements are important. For example, if the attack simultaneously focuses on several thousand lines, then
the value of d will be in a range where even the default Start-Gap will move the lines before significant wear-out. The
writes must be done repeatedly, several million times for
each line; otherwise the wear-out on each line will be negligible. And, if the attack happens at very low write bandwidth
then the time for the attack to succeed will increase linearly.
Figure 3 shows the canonical form of several attacks.
[Figure 3: canonical write-address sequences. (i) RAA (d=1): a1, a1, ... (ii) GRAA (d=n): a1, a2, ..., an, repeated. (iii) BPA (d=n): a1, a2, ..., an, later followed by b1, b2, ..., bn. (iv) SMA (d=n): a1, R2, ..., Rn, where R2−Rn are random/benign elements that do not repeat.]
Figure 3. Types of attacks (i) RAA (ii) Generalized RAA (iii) BPA (iv) Stealth Mode Attack
Figure 3(i) shows the Repeat Address Attack. It continuously writes to the same line. Therefore, d=1. This attack can
be generalized, where the writes are done to n lines continuously. We call this Generalized RAA (GRAA) with period
of n, as shown in Figure 3(ii). For GRAA, d=n. BPA can be
viewed as a form of GRAA, which changes the working set
after every several million writes. As shown in Figure 3(iii),
for BPA also d=n. The final attack, which we call the Stealth
Mode Attack (SMA), attacks only one line but disguises it
among (n-1) other lines. These lines are chosen randomly and
may not repeat across iterations. For SMA again, d=n, but
the attack is concentrated on only 1 line. Figure 4 shows the
probability of hit in a window of most recent 1K writes for
these attacks and typical applications. There is a 3-4 orders
of magnitude difference between the hit rate of attack-like patterns and that of typical workloads. SMA is
the most challenging to detect among the attacks. If SMA
can be detected, then GRAA (hence BPA) can be detected
as well, as they have multiple attack lines thus providing a
higher chance of being detected. Therefore, in this work we
focus on detecting only SMA.
[Figure 4: probability of hit in a 1K-entry window (log scale, $10^{0}$ down to $10^{-8}$) for attacks (SMA, GRAA/BPA) and typical applications.]
Figure 4. Differentiating between attacks and
typical applications using hit rate
5.2 Practical Attack Detection
We are interested in measuring hits in a window of, say, 1K
writes. This can be approximated by simply having a small
LRU stack of e entries and inserting the address of the incoming write request in the stack with a very small probability p.
For example, a 16-entry LRU stack with p = 1/256 can easily detect hits for frequent writes in a window of 1K entries.
We call this circuit the Practical Attack Detector (PAD). PAD
also contains two 16-bit counters: HitCounter and WriteCounter. Each incoming write address is checked in PAD
and increments the WriteCounter. If there is a hit, the line is
updated to the MRU position and the HitCounter is incremented.
If there is a miss, then with probability p the given address is
inserted in the LRU stack at the MRU position. If the WriteCounter reaches its maximum value, the hit rate of the LRU
stack is calculated, and both counters are halved. We estimate the distance (d) as the inverse of the hit rate. For example,
for an SMA that repeats once every 1000 writes, the hit rate will
be 0.1% and the estimated d will be around 1000. The calculated value of d is stored in a register, DistReg, and this value
is used between periods of distance calculation.
[Figure 5: PAD estimated d (y-axis) versus actual value of d (x-axis) under SMA, for the window-128 and window-1K detectors.]
Figure 5. Estimation of d by PAD for a window
of (i) 128 writes (ii) 1024 writes
5.3 Evaluation of Attack Detector
By varying the number of entries in the LRU stack and insertion probability (p), the PAD circuit can be programmed
to monitor different window sizes. Figure 5 shows the values of d for a range of SMA attacks. Two PAD circuits are
compared. The first tracks a window of 1K writes by having
16 entries and p=1/256. The second tracks a window of 128
writes by having 4 entries and p=1/256. The values reported
by each of the two circuits are shown on the y-axis. Thus,
PAD is quite accurate at estimating the value of d, even for
the worst-to-detect SMA. PAD can detect any attack that interleaves fewer than window-size number of writes between
consecutive writes to a given line.
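The behaviour of PAD can be sketched in software. The following is a behavioural model only, written under stated assumptions rather than a description of the actual circuit: the LRU stack is modelled with an ordered dictionary, the estimate is reported as infinity when no hits were seen, and the class, function and parameter names (other than the HitCounter, WriteCounter and DistReg roles from the text) are hypothetical.

```python
import random
from collections import OrderedDict


class PracticalAttackDetector:
    """Behavioural sketch of PAD: small LRU stack with probabilistic insertion."""

    def __init__(self, entries=16, insert_prob=1 / 256,
                 counter_max=2 ** 16 - 1, seed=0):
        self.entries = entries
        self.insert_prob = insert_prob
        self.counter_max = counter_max      # 16-bit counters saturate here
        self.rng = random.Random(seed)
        self.stack = OrderedDict()          # last key is the MRU position
        self.hit_counter = 0
        self.write_counter = 0
        self.dist_reg = None                # DistReg; None until the first estimate

    def observe_write(self, addr):
        """Process one write address and return the current estimate of d."""
        self.write_counter += 1
        if addr in self.stack:
            self.stack.move_to_end(addr)    # hit: promote to MRU
            self.hit_counter += 1
        elif self.rng.random() < self.insert_prob:
            if len(self.stack) >= self.entries:
                self.stack.popitem(last=False)   # evict the LRU entry
            self.stack[addr] = None
        if self.write_counter >= self.counter_max:
            # d is estimated as the inverse of the hit rate, then counters decay.
            if self.hit_counter:
                self.dist_reg = self.write_counter / self.hit_counter
            else:
                self.dist_reg = float("inf")
            self.hit_counter //= 2
            self.write_counter //= 2
        return self.dist_reg


def sma_stream(period, n_writes, attack_addr=0):
    """Stealth Mode Attack: one attacked line every `period` writes, rest unique."""
    for i in range(n_writes):
        yield attack_addr if i % period == 0 else i + 1


def estimate_distance(stream, **pad_kwargs):
    """Feed a write stream to a fresh detector and return its last estimate."""
    pad = PracticalAttackDetector(**pad_kwargs)
    estimate = None
    for addr in stream:
        estimate = pad.observe_write(addr)
    return estimate


if __name__ == "__main__":
    print("RAA estimate:", estimate_distance(sma_stream(1, 70_000)))
    print("SMA(64) estimate:", estimate_distance(sma_stream(64, 400_000)))
```

In this model an SMA with period 64 settles at an estimate close to 64 after a few counting periods, while a stream with no repeated addresses never produces a hit.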
The detection latency of PAD is quite low. Since decisions
about estimating d are made once every 32K writes, the worst
case detection latency is 32K writes. Given that a line enters
the stack with probability p=1/256, the probability that the
line does not enter the stack in 1K writes is $2^{-4}$, in 4K writes
is $2^{-16}$, and in 32K writes is $2^{-128}$. So, it is almost certain
that the attacking line will be detected within 32K writes.
Note that if endurance is around 32M, then 32K writes represent a tiny 0.1% of lifetime lost before detection. Access
patterns that attack from a distance greater than the window
can be tolerated by the default wear leveling scheme; therefore the increased d reported by PAD is not a problem.
5.4 Salient Features of Attack Detector
PAD has several desirable properties that we expect from
a good attack detector. First, it can detect attacks that repeatedly hit in a window of w writes with a very high accuracy.
Second, it can overlook hits that occur in a window beyond
2w writes. Third, it pardons infrequent write hits in the window because a write must hit a few hundred times within a
short period to get detected. Fourth, it has a very low detection latency (less than 32K writes). Fifth, it has a decaying
effect on hit rate, which means a concentrated attack is not easily forgotten and will have an impact on the measured distance for
a while. Sixth, if multiple attacks occur simultaneously, each
with a different period, then PAD reports the most severe attack. And finally, PAD has a very low hardware overhead: ≈
(16*4)+4=68 bytes for tracking a window of 1K writes, and
≈ (4*4)+4=20 bytes for tracking a window of 128 writes.
6 Adaptive Start-Gap
The key insight in Adaptive Start-Gap (ASG) is that for
typical applications it can incur a negligible overhead of one
extra write every 128 writes. And, for attack-like applications this overhead increases as d reduces. A higher overhead
for applications that frequently write to a few (tens of) lines is
tolerable as such applications are not using the 100K+ entry
cache. If performance is important, then such applications
must be rewritten to exploit the cache rather than to expect
better write performance from the memory system.
6.1 Naive Approach
A naive implementation of ASG would not split memory
into regions. ASG would then set the gap movement interval
($\psi_d$) as follows, based on Equation 15:
$$\psi_d = MIN\left(128, \frac{E \cdot d}{8N}\right) \qquad (16)$$
where PAD provides the estimate of d. If N=2E (as in [3]),
then $\psi_d = 1/16$ for RAA, as d=1. Thus, ASG will perform
16 gap-movement writes for each RAA write to ensure that no more than
one-eighth of the lifetime is lost before the line gets moved.
6.2 Scalable Approach
A more scalable approach is to split the memory into r
regions. With r regions, the number of lines in each region
$N_r$ reduces by a factor of r. Each region has its own attack
detection circuit that observes the write traffic to that region
and estimates the distance $d_r$ for that region. This is used to
set the gap movement interval of the region, $\psi_{d_r}$, as follows:
$$\psi_{d_r} = MIN\left(128, \frac{E \cdot d_r \cdot r}{8N}\right) \qquad (17)$$
A useful value for the number of regions is the number
of banks. For r=16 and N=2E, the above equation becomes:
$$\psi_{d_r} = MIN(128, d_r) \qquad (18)$$
The gap movement interval is simply equal to d, if d is
less than 128. To estimate d accurately for a window of 128
writes, we use a 4-entry LRU stack with p=1/256 (the estimates
produced by this circuit were shown in Figure 5). The write
overhead is now limited to one extra write per demand write
even under RAA. The proposed solution has 16 regions, each
with its own Start and Gap registers, and a 20-byte attack detection circuit. Thus the total storage overhead for the proposed solution is 16*(4+4+20)=448 bytes, which is even
less than the 1.5KB overhead of RBSG [3].
6.3 Evaluation
We now evaluate the lifetime of the scalable ASG under GRAA, assuming perfect information for d is available
at runtime. For the purpose of this evaluation we assume
$N = 2^{26}$, $E = 2^{25}$, NumBanks = 16, r = 16, and TimeToWriteOneLine = 1 µsec. We assume one region per bank.
The manufacturer may rate the memory lifetime under the
scenario that random addresses are written continuously to
separate banks at full speed. This means a lifetime of 4 years
for each bank of memory (see footnote 3), and the memory would be rated
for four years. Figure 6 shows the rated lifetime and the lifetime
under GRAA attacks as the number of lines in GRAA is varied from 1 to 64. The lifetime remains constant at four years,
but the number of useful writes performed increases from
50% to 98.4%. Thus, Adaptive Start-Gap increases lifetime
under attacks while providing low overhead for typical applications. The lifetime under BPA-style attack can be made
to be in year(s) as well by simply increasing $\psi_{d_r}$ and making
minor changes to the write queue policy [3].
Footnote 3: The number of lines per region is $2^{22}$, each line can be written $2^{25}$
times, for a total of $2^{47}$ writes per bank. If $2^{20}$ writes per second can be
performed for each bank, with $2^{25}$ seconds per year, it would take $2^{2}$ years.
[Figure 6: stacked bar chart of lifetime in years (y-axis, 0.0 to 4.0), split into ExtraWrites and UsefulWrites, for Rated, GRAA(1), GRAA(2), GRAA(4), GRAA(8), GRAA(16), GRAA(32), and GRAA(64).]
Figure 6. Lifetime of Adaptive Start-Gap under
GRAA (number of attack lines varied)
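The numbers in this section can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it evaluates the gap movement interval of Equations 16 to 18, the fraction of useful writes when one gap movement costs one extra write per ψ demand writes, and the rated-lifetime arithmetic of footnote 3. The function names are hypothetical, and the parameter values are those assumed in the evaluation above.

```python
E = 2 ** 25   # endurance per line
N = 2 ** 26   # lines in memory, so N = 2E


def gap_interval(endurance: int, n_lines: int, d: float,
                 regions: int = 1, psi_max: int = 128, fraction: int = 8) -> float:
    """Gap movement interval: Equation 16 for regions=1, Equation 17 otherwise."""
    return min(psi_max, endurance * d * regions / (fraction * n_lines))


def useful_write_fraction(psi: float) -> float:
    """Share of writes that are demand writes when one extra write occurs every psi."""
    return psi / (psi + 1.0)


def rated_lifetime_years(lines_per_region: int, endurance: int,
                         writes_per_sec: int = 2 ** 20,
                         secs_per_year: int = 2 ** 25) -> float:
    """Time to exhaust every line of one region at full write speed (footnote 3)."""
    return lines_per_region * endurance / (writes_per_sec * secs_per_year)


if __name__ == "__main__":
    print("naive psi under RAA:", gap_interval(E, N, d=1))
    for lines in (1, 2, 4, 8, 16, 32, 64):
        psi = gap_interval(E, N, d=lines, regions=16)
        print(f"GRAA({lines}): psi={psi}, useful={useful_write_fraction(psi):.3f}")
    print("rated lifetime (years):", rated_lifetime_years(N // 16, E))
```

For GRAA(1) the interval is 1 and half of all writes are useful; for GRAA(64) the interval is 64 and the useful fraction is 64/65, which is the 98.4% quoted above.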
7 Summary and Future Directions
We first analyzed the birthday paradox attack (BPA) and
showed that it does not cause more than 2x lifetime loss compared to RAA. We then introduced the concept of adaptive
wear leveling and made the following contributions: (1) We
introduced the notion of attack detection to identify attack-like access patterns. (2) We provided a low-cost yet highly accurate attack detection circuit. (3) We proposed adaptive Start-Gap, which decouples detection from correction. This scheme
can provide year(s) of lifetime under attack while still having
very low write overhead (< 1%) for typical applications and
incurring a storage overhead of a few hundred bytes.
We plan to extend this work in several directions: extending attack detection to other wear-leveling algorithms,
analyzing other forms of attacks, supporting page-mode in
a secure fashion, and incorporating write bandwidth utilization in attack detection and associated decisions. The output of the attack detector can also be passed to the OS so
that it can deal with attacking applications by remapping their pages frequently, or by informing the system administrator. Another
line of research is to protect typical applications from having
attack-like behavior. For example, a cache with a randomized index could be used instead of traditional indexing, which would
also make it much harder for the adversary to launch attacks.
We encourage architects to focus on developing such simple
low-cost practical designs that can handle worst-case attacks,
and yet incur negligible overhead for typical applications.
References
[1] A. Ben-Aroya and S. Toledo. Competitive analysis of flash memory algorithms. In Proceedings of the 14th Annual European Symposium on Algorithms, pages 100–111, 2006.
[2] M. Klamkin and D. J. Newman. Extensions of the birthday
surprise. Journal of Combinatorial Theory, 1967.
[3] M. Qureshi et al. Enhancing lifetime and security of PCM-based
main memory with Start-Gap wear leveling. In MICRO’09.
[4] A. Seznec. Towards phase change memory as a secure main
memory. Technical report, INRIA, Nov. 2009.