Ches 2001

Precise Bounds for
Montgomery Modular Multiplication
and
Some Potentially Insecure RSA Moduli
Colin D. Walter
formerly: www.co.umist.ac.uk (Manchester, UK)
future: www.comodo.net (Bradford, UK)
Motivation
• Modular multiplication is the foundation of most arithmetic-based
cryptography: efficiency and security are important.
• Montgomery modular multiplication is one highly favoured method.
• To avoid full length comparisons or timing attacks, conditional
modular reductions are skipped, but the price is a higher bound,
often 2M for modulus M, and perhaps extra iterations.
• For typical, standard key and word lengths, 2M will overflow into
the next word by just 1 bit.
So an extra word may have to be processed: inefficient.
• The overflow bit might also be detectable, allowing a
power analysis attack.
RSA 2002
C.D. Walter, UMIST
History
• P. L. Montgomery
Modular multiplication without trial division
Mathematics of Computation 44 (1985), 519–521
• C. D. Walter
Montgomery Exponentiation Needs No Final Subtractions
Electronics Letters 35 (1999), 1831–1832
• G. Hachez & J.-J. Quisquater
Montgomery Exponentiation with No Final Subtractions: improved results
CHES 2000, LNCS 1965, 293–301
Montgomery Modular Multiplication
{ Pre-condition: 0 ≤ A < r^n }
P ← 0 ;
For i ← 0 to n−1 do
Begin
q ← (p_0 + a_i·b_0)·(−m_0^(−1)) mod r ;
P ← (P + a_i·B + q·M) div r ;
{ Invariant: 0 ≤ P < M+B }
End ;
{ Post-conditions: P·r^n ≡ A×B mod M ,
A·B·r^(−n) ≤ P < M + A·B·r^(−n) }
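The loop above can be sketched in Python as follows. This is a minimal illustration rather than the author's implementation: the function name `mont_mul` and its argument order are assumptions, the digits of A are extracted by integer division instead of being read from a word array, and the `assert` on the invariant is there only to make the slide's claim checkable.

```python
from math import gcd

def mont_mul(A, B, M, r, n):
    """Montgomery product without a final subtraction (hypothetical helper).

    Returns P with P * r**n congruent to A * B modulo M.
    """
    assert gcd(M, r) == 1 and 0 <= A < r**n and B >= 0
    neg_m0_inv = (-pow(M % r, -1, r)) % r      # -m_0^(-1) mod r (Python 3.8+)
    b0 = B % r
    P = 0
    for i in range(n):
        a_i = (A // r**i) % r                  # i-th base-r digit of A
        q = ((P % r + a_i * b0) * neg_m0_inv) % r
        P = (P + a_i * B + q * M) // r         # exact division by choice of q
        assert 0 <= P < M + B                  # loop invariant from the slide
    return P

# Example with r = 16, n = 2, M = 239, checking both post-conditions.
P = mont_mul(200, 150, 239, 16, 2)
R = 16**2
assert (P * R - 200 * 150) % 239 == 0
assert 200 * 150 <= P * R < 239 * R + 200 * 150
```

The post-conditions are tested in integer form (multiplied through by r^n) to avoid fractions.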
Loop Invariants I
Suppose P < M+B at the start of the loop.
At the end of the loop, the new value of P is
$(P + a_i B + qM) \,\mathrm{div}\, r < ((M+B)+(r-1)B+(r-1)M)/r = M+B$
So the invariant holds.
If B was bounded by 2M, the output would be bounded by 3M.
Either we perform a conditional subtraction
or we perform another iteration to keep input less than 2M.
The former is banned to avoid timing attacks.
If the last $a_i$ is small enough, the bound becomes M+B/2 < 2M
and another iteration would be unnecessary.
To achieve that we require $a_i < r/2$ for the top digit.
This is unlikely if $A \le M$ and M uses all bits of the top word.
Loop Invariants II
More accuracy is possible. Define: $\alpha_i = \sum_{j=0}^{i-1} a_j r^{j-i}$
Then $\alpha_{i+1} = (\alpha_i + a_i)/r < 1$ by induction.
Suppose $P_i$ is the value of P at the start of the iteration using $a_i$.
Then it is easy to establish:
$\alpha_{i+1}B \le P_{i+1} < M + \alpha_{i+1}B$
because
$\alpha_{i+1}B = (\alpha_i B + a_i B)/r$
$\le (P_i + a_i B + q_i M)/r$
$= (P_i + a_i B + q_i M) \,\mathrm{div}\, r$ (the division is exact by the choice of $q_i$)
$= P_{i+1}$
and similarly for the upper bound.
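The sharper invariant can be checked numerically by carrying the scaled partial sum of the digits of A alongside P. The sketch below is illustrative only: `trace_invariant` is a hypothetical name, and `fractions.Fraction` is used so that the comparison is exact rather than subject to floating-point rounding.

```python
from fractions import Fraction

def trace_invariant(A, B, M, r, n):
    """Run the MMM loop and return [(alpha_i, P_i)] for i = 0..n.

    alpha_i is the sum of a_j * r**(j - i) over j < i; each step asserts
    alpha * B <= P < M + alpha * B.
    """
    neg_m0_inv = (-pow(M % r, -1, r)) % r
    alpha, P = Fraction(0), 0
    trace = [(alpha, P)]
    for i in range(n):
        a_i = (A // r**i) % r
        q = ((P + a_i * B) * neg_m0_inv) % r   # same q as using only low digits
        P = (P + a_i * B + q * M) // r
        alpha = (alpha + a_i) / r              # alpha_{i+1} = (alpha_i + a_i)/r
        assert alpha * B <= P < M + alpha * B
        trace.append((alpha, P))
    return trace

trace = trace_invariant(200, 150, 239, 16, 2)
# After the first digit the lower bound is met with equality (75 == 75),
# so the lower inequality cannot be made strict.
assert trace[1] == (Fraction(1, 2), 75)
assert trace[2][0] == Fraction(200, 16**2)
```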
Post-Condition
At the end of the last iteration: $\alpha_n = \sum_{j=0}^{n-1} a_j r^{j-n} = Ar^{-n}$
So the loop invariant gives:
$ABr^{-n} \le P < M + ABr^{-n}$
• This is the tightest interval possible since its width is only M.
• It improves on the previous upper bound M+B since $Ar^{-n} < 1$.
• It is much better if A is known to be smaller, e.g. less than M.
Stability
Under what conditions will a bound on A and B be preserved?
Then output from one MMM can be re-used as input without
adjustment.
Suppose A and B are bounded by $(1+\varepsilon)M$.
We require $M + ABr^{-n} \le (1+\varepsilon)M$ always for such stability, i.e.
$M + (1+\varepsilon)^2 M^2 r^{-n} \le (1+\varepsilon)M$
This means
$(1+\varepsilon)^2 M r^{-n} \le \varepsilon$
which we can solve for suitable $\varepsilon$.
It has real solutions exactly when: $4M \le r^n$
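As a worked check of the quadratic, write c = M r^(-n); the inequality (1+ε)²c ≤ ε becomes cε² + (2c−1)ε + c ≤ 0, whose discriminant is 1 − 4c. The sketch below computes the interval of admissible ε in floating point. The function name and the parameter `R` (standing for r^n) are assumptions made for illustration.

```python
from math import sqrt

def stable_epsilon_range(M, R):
    """Return (eps_min, eps_max) with (1+eps)**2 * M/R <= eps, or None.

    R stands for r**n. Real solutions exist exactly when 4*M <= R.
    """
    if 4 * M > R:
        return None
    c = M / R
    d = sqrt(1 - 4 * c)                        # square root of the discriminant
    return ((1 - 2 * c - d) / (2 * c), (1 - 2 * c + d) / (2 * c))

# With n digits exactly covering M (R = 16**2) there is no stable bound;
# one extra digit (R = 16**3) gives a range that contains eps = 1, i.e. 2M.
assert stable_epsilon_range(239, 16**2) is None
lo, hi = stable_epsilon_range(239, 16**3)
assert lo <= 1 <= hi
```

Whenever 4M ≤ r^n, ε = 1 lies in the interval, since (1+1)²·M r^(-n) = 4M r^(-n) ≤ 1; this is the familiar bound of 2M on inputs and outputs.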
First Results
• The condition $4M \le r^n$ for inputs and outputs to share the same bound
improves on those given by the papers cited earlier.
• When the condition is satisfied we can choose $\varepsilon$ so that
A and B are bounded by 2M or by $\tfrac12 r^n$ as appropriate.
• Intermediate values of P are bounded above by $\tfrac34 r^n$.
• For such M with n digits, there is no extra processing
required to compensate for removing the final subtraction.
• For standard key lengths, we need to take n to be 1 more
than the number of digits in M in order to satisfy the bound.
Standard Key Lengths
• We have seen the need for increasing n for standard key
lengths. This means one more iteration than the number of
digits in M. It is the cost of deleting the final subtraction.
• How many bits of the corresponding extra digit are
required?
• We know the bound 2M means at most one bit is needed.
Is it necessary? Its occasional existence may provide a
handle for a timing or power analysis attack.
• The frequency of the top bit being non-zero is different for
squares and multiplies. This was reported at RSA 2001.
(This bit is what prompts the final conditional subtraction.)
The Extra Bit
• The frequency of the top bit becoming set is around
25% – 30% when n has not been increased.
• Increasing n decreases the upper bound $M + ABr^{-n}$
making it less likely to set the topmost bit,
i.e. the next bit after the top bit of M.
• We need to discover its frequency of being 1
to determine if a difference for squares and multiplies
is measurable. We will see when it is always zero.
• Since n is being increased by 1, we have
$\tfrac14 r^{n-1} < M < r^{n-1}$ and want I/O to be less than $r^{n-1}$.
Conditions for no overflow bit
• The condition of interest is
$M + ABr^{-n} < r^{n-1}$ when $A, B < r^{n-1}$.
• So we need M such that
$M + (r^{n-1})^2 r^{-n} < r^{n-1}$, i.e. $M < r^{n-1}(1-r^{-1})$
• Thus the arguments and output of MMM will have the same
number of words as M unless the top word of M is all 1s.
• Hence, when the final conditional subtraction is omitted from
MMM, there is no “overflow” bit against which a power
analysis attack can be mounted unless the top word of M is
all 1s.
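For a toy parameter set the claim can be verified exhaustively. The sketch below assumes r = 4 and n = 3 purely to keep the search small; `top_word_all_ones` and `max_output` are hypothetical helper names, and `mont_mul` is an illustrative Python rendering of the MMM loop without a final subtraction.

```python
def mont_mul(A, B, M, r, n):
    """Montgomery product without final subtraction (illustrative helper)."""
    neg_m0_inv = (-pow(M % r, -1, r)) % r
    P = 0
    for i in range(n):
        a_i = (A // r**i) % r
        q = ((P + a_i * B) * neg_m0_inv) % r
        P = (P + a_i * B + q * M) // r
    return P

def top_word_all_ones(M, r, n):
    """True when M >= r**(n-1) * (1 - 1/r), i.e. the top digit of M is r-1."""
    return M >= r**(n - 1) - r**(n - 2)

def max_output(M, r, n):
    """Largest MMM output over all inputs A, B < r**(n-1)."""
    bound = r**(n - 1)
    return max(mont_mul(A, B, M, r, n)
               for A in range(bound) for B in range(bound))

# r = 4, n = 3: odd moduli with r**(n-1)/4 < M < r**(n-1) are 5, 7, ..., 15.
for M in (5, 7, 9, 11):
    assert not top_word_all_ones(M, 4, 3)
    assert max_output(M, 4, 3) < 4**2          # outputs fit in n-1 digits
assert top_word_all_ones(13, 4, 3) and top_word_all_ones(15, 4, 3)
```

For the moduli whose top digit is r−1 the bound no longer guarantees the absence of an overflow; the sketch deliberately makes no claim about how often one occurs there.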
The Unlikely Event
• The potentially dangerous case is therefore when the top
word of M is r – 1, which is reassuringly uncommon,
and the worst case is $M = r^{n-1}$.
• By solving our previous quadratic in $\varepsilon$, the best bound on the
inputs to achieve stability in that worst case is
$(1+\varepsilon)M = \tfrac12 r^n\left(1-(1-4r^{-1})^{1/2}\right) = r^{n-1} + r^{n-2} + 2r^{n-3} + 5r^{n-4} + \dots$
• With the reasonable assumption that residues mod M are
uniformly distributed, at most about $r^{-1}$ of outputs will
exceed $r^{n-1}$.
• So, for a 16-bit architecture, and limited smartcard life,
the overflow bit is too rare to be of use in power analysis.
• One could safely re-introduce a conditional subtraction here
to avoid the need for extra hardware.
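The series for the worst-case bound can be sanity-checked numerically. Dividing through by r^n and writing x = 1/r, the closed form is (1 − √(1 − 4x))/2, whose power series has the Catalan numbers 1, 1, 2, 5, ... as coefficients. The sketch below compares the two; the function names are hypothetical and floating-point arithmetic is an assumption that is adequate only for illustration.

```python
from math import comb, isclose, sqrt

def stable_bound_ratio(r):
    """(1+eps)M / r**n in the worst case M = r**(n-1): smaller root of x**2 - x + 1/r."""
    return (1 - sqrt(1 - 4 / r)) / 2

def catalan_series(r, terms=4):
    """Partial sum 1/r + 1/r**2 + 2/r**3 + 5/r**4 + ... with Catalan coefficients."""
    return sum(comb(2 * k, k) // (k + 1) / r ** (k + 1) for k in range(terms))

# For a tiny radix the four-term truncation is visibly approximate ...
assert abs(stable_bound_ratio(16) - catalan_series(16)) < 1e-4
# ... while for 16-bit words it agrees to many digits.
assert isclose(stable_bound_ratio(2**16), catalan_series(2**16), rel_tol=1e-9)
```

Multiplying the ratio by r shows that the stable bound exceeds r^(n−1) by a fraction of roughly 1/r, which is where the estimate for the proportion of overflowing outputs comes from.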
Exponentiation
• We end by noting that no final subtraction is needed in the
case of MMM exponentiation:
• To compute $T^e \bmod M$, pre-processing generates $Tr^n \bmod M$
so that all subsequent products carry an extra factor of $r^n \bmod M$
compared with standard modular multiplication.
The output is therefore $A = T^e r^n \bmod M$.
• Post-processing removes the extra factor $r^n$ by an MMM
multiplication by 1. The output is bounded above by
$M + Ar^{-n}$ where $A < 2M < \tfrac12 r^n$. So the output is $\le M$.
Of course, equality with M is impossible, since that could
only arise from T = 0 which would result in output 0.
• So no final modular reduction is needed for exponentiation.
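The whole exponentiation can be sketched as below, assuming n is chosen so that 4M ≤ r^n. The names `mont_mul` and `mont_exp` are hypothetical, the pre-processing step uses an ordinary `%` for brevity, and the left-to-right square-and-multiply loop is chosen only to exercise the bounds: its data-dependent branch is not itself side-channel resistant.

```python
from math import gcd

def mont_mul(A, B, M, r, n):
    """Montgomery product without final subtraction (illustrative helper)."""
    neg_m0_inv = (-pow(M % r, -1, r)) % r
    P = 0
    for i in range(n):
        a_i = (A // r**i) % r
        q = ((P + a_i * B) * neg_m0_inv) % r
        P = (P + a_i * B + q * M) // r
    return P

def mont_exp(T, e, M, r, n):
    """Compute T**e mod M using only MMM calls, never subtracting M."""
    R = r**n
    assert gcd(M, r) == 1 and 4 * M <= R and e >= 0
    T_bar = (T * R) % M                  # pre-processing: T * r**n mod M
    acc = R % M                          # Montgomery form of 1
    for bit in bin(e)[2:]:
        acc = mont_mul(acc, acc, M, r, n)
        if bit == "1":
            acc = mont_mul(acc, T_bar, M, r, n)
        assert acc < 2 * M               # stability bound with eps = 1
    out = mont_mul(acc, 1, M, r, n)      # post-processing removes r**n
    assert out <= M
    return out

# M = 239 is prime, so T**e is nonzero mod M for 0 < T < M and the output is below M.
assert mont_exp(7, 65537, 239, 16, 3) == pow(7, 65537, 239)
assert mont_exp(0, 5, 239, 16, 3) == 0
```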
Conclusion
• Precise output bounds have been obtained
for Montgomery Modular Multiplication.
• This gives I/O bounds for MMM
in the context of exponentiation
when the final conditional subtraction is omitted.
• All numbers have the same word size as the
modulus M when $4M \le r^n$ and M has n words.
• Otherwise, MMM must perform another iteration,
but overflow bits are then too rare
to be in danger from power analysis attacks.
• No final modular subtraction is required for exponentiation.