
M. Mlist, Victoria, Australia
September 2021
Abstract
The Schonhage-Strassen algorithm for multiplying large integers
makes use of the Fast Fourier Transform (FFT) algorithm,
which performs the Discrete Fourier Transform (DFT).
An interactive demonstration
of these algorithms is given.
1. Introduction
The Schonhage-Strassen algorithm [SS71]
for multiplying two large integers, m and n,
converts the integers into vectors of "digits",
applies the Fast Fourier Transform (FFT) algorithm
(which performs the Discrete Fourier Transform (DFT)) to the vectors,
multiplies corresponding elements of the transformed vectors,
applies the inverse FFT algorithm to the result, and
lastly propagates carries to recover the
digits of the product, m×n.
If m and n both have L digits the Schonhage-Strassen algorithm has
time complexity O(L log(L) log(log(L))).
Table 1 gives a toy
interactive demonstration of these algorithms.
It operates in base ten for ease of checking and
it omits the recursive step in SS(M,N), but the JavaScript shows where it would go.
2. Multiplication
Long-multiplication (fig. 1)
of multi-digit numbers is taught in school.
It involves multiplying pairs of digits,
adding up columns, and
propagating "carries" from right to left
in each row and also in the final total.
        1 2 3 4      m x n = 1234 x 5678
      x 5 6 7 8
      ---------
  6 1 7 0       | x5
    7 4 0 4     | x6
      8 6 3 8   | x7
        9 8 7 2 | x8
  -------------
  7 0 0 6 6 5 2

Figure 1: Long-Multiplication.
In the final total, 2=2 carry 0,
8+7+0=15, giving 5 carry 1,
and so on.
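A sketch of fig. 1's working in JavaScript (the language of the demonstration); longMult is an illustrative helper written for this note, not part of the demonstration itself, and it holds digits least-significant first:

```javascript
// Long-multiplication on arrays of base-`base` digits, least-significant
// digit first: cross-multiply pairs of digits into columns, then
// propagate carries from right to left, as in fig. 1.
function longMult(m, n, base) {
  const prod = new Array(m.length + n.length).fill(0);
  for (let j = 0; j < n.length; j++)            // one "row" per digit of n
    for (let i = 0; i < m.length; i++)
      prod[i + j] += m[i] * n[j];
  for (let i = 0; i < prod.length - 1; i++) {   // propagate the carries
    prod[i + 1] += Math.floor(prod[i] / base);
    prod[i] %= base;
  }
  while (prod.length > 1 && prod[prod.length - 1] === 0)
    prod.pop();                                 // drop a leading zero
  return prod;
}
```

For example, longMult([4,3,2,1], [8,7,6,5], 10) returns [2,5,6,6,0,0,7], the digits of 1234 x 5678 = 7006652, least-significant first.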
It makes no difference if we process n in reverse.
        1 2 3 4      m x n = 1234 x 5678
      x 5 6 7 8
      ---------
        9 8 7 2 | x8
      8 6 3 8   | x7
    7 4 0 4     | x6
  6 1 7 0       | x5
  -------------
  7 0 0 6 6 5 2

Figure 2: Flip n.
The working of long-multiplication can be
laid out in another, slightly different but equivalent, way (fig. 3)
which involves adding up along diagonals rather than columns.
        1 2 3 4      m x n = 1234 x 5678
      +--------
  8 |   9 8 7 2
  7 |   8 6 3 8
  6 |   7 4 0 4
  5 |   6 1 7 0

Adding along the diagonals, starting from the lone 2 in the top-right
corner (the least significant position) and propagating carries, gives
2, 5, 6, 6, 0, 0, 7, i.e.,
1234 x 5678 = 7006652

Figure 3: Another Angle on Long-Multiplication.
This suggests the convolution of two vectors (sec. 3).
3. Convolutions
The (linear) convolution of two vectors
mm and nn, both of length L,
can be visualised by writing the vectors as in fig. 4,
performing multiplications of elements, mm_{i}×nn_{j},
and adding up the diagonals to give a vector of length 2L−1.
This is just like long-multiplication but without
propagating carries from one diagonal (column) to the next
in the final result.
        1  2  3  4      1,2,3,4 conv 5,6,7,8
      +-----------
  8 |   8 16 24 32
  7 |   7 14 21 28
  6 |   6 12 18 24
  5 |   5 10 15 20

Adding along the diagonals, most significant (bottom-left) first, without carries:
5,16,34,60,61,52,32 >carries> 7,0,0,6,6,5,2

Figure 4: Linear Convolution.
If mm and nn are the digits of integers m and n respectively,
the elements of the convolution are larger than the digits of m and n
but if "carries" are then propagated along the convolution,
e.g., 32=2 carry 3, 52+3=55=5 carry 5, and so on,
then we recover the digits of their product.
(The product may be of length 2L−1 or 2L.)
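In JavaScript, the linear convolution and the subsequent carry propagation might be sketched as follows (illustrative helpers, with vectors written most-significant element first to match fig. 4):

```javascript
// Linear convolution of two vectors: c[k] collects all of the products
// mm[i]*nn[j] with i+j = k, i.e., one diagonal of fig. 4.
function conv(mm, nn) {
  const c = new Array(mm.length + nn.length - 1).fill(0);
  for (let i = 0; i < mm.length; i++)
    for (let j = 0; j < nn.length; j++)
      c[i + j] += mm[i] * nn[j];
  return c;
}

// Propagate carries from right to left to recover base-`base` digits;
// the leading element may still exceed base-1 when the product has 2L digits.
function carries(c, base) {
  const out = c.slice();
  for (let i = out.length - 1; i > 0; i--) {
    out[i - 1] += Math.floor(out[i] / base);
    out[i] %= base;
  }
  return out;
}
```

conv([1,2,3,4], [5,6,7,8]) returns [5,16,34,60,61,52,32], and carries of that in base ten returns [7,0,0,6,6,5,2], as in fig. 4.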
The convolution theorem states that
conv(u,v) = F^{−1}(F(u).F(v))
where F is the Discrete Fourier Transform,
F^{−1} is the inverse transform, and
'.' is element-wise multiplication,
(F(u))_{i}×(F(v))_{i}.
(Strictly, the DFT gives the cyclic convolution;
padding u and v with zeroes to length at least 2L−1
makes it equal to the linear convolution.)
If a vector v has n elements, the Fast Fourier Transform (FFT) algorithm (see below)
computes the Discrete Fourier Transform of v in O(n log n) steps.
4. DFT and FFT
An integer, e.g.,
n=d_{3}d_{2}d_{1}d_{0}
can be thought of as a polynomial, e.g.,
f(x)=d_{3}x^{3}+d_{2}x^{2}+d_{1}x+d_{0}.
A polynomial of degree L−1 is also uniquely defined by
its values at L, or more, distinct points and, taken together,
these values form an alternative representation of the polynomial;
it is quick to multiply two polynomials represented in this way
on a common set of points.
The product of two polynomials each of degree L−1
is of degree 2(L−1) and is uniquely defined
by values at 2L−1 points (so 2L points more than suffice).
The
Karatsuba (O(d^{log2(3)}) time) and
Toom3 (O(d^{log3(5)}) time)
algorithms for integer multiplication
can be constructed by considering the values of
the polynomials derived from integers at certain values –
at −1, 0 and +1 for Karatsuba and
at 0, +1, −1, −2 and "∞" for Toom3 [All21].
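One Karatsuba step can be written out from those evaluation points; karatsubaStep below is an illustrative sketch (not taken from [All21]) for two-piece numbers in base B:

```javascript
// One Karatsuba step: m = m1*B + m0 and n = n1*B + n0 correspond to the
// polynomials m1*x + m0 and n1*x + n0.  Their product c2*x^2 + c1*x + c0
// is recovered from its values at x = 0, +1 and -1, using three
// multiplications of half-sized numbers instead of four.
function karatsubaStep(m1, m0, n1, n0, B) {
  const v0 = m0 * n0;                    // value at x = 0   (= c0)
  const vp = (m1 + m0) * (n1 + n0);      // value at x = +1  (= c2+c1+c0)
  const vm = (m0 - m1) * (n0 - n1);      // value at x = -1  (= c2-c1+c0)
  const c1 = (vp - vm) / 2;              // vp - vm = 2*c1
  const c2 = (vp + vm) / 2 - v0;         // vp + vm = 2*(c2 + c0)
  return (c2 * B + c1) * B + v0;         // evaluate the product at x = B
}
```

For example, karatsubaStep(12, 34, 56, 78, 100) returns 7006652 = 1234 x 5678; applying the same idea recursively to the three half-sized multiplications gives the O(d^{log2(3)}) bound.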
In comparison,
a Discrete Fourier Transform is made of the values of a polynomial at
powers of a primitive^{†} L^{th} root of unity
where L is a power of two
(if a given L is not a power of two it is rounded up to a power,
e.g., 5→8, and the integer/polynomial padded with leading zeroes).
The Fast Fourier Transform (FFT) algorithm [CT65]
performs the Discrete Fourier Transform (DFT) of a vector, quickly.
E.g., FFT, n=4 . . .

n = d_{3},d_{2},d_{1},d_{0}; or
f(x) = d_{3}x^{3}+d_{2}x^{2}+d_{1}x+d_{0}

divide:
n_{e} = d_{2},d_{0}
n_{o} = d_{3},d_{1}

divide:
n_{e,e} = d_{0}
n_{e,o} = d_{2}
n_{o,e} = d_{1}
n_{o,o} = d_{3}

base cases:
y_{e,0} = d_{0}+d_{2}, y_{e,1} = d_{0}−d_{2}
y_{o,0} = d_{1}+d_{3}, y_{o,1} = d_{1}−d_{3}

conquer:
y_{0} = y_{e,0}+y_{o,0} = d_{0}+d_{2}+d_{1}+d_{3} = f(1)
y_{1} = y_{e,1}+r×y_{o,1} = d_{0}−d_{2}+r(d_{1}−d_{3}) = f(r)
y_{2} = y_{e,0}−y_{o,0} = d_{0}+d_{2}−(d_{1}+d_{3}) = f(r^{2})
y_{3} = y_{e,1}−r×y_{o,1} = d_{0}−d_{2}−r(d_{1}−d_{3}) = f(r^{3})

where n_{e} is the evens, n_{o} the odds and,
in this case, r is a primitive (4th) root of unity,
r^{2}=−1, r^{3}=−r, r^{4}=1.
A remarkable thing is that, with a small modification,
the same process gives its inverse,
{y_{i}}→{d_{i}}.
Check . . .

x_{0} = y_{0}+y_{2}+y_{1}+y_{3} = 4d_{0}
x_{1} = y_{0}−y_{2}+r^{−1}(y_{1}−y_{3}) = 4d_{1}
x_{2} = y_{0}+y_{2}−(y_{1}+y_{3}) = 4d_{2}
x_{3} = y_{0}−y_{2}−r^{−1}(y_{1}−y_{3}) = 4d_{3}

so instead of r use r^{−1} (= r^{3} in this case),
and divide the answer by two at each "conquer" step.
The example above,
where n=4, generalises to n being other powers of 2.
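The divide-and-conquer scheme above can be sketched in JavaScript. To keep the arithmetic exact, the sketch works over Z_p (anticipating sec. 5) rather than over the complex numbers; fft here is illustrative, not the demonstration's own code:

```javascript
// Recursive radix-2 FFT over Z_p of `a`, where a[i] is the coefficient of
// x^i, n = a.length is a power of two, and r is a primitive n-th root of
// unity mod p.  For the inverse transform, call with r^(-1) in place of r
// and then divide each element of the result by n (mod p).
function fft(a, r, p) {
  const n = a.length;
  if (n === 1) return a.slice();               // base case
  const even = [], odd = [];                   // divide ...
  for (let i = 0; i < n; i++) (i % 2 === 0 ? even : odd).push(a[i]);
  const r2 = (r * r) % p;                      // a primitive (n/2)th root
  const ye = fft(even, r2, p), yo = fft(odd, r2, p);
  const y = new Array(n);                      // ... and conquer
  let w = 1;                                   // w runs through 1, r, r^2, ...
  for (let k = 0; k < n / 2; k++) {
    const t = (w * yo[k]) % p;
    y[k] = (ye[k] + t) % p;                    // f(r^k) = ye[k] + r^k*yo[k]
    y[k + n / 2] = (ye[k] - t + p) % p;        // f(r^(k+n/2)), as r^(n/2) = -1
    w = (w * r) % p;
  }
  return y;
}
```

In Z_{17} (sec. 5), 4 is a primitive 4th root of unity, so fft([1,2,3,4], 4, 17) evaluates 1+2x+3x^2+4x^3 at 1, 4, 16 and 13; transforming the result with r^{−1} = 13 and multiplying each element by 4^{−1} = 13 (4×13 = 52 ≡ 1 mod 17) recovers [1,2,3,4].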
The demonstration below calculates
the linear, cyclic, and negacyclic convolutions of m and n;
these take quadratic time and are included for comparison purposes.
It also interpolates the product polynomial,
which involves continuous values and thus may show rounding errors.
It lastly runs a simplified(!) Schonhage-Strassen algorithm.
Note, if m or n has more than 8 digits, results may be
too big to be represented exactly in a JavaScript number.
5. Rings
In signal processing the FFT algorithm is
performed on complex numbers C
with r being a primitive root of unity in C, but
it can be performed over rings other than C.
All that FFT requires is that there are
primitive 2^{?th} roots of unity.
Ring: + commutative & associative;
× associative;
∃ 0 s.t. ∀ a, a+0=0+a=a;
∀ a, ∃ (−a) s.t. a+(−a)=0;
∃ 1 s.t. ∀ a, a×1=1×a=a;
× distributes over +.
(× can, but need not, be commutative.
We can write a−b for a+(−b).)
E.g., ∀ k>1, Z_{k} is a ring.
(If k happens to be a prime number then Z_{k} is
a field, in which case there are no divisors of zero and
every non-zero element has a multiplicative inverse.)
Note, (k−1)^{2}=k(k−2)+1, so
(k−1)^{2}=1 mod k, i.e.,
k−1 (=−1) is a square root of unity if k>2.
Z_{5}:
2^{2}=4, 3^{2}=4, 4^{2}=1.
Note, 2 is a primitive (4th) root of unity:
2→2^{2}=4→2^{3}=3→2^{4}=1.
(So is 3: 3→4→2→1,
which gives the same cycle of values in reverse.)
Z_{9}:
2^{2}=4, 3^{2}=0, 4^{2}=7,
5^{2}=7, 6^{2}=0, 7^{2}=4, 8^{2}=1.
Note, 2 is a primitive (6th) root of unity,
2→4→8→7→5→1.
Z_{17} (17=2^{4}+1),
squares: 0, 1, 4, 9, 16,
5^{2}=8, 6^{2}=2, 7^{2}=15, 8^{2}=13,
9^{2}=13, 10^{2}=15, 11^{2}=2, 12^{2}=8,
13^{2}=16, 14^{2}=9, 15^{2}=4, 16^{2}=1.
Note, 4 is a primitive (4th) root of unity, 4→16→13→1, and
2 is a primitive (8th) root of unity,
2→4→8→16→15→13→9→1.
Note, 2 is a (2k)th root of
unity in Z_{2^{k}+1}, as
2^{k}=(2^{k}+1)−1=−1, so
2^{2k}=(2^{k})^{2}=1,
all mod 2^{k}+1.
In Z_{2^{k}+1}, with
r=2, r^{k}=2^{k}=−1,
and ∀ j, r^{j}=−r^{k+j}.
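The footnote's definition of "primitive" can be checked by brute force; order below is an illustrative helper (it assumes r and m are coprime, so that some power of r reaches 1):

```javascript
// The multiplicative order of r mod m: the smallest j > 0 with r^j = 1
// (mod m).  r is a primitive k-th root of unity in Z_m iff order(r,m) = k.
function order(r, m) {
  let x = r % m, j = 1;
  while (x !== 1) {     // assumes gcd(r,m) = 1, else this never terminates
    x = (x * r) % m;
    j++;
  }
  return j;
}
```

order(2,5) = 4, order(2,9) = 6 and order(2,17) = 8 agree with the cycles listed above, and order(2, 2**8+1) = 16 illustrates 2 being a (2k)th root of unity in Z_{2^k+1}, here with k=8.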
For use in integer multiplication, working in a ring Z_{k}
means there is no rounding error (as there would be with complex numbers).
The only question is, how big should the
modulus k=2^{?}+1 be?
It should be large enough to
accommodate the distinct values, from zero up to
at least (base−1)^{2}×L, that can arise from
multiplying digits and then summing a diagonal (fig. 4).
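Putting secs. 3-5 together, a toy convolution-by-FFT can be run over Z_p with p = 65537 = 2^{16}+1, comfortably above the (10−1)^{2}×4 = 324 needed for four base-ten digits. The constants below are illustrative choices, not the demonstration's: 16 is a primitive 8th root of unity mod p (16^{4} = 2^{16} = −1), 61441 = 16^{−1} and 57345 = 8^{−1} mod p:

```javascript
const P = 65537, R = 16, RINV = 61441, N = 8, NINV = 57345;

// Radix-2 FFT over Z_P, as in sec. 4 but with modular arithmetic.
function fft(a, r) {
  const n = a.length;
  if (n === 1) return a.slice();
  const even = [], odd = [];
  for (let i = 0; i < n; i++) (i % 2 === 0 ? even : odd).push(a[i]);
  const r2 = (r * r) % P;
  const ye = fft(even, r2), yo = fft(odd, r2);
  const y = new Array(n);
  let w = 1;
  for (let k = 0; k < n / 2; k++) {
    const t = (w * yo[k]) % P;
    y[k] = (ye[k] + t) % P;
    y[k + n / 2] = (ye[k] - t + P) % P;
    w = (w * r) % P;
  }
  return y;
}

// conv(u,v) = F^{-1}(F(u).F(v)): pad to length N so the cyclic convolution
// equals the linear one, transform, multiply element-wise, transform back.
function convByFFT(mm, nn) {
  const pad = v => v.concat(new Array(N - v.length).fill(0));
  const fm = fft(pad(mm), R), fn = fft(pad(nn), R);
  const fy = fm.map((v, i) => (v * fn[i]) % P);
  return fft(fy, RINV).map(v => (v * NINV) % P);
}
```

convByFFT([1,2,3,4], [5,6,7,8]) returns [5,16,34,60,61,52,32,0], matching fig. 4 (with a trailing zero from the padding); propagating carries then gives 7006652.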
5.1 Mod base^{p}+1
An interesting thing:
123456 mod 11_{10} = −1+2−3+4−5+6 mod 11 = 3
123456 mod 101_{10} = 12−34+56 = 34 mod 101
(123456 = 1222×101 + 34)
123456 mod 1001_{10} = −123+456 mod 1001 = 333
Consider the base ten numeral
d_{5}d_{4}d_{3}d_{2}d_{1}d_{0}
and the operation 'mod 101_{10}' (mod one hundred and one).
d_{1}d_{0} mod 101 = d_{1}d_{0}
d_{3}d_{2}×100 mod 101
= (d_{3}d_{2}×101 − d_{3}d_{2}) mod 101
= −d_{3}d_{2}
d_{5}d_{4}×10000 mod 101
= (d_{5}d_{4}×100×100) mod 101
= (d_{5}d_{4}×100×101 − d_{5}d_{4}×100) mod 101
= −d_{5}d_{4}×100 mod 101
= +d_{5}d_{4}
i.e.,
d_{5}d_{4}d_{3}d_{2}d_{1}d_{0} mod 101
= (d_{5}d_{4} − d_{3}d_{2} + d_{1}d_{0}) mod 101.
(Note, 4321 mod 101
= (−43+21) mod 101
= −22 mod 101
= 79.)
This relates to the negacyclic convolution
(and suggests another way of achieving 'mod 2^{k}+1').
Corresponding results hold for other bases including
2^{k} and the operation 'mod (2^{k}+1)' where,
instead of digit groupings and digitshifts,
bit groupings and bitshifts are used
– useful in Z_{2^{k}+1}.
This avoids the use of division to implement mod.
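A sketch of the trick in JavaScript (illustrative; for base b = 2 the divisions by g = b^p become bit-shifts and masks):

```javascript
// x mod (b^p + 1) via alternating sums of p-digit groups of x, as in the
// '123456 mod 101' example; the modulus itself is never divided by.
function modBp1(x, b, p) {
  const g = Math.pow(b, p);         // group size, b^p = -1 (mod b^p + 1)
  const m = g + 1;                  // the modulus
  let s = 0, sign = 1;
  while (x > 0) {
    s += sign * (x % g);            // the next p-digit group of x
    x = Math.floor(x / g);          // shift x right by p digits
    sign = -sign;                   // groups alternate +, -, +, ...
  }
  return ((s % m) + m) % m;         // normalise into 0 .. m-1
}
```

modBp1(123456, 10, 2) = 34, modBp1(4321, 10, 2) = 79 and modBp1(123456, 10, 3) = 333, matching the worked examples above.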
5.2 Mod base^{p}−1
In a similar vein to the previous section:
123456 mod 999 = 123+456 = 579
123456 mod 99 = 12+34+56 mod 99 = 102 mod 99 = 3
123456 mod 9 = 1+2+3+4+5+6 mod 9 = 21 mod 9 = 3
This relates to the cyclic convolution.
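Likewise for mod (b^p − 1), where every group gets a plus sign (an illustrative sketch):

```javascript
// x mod (b^p - 1) via sums of p-digit groups of x, as in '123456 mod 99':
// b^p = 1 (mod b^p - 1), so each group has weight +1.
function modBm1(x, b, p) {
  const g = Math.pow(b, p);         // group size, b^p
  let s = 0;
  while (x > 0) {
    s += x % g;                     // the next p-digit group of x
    x = Math.floor(x / g);
  }
  return s % (g - 1);               // s is small; one reduction suffices
}
```

modBm1(123456, 10, 3) = 579, modBm1(123456, 10, 2) = 3 and modBm1(123456, 10, 1) = 3, as above.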
5.3 The "base"
For ease of checking,
the demonstration (table 1) divides m and n into decimal digits;
you could say that it uses base ten.
It could as well use base 100 or base 2^{k}.
The best choice of piece size is approximately √L digits.
6. Conclusion
The SchonhageStrassen algorithm [SS71]
(O(L log(L) log(log(L)))time for Ldigit integers)
was the fastest multiplication algorithm for large integers for
almost four decades until Furer [Fu07] gave an algorithm that is
asymptotically faster for enormous integers.
Harvey and Van Der Hoeven [HV19] finally removed the
log log L factor, as Schonhage and Strassen had
speculated would be possible.
References
[All21] L. Allison, 'Arithmetic on bigInts', Aug. 2021
[www].
Discusses and demonstrates +, −, × and ÷
on large integers, on class bigInt.
[CT65] J. W. Cooley and J. W. Tukey,
'An algorithm for the machine calculation of complex Fourier series',
Math. Comp., 19, pp.297-301, 1965
[doi:10.1090/S0025-5718-1965-0178586-1].
Also see the
FFT@[wikip]['21];
another source is [Co09].
[Co09] T. Cormen, C. Leiserson, R. Rivest, C. Stein,
'Introduction to Algorithms', MIT Press, 3rd ed., 2009.
Discusses the FFT and briefly sketches the Schonhage-Strassen algorithm.
[Fu07] M. Furer,
'Faster integer multiplication', STOC 39, pp.57-66, 2007
[doi:10.1145/1250790.1250800]
or
[www]['21].
"... running in time
n log(n) 2^{(log*(n))} ..."
(Note here, "log^{*} (log star) is the number of times
the logarithm function must be iteratively applied before the
result is less than or equal to 1. ..." – wikip.
And,
"... 2^{(log*(n))} is better than log(log(n))
for n > 2^{2^{64}} ..."
– wikip.
So little difference for practical values.)
Also see [HV21].
[HV19] D. Harvey and J. van der Hoeven,
'Integer multiplication in time O(n log n)',
HAL archive, 2019
[www]['21].
This is conjectured to be as fast as is possible.
Will it be proven so?
[SS71] A. Schonhage and V. Strassen,
'Schnelle Multiplikation grosser Zahlen'
(fast multiplication of large numbers),
Computing, 7, pp.281-292, 1971
[doi:10.1007/BF02242355].
Also see
SS@[wikip]['21].
^{†} r is a primitive k^{th} root of unity iff
the smallest j s.t. r^{j} = 1 is j = k.

