Some publications
This book collects in the same document all state-of-the-art
algorithms in multiple precision arithmetic (integers, integers modulo n,
floating-point numbers). The best current reference on that topic is
volume 2 of Knuth's The Art of Computer Programming,
which however misses some important newer algorithms (divide-and-conquer
division, other variants of FFT multiplication, floating-point algorithms, ...)
Our aim is to give detailed algorithms:
for all operations (not just multiplication, as in many textbooks),
for all size ranges (not just schoolbook methods or FFT-based methods),
and including all details (for example how to properly deal with carries
for integer algorithms, or a rigorous analysis of roundoff errors for
floating-point algorithms).
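As a small illustration of the kind of detail meant here (the code and names below are ours, not the book's), schoolbook multiple-precision addition on little-endian limb lists, with the carry propagation spelled out:

```python
def mp_add(a, b, base=2**64):
    """Schoolbook multiple-precision addition of little-endian limb lists,
    with explicit carry propagation."""
    r, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        r.append(s % base)     # low limb of the digit sum
        carry = s // base      # carry into the next position (0 or 1)
    if carry:
        r.append(carry)
    return r

# (2**64 - 1) + 1 propagates a carry into a new limb:
print(mp_add([2**64 - 1], [1]))   # [0, 1]
```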
The book would be useful for graduate students in computer science and
mathematics (perhaps too specialized for most undergraduates, at least
in its present state), researchers in discrete mathematics, computer
algebra, number theory, cryptography, and developers of multiple-precision
libraries.
This book is a free version of the book of the same name published by Masson in
1995. The examples use an obsolete version of Maple (V.3), but most of the text
still applies to Maple and other modern computer algebra systems.
We give simple and efficient methods to compute and/or estimate the
predecessor and successor of a floating-point number using only floating-point
operations in rounding to nearest. This may be used to simulate interval
operations, in which case the
quality in terms of the diameter of the result is
significantly improved compared to existing approaches.
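The flavor of the method can be conveyed by a small Python sketch; the constants below are illustrative choices of ours (an inflated relative step plus the smallest subnormal), not the tuned and proven constants of the paper:

```python
import math

PHI = 2.0 ** -52 * (1 + 2.0 ** -52)  # slightly above the relative spacing 2^-52
ETA = 5e-324                         # smallest positive subnormal, for x near 0

def succ_upper(x):
    """Estimate of the successor of x >= 0 using only floating-point
    +, * in rounding to nearest; never below the true successor."""
    return x + (PHI * x + ETA)

for x in [0.0, 1.0, 0.1, 3.14, 1e300]:
    exact = math.nextafter(x, math.inf)   # Python >= 3.9
    est = succ_upper(x)
    assert exact <= est                   # an enclosure, frequently exact
    print(x, est == exact)
```

Such an enclosure is exactly what interval simulation needs: rounding-to-nearest results can be widened to guaranteed bounds at the cost of a few floating-point operations.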
We searched for the worst cases for correct rounding of the exponential
function in the IEEE 754r decimal64 format, and computed all the bad cases
whose distance from a breakpoint (for all rounding modes) is less than
10^-15 ulp, and we give the worst ones. In particular, the worst case
for |x| >= 3·10^-11 is exp(9.407822313572878·10^-2)
= 1.09864568206633850000000000000000278... This work can be
extended to other elementary functions in the decimal64 format and allows
the design of reasonably fast routines that will evaluate these functions
with correct rounding, at least in some domains.
[Complete lists of worst cases for the exponential are available for the
IEEE 754r decimal32 and decimal64 formats.]
We exhibit ten new primitive
trinomials over GF(2)
of record degrees 24036583, 25964951, 30402457, and 32582657.
This completes the search for the currently known Mersenne prime exponents.
We describe the implementation of the reciprocal square root --- also called
inverse square root --- as a native function
in the MPFR library. The difficulty is to implement
Newton's iteration for the reciprocal square root on top of
GNU MP's mpn layer, while guaranteeing a rigorous 1/2 ulp bound on the
roundoff error.
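In MPFR the iteration runs on integers via the mpn layer, with the working precision roughly doubling at each step; the underlying iteration can be sketched in plain binary64 (the seed choice here is ours):

```python
import math

def rsqrt(a):
    """1/sqrt(a) for a > 0 by Newton's iteration x <- x*(3 - a*x*x)/2,
    which uses no division after the initial seed."""
    m, e = math.frexp(a)            # a = m * 2**e with 0.5 <= m < 1
    x = math.ldexp(1.0, -(e // 2))  # seed within a small factor of the result
    for _ in range(10):             # quadratic convergence
        x *= (3.0 - a * x * x) / 2.0
    return x

print(rsqrt(2.0), 1 / math.sqrt(2.0))
```

The real difficulty, addressed in the paper, is not the iteration itself but bounding all intermediate truncation errors so that the final result is within 1/2 ulp.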
Let S_n denote the symmetric group on n letters, and g(n) the maximal order
of an element of S_n. If the standard factorization of M into primes is
M = q_1^{a_1} q_2^{a_2} ... q_k^{a_k}, we define l(M) to be
q_1^{a_1} + q_2^{a_2} + ... + q_k^{a_k};
one century ago, E. Landau proved that g(n) = max { M : l(M) <= n }
and that, when n goes to infinity, log g(n) ~ sqrt(n log(n)).
There exists a basic algorithm to compute g(n) for 1 <= n <= N;
its running time is O(N^{3/2} / sqrt(log N)) and the needed memory is O(N);
it allows computing g(n) up to, say, one million. We describe an
algorithm to calculate g(n) for n up to 10^15.
The main idea is to use the so-called l-superchampion numbers. Similar
numbers, the superior highly composite numbers, were introduced by
S. Ramanujan to study large values of the divisor function
tau(n) = sum_{d | n} 1.
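The basic algorithm mentioned above is essentially a knapsack-style dynamic program over prime powers (each prime contributes one power q^a of cost q^a to l(M) and factor q^a to M); a small Python sketch for modest N, far from the 10^15 method of the paper:

```python
def landau(N):
    """g(n) for 0 <= n <= N: maximal order of an element of S_n,
    by dynamic programming over prime powers."""
    sieve = [True] * (N + 1)
    primes = []
    for p in range(2, N + 1):
        if sieve[p]:
            primes.append(p)
            for q in range(p * p, N + 1, p):
                sieve[q] = False
    g = [1] * (N + 1)
    for p in primes:                    # each prime used at most once
        for n in range(N, p - 1, -1):   # descending: 0/1-knapsack pattern
            q = p
            while q <= n:
                g[n] = max(g[n], q * g[n - q])
                q *= p
    return g

print(landau(10))   # [1, 1, 2, 3, 4, 6, 6, 12, 15, 20, 30]
```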
Faster Multiplication in
GF(2)[x], with Richard P. Brent, Pierrick Gaudry and Emmanuel Thomé,
Proceedings of the
Eighth Algorithmic Number Theory
Symposium (ANTS-VIII), May 17-22, 2008, Banff Centre, Banff, Alberta
(Canada), A. J. van der Poorten and A. Stein, editors, pages 153--166,
LNCS 5011, 2008.
A preliminary version appeared as INRIA Research Report, November 2007.
In this paper, we discuss an implementation of various algorithms for
multiplying polynomials in GF(2)[x]: variants of the window methods,
Karatsuba's, Toom-Cook's, Schönhage's and Cantor's algorithms. For most
of them, we propose improvements that lead to practical speedups.
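For readers unfamiliar with the setting: polynomials over GF(2) pack naturally into machine words, and the schoolbook product is just shift-and-XOR, with no carries. A toy Python version of the operation being optimized (none of the fast algorithms of the paper):

```python
def gf2x_mul(a, b):
    """Multiply polynomials over GF(2), packed as integers (bit i of a
    is the coefficient of x^i): shift-and-XOR, carry-free."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

# (x + 1)^2 = x^2 + 1 over GF(2):
print(bin(gf2x_mul(0b11, 0b11)))   # 0b101
```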
We give a new algorithm for performing the distinct-degree factorization of
a polynomial P(x) over GF(2), using a multi-level
blocking strategy. The coarsest level of blocking replaces GCD computations
by multiplications, as suggested by Pollard (1975), von zur Gathen and Shoup
(1992), and others.
The novelty of our approach is that
a finer level of blocking replaces multiplications by squarings,
which speeds up the computation
in GF(2)[x]/P(x) of certain interval polynomials
when P(x) is sparse.
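The reason squarings are so much cheaper is that squaring is linear over GF(2): a(x)^2 = a(x^2), so it amounts to spreading the coefficient bits apart with zeros (real implementations use table lookups for the spreading). A sketch:

```python
def gf2x_sqr(a):
    """Square a polynomial over GF(2): a(x)^2 = a(x^2), i.e. interleave
    the coefficient bits with zeros -- linear work, no multiplication."""
    r, i = 0, 0
    while a:
        if a & 1:
            r |= 1 << (2 * i)
        a >>= 1
        i += 1
    return r

print(bin(gf2x_sqr(0b111)))   # (x^2+x+1)^2 = x^4+x^2+1 -> 0b10101
```

Reduction modulo a sparse P(x) then costs only a few shifts and XORs.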
As an application we give a fast
algorithm to search for all irreducible trinomials
x^r + x^s + 1 of degree
r over GF(2), while producing a certificate that can be checked
in less time than the full search.
Naive algorithms cost O(r^2) per trinomial, thus O(r^3) to search
over all trinomials of given degree r.
Under a plausible assumption about the
distribution of factors of trinomials, the new algorithm has
complexity O(r^2 (log r)^{3/2} (log log r)^{1/2})
for the search over all trinomials of degree r.
Our implementation achieves a speedup by a factor of more than 560
over the naive algorithm
in the case r = 24036583 (a Mersenne exponent).
Using our program, we have found two new primitive trinomials
of degree 24036583 over GF(2) (the previous record degree was 6972593).
Schönhage-Strassen's algorithm is one of the best known algorithms for
multiplying large integers. Implementing it efficiently is of utmost
importance, since
many other algorithms rely on it as a subroutine. We present here an improved
implementation, based on the one distributed within the GMP library.
The following ideas and
techniques were used or tried: faster arithmetic modulo 2^n+1,
improved cache
locality, Mersenne transforms, Chinese Remainder Reconstruction, the
sqrt(2) trick, Harley's and Granlund's tricks, improved tuning. We
also discuss some ideas we plan to try in the future.
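One ingredient is easy to show: arithmetic modulo 2^n+1 needs no division, since 2^n = -1 lets one fold n-bit chunks with alternating signs. A Python sketch of this folding (the actual implementation works in place on limb vectors):

```python
def mod_fermat(x, n):
    """Reduce x modulo 2^n + 1 using 2^n = -1: fold the n-bit chunks
    of x with alternating signs, no division needed."""
    mask = (1 << n) - 1
    r, sign = 0, 1
    while x:
        r += sign * (x & mask)
        x >>= n
        sign = -sign
    return r % ((1 << n) + 1)

n, x = 8, 123456789
print(mod_fermat(x, n), x % (2**n + 1))   # equal
```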
The currently best known algorithms for the numerical evaluation of
hypergeometric constants such as zeta(3) to d decimal digits have time
complexity O(M(d) log^2 d) and space complexity O(d log d) or
O(d). Following work of Cheng, Gergel, Kim and Zima, we present a new
algorithm with the same asymptotic complexity, but more efficient in practice.
Our implementation of this algorithm improves over existing programs
for the computation of Pi, and we announce a new record of 2 billion digits
for zeta(3).
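The series involved are hypergeometric: consecutive terms differ by a fixed rational function of n, which is what binary splitting exploits. For instance, Apéry's series for zeta(3), summed here naively with exact rationals (the record computations use binary splitting instead; this sketch only illustrates the series):

```python
from fractions import Fraction
from math import comb

def zeta3(digits):
    """zeta(3) = (5/2) * sum_{n>=1} (-1)^(n-1) / (n^3 * C(2n,n));
    each term contributes about log10(4) ~ 0.6 digits."""
    terms = int(digits / 0.6) + 5
    s = Fraction(0)
    for n in range(1, terms + 1):
        s += Fraction((-1) ** (n - 1), n ** 3 * comb(2 * n, n))
    return Fraction(5, 2) * s

print(float(zeta3(15)))   # 1.2020569031595942...
```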
We consider the problem of finding hard-to-round cases of a periodic
function for large floating-point inputs, more precisely when the function
cannot be efficiently approximated by a polynomial.
This is one of the last few issues preventing an efficient computation of
correctly rounded transcendentals for the whole IEEE-754 double precision
format.
The first non-naive algorithm for this problem is presented, with a heuristic
complexity of O(2^{0.676 p}) for a precision of p bits.
The efficiency of the algorithm is shown on the largest IEEE-754 double
precision binade for the sine function, and some corresponding bad cases are
given.
We can hope that all the worst cases of the trigonometric functions
in their whole domain will be found within a few years, a task that
was considered out of reach until now.
Until version 4.2.1, GNU MP
(GMP for short) division has complexity
O(M(n) log(n)), which is not asymptotically optimal.
We propose here some division algorithms that achieve O(M(n)) with small
constants.
Code is available too: invert.c computes an approximate
inverse within 2 ulps in 3M(n).
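The underlying idea -- invert the top half of the divisor recursively, then perform a single Newton step at full size -- can be sketched on integers. This is the shape of the approach only, not the code of invert.c:

```python
def invert(d, n):
    """Approximate floor(2^(2n) / d) for 2^(n-1) <= d < 2^n: invert the
    top half of d recursively, then one Newton step at full size."""
    if n <= 4:
        return (1 << (2 * n)) // d
    k = (n + 1) // 2
    x = invert(d >> (n - k), k) << (n - k)    # correct to about k bits
    return 2 * x - ((d * x * x) >> (2 * n))   # Newton step doubles accuracy

n, d = 16, 40503
print(invert(d, n), (1 << (2 * n)) // d)      # agree within a few ulps
```

Since the top-level Newton step costs O(M(n)) and the recursive sizes halve, the total cost is O(M(n)) with a small constant.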
Error Bounds on Complex Floating-Point Multiplication, with Richard Brent
and Colin Percival,
Mathematics
of Computation volume 76 (2007), pages 1469-1481.
Some technical details are given
in
INRIA Research Report 6068, December 2006.
Given floating-point arithmetic with t-digit base-β significands
in which all arithmetic operations are performed as if calculated to infinite
precision and rounded to a nearest representable value, we prove that
the product of complex values z0 and z1
can be computed with
maximum absolute error |z0| |z1| (1/2) β^{1-t} sqrt(5). In particular,
this provides relative error bounds of 2^{-24}
sqrt(5) and 2^{-53} sqrt(5) for IEEE 754 single and double precision
arithmetic respectively, provided that overflow, underflow, and
denormals do not occur.
We also provide the numerical worst cases for IEEE 754 single and double
precision arithmetic.
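The bound is easy to test empirically in binary64 (t = 53), using exact rational arithmetic as a referee; a sketch of ours, comparing squared quantities to avoid square roots:

```python
from fractions import Fraction
import random

def within_bound(a, b, c, d):
    """Check the error of naive complex multiplication (a+bi)(c+di) in
    binary64 against the bound |z0| |z1| sqrt(5) 2^-53, squared."""
    re = a * c - b * d                # one rounding per operation
    im = a * d + b * c
    A, B, C, D = map(Fraction, (a, b, c, d))
    eR = Fraction(re) - (A * C - B * D)
    eI = Fraction(im) - (A * D + B * C)
    err2 = eR * eR + eI * eI
    bound2 = Fraction(5, 2 ** 106) * (A * A + B * B) * (C * C + D * D)
    return err2 <= bound2

random.seed(0)
print(all(within_bound(*(random.uniform(-1, 1) for _ in range(4)))
          for _ in range(10 ** 4)))   # True
```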
The Elliptic Curve Method for integer factorization (ECM) was invented by
H. W. Lenstra, Jr., in 1985 [Lenstra87].
In the past 20 years, many improvements of ECM were proposed
on the mathematical, algorithmic, and implementation sides.
This paper summarizes the current state-of-the-art,
as implemented in the GMP-ECM software.
Erratum: on page 541 we write ``Computer experiments indicate that these curves
have, on average, 3.49 powers of 2 and 0.78 powers of 3, while Suyama's family
has 3.46 powers of 2 and 1.45 powers of 3''. As noticed by Romain Cosset,
those experiments done by the
first author are wrong, since he ran 1000 random curves with the same
prime input p = 10^10 + 19, and the results differ according to the
congruence of p modulo powers of 2 and 3. With 10000 random curves on the
10000 primes just above 10^20, we get an average of
2^{3.34} * 3^{1.68} ≈ 63.9 for Suyama's family, and
2^{3.36} * 3^{0.67} ≈ 21.5 for curves of the form
(16d + 18) y^2 = x^3 + (4d + 2) x^2 + x.
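For the unfamiliar reader, the mechanism of ECM fits in a few lines: do scalar multiplication on a random curve modulo the composite n, and a modular inversion that fails exposes a factor. A toy stage 1 of ours follows; GMP-ECM uses far better curve parametrizations, fast arithmetic, and a stage 2:

```python
from math import gcd
from random import randrange, seed

def inv_mod(u, n):
    g = gcd(u % n, n)
    if g != 1:
        raise ZeroDivisionError(g)   # non-invertible: g divides n
    return pow(u, -1, n)             # Python >= 3.8

def ec_add(P, Q, a, n):
    """Addition on y^2 = x^3 + a*x + b mod n; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if (x1 - x2) % n == 0:
        if (y1 + y2) % n == 0:
            return None
        lam = (3 * x1 * x1 + a) * inv_mod(2 * y1, n) % n
    else:
        lam = (y2 - y1) * inv_mod(x2 - x1, n) % n
    x3 = (lam * lam - x1 - x2) % n
    return (x3, (lam * (x1 - x3) - y1) % n)

def ec_mul(k, P, a, n):
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, a, n)
        P = ec_add(P, P, a, n)
        k >>= 1
    return R

def ecm_stage1(n, B1=1000, curves=50):
    primes = [p for p in range(2, B1 + 1)
              if all(p % q for q in range(2, int(p ** 0.5) + 1))]
    for _ in range(curves):
        x, y, a = randrange(n), randrange(n), randrange(n)
        P = (x, y)            # b = y^2 - x^3 - a*x is implicit
        try:
            for p in primes:
                q = p
                while q * p <= B1:   # largest power of p below B1
                    q *= p
                P = ec_mul(q, P, a, n)
        except ZeroDivisionError as e:
            if 1 < e.args[0] < n:
                return e.args[0]
    return None

seed(1)
print(ecm_stage1(1000003 * 1000033))   # one of the two prime factors
```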
This paper presents a multiple-precision binary
floating-point library,
written in the ISO C language, and based on the GNU MP library.
Its particularity is to extend ideas from the IEEE-754 standard
to arbitrary precision, by providing correct rounding
and exceptions.
We demonstrate how these strong semantics are achieved ---
with no significant slowdown with respect to other tools ---
and discuss a few applications where such a library can be useful.
This short note shows the nasty effects of patents for the development of free
software, even for patents that were not written with software applications
in mind.
Obtaining a single result for a given computation: at first sight this
seems obvious; it is in fact a vast research subject to which researchers
contribute little by little. A new step is reached today thanks
to MPFR, a multiple-precision computation library
for floating-point numbers.
A naive digital plane is a subset of
points (x,y,z) in Z^3 verifying
h <= ax+by+cz < h + max{ |a|, |b|, |c| }, where
(a,b,c,h) is in Z^4. Given a finite unstructured subset
of Z^3, determining whether there
exists a naive digital plane containing it is called
digital plane recognition. This question is
rather classical in the field of digital geometry (also called
discrete geometry). We suggest in this paper a new algorithm to
solve it. Its asymptotic complexity
is bounded by O(n^7), but its behavior seems to be linear in
practice. It uses an original strategy of optimization in a set of
triangular facets (triangles). The code is short and elementary
triangular facets (triangles). The code is short and elementary
(less than 300 lines) and available on
http://www.loria.fr/~debled/plane and here.
We propose a new algorithm to find worst cases for the correct rounding of a
mathematical function of one variable. We first reduce this problem to the
real small value problem--i.e., for polynomials with real coefficients. Then,
we show that this second problem can be solved efficiently by extending
Coppersmith's work on the integer small value problem--for polynomials with
integer coefficients--using lattice reduction. For floating-point numbers with
a mantissa less than N and a polynomial approximation of degree d, our
algorithm finds all worst cases at distance less than N^{-d^2/(2d+1)}
from a machine number in time O(N^{(d+1)/(2d+1)+epsilon}). For d=2, a
detailed study improves on the O(N^{2/3+epsilon}) complexity from
Lefèvre's algorithm to O(N^{4/7+epsilon}). For larger d, our
algorithm can be used to check that there exist no worst cases at distance
less than N^{-k} in time O(N^{1/2+epsilon}).
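To make the notion of "worst case" concrete, here is an exhaustive toy search (the naive method the algorithm above supersedes, not the lattice-based one), ranking t-bit inputs of exp by their distance to a round-to-nearest breakpoint; binary64 stands in for exact arithmetic, which is acceptable only because t is tiny:

```python
import math

def hardest_to_round(t=16, count=3):
    """Rank the t-bit inputs x in [1,2) by the distance (in ulps) of
    exp(x) to a midpoint between consecutive t-bit numbers."""
    ranked = []
    for m in range(1 << (t - 1), 1 << t):
        x = m / float(1 << (t - 1))       # t-bit significand in [1,2)
        y = math.exp(x)
        ulp = math.ldexp(1.0, math.floor(math.log2(y)) - t + 1)
        frac = y / ulp - math.floor(y / ulp)
        ranked.append((abs(frac - 0.5), x))
    ranked.sort()
    return ranked[:count]

print(hardest_to_round())
```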
Gal's Accurate Tables Method Revisited,
with Damien Stehlé, INRIA Research Report RR-5359, October 2004.
An improved version
appeared in the Proceedings of
Arith'17.
Those ideas are demonstrated by an implementation of the
exp2 function
in double precision.
Erratum in the final version of the paper: in Section 4, the
simultaneous worst case for sin and cos is t0=1f09c0c6cde5e3
and not t0=31a93fddd45e3.
See also
my coauthor page.
Gal's accurate tables algorithm aims at providing an efficient implementation of
elementary functions with correct rounding as often as possible. This method
requires an expensive pre-computation of a table made of the values taken by the
function or by several related functions at some distinguished points. Our
improvements of Gal's method are two-fold: on the one hand we describe what is
arguably the best set of distinguished values and how it improves the efficiency
and correctness of the implementation of the function, and on the other hand we
give an algorithm which drastically decreases the cost of the pre-computation.
These improvements are related to the worst cases for the correct rounding of
mathematical functions and to the algorithms for finding them. We show that the
whole method can be turned into practice by giving complete tables for
2^x and sin(x) for x in [1/2,1[, in double precision.
On March 10, 2004, Dan Bernstein announced a revised draft of his paper
Removing redundancy in high-precision Newton iteration,
with algorithms that compute a reciprocal
of order n over C[[x]] in 1.5+o(1) times the cost of a product;
a quotient or a logarithm in 2.16666...+o(1) times;
a square root in 1.83333...+o(1) times;
an exponential in 2.83333...+o(1) times.
We give better algorithms.
Note added on March 24, 2004: the 1.5+o(1) reciprocal algorithm was already
published by Schönhage (Information Processing Letters 74, 2000, p. 41-46).
Note added on July 24, 2006: in a preprint
Newton's method and FFT trading, Joris van der Hoeven gives
better constants for the exponential (2.333...) and the quotient (1.666...).
Note added on April 20, 2009: as noticed by David Harvey, in Section 3,
in the Divide algorithm, Step 4 should read
q <- q0 + g0 (h1 - ε) x^n, where
h = h0 + x^n h1.
Indeed, after Step 3 we have q0 f = h0 +
ε x^n + O(x^{2n}),
i.e., q0 = h0/f + (ε/f) x^n + O(x^{2n}).
Thus h/f = q0 + ((h1 - ε)/f) x^n
+ O(x^{2n}) = q0 + g0 (h1 - ε)
x^n + O(x^{2n}).
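The object these constants measure is Newton's iteration on truncated power series; here is a plain Python sketch of the reciprocal, with naive multiplication (so none of the transform-sharing the constants above come from; names are ours):

```python
def mul(a, b, k):
    """Product of power series a, b truncated to k coefficients (naive)."""
    c = [0] * k
    for i, ai in enumerate(a[:k]):
        for j, bj in enumerate(b[:k - i]):
            c[i + j] += ai * bj
    return c

def inv(f, n):
    """g with f*g = 1 + O(x^n), by Newton's iteration g <- g*(2 - f*g);
    the number of correct coefficients doubles at each step."""
    g = [1.0 / f[0]]
    k = 1
    while k < n:
        k = min(2 * k, n)
        e = mul(f, g, k)                       # f*g = 1 + error
        g = mul(g, [2 - e[0]] + [-c for c in e[1:]], k)
    return g

print(inv([1.0, -1.0], 6))   # 1/(1-x) = 1 + x + x^2 + ... -> [1.0]*6
```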
Arithmétique
flottante,
with Vincent Lefèvre, INRIA Research Report RR-5105,
February 2004 (in French).
This document collects the notes of a course given in 2003 in the
Numerical and Symbolic Algorithmics track of the DEA in computer science
at the Université Henri Poincaré Nancy 1.
These notes are largely based on the book
Elementary Functions. Algorithms and Implementation by Jean-Michel
Muller.
Also available on LibreCours.
A new proof of the ``accurate summation'' algorithm proposed by Demmel and
Hida is presented.
The main part of that proof
has been written in the Coq language and verified by the Coq proof assistant.
We present new algorithms for the inverse, division, and
square root of power series.
The key trick is a new algorithm --- MiddleProduct or, for
short, MP --- computing the n middle coefficients of a
(2n-1) * n full product in the same number of multiplications
as a full n * n product.
This improves previous work of Brent, Mulders, Karp and Markstein,
Burnikel and Ziegler.
These results apply both to series and polynomials.
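The specification is short: here is a naive Python version of the middle product (the point of the paper is computing these n coefficients in the cost of an n x n product, which this sketch does not attempt):

```python
def middle_product(a, b):
    """Coefficients n-1 .. 2n-2 of the product a*b, where a has 2n-1
    coefficients and b has n: exactly the part a Newton step consumes."""
    n = len(b)
    assert len(a) == 2 * n - 1
    return [sum(a[k - j] * b[j] for j in range(n))
            for k in range(n - 1, 2 * n - 1)]

print(middle_product([1, 2, 3, 4, 5], [1, 1, 1]))   # [6, 9, 12]
```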
Some aspects of what a standard for the implementation of the elementary
functions could be are presented. Firstly, the need for such a standard is
motivated. Then the proposed standard is given. The question of roundings
constitutes an important part of this paper: three levels are proposed,
ranging from a level relatively easy to attain (with fixed maximal relative
error) up to the best quality one, with correct rounding on the whole range
of every function.
We do not claim that we always suggest the right choices, or that we have
thought about all relevant issues. The mere goal of this paper is to raise
questions and to launch the discussion towards a standard.
A long note on Mulders'
short product, with Guillaume Hanrot,
November 2002. A revised version appeared in the
Journal of Symbolic Computation, volume 37, pages 391-401, 2004.
The short product of two power series is
the meaningful part of the product of these objects,
i.e., sum(a[i] b[j] x^{i+j}, i+j < n). In [Mulders00], Mulders gives an
algorithm to compute a short product faster than the full product in the
case of Karatsuba's multiplication [KaOf62].
This algorithm works by selecting a
cutoff point k and performing a full k x k product and two
(n-k) x (n-k) short products recursively. Mulders also gives a
heuristically optimal cutoff point beta n. In this paper, we determine
the optimal cutoff point in Mulders' algorithm. We also give a slightly
more general description of Mulders' method.
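The recursion is easy to state in code; a Python sketch of ours with a fixed cutoff ratio (the paper's contribution is precisely the optimal choice of k, which this sketch does not reproduce):

```python
def short_product(a, b, n, beta=0.7):
    """First n coefficients of a*b by Mulders' scheme: one full k x k
    product plus two recursive short products of size n-k, k ~ beta*n."""
    if n <= 4:                        # naive base case
        return [sum(a[j] * b[i - j] for j in range(i + 1)) for i in range(n)]
    k = min(max(int(beta * n), (n + 1) // 2), n - 1)
    c = [0] * n
    for i in range(k):                # full k x k product, truncated to n
        for j in range(min(k, n - i)):
            c[i + j] += a[i] * b[j]
    hi_a = short_product(a[k:n], b[:n - k], n - k, beta)  # terms with i >= k
    hi_b = short_product(b[k:n], a[:n - k], n - k, beta)  # terms with j >= k
    for t in range(n - k):
        c[k + t] += hi_a[t] + hi_b[t]
    return c

a = b = [1] * 8
print(short_product(a, b, 8))   # [1, 2, 3, 4, 5, 6, 7, 8]
```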
The binary algorithm is a variant of the Euclidean algorithm that
performs well in practice. We present a quasi-linear time
recursive algorithm that computes the greatest common divisor of
two integers by simulating a slightly modified version of the
binary algorithm. The structure of the
recursive algorithm is very close to the one of the well-known
Knuth-Schönhage fast gcd algorithm, but the description and the
proof of correctness are significantly simpler in our case. This
leads to a simplification of the implementation and to better
running times.
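For reference, the plain binary algorithm that the recursive variant simulates uses only shifts and subtractions:

```python
def binary_gcd(a, b):
    """Greatest common divisor using only shifts and subtractions."""
    if a == 0:
        return b
    if b == 0:
        return a
    shift = ((a | b) & -(a | b)).bit_length() - 1   # shared power of two
    a >>= (a & -a).bit_length() - 1                 # make a odd
    while b:
        b >>= (b & -b).bit_length() - 1             # make b odd
        if a > b:
            a, b = b, a
        b -= a                                      # even again (or zero)
    return a << shift

print(binary_gcd(48, 180))   # 12
```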
Consider polynomials over GF(2).
We describe efficient
algorithms for finding trinomials with
large irreducible (and possibly primitive) factors, and give examples
of trinomials
having a primitive factor of degree r for all Mersenne exponents
r = ±3 mod 8 in the range 5 < r < 2976221,
although there is no irreducible trinomial of degree r.
We also give
trinomials with a primitive factor of degree r = 2^k
for 3 <= k <= 12. These trinomials enable efficient
representations of the finite field GF(2^r).
We show how trinomials with large primitive factors can
be used efficiently in applications where primitive trinomials
would normally be used.
Note added April 22, 2009: this paper is mentioned in Divisibility of
Trinomials by Irreducible Polynomials over F_2, by Ryul Kim and
Wolfram Koepf, International Journal of Algebra, Vol. 3, 2009, no. 4, 189-197.
This paper provides a simpler proof of the ``accurate summation'' algorithm
proposed by Demmel and Hida in
DeHi02.
It also gives improved bounds in some cases, and examples showing that those
new bounds are optimal.
This simpler proof will be used to obtain a computer-generated proof
of Demmel-Hida's algorithm, using a proof assistant like HOL, PVS or Coq.
This article describes the first results of the search
(with Richard Brent and Samuli Larvala) for primitive trinomials of
degree 6972593 over GF(2), and points out some amusing consequences
of Moore's law.
A Fast Algorithm for Testing Reducibility of Trinomials mod 2
and Some New Primitive Trinomials of Degree 3021377,
with Richard Brent and Samuli Larvala, Mathematics of Computation,
volume 72, number 243, pages 1443-1452, 2003.
A preliminary version appeared as
Report
PRG TR-13-00, Oxford
University Computing Laboratory, December 2000.
The standard algorithm for testing irreducibility of a trinomial of prime degree r over GF(2) requires 2r + O(1) bits of memory and of order r^2
bit-operations. We describe an algorithm which requires only 3r/2 + O(1) bits of memory and fewer bit-operations than the standard algorithm.
Using the algorithm, we have found several new irreducible trinomials of high degree.
If r is a Mersenne exponent (i.e. 2^r - 1 is a Mersenne prime), then an irreducible trinomial of degree r is necessarily primitive and can be used
to give a pseudo-random number generator with period at least 2^r - 1. We give examples of primitive trinomials for r = 756839, 859433, and
3021377. The results for r = 859433 extend and correct some computations of Kumada et al [Mathematics of Computation 69 (2000),
811-814]. The two results for r = 3021377 are primitive trinomials of the highest known degree.
We propose a new algorithm to find worst cases for correct rounding
of an analytic function.
We first reduce this problem to the real small value problem
--- i.e. for polynomials with real coefficients.
Then we show that this second problem can be solved efficiently,
by extending Coppersmith's work
on the integer small value problem
--- for polynomials with integer coefficients ---
using lattice reduction
[Coppersmith96a,Coppersmith96b,Coppersmith01].
For floating-point numbers with a mantissa less than $N$,
and a polynomial approximation of degree $d$,
our algorithm finds all worst cases at distance $< N^{\frac{-d^2}{2d+1}}$
from a machine number in time $O(N^{\frac{d+1}{2d+1}+\varepsilon})$.
For $d=2$, this improves on the $O(N^{2/3+\varepsilon})$ complexity
from Lef\`evre's algorithm [Lefevre00,LeMu01] to $O(N^{3/5+\varepsilon})$.
We exhibit some new worst cases found using our algorithm,
for double-extended and quadruple precision.
For larger $d$, our algorithm can be used to check that there exist no
worst cases at distance $< N^{-k}$ in time $O(N^{\frac{1}{2}+O(\frac{1}{k})})$.
Symbolic computation and computer algebra systems are usually known to be
either very slow or memory-expensive.
However, some specific symbolic computation problems have received new
algorithmic solutions in recent years,
which made it possible to push further the limits of what is doable within a
reasonable amount of time and space.
Some noticeable examples are polynomial factorisation,
lattice reduction, Groebner basis computation.
We will present a few such algorithms, together with a
state of the art of which problems computer algebra systems can
(or cannot) solve, and for each problem what the current frontiers are.
We present a formal proof (at the implementation level) of an efficient
algorithm proposed by Paul Zimmermann
[Zimmermann00]
to compute square roots
of arbitrarily large integers.
This program, which is part of the GNU Multiple Precision
Arithmetic Library (GMP), is completely proven within the
Coq system. Proofs are developed using the Correctness tool
to deal with imperative features of the program.
The formalization is rather large (more than 13000 lines) and requires
some advanced techniques for proof management and reuse.
Note: Vincent Lefèvre found a potential problem in the GMP implementation,
which is fixed by the following patch. This does not
contradict our proof: the problem is due to the different C data types
(signed or not, different widths), whereas our proof assumed a unique type.
Aliquot Sequence 3630 Ends After Reaching 100 Digits, with M. Benito,
W. Creyaufmüller and J. L. Varona, Experimental Mathematics, volume 11,
number 2, pages 201-206.
In this paper we present a new computational record: the aliquot sequence
starting at 3630 converges to 1 after reaching a hundred decimal digits.
Also, we show the current status of all the aliquot sequences starting with a
number smaller than 10,000; we have reached at least 95 digits for all of them.
In particular, we have reached at least 112 digits for the so-called
"Lehmer five sequences," and 101 digits for the "Godwin twelve sequences."
Finally, we give a summary showing the number of aliquot sequences of unknown
end starting with a number less than or equal to 10^6.
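The iteration itself is elementary -- the hard part, at a hundred digits, is factoring each term in order to evaluate sigma. A Python sketch with trial division (names are ours):

```python
def sigma(n):
    """Sum of all divisors of n, by trial division."""
    s, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            s += d + (n // d if d * d != n else 0)
        d += 1
    return s

def aliquot(n, steps=10):
    """The aliquot sequence n, s(n), s(s(n)), ... with s(n) = sigma(n) - n."""
    seq = [n]
    for _ in range(steps):
        if n <= 1:
            break
        n = sigma(n) - n
        seq.append(n)
    return seq

print(aliquot(3630, 5))   # [3630, 5946, ...]
```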
Note added 29 July 2008: in this paper, we say (page 203, middle of right
column) "It is curious to note that the driver 2^9 * 3 * 11 * 31
has appeared in no place in any of the sequences given in Table 1";
Clifford Stern notes that this driver appears at index 215 of sequence 165744,
which gives 2^9 * 3 * 7 * 11 * 31^2 * 37 * 10594304241173.
This document presents my research contributions from 1988 to 2001,
performed first at INRIA Rocquencourt within the Algo project (1988 to 1992),
then at INRIA Lorraine and LORIA within the projects Euréca (1993-1997),
PolKA (1998-2000), and Spaces (2001).
Three main periods can be roughly
distinguished: from 1988 to 1992, when my research
focused on analysis of algorithms and random generation;
from 1993 to 1997, when I worked on computer algebra and related algorithms;
and finally from 1998 to 2001, when I was interested in arbitrary precision
floating-point arithmetic with well-defined semantics.
Arithmétique en précision
arbitraire, INRIA research report 4272,
September 2001, to appear in the journal
"Calculateurs parallèles" [in French].
This paper surveys the available algorithms for integer or
floating-point arbitrary precision calculations. After a brief
discussion about possible memory representations, known algorithms
for multiplication, division, square root, greatest common
divisor, input and output, are presented, together with their
complexity and usage. For each operation, we present the naïve
algorithm, the asymptotically optimal one, and also intermediate
«divide and conquer» algorithms, which are often very useful. For
floating-point computations, some general-purpose methods are
presented for algebraic, elementary, hypergeometric and special
functions.
Recently, van Hoeij published a new algorithm for factoring
polynomials over the rational integers. This algorithm rests
on the same principle as Berlekamp-Zassenhaus, but uses
lattice basis reduction to improve drastically on the
recombination phase. The efficiency of the LLL algorithm is very
dependent on fine tuning; in this paper, we present such tuning to
achieve better performance. Simultaneously, we describe a
generalization of van Hoeij's algorithm to factor polynomials over
number fields.
This paper gives new results for the isolation of real roots of a univariate
polynomial using Descartes' rule of signs, following work of Vincent,
Uspensky, Collins and Akritas, Johnson, Krandick.
The first contribution is a generic algorithm which enables one to describe
all the existing strategies in a unified framework.
Using that framework, a new algorithm is presented, which is optimal
in terms of memory usage, while doing no more computations than other
algorithms based on Descartes' rule of signs.
We show that these critical optimizations have important consequences
by proposing a full efficient solution for isolating the real roots
of zero-dimensional polynomial systems.
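The basic tool all these strategies share is the sign-variation count; a minimal Python illustration of Descartes' rule (the algorithms in the paper apply it to transformed polynomials to isolate roots in intervals):

```python
def sign_variations(coeffs):
    """Number of sign changes in a coefficient sequence; by Descartes'
    rule of signs it bounds the number of positive real roots, and has
    the same parity as that number."""
    nz = [c for c in coeffs if c != 0]
    return sum(1 for u, v in zip(nz, nz[1:]) if (u > 0) != (v > 0))

# x^2 - 3x + 2 = (x-1)(x-2): two variations, two positive roots
print(sign_variations([1, -3, 2]))   # 2
```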
Density results on floating-point invertible numbers, with Guillaume Hanrot, Joël Rivat and Gérald Tenenbaum,
Theoretical Computer Science, volume 291, number 2, 2003, pages 135-141.
(The slides of a related talk I gave in January 2002 at the workshop
"Number Theory and Applications" in Luminy are
here.)
Let F_k denote the set of k-bit mantissa
floating-point (FP) numbers.
We prove a conjecture of J.-M. Muller according to which
the proportion of numbers in F_k with no FP-reciprocal (for rounding to the
nearest element) approaches
1/2 - (3/2) log(4/3), i.e. about 0.06847689, as
k goes to infinity.
We investigate a similar question for the inverse square root.
This short note gives a detailed correctness proof of
fast (i.e. subquadratic) versions of the
GNU MP
mpn_bz_divrem_n and mpn_sqrtrem functions, together with
complete GMP code.
The mpn_bz_divrem_n function divides (with remainder)
a number of 2n limbs by a divisor of n limbs in time 2K(n), where K(n) is the
time spent in an (n times n) multiplication, using the
Moenck-Borodin-Jebelean-Burnikel-Ziegler algorithm.
The mpn_sqrtrem function computes the square root and the remainder
of a number of 2n limbs (square root and remainder have about n limbs each)
in time 3K(n)/2; it uses the Karatsuba square root algorithm.
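A Python sketch of the recursion behind the Karatsuba square root, with brute-force correction loops in place of the normalization argument that the note uses to bound the corrections:

```python
def sqrtrem(n):
    """Integer square root with remainder by the Karatsuba-square-root
    recursion: take the square root of the top half of n, then one
    division of half size fixes the low part."""
    if n < 16:
        s = 0
        while (s + 1) ** 2 <= n:
            s += 1
        return s, n - s * s
    l = n.bit_length() // 4
    a0 = n & ((1 << l) - 1)
    a1 = (n >> l) & ((1 << l) - 1)
    sp, rp = sqrtrem(n >> (2 * l))             # root of the top half
    q, u = divmod((rp << l) + a1, 2 * sp)      # one half-size division
    s, r = (sp << l) + q, (u << l) + a0 - q * q
    while r < 0:                               # correction steps
        s -= 1
        r += 2 * s + 1
    while r > 2 * s:
        r -= 2 * s + 1
        s += 1
    return s, r

s, r = sqrtrem(10 ** 20)
print(s, r, s * s + r == 10 ** 20)   # 10000000000 0 True
```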
We present new algorithms for the inverse, quotient, or
square root of power series.
The key trick is a new algorithm --- RecursiveMiddleProduct or RMP
--- computing the n middle coefficients of a
(2n * n) product in essentially the same number of operations --- K(n)
--- as a full (n * n) product with Karatsuba's method.
This improves previous work of Mulders, Karp and Markstein,
Burnikel and Ziegler.
These results apply to series, polynomials, and multiple precision
floating-point numbers.
A Maple implementation is available here,
together with slides of a talk given at ENS Paris
in January 2004.
In this paper we describe ideas used to accelerate the Searching Phase
of the Berlekamp-Zassenhaus algorithm, the algorithm most widely used
for computing factorizations in Z[x]. Our ideas do not alter the
theoretical worst-case complexity, but they do have a significant effect in
practice: especially in those cases where the cost of the Searching
Phase completely dominates the rest of the algorithm. A complete
implementation of the ideas in this paper is publicly available in the
library NTL. We give timings of this implementation on
some difficult factorization problems.
On Sums of Seven Cubes,
with Francois Bertault and Olivier
Ramaré, Mathematics of Computation, volume 68, number 227,
pages 1303-1310, 1999.
Uniform Random Generation of
Decomposable Structures Using Floating-Point Arithmetic, with Alain
Denise, Theoretical Computer Science, volume 218, number 2, 219--232, 1999.
A preliminary version appeared as
INRIA Research Report 3242, September 1997.
The recursive method formalized by Nijenhuis and Wilf [NiWi78] and
systematized by Flajolet, Van Cutsem and Zimmermann [FlZiVa94] is extended
here to floating-point arithmetic. The resulting ADZ method enables one to
generate decomposable data structures --- labelled or unlabelled ---
uniformly at random, in expected O(n^{1+\epsilon}) time and space, after a
preprocessing phase of O(n^{2+\epsilon}) time, which reduces to
O(n^{1+\epsilon}) for context-free grammars.
We study here a quantity related to the number of walks with North and East steps staying under the line of slope d starting from the origin. We give an asymptotic analysis of this quantity with respect to both the width n and the slope d, answering a question asked by Bernard Mourrain.
Cinq algorithmes de calcul
symbolique,
lecture notes for a specialization module of the DEA in computer science
at the Université Henri Poincaré Nancy 1, 1997.
These are lecture notes of a course entitled «Some computer algebra
algorithms» given by the author at the University of Nancy 1 in 1997. Five
fundamental algorithms used by computer algebra systems are briefly described:
Gosper's algorithm for computing indefinite sums, Zeilberger's algorithm for
definite sums, Berlekamp's algorithm for factoring polynomials over finite
fields, Zassenhaus' algorithm for factoring polynomials with integer
coefficients, and Lenstra's integer factorization algorithm using elliptic
curves. All these algorithms were implemented --- or improved --- by the
author in the computer algebra system MuPAD.
MuPAD is a general purpose computer algebra system with two programming concepts for parallel processing: micro-parallelism for shared-memory machines and macro-parallelism for distributed architectures. This article describes language instructions for both concepts, the current state of implementation, together with some examples.
Polynomial Factorization Challenges,
with L. Bernardin and M. Monagan, poster presented at
the International Symposium on Symbolic and Algebraic Computation
(ISSAC), July 1996, 4 pages.
We describe the GFUN package which contains functions for manipulating sequences, linear recurrences or differential equations and generating functions of various types. This document is intended both as an elementary introduction to the subject and as a reference manual for the package.
A Calculus of Random
Generation, with Philippe Flajolet and Bernard Van Cutsem, Proceedings
of European Symposium on Algorithms (ESA'93), LNCS 726, pages 169-180, 1993.
This report describes the algorithm used by the epelle program, together with its implementation in the C language. This program is able to check about 30,000 words per second on modern computers, without any error, contrary to the Unix spell program, which uses hashing methods and can thus accept wrong words. The main principle of epelle is to use digital trees (also called dictionary trees), which in addition reduce the space needed to store the list of words (by a factor of about 5 for the French dictionary). Creating a new digital tree for the French language (about 240,000 words) takes only a dozen seconds. The same program is directly usable for other languages and more generally for any list of alphanumeric keys.
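The data structure in question is the classical trie; a minimal Python version (not the C implementation of epelle) showing why lookups are exact and prefixes are shared:

```python
def build_trie(words):
    """Digital tree: nested dictionaries, one level per letter; common
    prefixes are stored once, which also compresses the word list."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True                 # end-of-word marker
    return root

def contains(trie, w):
    node = trie
    for ch in w:
        if ch not in node:
            return False                 # exact rejection, no hashing
        node = node[ch]
    return "$" in node

t = build_trie(["chat", "cheval", "chien"])
print(contains(t, "chat"), contains(t, "cha"))   # True False
```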