>>>>> "PG" == PG Bug reporting form <noreply@postgresql.org> writes:
PG> Obviously it's not intended that gist__int_ops should actually work
PG> with data of this kind - that's what gist__intbig_ops is for. But
PG> it's not reasonable for it to crash rather than returning an error.
PG> I'm working on a patch.
And here it is.
This version doesn't restrict the sparseness of keys any more than it
needs to for safety. This necessitates improving the O(N^2) compression
algorithm to O(N), because otherwise it's possible to create datasets
that cause a single compression call to run for months (with no check
for interrupts, either).
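
To sketch the idea behind that (a standalone toy illustration, not the
actual contrib/intarray code; the names here are made up): a single
linear pass folds each run of consecutive values into a range, so any
quadratic merging that runs afterwards only ever sees the runs:

/* Illustration only; not the actual contrib/intarray code.  A single
 * O(N) pass over a sorted array folds each run of consecutive values
 * into a [lo, hi] range, so later pairwise merging works on the (much
 * smaller) number of runs instead of on individual items. */
#include <stdio.h>
#include <stdlib.h>

typedef struct { int lo; int hi; } Range;

/* Writes at most nitems ranges into out[]; returns how many. */
static int
collapse_runs(const int *items, int nitems, Range *out)
{
    int     nranges = 0;
    int     i = 0;

    while (i < nitems)
    {
        int     start = i;

        /* skip ahead over a run of consecutive values */
        while (i + 1 < nitems && items[i + 1] == items[i] + 1)
            i++;
        out[nranges].lo = items[start];
        out[nranges].hi = items[i];
        nranges++;
        i++;
    }
    return nranges;
}

int
main(void)
{
    int     data[] = {1, 2, 3, 4, 10, 11, 12, 50};
    int     n = (int) (sizeof(data) / sizeof(data[0]));
    Range  *runs = malloc(sizeof(Range) * n);
    int     nruns = collapse_runs(data, n, runs);

    for (int k = 0; k < nruns; k++)
        printf("[%d, %d]\n", runs[k].lo, runs[k].hi);
    free(runs);
    return 0;
}

On the problem inputs, which are dominated by long consecutive runs,
this leaves only a handful of ranges for the rest of the compression
logic to chew on, which is what brings the worst case down from months
to seconds.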
There are essentially 4 specific issues addressed:
1. Integer overflow in internal_size could result in memory corruption
in decompression, since a zero-length array would be allocated and
then written to (a sketch of the guard follows the list below).
2. Integer overflow in g_int_compress could cause pessimal merge
choices, resulting in unnecessarily large ranges (which would in turn
trigger issue 1 above).
3. Even without overflow, array sizes could become large enough to cause
unexplained memory allocation errors, so cap the sizes and report
actual errors pointing at gist__intbig_ops as needed.
4. Large inputs to the compression function always consist of large runs
of consecutive integers, and the compression loop was processing
these one at a time in an O(N^2) manner with a lot of overhead. The
expected runtime of this function could easily exceed 6 months as a
result. Performing a linear-time first pass reduces the worst case
to something on the order of seconds.
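
For issues 1 and 3, the shape of the guard looks roughly like this
(again a standalone sketch with invented names such as safe_size and
max_elems, not the patched internal_size): do the arithmetic in 64 bits
and refuse implausibly large results before they turn into an
allocation:

/* Illustration only; not the patched internal_size().  Summing range
 * widths in a 32-bit int can wrap around, so the computed size of the
 * decompressed array can come out as zero (or tiny), the allocation is
 * undersized, and the fill loop then writes past it.  Doing the sum in
 * 64 bits and rejecting implausibly large results avoids both the
 * wraparound and an absurd allocation; the caller can then raise an
 * error suggesting gist__intbig_ops instead. */
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { int lo; int hi; } Range;

/* Total number of elements covered by the ranges, or -1 if that
 * exceeds max_elems (the cap is a made-up parameter here). */
static int64_t
safe_size(const Range *r, int nranges, int64_t max_elems)
{
    int64_t total = 0;

    for (int i = 0; i < nranges; i++)
    {
        total += (int64_t) r[i].hi - (int64_t) r[i].lo + 1;
        if (total > max_elems)
            return -1;
    }
    return total;
}

int
main(void)
{
    /* one enormous range: a plain int accumulator would overflow */
    Range   r[] = {{INT_MIN, INT_MAX}};
    int64_t sz = safe_size(r, 1, 1000000);

    if (sz < 0)
        printf("input too sparse for gist__int_ops\n");
    else
        printf("decompressed size = %lld\n", (long long) sz);
    return 0;
}

The point here is only that the sum is done in a width that can't wrap
and is capped before it becomes an allocation.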
--
Andrew (irc:RhodiumToad)