Fix serious performance problems in LZ4Stream_read_internal.
I was distressed to find that reading an LZ4-compressed toc.dat
file was hundreds of times slower than it ought to be. On
investigation, the blame mostly affixes to LZ4Stream_read_overflow's
habit of memmove'ing all the remaining buffered data after each read
operation. Since reading a TOC file tends to involve a lot of small
(even one-byte) decompression calls, that amounts to an O(N^2) cost.
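To make the cost pattern concrete, here is a minimal sketch (not the actual
compress_lz4.c code; names like OverflowBuf, buf_read_slow, and buf_read_fast
are illustrative only) contrasting a buffer consumer that memmove's the
remainder on every call with one that just advances a read offset:

    #include <string.h>
    #include <stddef.h>

    typedef struct OverflowBuf
    {
        char   *data;   /* buffered decompressed bytes */
        size_t  used;   /* number of valid bytes in data */
        size_t  pos;    /* consumed prefix (offset-based variant only) */
    } OverflowBuf;

    /* O(N^2) pattern: each small read shifts the whole remainder down. */
    static size_t
    buf_read_slow(OverflowBuf *b, char *out, size_t len)
    {
        size_t  n = (len < b->used) ? len : b->used;

        memcpy(out, b->data, n);
        memmove(b->data, b->data + n, b->used - n); /* moves O(N) bytes per call */
        b->used -= n;
        return n;
    }

    /* Offset-based pattern: consume by advancing a position; nothing is
     * moved until the buffer is drained and cheaply reset (or refilled). */
    static size_t
    buf_read_fast(OverflowBuf *b, char *out, size_t len)
    {
        size_t  avail = b->used - b->pos;
        size_t  n = (len < avail) ? len : avail;

        memcpy(out, b->data + b->pos, n);
        b->pos += n;
        if (b->pos == b->used)
            b->pos = b->used = 0;   /* buffer fully consumed */
        return n;
    }

With N one-byte reads, the first variant touches O(N) bytes per call for
O(N^2) total work, while the second does O(1) bookkeeping per call.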
This could have been fixed with a minimal patch, but to my
eyes LZ4Stream_read_internal and LZ4Stream_read_overflow are
badly-written spaghetti code; in particular the eol_flag logic
is inefficient and duplicative. I chose to throw the code
away and rewrite from scratch. This version is about sixty
lines shorter as well as not having the performance issue.
Fortunately, AFAICT the only way to get to this problem is to
manually LZ4-compress the toc.dat and/or blobs.toc files within a
directory-style archive; in the main data files, we read blocks
that are large enough that the O(N^2) behavior doesn't manifest.
Few people do that, which likely explains the lack of field
complaints. Otherwise this performance bug might be considered
bad enough to warrant back-patching.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/1f8062dd9668572d66549fc798a7d2057aa34ee1
Modified Files
--------------
src/bin/pg_dump/compress_lz4.c | 242 +++++++++++++++--------------------------
1 file changed, 89 insertions(+), 153 deletions(-)