Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward - Mailing list pgsql-hackers

From Chao Li
Subject Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward
Date
Msg-id CBF70398-279C-4BA7-9FA1-2879A49F4EC9@gmail.com
Whole thread Raw
In response to Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [PATCH v1] parallel pg_restore: avoid disk seeks when jumping short distance forward
Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward
List pgsql-hackers


On Oct 12, 2025, at 09:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:

While playing around with the test cases for pg_dump compression,
I was startled to discover that the performance of compress_lz4's
"stream API" code is absolutely abysmal.  Here is a simple test
case to demonstrate, using the regression database as test data:

$ pg_dump -Fd --compress=lz4 -f rlz4.dir regression
$ time pg_restore -f /dev/null rlz4.dir

real    0m0.023s
user    0m0.017s
sys     0m0.006s

So far so good, but now let's compress the toc.dat file:

$ lz4 -f -m --rm rlz4.dir/toc.dat
$ time pg_restore -f /dev/null rlz4.dir

real    0m1.335s
user    0m1.326s
sys     0m0.008s

Considering that lz4 prides itself on fast decompression speed,
that is not a sane result.  Decompressing the file only requires
a couple ms on my machine:

$ time lz4cat rlz4.dir/toc.dat.lz4 >/dev/null

real    0m0.002s
user    0m0.000s
sys     0m0.002s

So on this example, pg_restore is something more than 600x slower
to read the TOC data than it ought to be.


I tested the patch on my MacBook M4 (Sequoia 15.6.1), compiled with -O2 optimization:

- For LZ4 performance improvement:

Without the fix (latest master):
```
chaol@ChaodeMacBook-Air tmp % pg_dump -Fd --compress=lz4 -f rlz4.dir regression

chaol@ChaodeMacBook-Air tmp % time pg_restore -f /dev/null rlz4.dir
pg_restore -f /dev/null rlz4.dir  0.03s user 0.03s system 11% cpu 0.463 total

chaol@ChaodeMacBook-Air tmp % lz4 -f -m --rm rlz4.dir/toc.dat

# It took 1.59s to restore from compressed data, much slower than from uncompressed data.
chaol@ChaodeMacBook-Air tmp % time pg_restore -f /dev/null rlz4.dir
pg_restore -f /dev/null rlz4.dir  1.59s user 0.02s system 97% cpu 1.653 total
```

With the fix:
```
chaol@ChaodeMacBook-Air tmp % pg_dump -Fd --compress=lz4 -f rlz4.dir regression

chaol@ChaodeMacBook-Air tmp % time pg_restore -f /dev/null rlz4.dir
pg_restore -f /dev/null rlz4.dir  0.02s user 0.03s system 16% cpu 0.305 total

chaol@ChaodeMacBook-Air tmp % lz4 -f -m --rm rlz4.dir/toc.dat

# Much faster !!!
chaol@ChaodeMacBook-Air tmp % time pg_restore -f /dev/null rlz4.dir
pg_restore -f /dev/null rlz4.dir  0.02s user 0.02s system 93% cpu 0.043 total
```

- For Gzip zero read bug:

Without the fix:
```
chaol@ChaodeMacBook-Air tmp % pg_dump -Fd --compress=gzip -f rgzip.dir regression
chaol@ChaodeMacBook-Air tmp % gzip rgzip.dir/toc.dat

# Failed
chaol@ChaodeMacBook-Air tmp % pg_restore -f /dev/null rgzip.dir
pg_restore: error: could not read from input file:
```

With the fix:
```
chaol@ChaodeMacBook-Air tmp % pg_dump -Fd --compress=gzip -f rgzip.dir regression
chaol@ChaodeMacBook-Air tmp % gzip rgzip.dir/toc.dat

# Ran successfully 
chaol@ChaodeMacBook-Air tmp % pg_restore -f /dev/null rgzip.dir
```

I also reviewed the code change, basically LGTM. Just a couple of trivial comments:

1 - 0001
In WriteDataToArchiveLZ4()

```
+ required = LZ4F_compressBound(chunk, &state->prefs);
+ if (required > state->buflen - state->bufdata)
+ {
+ cs->writeF(AH, state->buffer, state->bufdata);
+ state->bufdata = 0;
+ }
```

And in EndCompressorLZ4()
```
+ required = LZ4F_compressBound(0, &state->prefs);
+ if (required > state->buflen - state->bufdata)
+ {
+ cs->writeF(AH, state->buffer, state->bufdata);
+ state->bufdata = 0;
+ }
```

These two code pieces are similar, only difference is the first parameter passed to LZ4F_compressBound().

I think we can create an inline function for it. But these are just 5 lines, I don’t have a strong option on that.

Same thing for the other code pieces in LZ4Stream_write() and LZ4Stream_close().

2 - 0003
```
 /*
  * LZ4 equivalent to feof() or gzeof().  Return true iff there is no
- * decompressed output in the overflow buffer and the end of the backing file
```

This doesn’t belong to the current patch. But “iff” seems a typo of “if”. You may fix it as you are touching this piece of code.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/




pgsql-hackers by date:

Previous
From: Xuneng Zhou
Date:
Subject: Re: pgstattuple: Use streaming read API in pgstatindex functions
Next
From: "David G. Johnston"
Date:
Subject: Re: [PATCH v1] parallel pg_restore: avoid disk seeks when jumping short distance forward