Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward - Mailing list pgsql-hackers

From Chao Li
Subject Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward
Date
Msg-id 09C1DFFE-6078-4C8E-B38D-A4D5CE087433@gmail.com
In response to Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers


On Oct 14, 2025, at 10:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> This won't show the effect, because pg_dump will be able to go back
> and insert data offsets into the dump's TOC, so pg_restore can just
> seek to where the data is.  See upthread discussion about what's
> needed to provoke Dimitrios' problem.
>
> I tried this very tiny (relatively speaking) test case:
>
> regression=# create database d1;
> CREATE DATABASE
> regression=# \c d1
> You are now connected to database "d1" as user "postgres".
> d1=# create table alpha as select repeat(random()::text, 1000) from generate_series(1,1000000);
> SELECT 1000000
> d1=# create table omega as select 42 as x;
> SELECT 1
> d1=# \q
>
> Then
>
> $ pg_dump -Fc d1 | cat >d1.dump
> $ time pg_restore -f /dev/null -t omega d1.dump
>
> The point of the pipe-to-cat is to reproduce Dimitrios' problem case
> with no data offsets in the TOC.  Then the restore is doing about the
> simplest thing I can think of to make it skip over most of the archive
> file.  Also, I'm intentionally using the default choice of gzip
> because that already responds to DEFAULT_IO_BUFFER_SIZE properly.
> (This test is with current HEAD, no patches except adjusting
> DEFAULT_IO_BUFFER_SIZE.)
>
> I got these timings:
>
> DEFAULT_IO_BUFFER_SIZE = 1K
> real    0m0.020s
> user    0m0.002s
> sys     0m0.017s
>
> DEFAULT_IO_BUFFER_SIZE = 4K
> real    0m0.014s
> user    0m0.003s
> sys     0m0.011s
>
> DEFAULT_IO_BUFFER_SIZE = 128K
> real    0m0.002s
> user    0m0.000s
> sys     0m0.002s
>
> This test case has only about 50MB worth of compressed data,
> so of course the times are very small; scaling it up to
> gigabytes would yield more impressive results.  But the
> effect is clearly visible.


With your example, I can now see the difference; however, I had to create five times as many rows in the first table:

```
evantest=# CREATE TABLE alpha AS SELECT repeat(random()::text, 1000) FROM generate_series(1, 5000000);
SELECT 5000000
evantest=#
evantest=# CREATE TABLE omega AS SELECT 42 AS x;
SELECT 1
```

My test is with the patch applied; I only adjusted DEFAULT_IO_BUFFER_SIZE between the two runs.
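
For reference, the buffer-size constant is the only source change between the runs below; a minimal sketch, assuming it is still the DEFAULT_IO_BUFFER_SIZE define in src/bin/pg_dump/compress_io.h (path from my checkout, adjust as needed), followed by a rebuild of pg_dump and pg_restore:

```
/*
 * Hedged sketch: the only change between the 4K and 128K runs below is this
 * define (header location assumed to be src/bin/pg_dump/compress_io.h).
 */
#define DEFAULT_IO_BUFFER_SIZE (128 * 1024)    /* 4096 for the 4K run */
```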

DEFAULT_IO_BUFFER_SIZE=4K
```
% /usr/bin/time pg_dump -Fc evantest | cat > d1.dump
      294.83 real       220.28 user        45.90 sys

% /usr/bin/time pg_restore -f /dev/null -t omega d1.dump
        0.16 real         0.02 user         0.09 sys
```

DEFAULT_IO_BUFFER_SIZE=128K
```
% /usr/bin/time pg_dump -Fc evantest | cat > d1.dump
      296.89 real       220.85 user        46.64 sys

% /usr/bin/time pg_restore -f /dev/null -t omega d1.dump
        0.01 real         0.00 user         0.00 sys
```

With a bigger block size, pg_restore has fewer blocks to skip, so it gets faster.
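
To put rough numbers on that: with presumably around five times Tom's ~50 MB of compressed data to step over, 4 KB blocks mean on the order of 60,000 skips, while 128 KB blocks mean roughly 2,000. A minimal sketch of that skip path (my reading of it, not the actual pg_backup_custom.c code), assuming each data block carries a simple length prefix:

```
#include <stdio.h>

/*
 * Hedged sketch of how pg_restore gets past a data section it does not need
 * when the TOC carries no data offset: each block written at dump time
 * (DEFAULT_IO_BUFFER_SIZE-sized for gzip) must be stepped over individually,
 * so larger blocks mean fewer of these short forward jumps.
 */
void
skip_one_data_section(FILE *fp)
{
    size_t      block_len;

    /* assumed layout: a length prefix per block, zero length terminates */
    while (fread(&block_len, sizeof(block_len), 1, fp) == 1 && block_len > 0)
    {
        /*
         * One short forward seek per block; replacing these short jumps with
         * plain reads is what the patch in the subject line is about.
         */
        if (fseek(fp, (long) block_len, SEEK_CUR) != 0)
            break;
    }
}
```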

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
