Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward - Mailing list pgsql-hackers

From Chao Li
Subject Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward
Date
Msg-id 09C1DFFE-6078-4C8E-B38D-A4D5CE087433@gmail.com
In response to Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers


On Oct 14, 2025, at 10:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> This won't show the effect, because pg_dump will be able to go back
> and insert data offsets into the dump's TOC, so pg_restore can just
> seek to where the data is.  See upthread discussion about what's
> needed to provoke Dimitrios' problem.
>
> I tried this very tiny (relatively speaking) test case:
>
> regression=# create database d1;
> CREATE DATABASE
> regression=# \c d1
> You are now connected to database "d1" as user "postgres".
> d1=# create table alpha as select repeat(random()::text, 1000) from generate_series(1,1000000);
> SELECT 1000000
> d1=# create table omega as select 42 as x;
> SELECT 1
> d1=# \q
>
> Then
>
> $ pg_dump -Fc d1 | cat >d1.dump
> $ time pg_restore -f /dev/null -t omega d1.dump
>
> The point of the pipe-to-cat is to reproduce Dimitrios' problem case
> with no data offsets in the TOC.  Then the restore is doing about the
> simplest thing I can think of to make it skip over most of the archive
> file.  Also, I'm intentionally using the default choice of gzip
> because that already responds to DEFAULT_IO_BUFFER_SIZE properly.
> (This test is with current HEAD, no patches except adjusting
> DEFAULT_IO_BUFFER_SIZE.)
>
> I got these timings:
>
> DEFAULT_IO_BUFFER_SIZE = 1K
> real    0m0.020s
> user    0m0.002s
> sys     0m0.017s
>
> DEFAULT_IO_BUFFER_SIZE = 4K
> real    0m0.014s
> user    0m0.003s
> sys     0m0.011s
>
> DEFAULT_IO_BUFFER_SIZE = 128K
> real    0m0.002s
> user    0m0.000s
> sys     0m0.002s
>
> This test case has only about 50MB worth of compressed data,
> so of course the times are very small; scaling it up to
> gigabytes would yield more impressive results.  But the
> effect is clearly visible.


With your example, I can now see the difference; however, I had to create five times as many rows in the first table:

```
evantest=# CREATE TABLE alpha AS SELECT repeat(random()::text, 1000) FROM generate_series(1, 5000000);
SELECT 5000000
evantest=#
evantest=# CREATE TABLE omega AS SELECT 42 AS x;
SELECT 1
```

My test is with the patch applied; I only adjusted DEFAULT_IO_BUFFER_SIZE between the two runs.
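
For reference, the buffer-size constant is the only source change between the runs below; a minimal sketch, assuming it is still the DEFAULT_IO_BUFFER_SIZE define in src/bin/pg_dump/compress_io.h (path from my checkout, adjust as needed), followed by a rebuild of pg_dump and pg_restore:

```
/*
 * Hedged sketch: the only change between the 4K and 128K runs below is this
 * define (header location assumed to be src/bin/pg_dump/compress_io.h).
 */
#define DEFAULT_IO_BUFFER_SIZE (128 * 1024)    /* 4096 for the 4K run */
```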

DEFAULT_IO_BUFFER_SIZE=4K
```
% /usr/bin/time pg_dump -Fc evantest | cat > d1.dump
      294.83 real       220.28 user        45.90 sys

% /usr/bin/time pg_restore -f /dev/null -t omega d1.dump
        0.16 real         0.02 user         0.09 sys
```

DEFAULT_IO_BUFFER_SIZE=128K
```
% /usr/bin/time pg_dump -Fc evantest | cat > d1.dump
      296.89 real       220.85 user        46.64 sys

% /usr/bin/time pg_restore -f /dev/null -t omega d1.dump
        0.01 real         0.00 user         0.00 sys
```

With a bigger block size, pg_restore has fewer blocks to skip, so it gets faster.
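
To put rough numbers on that: with presumably around five times Tom's ~50 MB of compressed data to step over, 4 KB blocks mean on the order of 60,000 skips, while 128 KB blocks mean roughly 2,000. A minimal sketch of that skip path (my reading of it, not the actual pg_backup_custom.c code), assuming each data block carries a simple length prefix:

```
#include <stdio.h>

/*
 * Hedged sketch of how pg_restore gets past a data section it does not need
 * when the TOC carries no data offset: each block written at dump time
 * (DEFAULT_IO_BUFFER_SIZE-sized for gzip) must be stepped over individually,
 * so larger blocks mean fewer of these short forward jumps.
 */
void
skip_one_data_section(FILE *fp)
{
    size_t      block_len;

    /* assumed layout: a length prefix per block, zero length terminates */
    while (fread(&block_len, sizeof(block_len), 1, fp) == 1 && block_len > 0)
    {
        /*
         * One short forward seek per block; replacing these short jumps with
         * plain reads is what the patch in the subject line is about.
         */
        if (fseek(fp, (long) block_len, SEEK_CUR) != 0)
            break;
    }
}
```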

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
