Home > mailing lists

Re: pg_dump performance issues - Mailing list pgsql-general

From	Adrian Klaver
Subject	Re: pg_dump performance issues
Date	February 22, 2024 17:17:18
Msg-id	fc88cb13-eb4d-4ef8-9ef8-4e23034d11bf@aklaver.com Whole thread Raw
In response to	Re: pg_dump performance issues (Dominique Devienne <ddevienne@gmail.com>)
List	pgsql-general

Tree view

On 2/22/24 06:40, Dominique Devienne wrote:
> On Thu, Feb 22, 2024 at 12:20 PM Dominique Devienne <ddevienne@gmail.com 
> <mailto:ddevienne@gmail.com>> wrote:
> 
>     On Thu, Feb 22, 2024 at 11:41 AM Francisco Olarte
>     <folarte@peoplecall.com <mailto:folarte@peoplecall.com>> wrote:
> 
>         Dominique:
>          > .... in the 10-12MB/s throughput range.....
> 
>         This has the faint smell of a saturated 100Mbps link in the
>         middle (12*8=96Mbps)
> 
> 
>     So indeed, I'm not on 1Gb/s as I expected... I'll report back when
>     this is fixed.
> 
> 
> So IT fixing the networking on my Windows machine is slower than I 
> expected...
> 
> Thus I switched to a Linux client, which I verified using iperf3 really 
> has 1Gbps.
> And there I can read at 100MB/s, which is almost saturating the network 
> link.
> The Linux dumps are 45MB this time, instead of 100MB on Windows, as if 
> the former is compressed, and the latter isn't.

Did you use the same pg_dump command on the Windows client?

Where was the Windows client Postgres software installed from?

Where was the Linux client Postgres software installed from?

> Perhaps the (implicit?) compression of the dump is why it's 8x slower 
> than mine? (I do fewer catalog queries I suspect)

 From

https://www.postgresql.org/docs/current/app-pgdump.html

d
directory

  <...>
  This format is compressed by default using gzip and also supports 
parallel dumps.

> And I see no speedup from parallelizing, unfortunately, which I find 
> strange, as mentioned earlier. Any idea why?

 From same link as above

-j njobs
--jobs=njobs

Run the dump in parallel by dumping njobs tables simultaneously. This 
option may reduce the time needed to perform the dump but it also 
increases the load on the database server. You can only use this option 
with the directory output format because this is the only output format 
where multiple processes can write their data at the same time.

<...>


I'm guessing on this small a database the overhead of -j is greater then 
the benefit derived.


> Note that this is against the v12 server, and the client is still 16.1.
> 
> Applying a two-threaded producer-consumer scheme to my dump, to 
> parallelize reads and writes, I might approach 100MB/s overall.
> Before getting into parallelizing reads, which complicate matters, and 
> my dump container's API is serial anyway (it's SQLite...)
> 
> So looks like I'm networking bound on 1Gbps, while Marc perhaps reached 
> PostgreSQL's (or the disks'?) limits before saturating his 10Gbps 
> networking.
> 
> Thanks, --DD
> 
> [ddevienne]$ my_dump ...
> COPY:         25x (   88,073 rows,     56,768,104 bytes) in   0.537s 
> (100.7 MB/s)
> DUMP'd 88,023 rows (56,761,583 bytes) from 25 tables in 1.062s (51.0 MB/s)
> 
> [ddevienne]$ ll ....db
> -rw-r--r-- 1 ddevienne users 58,351,616 Feb 22 15:10 ....db
> 
> [ddevienne]$ time $PDGM_POSTGRESQL_DIR/bin/pg_dump --role="..." -Fc -n 
> '"..."' -f ...-Fc.dump "host=acme dbname=copyperf"
> 7.561u 0.216s 0:08.46 91.8%     0+0k 0+91056io 0pf+0w
> [ddevienne]$ time $PDGM_POSTGRESQL_DIR/bin/pg_dump --role="..." -Fd -n 
> '"..."' -f ...-Fd.dump "host=acme dbname=copyperf"
> 7.351u 0.219s 0:08.33 90.7%     0+0k 0+90808io 0pf+0w
> [ddevienne]$ time $PDGM_POSTGRESQL_DIR/bin/pg_dump --role="..." -Fd -j2 
> -n '"..."' -f ...-Fdj2.dump "host=acme dbname=copyperf"
> 7.667u 0.228s 0:08.56 92.0%     0+0k 0+90808io 0pf+0w
> [ddevienne]$ time $PDGM_POSTGRESQL_DIR/bin/pg_dump --role="..." -Fd -j3 
> -n '"..."' -f ...-Fdj3.dump "host=acme dbname=copyperf"
> 7.964u 0.247s 0:08.71 94.1%     0+0k 0+90808io 0pf+0w
> [ddevienne]$ time $PDGM_POSTGRESQL_DIR/bin/pg_dump --role="..." -Fd -j4 
> -n '"..."' -f ...-Fdj4.dump "host=acme dbname=copyperf"
> 7.491u 0.243s 0:08.21 94.1%     0+0k 0+90808io 0pf+0w
> [ddevienne]$ du -sm ...-F*
> 45      ...-Fc.dump
> 45      ...-Fd.dump
> 45      ...-Fdj2.dump
> 45      ...-Fdj3.dump
> 45      ...-Fdj4.dump

-- 
Adrian Klaver
adrian.klaver@aklaver.com

pgsql-general by date:

From: Siddharth Jain
Date: 22 February 2024, 17:16:17
Subject: Re: How do you optimize the disk IO when you cannot assume a file will start at a boundary then?

From: Moreno Andreo
Date: 22 February 2024, 17:25:42
Subject: Re: [SPAM] Re: Partial table duplication via triggger

Re: pg_dump performance issues - Mailing list pgsql-general

Previous

Next