Re: [PATCH] pg_dump: lock tables in batches - Mailing list pgsql-hackers

From Fabrízio de Royes Mello
Subject Re: [PATCH] pg_dump: lock tables in batches
Msg-id CAFcNs+rwNf7oUYg6T76_in8a8Ct5oHP7MspG3JO83fJHQ314fA@mail.gmail.com
In response to Re: [PATCH] pg_dump: lock tables in batches  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers


On Wed, Dec 7, 2022 at 2:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andres Freund <andres@anarazel.de> writes:
> > On 2022-12-07 10:44:33 -0500, Tom Lane wrote:
> >> I have a strong sense of deja vu here.  I'm pretty sure I experimented
> >> with this idea last year and gave up on it.  I don't recall exactly
> >> why, but either it didn't show any meaningful performance improvement
> >> for me or there was some actual downside (that I'm not remembering
> >> right now).
>
> > IIRC the case we were looking at around 989596152 were CPU bound workloads,
> > rather than latency bound workloads. It'd not be surprising to have cases
> > where batching LOCKs helps latency, but not CPU bound.
>
> Yeah, perhaps.  Anyway my main point is that I don't want to just assume
> this is a win; I want to see some actual performance tests.
>

Here are some numbers for Aleksander's patch:

1) Setup script

CREATE DATABASE t1000;
CREATE DATABASE t10000;
CREATE DATABASE t100000;

\c t1000
SELECT format('CREATE TABLE t%s(c1 INTEGER PRIMARY KEY, c2 TEXT, c3 TIMESTAMPTZ);', i) FROM generate_series(1, 1000) AS i \gexec

\c t10000
SELECT format('CREATE TABLE t%s(c1 INTEGER PRIMARY KEY, c2 TEXT, c3 TIMESTAMPTZ);', i) FROM generate_series(1, 10000) AS i \gexec

\c t100000
SELECT format('CREATE TABLE t%s(c1 INTEGER PRIMARY KEY, c2 TEXT, c3 TIMESTAMPTZ);', i) FROM generate_series(1, 100000) AS i \gexec

2) Execution script

time pg_dump -s t1000 > /dev/null
time pg_dump -s t10000 > /dev/null
time pg_dump -s t100000 > /dev/null

3) HEAD execution

$ time pg_dump -s t1000 > /dev/null
0.02user 0.01system 0:00.36elapsed 8%CPU (0avgtext+0avgdata 11680maxresident)k
0inputs+0outputs (0major+1883minor)pagefaults 0swaps

$ time pg_dump -s t10000 > /dev/null
0.30user 0.10system 0:05.04elapsed 8%CPU (0avgtext+0avgdata 57772maxresident)k
0inputs+0outputs (0major+14042minor)pagefaults 0swaps

$ time pg_dump -s t100000 > /dev/null
3.42user 2.13system 7:50.09elapsed 1%CPU (0avgtext+0avgdata 517900maxresident)k
0inputs+0outputs (0major+134636minor)pagefaults 0swaps

4) PATCH execution

$ time pg_dump -s t1000 > /dev/null
0.02user 0.00system 0:00.28elapsed 9%CPU (0avgtext+0avgdata 11700maxresident)k
0inputs+0outputs (0major+1886minor)pagefaults 0swaps

$ time pg_dump -s t10000 > /dev/null
0.18user 0.03system 0:02.17elapsed 10%CPU (0avgtext+0avgdata 57592maxresident)k
0inputs+0outputs (0major+14072minor)pagefaults 0swaps

$ time pg_dump -s t100000 > /dev/null
1.97user 0.32system 0:21.39elapsed 10%CPU (0avgtext+0avgdata 517932maxresident)k
0inputs+0outputs (0major+134892minor)pagefaults 0swaps

5) Summary

             HEAD     patch
  1k tables  0:00.36  0:00.28
 10k tables  0:05.04  0:02.17
100k tables  7:50.09  0:21.39

It seems we get a very good performance gain with Aleksander's patch. I used "-s" to avoid wasting time issuing COPY for each relation (even though they are all empty) and to make the difference caused by the per-table LOCK TABLE round trips evident. This patch should also improve pg_upgrade runtime on databases with thousands of relations.
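
To illustrate where the gain comes from (a sketch of the general idea only, not necessarily how the patch implements it): pg_dump takes an ACCESS SHARE lock on every table it is going to dump, and on HEAD each lock is a separate statement and therefore a separate client/server round trip. Batching lets several relations be locked per round trip, for example:

  -- One statement (one round trip) per relation, roughly what HEAD does:
  LOCK TABLE public.t1 IN ACCESS SHARE MODE;
  LOCK TABLE public.t2 IN ACCESS SHARE MODE;
  -- ... repeated for every table being dumped

  -- Batched: LOCK TABLE accepts a list of relations, so many tables can be
  -- locked in one statement and one round trip (batch size is illustrative):
  LOCK TABLE public.t1, public.t2, public.t3, public.t4 IN ACCESS SHARE MODE;

Whether the locks are grouped into a single multi-table LOCK TABLE or several statements sent together, the effect is the same: the number of round trips no longer grows one-to-one with the number of tables, which is what dominates the t100000 case above.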

Regards,

--
Fabrízio de Royes Mello
