Re: Add pg_walinspect function with block info columns - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Add pg_walinspect function with block info columns
Date
Msg-id CAH2-Wzm9shOkEDM10_+qOZkRSQhKVxwBFiehH6EHWQQRd_rDPw@mail.gmail.com
Whole thread Raw
In response to Re: Add pg_walinspect function with block info columns  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Add pg_walinspect function with block info columns  (Peter Geoghegan <pg@bowt.ie>)
Re: Add pg_walinspect function with block info columns  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Wed, Mar 29, 2023 at 8:28 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> I took a look at v9 and LGTM.

Pushed, thanks.

There is still an outstanding question around the overhead of
outputting FPIs and even block data from pg_get_wal_block_info(). At
one point Melanie suggested that we'd need to do something about that,
and I tend to agree. Attached patch provides an optional parameter
that will make pg_get_wal_block_info return NULLs for both block_data
and block_fpi_data, no matter whether or not there is something to
show. Note that this only affects those two bytea columns; we'll still
show everything else, including valid block_data_length and
block_fpi_length values (so the metadata describing the on-disk size
of block_data and block_fpi_data is unaffected).

To test this patch, I ran pgbench for about 5 minutes, using a fairly
standard configuration with added indexes and with wal_log_hints
enabled. I ended up with the following WAL records afterwards:

pg@regression:5432 [1402115]=# SELECT
  "resource_manager/record_type" t,
  pg_size_pretty(combined_size) s,
  fpi_size_percentage perc_fpi
FROM
  pg_get_wal_Stats ('0/10E9D80', 'FFFFFFFF/FFFFFFFF', FALSE) where
combined_size > 0;
┌─[ RECORD 1 ]──────────────────┐
│ t        │ XLOG               │
│ s        │ 1557 MB            │
│ perc_fpi │ 22.029466865781302 │
├─[ RECORD 2 ]──────────────────┤
│ t        │ Transaction        │
│ s        │ 49 MB              │
│ perc_fpi │ 0                  │
├─[ RECORD 3 ]──────────────────┤
│ t        │ Storage            │
│ s        │ 13 kB              │
│ perc_fpi │ 0                  │
├─[ RECORD 4 ]──────────────────┤
│ t        │ CLOG               │
│ s        │ 1380 bytes         │
│ perc_fpi │ 0                  │
├─[ RECORD 5 ]──────────────────┤
│ t        │ Database           │
│ s        │ 118 bytes          │
│ perc_fpi │ 0                  │
├─[ RECORD 6 ]──────────────────┤
│ t        │ RelMap             │
│ s        │ 565 bytes          │
│ perc_fpi │ 0                  │
├─[ RECORD 7 ]──────────────────┤
│ t        │ Standby            │
│ s        │ 30 kB              │
│ perc_fpi │ 0                  │
├─[ RECORD 8 ]──────────────────┤
│ t        │ Heap2              │
│ s        │ 4235 MB            │
│ perc_fpi │ 0.6731388657682449 │
├─[ RECORD 9 ]──────────────────┤
│ t        │ Heap               │
│ s        │ 4482 MB            │
│ perc_fpi │ 54.46811493602934  │
├─[ RECORD 10 ]─────────────────┤
│ t        │ Btree              │
│ s        │ 1786 MB            │
│ perc_fpi │ 22.829279332421116 │
└──────────┴────────────────────┘

Time: 3618.693 ms (00:03.619)

So about 12GB of WAL -- certainly enough to be a challenge for pg_walinspect.

I then ran the following query several times over the same LSN range
as before, with my patch applied, but with behavior equivalent to
current git HEAD (this is with outputting block_data and
block_fpi_data values still turned on):

pg@regression:5432 [1402115]=# SELECT
  count(*)
FROM
  pg_get_wal_block_info ('0/10E9D80', 'FFFFFFFF/FFFFFFFF', false);
┌─[ RECORD 1 ]───────┐
│ count │ 17,031,979 │
└───────┴────────────┘

Time: 35171.463 ms (00:35.171)

The time shown here is typical of what I saw.

And now the same query, but without any overhead for outputting
block_data and block_fpi_data values:

pg@regression:5432 [1402115]=# SELECT
  count(*)
FROM
  pg_get_wal_block_info ('0/10E9D80', 'FFFFFFFF/FFFFFFFF', true);
┌─[ RECORD 1 ]───────┐
│ count │ 17,031,979 │
└───────┴────────────┘

Time: 15235.499 ms (00:15.235)

This time is also typical of what I saw. The variance was fairly low,
so I won't bother describing it.

I think that this is a compelling reason to apply the patch. It would
be possible to get about 75% of the benefit shown here by just
suppressing block_fpi_data output, without suppressing block_data, but
I think that it makes sense to either suppress both or neither. Things
like page split records can write a fairly large amount of WAL in a
way that resembles an FPI, even though technically no FPI is involved.

If there are no objections, I'll move ahead with committing something
along the lines of this patch in the next couple of days.

-- 
Peter Geoghegan

Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Thoughts on using Text::Template for our autogenerated code?
Next
From: Peter Geoghegan
Date:
Subject: Re: Add pg_walinspect function with block info columns