Thread: pg_wal_summary_contents() and pg_walsummary may return different results on the same WAL summary file

Hi,

I found that pg_wal_summary_contents() may miss some results that pg_walsummary returns for the same WAL summary file.
Hereare the steps to reproduce the issue:
 

-----------------------------
initdb -D data
echo "summarize_wal = on" >> data/postgresql.conf
pg_ctl -D data start
psql <<EOF
CREATE TABLE t AS SELECT n i, n j FROM generate_series(1, 1000) n;
DELETE FROM t;
CHECKPOINT;
VACUUM t;
CHECKPOINT;
SELECT foo.* FROM (SELECT * FROM pg_available_wal_summaries() ORDER BY start_lsn DESC LIMIT 1) JOIN LATERAL
pg_wal_summary_contents(tli,start_lsn, end_lsn) foo ON true;
 
EOF
pg_walsummary -i data/pg_wal/summaries/$(ls -1 data/pg_wal/summaries/ | tail -1)
-----------------------------

In my test, pg_walsummary returned three records:

TS 1663, DB 5, REL 1259, FORK main: block 0
TS 1663, DB 5, REL 16384, FORK main: limit 0
TS 1663, DB 5, REL 16384, FORK vm: limit 0

However, pg_wal_summary_contents() returned only one record:

  relfilenode | reltablespace | reldatabase | relforknumber | relblocknumber | is_limit_block
-------------+---------------+-------------+---------------+----------------+----------------
         1259 |          1663 |           5 |             0 |              0 | f

pg_wal_summary_contents() seems to miss the summary information with "limit" that pg_walsummary reports. This appears
tobe a bug. The attached patch fixes this.
 

By the way, pg_wal_summary_contents() and pg_walsummary perform nearly the same task but are implemented in different
functions.This could be the root of issues like this. In the future, it would be better to have a common function for
outputtingthe WAL summary file that both can use.
 

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachment
On Wed, Jul 3, 2024 at 5:34 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> pg_wal_summary_contents() seems to miss the summary information with "limit" that pg_walsummary reports. This appears
tobe a bug. The attached patch fixes this. 

Oops. It looks like pg_wal_summary_contents() forgets to emit the
limit block when that's the only data for a particular relation fork.
And maybe you can make it emit the limit block multiple times if the
list of block numbers is long enough.

Thanks for the patch. I think you can commit and back-patch this, but
I don't think the commit message is quite right, because it's not like
this code just NEVER executes where it is located currently. Or am I
missing something?

> By the way, pg_wal_summary_contents() and pg_walsummary perform nearly the same task but are implemented in different
functions.This could be the root of issues like this. In the future, it would be better to have a common function for
outputtingthe WAL summary file that both can use. 

It's entirely possible that, with some refactoring, more code could be
shared. I tried to make all of the blkreftable stuff reusable, but I
didn't pay as much attention to synchronizing up the various users of
it. However, the fact that pg_walsummary is frontend code and
pg_wal_summary_contents() is backend code does make it hard to get
perfect reuse. I think if you go through pg_wal_summary_contents(),
you'll find that almost every line of that function contains something
backend-specific.

--
Robert Haas
EDB: http://www.enterprisedb.com




On 2024/07/03 22:42, Robert Haas wrote:
> On Wed, Jul 3, 2024 at 5:34 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> pg_wal_summary_contents() seems to miss the summary information with "limit" that pg_walsummary reports. This
appearsto be a bug. The attached patch fixes this.
 
> 
> Oops. It looks like pg_wal_summary_contents() forgets to emit the
> limit block when that's the only data for a particular relation fork.
> And maybe you can make it emit the limit block multiple times if the
> list of block numbers is long enough.
> 
> Thanks for the patch. I think you can commit and back-patch this, but
> I don't think the commit message is quite right, because it's not like
> this code just NEVER executes where it is located currently. Or am I
> missing something?

Yes, so I updated the commit message. I borrowed your description and used it in the message. Attached is the revised
versionof the patch.
 

If there are no objections, I will commit and backpatch it.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachment
On Thu, Jul 4, 2024 at 6:16 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> Yes, so I updated the commit message. I borrowed your description and used it in the message. Attached is the revised
versionof the patch. 
>
> If there are no objections, I will commit and backpatch it.

+1. Maybe change "Fix bugs in pg_wal_summary_contents()" to "Fix limit
block handling in pg_wal_summary_contents()".

--
Robert Haas
EDB: http://www.enterprisedb.com




On 2024/07/08 22:50, Robert Haas wrote:
> On Thu, Jul 4, 2024 at 6:16 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> Yes, so I updated the commit message. I borrowed your description and used it in the message. Attached is the
revisedversion of the patch.
 
>>
>> If there are no objections, I will commit and backpatch it.
> 
> +1. Maybe change "Fix bugs in pg_wal_summary_contents()" to "Fix limit
> block handling in pg_wal_summary_contents()".

Thanks! I've pushed the patch and used your wording in the commit message.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION