Re: Fwd: "SELECT COUNT(*) FROM" still causing issues (deadlock) in PostgreSQL 14.3/4? - Mailing list pgsql-bugs

From Marco Boeringa
Subject Re: Fwd: "SELECT COUNT(*) FROM" still causing issues (deadlock) in PostgreSQL 14.3/4?
Date
Msg-id 105bfec6-5393-3828-1b12-3640635c9d02@boeringa.demon.nl
Whole thread Raw
In response to Re: Fwd: "SELECT COUNT(*) FROM" still causing issues (deadlock) in PostgreSQL 14.3/4?  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-bugs
Thanks Tomas for the explanation.

This all still points to a bug then, with WAL being affected, and 
possibly the processing of 'Planet' data in the secondary database that 
did not show an issue, being responsible for the enormous amount of WAL 
being written due to checkpoint failures. Although the latter is still 
speculation, it may be caused by another as of yet to determine cause.

To be absolutely clear about this: I have never seen so much WAL being 
generated with Planet processing in ordinary situations when this issue 
doesn't rear its head. In fact, I am currently processing Planet again, 
and the total size of files on "file system root" has not been over 
about 250GB, with still 750GB free space. This is what I see under 
ordinary situations.

So I am also pretty much convinced there is no misconfiguration of the 
WAL settings in my 'posgresql.conf' file.

Any suggestions for a further course of action, or do you people have at 
least enough info for now to give it a first try to find out what might 
be wrong?

Marco

Op 25-7-2022 om 15:39 schreef Tomas Vondra:
>
> On 7/25/22 08:04, Marco Boeringa wrote:
>> To extend on this, two interesting questions that come to mind are:
>>
>> - Does running SELECT COUNT(*) create WAL?
>>
> Yes. An obvious example is updating the visibility map (which is always
> logged to WAL) or hint bits (which may be WAL logged). I'd also bet we
> may generate WAL for indexes, e.g. to kill deleted tuples.
>
>> - Is it potentially conceivable that there is a kind of cross-database
>> vulnerability *within one and the same PostgreSQL cluster*, where an
>> issue in one database causes the WAL in another database to no longer
>> successfully be written to disk during checkpoints? I have never seen
>> processing errors where PostgreSQL emitted true PostgreSQL errors with
>> error numbers cause issues like that and affect a second database in the
>> same cluster, but since no error is generated here, and there might be
>> some uncatched error, I wonder?
>>
>> I am especially asking the second question since, although I wrote there
>> is no edit activity going on potentially generating WAL in the affected
>> small database, which is true, there *was* processing on Planet sized
>> data going on in a second database in the same cluster. That certainly
>> *is* capable of generating 890GB of WAL if nothing is cleaned up during
>> checkpoints due to checkpoints failing.
>>
> WAL is a resource shared by all the databases in the cluster, so if that
> gets broken it's broken for everyone.
>
>
> regards
>



pgsql-bugs by date:

Previous
From: David Steele
Date:
Subject: Re: could not link file in wal restore lines
Next
From: Tom Lane
Date:
Subject: Re: BUG #17434: CREATE/DROP DATABASE can be executed in the same transaction with other commands