Re: Fwd: "SELECT COUNT(*) FROM" still causing issues (deadlock) in PostgreSQL 14.3/4? - Mailing list pgsql-bugs
From | Marco Boeringa |
---|---|
Subject | Re: Fwd: "SELECT COUNT(*) FROM" still causing issues (deadlock) in PostgreSQL 14.3/4? |
Date | |
Msg-id | 2dbccee7-9616-997d-cf19-fa23e29e0497@boeringa.demon.nl Whole thread Raw |
In response to | Re: Fwd: "SELECT COUNT(*) FROM" still causing issues (deadlock) in PostgreSQL 14.3/4? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Fwd: "SELECT COUNT(*) FROM" still causing issues (deadlock) in PostgreSQL 14.3/4?
|
List | pgsql-bugs |
Hi Tom, Thanks for the response, but please review the exact conditions I already mentioned in my previous mails: - When this issue happens, there is absolutely no other activity going on than the three active sessions I mentioned: the autovacuum worker with no wait event, and the two "SELECT COUNT(*) FROM <table>" related sessions with both a wait event, see the original post. There are no other active sessions doing any kind of editing work, no INSERTS, UPDATES, DELETES or whatever in this point of the processing that could generate WAL. - Note that this is a custom written geoprocessing workflow with just one user on the database, not a public database with hundreds of users emitting whatever unknown queries against the database, so I know exactly at what point in my processing flow it fails and what goes on then. - The database affected itself is just a few dozen GBs. While I appreciate, if I understand PostgreSQL and the concept of WAL good enough (I don't consider myself a PostgreSQL expert), that WAL might potentially exceed the size of the database when heavy editing is going, 890 GB of WAL being written seems like an anomaly given in the context of the first points. - This problem only first reared its head after the issues starting in PG14.2 related to SELECT COUNT(*) So, does your suggested option (b) still make sense in this context? If not, and we assume this is a bug needing reporting, what exact information will you guys need to pinpoint the issue besides the information already given? What is the best course of action? I have never before reported a bug for PostgreSQL, so I am slightly at loss as to what exact information you will need. E.g., besides your suggestion of activating 'log_checkpoints', what other suggestions for specific logging? I fully appreciate the main answer will be to submit the typical "smallest reproducible case", but that will be extremely hard in this particular case, as the geoprocessing workflow processing OpenStreetMap data goes through a whole chain of largely auto-generated SQL statements (based on settings in the input tool), that are nearly impossible to share. Although it is also again questionable if it is actually relevant, as the point where it fails only has the mentioned sessions and single SELECT COUNT(*) SQL statement going on. The issues is intermittent as well, so there wouldn't be guarantees it would reproduce on the first try, even if I could share it. I also appreciate I might need to hire an expert for some remote debugging, but before going that way, I appreciate some more insights. Marco Op 23-7-2022 om 17:33 schreef Tom Lane: > Marco Boeringa <marco@boeringa.demon.nl> writes: >> Ok, I found it out using 'sudo baobab'. It is the: >> 'var/lib/postgresql/14/main/pg_wal' >> folder that is filled up with 890 GB of data... causing the file system >> root to run out of space and Ubuntu opening the disk usage analyzer and >> a warning as a consequence. > The most likely explanations for this are > (a) misconfiguration of WAL archiving, so that the server thinks > it should keep WAL files till they've been archived, only that > never happens. > (b) inability to complete checkpoints for some reason, preventing > WAL files from being recycled. > > It doesn't look like you have wal_archiving on, so (a) *should* > be ruled out, but you never know. If there are a ton of "nnn.ready" > files underneath pg_wal then trouble here would be indicated. > > As for (b), you might try enabling log_checkpoints and seeing if > the log messages give any clue. > > regards, tom lane
pgsql-bugs by date: