Thread: "SELECT COUNT(*) FROM" still causing issues (deadlock) in PostgreSQL 14.3?

"SELECT COUNT(*) FROM" still causing issues (deadlock) in PostgreSQL 14.3?

From: Marco Boeringa <marco@boeringa.demon.nl>
Date: Thu, 16 Jun 2022 08:10:12 +0200

Hi,

Anyone else encountering this?

I have had issues with SELECT COUNT(*) in PG 14.2 that I thought were 
supposed to be fixed in PG 14.3.

However, I have now twice seen PostgreSQL lock up on a SELECT COUNT(*) 
statement in PG 14.3.

Note that in both cases, the code that issued this statement had 
successfully processed similar statements against other tables multiple 
times before locking up.

When I look in pgAdmin, I see three "active" sessions that never finish 
(or are extremely slow...), though I do see some minor "Tuple out" 
activity sporadically (a few thousand tuples at a time):

- autovacuum: "VACUUM <TABLE_NAME>" with no Wait Event

- client backend: "SELECT COUNT (*) FROM <TABLE_NAME>" with Wait Event 
"IPC:ExecuteGather"

- parallel worker: "SELECT COUNT (*) FROM <TABLE_NAME>" with Wait Event 
"LWLock:BufferContent"

When this happens, it is impossible to stop the sessions: when I choose 
"Cancel Query" or "Terminate Session", pgAdmin returns "success" for the 
operation, yet the sessions remain visible in the pgAdmin window, even 
after clicking "Refresh", something that definitely doesn't happen in 
ordinary situations.
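
For completeness, pgAdmin's "Cancel Query" and "Terminate Session" 
presumably map onto the standard server functions, which can also be 
tried directly (the pid 12345 is just a placeholder for the stuck 
backend's pid from pg_stat_activity):

    SELECT pg_cancel_backend(12345);     -- roughly what "Cancel Query" does (sends SIGINT)
    SELECT pg_terminate_backend(12345);  -- roughly what "Terminate Session" does (sends SIGTERM)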

Under "Locks" in pgAdmin, I see an "AccesShareLock" for the "client 
backend" and "parallel worker", and a "ShareUpdateExclusiveLock" for the 
"autovacuum" for the table involved.

Additionally, when this happens, the Ubuntu file system root slowly 
starts filling up until it is 100% full (growing from roughly 630 GB to 
the full 1 TB), and PostgreSQL shuts down:

"server closed the connection unexpectedly
     This probably means the server terminated abnormally
     before or while processing the request."

Note that the logging collector ("logging_collector") is set to "off", so 
no logs are being kept (this is a test system, and I don't care much about the logs).
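
Should logs turn out to be needed after all, re-enabling the collector 
would look roughly like this (logging_collector only takes effect after a 
server restart):

    ALTER SYSTEM SET logging_collector = on;
    -- followed by a restart of the PostgreSQL service for the setting to take effect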

In order to get out of this situation, I have had to restore from backup.

My system:

- Ubuntu 20.04.1 LTS, virtualized under Windows Hyper-V

- PG 14.3 (Ubuntu 14.3-1.pgdg20.04+1)

- POSTGIS="3.2.1 5fae8e5" [EXTENSION] PGSQL="140" 
GEOS="3.8.0-CAPI-1.13.1 " PROJ="6.3.1" LIBXML="2.9.10" LIBJSON="0.13.1" 
LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)"

All of this runs on an HP Z840 system with 256 GB ECC RAM, dual Xeon 
E5-2680v4 CPUs, and Windows 10, with 5x 2 TB NVMe drives configured as a 
Windows "Storage Space" and attached via a PCIe expansion card.

Data: OpenStreetMap data for the entire Planet; roughly a 1.4 TB database 
once processing finishes.

Marco




Re: "SELECT COUNT(*) FROM" still causing issues (deadlock) in PostgreSQL 14.3?

From: Marco Boeringa <marco@boeringa.demon.nl>
Date: Fri, 1 Jul 2022 11:49:47 +0200

FWIW, I have not been able to reproduce this issue in the latest 14.4 release so far, despite quite significant testing. So it seems this issue is finally fixed, although only time will tell for sure, as the issue was intermittent.

Marco




Re: "SELECT COUNT(*) FROM" still causing issues (deadlock) in PostgreSQL 14.3?

From: Marco Boeringa <marco@boeringa.demon.nl>
Date: Fri, 22 Jul 2022 09:56:06 +0200

Unfortunately, after more testing, it turns out this issue still persists in PostgreSQL 14.4.

I have now encountered exactly the same problem as described in my original report above: one autovacuum session that never finishes and shows no wait event (or is just incredibly slow, not finishing after many hours although, given the relatively small dataset, it should finish in minutes under normal operation), and the "client backend" and "parallel worker" stuck on the same wait events as listed above, with the same "SELECT COUNT(*)" SQL statement.

One thing to note as well, besides this being workstation-level hardware with ECC RAM, is that I have now also activated data checksums (via 'pg_checksums') on the PostgreSQL cluster and reloaded all data, so all data should now be checksummed. No PostgreSQL error at all is reported via the ODBC connection I use to access and update the database when this happens and PostgreSQL appears stuck on the autovacuum. So I guess I can now pretty much exclude a hardware error, and this must be some software issue, considering the checksums.
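
A quick way to confirm the cluster really is running with checksums (just 
to rule out a mistake on my side) is the read-only setting below:

    SHOW data_checksums;   -- should report "on" after pg_checksums --enable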

Marco




On Fri, Jul 22, 2022 at 09:56:06AM +0200, Marco Boeringa wrote:
> Unfortunately, after more testing, it turns out this issue still persists in
> PostgreSQL 14.4.

You might want to run these queries and show us the output, in case it
suggests a cause:

    SELECT version();
    
    -- non-default server settings
    SELECT name, current_setting(name), source
    FROM pg_settings
    WHERE source NOT IN ('default', 'override');

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson




Hi Bruce,

As requested.

Note that this is on very capable hardware in the form of an HP Z840 
workstation with NVMe storage. I have successfully processed the entire 
OpenStreetMap "Planet" file with this hardware and configuration; it 
actually fails on a much smaller Geofabrik "Italy" extract:

version
-------
PostgreSQL 14.4 (Ubuntu 14.4-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit

name | current_setting | source
-----|-----------------|-------
application_name | DBeaver 22.1.2 - SQLEditor <Script-2.sql> | session
autovacuum_work_mem | 8000kB | configuration file
checkpoint_completion_target | 0.9 | configuration file
checkpoint_timeout | 5min | configuration file
client_encoding | UTF8 | client
cluster_name | 14/main | configuration file
cursor_tuple_fraction | 1 | configuration file
DateStyle | ISO, DMY | client
default_statistics_target | 1000 | configuration file
default_text_search_config | pg_catalog.english | configuration file
dynamic_shared_memory_type | posix | configuration file
effective_cache_size | 100GB | configuration file
effective_io_concurrency | 500 | configuration file
extra_float_digits | 3 | session
jit | on | configuration file
lc_messages | en_US.UTF-8 | configuration file
lc_monetary | nl_NL.UTF-8 | configuration file
lc_numeric | nl_NL.UTF-8 | configuration file
lc_time | nl_NL.UTF-8 | configuration file
listen_addresses | localhost, | configuration file
log_destination | stderr | configuration file
log_line_prefix | %m [%p] %q%u@%d | configuration file
log_rotation_age | 1h | configuration file
log_rotation_size | 10000kB | configuration file
log_statement | none | configuration file
log_timezone | Europe/Amsterdam | configuration file
log_truncate_on_rotation | on | configuration file
logging_collector | off | configuration file
maintenance_io_concurrency | 500 | configuration file
maintenance_work_mem | 8000MB | configuration file
max_connections | 1000 | configuration file
max_parallel_maintenance_workers | 28 | configuration file
max_parallel_workers | 128 | configuration file
max_parallel_workers_per_gather | 56 | configuration file
max_stack_depth | 2MB | environment variable
max_wal_senders | 0 | configuration file
max_wal_size | 25GB | configuration file
max_worker_processes | 128 | configuration file
min_wal_size | 1GB | configuration file
parallel_leader_participation | off | configuration file
parallel_setup_cost | 100 | configuration file
parallel_tuple_cost | 0.025 | configuration file
password_encryption | md5 | configuration file
port | 5433 | configuration file
random_page_cost | 1 | configuration file
search_path | osm, public, "$user" | session
shared_buffers | 75GB | configuration file
ssl | off | configuration file
ssl_cert_file | server.crt | configuration file
ssl_key_file | server.key | configuration file
synchronous_commit | off | configuration file
temp_buffers | 8000MB | configuration file
temp_tablespaces | osm_i | configuration file
TimeZone | Europe/Berlin | client
track_activity_query_size | 10000B | configuration file
wal_compression | on | configuration file
wal_level | minimal | configuration file
work_mem | 2000MB | configuration file



Unfortunately, as I have seen before in the same situation, this has now 
again taken my PG 14.4 server down by filling up my entire Ubuntu file 
system root; as a consequence, the PostgreSQL server process shut down 
the connection automatically.

It is not clear to me what is actually being written out that causes 
this. I am not really an Ubuntu / Linux expert. Any suggestions for 
commands to quickly find out where PostgreSQL may have dumped hundreds of 
GB of data in the file system root? I had over 700 GB of free space there.
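
While the server is still up, the growth can also be watched from SQL; a 
minimal sketch (pg_ls_waldir() is available in PG 14 and needs superuser 
or pg_monitor rights):

    -- total size of the pg_wal directory, as seen by the server
    SELECT count(*) AS wal_segments, pg_size_pretty(sum(size)) AS wal_total
    FROM pg_ls_waldir();

    -- per-database sizes, to see whether it is table data or WAL that is growing
    SELECT datname, pg_size_pretty(pg_database_size(datname)) AS db_size
    FROM pg_database
    ORDER BY pg_database_size(datname) DESC;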




OK, I found it using 'sudo baobab'. It is the

'/var/lib/postgresql/14/main/pg_wal'

folder that has filled up with 890 GB of data, causing the file system 
root to run out of space, with Ubuntu opening the disk usage analyzer and 
showing a warning as a consequence.

I have never seen this happen under normal operation, when this issue 
doesn't pop up. So this has something to do with WAL.




Marco Boeringa <marco@boeringa.demon.nl> writes:
> Ok, I found it out using 'sudo baobab'. It is the:
> 'var/lib/postgresql/14/main/pg_wal'

> folder that is filled up with 890 GB of data... causing the file system 
> root to run out of space and Ubuntu opening the disk usage analyzer and 
> a warning as a consequence.

The most likely explanations for this are
(a) misconfiguration of WAL archiving, so that the server thinks
it should keep WAL files till they've been archived, only that
never happens.
(b) inability to complete checkpoints for some reason, preventing
WAL files from being recycled.

It doesn't look like you have wal_archiving on, so (a) *should*
be ruled out, but you never know.  If there are a ton of "nnn.ready"
files underneath pg_wal then trouble here would be indicated.

As for (b), you might try enabling log_checkpoints and seeing if
the log messages give any clue.

            regards, tom lane
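
(For reference, both of Tom's checks can be done from SQL; a minimal 
sketch, assuming superuser access; pg_ls_archive_statusdir() is available 
in PG 14:)

    -- (a) are WAL segments waiting to be archived?
    SELECT count(*) AS ready_files
    FROM pg_ls_archive_statusdir()
    WHERE name LIKE '%.ready';

    -- (b) log checkpoint activity from now on; a reload is enough for this setting
    ALTER SYSTEM SET log_checkpoints = on;
    SELECT pg_reload_conf();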



Hi Tom,

Thanks for the response, but please review the exact conditions I 
already mentioned in my previous mails:

- When this issue happens, there is absolutely no other activity going 
on than the three active sessions I mentioned: the autovacuum worker 
with no wait event, and the two "SELECT COUNT(*) FROM <table>" related 
sessions, each with a wait event (see the original post). There are no 
other active sessions doing any kind of editing work, no INSERTs, 
UPDATEs, DELETEs or whatever at this point of the processing that could 
generate WAL.

- Note that this is a custom-written geoprocessing workflow with just 
one user on the database, not a public database with hundreds of users 
emitting unknown queries against the database, so I know exactly at what 
point in my processing flow it fails and what is going on then.

- The affected database itself is just a few dozen GB. While I 
appreciate, if I understand PostgreSQL and the concept of WAL well 
enough (I don't consider myself a PostgreSQL expert), that WAL might 
potentially exceed the size of the database when heavy editing is going 
on, 890 GB of WAL being written seems like an anomaly given the context 
of the first points.

- This problem first reared its head only after the SELECT COUNT(*) 
related issues that started in PG 14.2.

So, does your suggested option (b) still make sense in this context?

If not, and we assume this is a bug needing reporting, what exact 
information will you need to pinpoint the issue besides the information 
already given? What is the best course of action? I have never reported 
a bug for PostgreSQL before, so I am slightly at a loss as to what exact 
information you will need. E.g., besides your suggestion of activating 
'log_checkpoints', what other specific logging would you suggest?

I fully appreciate that the main answer will be to submit the typical 
"smallest reproducible case", but that will be extremely hard in this 
particular case, as the geoprocessing workflow processing the 
OpenStreetMap data goes through a whole chain of largely auto-generated 
SQL statements (based on settings in the input tool) that are nearly 
impossible to share. It is also questionable whether that is actually 
relevant, as the point where it fails only has the mentioned sessions 
and a single SELECT COUNT(*) SQL statement going on. The issue is 
intermittent as well, so there would be no guarantee it would reproduce 
on the first try, even if I could share it.

I also realize I might need to hire an expert for some remote debugging, 
but before going that way, I would appreciate some more insights.

Marco




To expand on this, two interesting questions come to mind:

- Does running SELECT COUNT(*) create WAL?

- Is it conceivable that there is a kind of cross-database vulnerability 
*within one and the same PostgreSQL cluster*, where an issue in one 
database causes the WAL of another database to no longer be successfully 
written to disk during checkpoints? I have never seen processing errors, 
where PostgreSQL emitted true PostgreSQL errors with error numbers, cause 
issues like that and affect a second database in the same cluster, but 
since no error is generated here, and there might be some uncaught error, 
I wonder.

I am especially asking the second question since, although I wrote that 
there is no edit activity going on that could generate WAL in the 
affected small database, which is true, there *was* processing of 
Planet-sized data going on in a second database in the same cluster. That 
certainly *is* capable of generating 890 GB of WAL if nothing is cleaned 
up because checkpoints are failing.

Marco
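
(As a side note on the first question: on an otherwise idle cluster, the 
amount of WAL generated by a single statement can be measured roughly 
like this from psql; 'some_table' is a placeholder:)

    SELECT pg_current_wal_insert_lsn() AS lsn_before \gset
    SELECT count(*) FROM some_table;
    SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_insert_lsn(), :'lsn_before'))
           AS wal_generated;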





On 7/25/22 08:04, Marco Boeringa wrote:
> To extend on this, two interesting questions that come to mind are:
> 
> - Does running SELECT COUNT(*) create WAL?
> 

Yes. An obvious example is updating the visibility map (which is always
logged to WAL) or hint bits (which may be WAL logged). I'd also bet we
may generate WAL for indexes, e.g. to kill deleted tuples.

> - Is it potentially conceivable that there is a kind of cross-database
> vulnerability *within one and the same PostgreSQL cluster*, where an
> issue in one database causes the WAL in another database to no longer
> successfully be written to disk during checkpoints? I have never seen
> processing errors where PostgreSQL emitted true PostgreSQL errors with
> error numbers cause issues like that and affect a second database in the
> same cluster, but since no error is generated here, and there might be
> some uncatched error, I wonder?
> 
> I am especially asking the second question since, although I wrote there
> is no edit activity going on potentially generating WAL in the affected
> small database, which is true, there *was* processing on Planet sized
> data going on in a second database in the same cluster. That certainly
> *is* capable of generating 890GB of WAL if nothing is cleaned up during
> checkpoints due to checkpoints failing.
> 

WAL is a resource shared by all the databases in the cluster, so if that
gets broken it's broken for everyone.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Thanks Tomas for the explanation.

This all still points to a bug then, with WAL being affected, and 
possibly the processing of 'Planet' data in the secondary database 
(which did not itself show an issue) being responsible for the enormous 
amount of WAL being written due to checkpoint failures. The latter is 
still speculation, though; it may also have another, as yet to be 
determined, cause.

To be absolutely clear about this: I have never seen so much WAL being 
generated with Planet processing in ordinary situations, when this issue 
doesn't rear its head. In fact, I am currently processing Planet again, 
and the total size of files on the file system root has not gone over 
about 250 GB, with 750 GB still free. This is what I see under ordinary 
circumstances.

So I am also pretty much convinced there is no misconfiguration of the 
WAL settings in my 'postgresql.conf' file.

Any suggestions for a further course of action, or do you have at least 
enough info for now to take a first stab at finding out what might be 
wrong?

Marco



Now it's getting interesting: I have just had another, different failure 
mode. The server shut down unexpectedly, but without my file system root 
being entirely clogged like last time, and PostgreSQL in fact recovered 
successfully after the failure.

I had another look at my system and opened 'postgresql-14-main.log'; see 
the log below. Please note that I did not attempt to manually stop an 
autovacuum session, so that entry must be some automatically generated 
message for when PostgreSQL cancels an autovacuum task itself? Anyway, 
the interesting entry is the third one, which has the PANIC with a 
reference to the visibility map that Tomas also mentioned in relation to 
SELECT COUNT(*), so that at least potentially ties this second, different 
failure to the previous issue (although it may still be unrelated). To be 
clear, as can also be seen from the EXECUTE statements, there is no 
SELECT COUNT(*) going on here (the referenced 'arc_update' prepared 
statement doesn't use it).

Any remarks related to this log?

Marco

2022-07-27 02:59:56.892 CEST [281070] ERROR:  canceling autovacuum task
2022-07-27 02:59:56.892 CEST [281070] CONTEXT:  while scanning block 
46845 of relation "pg_toast.pg_toast_359621860"  automatic vacuum of 
table "gis.pg_toast.pg_toast_359621860"

2022-07-27 03:02:49.721 CEST [281124] osm@gis PANIC:  wrong buffer 
passed to visibilitymap_clear
2022-07-27 03:02:49.721 CEST [281124] osm@gis STATEMENT:  EXECUTE 
arc_update(11995223);EXECUTE arc_update(177815656);EXECUTE 
arc_update(443174623);EXECUTE arc_update(443213826);EXECUTE 
arc_update(12673263);EXECUTE arc_update(550352967);EXECUTE 
arc_update(11551029);EXECUTE arc_update(153847654);EXECUTE 
arc_update(329349);EXECUTE arc_update(237168019);EXECUTE 
arc_update(11552629);EXECUTE arc_update(381319471);EXECUTE 
arc_update(171879263);EXECUTE arc_update(11555372);EXECUTE 
arc_update(385793425);EXECUTE arc_update(362413550);EXECUTE 
arc_update(363999384);EXECUTE arc_update(12622220);EXECUTE 
arc_update(529712698);EXECUTE arc_update(357812660);EXECUTE 
arc_update(145882199);EXECUTE arc_update(499576651);EXECUTE 
arc_update(123173842);EXECUTE arc_update(345906810);EXECUTE 
arc_update(11558506);EXECUTE arc_update(602477052);EXECUTE 
arc_update(362390402);EXECUTE arc_update(488370921);EXECUTE 
arc_update(148592092);EXECUTE arc_update(590614967);EXECUTE 
arc_update(474316584);EXECUTE arc_update(598052676);EXECUTE 
arc_update(345447488);EXECUTE arc_update(488880839);EXECUTE 
arc_update(323713375);EXECUTE arc_update(362377910);EXECUTE 
arc_update(306066948);EXECUTE arc_update(688704);EXECUTE 
arc_update(600560187);EXECUTE arc_update(213569619);EXECUTE 
arc_update(180294245);EXECUTE arc_update(343433621);EXECUTE 
arc_update(199994366);EXECUTE arc_update(314610973);EXECUTE 
arc_update(353722823);EXECUTE arc_update(222912592);EXECUTE 
arc_update(305662591);EXECUTE arc_update(361771102);EXECUTE 
arc_update(472591085);EXECUTE arc_update(146245966);
2022-07-27 04:07:14.422 CEST [1563] LOG:  server process (PID 281124) 
was terminated by signal 6: Aborted
2022-07-27 04:07:14.422 CEST [1563] DETAIL:  Failed process was running: 
EXECUTE arc_update(11995223);EXECUTE arc_update(177815656);EXECUTE 
arc_update(443174623);EXECUTE arc_update(443213826);EXECUTE 
arc_update(12673263);EXECUTE arc_update(550352967);EXECUTE 
arc_update(11551029);EXECUTE arc_update(153847654);EXECUTE 
arc_update(329349);EXECUTE arc_update(237168019);EXECUTE 
arc_update(11552629);EXECUTE arc_update(381319471);EXECUTE 
arc_update(171879263);EXECUTE arc_update(11555372);EXECUTE 
arc_update(385793425);EXECUTE arc_update(362413550);EXECUTE 
arc_update(363999384);EXECUTE arc_update(12622220);EXECUTE 
arc_update(529712698);EXECUTE arc_update(357812660);EXECUTE 
arc_update(145882199);EXECUTE arc_update(499576651);EXECUTE 
arc_update(123173842);EXECUTE arc_update(345906810);EXECUTE 
arc_update(11558506);EXECUTE arc_update(602477052);EXECUTE 
arc_update(362390402);EXECUTE arc_update(488370921);EXECUTE 
arc_update(148592092);EXECUTE arc_update(590614967);EXECUTE 
arc_update(474316584);EXECUTE arc_update(598052676);EXECUTE 
arc_update(345447488);EXECUTE arc_update(488880839);EXECUTE arc_u
2022-07-27 04:07:14.422 CEST [1563] LOG:  terminating any other active 
server processes
2022-07-27 04:07:18.978 CEST [1563] LOG:  all server processes 
terminated; reinitializing
2022-07-27 04:07:34.563 CEST [281625] LOG:  database system was 
interrupted; last known up at 2022-07-27 03:02:59 CEST
2022-07-27 04:07:39.646 CEST [281625] LOG:  database system was not 
properly shut down; automatic recovery in progress
2022-07-27 04:07:39.656 CEST [281625] LOG:  redo starts at 2595/7FB578D8
2022-07-27 04:09:43.329 CEST [281625] LOG:  invalid record length at 
259B/9375C7A0: wanted 24, got 0
2022-07-27 04:09:43.329 CEST [281625] LOG:  redo done at 259B/9375C6E0 
system usage: CPU: user: 71.84 s, system: 36.82 s, elapsed: 123.67 s
2022-07-27 04:10:23.425 CEST [1563] LOG:  database system is ready to 
accept connections
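
(Given the visibilitymap_clear PANIC above, one thing that might be worth 
running against the affected table is the contrib extension 
pg_visibility, which can cross-check the visibility map against the heap; 
a hedged sketch, with '<table_name>' as a placeholder:)

    CREATE EXTENSION IF NOT EXISTS pg_visibility;
    -- both functions return the TIDs of any tuples whose visibility-map bits
    -- look inconsistent with the heap; an empty result means nothing suspicious was found
    SELECT * FROM pg_check_visible('<table_name>');
    SELECT * FROM pg_check_frozen('<table_name>');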

