Re: Fwd: "SELECT COUNT(*) FROM" still causing issues (deadlock) in PostgreSQL 14.3/4? - Mailing list pgsql-bugs

From Marco Boeringa
Subject Re: Fwd: "SELECT COUNT(*) FROM" still causing issues (deadlock) in PostgreSQL 14.3/4?
Date
Msg-id 6555d816-1e47-b5fe-84eb-656810e3e28d@boeringa.demon.nl
In response to Re: Fwd: "SELECT COUNT(*) FROM" still causing issues (deadlock) in PostgreSQL 14.3/4?  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-bugs
Now it's getting interesting. I just hit another, different failure mode. 
The server closed unexpectedly, but this time without my "file system 
root" being entirely clogged like last time, and PostgreSQL in fact 
recovered successfully after the failure.

I had another look at my system and opened 'postgresql-14-main.log'; 
see the log below. Please note I did not attempt to manually cancel an 
autovacuum session, so that entry must be an automatically generated 
message emitted whenever PostgreSQL has to interrupt autovacuum? Anyway, 
the interesting entry is the third one: the PANIC referencing 
'visibilitymap_clear', which Tomas also mentioned in relation to SELECT 
COUNT(*). That at least potentially ties this second, different failure 
to the previous issue (although it may still be unrelated). To be clear, 
as can also be seen from the EXECUTE statements, there is no SELECT 
COUNT(*) going on here (the referenced 'arc_update' prepared statement 
doesn't use it).

Any remarks related to this log?

Marco

2022-07-27 02:59:56.892 CEST [281070] ERROR:  canceling autovacuum task
2022-07-27 02:59:56.892 CEST [281070] CONTEXT:  while scanning block 
46845 of relation "pg_toast.pg_toast_359621860"  automatic vacuum of 
table "gis.pg_toast.pg_toast_359621860"

2022-07-27 03:02:49.721 CEST [281124] osm@gis PANIC:  wrong buffer 
passed to visibilitymap_clear
2022-07-27 03:02:49.721 CEST [281124] osm@gis STATEMENT:  EXECUTE 
arc_update(11995223);EXECUTE arc_update(177815656);EXECUTE 
arc_update(443174623);EXECUTE arc_update(443213826);EXECUTE 
arc_update(12673263);EXECUTE arc_update(550352967);EXECUTE 
arc_update(11551029);EXECUTE arc_update(153847654);EXECUTE 
arc_update(329349);EXECUTE arc_update(237168019);EXECUTE 
arc_update(11552629);EXECUTE arc_update(381319471);EXECUTE 
arc_update(171879263);EXECUTE arc_update(11555372);EXECUTE 
arc_update(385793425);EXECUTE arc_update(362413550);EXECUTE 
arc_update(363999384);EXECUTE arc_update(12622220);EXECUTE 
arc_update(529712698);EXECUTE arc_update(357812660);EXECUTE 
arc_update(145882199);EXECUTE arc_update(499576651);EXECUTE 
arc_update(123173842);EXECUTE arc_update(345906810);EXECUTE 
arc_update(11558506);EXECUTE arc_update(602477052);EXECUTE 
arc_update(362390402);EXECUTE arc_update(488370921);EXECUTE 
arc_update(148592092);EXECUTE arc_update(590614967);EXECUTE 
arc_update(474316584);EXECUTE arc_update(598052676);EXECUTE 
arc_update(345447488);EXECUTE arc_update(488880839);EXECUTE 
arc_update(323713375);EXECUTE arc_update(362377910);EXECUTE 
arc_update(306066948);EXECUTE arc_update(688704);EXECUTE 
arc_update(600560187);EXECUTE arc_update(213569619);EXECUTE 
arc_update(180294245);EXECUTE arc_update(343433621);EXECUTE 
arc_update(199994366);EXECUTE arc_update(314610973);EXECUTE 
arc_update(353722823);EXECUTE arc_update(222912592);EXECUTE 
arc_update(305662591);EXECUTE arc_update(361771102);EXECUTE 
arc_update(472591085);EXECUTE arc_update(146245966);
2022-07-27 04:07:14.422 CEST [1563] LOG:  server process (PID 281124) 
was terminated by signal 6: Aborted
2022-07-27 04:07:14.422 CEST [1563] DETAIL:  Failed process was running: 
EXECUTE arc_update(11995223);EXECUTE arc_update(177815656);EXECUTE 
arc_update(443174623);EXECUTE arc_update(443213826);EXECUTE 
arc_update(12673263);EXECUTE arc_update(550352967);EXECUTE 
arc_update(11551029);EXECUTE arc_update(153847654);EXECUTE 
arc_update(329349);EXECUTE arc_update(237168019);EXECUTE 
arc_update(11552629);EXECUTE arc_update(381319471);EXECUTE 
arc_update(171879263);EXECUTE arc_update(11555372);EXECUTE 
arc_update(385793425);EXECUTE arc_update(362413550);EXECUTE 
arc_update(363999384);EXECUTE arc_update(12622220);EXECUTE 
arc_update(529712698);EXECUTE arc_update(357812660);EXECUTE 
arc_update(145882199);EXECUTE arc_update(499576651);EXECUTE 
arc_update(123173842);EXECUTE arc_update(345906810);EXECUTE 
arc_update(11558506);EXECUTE arc_update(602477052);EXECUTE 
arc_update(362390402);EXECUTE arc_update(488370921);EXECUTE 
arc_update(148592092);EXECUTE arc_update(590614967);EXECUTE 
arc_update(474316584);EXECUTE arc_update(598052676);EXECUTE 
arc_update(345447488);EXECUTE arc_update(488880839);EXECUTE arc_u
2022-07-27 04:07:14.422 CEST [1563] LOG:  terminating any other active 
server processes
2022-07-27 04:07:18.978 CEST [1563] LOG:  all server processes 
terminated; reinitializing
2022-07-27 04:07:34.563 CEST [281625] LOG:  database system was 
interrupted; last known up at 2022-07-27 03:02:59 CEST
2022-07-27 04:07:39.646 CEST [281625] LOG:  database system was not 
properly shut down; automatic recovery in progress
2022-07-27 04:07:39.656 CEST [281625] LOG:  redo starts at 2595/7FB578D8
2022-07-27 04:09:43.329 CEST [281625] LOG:  invalid record length at 
259B/9375C7A0: wanted 24, got 0
2022-07-27 04:09:43.329 CEST [281625] LOG:  redo done at 259B/9375C6E0 
system usage: CPU: user: 71.84 s, system: 36.82 s, elapsed: 123.67 s
2022-07-27 04:10:23.425 CEST [1563] LOG:  database system is ready to 
accept connections


On 25-7-2022 at 15:39, Tomas Vondra wrote:
>
> On 7/25/22 08:04, Marco Boeringa wrote:
>> To extend on this, two interesting questions that come to mind are:
>>
>> - Does running SELECT COUNT(*) create WAL?
>>
> Yes. An obvious example is updating the visibility map (which is always
> logged to WAL) or hint bits (which may be WAL logged). I'd also bet we
> may generate WAL for indexes, e.g. to kill deleted tuples.
>
>> - Is it conceivable that there is a kind of cross-database
>> vulnerability *within one and the same PostgreSQL cluster*, where an
>> issue in one database causes the WAL in another database to no longer
>> be successfully written to disk during checkpoints? I have never seen
>> processing errors where PostgreSQL emitted true PostgreSQL errors with
>> error numbers cause issues like that and affect a second database in
>> the same cluster, but since no error is generated here, and there
>> might be some uncaught error, I wonder?
>>
>> I am especially asking the second question since, although I wrote there
>> is no edit activity going on potentially generating WAL in the affected
>> small database, which is true, there *was* processing on Planet sized
>> data going on in a second database in the same cluster. That certainly
>> *is* capable of generating 890GB of WAL if nothing is cleaned up during
>> checkpoints due to checkpoints failing.
>>
> WAL is a resource shared by all the databases in the cluster, so if that
> gets broken it's broken for everyone.
>
>
> regards
>
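For what it's worth, Tomas's point that even a plain SELECT COUNT(*) can generate WAL (visibility map updates, possibly hint bits) can be observed directly by comparing WAL positions before and after the query. A rough psql sketch, using a hypothetical scratch table ('wal_probe' is made up for illustration) on a throwaway database:

```
-- Create a table with dead tuples so the scan has something to set hint bits on.
CREATE TABLE wal_probe AS SELECT g AS id FROM generate_series(1, 100000) g;
DELETE FROM wal_probe WHERE id % 2 = 0;

-- Record the current WAL insert position into a psql variable.
SELECT pg_current_wal_lsn() AS before_lsn \gset

-- The read-only query under test.
SELECT count(*) FROM wal_probe;

-- Bytes of WAL generated since the marker; nonzero if the scan wrote
-- hint bits / visibility map pages that were WAL-logged.
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), :'before_lsn') AS wal_bytes;
```

The exact byte count depends on checksums/wal_log_hints settings and on whether a checkpoint intervened, so treat the number as indicative only.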


