Re: Fwd: "SELECT COUNT(*) FROM" still causing issues (deadlock) in PostgreSQL 14.3/4? - Mailing list pgsql-bugs
| From | Marco Boeringa |
| --- | --- |
| Subject | Re: Fwd: "SELECT COUNT(*) FROM" still causing issues (deadlock) in PostgreSQL 14.3/4? |
| Date | |
| Msg-id | 6555d816-1e47-b5fe-84eb-656810e3e28d@boeringa.demon.nl |
| In response to | Re: Fwd: "SELECT COUNT(*) FROM" still causing issues (deadlock) in PostgreSQL 14.3/4? (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
| List | pgsql-bugs |
Now it's getting interesting: I have hit another, different failure mode. The server closed unexpectedly, but this time without my "file system root" being entirely clogged like last time, and PostgreSQL in fact recovered successfully after the failure. I had another look at my system and opened 'postgresql-14-main.log'; see the log below.

Please note that I did not attempt to manually stop an autovacuum session, so the first entry must be an automatically generated message emitted whenever PostgreSQL has to cancel autovacuum?

Anyway, the interesting entry is the third one: the PANIC referencing 'visibilitymap', which Tomas also mentioned in relation to SELECT COUNT(*). That at least potentially ties this second, different failure to the previous issue (although it may still be an unrelated one). To be clear, as can also be seen from the EXECUTE statements, there is no SELECT COUNT(*) going on here (the referenced 'arc_update' prepared statement doesn't use it).

Any remarks related to this log?
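As a side note for anyone trying to corroborate visibility map corruption here: the contrib `pg_visibility` extension can cross-check the visibility map against the heap. A sketch, assuming superuser access on the affected cluster and using the TOAST relation named in the log below:

```sql
-- Sketch only: requires the contrib pg_visibility extension and superuser.
CREATE EXTENSION IF NOT EXISTS pg_visibility;

-- Per-page all-visible / all-frozen bits for the relation from the log.
SELECT * FROM pg_visibility_map('pg_toast.pg_toast_359621860') LIMIT 10;

-- Report heap tuples that contradict a set visibility map bit;
-- on a healthy table both queries return zero rows.
SELECT * FROM pg_check_visible('pg_toast.pg_toast_359621860');
SELECT * FROM pg_check_frozen('pg_toast.pg_toast_359621860');
```

Any rows returned by the two `pg_check_*` queries would indicate that the visibility map and the heap disagree.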
Marco

2022-07-27 02:59:56.892 CEST [281070] ERROR: canceling autovacuum task
2022-07-27 02:59:56.892 CEST [281070] CONTEXT: while scanning block 46845 of relation "pg_toast.pg_toast_359621860"
	automatic vacuum of table "gis.pg_toast.pg_toast_359621860"
2022-07-27 03:02:49.721 CEST [281124] osm@gis PANIC: wrong buffer passed to visibilitymap_clear
2022-07-27 03:02:49.721 CEST [281124] osm@gis STATEMENT: EXECUTE arc_update(11995223);EXECUTE arc_update(177815656);EXECUTE arc_update(443174623);EXECUTE arc_update(443213826);EXECUTE arc_update(12673263);EXECUTE arc_update(550352967);EXECUTE arc_update(11551029);EXECUTE arc_update(153847654);EXECUTE arc_update(329349);EXECUTE arc_update(237168019);EXECUTE arc_update(11552629);EXECUTE arc_update(381319471);EXECUTE arc_update(171879263);EXECUTE arc_update(11555372);EXECUTE arc_update(385793425);EXECUTE arc_update(362413550);EXECUTE arc_update(363999384);EXECUTE arc_update(12622220);EXECUTE arc_update(529712698);EXECUTE arc_update(357812660);EXECUTE arc_update(145882199);EXECUTE arc_update(499576651);EXECUTE arc_update(123173842);EXECUTE arc_update(345906810);EXECUTE arc_update(11558506);EXECUTE arc_update(602477052);EXECUTE arc_update(362390402);EXECUTE arc_update(488370921);EXECUTE arc_update(148592092);EXECUTE arc_update(590614967);EXECUTE arc_update(474316584);EXECUTE arc_update(598052676);EXECUTE arc_update(345447488);EXECUTE arc_update(488880839);EXECUTE arc_update(323713375);EXECUTE arc_update(362377910);EXECUTE arc_update(306066948);EXECUTE arc_update(688704);EXECUTE arc_update(600560187);EXECUTE arc_update(213569619);EXECUTE arc_update(180294245);EXECUTE arc_update(343433621);EXECUTE arc_update(199994366);EXECUTE arc_update(314610973);EXECUTE arc_update(353722823);EXECUTE arc_update(222912592);EXECUTE arc_update(305662591);EXECUTE arc_update(361771102);EXECUTE arc_update(472591085);EXECUTE arc_update(146245966);
2022-07-27 04:07:14.422 CEST [1563] LOG: server process (PID 281124) was terminated by signal 6: Aborted
2022-07-27 04:07:14.422 CEST [1563] DETAIL: Failed process was running: EXECUTE arc_update(11995223);EXECUTE arc_update(177815656);EXECUTE arc_update(443174623);EXECUTE arc_update(443213826);EXECUTE arc_update(12673263);EXECUTE arc_update(550352967);EXECUTE arc_update(11551029);EXECUTE arc_update(153847654);EXECUTE arc_update(329349);EXECUTE arc_update(237168019);EXECUTE arc_update(11552629);EXECUTE arc_update(381319471);EXECUTE arc_update(171879263);EXECUTE arc_update(11555372);EXECUTE arc_update(385793425);EXECUTE arc_update(362413550);EXECUTE arc_update(363999384);EXECUTE arc_update(12622220);EXECUTE arc_update(529712698);EXECUTE arc_update(357812660);EXECUTE arc_update(145882199);EXECUTE arc_update(499576651);EXECUTE arc_update(123173842);EXECUTE arc_update(345906810);EXECUTE arc_update(11558506);EXECUTE arc_update(602477052);EXECUTE arc_update(362390402);EXECUTE arc_update(488370921);EXECUTE arc_update(148592092);EXECUTE arc_update(590614967);EXECUTE arc_update(474316584);EXECUTE arc_update(598052676);EXECUTE arc_update(345447488);EXECUTE arc_update(488880839);EXECUTE arc_u
2022-07-27 04:07:14.422 CEST [1563] LOG: terminating any other active server processes
2022-07-27 04:07:18.978 CEST [1563] LOG: all server processes terminated; reinitializing
2022-07-27 04:07:34.563 CEST [281625] LOG: database system was interrupted; last known up at 2022-07-27 03:02:59 CEST
2022-07-27 04:07:39.646 CEST [281625] LOG: database system was not properly shut down; automatic recovery in progress
2022-07-27 04:07:39.656 CEST [281625] LOG: redo starts at 2595/7FB578D8
2022-07-27 04:09:43.329 CEST [281625] LOG: invalid record length at 259B/9375C7A0: wanted 24, got 0
2022-07-27 04:09:43.329 CEST [281625] LOG: redo done at 259B/9375C6E0 system usage: CPU: user: 71.84 s, system: 36.82 s, elapsed: 123.67 s
2022-07-27 04:10:23.425 CEST [1563] LOG: database system is ready to accept connections

On 25-7-2022 at 15:39, Tomas Vondra wrote:
>
> On 7/25/22 08:04, Marco Boeringa wrote:
>> To extend on this, two interesting questions that come to mind are:
>>
>> - Does running SELECT COUNT(*) create WAL?
>>
> Yes. An obvious example is updating the visibility map (which is always
> logged to WAL) or hint bits (which may be WAL logged). I'd also bet we
> may generate WAL for indexes, e.g. to kill deleted tuples.
>
>> - Is it potentially conceivable that there is a kind of cross-database
>> vulnerability *within one and the same PostgreSQL cluster*, where an
>> issue in one database causes the WAL in another database to no longer
>> successfully be written to disk during checkpoints? I have never seen
>> processing errors where PostgreSQL emitted true PostgreSQL errors with
>> error numbers cause issues like that and affect a second database in the
>> same cluster, but since no error is generated here, and there might be
>> some uncaught error, I wonder?
>>
>> I am especially asking the second question since, although I wrote there
>> is no edit activity going on potentially generating WAL in the affected
>> small database, which is true, there *was* processing on Planet sized
>> data going on in a second database in the same cluster. That certainly
>> *is* capable of generating 890GB of WAL if nothing is cleaned up during
>> checkpoints due to checkpoints failing.
>>
> WAL is a resource shared by all the databases in the cluster, so if that
> gets broken it's broken for everyone.
>
> regards
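Tomas's point that even a read-only SELECT can generate WAL is observable directly with the WAL position functions. A rough sketch against a scratch database (`vm_demo` is a made-up table name); note that hint-bit setting is only WAL-logged when `wal_log_hints = on` or data checksums are enabled:

```sql
-- Sketch: watch the WAL insert position move across a plain SELECT COUNT(*).
-- Assumes a scratch database where creating a throwaway table is acceptable.
CREATE TABLE vm_demo AS SELECT g AS id FROM generate_series(1, 100000) g;

SELECT pg_current_wal_insert_lsn();   -- note the LSN before the scan
SELECT count(*) FROM vm_demo;         -- first scan may set hint bits, emitting WAL
SELECT pg_current_wal_insert_lsn();   -- LSN afterwards; compare the two values
```

The byte distance between the two positions can be computed with `pg_wal_lsn_diff(lsn_after, lsn_before)`; a nonzero difference attributable to the scan illustrates the "reads write WAL" effect discussed above.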