Thread: pg16: invalid page/page verification failed
On an instance running pg16.0:

log_time     | 2023-10-05 10:03:00.014-05
backend_type | autovacuum worker
left         | page verification failed, calculated checksum 5074 but expected 5050
context      | while scanning block 119 of relation "public.postgres_log_2023_10_05_0900"

This is the only error I've seen so far, and for all I know there's an
issue on the storage behind the VM, or a cosmic ray hit.  But I moved
the table out of the way and saved a copy of get_raw_page() in case
someone wants to ask about it.

 public | BROKEN_postgres_log_2023_10_05_0900 | table | postgres | permanent | heap | 1664 kB

This table is what it sounds like: a partition into which CSV logs are
COPY'ed.  It would've been created around 8am.  There's no special
params set for the table nor for autovacuum.  Although we have a ZFS
tablespace, these tables aren't on it, and full_page_writes=on.  There
have been no crashes, and the instance has been up since it was
pg_upgraded from v15.4 on Sep 25.  pg_stat_all_tables indicates that
the table was never (successfully) vacuumed.  This was compiled to RPM
on centos7, and might include a few commits made since v16.0.

postgres=# SELECT * FROM heap_page_item_attrs(get_raw_page(801594131::regclass::text, 119), 801594131);
 lp  | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_attrs
   1 |   2304 |        1 |     16 |        |        |          |        |             |            |        |        |       |
   2 |   8160 |        1 |     16 |        |        |          |        |             |            |        |        |       |
   3 |   8144 |        1 |     16 |        |        |          |        |             |            |        |        |       |
 ...all the same except for lp_off...
 365 |   2352 |        1 |     16 |        |        |          |        |             |            |        |        |       |
 366 |   2336 |        1 |     16 |        |        |          |        |             |            |        |        |       |
 367 |   2320 |        1 |     16 |        |        |          |        |             |            |        |        |       |

postgres=# SELECT FROM (SELECT tuple_data_split(801594131, t_data, t_infomask, t_infomask2, t_bits) a FROM heap_page_items(get_raw_page(801594131::regclass::text, 119))) WHERE a IS NOT NULL;
WARNING:  page verification failed, calculated checksum 5074 but expected 5050
(0 rows)

--
Justin
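For reference, one way to keep such a page around for later inspection is to stash the raw bytes in a side table before renaming or dropping anything.  This is only a rough sketch (the table name is made up, pageinspect must be installed, and re-reading a bad block from disk may additionally need ignore_checksum_failure=on); it is not necessarily how the copy above was taken:

CREATE TABLE saved_broken_pages AS
  SELECT now() AS saved_at,
         801594131::regclass::text AS rel,
         119 AS blkno,
         get_raw_page(801594131::regclass::text, 119) AS rawpage;

-- the saved bytes can later be fed back into pageinspect functions, e.g.:
SELECT h.* FROM saved_broken_pages, page_header(rawpage) AS h;

-- or exported as hex for sharing:
-- \copy (SELECT encode(rawpage, 'hex') FROM saved_broken_pages) TO '/tmp/block119.hex'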
On Thu, 5 Oct 2023 at 18:48, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On an instance running pg16.0:
>
> log_time     | 2023-10-05 10:03:00.014-05
> backend_type | autovacuum worker
> left         | page verification failed, calculated checksum 5074 but expected 5050
> context      | while scanning block 119 of relation "public.postgres_log_2023_10_05_0900"
>
> This is the only error I've seen so far, and for all I know there's an
> issue on the storage behind the VM, or a cosmic ray hit.  But I moved
> the table out of the way and saved a copy of get_raw_page() in case
> someone wants to ask about it.
>
> postgres=# SELECT * FROM heap_page_item_attrs(get_raw_page(801594131::regclass::text, 119), 801594131);
>  lp  | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_attrs
>    1 |   2304 |        1 |     16 |        |        |          |        |             |            |        |        |       |
>    2 |   8160 |        1 |     16 |        |        |          |        |             |            |        |        |       |
>    3 |   8144 |        1 |     16 |        |        |          |        |             |            |        |        |       |
>  ...all the same except for lp_off...
>  365 |   2352 |        1 |     16 |        |        |          |        |             |            |        |        |       |
>  366 |   2336 |        1 |     16 |        |        |          |        |             |            |        |        |       |
>  367 |   2320 |        1 |     16 |        |        |          |        |             |            |        |        |       |

That's not a HEAP page; it looks more like a btree page: lp_len is too
short for heap (which starts at lp_len = 24), and there are too many
line pointers for an 8KiB heap page.  btree often has lp_len of 16: 8
bytes indextuple header, one maxalign of data (e.g. int or bigint).

So, assuming it's a block of a different relation kind, it's also
likely it was originally located elsewhere in that other relation,
indeed causing the checksum failure.  You can further validate this by
looking at the page header's pd_special value - if it is 8176, that'd
be another indicator for it being a btree.

Kind regards,

Matthias van de Meent
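For instance, both checks could be run against the suspect block with pageinspect.  This is just a sketch, not output from the affected instance (8176 = 8192 minus the 16-byte btree special area):

SELECT special FROM page_header(get_raw_page(801594131::regclass::text, 119));

-- if it really is a btree page, its items should also decode as index tuples
-- (this may error out if the page is not actually a btree leaf page):
SELECT itemoffset, ctid, itemlen, htid
FROM bt_page_items(get_raw_page(801594131::regclass::text, 119))
LIMIT 5;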
On Thu, Oct 05, 2023 at 07:16:31PM +0200, Matthias van de Meent wrote:
> On Thu, 5 Oct 2023 at 18:48, Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > On an instance running pg16.0:
> >
> > log_time     | 2023-10-05 10:03:00.014-05
> > backend_type | autovacuum worker
> > left         | page verification failed, calculated checksum 5074 but expected 5050
> > context      | while scanning block 119 of relation "public.postgres_log_2023_10_05_0900"
> >
> > This is the only error I've seen so far, and for all I know there's an
> > issue on the storage behind the VM, or a cosmic ray hit.  But I moved
> > the table out of the way and saved a copy of get_raw_page() in case
> > someone wants to ask about it.
> >
> > postgres=# SELECT * FROM heap_page_item_attrs(get_raw_page(801594131::regclass::text, 119), 801594131);
> >  lp  | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_attrs
> >    1 |   2304 |        1 |     16 |        |        |          |        |             |            |        |        |       |
> >    2 |   8160 |        1 |     16 |        |        |          |        |             |            |        |        |       |
> >    3 |   8144 |        1 |     16 |        |        |          |        |             |            |        |        |       |
> >  ...all the same except for lp_off...
> >  365 |   2352 |        1 |     16 |        |        |          |        |             |            |        |        |       |
> >  366 |   2336 |        1 |     16 |        |        |          |        |             |            |        |        |       |
> >  367 |   2320 |        1 |     16 |        |        |          |        |             |            |        |        |       |
>
> That's not a HEAP page; it looks more like a btree page: lp_len is too
> short for heap (which starts at lp_len = 24), and there are too many
> line pointers for an 8KiB heap page.  btree often has lp_len of 16: 8
> bytes indextuple header, one maxalign of data (e.g. int or bigint).
>
> So, assuming it's a block of a different relation kind, it's also
> likely it was originally located elsewhere in that other relation,
> indeed causing the checksum failure.  You can further validate this by
> looking at the page header's pd_special value - if it is 8176, that'd
> be another indicator for it being a btree.

Nice point.

postgres=# SET ignore_checksum_failure=on; SELECT * FROM generate_series(115,119) AS a, page_header(get_raw_page(801594131::regclass::text, a)) AS b;
WARNING:  page verification failed, calculated checksum 5074 but expected 5050
  a  |     lsn      | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----+--------------+----------+-------+-------+-------+---------+----------+---------+-----------
 115 | B61/A9436C8  |   -23759 |     4 |    92 |   336 |    8192 |     8192 |       4 |         0
 116 | B61/A944FA0  |     3907 |     4 |   104 |   224 |    8192 |     8192 |       4 |         0
 117 | B61/A946828  |   -24448 |     4 |    76 |   264 |    8192 |     8192 |       4 |         0
 118 | B61/A94CCE0  |    26915 |     4 |    28 |  6256 |    8192 |     8192 |       4 |         0
 119 | B5C/9F30D1C8 |     5050 |     0 |  1492 |  2304 |    8176 |     8192 |       4 |         0

The table itself has a few btree indexes on text columns and a brin
index on log_timestamp, but not on the integers.

It sounds like that's what's expected at this point, but after I "SET
ignore_checksum_failure=on" and read the page in, vacuum kicked off and
then crashed (in heap_page_prune(), if that half of the stack trace can
be trusted).

*** stack smashing detected ***: postgres: autovacuum worker postgres terminated

< 2023-10-05 12:35:30.764 CDT >LOG:  server process (PID 30692) was terminated by signal 11: Segmentation fault
< 2023-10-05 12:35:30.764 CDT >DETAIL:  Failed process was running: autovacuum: VACUUM ANALYZE public.BROKEN_postgres_log_2023_10_05_0900

I took the opportunity to fsck the FS, which showed no errors.

I was curious whether the relfilenodes had gotten confused or corrupted,
but this seems to indicate not; the problem is only one block.
postgres=# SELECT oid, relfilenode, oid=relfilenode, relname FROM pg_class WHERE oid BETWEEN 801550000 AND 801594199 ORDER BY 1;
    oid    | relfilenode | ?column? |                     relname
-----------+-------------+----------+-------------------------------------------------
 801564542 |   801564542 | t        | postgres_log_2023_10_05_0800
 801564545 |   801564545 | t        | pg_toast_801564542
 801564546 |   801564546 | t        | pg_toast_801564542_index
 801564547 |   801564547 | t        | postgres_log_2023_10_05_0800_log_time_idx
 801564548 |   801564548 | t        | postgres_log_2023_10_05_0800_error_severity_idx
 801564549 |   801564549 | t        | postgres_log_2023_10_05_0800_error_message_idx
 801564550 |   801564550 | t        | postgres_log_2023_10_05_0800_duration_idx
 801564551 |   801564551 | t        | postgres_log_2023_10_05_0800_tempfile_idx
 801594131 |   801594131 | t        | BROKEN_postgres_log_2023_10_05_0900
 801594134 |   801594134 | t        | pg_toast_801594131
 801594135 |   801594135 | t        | pg_toast_801594131_index
 801594136 |   801594136 | t        | postgres_log_2023_10_05_0900_log_time_idx
 801594137 |   801594137 | t        | postgres_log_2023_10_05_0900_error_severity_idx
 801594138 |   801594138 | t        | postgres_log_2023_10_05_0900_error_message_idx
 801594139 |   801594139 | t        | postgres_log_2023_10_05_0900_duration_idx
 801594140 |   801594140 | t        | postgres_log_2023_10_05_0900_tempfile_idx

Before anybody asks, we didn't retain WAL from this morning.

FYI, the storage is ext4/LVM/scsi (it looks like this didn't use
vmw_pvscsi but an emulated hardware driver).

/dev/mapper/data-postgres on /var/lib/pgsql type ext4 (rw,relatime,seclabel,data=ordered)

[    0.000000] Linux version 3.10.0-1160.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Mon Oct 19 16:18:59 UTC 2020
[    1.446380] scsi 2:0:1:0: Direct-Access VMware Virtual disk 1.0 PQ: 0 ANSI: 2
[    1.470764] scsi target2:0:1: Beginning Domain Validation
[    1.471077] scsi target2:0:1: Domain Validation skipping write tests
[    1.471079] scsi target2:0:1: Ending Domain Validation
[    1.471099] scsi target2:0:1: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
[    1.484109] sd 2:0:1:0: [sdb] 1048576000 512-byte logical blocks: (536 GB/500 GiB)
[    1.484136] sd 2:0:1:0: [sdb] Write Protect is off
[    1.484139] sd 2:0:1:0: [sdb] Mode Sense: 45 00 00 00
[    1.484163] sd 2:0:1:0: [sdb] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
[    1.485808] sd 2:0:1:0: [sdb] Attached SCSI disk
[    4.271339] sd 2:0:1:0: Attached scsi generic sg1 type 0

--
Justin
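Since the data checksum mixes in the block number, the saved copy of the page could also be probed for the block number it was originally written at, which would help identify where it came from.  A sketch using pageinspect's page_checksum(); this was not actually run against the instance, and the upper bound on the scan is arbitrary:

-- which block numbers would make the stored checksum (5050) come out right?
-- (it's a 16-bit checksum, so a few chance matches are possible)
SELECT blkno
FROM (SELECT get_raw_page(801594131::regclass::text, 119) AS p) AS page,
     generate_series(0, 200000) AS blkno
WHERE page_checksum(page.p, blkno) = 5050;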
On Thu, Oct 05, 2023 at 11:45:18AM -0500, Justin Pryzby wrote:
> This table is what it sounds like: a partition into which CSV logs are
> COPY'ed.  It would've been created around 8am.  There's no special
> params set for the table nor for autovacuum.

This may be an important bit of information.  31966b151e6a is new as
of Postgres 16; it changed the way relations are extended, and COPY
was one of the areas touched.  I am adding Andres in CC.
--
Michael
On Fri, Oct 06, 2023 at 09:20:05AM +0900, Michael Paquier wrote:
> On Thu, Oct 05, 2023 at 11:45:18AM -0500, Justin Pryzby wrote:
> > This table is what it sounds like: a partition into which CSV logs are
> > COPY'ed.  It would've been created around 8am.  There's no special
> > params set for the table nor for autovacuum.
>
> This may be an important bit of information.  31966b151e6a is new as
> of Postgres 16; it changed the way relations are extended, and COPY
> was one of the areas touched.  I am adding Andres in CC.

Also, I realized that someone kicked off a process just after 9am which
would've done a lot of INSERT ON CONFLICT DO UPDATE, VACUUM FULL, and
VACUUM, which consumed and dirtied buffers about 100x faster than
normal.

log_time     | 2023-10-05 10:00:55.794-05
pid          | 31754
left         | duration: 51281.001 ms  statement: VACUUM (FULL,FREEZE) othertable...

log_time     | 2023-10-05 10:01:01.784-05
backend_type | checkpointer
left         | checkpoint starting: time

log_time     | 2023-10-05 10:01:02.935-05
pid          | 10023
left         | page verification failed, calculated checksum 5074 but expected 5050
context      | COPY postgres_log, line 947
left         | COPY postgres_log FROM '/var/log/postgresql/postgresql-2023-10-05_095600.csv' WITH csv

log_time     | 2023-10-05 10:01:02.935-05
pid          | 10023
left         | invalid page in block 119 of relation base/16409/801594131
context      | COPY postgres_log, line 947
left         | COPY postgres_log FROM '/var/log/postgresql/postgresql-2023-10-05_095600.csv' WITH csv

log_time     | 2023-10-05 10:01:11.636-05
pid          | 31754
left         | duration: 15838.374 ms  statement: VACUUM (FREEZE) othertable...

I meant to point out that the issue is on the last block:

postgres=# SELECT pg_relation_size('"BROKEN_postgres_log_2023_10_05_0900"')/8192;
?column? | 120

It sounds like there may be an issue locking (pinning?) a page, or
rather not locking it, or releasing the lock too early.
--
Justin
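As an aside, the base/16409/801594131 path in that log message can be mapped back to a database and relation from SQL.  A sketch (run it while connected to the database in question); the values are taken from the log line above:

SELECT datname FROM pg_database WHERE oid = 16409;
SELECT pg_filenode_relation(0, 801594131);  -- 0 means the database's default tablespace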
Hi,

On 2023-10-06 09:20:05 +0900, Michael Paquier wrote:
> On Thu, Oct 05, 2023 at 11:45:18AM -0500, Justin Pryzby wrote:
> > This table is what it sounds like: a partition into which CSV logs are
> > COPY'ed.  It would've been created around 8am.  There's no special
> > params set for the table nor for autovacuum.
>
> This may be an important bit of information.  31966b151e6a is new as
> of Postgres 16; it changed the way relations are extended, and COPY
> was one of the areas touched.  I am adding Andres in CC.

Hm, is there any chance the COPY targets more than one partition?  If
so, this sounds like it might be the issue described here:
https://postgr.es/m/20230925213746.fwqauhhifjgefyzk%40alap3.anarazel.de

I think at this stage the easiest fix might be just to copy the
approach of calling ReleaseBulkInsertStatePin(), even though I think
that's architecturally wrong.

Greetings,

Andres Freund
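For what it's worth, the scenario being asked about (a single COPY whose rows route to more than one partition, reusing one bulk-insert state) can be set up roughly like this.  The names are made up and this is only an illustration of the shape of the workload, not a confirmed reproducer of this bug:

CREATE TABLE plog (log_time timestamptz, message text) PARTITION BY RANGE (log_time);
CREATE TABLE plog_0900 PARTITION OF plog
  FOR VALUES FROM ('2023-10-05 09:00') TO ('2023-10-05 10:00');
CREATE TABLE plog_1000 PARTITION OF plog
  FOR VALUES FROM ('2023-10-05 10:00') TO ('2023-10-05 11:00');

-- a single COPY whose input straddles 10:00 inserts into both partitions:
COPY plog FROM '/tmp/mixed.csv' WITH (FORMAT csv);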
On Fri, Oct 06, 2023 at 08:47:39AM -0700, Andres Freund wrote:
> Hi,
>
> On 2023-10-06 09:20:05 +0900, Michael Paquier wrote:
> > On Thu, Oct 05, 2023 at 11:45:18AM -0500, Justin Pryzby wrote:
> > > This table is what it sounds like: a partition into which CSV logs are
> > > COPY'ed.  It would've been created around 8am.  There's no special
> > > params set for the table nor for autovacuum.
> >
> > This may be an important bit of information.  31966b151e6a is new as
> > of Postgres 16; it changed the way relations are extended, and COPY
> > was one of the areas touched.  I am adding Andres in CC.
>
> Hm, is there any chance the COPY targets more than one partition?  If
> so, this sounds like it might be the issue described here:
> https://postgr.es/m/20230925213746.fwqauhhifjgefyzk%40alap3.anarazel.de

The first error was from:

log_time | 2023-10-05 09:57:01.939-05
left     | COPY postgres_log FROM '/var/log/postgresql/postgresql-2023-10-05_095200.csv' WITH csv

Unfortunately, I no longer have the CSV files which caused errors.
After I moved the broken table out of the way and created a new
partition, they would've been imported successfully, and then removed.
Also, it's sad, but the original 2023_10_05_0900 partition I created
was itself rotated out of existence a few hours ago (I still have the
most interesting lines, though).

I've seen that it's possible for a CSV to include some data that
ideally would've gone into the "next" CSV: 2023-01-01_180000.csv might
include a line of data after 6pm.  For example, with
log_rotation_age=2min, postgresql-2023-10-06_120800.csv had a row
after 12:10:

2023-10-06 12:10:00.101 CDT,"pryzbyj","pryzbyj",5581,"[local]",65203f66.15cd,2,...

But I'm not sure how that can explain this issue, because this was
095600.csv, and not 095800.csv.  My script knows to create the "next"
partition, to handle the case that the file includes some data that
should've gone to the next logfile.  I'm handling that case with the
anticipation that there might be a few tenths of a second or even a
few seconds of logs in the wrong file - typically 0 lines and
sometimes 1 line.  I don't know if it's even possible to have multiple
lines in the "wrong" file.  In any case, I'm not expecting log
rotation to be 2 minutes behind.

Also, not only was the data in the CSV earlier than 10am, but the
error *itself* was also earlier.  The error importing the CSV was at
9:57, so the CSV couldn't have had data after 10:00.  Not that it
matters, but my script doesn't import the most recent logfile, and
also avoids importing files written within the last minute.  I don't
see how a CSV with a 2 minute interval of data beginning at 9:56 could
straddle hourly partitions.

log_time | 2023-10-05 09:57:01.939-05
left     | invalid page in block 119 of relation base/16409/801594131
left     | COPY postgres_log FROM '/var/log/postgresql/postgresql-2023-10-05_095200.csv' WITH csv

log_time | 2023-10-05 09:57:01.939-05
left     | page verification failed, calculated checksum 5074 but expected 50
left     | COPY postgres_log FROM '/var/log/postgresql/postgresql-2023-10-05_095200.csv' WITH csv

--
Justin
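A rough way to check how far past its nominal window a given CSV actually runs, before importing it into a partition, would be to stage it first.  This is only a sketch: it assumes the parent table is postgres_log as in the COPY commands above, and the filename shown (which no longer exists, per the above) is just a placeholder:

CREATE TEMP TABLE csv_check (LIKE postgres_log);
COPY csv_check FROM '/var/log/postgresql/postgresql-2023-10-05_095600.csv' WITH csv;

-- time range covered by the file, and any rows that would route to the next hourly partition
SELECT min(log_time), max(log_time) FROM csv_check;
SELECT count(*) AS rows_past_the_hour FROM csv_check WHERE log_time >= '2023-10-05 10:00';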