Re: pg13.2: invalid memory alloc request size NNNN - Mailing list pgsql-hackers
From | Justin Pryzby
Subject | Re: pg13.2: invalid memory alloc request size NNNN
Date |
Msg-id | 20210212181052.GH1793@telsasoft.com
In response to | Re: pg13.2: invalid memory alloc request size NNNN (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List | pgsql-hackers
On Fri, Feb 12, 2021 at 06:44:54PM +0100, Tomas Vondra wrote:
> > (gdb) p len
> > $1 = -4
> >
> > This VM had some issue early today and I killed the VM, causing PG to execute
> > recovery.  I'm tentatively blaming that on zfs, so this could conceivably be a
> > data error (although recovery supposedly would have resolved it).  I just
> > checked and data_checksums=off.
>
> This seems very much like a corrupted varlena header - length (-4) is
> clearly bogus, and it's what triggers the problem, because that's what wraps
> around to 18446744073709551613 (which is 0xFFFFFFFFFFFFFFFD).
>
> This has to be a value stored in a table, not some intermediate value
> created during execution. So I don't think the exact query matters. Can you
> try doing something like pg_dump, which has to detoast everything?

Right, COPY fails and VACUUM FULL crashes.

message | invalid memory alloc request size 18446744073709551613
query   | COPY child.tt TO '/dev/null';

> The question is whether this is due to the VM getting killed in some strange
> way (what VM system is this, how is the storage mounted?) or whether the
> recovery is borked and failed to do the right thing.

This is qemu/kvm, with block storage:

    <driver name='qemu' type='raw' cache='none' io='native'/>
    <source dev='/dev/data/postgres'/>

And then more block devices for ZFS vdevs:

    <driver name='qemu' type='raw' cache='none' io='native'/>
    <source dev='/dev/data/zfs2'/>
    ...

Those are LVM volumes (I know that ZFS/LVM is discouraged).

$ zpool list -v
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG    CAP  DEDUP  HEALTH  ALTROOT
zfs     762G   577G   185G        -         -   71%    75%  1.00x  ONLINE  -
  vdj   127G  92.7G  34.3G        -         -   64%  73.0%      -  ONLINE
  vdd   127G  95.6G  31.4G        -         -   74%  75.2%      -  ONLINE
  vdf   127G  96.0G  31.0G        -         -   75%  75.6%      -  ONLINE
  vdg   127G  95.8G  31.2G        -         -   74%  75.5%      -  ONLINE
  vdh   127G  95.5G  31.5G        -         -   74%  75.2%      -  ONLINE
  vdi   128G   102G  25.7G        -         -   71%  79.9%      -  ONLINE

This was recently upgraded to ZFS 2.0.0, and then to 2.0.1:

Jan 21 09:33:26 Installed: zfs-dkms-2.0.1-1.el7.noarch
Dec 23 08:41:21 Installed: zfs-dkms-2.0.0-1.el7.noarch

The VM has gotten "wedged" and I've had to kill it a few times in the last 24h
(needless to say this is not normal).  That part seems like a kernel issue and
not a postgres problem.  It's unclear if that's due to me trying to tickle the
postgres ERROR.

It's the latest centos7 kernel: 3.10.0-1160.15.2.el7.x86_64

--
Justin
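[A minimal sketch of the wraparound arithmetic Tomas describes above - not PostgreSQL's actual detoast code, just the integer behaviour. The one-byte adjustment is an assumption made so the result matches the exact value in the error message.]

    /* How a corrupted signed varlena length of -4 can surface as an
     * "invalid memory alloc request size 18446744073709551613":
     * a small negative value, reinterpreted as an unsigned 64-bit
     * allocation size, wraps around near 2^64. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        int64_t  len = -4;                       /* bogus length seen in gdb */
        uint64_t request = (uint64_t)(len + 1);  /* assumed off-by-one before the alloc */

        printf("%llu (0x%llX)\n",
               (unsigned long long) request,
               (unsigned long long) request);
        /* prints: 18446744073709551613 (0xFFFFFFFFFFFFFFFD) */
        return 0;
    }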