Re: PANIC: could not flush dirty data: Cannot allocate memory - Mailing list pgsql-general

From klaus.mailinglists@pernau.at
Subject Re: PANIC: could not flush dirty data: Cannot allocate memory
Date
Msg-id 4eeb184a1f907c0deab774429602568b@pernau.at
In response to Re: PANIC: could not flush dirty data: Cannot allocate memory  (klaus.mailinglists@pernau.at)
Responses Re: PANIC: could not flush dirty data: Cannot allocate memory  (klaus.mailinglists@pernau.at)
List pgsql-general
Hello all!


Thanks for the many hints on what to look for. We did some tuning and
further debugging; here are the outcomes, answering all questions in a
single email.


> In the meantime, you could experiment with setting 
> checkpoint_flush_after to 0
We did this:
# SHOW checkpoint_flush_after;
  checkpoint_flush_after
------------------------
  0
(1 row)

But we STILL have PANICs. I tried to understand the code but failed. My 
guess is that there are some code paths which call pg_flush_data() without 
checking this setting, or that the check does not work.
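
If I read the code in src/backend/storage/file/fd.c correctly,
checkpoint_flush_after only gates the checkpointer's writeback hints;
bgwriter_flush_after and backend_flush_after gate other writeback paths,
and there also seem to be callers of pg_flush_data() (e.g. the
directory-copy and directory-sync paths) that are not controlled by any
of these settings. Once pg_flush_data() is reached, any sync_file_range()
failure other than ENOSYS appears to be escalated to PANIC unless
data_sync_retry is on. Below is a simplified, compilable sketch of that
escalation as I understand it; it is not the actual PostgreSQL source,
and the function names are mine:

/*
 * Simplified sketch of the flush error escalation (modelled on
 * pg_flush_data() and data_sync_elevel() in fd.c; not the real source).
 */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static bool data_sync_retry = false;   /* the real GUC defaults to off */

/* roughly what data_sync_elevel(WARNING) decides */
static const char *
flush_error_level(int err)
{
    if (err == ENOSYS)
        return "WARNING";              /* syscall not implemented: warn only */
    return data_sync_retry ? "WARNING" : "PANIC";
}

static void
flush_dirty_data(int fd, off_t offset, off_t nbytes)
{
    /* same syscall and flag PostgreSQL uses on Linux */
    if (sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE) != 0)
        fprintf(stderr, "%s:  could not flush dirty data: %s\n",
                flush_error_level(errno), strerror(errno));
}

int
main(void)
{
    int fd = open("flush_demo.tmp", O_CREAT | O_RDWR, 0600);

    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    flush_dirty_data(fd, 0, 0);        /* nbytes = 0: from offset to EOF */
    close(fd);
    unlink("flush_demo.tmp");
    return 0;
}

So if the kernel returns ENOMEM from sync_file_range(), that alone would
explain "PANIC: could not flush dirty data: Cannot allocate memory",
regardless of checkpoint_flush_after.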



> Did this start after upgrading to 22.04? Or after a certain kernel 
> upgrade?

It definitely only started with Ubuntu 22.04. We did not have, and still 
do not have, any issues on servers running Ubuntu 20.04 or 18.04.


> I would believe that the kernel would raise
> a bunch of printks if it hit ENOMEM in the commonly used paths, so
> you would see something in dmesg or wherever you collect your kernel
> log if it happened where it was expected.

There is nothing in the kernel logs (dmesg).
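
If it helps, here is an untested standalone sketch that dirties some page
cache on the affected filesystem and then issues the same sync_file_range()
call, to check whether the kernel ever returns ENOMEM outside of PostgreSQL
(file name and sizes are arbitrary):

/*
 * Untested sketch: write ~80 MB to a scratch file, then request
 * asynchronous writeback with sync_file_range() and report errno.
 */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "flushtest.tmp";
    char        buf[8192];
    int         fd;

    memset(buf, 'x', sizeof(buf));

    fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    /* dirty ~80 MB of page cache */
    for (int i = 0; i < 10240; i++)
    {
        if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
        {
            perror("write");
            return 1;
        }
    }

    /* nbytes = 0: ask for writeback of everything from offset 0 to EOF */
    if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE) != 0)
        fprintf(stderr, "sync_file_range failed: %s (errno=%d)\n",
                strerror(errno), errno);
    else
        fprintf(stderr, "sync_file_range succeeded\n");

    close(fd);
    unlink(path);
    return 0;
}

Running that in a loop while the hosts are under their normal load might
show whether the ENOMEM is reproducible without anything appearing in
dmesg.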


> Do you use cgroups or such to limit memory usage of postgres?

No.


> Any uncommon options on the filesystem or the mount point?
No. Also no antivirus:
/dev/xvda2 / ext4 noatime,nodiratime,errors=remount-ro 0 1
or
LABEL=cloudimg-rootfs   /        ext4   discard,errors=remount-ro       0 1


> does this happen on all the hosts, or is it limited to one host or one 
> technology?

It happens on Xen VMs, KVM VMs and VMware VMs, on both Intel and AMD 
platforms.


> Another interesting thing would be to know the mount and file system 
> options
> for the FS that triggers the failures. E.g.

# tune2fs -l /dev/sda1
tune2fs 1.46.5 (30-Dec-2021)
Filesystem volume name:   cloudimg-rootfs
Last mounted on:          /
Filesystem UUID:          0522e6b3-8d40-4754-a87e-5678a6921e37
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index 
filetype needs_recovery extent 64bit flex_bg encrypt sparse_super 
large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              12902400
Block count:              26185979
Reserved block count:     0
Overhead clusters:        35096
Free blocks:              18451033
Free inodes:              12789946
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Reserved GDT blocks:      243
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16128
Inode blocks per group:   1008
Flex block group size:    16
Filesystem created:       Wed Apr 20 18:31:24 2022
Last mount time:          Thu Nov 10 09:49:34 2022
Last write time:          Thu Nov 10 09:49:34 2022
Mount count:              7
Maximum mount count:      -1
Last checked:             Wed Apr 20 18:31:24 2022
Check interval:           0 (<none>)
Lifetime writes:          252 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
First orphan inode:       42571
Default directory hash:   half_md4
Directory Hash Seed:      c5ef129b-fbee-4f35-8f28-ad7cc93c1c43
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0xb74ebbc3


Thanks
Klaus



