Thread: Re: dsa_allocate() failure

Re: dsa_allocate() failure

From: Thomas Munro
On Tue, Nov 27, 2018 at 7:45 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2018-Nov-26, Jakub Glapa wrote:
> > Justin thanks for the information!
> > I'm running Ubuntu 16.04.
> > I'll try to prepare for the next crash.
> > Couldn't find anything this time.
>
> As I recall, the apport stuff in Ubuntu is terrible ... I've seen it
> take 40 minutes to write the crash dump to disk, during which the
> database was "down".  I don't know why it is so slow (it's a rather
> silly python script that apparently processes the core dump one byte at
> a time, and you can imagine that with a few gigabytes of shared memory
> that takes a while).  Anyway my recommendation was to *remove* that
> stuff from the server and make sure the core file is saved by normal
> means.

Thanks for CC-ing me.  I didn't see this thread earlier because I'm
not subscribed to -performance.  Let's move it over to -hackers since
it looks like it's going to be a debugging exercise.  So, reading
through the thread[1], I think there might be two independent problems
here:

1.  Jakub has a many-partition Parallel Bitmap Heap Scan query that
segfaults when run with max_parallel_workers = 0.  That sounds
suspiciously like an instance of a class of bug we've run into before.
We planned a parallel query, but were unable to launch one due to lack
of DSM slots or process slots, so we run the parallel plan in a kind
of degraded non-parallel mode that needs to cope with various pointers
into shared memory being NULL.  A back trace from a core file should
hopefully make it very obvious what's going on.

2.  The same query when run in real parallel query mode occasionally
reaches an error "dsa_allocate could not find 7 free pages", which
should not happen.  This is on 10.6, so it has the commit "Fix
segment_bins corruption in dsa.c.".

Hmm.  I will see if I can come up with a many-partition torture test
reproducer for this.
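
To make the degraded-mode scenario in point 1 concrete, here is a
minimal standalone sketch, with invented names rather than anything
from the real executor: a parallel-aware node keeps a pointer to
coordination state that would normally live in shared memory, that
pointer is simply NULL when the plan runs without workers, and every
access has to tolerate that.  The back trace should show which access
is missing a check of this kind.

/*
 * Standalone sketch (invented names, not PostgreSQL source): a
 * parallel-aware scan keeps its coordination state in shared memory.
 * When the plan was parallel but no DSM segment or workers could be
 * obtained, that state is NULL and the node falls back to purely
 * backend-local state.
 */
#include <stdio.h>

typedef struct SharedScanState
{
    int next_block;             /* next block to hand out (shared) */
} SharedScanState;

typedef struct ScanState
{
    SharedScanState *shared;    /* NULL in degraded non-parallel mode */
    int local_next_block;       /* used when shared == NULL */
    int nblocks;
} ScanState;

/* Return the next block to scan, or -1 when the scan is done. */
static int
scan_get_next_block(ScanState *node)
{
    if (node->shared == NULL)
    {
        /* Degraded mode: no shared memory, use the local counter. */
        if (node->local_next_block >= node->nblocks)
            return -1;
        return node->local_next_block++;
    }
    /* Parallel mode: the real code would use atomics or a lock here. */
    if (node->shared->next_block >= node->nblocks)
        return -1;
    return node->shared->next_block++;
}

int
main(void)
{
    ScanState node = {NULL, 0, 3};      /* no shared state available */
    int blk;

    while ((blk = scan_get_next_block(&node)) != -1)
        printf("scanning block %d\n", blk);
    return 0;
}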

[1] https://www.postgresql.org/message-id/flat/CAJk1zg10iCNsxFvQ4pgKe1B0rdjNG9iELA7AzLXjXnQm5T%3DKzQ%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com


Re: dsa_allocate() failure

From: Thomas Munro
On Tue, Nov 27, 2018 at 4:00 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Hmm.  I will see if I can come up with a many-partition torture test
> reproducer for this.

No luck.  I suppose one theory that could link both failure modes
would be a buffer overrun, where in the non-shared case it trashes a
pointer that is later dereferenced, and in the shared case it writes
past the end of allocated 4KB pages and corrupts the intrusive btree
that lives in spare pages to track available space.
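
Just to spell out why a single overrun could explain both symptoms,
here is a deliberately simplistic toy program, nothing to do with the
real dsa.c layout: a write that runs past the end of a buffer silently
corrupts whatever bookkeeping happens to live next to it, and nothing
fails until that bookkeeping is read much later.

/*
 * Toy illustration only (not PostgreSQL code): bookkeeping that lives
 * right after a buffer is clobbered because a write uses the wrong
 * length, and the damage only shows up later when the bookkeeping is
 * read.
 */
#include <stdio.h>
#include <string.h>

struct region
{
    char buf[16];               /* the buffer a caller is given */
    int  free_pages;            /* stands in for adjacent metadata */
};

int
main(void)
{
    struct region r;

    r.free_pages = 7;
    /* Meant to clear the 16-byte buffer, but the length is wrong. */
    memset(r.buf, 0, sizeof(r));        /* runs past buf into free_pages */
    printf("free_pages is now %d\n", r.free_pages);     /* 0, not 7 */
    return 0;
}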

--
Thomas Munro
http://www.enterprisedb.com


Re: dsa_allocate() failure

From: Jakub Glapa
Hi, just a small update.
I've configured the OS to take crash dumps on Ubuntu 16.04 with the following (maybe somebody will find it helpful):
I added LimitCORE=infinity to /lib/systemd/system/postgresql@.service under the [Service] section.
I reloaded the service config with sudo systemctl daemon-reload.
I changed the core pattern with: sudo echo /var/lib/postgresql/core.%p.sig%s.%ts | tee -a /proc/sys/kernel/core_pattern
I tested it with kill -ABRT on a backend pid and it behaved correctly: a crash dump was written.

In the last few days of monitoring, no segfault has occurred, but the dsa_allocate error has.
I'm starting to doubt whether the segfault I found in dmesg was actually related.

I've grepped the postgres log for dsa_allocate:
Why do the messages occur sometimes as FATAL and sometimes as ERROR?

2018-11-29 07:59:06 CET::@:[20584]: FATAL:  dsa_allocate could not find 7 free pages
2018-11-29 07:59:06 CET:127.0.0.1(40846):user@db:[19507]: ERROR:  dsa_allocate could not find 7 free pages
2018-11-30 09:04:13 CET::@:[27341]: FATAL:  dsa_allocate could not find 13 free pages
2018-11-30 09:04:13 CET:127.0.0.1(41782):user@db:[25417]: ERROR:  dsa_allocate could not find 13 free pages
2018-11-30 09:28:38 CET::@:[30215]: FATAL:  dsa_allocate could not find 4 free pages
2018-11-30 09:28:38 CET:127.0.0.1(45980):user@db:[29924]: ERROR:  dsa_allocate could not find 4 free pages
2018-11-30 16:37:16 CET::@:[14385]: FATAL:  dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET::@:[14375]: FATAL:  dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(55004):user@db:[14386]: FATAL:  dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(54964):user@db:[14379]: ERROR:  dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(54916):user@db:[14370]: ERROR:  dsa_allocate could not find 7 free pages
2018-11-30 16:45:11 CET:212.186.105.45(55356):user@db:[14555]: FATAL:  dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET::@:[15359]: FATAL:  dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET::@:[15363]: FATAL:  dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(54964):user@db:[14379]: FATAL:  dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(54916):user@db:[14370]: ERROR:  dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(55842):user@db:[14815]: ERROR:  dsa_allocate could not find 7 free pages
2018-11-30 16:56:11 CET:212.186.105.45(57076):user@db:[15638]: FATAL:  dsa_allocate could not find 7 free pages


There are quite a few errors from today, but I was launching the problematic query in parallel from 2-3 sessions.
Sometimes it broke, sometimes not.
I couldn't find any pattern.
The workload on this db is not really constant, rather bursty.

--
regards,
Jakub Glapa


Re: dsa_allocate() failure

From: Justin Pryzby
On Fri, Nov 30, 2018 at 08:20:49PM +0100, Jakub Glapa wrote:
> In the last few days of monitoring, no segfault has occurred, but the
> dsa_allocate error has.
> I'm starting to doubt whether the segfault I found in dmesg was actually
> related.

The dmesg looks like a real crash, not just OOM.  You can hopefully find the
timestamp of the segfaults in /var/log/syslog, and compare with postgres logs
if they go back far enough.  All the postgres processes except the parent
would've been restarted at that time.

> I've grepped the postgres log for dsa_allocate:
> Why do the messages occur sometimes as FATAL and sometimes as ERROR?

I believe it may depend on whether it happens in a parallel worker or in the leader.

You may get more log detail if you enable CSV logging (although unfortunately,
as I recall, it doesn't indicate that it's a parallel worker).

You could force it to dump core by recompiling postgres with an abort()
call added (see patch below).

You could build a .deb by running dpkg-buildpackage -rfakeroot or similar (I
haven't done this in a while), or you could compile, install, and launch
debugging binaries from your homedir (or similar).

You'd want to compile the same version (git checkout REL_10_6) with the proper
configure flags, perhaps starting with:
./configure --with-libxml --with-libxslt --enable-debug --prefix=$HOME/src/postgresql.bin --enable-cassert && time make && make install

If you have extensions installed, be careful that they still work.

Justin

--- a/src/backend/utils/mmgr/dsa.c
+++ b/src/backend/utils/mmgr/dsa.c
@@ -727,4 +727,7 @@ dsa_allocate_extended(dsa_area *area, size_t size, int flags)
                if (!FreePageManagerGet(segment_map->fpm, npages, &first_page))
+               {
                        elog(FATAL,
                                 "dsa_allocate could not find %zu free pages", npages);
+                       abort();
+               }



Re: dsa_allocate() failure

From: Thomas Munro
On Sat, Dec 1, 2018 at 9:46 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>                         elog(FATAL,
>                                  "dsa_allocate could not find %zu free pages", npages);
> +                       abort();

If anyone can reproduce this problem with a debugger, it'd be
interesting to see the output of dsa_dump(area) and
FreePageManagerDump(segment_map->fpm).  This error condition means
that get_best_segment() selected a segment from a segment bin that is
supposed to hold only segments with at least npages contiguous free
pages, but then FreePageManagerGet() found, when it consulted the
segment's btree of free space, that the segment didn't have npages of
contiguous free memory after all.  Possible explanations include: the
segment bin lists are somehow messed up; the FPM in the segment was
corrupted by someone scribbling on free pages (which hold the btree);
the btree was corrupted by an incorrect sequence of allocate/free
calls (for example double frees, or allocating from one area and
freeing to another); or freepage.c fails to track its largest
contiguous run correctly.
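
To make that concrete, here is a simplified standalone model with
invented structures (not dsa.c or freepage.c): a segment is filed in a
bin according to the largest contiguous run of free pages it claims to
have, an allocation of npages trusts that claim, and recomputing the
truth from the segment's own free-page map catches the kind of
mismatch that "could not find N free pages" corresponds to.

/*
 * Simplified standalone model (invented structures, not dsa.c or
 * freepage.c): each segment claims a largest contiguous run of free
 * pages, which determines its bin; the check recomputes the real run
 * from the free-page map and reports any mismatch.
 */
#include <stdbool.h>
#include <stdio.h>

#define SEGMENT_PAGES 64

typedef struct Segment
{
    bool   page_free[SEGMENT_PAGES];
    size_t claimed_largest_run;     /* what bin placement is based on */
} Segment;

/* Recompute the real largest contiguous run of free pages. */
static size_t
actual_largest_run(const Segment *seg)
{
    size_t best = 0, run = 0;

    for (int i = 0; i < SEGMENT_PAGES; i++)
    {
        run = seg->page_free[i] ? run + 1 : 0;
        if (run > best)
            best = run;
    }
    return best;
}

/* Cross-check the claimed value against reality. */
static void
check_segment(const Segment *seg)
{
    size_t actual = actual_largest_run(seg);

    if (actual < seg->claimed_largest_run)
        printf("inconsistent: claims %zu contiguous free pages, has %zu\n",
               seg->claimed_largest_run, actual);
    else
        printf("consistent: claims %zu, has %zu\n",
               seg->claimed_largest_run, actual);
}

int
main(void)
{
    Segment seg = {{false}, 0};

    for (int i = 10; i < 17; i++)       /* 7 genuinely free pages */
        seg.page_free[i] = true;
    seg.claimed_largest_run = 7;
    check_segment(&seg);                /* consistent */

    seg.page_free[13] = false;          /* simulate corrupted free space */
    check_segment(&seg);                /* still claims 7, now has only 3 */
    return 0;
}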

There is a macro FPM_EXTRA_ASSERTS that can be defined to double-check
the largest contiguous page tracking.  I have also been wondering
about a debug mode that would mprotect(PROT_READ) free pages when they
aren't being modified to detect unexpected writes, which should work
on systems that have 4k pages.
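
A rough standalone sketch of that idea, using a plain anonymous mmap
rather than DSM and assuming a 4KB page size: pages the allocator
considers free stay PROT_READ, so a stray write to them faults
immediately at the offending store instead of surfacing later as
corrupted free-space metadata.

/*
 * Standalone sketch of the mprotect() debugging idea (plain anonymous
 * mmap rather than DSM, 4KB pages assumed): free pages are kept
 * read-only, so an unexpected write to them crashes at the offending
 * store instead of showing up later as corrupted metadata.
 */
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define PAGE_SIZE 4096

static void
mark_page_free(char *page)
{
    /* Free pages become read-only until the allocator reuses them. */
    if (mprotect(page, PAGE_SIZE, PROT_READ) != 0)
        perror("mprotect");
}

static void
mark_page_in_use(char *page)
{
    /* Pages handed out to a caller may be written again. */
    if (mprotect(page, PAGE_SIZE, PROT_READ | PROT_WRITE) != 0)
        perror("mprotect");
}

int
main(void)
{
    char *pages = mmap(NULL, 4 * PAGE_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (pages == MAP_FAILED)
    {
        perror("mmap");
        return 1;
    }

    mark_page_free(pages);                  /* page 0 is now "free" */
    mark_page_in_use(pages + PAGE_SIZE);    /* page 1 may be written */

    memset(pages + PAGE_SIZE, 0, PAGE_SIZE);    /* fine */
    /* memset(pages, 0, PAGE_SIZE); */  /* would SIGSEGV: page 0 is free */
    return 0;
}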

One thing I noticed is that it is failing on a "large" allocation,
where we go straight to the btree of 4k pages, but the equivalent code
where we allocate a superblock for "small" allocations doesn't report
the same kind of FATAL this-can't-happen error; it just fails the
allocation via the regular error path without explanation.  I also
spotted a path that doesn't respect the DSA_ALLOC_NO_OOM flag (you get
a null pointer instead of an error).  I should fix those
inconsistencies (draft patch attached), but those are incidental
problems AFAIK.
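
For anyone following along, here is a minimal sketch of the flag
convention with a made-up allocator, not the dsa_allocate_extended()
API itself: with a NO_OOM-style flag the caller gets NULL back on
failure and is expected to handle it; without the flag, the failure is
supposed to be reported through the error path, rather than handed
back as a surprise null pointer the way the path mentioned above does.

/* Sketch only: stand-in for an allocator with dsa-style flags. */
#include <stdio.h>
#include <stdlib.h>

#define ALLOC_NO_OOM 0x01       /* caller wants NULL back on failure */

static void *
alloc_extended(size_t size, int flags)
{
    void *p = malloc(size);

    if (p == NULL && (flags & ALLOC_NO_OOM) == 0)
    {
        /* Stand-in for the regular out-of-memory error path. */
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    return p;                   /* NULL only if ALLOC_NO_OOM was passed */
}

int
main(void)
{
    /* An absurd request that is guaranteed to fail. */
    void *p = alloc_extended((size_t) -1, ALLOC_NO_OOM);

    if (p == NULL)
        printf("allocation failed, caller handles it\n");
    return 0;
}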

-- 
Thomas Munro
http://www.enterprisedb.com
