Re: strange parallel query behavior after OOM crashes - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: strange parallel query behavior after OOM crashes
Msg-id CAEepm=25dryuuuMzG=ZonYu6dVZ55eJxEKuVCCkqP1P4w+DAqQ@mail.gmail.com
Responses Re: strange parallel query behavior after OOM crashes  (Kuntal Ghosh <kuntalghosh.2007@gmail.com>)
List pgsql-hackers
On Fri, Mar 31, 2017 at 7:38 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Hi,
>
> While doing some benchmarking, I've run into a fairly strange issue with OOM
> breaking LaunchParallelWorkers() after the restart. What I see happening is
> this:
>
> 1) a query is executed, and at the end of LaunchParallelWorkers we get
>
>     nworkers=8 nworkers_launched=8
>
> 2) the query does a Hash Aggregate, but ends up eating much more memory due
> to n_distinct underestimate (see [1] from 2015 for details), and gets killed
> by OOM
>
> 3) the server restarts, the query is executed again, but this time we get in
> LaunchParallelWorkers
>
>     nworkers=8 nworkers_launched=0
>
> There's nothing else running on the server, and there definitely should be
> free parallel workers.
>
> 4) The query gets killed again, and on the next execution we get
>
>     nworkers=8 nworkers_launched=8
>
> again, although not always. I wonder whether the exact impact depends on OOM
> killing the leader or worker, for example.

I don't know what's going on, but I think I have seen this once or
twice myself while hacking on test code that crashed.  I wonder if the
DSM_CREATE_NULL_IF_MAXSEGMENTS case is being triggered because the
DSM control is somehow confused?
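
To make that guess concrete, here is a toy model of the suspected failure
mode.  It is only a sketch, not PostgreSQL source: every name prefixed with
toy_ or TOY_ is hypothetical, and the slot count is arbitrary.  It assumes
that a segment request made with a "return NULL if the table is full" flag
quietly yields no segment when stale slots are left over after a crash, and
that the caller then launches no workers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all names below are hypothetical, not PostgreSQL APIs. */
#define TOY_MAX_SEGMENTS 4
#define TOY_CREATE_NULL_IF_MAXSEGMENTS 0x0001

typedef struct ToyControl
{
    bool in_use[TOY_MAX_SEGMENTS];  /* one flag per segment slot */
} ToyControl;

/*
 * Claim a free slot and return a pointer to its flag.  If every slot is
 * taken, return NULL when the caller passed the NULL_IF_MAXSEGMENTS flag.
 * (The toy simply asserts otherwise; a real server would raise an error.)
 */
static bool *
toy_dsm_create(ToyControl *ctl, int flags)
{
    for (size_t i = 0; i < TOY_MAX_SEGMENTS; i++)
    {
        if (!ctl->in_use[i])
        {
            ctl->in_use[i] = true;
            return &ctl->in_use[i];
        }
    }
    assert(flags & TOY_CREATE_NULL_IF_MAXSEGMENTS);
    return NULL;
}

/* Pretend a crash left every slot marked as used, with no cleanup. */
static void
toy_leak_all_slots(ToyControl *ctl)
{
    for (size_t i = 0; i < TOY_MAX_SEGMENTS; i++)
        ctl->in_use[i] = true;
}

/* Return how many of the requested workers would be launched. */
static int
toy_launch_workers(ToyControl *ctl, int nworkers)
{
    bool *seg = toy_dsm_create(ctl, TOY_CREATE_NULL_IF_MAXSEGMENTS);

    if (seg == NULL)
        return 0;           /* no shared segment, so no workers */
    return nworkers;
}
```

With a zeroed ToyControl, toy_launch_workers(&ctl, 8) returns 8; after
toy_leak_all_slots(&ctl) the same call returns 0.  Whether the real control
data can actually end up in such a state after an OOM kill is exactly the
open question.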

-- 
Thomas Munro
http://www.enterprisedb.com


