Re: weird hash plan cost, starting with pg10 - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: weird hash plan cost, starting with pg10
Date
Msg-id CA+hUKGKoxhHo07+CoQs30dyM1rtCaRxndhDK1RfBvLvNFcwjoQ@mail.gmail.com
In response to Re: weird hash plan cost, starting with pg10  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: weird hash plan cost, starting with pg10  (Richard Guo <guofenglinux@gmail.com>)
List pgsql-hackers
On Tue, Mar 24, 2020 at 9:55 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Tue, Mar 24, 2020 at 6:01 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > > While messing with EXPLAIN on a query emitted by pg_dump, I noticed that
> > > current Postgres 10 emits weird bucket/batch/memory values for certain
> > > hash nodes:
> >
> > >                          ->  Hash  (cost=0.11..0.11 rows=10 width=12) (actual time=0.002..0.002 rows=1 loops=8)
> > >                                Buckets: 2139062143  Batches: 2139062143  Memory Usage: 8971876904722400kB
> > >                                ->  Function Scan on unnest init_1  (cost=0.01..0.11 rows=10 width=12) (actual time=0.001..0.001 rows=1 loops=8)
> >
> > Looks suspiciously like uninitialized memory ...
>
> I think "hashtable" might have been pfree'd before
> ExecHashGetInstrumentation() ran, because those numbers look like
> CLOBBER_FREED_MEMORY's pattern:
>
> >>> hex(2139062143)
> '0x7f7f7f7f'
> >>> hex(8971876904722400 / 1024)
> '0x7f7f7f7f7f7'
>
> Maybe there is something wrong with the shutdown order of nested subplans.
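
For anyone re-running that check on Python 3 (where `/` yields a float and `hex()` rejects it), here is a minimal sketch of the same arithmetic. It assumes the bucket and batch counts are 4-byte ints, the peak-space counter is an 8-byte size_t, and EXPLAIN rounds bytes up to kB; those widths and the rounding are my assumptions, not something stated in the quoted text.

```python
# Freed memory in a CLOBBER_FREED_MEMORY build is filled with 0x7F bytes.
CLOBBER_BYTE = 0x7F


def clobbered_value(width_bytes: int) -> int:
    """Integer read back from a field of the given width after the 0x7F fill."""
    return int.from_bytes(bytes([CLOBBER_BYTE]) * width_bytes, "little")


# Assumption: nbuckets / nbatch are 4-byte ints.
buckets = clobbered_value(4)

# Assumption: the peak-space counter is an 8-byte size_t, and the
# "Memory Usage" figure is that byte count rounded up to whole kB.
space_peak = clobbered_value(8)
memory_kb = (space_peak + 1023) // 1024

print(buckets, hex(buckets))
print(memory_kb, hex(space_peak))

# Python 3 spelling of the quoted check: integer division instead of "/".
print(hex(8971876904722400 // 1024))
```

Under those assumptions both computed figures match the EXPLAIN output quoted above.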

I think there might be a case like this:

* ExecRescanHashJoin() decides it can't reuse the hash table for a
rescan, so it calls ExecHashTableDestroy(), clears HashJoinState's
hj_HashTable and sets hj_JoinState to HJ_BUILD_HASHTABLE
* the HashState node still has a reference to the pfree'd HashJoinTable!
* HJ_BUILD_HASHTABLE case reaches the empty-outer optimisation case so
it doesn't bother to build a new hash table
* EXPLAIN examines the HashState's pointer to a freed HashJoinTable struct
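
The sequence above can be sketched as a toy model. This is not PostgreSQL code: the Python classes only borrow the names from the list, the table sizes are made up, and `hash_table_destroy()` overwrites fields with the clobber pattern as a stand-in for pfree() in a CLOBBER_FREED_MEMORY build.

```python
CLOBBER = 0x7F7F7F7F  # what a 4-byte int reads as after a 0x7F fill


class HashJoinTable:
    def __init__(self, nbuckets, nbatch):
        self.nbuckets = nbuckets
        self.nbatch = nbatch


def hash_table_destroy(table):
    # Stand-in for pfree(): Python keeps the object alive, so we overwrite
    # its fields to mimic reading freed, clobbered memory.
    table.nbuckets = CLOBBER
    table.nbatch = CLOBBER


class HashState:
    def __init__(self):
        self.hashtable = None  # second reference to the join's hash table


class HashJoinState:
    def __init__(self, hash_node):
        self.hash_node = hash_node
        self.hj_HashTable = None
        self.hj_JoinState = "HJ_BUILD_HASHTABLE"

    def build(self, outer_is_empty):
        if outer_is_empty:
            return  # empty-outer case: no new hash table is built
        table = HashJoinTable(nbuckets=1024, nbatch=1)  # made-up sizes
        self.hj_HashTable = table
        self.hash_node.hashtable = table
        self.hj_JoinState = "HJ_NEED_NEW_OUTER"

    def rescan(self):
        if self.hj_HashTable is not None:
            hash_table_destroy(self.hj_HashTable)
            self.hj_HashTable = None
            self.hj_JoinState = "HJ_BUILD_HASHTABLE"
            # Note: self.hash_node.hashtable is NOT cleared here.


def explain(hash_node):
    table = hash_node.hashtable
    if table is None:
        return "no hash table"
    return f"Buckets: {table.nbuckets}  Batches: {table.nbatch}"


hash_node = HashState()
join = HashJoinState(hash_node)
join.build(outer_is_empty=False)  # first scan builds the table
join.rescan()                     # table destroyed, join's pointer cleared
join.build(outer_is_empty=True)   # rescan hits the empty-outer shortcut
print(explain(hash_node))         # reads through the stale reference
```

The join node ends up with no table at all, while the Hash node still reports values from the destroyed one.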

You could fix the dangling pointer problem by clearing it, but then
you'd have no data for EXPLAIN to show in this case.  Some other
solution is probably needed, but I didn't have time to dig further
today.
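
To make the trade-off concrete, here is a small sketch in the same toy style. The first variant just clears the pointer, which leaves EXPLAIN with nothing to report. The second copies the numbers out before the table goes away; that is only one hypothetical direction for illustration, not a proposal from this message, and the `saved_stats` field and both function names are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HashStats:
    nbuckets: int
    nbatch: int


@dataclass
class HashNode:
    hashtable: Optional[HashStats] = None    # stand-in for the live table
    saved_stats: Optional[HashStats] = None  # hypothetical snapshot field


def destroy_and_clear(node: HashNode) -> None:
    # Fixes the dangling pointer, but the numbers are gone.
    node.hashtable = None


def destroy_with_snapshot(node: HashNode) -> None:
    # Hypothetical alternative: copy the numbers out before destroying.
    if node.hashtable is not None:
        node.saved_stats = HashStats(node.hashtable.nbuckets,
                                     node.hashtable.nbatch)
    node.hashtable = None


def explain(node: HashNode) -> str:
    stats = node.hashtable if node.hashtable is not None else node.saved_stats
    if stats is None:
        return "Hash (no bucket/batch data)"
    return f"Buckets: {stats.nbuckets}  Batches: {stats.nbatch}"


cleared = HashNode(hashtable=HashStats(1024, 1))  # made-up sizes
destroy_and_clear(cleared)
print(explain(cleared))

snap = HashNode(hashtable=HashStats(1024, 1))
destroy_with_snapshot(snap)
print(explain(snap))
```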
