Thread: pgsql: Add some regression tests that exercise hash join code.

pgsql: Add some regression tests that exercise hash join code.

From
Andres Freund
Date:
Add some regression tests that exercise hash join code.

Although hash joins are already tested by many queries, these tests
systematically cover the four different states we can reach as part of
the strategy for respecting work_mem.

Author: Thomas Munro
Reviewed-By: Andres Freund

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/fa330f9adf4e83c0707b0b1164e7bf09c9204b3d

Modified Files
--------------
src/test/regress/expected/join.out | 457 +++++++++++++++++++++++++++++++++++++
src/test/regress/sql/join.sql      | 253 ++++++++++++++++++++
2 files changed, 710 insertions(+)


Re: pgsql: Add some regression tests that exercise hash join code.

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> Add some regression tests that exercise hash join code.

At least one buildfarm member doesn't like this ...
        regards, tom lane


Re: pgsql: Add some regression tests that exercise hash join code.

From
Thomas Munro
Date:
On Thu, Nov 30, 2017 at 4:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
>> Add some regression tests that exercise hash join code.
>
> At least one buildfarm member doesn't like this ...
 $$);  initially_multibatch | increased_batches ----------------------+-------------------
!  t                    | f (1 row)
 rollback to settings;
--- 6002,6008 ---- $$);  initially_multibatch | increased_batches ----------------------+-------------------
!                       | (1 row)

Hmm.  aholehole didn't give me the hash join EXPLAIN ANALYZE output.
All other animals are fine so far.  Gah.  My guess is that this is the
following unlikely sequence:

1.  Worker launched.
2.  Leader descheduled.
3.  Worker runs whole join.
4.  Leader awakes from slumber, begins running plan and tries to scan
outer relation, sees EOF, takes empty-outer optimisation and skips
building hash table.
5.  Explain has no data.

It's arguably a bug that EXPLAIN ANALYZE can fail to give you what you
asked for because of details of timing and I have a patch in
development to fix that but it's not yet baked.  Hmm.  Wondering if
there is a quick way to avoid that case in the meantime...

-- 
Thomas Munro
http://www.enterprisedb.com


Re: pgsql: Add some regression tests that exercise hash join code.

From
Thomas Munro
Date:
On Thu, Nov 30, 2017 at 4:47 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, Nov 30, 2017 at 4:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> Add some regression tests that exercise hash join code.
>>
>> At least one buildfarm member doesn't like this ...
>
>   $$);
>    initially_multibatch | increased_batches
>   ----------------------+-------------------
> !  t                    | f
>   (1 row)
>
>   rollback to settings;
> --- 6002,6008 ----
>   $$);
>    initially_multibatch | increased_batches
>   ----------------------+-------------------
> !                       |
>   (1 row)
>
> Hmm.  aholehole didn't give me the hash join EXPLAIN ANALYZE output.
> All other animals are fine so far.  Gah.  My guess is that this is the
> following unlikely sequence:
>
> 1.  Worker launched.
> 2.  Leader descheduled.
> 3.  Worker runs whole join.
> 4.  Leader awakes from slumber, begins running plan and tries to scan
> outer relation, sees EOF, takes empty-outer optimisation and skips
> building hash table.
> 5.  Explain has no data.
>
> It's arguably a bug that EXPLAIN ANALYZE can fail to give you what you
> asked for because of details of timing and I have a patch in
> development to fix that but it's not yet baked.  Hmm.  Wondering if
> there is a quick way to avoid that case in the meantime...

A couple more have failed in the same way.  If my theory above is
correct, the attached should fix the problem by skipping the affected
bits of the test for now.  Once we fix EXPLAIN so that it reliably
prints hash table info no matter what (for which I should have a patch
fairly soon), we can uncomment them again.  Is that an acceptable
solution for now?  BTW this commit tipped nodeHash.c from yellow to
green on coverage.postgresql.org :-)

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: pgsql: Add some regression tests that exercise hash join code.

From
Andres Freund
Date:
Hi,

On 2017-11-30 23:53:55 +1300, Thomas Munro wrote:
> On Thu, Nov 30, 2017 at 4:47 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
> > On Thu, Nov 30, 2017 at 4:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Andres Freund <andres@anarazel.de> writes:
> >>> Add some regression tests that exercise hash join code.
> >>
> >> At least one buildfarm member doesn't like this ...
> >
> >   $$);
> >    initially_multibatch | increased_batches
> >   ----------------------+-------------------
> > !  t                    | f
> >   (1 row)
> >
> >   rollback to settings;
> > --- 6002,6008 ----
> >   $$);
> >    initially_multibatch | increased_batches
> >   ----------------------+-------------------
> > !                       |
> >   (1 row)
> >
> > Hmm.  aholehole didn't give me the hash join EXPLAIN ANALYZE output.
> > All other animals are fine so far.  Gah.  My guess is that this is the
> > following unlikely sequence:
> >
> > 1.  Worker launched.
> > 2.  Leader descheduled.
> > 3.  Worker runs whole join.
> > 4.  Leader awakes from slumber, begins running plan and tries to scan
> > outer relation, sees EOF, takes empty-outer optimisation and skips
> > building hash table.
> > 5.  Explain has no data.
> >
> > It's arguably a bug that EXPLAIN ANALYZE can fail to give you what you
> > asked for because of details of timing and I have a patch in
> > development to fix that but it's not yet baked.  Hmm.  Wondering if
> > there is a quick way to avoid that case in the meantime...
> 
> A couple more have failed in the same way.  If my theory above is
> correct, the attached should fix the problem by skipping the affected
> bits of the test for now.  Once we fix EXPLAIN so that it reliably
> prints hash table info no matter what (for which I should have a patch
> fairly soon), we can uncomment them again.  Is that an acceptable
> solution for now?  BTW this commit tipped nodeHash.c from yellow to
> green on coverage.postgresql.org :-)

At the current rate of failure I'm slightly inclined to just leave the
occasionally failing test, which sounds like an actual bug imo, in place
till we have the fix.  But if there's any pushback I'll go with your
disablign patch till then.

- Andres


Re: pgsql: Add some regression tests that exercise hash join code.

From
Andres Freund
Date:
On 2017-11-30 08:06:44 -0800, Andres Freund wrote:
> Hi,
> 
> On 2017-11-30 23:53:55 +1300, Thomas Munro wrote:
> > On Thu, Nov 30, 2017 at 4:47 PM, Thomas Munro
> > <thomas.munro@enterprisedb.com> wrote:
> > > On Thu, Nov 30, 2017 at 4:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >> Andres Freund <andres@anarazel.de> writes:
> > >>> Add some regression tests that exercise hash join code.
> > >>
> > >> At least one buildfarm member doesn't like this ...
> > >
> > >   $$);
> > >    initially_multibatch | increased_batches
> > >   ----------------------+-------------------
> > > !  t                    | f
> > >   (1 row)
> > >
> > >   rollback to settings;
> > > --- 6002,6008 ----
> > >   $$);
> > >    initially_multibatch | increased_batches
> > >   ----------------------+-------------------
> > > !                       |
> > >   (1 row)
> > >
> > > Hmm.  aholehole didn't give me the hash join EXPLAIN ANALYZE output.
> > > All other animals are fine so far.  Gah.  My guess is that this is the
> > > following unlikely sequence:
> > >
> > > 1.  Worker launched.
> > > 2.  Leader descheduled.
> > > 3.  Worker runs whole join.
> > > 4.  Leader awakes from slumber, begins running plan and tries to scan
> > > outer relation, sees EOF, takes empty-outer optimisation and skips
> > > building hash table.
> > > 5.  Explain has no data.
> > >
> > > It's arguably a bug that EXPLAIN ANALYZE can fail to give you what you
> > > asked for because of details of timing and I have a patch in
> > > development to fix that but it's not yet baked.  Hmm.  Wondering if
> > > there is a quick way to avoid that case in the meantime...
> > 
> > A couple more have failed in the same way.  If my theory above is
> > correct, the attached should fix the problem by skipping the affected
> > bits of the test for now.  Once we fix EXPLAIN so that it reliably
> > prints hash table info no matter what (for which I should have a patch
> > fairly soon), we can uncomment them again.  Is that an acceptable
> > solution for now?  BTW this commit tipped nodeHash.c from yellow to
> > green on coverage.postgresql.org :-)
> 
> At the current rate of failure I'm slightly inclined to just leave the
> occasionally failing test, which sounds like an actual bug imo, in place
> till we have the fix.  But if there's any pushback I'll go with your
> disablign patch till then.

Thomas, any updates on the status of that explain fix?

Greetings,

Andres Freund


Re: pgsql: Add some regression tests that exercise hash join code.

From
Thomas Munro
Date:
On Mon, Dec 4, 2017 at 7:39 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-11-30 08:06:44 -0800, Andres Freund wrote:
>> At the current rate of failure I'm slightly inclined to just leave the
>> occasionally failing test, which sounds like an actual bug imo, in place
>> till we have the fix.  But if there's any pushback I'll go with your
>> disablign patch till then.
>
> Thomas, any updates on the status of that explain fix?

Will post today.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: pgsql: Add some regression tests that exercise hash join code.

From
Thomas Munro
Date:
On Mon, Dec 4, 2017 at 9:12 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Mon, Dec 4, 2017 at 7:39 AM, Andres Freund <andres@anarazel.de> wrote:
>> On 2017-11-30 08:06:44 -0800, Andres Freund wrote:
>>> At the current rate of failure I'm slightly inclined to just leave the
>>> occasionally failing test, which sounds like an actual bug imo, in place
>>> till we have the fix.  But if there's any pushback I'll go with your
>>> disablign patch till then.
>>
>> Thomas, any updates on the status of that explain fix?
>
> Will post today.

Here it is.  Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: pgsql: Add some regression tests that exercise hash join code.

From
Thomas Munro
Date:
On Mon, Dec 4, 2017 at 12:18 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Mon, Dec 4, 2017 at 9:12 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Mon, Dec 4, 2017 at 7:39 AM, Andres Freund <andres@anarazel.de> wrote:
>>> On 2017-11-30 08:06:44 -0800, Andres Freund wrote:
>>>> At the current rate of failure I'm slightly inclined to just leave the
>>>> occasionally failing test, which sounds like an actual bug imo, in place
>>>> till we have the fix.  But if there's any pushback I'll go with your
>>>> disablign patch till then.
>>>
>>> Thomas, any updates on the status of that explain fix?
>>
>> Will post today.
>
> Here it is.  Thoughts?

Here's a version with a better commit message.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: pgsql: Add some regression tests that exercise hash join code.

From
Andres Freund
Date:
On 2017-12-04 12:28:56 +1300, Thomas Munro wrote:
> From 43eeea0bc35204440d262224b56efca958b33428 Mon Sep 17 00:00:00 2001
> From: Thomas Munro <thomas.munro@enterprisedb.com>
> Date: Mon, 4 Dec 2017 11:52:11 +1300
> Subject: [PATCH] Fix EXPLAIN ANALYZE of hash join when the leader doesn't
>  execute.
> 
> If a hash join appears in a parallel query, there may be no hash table
> available for explain.c to inspect even though a hash table may have been
> built in other processes.  This could happen either because
> parallel_leader_participation was set to off or because the leader happened to
> hit the end of the outer relation immediately (even though the complete
> relation is not empty) and decided not to build the hash table.
> 
> Commit bf11e7ee introduced a way for workers to exchange instrumentation via
> the DSM segment for Sort nodes even though they are not parallel-aware.  This
> commit does the same for Hash nodes, so that explain.c has a way to find
> instrumentation data from an arbitrary participant that actually built the
> hash table.

Makes sense. Seems ok to only fix this inv11, even if present earlier.

> diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
> index 447f69d044e..1ffe635d66c 100644
> --- a/src/backend/commands/explain.c
> +++ b/src/backend/commands/explain.c
> @@ -2379,34 +2379,62 @@ show_sort_info(SortState *sortstate, ExplainState *es)
>  static void
>  show_hash_info(HashState *hashstate, ExplainState *es)
>  {
> -    HashJoinTable hashtable;
> +    HashInstrumentation *hinstrument = NULL;
>  
> -    hashtable = hashstate->hashtable;
> +    /*
> +     * In a parallel query, the leader process may or may not have run the
> +     * hash join, and even if it did it may not have built a hash table due to
> +     * timing (if it started late it might have seen no tuples in the outer
> +     * relation and skipped building the hash table).  Therefore we have to be
> +     * prepared to get instrumentation data from a worker if there is no hash
> +     * table.
> +     */
> +    if (hashstate->hashtable)
> +    {
> +        hinstrument = (HashInstrumentation *)
> +            palloc(sizeof(HashInstrumentation));
> +        ExecHashGetInstrumentation(hinstrument, hashstate->hashtable);
> +    }
> +    else if (hashstate->shared_info)
> +    {
> +        SharedHashInfo *shared_info = hashstate->shared_info;
> +        int        i;
> +
> +        /* Find the first worker that built a hash table. */
> +        for (i = 0; i < shared_info->num_workers; ++i)
> +        {
> +            if (shared_info->hinstrument[i].nbatch > 0)
> +            {
> +                hinstrument = &shared_info->hinstrument[i];
> +                break;
> +            }
> +        }
> +    }

I can't think of a way that, before the parallel HJ patch, results in a
difference between the size of the hashtable between workers. Short of
some participant not having a hashtable at all, that is. Wwe also don't
currently display lookup information etc. So just more or less randomly
choosing one worker's seems ok.


> +        case T_HashState:
> +            /* even when not parallel-aware */

"..., for stats" or such maybe?

> +            ExecHashEstimate((HashState *) planstate, e->pcxt);
> +            break;
>          case T_SortState:
>              /* even when not parallel-aware */

I'd just update that comment as well.


>  #endif                            /* NODEHASH_H */
> diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
> index e05bc04f525..6c322e57c00 100644
> --- a/src/include/nodes/execnodes.h
> +++ b/src/include/nodes/execnodes.h
> @@ -1980,6 +1980,29 @@ typedef struct GatherMergeState
>      struct binaryheap *gm_heap; /* binary heap of slot indices */
>  } GatherMergeState;
>  
> +/* ----------------
> + *     Values displayed by EXPLAIN ANALYZE
> + * ----------------
> + */
> +typedef struct HashInstrumentation
> +{
> +    int            nbuckets;
> +    int            nbuckets_original;
> +    int            nbatch;
> +    int            nbatch_original;
> +    size_t        space_peak;
> +} HashInstrumentation;

Maybe I'm being pedantic here, but I think it'd not hurt to explain what
these mean. It's obviously pretty clear what nbuckets/batch mean, but
the difference between original and not is far less clea.r


Looks reasonable these nitpicks aside.

Greetings,

Andres Freund


Re: pgsql: Add some regression tests that exercise hash join code.

From
Thomas Munro
Date:
On Tue, Dec 5, 2017 at 9:21 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-12-04 12:28:56 +1300, Thomas Munro wrote:
>> From 43eeea0bc35204440d262224b56efca958b33428 Mon Sep 17 00:00:00 2001
>> From: Thomas Munro <thomas.munro@enterprisedb.com>
>> Date: Mon, 4 Dec 2017 11:52:11 +1300
>> Subject: [PATCH] Fix EXPLAIN ANALYZE of hash join when the leader doesn't
>>  execute.
>>
>> If a hash join appears in a parallel query, there may be no hash table
>> available for explain.c to inspect even though a hash table may have been
>> built in other processes.  This could happen either because
>> parallel_leader_participation was set to off or because the leader happened to
>> hit the end of the outer relation immediately (even though the complete
>> relation is not empty) and decided not to build the hash table.
>>
>> Commit bf11e7ee introduced a way for workers to exchange instrumentation via
>> the DSM segment for Sort nodes even though they are not parallel-aware.  This
>> commit does the same for Hash nodes, so that explain.c has a way to find
>> instrumentation data from an arbitrary participant that actually built the
>> hash table.
>
> Makes sense. Seems ok to only fix this inv11, even if present earlier.

+1

I doubt any v10 users will notice.  v11 makes it more noticeable
because it has the new parallel_leader_participation GUC and these new
regression tests which hit the racy case surprisingly often when
running on somebody's toaster in the buildfarm.

> I can't think of a way that, before the parallel HJ patch, results in a
> difference between the size of the hashtable between workers. Short of
> some participant not having a hashtable at all, that is. Wwe also don't
> currently display lookup information etc. So just more or less randomly
> choosing one worker's seems ok.

Right.  Finding the first arbitrary backend that actually built the
hash table and using its instrumentation data works OK for now because
they each do either exactly the same hash build or nothing.  Even with
the later PHJ patch the batch/buckets sizing info will always be the
same across all workers that built the hash table, so it just needs
some special handling to choose what peak size to report (or some more
elaborate information on space usage, to be bikeshedded later).

>> +             case T_HashState:
>> +                     /* even when not parallel-aware */
>
> "..., for stats" or such maybe?

How about ", for EXPLAIN ANALYZE"?  Done that way.

>> +                     ExecHashEstimate((HashState *) planstate, e->pcxt);
>> +                     break;
>>               case T_SortState:
>>                       /* even when not parallel-aware */
>
> I'd just update that comment as well.

Done.

Looking for all the places where Robert has that comment, I realised
that I probably also needed to supply ExecHashReInitializeDSM() to zap
the instrumentation array for each loop.  Done.

>> +/* ----------------
>> + *    Values displayed by EXPLAIN ANALYZE
>> + * ----------------
>> + */
>> +typedef struct HashInstrumentation
>> +{
>> +     int                     nbuckets;
>> +     int                     nbuckets_original;
>> +     int                     nbatch;
>> +     int                     nbatch_original;
>> +     size_t          space_peak;
>> +} HashInstrumentation;
>
> Maybe I'm being pedantic here, but I think it'd not hurt to explain what
> these mean. It's obviously pretty clear what nbuckets/batch mean, but
> the difference between original and not is far less clea.r

Fixed.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: pgsql: Add some regression tests that exercise hash join code.

From
Andres Freund
Date:
Hi,

On 2017-12-05 12:59:21 +1300, Thomas Munro wrote:
>  src/backend/commands/explain.c      |  58 ++++++++++++++------

This didn't actually include nodeHash.h, therefore
ExecHashGetInstrumentation() didn't have a prototype. I added the
relevant include and remove the hashjoin.h include which isn't required
anymore.

Pushed, thanks for the patch. Let's see what the buildfarm says...

- Andres


Re: pgsql: Add some regression tests that exercise hash join code.

From
Thomas Munro
Date:
On Wed, Dec 6, 2017 at 8:03 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-12-05 12:59:21 +1300, Thomas Munro wrote:
>>  src/backend/commands/explain.c      |  58 ++++++++++++++------
>
> This didn't actually include nodeHash.h, ...

Ugh, I'd lost -Werror from my COPT.  Back in there it goes.

> Pushed, thanks for the patch. Let's see what the buildfarm says...

Thank you.  I'm keeping an eye on it.

By the way, the current coding of ExecSortRetrieveInstrumentation()
and now also ExecHashRetrieveInstrumentation() is wrong in the case of
rescans, since they get called multiple times but are written in a way
that assumes they'll be called only once:

https://www.postgresql.org/message-id/CAA4eK1JBj4YCEQKeTua5%3DBMXy7zW7zNOvoXomzBP%3Dkb_aqMF7w%40mail.gmail.com

Amit's patch would move ExecParallelRetrieveInstrumentation() so that
they are called only once, so that would fix that problem without any
change to this code.  More over on that thread.

-- 
Thomas Munro
http://www.enterprisedb.com