Re: pgsql: Add some regression tests that exercise hash join code. - Mailing list pgsql-committers

From Andres Freund
Subject Re: pgsql: Add some regression tests that exercise hash join code.
Date
Msg-id 20171204202142.spwkwc545vvxmed7@alap3.anarazel.de
Whole thread Raw
In response to Re: pgsql: Add some regression tests that exercise hash join code.  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: pgsql: Add some regression tests that exercise hash join code.  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-committers
On 2017-12-04 12:28:56 +1300, Thomas Munro wrote:
> From 43eeea0bc35204440d262224b56efca958b33428 Mon Sep 17 00:00:00 2001
> From: Thomas Munro <thomas.munro@enterprisedb.com>
> Date: Mon, 4 Dec 2017 11:52:11 +1300
> Subject: [PATCH] Fix EXPLAIN ANALYZE of hash join when the leader doesn't
>  execute.
> 
> If a hash join appears in a parallel query, there may be no hash table
> available for explain.c to inspect even though a hash table may have been
> built in other processes.  This could happen either because
> parallel_leader_participation was set to off or because the leader happened to
> hit the end of the outer relation immediately (even though the complete
> relation is not empty) and decided not to build the hash table.
> 
> Commit bf11e7ee introduced a way for workers to exchange instrumentation via
> the DSM segment for Sort nodes even though they are not parallel-aware.  This
> commit does the same for Hash nodes, so that explain.c has a way to find
> instrumentation data from an arbitrary participant that actually built the
> hash table.

Makes sense. Seems ok to only fix this inv11, even if present earlier.

> diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
> index 447f69d044e..1ffe635d66c 100644
> --- a/src/backend/commands/explain.c
> +++ b/src/backend/commands/explain.c
> @@ -2379,34 +2379,62 @@ show_sort_info(SortState *sortstate, ExplainState *es)
>  static void
>  show_hash_info(HashState *hashstate, ExplainState *es)
>  {
> -    HashJoinTable hashtable;
> +    HashInstrumentation *hinstrument = NULL;
>  
> -    hashtable = hashstate->hashtable;
> +    /*
> +     * In a parallel query, the leader process may or may not have run the
> +     * hash join, and even if it did it may not have built a hash table due to
> +     * timing (if it started late it might have seen no tuples in the outer
> +     * relation and skipped building the hash table).  Therefore we have to be
> +     * prepared to get instrumentation data from a worker if there is no hash
> +     * table.
> +     */
> +    if (hashstate->hashtable)
> +    {
> +        hinstrument = (HashInstrumentation *)
> +            palloc(sizeof(HashInstrumentation));
> +        ExecHashGetInstrumentation(hinstrument, hashstate->hashtable);
> +    }
> +    else if (hashstate->shared_info)
> +    {
> +        SharedHashInfo *shared_info = hashstate->shared_info;
> +        int        i;
> +
> +        /* Find the first worker that built a hash table. */
> +        for (i = 0; i < shared_info->num_workers; ++i)
> +        {
> +            if (shared_info->hinstrument[i].nbatch > 0)
> +            {
> +                hinstrument = &shared_info->hinstrument[i];
> +                break;
> +            }
> +        }
> +    }

I can't think of a way that, before the parallel HJ patch, results in a
difference between the size of the hashtable between workers. Short of
some participant not having a hashtable at all, that is. Wwe also don't
currently display lookup information etc. So just more or less randomly
choosing one worker's seems ok.


> +        case T_HashState:
> +            /* even when not parallel-aware */

"..., for stats" or such maybe?

> +            ExecHashEstimate((HashState *) planstate, e->pcxt);
> +            break;
>          case T_SortState:
>              /* even when not parallel-aware */

I'd just update that comment as well.


>  #endif                            /* NODEHASH_H */
> diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
> index e05bc04f525..6c322e57c00 100644
> --- a/src/include/nodes/execnodes.h
> +++ b/src/include/nodes/execnodes.h
> @@ -1980,6 +1980,29 @@ typedef struct GatherMergeState
>      struct binaryheap *gm_heap; /* binary heap of slot indices */
>  } GatherMergeState;
>  
> +/* ----------------
> + *     Values displayed by EXPLAIN ANALYZE
> + * ----------------
> + */
> +typedef struct HashInstrumentation
> +{
> +    int            nbuckets;
> +    int            nbuckets_original;
> +    int            nbatch;
> +    int            nbatch_original;
> +    size_t        space_peak;
> +} HashInstrumentation;

Maybe I'm being pedantic here, but I think it'd not hurt to explain what
these mean. It's obviously pretty clear what nbuckets/batch mean, but
the difference between original and not is far less clea.r


Looks reasonable these nitpicks aside.

Greetings,

Andres Freund


pgsql-committers by date:

Previous
From: Tom Lane
Date:
Subject: pgsql: Support boolean columns in functional-dependency statistics.
Next
From: Robert Haas
Date:
Subject: pgsql: When VACUUM or ANALYZE skips a concurrently dropped table,log i