Re: Hybrid Hash/Nested Loop joins and caching results from subplans - Mailing list pgsql-hackers

From David Rowley
Subject Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date
Msg-id CAApHDvpeUqAsaBAnvfekXkVhGhW97aargELwJ3uBBztuWDffTA@mail.gmail.com
Whole thread Raw
In response to Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-hackers
Thanks for having a look at this.

I've taken most of your suggestions. The things quoted below are just
the ones I didn't agree with or didn't understand.

On Thu, 28 Jan 2021 at 18:43, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Tue, Dec 08, 2020 at 08:15:52PM +1300, David Rowley wrote:
> > +typedef struct EstimationInfo
> > +{
> > +     int                     flags;                  /* Flags, as defined above to mark special
> > +                                                              * properties of the estimation. */
>
> Maybe it should be a bits32 ?

I've changed this to uint32.  There are a few examples in the code
base of bit flags using int. e.g PlannedStmt.jitFlags and
_mdfd_getseg()'s "behavior" parameter, there are also quite a few
using unsigned types.

> (Also, according to Michael, some people preferred 0x01 to 1<<0)

I'd rather keep the (1 << 0).  I think that it gets much easier to
read when we start using more significant bits. Granted the codebase
has lots of examples of each. I just picked the one I prefer.  If
there's some consensus that we switch the bit-shifting to hex
constants for other bitflag defines then I'll change it.

> I think "result cache nodes" should be added here:
>
> doc/src/sgml/config.sgml-       <para>
> doc/src/sgml/config.sgml-        Hash-based operations are generally more sensitive to memory
> doc/src/sgml/config.sgml-        availability than equivalent sort-based operations.  The
> doc/src/sgml/config.sgml-        memory available for hash tables is computed by multiplying
> doc/src/sgml/config.sgml-        <varname>work_mem</varname> by
> doc/src/sgml/config.sgml:        <varname>hash_mem_multiplier</varname>.  This makes it
> doc/src/sgml/config.sgml-        possible for hash-based operations to use an amount of memory
> doc/src/sgml/config.sgml-        that exceeds the usual <varname>work_mem</varname> base
> doc/src/sgml/config.sgml-        amount.
> doc/src/sgml/config.sgml-       </para>

I'd say it would be better to mention it in the previous paragraph.
I've done that. It now looks like:

        Hash tables are used in hash joins, hash-based aggregation, result
        cache nodes and hash-based processing of <literal>IN</literal>
        subqueries.
       </para>

Likely setops should be added to that list too, but not by this patch.

> Language fixen follow:
>
> > + * Initialize the hash table to empty.
>
> as empty

Perhaps, but I've kept the "to empty" as it's used in
nodeRecursiveunion.c and nodeSetOp.c to do the same thing.  If you
propose a patch that gets transaction to change those instances then
I'll switch this one too.

I'm just in the middle of considering some other changes to the patch
and will post an updated version once I'm done with that.

David



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Next
From: Bharath Rupireddy
Date:
Subject: Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit