Re: Hybrid Hash/Nested Loop joins and caching results from subplans - Mailing list pgsql-hackers

From Andy Fan
Subject Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Msg-id CAKU4AWpoQ_c9Rj2sTVPMGxAb+CAAe4z5ejcrL-89QirCxk=k+A@mail.gmail.com
In response to Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers


On Mon, Nov 2, 2020 at 3:44 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Tue, 20 Oct 2020 at 22:30, David Rowley <dgrowleyml@gmail.com> wrote:
> >
> > So far benchmarking shows there's still a regression from the v8
> > version of the patch. This is using count(*). An earlier test [1] did
> > show speedups when we needed to deform tuples returned by the nested
> > loop node. I've not yet repeated that test again. I was disappointed
> > to see v9 slower than v8 after having spent about 3 days rewriting the
> > patch
>
> I did some further tests this time with some tuple deforming.  Again,
> it does seem that v9 is slower than v8.

I ran your test case on v8 and v9, and I can reproduce a stable difference between them.

v8:
statement latencies in milliseconds:
      1603.611  select count(*) from hundredk hk inner join lookup l on hk.thousand = l.a;

v9: 
statement latencies in milliseconds:
      1772.287  select count(*) from hundredk hk inner join lookup l on hk.thousand = l.a;

Then I ran perf on the two versions. Is it possible that you call tts_minimal_clear twice in
the v9 version? Both ExecClearTuple and ExecStoreMinimalTuple call tts_minimal_clear
on the same slot.

With the following changes:

diff --git a/src/backend/executor/execMRUTupleCache.c b/src/backend/executor/execMRUTupleCache.c
index 3553dc26cb..b82d8e98b8 100644
--- a/src/backend/executor/execMRUTupleCache.c
+++ b/src/backend/executor/execMRUTupleCache.c
@@ -203,10 +203,9 @@ prepare_probe_slot(MRUTupleCache *mrucache, MRUCacheKey *key)
        TupleTableSlot *tslot = mrucache->tableslot;
        int                             numKeys = mrucache->nkeys;

-       ExecClearTuple(pslot);
-
        if (key == NULL)
        {
+               ExecClearTuple(pslot);
                /* Set the probeslot's values based on the current parameter values */
                for (int i = 0; i < numKeys; i++)
                        pslot->tts_values[i] = ExecEvalExpr(mrucache->param_exprs[i],
@@ -641,7 +640,7 @@ ExecMRUTupleCacheFetch(MRUTupleCache *mrucache)
                                        {
                                                mrucache->state = MRUCACHE_FETCH_NEXT_TUPLE;

-                                               ExecClearTuple(mrucache->cachefoundslot);
+                                               // ExecClearTuple(mrucache->cachefoundslot);
                                                slot = mrucache->cachefoundslot;
                                                ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false);
                                                return slot;
@@ -740,7 +739,7 @@ ExecMRUTupleCacheFetch(MRUTupleCache *mrucache)
                                        return NULL;
                                }

-                               ExecClearTuple(mrucache->cachefoundslot);
+                               // ExecClearTuple(mrucache->cachefoundslot);
                                slot = mrucache->cachefoundslot;
                                ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false);
                                return slot;


With those changes, v9 gives the following result:
      1608.048  select count(*) from hundredk hk inner join lookup l on hk.thousand = l.a;

> Graphs attached
>
> Looking at profiles, I don't really see any obvious reason as to why
> this is.  I'm very much inclined to just pursue the v8 patch (separate
> Result Cache node) and just drop the v9 idea altogether.
>
> David


--
Best Regards
Andy Fan
