Home > mailing lists

Re: Subplan result caching - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: Subplan result caching
Date	July 9, 2018 15:08:07
Msg-id	CAKJS1f-kAk1cGVvzg9TXCLhPsxx_oFVOrTGSR5yTRXKWntTVFA@mail.gmail.com Whole thread Raw
In response to	Subplan result caching (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses	Re: Subplan result caching Re: Subplan result caching Re: Subplan result caching
List	pgsql-hackers

Tree view

On 23 May 2018 at 21:31, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I've been working on a patch to add a little cache to SubPlans, to speed up
> queries with correlated subqueries, where the same subquery is currently
> executed multiple times with the same parameters. The idea is to cache the
> result of the subplan, with the correlation vars as the cache key.

Hi,

This seems like an interesting area to make improvements, so I've
signed up to review the patch.

From looking at the code I see that the caching is being done inside
nodeSubplan.c. I don't think this is the right approach to the
problem. The problem exists for any parameterized path, so I think a
more general approach would be much better.

We already have Materialize nodes to cache the results of an entire
subplan, and this seems to have quite a bit in common with that, only
we'd want to cache multiple results with a key to determine which
result set should be returned. Due to the similarities with
Materialize, I think that the cache should be a node itself and not
bury the cache logic in some other node type that's meant for some
other purpose.

"LazyMaterialize" seems like a good option for a name. It seems better
than "LazyHash" since you may not want to restrict it to a hash table
based cache in the future.  A binary search tree may be a good option
for types that cannot be hashed.

Materialize nodes are injected above the inner side node of MergeJoins
based on cost, so I think this node type could just do the same. Maybe
something like estimate_num_groups(<exprs being compared to params>) /
path->rows is below some defined constant, perhaps something like 0.5.
Although experimentation would be required. It might be good to take
into account some other cost factors too.

I imagine we'd want to only allow this optimisation for hashjoinable
types. This seems pretty natural since your cache implementation is a
hash table, so, of course, we're going to need a hash function.

Wondering your thoughts on this idea.

I'll mark as waiting on author in the meantime.

It's great to see someone working on this.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

From: Kyotaro HORIGUCHI
Date: 09 July 2018, 15:07:24
Subject: Re: Let's remove DSM_IMPL_NONE.

From: Kyotaro HORIGUCHI
Date: 09 July 2018, 15:14:13
Subject: Re: Non-reserved replication slots and slot advancing

Re: Subplan result caching - Mailing list pgsql-hackers

Previous

Next