> Thanks for running those tests. I had a quick look at the results and
> I think to say that all 4 are better is not quite right. One is
> actually a tiny bit slower and one is only faster due to a plan
> change.
Yes, thanks for pointing that out.
> Q18 uses a result cache for 2 x nested loop joins and has a 0% hit
> ratio. The execution time is reduced to 91% of the original time only
> because the planner uses a different plan, which just happens to be
> faster by chance.
This case appears to be caused by a bad row estimate for the condition
o_orderkey in (select l_orderkey from lineitem group by l_orderkey having
sum(l_quantity) > 312). The planner estimates 123766 rows, but the actual
number is 10. That estimation problem is hard in its own right, and I
don't think we should try to address it in this patch.
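
For reference, the misestimation is easy to see with a plain EXPLAIN
ANALYZE against a TPC-H database (a minimal sketch, assuming the
standard orders/lineitem schema and the same 312 threshold as the run
above; the exact numbers will depend on the scale factor):

    -- Compare the planner's estimated row count with the actual row
    -- count for the Q18 semijoin condition.
    EXPLAIN ANALYZE
    SELECT o_orderkey
    FROM orders
    WHERE o_orderkey IN (SELECT l_orderkey
                         FROM lineitem
                         GROUP BY l_orderkey
                         HAVING sum(l_quantity) > 312);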
> Q20 uses a result cache for the subplan and has a 0% hit ratio. The
> execution time is 100.27% of the original time. There are 8620 cache
> misses.
This is by design in the current implementation.
> For subplans, since we plan subplans before we're done planning the
> outer plan, there's very little information to go on about the number
> of times that the cache will be looked up. For now, I've coded things
> so the cache is always used for EXPR_SUBLINK type subplans.
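
To make the 0% hit ratio case concrete, here is a simplified, Q20-like
correlated subquery (a sketch, not the literal TPC-H text). The scalar
sub-SELECT makes it an EXPR_SUBLINK subplan, so the current patch
always caches it, and when the correlation columns are close to unique
across the outer rows nearly every probe is a miss:

    -- Illustrative only.  If (ps_partkey, ps_suppkey) is nearly unique
    -- over the outer rows, the cache is probed with a new key almost
    -- every time and the hit ratio stays near 0%.
    SELECT ps_suppkey
    FROM partsupp
    WHERE ps_availqty > (SELECT 0.5 * sum(l_quantity)
                         FROM lineitem
                         WHERE l_partkey = ps_partkey
                           AND l_suppkey = ps_suppkey);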
I first tried to see whether we could get a row estimate before the
subplan is created, and that looks very complex. The subplan is created
during preprocess_qual_conditions(); at that point we haven't even
built the base RelOptInfos, to say nothing of the join rels, whose row
estimates happen much later.
Then I looked at whether we could delay the caching decision until we
have the row estimates; ExecInitSubPlan may be a candidate place. At
that point we can no longer add a ResultCache node, but we could add a
caching facility to the SubPlan node itself, driven by costing.
However, the number of distinct values of the parameterized variable
can't be calculated there, which I leave as an open issue.
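
Just to spell out which statistic is missing: for a plain column the
planner can read n_distinct from pg_stats, as in the sketch below
(partsupp is only an illustrative table name here), but there is no
comparable figure readily available for the values a subplan parameter
will actually be probed with, which is the open issue above.

    -- Illustration only: where n_distinct normally comes from.
    SELECT tablename, attname, n_distinct
    FROM pg_stats
    WHERE tablename = 'partsupp'
      AND attname IN ('ps_partkey', 'ps_suppkey');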