Home > mailing lists

Re: Trying to pull up EXPR SubLinks - Mailing list pgsql-hackers

From	Andy Fan
Subject	Re: Trying to pull up EXPR SubLinks
Date	April 24, 2020 06:25:14
Msg-id	CAKU4AWqLB5FbsvZG1uQ8cSMXF-qcHmO9zQHHFw35J6CaqYAGrQ@mail.gmail.com Whole thread Raw
In response to	Re: Trying to pull up EXPR SubLinks (Richard Guo <guofenglinux@gmail.com>)
Responses	Re: Trying to pull up EXPR SubLinks Re: Trying to pull up EXPR SubLinks
List	pgsql-hackers

Tree view

Actually I have a different opinion to handle this issue, to execute the

a > (select avg(a) from tinner where x = touer.x); The drawback of current

path is because it may calculates the same touer.x value multi-times. So

if we cache the values we have calculated before, we can avoid the cost.

Material path may be the one we can reference but it assumes all the tuples

in the tuplestore matches the input params, which is not the fact here.

But what if the input params doesn't change? If so we can use Material path

to optimize this case. But since we don't know if the if the input params changed

or not during plan time, we just add the path (let's assume we can add it with some

rules or cost calculation). If the input params is not changed, we use the cached

values, if the input params changed, we can ReScan the Material node. To optimize

the the cache invalidation frequent issue like (1, 2, 1, 2, 1, 2) case, we may consider

a sort path to change the input values to (1, 1, 1, 2, 2, 2). But overall it is a big effort.

As a independent small optimization maybe if the input params doesn't change, we

can use the tuples in the Material node again. Suppose it will not demage our current

framework if we can add the material path by either rules based or cost based.

Suppose we have the following data:

demo=# select * from j1 limit 10;
i | im5 | im100 | im1000
----+-----+-------+--------
1 | 1 | 1 | 1
2 | 2 | 2 | 2
3 | 3 | 3 | 3
4 | 4 | 4 | 4
5 | 0 | 5 | 5
6 | 1 | 6 | 6
7 | 2 | 7 | 7
8 | 3 | 8 | 8
9 | 4 | 9 | 9
10 | 0 | 10 | 10
(10 rows)

totally we have j1 = 10,000,002 rows, the extra 2 rows because we have 3 rows for i=1

demo=# select * from j1 where i = 1;
i | im5 | im100 | im1000
---+-----+-------+--------
1 | 1 | 1 | 1
1 | 1 | 1 | 1
1 | 1 | 1 | 1
(3 rows)

Then select * from j1 j1o where im5 = (select avg(im5) from j1 where im5 = j1o.im5) and i = 1;

will hit our above optimizations. The plan is

QUERY PLAN
-----------------------------------------------
Index Scan using j1_idx1 on j1 j1o
Index Cond: (i = 1)
Filter: ((im5)::numeric < (SubPlan 1))
SubPlan 1
-> Materialize
-> Aggregate
-> Seq Scan on j1
Filter: (im5 = j1o.im5)
(8 rows)

and the Aggregate is just executed once (execution time dropped from 8.x s

to 2.6s).

----

The attached is a very PoC patch, but it can represent my idea for

current discuss, Some notes about the implementation.

1. We need to check if the input params is really not changed. Currently I just

comment it out for quick test.

- planstate->chgParam = bms_add_member(planstate->chgParam, paramid);
+ // planstate->chgParam = bms_add_member(planstate->chgParam, paramid);

Looks we have a lot of places to add a params

to chgParam without checking the actual value. The place I found this case is

during ExecNestLoop. So we may need a handy and efficient way to do the

check for all the places. However it is not a must for current case

2. I probably misunderstand the the usage of MaterialState->eflags. since I don't

know why the eflag need to be checked ExecMaterial. and I have to remove it to

let my PoC work.

- if (tuplestorestate == NULL && node->eflags != 0)
+ if (tuplestorestate == NULL)

3. I added the material path in a very hacked way, the if check just to make

sure it take effect on my test statement only. If you want to test this patch locally,

you need to change the oid for your case.

+ if (linitial_node(RangeTblEntry, root->parse->rtable)->relid == 25634)
+ best_path = (Path *) create_material_path(final_rel, best_path);

But when we take this action to production case, how to cost this strategy is

challenge since it can neither reduce the total_cost nor result in a new PathKey.

I will check other place to see how this kind can be added.

Best Regards

Andy Fan

Attachment

pgsql-hackers by date:

From: Atsushi Torikoshi
Date: 24 April 2020, 05:56:01
Subject: Re: [Proposal] Add accumulated statistics for wait event

From: Prabhat Sahu
Date: 24 April 2020, 07:55:41
Subject: Re: [Proposal] Global temporary tables

Re: Trying to pull up EXPR SubLinks - Mailing list pgsql-hackers

Attachment

Previous

Next