Re: Improvement discussion of custom and generic plans - Mailing list pgsql-hackers
From | Quan Zongliang |
---|---|
Subject | Re: Improvement discussion of custom and generic plans |
Date | |
Msg-id | c8389d1b-e34d-4a93-a972-b22dad529c29@yeah.net Whole thread Raw |
In response to | Re: Improvement discussion of custom and generic plans (Quan Zongliang <quanzongliang@yeah.net>) |
List | pgsql-hackers |
Add the GUC parameter. On 2024/1/30 21:25, Quan Zongliang wrote: > > > On 2023/11/3 15:27, Quan Zongliang wrote: >> Hi >> >> We have one such problem. A table field has skewed data. Statistics: >> n_distinct | -0.4481973 >> most_common_vals | {5f006ca25b52ed78e457b150ee95a30c} >> most_common_freqs | {0.5518474} >> >> >> Data generation: >> >> CREATE TABLE s_user ( >> user_id varchar(32) NOT NULL, >> corp_id varchar(32), >> status int NOT NULL >> ); >> >> insert into s_user >> select md5('user_id ' || a), md5('corp_id ' || a), >> case random()<0.877675 when true then 1 else -1 end >> FROM generate_series(1,10031) a; >> >> insert into s_user >> select md5('user_id ' || a), md5('corp_id 10032'), >> case random()<0.877675 when true then 1 else -1 end >> FROM generate_series(10031,22383) a; >> >> CREATE INDEX s_user_corp_id_idx ON s_user USING btree (corp_id); >> >> analyze s_user; >> >> >> 1. First, define a PREPARE statement >> prepare stmt as select count(*) from s_user where status=1 and corp_id >> = $1; >> >> 2. Run it five times. Choose the custom plan. >> explain (analyze,buffers) execute >> stmt('5f006ca25b52ed78e457b150ee95a30c'); >> >> Here's the plan: >> Aggregate (cost=639.84..639.85 rows=1 width=8) (actual >> time=4.653..4.654 rows=1 loops=1) >> Buffers: shared hit=277 >> -> Seq Scan on s_user (cost=0.00..612.76 rows=10830 width=0) >> (actual time=1.402..3.747 rows=10836 loops=1) >> Filter: ((status = 1) AND ((corp_id)::text = >> '5f006ca25b52ed78e457b150ee95a30c'::text)) >> Rows Removed by Filter: 11548 >> Buffers: shared hit=277 >> Planning Time: 0.100 ms >> Execution Time: 4.674 ms >> (8 rows) >> >> 3.From the sixth time. Choose generic plan. >> We can see that there is a huge deviation between the estimate and the >> actual value: >> Aggregate (cost=11.83..11.84 rows=1 width=8) (actual >> time=4.424..4.425 rows=1 loops=1) >> Buffers: shared hit=154 read=13 >> -> Bitmap Heap Scan on s_user (cost=4.30..11.82 rows=2 width=0) >> (actual time=0.664..3.371 rows=10836 loops=1) >> Recheck Cond: ((corp_id)::text = $1) >> Filter: (status = 1) >> Rows Removed by Filter: 1517 >> Heap Blocks: exact=154 >> Buffers: shared hit=154 read=13 >> -> Bitmap Index Scan on s_user_corp_id_idx >> (cost=0.00..4.30 rows=2 width=0) (actual time=0.635..0.635 rows=12353 >> loops=1) >> Index Cond: ((corp_id)::text = $1) >> Buffers: shared read=13 >> Planning Time: 0.246 ms >> Execution Time: 4.490 ms >> (13 rows) >> >> This is because in the choose_custom_plan function, the generic plan >> is attempted after executing the custom plan five times. >> >> if (plansource->num_custom_plans < 5) >> return true; >> >> The generic plan uses var_eq_non_const to estimate the average >> selectivity. >> >> These are facts that many people already know. So a brief introduction. >> >> >> Our users actually use such parameter conditions in very complex >> PREPARE statements. Once they use the generic plan for the sixth time. >> The execution time will change from 5 milliseconds to 5 minutes. >> >> >> To improve this problem. The following approaches can be considered: >> >> 1. Determine whether data skew exists in the PREPARE statement >> parameter conditions based on the statistics. >> However, there is no way to know if the user will use the skewed >> parameter. >> >> 2.When comparing the cost of the generic plan with the average cost of >> the custom plan(function choose_custom_plan). Consider whether the >> maximum cost of a custom plan executed is an order of magnitude >> different from the cost of a generic plan. >> If the first five use a small selectivity condition. And after the >> sixth use a high selectivity condition. Problems will still arise. >> >> 3.Trace the execution time of the PREPARE statement. When an execution >> time is found to be much longer than the average execution time, the >> custom plan is forced to run. >> >> >> Is there any better idea? >> > I tried to do a demo. Add a member paramid to Const. When Const is > generated by Param, the Const is identified as coming from Param. Then > check in var_eq_const to see if the field in the condition using this > parameter is skewed. If so, choose_custom_plan returns true every time, > forcing custom_plan to be used. > Only conditional expressions such as var eq param or param eq var can be > supported. > If it makes sense. Continue to improve this patch. > >> -- >> Quan Zongliang >> >>
Attachment
pgsql-hackers by date: