Re: Major performance degradation with joins in 15.8 or 15.7? - Mailing list pgsql-performance
From: Ed Sabol
Subject: Re: Major performance degradation with joins in 15.8 or 15.7?
Msg-id: 4190FC09-590B-443F-9064-88B6F7C1EBE2@gmail.com
In response to: Re: Major performance degradation with joins in 15.8 or 15.7? (Andrei Lepikhov <lepihov@gmail.com>)
List: pgsql-performance
On Nov 7, 2024, at 9:54 PM, Andrei Lepikhov <lepihov@gmail.com> wrote:
> On 11/8/24 09:45, Ed Sabol wrote:
>> On Nov 7, 2024, at 9:27 PM, Andrei Lepikhov <lepihov@gmail.com> wrote:
>>> Postgres didn't want Materialize in this example because of the low estimation on its outer subquery. AFAIC, by increasing the *_page_cost's value, you added extra weight to the inner subquery
>>
>> What kind of extended statistics do you suggest for this? ndistinct, dependencies, or mcv?
>>
>> CREATE STATISTICS tablename_stats (<statistics type?>) ON relation, type FROM tablename;
>> ANALYZE tablename;
>
> I'd recommend to use all of them - MCV is helpful in most of the cases (and relatively cheap), distinct is actually used in Postgres now to calculate number of groups (GROUP-BY, Sort, Memoize, etc.); dependencies - to find correlations between columns - usually in scan filters.

OK, I've executed the following:

CREATE STATISTICS tablename_stats_rt_nd (ndistinct) ON relation, type FROM tablename;
CREATE STATISTICS tablename_stats_rt_mcv (mcv) ON relation, type FROM tablename;
CREATE STATISTICS tablename_stats_rt_dep (dependencies) ON relation, type FROM tablename;
CREATE STATISTICS tablename_stats_rv_nd (ndistinct) ON relation, value FROM tablename;
CREATE STATISTICS tablename_stats_rv_mcv (mcv) ON relation, value FROM tablename;
CREATE STATISTICS tablename_stats_rv_dep (dependencies) ON relation, value FROM tablename;
CREATE STATISTICS tablename_stats_nr_nd (ndistinct) ON name, relation FROM tablename;
CREATE STATISTICS tablename_stats_nr_mcv (mcv) ON name, relation FROM tablename;
CREATE STATISTICS tablename_stats_nr_dep (dependencies) ON name, relation FROM tablename;
CREATE STATISTICS tablename_stats_nt_nd (ndistinct) ON name, type FROM tablename;
CREATE STATISTICS tablename_stats_nt_mcv (mcv) ON name, type FROM tablename;
CREATE STATISTICS tablename_stats_nt_dep (dependencies) ON name, type FROM tablename;
CREATE STATISTICS tablename_stats_nv_nd (ndistinct) ON name, value FROM tablename;
CREATE STATISTICS tablename_stats_nv_mcv (mcv) ON name, value FROM tablename;
CREATE STATISTICS tablename_stats_nv_dep (dependencies) ON name, value FROM tablename;
ANALYZE tablename;
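
(As a sanity check, the statistics objects above, and whether ANALYZE has populated them, can be listed from the pg_stats_ext view that exists since PostgreSQL 12 -- just an illustrative sketch, with the column list trimmed:)

-- Illustrative check: list the extended statistics on the table; kinds shows
-- what each object collects, and the n_distinct/dependencies columns are
-- filled in by ANALYZE for objects that include those kinds.
SELECT statistics_name, attnames, kinds, n_distinct, dependencies
FROM pg_stats_ext
WHERE tablename = 'tablename'
ORDER BY statistics_name;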

Now with random_page_cost = 4.0, the optimizer materializes, and it's fast again:

Nested Loop  (cost=1226.12..11129.87 rows=1 width=112) (actual time=30.965..31.333 rows=1 loops=1)
  Join Filter: (a.name = d.name)
  Buffers: shared hit=7447
  ->  Nested Loop  (cost=1225.70..11112.51 rows=1 width=108) (actual time=30.921..31.208 rows=1 loops=1)
        Buffers: shared hit=7418
        ->  Hash Join  (cost=1225.27..11093.62 rows=1 width=86) (actual time=30.862..31.078 rows=1 loops=1)
              Hash Cond: ((a.name || '.doc'::text) = b_1.name)
              Buffers: shared hit=7389
              ->  Nested Loop  (cost=1167.53..11019.89 rows=11 width=70) (actual time=27.143..27.347 rows=1 loops=1)
                    Join Filter: (CASE WHEN ("position"(a.name, 'zz'::text) = 1) THEN a.name ELSE ('h_'::text || a.name) END = "*SELECT* 1".table_name)
                    Rows Removed by Join Filter: 1021
                    Buffers: shared hit=6268
                    ->  Bitmap Heap Scan on tablename a  (cost=456.55..5407.28 rows=1077 width=38) (actual time=2.986..15.865 rows=1022 loops=1)
                          Recheck Cond: (relation = 'description'::text)
                          Filter: (type = 'table'::text)
                          Rows Removed by Filter: 37044
                          Heap Blocks: exact=4024
                          Buffers: shared hit=4065
                          ->  Bitmap Index Scan on tablename_r  (cost=0.00..456.29 rows=38915 width=0) (actual time=2.336..2.336 rows=44980 loops=1)
                                Index Cond: (relation = 'description'::text)
                                Buffers: shared hit=41
                    ->  Materialize  (cost=710.98..5564.15 rows=2 width=64) (actual time=0.008..0.009 rows=1 loops=1022)
                          Buffers: shared hit=2203
                          ->  Append  (cost=710.98..5564.14 rows=2 width=64) (actual time=7.519..7.548 rows=1 loops=1)
                                Buffers: shared hit=2203
                                ->  Subquery Scan on "*SELECT* 1"  (cost=710.98..3537.89 rows=1 width=64) (actual time=6.629..6.636 rows=0 loops=1)
                                      Buffers: shared hit=1380
                                      ->  Bitmap Heap Scan on tablename  (cost=710.98..3537.88 rows=1 width=96) (actual time=6.628..6.633 rows=0 loops=1)
                                            Recheck Cond: ((relation = ANY ('{start_time,end_time,dataset_id_column,dataset_id_prefix,original_mission_name,defaultSearchRadius,author,tableType,bibcode,priority,regime,author,includesTypes,Mission,subject}'::text[])) AND (type = 'table'::text))
                                            Filter: ((CASE relation WHEN 'Mission'::text THEN upper(value) ELSE value END = 'foo'::text) AND (CASE relation WHEN 'defaultSearchRadius'::text THEN 'default_search_radius'::text WHEN 'Mission'::text THEN 'o_name'::text WHEN 'priority'::text THEN 'table_priority'::text WHEN 'bibcode'::text THEN 'catalog_bibcode'::text WHEN 'regime'::text THEN 'frequency_regime'::text WHEN 'author'::text THEN 'table_author'::text WHEN 'tableType'::text THEN 'table_type'::text WHEN 'subject'::text THEN 'row_type'::text ELSE relation END = 'o_name'::text))
                                            Rows Removed by Filter: 8253
                                            Heap Blocks: exact=1276
                                            Buffers: shared hit=1380
                                            ->  BitmapAnd  (cost=710.94..710.94 rows=1275 width=0) (actual time=3.346..3.350 rows=0 loops=1)
                                                  Buffers: shared hit=104
                                                  ->  Bitmap Index Scan on tablename_r  (cost=0.00..134.96 rows=9145 width=0) (actual time=0.573..0.574 rows=9998 loops=1)
                                                        Index Cond: (relation = ANY ('{start_time,end_time,dataset_id_column,dataset_id_prefix,original_mission_name,defaultSearchRadius,author,tableType,bibcode,priority,regime,author,includesTypes,Mission,subject}'::text[]))
                                                        Buffers: shared hit=49
                                                  ->  Bitmap Index Scan on tablename_t  (cost=0.00..575.73 rows=49507 width=0) (actual time=2.693..2.693 rows=59373 loops=1)
                                                        Index Cond: (type = 'table'::text)
                                                        Buffers: shared hit=55
                                ->  Subquery Scan on "*SELECT* 5"  (cost=10.28..2026.24 rows=1 width=64) (actual time=0.886..0.904 rows=1 loops=1)
                                      Buffers: shared hit=823
                                      ->  Bitmap Heap Scan on tablename tablename_1  (cost=10.28..2026.23 rows=1 width=96) (actual time=0.884..0.899 rows=1 loops=1)
                                            Recheck Cond: (relation = 'containedBy'::text)
                                            Filter: ((substr(value, 1, 8) = 'mission:'::text) AND (upper("substring"(value, 9)) = 'foo'::text))
                                            Rows Removed by Filter: 721
                                            Heap Blocks: exact=820
                                            Buffers: shared hit=823
                                            ->  Bitmap Index Scan on tablename_r  (cost=0.00..10.28 rows=781 width=0) (actual time=0.085..0.085 rows=905 loops=1)
                                                  Index Cond: (relation = 'containedBy'::text)
                                                  Buffers: shared hit=3
              ->  Hash  (cost=44.87..44.87 rows=1030 width=38) (actual time=5.334..5.342 rows=1025 loops=1)
                    Buckets: 2048  Batches: 1  Memory Usage: 124kB
                    Buffers: shared hit=1121
                    ->  Bitmap Heap Scan on tablename b_1  (cost=33.06..44.87 rows=1030 width=38) (actual time=1.157..4.018 rows=1025 loops=1)
                          Recheck Cond: ((relation = 'located'::text) AND (type = 'document'::text))
                          Heap Blocks: exact=1113
                          Buffers: shared hit=1121
                          ->  BitmapAnd  (cost=33.06..33.06 rows=3 width=0) (actual time=0.765..0.769 rows=0 loops=1)
                                Buffers: shared hit=8
                                ->  Bitmap Index Scan on tablename_r  (cost=0.00..16.15 rows=1030 width=0) (actual time=0.347..0.347 rows=1227 loops=1)
                                      Index Cond: (relation = 'located'::text)
                                      Buffers: shared hit=4
                                ->  Bitmap Index Scan on tablename_t  (cost=0.00..16.15 rows=1030 width=0) (actual time=0.314..0.315 rows=1227 loops=1)
                                      Index Cond: (type = 'document'::text)
                                      Buffers: shared hit=4
        ->  Index Scan using tablename_n on tablename c  (cost=0.42..18.88 rows=1 width=22) (actual time=0.048..0.115 rows=1 loops=1)
              Index Cond: (name = a.name)
              Filter: (relation = 'lastUpdated'::text)
              Rows Removed by Filter: 58
              Buffers: shared hit=29
  ->  Index Scan using tablename_n on tablename d  (cost=0.42..17.33 rows=1 width=22) (actual time=0.034..0.104 rows=1 loops=1)
        Index Cond: (name = c.name)
        Filter: (relation = 'rowcount'::text)
        Rows Removed by Filter: 58
        Buffers: shared hit=29
Planning:
  Buffers: shared hit=64
Planning Time: 5.086 ms
Execution Time: 32.226 ms
(81 rows)
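
(For anyone who wants to repeat the comparison, it can be done per session without touching postgresql.conf -- a quick sketch, with the actual join query elided:)

-- Illustrative per-session sketch: run the query under both cost settings,
-- then restore the server default for this session.
SET random_page_cost = 2.0;
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;   -- the original join query (elided here)
SET random_page_cost = 4.0;
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;   -- same query; with the new statistics this gives the plan above
RESET random_page_cost;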

This was a nice learning experience, and I hope it will help with performance going forward, but I still think I'm going to keep random_page_cost = 2.0.

None of this really explains why this became a problem after ~10 years of it not being one, but the only likely reason I can think of is that the table simply grew gradually over time and eventually crossed some threshold that flipped the optimizer to a much worse plan.

Thanks,
Ed
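
P.S. For keeping an eye on the kind of gradual growth suspected above, something along these lines works (a sketch; adjust the relation name to taste):

-- Illustrative monitoring query: current on-disk size, live row estimate,
-- and when the table's statistics were last refreshed.
SELECT relname,
       n_live_tup,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'tablename';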