Really dumb planner decision - Mailing list pgsql-performance
From | Matthew Wakeling |
---|---|
Subject | Really dumb planner decision |
Date | |
Msg-id | alpine.DEB.2.00.0904151800310.4053@aragorn.flymine.org Whole thread Raw |
Responses |
Re: Really dumb planner decision
(Grzegorz Jaśkiewicz <gryzman@gmail.com>)
Re: Really dumb planner decision (Grzegorz Jaśkiewicz <gryzman@gmail.com>) |
List | pgsql-performance |
I have a query that is executed really badly by Postgres. It is a nine table join, where two of the tables are represented in a view. If I remove one of the tables from the query, then the query runs very quickly using a completely different plan. Here is the view: release-16.0-preview-09-apr=# \d locatedsequencefeatureoverlappingfeatures View "public.locatedsequencefeatureoverlappingfeatures" Column | Type | Modifiers ------------------------+---------+----------- overlappingfeatures | integer | locatedsequencefeature | integer | View definition: SELECT l1.subjectid AS overlappingfeatures, l2.subjectid AS locatedsequencefeature FROM location l1, location l2 WHERE l1.objectid = l2.objectid AND l1.subjectid <> l2.subjectid AND bioseg_create(l1.intermine_start, l1.intermine_end)&& bioseg_create(l2.intermine_start, l2.intermine_end); Here is the query that works: SELECT * FROM gene AS a1_, intergenicregion AS a2_, regulatoryregion AS a3_, chromosome AS a4_, location AS a5_, LocatedSequenceFeatureOverlappingFeatures AS indirect0, BioEntitiesDataSets AS indirect1 WHERE a1_.id = 1267676 AND a1_.upstreamIntergenicRegionId = a2_.id AND a2_.id = indirect0.LocatedSequenceFeature AND indirect0.OverlappingFeatures = a3_.id AND a3_.chromosomeid = a4_.id AND a3_.chromosomeLocationId = a5_.id AND a3_.id = indirect1.BioEntities QUERY PLAN ----------------------------------------------------------------- Nested Loop (cost=0.00..44.82 rows=1 width=787) (actual time=18.347..184.178 rows=105 loops=1) -> Nested Loop (cost=0.00..44.54 rows=1 width=626) (actual time=18.329..182.837 rows=105 loops=1) -> Nested Loop (cost=0.00..43.82 rows=1 width=561) (actual time=18.249..180.801 rows=105 loops=1) -> Nested Loop (cost=0.00..43.51 rows=1 width=380) (actual time=10.123..178.471 rows=144 loops=1) -> Nested Loop (cost=0.00..42.85 rows=1 width=372) (actual time=0.854..31.446 rows=142 loops=1) -> Nested Loop (cost=0.00..38.57 rows=1 width=168) (actual time=0.838..29.505 rows=142 loops=1) Join Filter: ((l1.subjectid <> l2.subjectid) AND (l2.objectid = l1.objectid)) -> Nested Loop (cost=0.00..10.02 rows=1 width=176) (actual time=0.207..0.218 rows=1 loops=1) -> Index Scan using gene_pkey on gene a1_ (cost=0.00..4.29 rows=1 width=160) (actual time=0.107..0.110 rows=1 loops=1) Index Cond: (id = 1267676) -> Index Scan using location__key_all on location l2 (cost=0.00..5.70 rows=2 width=16) (actual time=0.090..0.093 rows=1 loops=1) Index Cond: (l2.subjectid = a1_.upstreamintergenicregionid) -> Index Scan using location_bioseg on location l1 (cost=0.00..12.89 rows=696 width=16) (actual time=0.095..26.458 rows=1237 loops=1) Index Cond: (bioseg_create(l1.intermine_start, l1.intermine_end) && bioseg_create(l2.intermine_start,l2.intermine_end)) -> Index Scan using intergenicregion_pkey on intergenicregion a2_ (cost=0.00..4.27 rows=1 width=204) (actual time=0.004..0.006 rows=1 loops=142) Index Cond: (a2_.id = a1_.upstreamintergenicregionid) -> Index Scan using bioentitiesdatasets__bioentities on bioentitiesdatasets indirect1 (cost=0.00..0.63 rows=2 width=8) (actual time=1.026..1.028 rows=1 loops=142) Index Cond: (indirect1.bioentities = l1.subjectid) -> Index Scan using regulatoryregion_pkey on regulatoryregion a3_ (cost=0.00..0.29 rows=1 width=181) (actual time=0.008..0.009 rows=1 loops=144) Index Cond: (a3_.id = l1.subjectid) -> Index Scan using location_pkey on location a5_ (cost=0.00..0.71 rows=1 width=65) (actual time=0.010..0.012 rows=1 loops=105) Index Cond: (a5_.id = a3_.chromosomelocationid) -> Index Scan using chromosome_pkey on chromosome a4_ (cost=0.00..0.27 rows=1 width=161) (actual time=0.003..0.005 rows=1 loops=105) Index Cond: (a4_.id = a3_.chromosomeid) Total runtime: 184.596 ms (25 rows) Here is the query that does not work: SELECT * FROM gene AS a1_, intergenicregion AS a2_, regulatoryregion AS a3_, chromosome AS a4_, location AS a5_, dataset AS a6_, LocatedSequenceFeatureOverlappingFeatures AS indirect0, BioEntitiesDataSets AS indirect1 WHERE a1_.id = 1267676 AND a1_.upstreamIntergenicRegionId = a2_.id AND a2_.id = indirect0.LocatedSequenceFeature AND indirect0.OverlappingFeatures = a3_.id AND a3_.chromosomeid = a4_.id AND a3_.chromosomeLocationId = a5_.id AND a3_.id = indirect1.BioEntities AND indirect1.DataSets = a6_.id I just left this running overnight, and it hasn't completed an EXPLAIN ANALYSE. It is basically the previous query (which returns 105 rows) with another table attached on a primary key. Should be very quick. QUERY PLAN --------------------------------------------------------------------------- Nested Loop (cost=0.21..49789788.95 rows=1 width=960) -> Nested Loop (cost=0.21..49789788.67 rows=1 width=799) -> Nested Loop (cost=0.21..49789787.94 rows=1 width=734) -> Merge Join (cost=0.21..49789787.64 rows=1 width=553) Merge Cond: (a1_.upstreamintergenicregionid = a2_.id) -> Nested Loop (cost=0.00..99575037.26 rows=2 width=349) -> Nested Loop (cost=0.00..99575036.70 rows=2 width=176) -> Nested Loop (cost=0.00..99575036.05 rows=1 width=168) Join Filter: (a1_.upstreamintergenicregionid = l2.subjectid) -> Index Scan using gene__upstreamintergenicregion on gene a1_ (cost=0.00..6836.09 rows=1 width=160) Index Cond: (id = 1267676) -> Nested Loop (cost=0.00..99507386.51 rows=4865076 width=8) Join Filter: ((l1.subjectid <> l2.subjectid) AND (l1.objectid = l2.objectid)) -> Index Scan using location__key_all on location l1 (cost=0.00..158806.58 rows=3479953 width=16) -> Index Scan using location_bioseg on location l2 (cost=0.00..12.89 rows=696 width=16) Index Cond: (bioseg_create(l1.intermine_start, l1.intermine_end) && bioseg_create(l2.intermine_start,l2.intermine_end)) -> Index Scan using bioentitiesdatasets__bioentities on bioentitiesdatasets indirect1 (cost=0.00..0.63 rows=2 width=8) Index Cond: (indirect1.bioentities = l1.subjectid) -> Index Scan using dataset_pkey on dataset a6_ (cost=0.00..0.27 rows=1 width=173) Index Cond: (a6_.id = indirect1.datasets) -> Index Scan using intergenicregion_pkey on intergenicregion a2_ (cost=0.00..2132.03 rows=54785 width=204) -> Index Scan using regulatoryregion_pkey on regulatoryregion a3_ (cost=0.00..0.29 rows=1 width=181) Index Cond: (a3_.id = l1.subjectid) -> Index Scan using location_pkey on location a5_ (cost=0.00..0.71 rows=1 width=65) Index Cond: (a5_.id = a3_.chromosomelocationid) -> Index Scan using chromosome_pkey on chromosome a4_ (cost=0.00..0.27 rows=1 width=161) Index Cond: (a4_.id = a3_.chromosomeid) (27 rows) I'm curious about two things - firstly why is it choosing such a dumb way of joining l1 to l2, with a full index scan on l1, where it could use a conditional index scan on l1 as with the working query? Secondly, why is the merge join's cost half that of the nested loop inside it? geqo threshold is set to 15, so this is not the genetic optimiser stuffing up. Besides, it creates the same plan each time. The database is fully analysed with a reasonably high statistics target. Here are all the non-comment entries in postgresql.conf: listen_addresses = '*' # what IP address(es) to listen on; max_connections = 300 # (change requires restart) shared_buffers = 500MB # min 128kB or max_connections*16kB temp_buffers = 100MB # min 800kB work_mem = 2000MB # min 64kB maintenance_work_mem = 1600MB # min 1MB max_stack_depth = 9MB # min 100kB max_fsm_pages = 204800 # min max_fsm_relations*16, 6 bytes each random_page_cost = 2.0 # same scale as above effective_cache_size = 23GB geqo_threshold = 15 default_statistics_target = 500 # range 1-1000 log_destination = 'stderr' # Valid values are combinations of logging_collector = on # Enable capturing of stderr and csvlog log_directory = 'pg_log' # directory where log files are written, log_truncate_on_rotation = on # If on, an existing log file of the log_rotation_age = 1d # Automatic rotation of logfiles will log_rotation_size = 0 # Automatic rotation of logfiles will log_min_duration_statement = 0 # -1 is disabled, 0 logs all statements log_duration = on log_line_prefix = '%t ' # special values: log_statement = 'all' # none, ddl, mod, all datestyle = 'iso, mdy' lc_messages = 'C' # locale for system error message lc_monetary = 'C' # locale for monetary formatting lc_numeric = 'C' # locale for number formatting lc_time = 'C' # locale for time formatting default_text_search_config = 'pg_catalog.english' Anything I can do to solve this? Matthew -- Surely the value of C++ is zero, but C's value is now 1? -- map36, commenting on the "No, C++ isn't equal to D. 'C' is undeclared [...] C++ should really be called 1" response to "C++ -- shouldn't it be called D?"
pgsql-performance by date: