BUG #16824: Planner chooses poor path on query with Merge Join and pagination
From:
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16824
Logged by:          Jacob Mitash
Email address:      jacobmitash@gmail.com
PostgreSQL version: 13.1
Operating system:   Alpine Linux (Docker container)
Description:

Hi,

I have been working with this relatively simple query that is used to iterate over a large subset of rows across two tables, one page at a time. The query was originally using OFFSET/LIMIT to execute the paging, which, as you might guess, performs poorly in the later pages. I changed the query to paginate using a combination of ORDER BY id, WHERE id > id_from_prev_page, and LIMIT page_size. I'm not sure if that strategy has a name, but I'll refer to it as "ID pagination".

My dataset involves just two tables, described by this DDL:

```
-- DDL:
create table sub ( -- 4.3M records
    subscription_id varchar(255) -- UUIDs
        not null
        constraint pk_sub primary key,
    subscription_status varchar(255) -- (ACTIVE, PENDING, or CANCELLED)
);

create table sub_item ( -- 4.4M records, mostly 1:1 with subscriptions
    subscription_item_id varchar(255) -- UUIDs
        not null
        constraint pk_sub_item primary key,
    subscription_id varchar(255) -- FK to sub
        constraint fk_sub_item_sub_id references sub,
    item_ref varchar(255) -- 25 character strings, mostly the same
);
```

And here's the query I've been using to select pages while testing performance. The literal subscription_id value represents grabbing a page in the middle of the data.

```
-- Query: Grab a page from the middle
EXPLAIN ANALYZE
SELECT s.subscription_id, *
FROM sub s
JOIN sub_item si
    ON s.subscription_id = si.subscription_id
    AND si.item_ref = '1001107599999910222001000'
WHERE s.subscription_status = 'ACTIVE'
  AND s.SUBSCRIPTION_ID > '7ca1cf6b-2452-4d1b-bd03-1ba68b63b528'
ORDER BY s.subscription_id
LIMIT 50;
```

My understanding of this ID pagination is that it should be very quick, since it only needs to find the ID in the index and then scan the next 50 entries.
However, the execution time is slow:

```
Limit  (cost=2.41..325.76 rows=50 width=179) (actual time=10336.095..10340.389 rows=50 loops=1)
  ->  Merge Join  (cost=2.41..765832.96 rows=118421 width=179) (actual time=10336.086..10339.900 rows=50 loops=1)
        Merge Cond: ((s.subscription_id)::text = (si.subscription_id)::text)
        ->  Index Scan using pk_sub on sub s  (cost=0.56..270648.21 rows=632913 width=45) (actual time=0.025..1.643 rows=90 loops=1)
              Index Cond: ((subscription_id)::text > '7ca1cf6b-2452-4d1b-bd03-1ba68b63b528'::text)
              Filter: ((subscription_status)::text = 'ACTIVE'::text)
              Rows Removed by Filter: 211
        ->  Index Scan using idx_sub_item_sub_id_btree on sub_item si  (cost=0.56..490385.99 rows=813151 width=98) (actual time=0.016..8260.586 rows=393058 loops=1)
              Filter: ((item_ref)::text = '1001107599999910222001000'::text)
              Rows Removed by Filter: 1742475
Planning Time: 0.309 ms
Execution Time: 10340.691 ms
```

Just fiddling around, I saw that performance improved significantly when I ran `SET enable_mergejoin = OFF;`:

```
Limit  (cost=1.11..446.97 rows=50 width=179) (actual time=0.150..4.351 rows=50 loops=1)
  ->  Nested Loop  (cost=1.11..1055964.59 rows=118421 width=179) (actual time=0.140..3.850 rows=50 loops=1)
        ->  Index Scan using pk_sub on sub s  (cost=0.56..270648.21 rows=632913 width=45) (actual time=0.064..0.898 rows=90 loops=1)
              Index Cond: ((subscription_id)::text > '7ca1cf6b-2452-4d1b-bd03-1ba68b63b528'::text)
              Filter: ((subscription_status)::text = 'ACTIVE'::text)
              Rows Removed by Filter: 211
        ->  Index Scan using idx_sub_item_sub_id_btree on sub_item si  (cost=0.56..1.23 rows=1 width=98) (actual time=0.013..0.016 rows=1 loops=90)
              Index Cond: ((subscription_id)::text = (s.subscription_id)::text)
              Filter: ((item_ref)::text = '1001107599999910222001000'::text)
              Rows Removed by Filter: 0
Planning Time: 0.196 ms
Execution Time: 4.623 ms
```

I originally asked about this on the DBA StackExchange. I had two questions, which I'll summarize here:

1. Why is the Merge Join performing so slowly?

   It seems to be because the planner doesn't recognize that it can apply the subscription_id index condition on the inner table. If I explicitly tell it, "AND si.subscription_id > '7ca1...'", then it applies an index condition and the query is almost instant.

2. Why does the planner believe that Merge Join (as-is) is optimal here?

   I don't have an answer for this. Potentially a bug related to the previous answer?

So I think there are two things going on here that may or may not be classified as bugs:

1. Using Merge Join, the planner does not apply an index condition to the inner table that could be implied from the clause on the outer table.
2. In the same scenario, the planner's cost estimate is grossly different from the actual execution time.
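For reference, the explicit rewrite mentioned in question 1 is just the original query with the pagination bound repeated on the inner table. A sketch reusing the literal ID from the query above (this is the manual workaround, not a planner change):

```
SELECT s.subscription_id, *
FROM sub s
JOIN sub_item si
    ON s.subscription_id = si.subscription_id
    AND si.item_ref = '1001107599999910222001000'
WHERE s.subscription_status = 'ACTIVE'
  AND s.subscription_id > '7ca1cf6b-2452-4d1b-bd03-1ba68b63b528'
  -- Logically redundant given the join equality above, but it lets
  -- the planner apply an index condition on sub_item's side too:
  AND si.subscription_id > '7ca1cf6b-2452-4d1b-bd03-1ba68b63b528'
ORDER BY s.subscription_id
LIMIT 50;
```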
Re: BUG #16824: Planner chooses poor path on query with Merge Join and pagination
From:
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> My understanding of this ID pagination is that it should be very quick as it
> needs only find the ID in the index, and then scan the next 50 entries.

Well, that's what it's doing, so far as the "sub s" scan is concerned. Your beef is with the subsequent join.

> 1. Why is the Merge Join performing so slowly?
> It seems to be because the planner doesn't recognize that it can apply the
> subscription_id index condition on the inner table. If I explicitly tell it:
> "AND si.subscription_id > '7ca1...'", then it applies an index condition and
> is almost instant.

We don't attempt to infer derived inequalities. Given "a = b AND b = c", the planner will deduce "a = c". However, given "a = b AND b > c", we do not deduce "a > c". This is an empirical decision based on the frequency with which such a deduction would help versus the planner cycles that would be spent looking for such cases.

> 2. Why does the planner believe that Merge Join (as-is) is optimal here?

The cost is estimated to be slightly lower. It might be right, so far as the time to run the join to completion (without a LIMIT) is concerned. Large nestloop joins tend to suck :-(. But the reason that it then makes the wrong choice with the LIMIT applied, fundamentally, is that the fraction of the total cost that will actually be incurred with the LIMIT present is nonlinear, and it doesn't know that. Doing better is a research problem.

In short, there's nothing here that I'd call a bug that we're likely to fix anytime soon. In the meantime, if you can improve matters by manually injecting the extra inequality, that seems like the thing to do.

			regards, tom lane
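To make the distinction concrete, a schematic sketch (hypothetical tables t1 and t2, not from the thread):

```
-- Equalities propagate: given
--   WHERE t1.a = t2.b AND t2.b = 42
-- the planner also applies t1.a = 42, so both sides of the join
-- can pick up an index condition from a single literal equality.

-- Inequalities do not propagate: given
--   WHERE t1.a = t2.b AND t2.b > 42
-- the implied t1.a > 42 is NOT inferred. The manual workaround is
-- to state the implied bound explicitly:
--   WHERE t1.a = t2.b AND t2.b > 42 AND t1.a > 42
```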
Re: BUG #16824: Planner chooses poor path on query with Merge Join and pagination
From:
Kisung Kim
Date:
Tom Lane-2 wrote
> This is an empirical decision based on the
> frequency with which such a deduction would help versus the planner
> cycles that would be spent looking for such cases.

Is there anything referring to this decision, like mailing list threads or conference materials?
Re: BUG #16824: Planner chooses poor path on query with Merge Join and pagination
From:
Tom Lane
Date:
Kisung Kim <kskim80@gmail.com> writes:
> Tom Lane-2 wrote
>> This is an empirical decision based on the
>> frequency with which such a deduction would help versus the planner
>> cycles that would be spent looking for such cases.

> Is there any thing referring to this decision, like mail threads or
> conference materials?

[ shrug... ] It's doubtless been discussed on pgsql-hackers in the past, but I don't plan to go digging through those archives for you. Searching for "constraint exclusion" via https://www.postgresql.org/search/?m=1 or your favorite search engine might help you.

			regards, tom lane