BUG #16824: Planner chooses poor path on query with Merge Join and pagination
From:
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16824
Logged by:          Jacob Mitash
Email address:      jacobmitash@gmail.com
PostgreSQL version: 13.1
Operating system:   Alpine Linux (Docker container)
Description:

Hi,

I have been working with this relatively simple query that is used to iterate over a large subset of rows across two tables, one page at a time. The query was originally using OFFSET/LIMIT to execute the paging, which, as you might guess, performs poorly in the later pages. I changed the query to paginate using a combination of ORDER BY id, WHERE id > id_from_prev_page, and LIMIT page_size. I'm not sure if that strategy has a name, but I'll refer to it as "ID pagination".

My dataset involves just two tables, described by this DDL:

```
-- DDL:
create table sub ( -- 4.3M records
    subscription_id varchar(255) -- UUIDs
        not null
        constraint pk_sub primary key,
    subscription_status varchar(255) -- (ACTIVE, PENDING, or CANCELLED)
);

create table sub_item ( -- 4.4M records, mostly 1:1 with subscriptions
    subscription_item_id varchar(255) -- UUIDs
        not null
        constraint pk_sub_item primary key,
    subscription_id varchar(255) -- FK to sub
        constraint fk_sub_item_sub_id references sub,
    item_ref varchar(255) -- 25 character strings, mostly the same
);
```

And here's the query I've been using to select pages while testing performance. The literal subscription_id value represents grabbing a page in the middle of the data.

```
-- Query: Grab a page from the middle
EXPLAIN ANALYZE
SELECT s.subscription_id, *
FROM sub s
JOIN sub_item si
    ON s.subscription_id = si.subscription_id
    AND si.item_ref = '1001107599999910222001000'
WHERE s.subscription_status = 'ACTIVE'
  AND s.SUBSCRIPTION_ID > '7ca1cf6b-2452-4d1b-bd03-1ba68b63b528'
ORDER BY s.subscription_id
LIMIT 50;
```

My understanding of this ID pagination is that it should be very quick, since it only needs to find the ID in the index and then scan the next 50 entries.
However, the execution time is slow:

```
Limit  (cost=2.41..325.76 rows=50 width=179) (actual time=10336.095..10340.389 rows=50 loops=1)
  ->  Merge Join  (cost=2.41..765832.96 rows=118421 width=179) (actual time=10336.086..10339.900 rows=50 loops=1)
        Merge Cond: ((s.subscription_id)::text = (si.subscription_id)::text)
        ->  Index Scan using pk_sub on sub s  (cost=0.56..270648.21 rows=632913 width=45) (actual time=0.025..1.643 rows=90 loops=1)
              Index Cond: ((subscription_id)::text > '7ca1cf6b-2452-4d1b-bd03-1ba68b63b528'::text)
              Filter: ((subscription_status)::text = 'ACTIVE'::text)
              Rows Removed by Filter: 211
        ->  Index Scan using idx_sub_item_sub_id_btree on sub_item si  (cost=0.56..490385.99 rows=813151 width=98) (actual time=0.016..8260.586 rows=393058 loops=1)
              Filter: ((item_ref)::text = '1001107599999910222001000'::text)
              Rows Removed by Filter: 1742475
Planning Time: 0.309 ms
Execution Time: 10340.691 ms
```

Just fiddling around, I saw that performance improved significantly when I ran `SET enable_mergejoin = OFF;`:

```
Limit  (cost=1.11..446.97 rows=50 width=179) (actual time=0.150..4.351 rows=50 loops=1)
  ->  Nested Loop  (cost=1.11..1055964.59 rows=118421 width=179) (actual time=0.140..3.850 rows=50 loops=1)
        ->  Index Scan using pk_sub on sub s  (cost=0.56..270648.21 rows=632913 width=45) (actual time=0.064..0.898 rows=90 loops=1)
              Index Cond: ((subscription_id)::text > '7ca1cf6b-2452-4d1b-bd03-1ba68b63b528'::text)
              Filter: ((subscription_status)::text = 'ACTIVE'::text)
              Rows Removed by Filter: 211
        ->  Index Scan using idx_sub_item_sub_id_btree on sub_item si  (cost=0.56..1.23 rows=1 width=98) (actual time=0.013..0.016 rows=1 loops=90)
              Index Cond: ((subscription_id)::text = (s.subscription_id)::text)
              Filter: ((item_ref)::text = '1001107599999910222001000'::text)
              Rows Removed by Filter: 0
Planning Time: 0.196 ms
Execution Time: 4.623 ms
```

I originally asked about this on the DBA StackExchange. I had two questions, which I'll summarize here:

1. Why is the Merge Join performing so slowly?

   It seems to be because the planner doesn't recognize that it can apply the subscription_id index condition on the inner table. If I explicitly tell it, "AND si.subscription_id > '7ca1...'", then it applies an index condition and the query is almost instant.

2. Why does the planner believe that Merge Join (as-is) is optimal here?

   I don't have an answer for this. Potentially a bug related to the previous answer?

So I think there are two things going on here that may or may not be classified as bugs:

1. Using Merge Join, the planner does not apply an index condition to the inner table that could be implied from the clause on the outer table.
2. In the same scenario, the planner's cost estimate is grossly different from the actual execution time.
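For reference, the explicit rewrite mentioned in question 1 is just the original query with the pagination bound repeated on the inner table. A sketch reusing the literal ID from the query above (this is the manual workaround, not a planner change):

```
SELECT s.subscription_id, *
FROM sub s
JOIN sub_item si
    ON s.subscription_id = si.subscription_id
    AND si.item_ref = '1001107599999910222001000'
WHERE s.subscription_status = 'ACTIVE'
  AND s.subscription_id > '7ca1cf6b-2452-4d1b-bd03-1ba68b63b528'
  -- Logically redundant given the join equality above, but it lets
  -- the planner apply an index condition on sub_item's side too:
  AND si.subscription_id > '7ca1cf6b-2452-4d1b-bd03-1ba68b63b528'
ORDER BY s.subscription_id
LIMIT 50;
```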
Re: BUG #16824: Planner chooses poor path on query with Merge Join and pagination
From:
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> My understanding of this ID pagination is that it should be very quick as it
> needs only find the ID in the index, and then scan the next 50 entries.

Well, that's what it's doing, so far as the "sub s" scan is concerned. Your beef is with the subsequent join.

> 1. Why is the Merge Join performing so slowly?
> It seems to be because the planner doesn't recognize that it can apply the
> subscription_id index condition on the inner table. If I explicitly tell it:
> "AND si.subscription_id > '7ca1...'", then it applies an index condition and
> is almost instant.

We don't attempt to infer derived inequalities. Given "a = b AND b = c", the planner will deduce "a = c". However, given "a = b AND b > c", we do not deduce "a > c". This is an empirical decision based on the frequency with which such a deduction would help versus the planner cycles that would be spent looking for such cases.

> 2. Why does the planner believe that Merge Join (as-is) is optimal here?

The cost is estimated to be slightly lower. It might be right, so far as the time to run the join to completion (without a LIMIT) is concerned. Large nestloop joins tend to suck :-(. But the reason that it then makes the wrong choice with the LIMIT applied, fundamentally, is that the fraction of the total cost that will actually be incurred with the LIMIT present is nonlinear, and it doesn't know that. Doing better is a research problem.

In short, there's nothing here that I'd call a bug that we're likely to fix anytime soon. In the meantime, if you can improve matters by manually injecting the extra inequality, that seems like the thing to do.

			regards, tom lane
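To make the distinction concrete, a schematic sketch (hypothetical tables t1 and t2, not from the thread):

```
-- Equalities propagate: given
--   WHERE t1.a = t2.b AND t2.b = 42
-- the planner also applies t1.a = 42, so both sides of the join
-- can pick up an index condition from a single literal equality.

-- Inequalities do not propagate: given
--   WHERE t1.a = t2.b AND t2.b > 42
-- the implied t1.a > 42 is NOT inferred. The manual workaround is
-- to state the implied bound explicitly:
--   WHERE t1.a = t2.b AND t2.b > 42 AND t1.a > 42
```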
Re: BUG #16824: Planner chooses poor path on query with Merge Join and pagination
From:
Kisung Kim
Date:
Tom Lane-2 wrote
> This is an empirical decision based on the
> frequency with which such a deduction would help versus the planner
> cycles that would be spent looking for such cases.

Is there anything referring to this decision, like mailing list threads or conference materials?
Re: BUG #16824: Planner chooses poor path on query with Merge Join and pagination
From:
Tom Lane
Date:
Kisung Kim <kskim80@gmail.com> writes:
> Tom Lane-2 wrote
>> This is an empirical decision based on the
>> frequency with which such a deduction would help versus the planner
>> cycles that would be spent looking for such cases.

> Is there any thing referring to this decision, like mail threads or
> conference materials?

[ shrug... ] It's doubtless been discussed on pgsql-hackers in the past, but I don't plan to go digging through those archives for you. Searching for "constraint exclusion" via https://www.postgresql.org/search/?m=1 or your favorite search engine might help you.

			regards, tom lane