Re: Seq scan instead of index scan querying single row from primary key on large table - Mailing list pgsql-hackers
From: Tom Lane
Subject: Re: Seq scan instead of index scan querying single row from primary key on large table
Msg-id: 3990647.1721334167@sss.pgh.pa.us
In response to: Re: Seq scan instead of index scan querying single row from primary key on large table (James Coleman <jtc331@gmail.com>)
Responses: Re: Seq scan instead of index scan querying single row from primary key on large table
List: pgsql-hackers
James Coleman <jtc331@gmail.com> writes:
> On Thu, Jul 18, 2024 at 2:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Are you really sure nothing changed about what the ORM is emitting?

> Yes, aside from the fact that application code didn't change, we
> reproduced the problem by restoring a physical snapshot of the
> database and were able to get the bad plan and then get it to change
> to the good plan by analyzing object_audits.

OK.  After staring at this for awhile, I think I see what's happening,
and you're right that it's basically a problem with how we do LIMIT.
Consider this simple test case:

regression=# create table t1 (id int primary key, link int);
CREATE TABLE
regression=# insert into t1 select id, id/1000 from generate_series(1,1000000) id;
INSERT 0 1000000
regression=# vacuum analyze t1;
VACUUM
regression=# set enable_indexscan TO 0;
SET
regression=# set enable_bitmapscan TO 0;
SET
regression=# set max_parallel_workers_per_gather TO 0;
SET
regression=# explain select * from t1 where id = 42 limit 1;
                         QUERY PLAN
------------------------------------------------------------
 Limit  (cost=0.00..16925.00 rows=1 width=8)
   ->  Seq Scan on t1  (cost=0.00..16925.00 rows=1 width=8)
         Filter: (id = 42)
(3 rows)

Because we'll pick the winning Limit plan based just on its total_cost,
this set of cost assignments amounts to assuming that the one matching
tuple will appear at the end of the seqscan.  That's overly conservative
perhaps, but it's fine.  But now look what happens when we join to
another table that has many matching rows:

regression=# explain select * from t1 t1 left join t1 t2 on (t1.id=t2.link) where t1.id = 42 limit 1;
                               QUERY PLAN
-----------------------------------------------------------------------
 Limit  (cost=0.00..33.96 rows=1 width=16)
   ->  Nested Loop Left Join  (cost=0.00..33859.97 rows=997 width=16)
         ->  Seq Scan on t1  (cost=0.00..16925.00 rows=1 width=8)
               Filter: (id = 42)
         ->  Seq Scan on t1 t2  (cost=0.00..16925.00 rows=997 width=8)
               Filter: (link = 42)
(6 rows)

We have, basically, completely forgotten about that conservative
positioning estimate: this set of costs amounts to assuming that the
first matching tuple will be found 1/1000th of the way through the
entire join, which is a lot less than what we'd just decided it would
cost to retrieve the first t1 tuple.  In your example, with over 13
million rows to be joined, that allows us to make a ridiculously small
estimate of the time to find the first match, and so it's able to win
a cost comparison against the far-more-sanely-estimated indexscan plan.

I'm not sure what to do about this.  I think that things might work out
better if we redefined the startup cost as "estimated cost to retrieve
the first tuple", rather than its current very-squishy definition as
"cost to initialize the scan".  That would end up with the LIMIT node
having a cost that's at least the sum of the startup costs of the input
scans, which would fix this problem.  But changing that everywhere would
be a lotta work, and I'm far from sure that it would not have any
negative side-effects.

                        regards, tom lane
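[To see where the 33.96 figure in the join plan comes from: with zero
startup cost, the planner charges the LIMIT a pro-rata share of the
input path's total cost, i.e. 33859.97 * (1 row / 997 rows) ~= 33.96.
The C sketch below reproduces that arithmetic; it is illustrative only,
with made-up function and variable names, not the actual PostgreSQL
planner source.]

/* Illustrative sketch of proportional LIMIT cost scaling.
 * Not the real planner code; names are hypothetical. */
#include <stdio.h>

typedef double Cost;

static Cost
limit_total_cost(Cost startup_cost, Cost total_cost,
                 double input_rows, double limit_rows)
{
    /* Assume matches are spread uniformly through the input, so the
     * LIMIT pays the startup cost plus a pro-rata share of the run
     * cost.  Note the per-tuple positioning estimate of the inputs
     * is not consulted at all -- the problem described above. */
    if (limit_rows < input_rows)
        total_cost = startup_cost +
            (total_cost - startup_cost) * limit_rows / input_rows;
    return total_cost;
}

int
main(void)
{
    /* The join plan above: startup 0.00, total 33859.97, 997 rows,
     * LIMIT 1. */
    Cost t = limit_total_cost(0.0, 33859.97, 997, 1);
    printf("limit total_cost = %.2f\n", t);  /* prints 33.96 */
    return 0;
}

[Run against the single-table plan instead (total 16925.00, 1 row,
LIMIT 1), the scaling is a no-op and the full seqscan cost survives,
which is exactly the asymmetry the message points out.]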