Re: Seq scan instead of index scan querying single row from primary key on large table - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: Seq scan instead of index scan querying single row from primary key on large table
Date: 2024-07-18
Msg-id: 3990647.1721334167@sss.pgh.pa.us
In response to: Re: Seq scan instead of index scan querying single row from primary key on large table (James Coleman <jtc331@gmail.com>)
Responses: Re: Seq scan instead of index scan querying single row from primary key on large table
List: pgsql-hackers
James Coleman <jtc331@gmail.com> writes:
> On Thu, Jul 18, 2024 at 2:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Are you really sure nothing changed about what the ORM is emitting?

> Yes, aside from the fact that application code didn't change, we
> reproduced the problem by restoring a physical snapshot of the
> database and were able to get the bad plan and then get it to change
> to the good plan by analyzing object_audits.

OK.  After staring at this for a while, I think I see what's happening,
and you're right that it's basically a problem with how we do LIMIT.

Consider this simple test case:

regression=# create table t1 (id int primary key, link int);
CREATE TABLE
regression=# insert into t1 select id, id/1000 from generate_series(1,1000000) id;
INSERT 0 1000000
regression=# vacuum analyze t1;
VACUUM
regression=# set enable_indexscan TO 0;
SET
regression=# set enable_bitmapscan TO 0;
SET
regression=# set max_parallel_workers_per_gather TO 0;
SET
regression=# explain select * from t1 where id = 42 limit 1;
                         QUERY PLAN
------------------------------------------------------------
 Limit  (cost=0.00..16925.00 rows=1 width=8)
   ->  Seq Scan on t1  (cost=0.00..16925.00 rows=1 width=8)
         Filter: (id = 42)
(3 rows)
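
For reference, that 16925.00 decomposes exactly under the default cost
parameters, if we assume the table occupies 4425 heap pages (which is
what these numbers imply):

      4425 pages  * seq_page_cost (1.0)        =  4425.00
   1000000 rows   * cpu_tuple_cost (0.01)      = 10000.00
   1000000 rows   * cpu_operator_cost (0.0025) =  2500.00
                                                 --------
                                                 16925.00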

Because we'll pick the winning Limit plan based just on its
total_cost, this set of cost assignments amounts to assuming that the
one matching tuple will appear at the end of the seqscan.  That's
overly conservative perhaps, but it's fine.
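
Concretely, the rule used to cost a LIMIT (adjust_limit_rows_costs()
in pathnode.c) is approximately

    limit_total_cost = input_startup_cost
        + (input_total_cost - input_startup_cost) * count / input_rows

and with count = 1 and input_rows = 1 the fraction is 1, so the Limit
node inherits the seqscan's full 16925.00; that's where the
"tuple found at the end of the scan" assumption comes from.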

But now look what happens when we join to another table that has many
matching rows:

regression=# explain select * from t1 t1 left join t1 t2 on (t1.id=t2.link) where t1.id = 42 limit 1;
                              QUERY PLAN
-----------------------------------------------------------------------
 Limit  (cost=0.00..33.96 rows=1 width=16)
   ->  Nested Loop Left Join  (cost=0.00..33859.97 rows=997 width=16)
         ->  Seq Scan on t1  (cost=0.00..16925.00 rows=1 width=8)
               Filter: (id = 42)
         ->  Seq Scan on t1 t2  (cost=0.00..16925.00 rows=997 width=8)
               Filter: (link = 42)
(6 rows)
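
The arithmetic here is easy to reproduce: the nestloop's total is the
outer scan, plus one repetition of the inner scan, plus per-output-row
CPU cost,

    16925.00 + 16925.00 + 997 * cpu_tuple_cost (0.01) = 33859.97

and the Limit node is then charged only the 1/997 fraction of that:

    33859.97 * (1 / 997) = 33.96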

We have, basically, completely forgotten about that conservative
positioning estimate: this set of costs amounts to assuming that the
first matching tuple will be found 1/1000th of the way through the
entire join, which is a lot less than what we'd just decided it would
cost to retrieve the first t1 tuple.  In your example, with over 13
million rows to be joined, that allows us to make a ridiculously small
estimate of the time to find the first match, and so it's able to
win a cost comparison against the far-more-sanely-estimated indexscan
plan.
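
(To put made-up but plausibly-shaped numbers on that: a join with a
total_cost of 1,000,000 and 13,000,000 estimated rows gets a LIMIT 1
cost of about 1000000 / 13000000 = 0.08, which is cheaper than even a
single index descent.)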

I'm not sure what to do about this.  I think that things might work
out better if we redefined the startup cost as "estimated cost to
retrieve the first tuple", rather than its current very-squishy
definition as "cost to initialize the scan".  That would end up
with the LIMIT node having a cost that's at least the sum of the
startup costs of the input scans, which would fix this problem.
But changing that everywhere would be a lotta work, and I'm far
from sure that it would not have any negative side-effects.
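
To sketch what that would mean here: the nestloop in the toy example
would carry a startup cost of at least 16925.00 (the estimated cost
for the outer seqscan to produce its first tuple), so the Limit node
would be costed at no less than that rather than 33.96, and the
indexscan plan would win the comparison comfortably.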

            regards, tom lane


