Thread: disabling seqscan not using primary key index?

disabling seqscan not using primary key index?

From
Luca Ferrari
Date:
Hi all,
doing a little and trivial experiment, I decided to populate a table
with a primary key (and thus an automatically generated btree index):

testdb=# create table foo( pk serial primary key, i int );
CREATE TABLE
testdb=# insert into foo( i ) select v from generate_series( 1, 1000000 ) v;
INSERT 0 1000000
testdb=# explain select * from foo where i > 10;
                        QUERY PLAN
------------------------------------------------------------
Seq Scan on foo  (cost=0.00..16925.00 rows=999900 width=8)
  Filter: (i > 10)
(2 rows)




So far so good, but if I disable seqscan I would expect the planner to
choose the primary key index, because that "should be" the preferred
way to access the table.
On the other hand, the planner enables JIT machinery and executes
again a seqsca.

testdb=# set enable_seqscan to off;
SET
testdb=# explain select * from foo where i > 10;
                                 QUERY PLAN
-------------------------------------------------------------------------------
Seq Scan on foo  (cost=10000000000.00..10000016925.00 rows=999900 width=8)
  Filter: (i > 10)
JIT:
  Functions: 2
  Options: Inlining true, Optimization true, Expressions true, Deforming true
(5 rows)



Now, I know the planner has done things right, and thus has selected
the minor cost plan for the situation, but I would like to better
understand why the primary key index is excluded and why the JIT
machinery is used in this case.

I know that columns pk and i are pretty identitcal, and the same query
made against the column pk do exploits the index:

testdb=# explain select * from foo where pk > 10;
                                 QUERY PLAN
-------------------------------------------------------------------------------
Bitmap Heap Scan on foo  (cost=6243.89..14835.76 rows=333350 width=8)
  Recheck Cond: (pk > 10)
  ->  Bitmap Index Scan on foo_pkey  (cost=0.00..6160.55 rows=333350 width=0)
        Index Cond: (pk > 10)
(4 rows)


and the final cost using the index is much lower than the cost of the
seqscan + JIT, that makes me think the index should have been used
also in the previous query.
Any suggestion to better understand?

Please note that even with an extended statistic, the planner does not
choose to use an index:

testdb=# create statistics st_foo ( dependencies ) on pk,i from foo;
CREATE STATISTICS
testdb=# analyze foo;
ANALYZE


testdb=# set enable_seqscan to off;
SET
testdb=# explain select * from foo where i > 10;
                                 QUERY PLAN
-------------------------------------------------------------------------------
Seq Scan on foo  (cost=10000000000.00..10000016925.00 rows=999900 width=8)
  Filter: (i > 10)
JIT:
  Functions: 2
  Options: Inlining true, Optimization true, Expressions true, Deforming true
(5 rows)

Even doing a quick computation about how much does it cost to get from
index to the table for all tuples seems to me much lower than the
initial cost for a not-sequentially-enabled cost:

testdb=# select sum( relpages + reltuples * current_setting(
'cpu_tuple_cost' )::numeric ) from pg_class where relname in ( 'foo',
'foo_pkey' );
 sum
-------
27170


I'm clearly missing something here.

Thanks,
Luca



Re: disabling seqscan not using primary key index?

From
David Rowley
Date:
On Sun, 16 May 2021, 12:15 am Luca Ferrari, <fluca1978@gmail.com> wrote:
> So far so good, but if I disable seqscan I would expect the planner to
> choose the primary key index, because that "should be" the preferred
> way to access the table.
> On the other hand, the planner enables JIT machinery and executes
> again a seqsca.

The answer is fairly simple, the planner just never considers using
the primary key index as there are no possible cases where it would be
useful.  There are no quals it can help filter and there is no
ordering require that it can help provide presorted input for.  If
you'd added an ORDER BY pk, you'll notice the planner does consider
the index and it does come out much cheaper than the penalised seq
scan. So the planner had no choice but to use the seqscan.

You should also be aware that the majority of the time when you
disable a given planner node that we only just add a large startup
cost penalty when costing paths for that node type.  There are a
handful of nodes that are hard disabled.  The reason we just add the
large penalty rather than stop that node it being used is that in many
cases we'd just fail to produce a plan due to there being no other
means to get the required results.

As for JIT being enabled.  The query's cost is above jit_above_cost,
so JIT is enabled. The reason or that is that enable_seqscan TO off
added the startup penalty which pushed the plan's cost well above the
jit threshold.

David



Re: disabling seqscan not using primary key index?

From
Luca Ferrari
Date:
On Sat, May 15, 2021 at 4:40 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> The answer is fairly simple, the planner just never considers using
> the primary key index as there are no possible cases where it would be
> useful.

Does this mean that any UNIQUE constraint on the table is subject to
the same consideration?

> There are no quals it can help filter and there is no
> ordering require that it can help provide presorted input for.  If
> you'd added an ORDER BY pk, you'll notice the planner does consider
> the index and it does come out much cheaper than the penalised seq
> scan. So the planner had no choice but to use the seqscan.

Yes, of course with an order by I would have been expecting the index,
my doubt was about the fact that was not used even with seqscan to
off, and since you explained that the primary index is never used, now
it does make sense to me.

>
> You should also be aware that the majority of the time when you
> disable a given planner node that we only just add a large startup
> cost penalty when costing paths for that node type.  There are a
> handful of nodes that are hard disabled.  The reason we just add the
> large penalty rather than stop that node it being used is that in many
> cases we'd just fail to produce a plan due to there being no other
> means to get the required results.
>

Yes, thanks, I understood this: the planner must have some choices, so
in order to "disable" a plan PostgreSQL raises the cost in the "hope"
the disabled access method will not be chosen.


> As for JIT being enabled.  The query's cost is above jit_above_cost,
> so JIT is enabled. The reason or that is that enable_seqscan TO off
> added the startup penalty which pushed the plan's cost well above the
> jit threshold.

Ah, shame on me, I forgot about the jit_above_cost!

Thanks,
Luca



Re: disabling seqscan not using primary key index?

From
Tom Lane
Date:
Luca Ferrari <fluca1978@gmail.com> writes:
> On Sat, May 15, 2021 at 4:40 PM David Rowley <dgrowleyml@gmail.com> wrote:
>> The answer is fairly simple, the planner just never considers using
>> the primary key index as there are no possible cases where it would be
>> useful.

> Does this mean that any UNIQUE constraint on the table is subject to
> the same consideration?

David's statement applies to any index.  Per the code in indxpath.c:

     * 4. Generate an indexscan path if there are relevant restriction clauses
     * in the current clauses, OR the index ordering is potentially useful for
     * later merging or final output ordering, OR the index has a useful
     * predicate, OR an index-only scan is possible.

If none of those cases apply, an indexscan is guaranteed to be worse
than a seqscan, so we don't consider it.

            regards, tom lane