Thread: DELETE not seeming to use the PK index..

DELETE not seeming to use the PK index..

From: Wells Oliver
The table mytable has a PK of pkcol1 integer, pkcol2 guid, pkcol3 smallint, pkcol4 smallint.

I'm trying to DELETE FROM mytable WHERE pkcol1 IN (SELECT pkcol1 FROM temptable)

where there might be 30 records in temptable.
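
To be precise, the plan below was gathered with EXPLAIN ANALYZE, i.e. something like:

    EXPLAIN ANALYZE
    DELETE FROM mytable
    WHERE pkcol1 IN (SELECT pkcol1 FROM temptable);
    -- note: EXPLAIN ANALYZE actually executes the delete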

This is the explain:

Delete on mytable  (cost=2166.10..6429628.06 rows=36519504 width=12) (actual time=75240.118..75240.118 rows=0 loops=1)
  ->  Hash Join  (cost=2166.10..6429628.06 rows=36519504 width=12) (actual time=75240.100..75240.102 rows=0 loops=1)
        Hash Cond: (mytable.pkcol1 = temptable.pkcol1)
        ->  Seq Scan on mytable  (cost=0.00..5829455.08 rows=73039008 width=10) (actual time=313.337..66846.625 rows=73046795 loops=1)
        ->  Hash  (cost=2163.60..2163.60 rows=200 width=10) (actual time=60.862..60.863 rows=1 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 9kB
              ->  HashAggregate  (cost=2161.60..2163.60 rows=200 width=10) (actual time=60.852..60.854 rows=1 loops=1)
                    Group Key: temptable.pkcol1
                    ->  Seq Scan on temptable  (cost=0.00..2038.08 rows=49408 width=10) (actual time=1.325..56.671 rows=19001 loops=1)
Planning Time: 0.370 ms
JIT:
  Functions: 15
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 3.891 ms, Inlining 110.512 ms, Optimization 108.393 ms, Emission 94.061 ms, Total 316.856 ms
Execution Time: 75244.253 ms

As you might guess, it takes forever. Why is it not using the index? The seq scan is quite slow.

I tried a VACUUM FULL ANALYZE before, no change to the planner.

Should I create a secondary index just on pkcol1? I thought it'd use the PK index since it's the first column.
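
If a separate index did turn out to be the answer, I assume it'd be built roughly like this (concurrently, given the table size; the index name is made up):

    CREATE INDEX CONCURRENTLY mytable_pkcol1_idx
        ON mytable (pkcol1);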

--

Re: DELETE not seeming to use the PK index..

From: Tom Lane
Wells Oliver <wells.oliver@gmail.com> writes:
> As you might guess, it takes forever. Why is it not using the index?

Because it thinks it's going to have to delete half of the table
(36M / 73M rows).  That's pretty obviously coming from a fallback default
selectivity estimate, which makes one wonder if your statistics for the
tables are up to date, or indeed exist at all.

> I tried a VACUUM FULL ANALYZE before, no change to the planner.

Hmph.  That should have updated things, but the numbers for temptable
at least are obviously not coming from any actual statistics.  (The
"200" for number of distinct values is another telltale default.)
If temptable actually is a temp table, was it included in the VACUUM
or did you create it after?
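
Something like this, run after the temp table is populated, would both
build the stats and let you verify they exist:

    ANALYZE temptable;

    -- row count the planner will see
    SELECT reltuples FROM pg_class WHERE relname = 'temptable';

    -- per-column stats; no row here means no statistics at all
    SELECT n_distinct, null_frac
      FROM pg_stats
     WHERE tablename = 'temptable' AND attname = 'pkcol1';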

Are there any non-built-in datatypes or operators involved here?
Perhaps you've enabled RLS and it's preventing access to the stats?
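
If you're not sure, this would show whether RLS is enabled on either table:

    SELECT relname, relrowsecurity, relforcerowsecurity
      FROM pg_class
     WHERE relname IN ('mytable', 'temptable');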

> Should I create a secondary index just on pkcol1? I thought it'd use the PK
> index since it's the first column.

This is not a lack-of-index problem, or at least it won't be until
you get rowcount estimates that would encourage the planner to think
that an index would be helpful.  A rule of thumb is that if the
query needs to fetch more than a percent or two of the table, an
index is likely not worth the trouble.
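Once temptable has real statistics, the rowcount estimate for the join
should come way down from 36M, and at that point an indexed plan becomes
plausible.  To sanity-check that the pkey index is usable on pkcol1 by
itself (it is, since pkcol1 is the leading column), a single-value probe
ought to show an index or bitmap scan:

    EXPLAIN
    SELECT * FROM mytable WHERE pkcol1 = 42;  -- 42 is an arbitrary value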

            regards, tom lane