Thread: Slow delete

Slow delete

From
Doug Hall
Date:
Sorry. I realize this is a rather newbie question, but I've got a slow
delete going on here, and I could use some help figuring out why. This
is the classic "get rid of orphans" select.

delete from citizen where id not in (select citizenid from
citizen_stage);

citizen.id and citizen_stage.citizenid are both indexed.

The explain select tells me that there is a sequential select of
citizen_stage records. (??) There are 75009 citizen records and 14778
records, and it's taking more than half an hour. How can I speed this
up?

Thanks,
Doug


Re: Slow delete

From
Tom Lane
Date:
Doug Hall <doughalldev@gmail.com> writes:
> delete from citizen where id not in (select citizenid from
> citizen_stage);

> The explain select tells me that there is a sequential select of
> citizen_stage records. (??) There are 75009 citizen records and 14778
> records, and it's taking more than half an hour. How can I speed this
> up?

How old is your Postgres?  I'd expect 7.4 and up to do this with a hashed
IN, which'd be reasonably fast.  If the EXPLAIN output doesn't say
anything about a "hashed subplan", then either you've got an old version
or there's some sort of estimation problem.

If it is a hashed IN and it's still slow, I'd wonder about unindexed
foreign key references to the citizen table.

            regards, tom lane

Re: Slow delete

From
Doug Hall
Date:
On Jul 13, 2005, at 12:46 PM, Tom Lane wrote:

> Doug Hall <doughalldev@gmail.com> writes:
>> delete from citizen where id not in (select citizenid from
>> citizen_stage);
>
>> The explain select tells me that there is a sequential select of
>> citizen_stage records. (??) There are 75009 citizen records and 14778
>> records, and it's taking more than half an hour. How can I speed this
>> up?
>
> How old is your Postgres?  I'd expect 7.4 and up to do this with a
> hashed
> IN, which'd be reasonably fast.

My boss is using 8.0.0 beta4! (Yikes) I'll upgrade him just to make
sure.


> If the EXPLAIN output doesn't say
> anything about a "hashed subplan", then either you've got an old
> version
> or there's some sort of estimation problem.

No, the EXPLAIN doesn't mention "hashed subplan". I suspect it was a
bug in the beta.

> If it is a hashed IN and it's still slow, I'd wonder about unindexed
> foreign key references to the citizen table.
>

The foreign key is indexed without specifying the method, so it's
B-tree by default.

Does PostgreSQL automatically create a hashed index for primary keys?
If not, then we need to drop the index and create it using...

CREATE INDEX name ON table USING HASH (column);

However, the documentation says:

Note: Testing has shown PostgreSQL's hash indexes to perform no better
than B-tree indexes, and the index size and build time for hash indexes
is much worse. For these reasons, hash index use is presently
discouraged.

So, why have hashed indexes?

Thanks,
Doug


Re: Slow delete

From
Tom Lane
Date:
Doug Hall <doughalldev@gmail.com> writes:
>> If the EXPLAIN output doesn't say
>> anything about a "hashed subplan", then either you've got an old
>> version or there's some sort of estimation problem.

> No, the EXPLAIN doesn't mention "hashed subplan". I suspect it was a
> bug in the beta.

You might need to raise work_mem to get it to use a hash.  This is not
related to hash indexes at all --- we're talking about a hash table
built in-memory from the results of the sub-select.

> So, why have hashed indexes?

Certain people keep imagining that they'll someday outperform b-trees
for certain situations.  I have my doubts --- the theoretical advantage
is real enough, but it's difficult to translate that into practical
reality when only one set of code is getting significant maintenance
attention.  In any case, that's not related to your problem.

I'm still a bit concerned about whether the issue isn't foreign key
related rather than a planning problem per se.  Try a non-delete
operation, say
    select count(*) from citizen where id not in (select citizenid
    from citizen_stage);
and see if that comes back in a reasonable amount of time or not.

            regards, tom lane