Re: Deleting, indexes and transactions - Mailing list pgsql-general
From | Stevo Slavić |
---|---|
Subject | Re: Deleting, indexes and transactions |
Date | |
Msg-id | CAAUywg9LSZAc1O6O7Kr9Z7LJHQtd++5S8qF4YwNrEExdOWs53A@mail.gmail.com Whole thread Raw |
In response to | Re: Deleting, indexes and transactions (Adrian Klaver <adrian.klaver@gmail.com>) |
Responses |
Re: Deleting, indexes and transactions
|
List | pgsql-general |
Hello Adrian,
Thanks for replying!
I guess in this case, referential action, from your quote, on deleting As is check that there are no Bs referencing to-be-deleted A row. But since all Bs are deleted (not committed yet though) prior to deleting As, I don't understand why is this check taking that long time. Doesn't this transaction, that both deleting Bs and As belong to, have enough "knowledge" if not to skip this check then to at least have it executed faster? It seems, in case without index, that postgres is executing this referential integrity check sequentially over B data as if they were not deleted, it just skips raising error because it ultimately after long time finds B records are about to be deleted. It would be faster if postgres had a structure/info on transaction level which would allow it to execute following (sequential) queries/checks only over rows which haven't been marked for deletion - I guess that would add complexity. With index I guess postgres does same logic just uses index to lookup Bs referencing to-be-deleted A much faster, and then determines Bs have been marked for deletion and doesn't raise error. I wonder how other RDBMS behave in this case.
Anyway, regarding your second question, cascade delete hasn't been applied or tried yet. Case I've initially explained is one subcase of actual case that needs to be supported which is to sync As with an external source, which unfortunatelly doesn't provide info whether Bs have been changed or not for given A. So, there are two subcases, one where almost all data is dropped (As and Bs) and replaced with new, while in other subcase just some As data gets added while some As are deleted. In either case, we need to drop all Bs and add them because of lack of information of changes in Bs. Will check and see how that performs for both scenarios.
Kind regards,
Stevo.
Thanks for replying!
I guess in this case, referential action, from your quote, on deleting As is check that there are no Bs referencing to-be-deleted A row. But since all Bs are deleted (not committed yet though) prior to deleting As, I don't understand why is this check taking that long time. Doesn't this transaction, that both deleting Bs and As belong to, have enough "knowledge" if not to skip this check then to at least have it executed faster? It seems, in case without index, that postgres is executing this referential integrity check sequentially over B data as if they were not deleted, it just skips raising error because it ultimately after long time finds B records are about to be deleted. It would be faster if postgres had a structure/info on transaction level which would allow it to execute following (sequential) queries/checks only over rows which haven't been marked for deletion - I guess that would add complexity. With index I guess postgres does same logic just uses index to lookup Bs referencing to-be-deleted A much faster, and then determines Bs have been marked for deletion and doesn't raise error. I wonder how other RDBMS behave in this case.
Anyway, regarding your second question, cascade delete hasn't been applied or tried yet. Case I've initially explained is one subcase of actual case that needs to be supported which is to sync As with an external source, which unfortunatelly doesn't provide info whether Bs have been changed or not for given A. So, there are two subcases, one where almost all data is dropped (As and Bs) and replaced with new, while in other subcase just some As data gets added while some As are deleted. In either case, we need to drop all Bs and add them because of lack of information of changes in Bs. Will check and see how that performs for both scenarios.
Kind regards,
Stevo.
On Mon, May 28, 2012 at 4:40 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
On 05/28/2012 07:23 AM, Stevo Slavić wrote:It is documented behavior:Hello PostgreSQL community,
Two tables, A and B, both with auto generated technical PK, A and B are
in relationship via nullable non-unique FK a_fk column in B to A's PK.
There are no other relationships involving table A. Lets say A has ~20k
rows, and B ~500k rows.
When there is no index on a_fk column, if one deletes Bs with DELETE
FROM b WHERE a_fk IS NOT NULL, and then in same transaction also deletes
all As - deleting As lasts painfully long.
Adding an index on FK in B, improves A deletion times significantly.
Can someone please provide an explanation/rationale of this behavior,
why does it take so long to delete As in first case without index?
Thanks in advance!
http://www.postgresql.org/docs/9.0/interactive/sql-createtable.html
"If the referenced column(s) are changed frequently, it might be wise to add an index to the foreign key column so that referential actions associated with the foreign key column can be performed more efficiently."
Though in your case would it not be worth it to just have an ON DELETE CASCADE clause on your FK?--
Btw, I'm using PostgreSQL 9.0. Will try how 9.1 behaves.
Kind regards,
Stevo.
Adrian Klaver
adrian.klaver@gmail.com
pgsql-general by date: