Thread: Restore referencial integrity

Restore referencial integrity

From
Carlos Henrique Reimer
Date:
Hi,
 
We had by mistake dropped the referencial integrety between two huge tables and now I'm facing the following messages when trying to recreate the foreign key again:
 
alter table posicoes_controles add
  CONSTRAINT protocolo FOREIGN KEY (protocolo)
      REFERENCES posicoes (protocolo) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE;
 
ERROR:  insert or update on table "posicoes_controles" violates foreign key constraint "protocolo"
DETAIL:  Key (protocolo)=(338525035) is not present in table "posicoes".
********** Erro **********
ERROR: insert or update on table "posicoes_controles" violates foreign key constraint "protocolo"
SQL state: 23503
Detalhe: Key (protocolo)=(338525035) is not present in table "posicoes".
As the error message tells, the table "posicoes_controles" has values in column "protocolo" that are not present in column "protocolo" of table "posicoes". This happened because some programs removed rows from table "posicoes" while the referencial integrity was dropped.
 
Now I need to remove all rows from table "posicoes_controles" that has not corresponding row in table "posicoes".
 
As these are huge tables, almost 100GB each, and the server hardware restricted (4GB RAM) I would like a suggestion of which command or commands should be used from the performance perspective. 
 
Column "protocolo" is "posicoes" table primary key but is not in any index colum of table "posicoes_controles".
 
Thank you very much for any help!

--
Reimer
47-3347-1724 47-9183-0547 msn: carlos.reimer@opendb.com.br

Re: Restore referencial integrity

From
George H
Date:
On Mon, Aug 30, 2010 at 5:30 AM, Carlos Henrique Reimer
<carlos.reimer@opendb.com.br> wrote:
> Hi,
>
> We had by mistake dropped the referencial integrety between two huge tables
> and now I'm facing the following messages when trying to recreate the
> foreign key again:
>
> alter table posicoes_controles add
>   CONSTRAINT protocolo FOREIGN KEY (protocolo)
>       REFERENCES posicoes (protocolo) MATCH SIMPLE
>       ON UPDATE NO ACTION ON DELETE CASCADE;
>
> ERROR:  insert or update on table "posicoes_controles" violates foreign key
> constraint "protocolo"
> DETAIL:  Key (protocolo)=(338525035) is not present in table "posicoes".
> ********** Erro **********
> ERROR: insert or update on table "posicoes_controles" violates foreign key
> constraint "protocolo"
> SQL state: 23503
> Detalhe: Key (protocolo)=(338525035) is not present in table "posicoes".
> As the error message tells, the table "posicoes_controles" has values in
> column "protocolo" that are not present in column "protocolo" of table
> "posicoes". This happened because some programs removed rows from table
> "posicoes" while the referencial integrity was dropped.
>
> Now I need to remove all rows from table "posicoes_controles" that has not
> corresponding row in table "posicoes".
>
> As these are huge tables, almost 100GB each, and the server
> hardware restricted (4GB RAM) I would like a suggestion of which command
> or commands should be used from the performance perspective.
>
> Column "protocolo" is "posicoes" table primary key but is not in any index
> colum of table "posicoes_controles".
>
> Thank you very much for any help!
> --
> Reimer
> 47-3347-1724 47-9183-0547 msn: carlos.reimer@opendb.com.br
>
>

Hi,

I guess you could consider the following strategy: Halt the server or
lock the table or something so no program is allowed to delete any
rows on the affected tables. Run a PL/SQL script that will remove rows
from "posicoes_controles" whose foreign key is not present in table
"posics." Then re-issue the foreign key constraint. Then unlock the
table or whatever it is you have to do get programs to be able to use
the tables again.

I hope this helps somewhat.
--
George H
george.dma@gmail.com

Re: Restore referencial integrity

From
Carlos Henrique Reimer
Date:
Hi,
 
Yes, this is a good suggestion but as the table posicoes_controles has 3.71172e+008 rows it will perform 3.71172e+008 selects against table posicoes to check if the protocolo is in table.
 
I was think something like:
 
explain delete from posicoes_controles where protocolo not in (select protocolo from posicoes);
 
"Seq Scan on posicoes_controles  (cost=9929689.38..1180088620108403.70 rows=189165121 width=6)"
"  Filter: (NOT (subplan))"
"  SubPlan"
"    ->  Materialize  (cost=9929689.38..15217480.18 rows=380245580 width=4)"
"          ->  Seq Scan on posicoes  (cost=0.00..8064108.80 rows=380245580 width=4)"
 
Will this work better that a pl/pgsql as you suggested? Or is there something even betther?

Thank you!
2010/8/30 George H <george.dma@gmail.com>
On Mon, Aug 30, 2010 at 5:30 AM, Carlos Henrique Reimer
<carlos.reimer@opendb.com.br> wrote:
> Hi,
>
> We had by mistake dropped the referencial integrety between two huge tables
> and now I'm facing the following messages when trying to recreate the
> foreign key again:
>
> alter table posicoes_controles add
>   CONSTRAINT protocolo FOREIGN KEY (protocolo)
>       REFERENCES posicoes (protocolo) MATCH SIMPLE
>       ON UPDATE NO ACTION ON DELETE CASCADE;
>
> ERROR:  insert or update on table "posicoes_controles" violates foreign key
> constraint "protocolo"
> DETAIL:  Key (protocolo)=(338525035) is not present in table "posicoes".
> ********** Erro **********
> ERROR: insert or update on table "posicoes_controles" violates foreign key
> constraint "protocolo"
> SQL state: 23503
> Detalhe: Key (protocolo)=(338525035) is not present in table "posicoes".
> As the error message tells, the table "posicoes_controles" has values in
> column "protocolo" that are not present in column "protocolo" of table
> "posicoes". This happened because some programs removed rows from table
> "posicoes" while the referencial integrity was dropped.
>
> Now I need to remove all rows from table "posicoes_controles" that has not
> corresponding row in table "posicoes".
>
> As these are huge tables, almost 100GB each, and the server
> hardware restricted (4GB RAM) I would like a suggestion of which command
> or commands should be used from the performance perspective.
>
> Column "protocolo" is "posicoes" table primary key but is not in any index
> colum of table "posicoes_controles".
>
> Thank you very much for any help!
> --
> Reimer
> 47-3347-1724 47-9183-0547 msn: carlos.reimer@opendb.com.br
>
>

Hi,

I guess you could consider the following strategy: Halt the server or
lock the table or something so no program is allowed to delete any
rows on the affected tables. Run a PL/SQL script that will remove rows
from "posicoes_controles" whose foreign key is not present in table
"posics." Then re-issue the foreign key constraint. Then unlock the
table or whatever it is you have to do get programs to be able to use
the tables again.

I hope this helps somewhat.
--
George H
george.dma@gmail.com



--
Reimer
47-3347-1724 47-9183-0547 msn: carlos.reimer@opendb.com.br

Re: Restore referencial integrity

From
David Fetter
Date:
On Sun, Aug 29, 2010 at 11:30:57PM -0300, Carlos Henrique Reimer wrote:
> Hi,
>
> We had by mistake dropped the referencial integrety between two huge
> tables

Agora o elefante vai pegar! ;)

> and now I'm facing the following messages when trying to recreate
> the foreign key again:
>
> alter table posicoes_controles add
>   CONSTRAINT protocolo FOREIGN KEY (protocolo)
>       REFERENCES posicoes (protocolo) MATCH SIMPLE
>       ON UPDATE NO ACTION ON DELETE CASCADE;
>
> ERROR:  insert or update on table "posicoes_controles" violates foreign key
> constraint "protocolo"
> DETAIL:  Key (protocolo)=(338525035) is not present in table "posicoes".
> ********** Erro **********
> ERROR: insert or update on table "posicoes_controles" violates foreign key
> constraint "protocolo"
> SQL state: 23503
> Detalhe: Key (protocolo)=(338525035) is not present in table "posicoes".
> As the error message tells, the table "posicoes_controles" has values in
> column "protocolo" that are not present in column "protocolo" of table
> "posicoes". This happened because some programs removed rows from table
> "posicoes" while the referencial integrity was dropped.
>
> Now I need to remove all rows from table "posicoes_controles" that has not
> corresponding row in table "posicoes".
>
> As these are huge tables, almost 100GB each, and the server
> hardware restricted (4GB RAM) I would like a suggestion of which command
> or commands should be used from the performance perspective.

First, if pescioes_controles doesn't already have an index on
protocolo, create such an index.  You can do something like

CREATE INDEX CONCURRENTLY ON pescioes_controles(protocolo);

After you have finished the indexing, you'll need to schedule some
down time, cut off all other access to the server, and then run
something like the following:

BEGIN;
DELETE FROM pescioes_controles WHERE NOT EXISTS (
    SELECT 1 FROM pesicoes WHERE pesicoes.protocolo = pescioes_controles.protocolo
);
ALTER TABLE posicoes_controles add
  CONSTRAINT protocolo FOREIGN KEY (protocolo)
      REFERENCES posicoes (protocolo) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE;
COMMIT;

Hope this helps :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Restore referencial integrity

From
Filip Rembiałkowski
Date:
I remember when I handled such situations without downtime, in 24/7 HA
setup, to avoid large transactions - You could try SELECT FROM A LEFT
JOIN B WHERE B.ID IS NULL LIMIT 10 -- and use this as a base for
DELETE statement...

2010/8/30, George H <george.dma@gmail.com>:
> On Mon, Aug 30, 2010 at 5:30 AM, Carlos Henrique Reimer
> <carlos.reimer@opendb.com.br> wrote:
>> Hi,
>>
>> We had by mistake dropped the referencial integrety between two huge
>> tables
>> and now I'm facing the following messages when trying to recreate the
>> foreign key again:
>>
>> alter table posicoes_controles add
>>   CONSTRAINT protocolo FOREIGN KEY (protocolo)
>>       REFERENCES posicoes (protocolo) MATCH SIMPLE
>>       ON UPDATE NO ACTION ON DELETE CASCADE;
>>
>> ERROR:  insert or update on table "posicoes_controles" violates foreign
>> key
>> constraint "protocolo"
>> DETAIL:  Key (protocolo)=(338525035) is not present in table "posicoes".
>> ********** Erro **********
>> ERROR: insert or update on table "posicoes_controles" violates foreign key
>> constraint "protocolo"
>> SQL state: 23503
>> Detalhe: Key (protocolo)=(338525035) is not present in table "posicoes".
>> As the error message tells, the table "posicoes_controles" has values in
>> column "protocolo" that are not present in column "protocolo" of table
>> "posicoes". This happened because some programs removed rows from table
>> "posicoes" while the referencial integrity was dropped.
>>
>> Now I need to remove all rows from table "posicoes_controles" that has not
>> corresponding row in table "posicoes".
>>
>> As these are huge tables, almost 100GB each, and the server
>> hardware restricted (4GB RAM) I would like a suggestion of which command
>> or commands should be used from the performance perspective.
>>
>> Column "protocolo" is "posicoes" table primary key but is not in any index
>> colum of table "posicoes_controles".
>>
>> Thank you very much for any help!
>> --
>> Reimer
>> 47-3347-1724 47-9183-0547 msn: carlos.reimer@opendb.com.br
>>
>>
>
> Hi,
>
> I guess you could consider the following strategy: Halt the server or
> lock the table or something so no program is allowed to delete any
> rows on the affected tables. Run a PL/SQL script that will remove rows
> from "posicoes_controles" whose foreign key is not present in table
> "posics." Then re-issue the foreign key constraint. Then unlock the
> table or whatever it is you have to do get programs to be able to use
> the tables again.
>
> I hope this helps somewhat.
> --
> George H
> george.dma@gmail.com
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

--
Wysłane z mojego urządzenia przenośnego

Filip Rembiałkowski
JID,mailto:filip.rembialkowski@gmail.com
http://filip.rembialkowski.net/

Re: Restore referencial integrity

From
David Fetter
Date:
On Mon, Aug 30, 2010 at 05:04:36PM -0300, Carlos Henrique Reimer wrote:
> Hi
>
> Thank David and Georg for your suggestions.
>
> Yes, there is an index now defined on column protocolo in table
> posicoes_controles.

Legal!

> I've selected two suggested commands to compare which would be more
> performatic and which will run faster:
>
> Option 1)
> explain delete from posicoes_controles where protocolo not in (select
> protocolo from posicoes);
> "Seq Scan on posicoes_controles  (cost=9954587.42..1185225908771206.50
> rows=189513428 width=6)"
> "  Filter: (NOT (subplan))"
> "  SubPlan"
> "    ->  Materialize  (cost=9954587.42..15255636.80 rows=381199038 width=4)"
> "          ->  Seq Scan on posicoes  (cost=0.00..8084329.38 rows=381199038
> width=4)"
>
> Option 2)
> explain delete FROM posicoes_controles WHERE NOT EXISTS (
>    SELECT 1 FROM posicoes WHERE posicoes.protocolo =
> posicoes_controles.protocolo
> );
> "Seq Scan on posicoes_controles  (cost=0.00..9560672015.05 rows=189419047
> width=6)"
> "  Filter: (NOT (subplan))"
> "  SubPlan"
> "    ->  Index Scan using pk_posicoes_protocolo on posicoes
> (cost=0.00..25.19 rows=1 width=0)"
> "          Index Cond: (protocolo = $0)"
> I'm not an explain specialist but I understood the second option will run
> much more faster.

It probably will.  EXISTS returns immediately when it finds the first
row.

> Let me know if I understood the explain for the second option:
> 1) Run a seq scan on posicoes_controles and get the protocolo key to access
> posicoes_protocolo
> 2) For each row accessed in item 1 run an index scan on posicoes to check if
> the key
> is in the table posicoes
> 3) If the parent found is not found on posicoes then remove the row from
> posicoes_controles
>
> Am I thinking correctly?

I believe so.

Cheers,
David (whose pt_BR is pretty w34k)
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Restore referencial integrity

From
Carlos Henrique Reimer
Date:
Hi Filip,
 
This was an excellent suggestion. I've run this join and just 2 minutes later got 1000 records to delete.
 
Will start the deletes for them and then repeat the processing until all are gone. With this approach I did not need to wait for the maintenance window to fix the foreign key.
 
I think this finishes my issue.
 
Thank you all!

2010/8/30 Filip Rembiałkowski <filip.rembialkowski@gmail.com>
I remember when I handled such situations without downtime, in 24/7 HA
setup, to avoid large transactions - You could try SELECT FROM A LEFT
JOIN B WHERE B.ID IS NULL LIMIT 10 -- and use this as a base for
DELETE statement...

2010/8/30, George H <george.dma@gmail.com>:
> On Mon, Aug 30, 2010 at 5:30 AM, Carlos Henrique Reimer
> <carlos.reimer@opendb.com.br> wrote:
>> Hi,
>>
>> We had by mistake dropped the referencial integrety between two huge
>> tables
>> and now I'm facing the following messages when trying to recreate the
>> foreign key again:
>>
>> alter table posicoes_controles add
>>   CONSTRAINT protocolo FOREIGN KEY (protocolo)
>>       REFERENCES posicoes (protocolo) MATCH SIMPLE
>>       ON UPDATE NO ACTION ON DELETE CASCADE;
>>
>> ERROR:  insert or update on table "posicoes_controles" violates foreign
>> key
>> constraint "protocolo"
>> DETAIL:  Key (protocolo)=(338525035) is not present in table "posicoes".
>> ********** Erro **********
>> ERROR: insert or update on table "posicoes_controles" violates foreign key
>> constraint "protocolo"
>> SQL state: 23503
>> Detalhe: Key (protocolo)=(338525035) is not present in table "posicoes".
>> As the error message tells, the table "posicoes_controles" has values in
>> column "protocolo" that are not present in column "protocolo" of table
>> "posicoes". This happened because some programs removed rows from table
>> "posicoes" while the referencial integrity was dropped.
>>
>> Now I need to remove all rows from table "posicoes_controles" that has not
>> corresponding row in table "posicoes".
>>
>> As these are huge tables, almost 100GB each, and the server
>> hardware restricted (4GB RAM) I would like a suggestion of which command
>> or commands should be used from the performance perspective.
>>
>> Column "protocolo" is "posicoes" table primary key but is not in any index
>> colum of table "posicoes_controles".
>>
>> Thank you very much for any help!
>> --
>> Reimer
>> 47-3347-1724 47-9183-0547 msn: carlos.reimer@opendb.com.br
>>
>>
>
> Hi,
>
> I guess you could consider the following strategy: Halt the server or
> lock the table or something so no program is allowed to delete any
> rows on the affected tables. Run a PL/SQL script that will remove rows
> from "posicoes_controles" whose foreign key is not present in table
> "posics." Then re-issue the foreign key constraint. Then unlock the
> table or whatever it is you have to do get programs to be able to use
> the tables again.
>
> I hope this helps somewhat.
> --
> George H
> george.dma@gmail.com
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

--
Wysłane z mojego urządzenia przenośnego

Filip Rembiałkowski
JID,mailto:filip.rembialkowski@gmail.com
http://filip.rembialkowski.net/



--
Reimer
47-3347-1724 47-9183-0547 msn: carlos.reimer@opendb.com.br