integrity check and visibility was: COPY equivalent for updates - Mailing list pgsql-sql
From | Ivan Sergio Borgonovo |
---|---|
Subject | integrity check and visibility was: COPY equivalent for updates |
Date | |
Msg-id | 20080716110611.3337db10@dawn.webthatworks.it Whole thread Raw |
In response to | Re: COPY equivalent for updates (Decibel! <decibel@decibel.org>) |
List | pgsql-sql |
On Tue, 15 Jul 2008 22:13:42 -0500 Decibel! <decibel@decibel.org> wrote: > Depending on what you're doing, it might well be fastest to... > BEGIN; > COPY temp_table FROM 'file'; > DELETE FROM real_table WHERE real_table_id IN (SELECT > real_table_id FROM temp_table); > INSERT INTO real_table SELECT * FROM temp_table; > COMMIT; Thank you for making me reconsider this approach. I gave up since I have a lot of ON DELETE CASCADE... but you made me remind that there should be a way to delay the checks at the end of the transaction (I still couldn't find the syntax on the manual). Actually from my understanding an UPDATE in PostgreSQL is a DELETE + INSERT. But wouldn't the subquery DELETE FROM real_table WHERE real_table_id IN (SELECT real_table_id FROM temp_table); take longer to execute than the UPDATE version: update t1 set col1=temp_t1.col1, col2=temp_t1.col2 where t1.id=temp_t1.id; mutatis mutandis? I'd say that pg should be enough smart to take advantage of a unique key on real_table.real_table_id and stop searching as soon as it deleted the first occurrence of an ID found in real_table from temp_table. Is it? Would adding an unique index on temp_table make the process faster or slower? I know that temp_table.real_table_id will be unique so I'm not interested in the integrity check (that anyway should take place later). At first glance the unique index on real_table.real_table_id may be useful, the one on temp_table should not and will actually make the process slower adding index creation time. So the "ON DELETE CASCADE/SET NULL" problem should be solved once I find the syntax to delay checks Can it be solved this way? I'm going to check as soon as I find the syntax on the manual. What about visibility of the "going to be deleted rows"? If another transaction start between the DELETE and the INSERT statement are executed... what is it going to see? What if I don't have just a single INSERT statement since some columns will be moved to another table? Does this make necessary the UPDATE approach in spite of the DELETE/INSERT approach? If I had temp_table split into 2 tables eg. create temp_table( id int, col11 varchar(32), col21 varchar(32) ); create table table1 ( id int primary key, col11 varchar(32) ); create table table2 ( id int references table1(id) on delete cascade, col21 varchar(32) ); I wouldn't be concerned that data in table1 and table2 contains data updated in different times and "not coherent" rather I'd be concerned there is a row in table1 but there isn't in table2. Furthermore... if I delay checks and ON DELETE triggers rows in table2 won't be deleted when I delete rows in table1 unless I do it "manually". At this point I'm more concerned of maintainability over speed. Actually if I follow the UPDATE approach and temp_table is still split across several tables I'll have to write an update for each tableN anyway... so there shouldn't be too much difference between the UPDATE and the DELETE/INSERT approach. What about the effect of a longer transaction compared to a list of updates on speed? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it