I have several really big tables whose rows are uniquely identified by
a single column or a combination of columns. [ I have about 25 tables,
10k to 500k rows per table ]
Each day I get a flat file of updates. I have no way of knowing which
lines in the file are new records and which are updates for existing
records.
I need a way to insert the new ones and update the old ones. I have
a couple of ideas but none of them seem fast enough ( I will soon
be getting updates faster than I can feed them into the database ).
I am running postgres 7.1.3-1PGDG.
Here are my ideas...
---------------------------------------------------------
1)
select count(*) from table where id = X;
(if the count == 1)
update table set a = 1, b = 2, c = 3 where id = X;
(else)
insert into table (id, a, b, c) values (X, 1, 2, 3);
(a variant of this is sketched just after this list)
2)
insert into table (id, a, b, c) values (X, 1, 2, 3);
(if there is an exception b/c of duplicate keys)
update table set a = 1, b = 2, c = 3 where id = X;
3)
delete from table where id = X;
(if the record is new nothing happens)
insert into table (id, a, b, c) values (X, 1, 2, 3);
----------------------------------------------------------
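A variant of (1) I have been wondering about, halfway to (2): skip the
select, do the update first, and only insert when the update reports
zero rows affected. A rough sketch, with a made-up table name and
values (the zero-rows check happens in the client):

update mytable set a = 1, b = 2, c = 3 where id = 42;
(if the update reports 0 rows affected)
insert into mytable (id, a, b, c) values (42, 1, 2, 3);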
Right now I am doing (3) but it is very slow - I am getting about
8 records/second through, and yes, my id column is indexed. The other
problem with (3) is that it requires more vacuuming, since every
delete leaves a dead row behind.
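One thing I have not tried yet: each delete/insert pair currently runs
as its own transaction, and every commit forces a disk sync (unless
fsync is off). Batching many rows into one transaction should cut that
overhead way down. A minimal sketch, again with a made-up table name
and values:

begin;
delete from mytable where id = 42;
insert into mytable (id, a, b, c) values (42, 1, 2, 3);
delete from mytable where id = 43;
insert into mytable (id, a, b, c) values (43, 4, 5, 6);
-- ... a few hundred more pairs ...
commit;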
I can't imagine that (1) would be any faster.
It would seem that (2) would be the fastest, but some of my tables
key on multiple columns (where id1 = X and id2 = Y) and thus can't
use a simple primary key to raise an exception.
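Or maybe a multicolumn unique index would give me the same
duplicate-key error? Something like the following, though I have not
tested whether the error comes back the same way (index and column
names here are made up):

create unique index mytable_id1_id2_key on mytable (id1, id2);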
I can't imagine I am the first person to try this...
What's the standard/best/fastest way to do this?
Thanks in advance for any advice!
Orion