Thread: What's the fastest way to do this?
I have several really big tables that have columns uniquely identified by single or multiple rows. [ I have about 25 tables, 10k to 500k rows per table ] Each day I get a flat file of updates. I have no way of knowing which lines in the file are new records and which are updates for existing records. I need a way to insert the new ones and update the old ones. I have a couple of ideas but none of them seem fast enough ( I will soon be getting updates faster than I can feed them into the database ). I am running postgres 7.1.3-1PDGD. Here are my ideas... --------------------------------------------------------- 1) select count(*) from table where id = X; (if the count == 1) update table set a = 1, b = 2, c = 3 where id = X; 2) insert into table (id, a, b, c) values (X, 1, 2, 3); (if there is an exception b/c of duplicate keys) update table set a = 1, b = 2, c = 3 where id = X; 3) delete from table where id = X; (if the record is new nothing happens) insert into table (id, a, b, c) values (X, 1, 2, 3); ---------------------------------------------------------- right now I am doing (3) but it is very slow - I am getting about 8 records/second through and yes my id column is indexed. The other problem with (3) is it requires more vacuuming. I cant imagine that (1) would be any faster. It would seem that (2) would be the fastest but some of my tables key on multiple columns (where id1 = X and id2 = Y) and thus cant use a primary key to raise an exception. I cant imagine I am the first person to try this... What's the standard/best/fastest way to do this. Thanks in advance for any advice! Orion
Are you updating directly from the flat file? Or are you copying the new data into an indexed table that is emptied and vacuumed after daily updates, and updating/inserting from there? Andrew Gould --- Orion <o2@trustcommerce.com> wrote: > > I have several really big tables that have columns > uniquely identified by > single or multiple rows. [ I have about 25 tables, > 10k to 500k rows > per table ] > > Each day I get a flat file of updates. I have no > way of knowing which > lines in the file are new records and which are > updates for existing > records. > > I need a way to insert the new ones and update the > old ones. I have > a couple of ideas but none of them seem fast enough > ( I will soon > be getting updates faster than I can feed them into > the database ). > > I am running postgres 7.1.3-1PDGD. > > Here are my ideas... > > --------------------------------------------------------- > > 1) > > select count(*) from table where id = X; > (if the count == 1) > update table set a = 1, b = 2, c = 3 where id = X; > > 2) > > insert into table (id, a, b, c) values (X, 1, 2, 3); > (if there is an exception b/c of duplicate keys) > update table set a = 1, b = 2, c = 3 where id = X; > > 3) > > delete from table where id = X; > (if the record is new nothing happens) > insert into table (id, a, b, c) values (X, 1, 2, 3); > > ---------------------------------------------------------- > > right now I am doing (3) but it is very slow - I > am getting about 8 records/second through and > yes my id column is indexed. The other problem > with (3) is it requires more vacuuming. > > I cant imagine that (1) would be any faster. > > It would seem that (2) would be the fastest but some > of my tables key on multiple columns > (where id1 = X and id2 = Y) and thus cant use a > primary > key to raise an exception. > > I cant imagine I am the first person to try this... > What's the standard/best/fastest way to do this. > > Thanks in advance for any advice! > > Orion > > > ---------------------------(end of > broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster __________________________________________________ Do You Yahoo!? Find a job, post your resume. http://careers.yahoo.com
Andrew Gould wrote: > Are you updating directly from the flat file? Or are > you copying the new data into an indexed table that is > emptied and vacuumed after daily updates, and > updating/inserting from there? > > Andrew Gould Well to be honest I'll do it any way I can to make this go faster. Right now I am updating directly from said file. For each line I read from the file I need to insert or update the exsisting table depending on weather or not the record is already there. Orion
On Thu, Nov 08, 2001 at 11:58:49AM -0800, Orion wrote: > > I have several really big tables that have columns uniquely identified by > single or multiple rows. [ I have about 25 tables, 10k to 500k rows > per table ] > > Each day I get a flat file of updates. I have no way of knowing which > lines in the file are new records and which are updates for existing > records. > > I need a way to insert the new ones and update the old ones. I have > a couple of ideas but none of them seem fast enough ( I will soon > be getting updates faster than I can feed them into the database ). > Hello I was facing a similar problem some time ago. My solution was to create a temp table and COPY the new data to it. After that I deleted all records in the original table which existed in the temporary table. Then I just did a insert from a select * from the temp table. Of course with this approach you have to lock the tables. - Einar Karttunen
Einar Karttunen writes: > > I have several really big tables that have columns uniquely identified by > > single or multiple rows. [ I have about 25 tables, 10k to 500k rows > > per table ] > > > > Each day I get a flat file of updates. I have no way of knowing which > > lines in the file are new records and which are updates for existing > > records. > > > > I need a way to insert the new ones and update the old ones. I have > > a couple of ideas but none of them seem fast enough ( I will soon > > be getting updates faster than I can feed them into the database ). > > I was facing a similar problem some time ago. My solution was to create > a temp table and COPY the new data to it. After that I deleted all records > in the original table which existed in the temporary table. Then I just > did a insert from a select * from the temp table. Of course with this > approach you have to lock the tables. > > - Einar Karttunen Even faster is to: 1. load the data into a temporary table (even faster is to load into a permanent table -- just truncate it first in each run). Let's call this table ImportTable. 2. update the existing records UPDATE rt SET a = t.a, b = x.b, c = x.c FROM RealTable AS rt, ImportTable AS x WHERE rt.pk = x.pk 3. insert the new records INSERT INTO RealTable(pk, a, b, c) SELECT pk, a, b, c FROM ImportTable AS x WHERE NOT EXISTS (SELECT 1 FROM RealTable WHERE pk = x.pk) 'pk' is the primary key of the table, or, at worst, a UNIQUEly INDEXed column combination. This avoids the expensive DELETE operation (DBMSs are generally better at INSERTing and UPDATEing than DELETEing, and remember that the DELETEs have to go through the transaction log). Don't use cursors if you can help it -- cursors can be up to several orders of magnitude slower, and usually at least 4 times slower. Using an import table allows you to sanitize the data by insert a step to do this between steps 1 and 2. Cheers, Colin
I tried your idea but I cant get it to work. Here's the SQL I used: CREATE TEMP table mfps_action_codes_394_tmp ( code text, description text); INSERT into mfps_action_codes_394_tmp (code,description) VALUES ('$','Dun Notice Printed'); INSERT into mfps_action_codes_394_tmp (code,description) VALUES ('&','Continuity Speedup'); INSERT into mfps_action_codes_394_tmp (code,description) VALUES ('*','Expiring CC Notification-Conty'); INSERT into mfps_action_codes_394_tmp (code,description) VALUES ('0','Return Authorization'); UPDATE mfps_action_codes_394 SET description = x.description FROM mfps_action_codes_394 AS rt, mfps_action_codes_394_tmp AS x WHERE rt.code = x.code; INSERT INTO mfps_action_codes_394 (code,description) SELECT code,description FROM mfps_action_codes_394_tmp AS x WHERE NOT EXISTS (SELECT 1 FROM mfps_action_codes_394 WHERE code = x.code); What ends up happening is that the UPDATE will change EVERY exsisting description to 'Dun Notice Printed' I can't find any documentation as to how to use the FROM keyword on the UPDATE command lest I'm sure I'd be able to figure this out myself. > Even faster is to: > > 1. load the data into a temporary table (even faster is to load into a > permanent > table -- just truncate it first in each run). Let's call this table > ImportTable. > > 2. update the existing records > UPDATE rt > SET a = t.a, b = x.b, c = x.c > FROM RealTable AS rt, ImportTable AS x > WHERE rt.pk = x.pk > > 3. insert the new records > INSERT INTO RealTable(pk, a, b, c) > SELECT pk, a, b, c > FROM ImportTable AS x > WHERE NOT EXISTS (SELECT 1 FROM RealTable WHERE pk = x.pk) > > > 'pk' is the primary key of the table, or, at worst, a UNIQUEly INDEXed > column > combination. > > This avoids the expensive DELETE operation (DBMSs are generally better at > INSERTing and UPDATEing than DELETEing, and remember that the DELETEs have > to > go through the transaction log). > > Don't use cursors if you can help it -- cursors can be up to several > orders of > magnitude slower, and usually at least 4 times slower. > > Using an import table allows you to sanitize the data by insert a step to > do this between steps 1 and 2. > > > Cheers, > > Colin
Ok, I've boiled this down to an easy to run test... DROP TABLE test; CREATE TABLE test ( code int4, description text); INSERT INTO test (code,description) VALUES (1,'OLD VALUE 1'); INSERT INTO test (code,description) VALUES (2,'OLD VALUE 2'); CREATE TEMP TABLE test_tmp ( code int4, description text); INSERT INTO test_tmp (code,description) VALUES (1,'NEW VALUE 1'); INSERT INTO test_tmp (code,description) VALUES (2,'NEW VALUE 2'); INSERT INTO test_tmp (code,description) VALUES (3,'NEW ENTRY 1'); INSERT INTO test_tmp (code,description) VALUES (4,'NEW ENTRY 2'); UPDATE test SET description = x.description FROM test AS rt, test_tmp AS x WHERE rt.code = x.code; INSERT INTO test (code,description) SELECT code,description FROM test_tmp AS x WHERE NOT EXISTS (SELECT 1 FROM test WHERE code = x.code); SELECT * FROM test; And here's the results I get from that test DROP CREATE INSERT 1894322 1 INSERT 1894323 1 CREATE INSERT 1894350 1 INSERT 1894351 1 INSERT 1894352 1 INSERT 1894353 1 UPDATE 2 INSERT 0 2 code | description ------+------------- 1 | NEW VALUE 1 2 | NEW VALUE 1 3 | NEW ENTRY 1 4 | NEW ENTRY 2 (4 rows) And as far as I understand it I should be getting the following code | description ------+------------- 1 | NEW VALUE 1 2 | NEW VALUE 2 3 | NEW ENTRY 1 4 | NEW ENTRY 2 For some reason the update is updating ALL the old records to be 'NEW VALUE 1'. So (A) is this a bug or (B) do I totally misunderstand that UPDATE statement above. I appears that it should update each row in table 'test' with the description corresponding to the code in test_tmp. If this is not the proper way to do this, what is?
On Mon, 12 Nov 2001, Orion wrote: > UPDATE test > SET description = x.description > FROM test AS rt, test_tmp AS x > WHERE rt.code = x.code; > So (A) is this a bug or (B) do I totally misunderstand that UPDATE statement > above. I appears that it should update each row in table 'test' with the > description corresponding to the code in test_tmp. If this is not the > proper way to do this, what is? The above update isn't doing what you're expecting. You're updating test, you don't need the test as rt in the from unless you are attempting a self-join. Try: update test set description = x.description from test_tmp as x where test.code=x.code;
Orion <o2@trustcommerce.com> writes: > UPDATE test > SET description = x.description > FROM test AS rt, test_tmp AS x > WHERE rt.code = x.code; This command is a three-way join between test AS test (the target), test AS rt, and test_tmp AS x. Almost certainly not what you want, especially considering that the target table is not constrained at all by the WHERE clause. regards, tom lane
On Mon, 12 Nov 2001 11:58:27 -0800, Orion <o2@trustcommerce.com> wrote: >[...] >UPDATE test > SET description = x.description > FROM test AS rt, test_tmp AS x > WHERE rt.code = x.code; This seems to work: UPDATE test SET description = x.description FROM test_tmp x WHERE test.code = x.code; Kind regards Carl van Tast