Thread: Insert or Replace or \copy (bulkload)
I'm seeing an obstacle in my aim to migrate from MySQL to PG, mainly in the manner in which PG handles duplicate entries, whether from primary keys or unique constraints.

Data is taken from Perl DBI into (right now) CSV files, to be loaded into the table via psql's \copy command.

In MySQL, I was using mysqlimport --replace, which essentially provided the means to load data into the DB while at the same time providing the necessary logic to replace the entire row if there was a duplicate, instead of dying.

Under PG, I've yet to find anything similar to this functionality, and searches via Google have uncovered that this is one thing which _still_ has not found its way into PG. (Does anyone know why? Standards?)

Anyway, I found a workaround, but, to me, even though it provides a means to an end, it still looks like it'll end up as a maintenance nightmare each time a table has any additional columns added.

The solution is taken from this site:

http://www.pointwriter.com/blog/index.php?/archives/6-REPLACE-in-PostgreSQL.html

[..snip..]
For the sake of this example we'll use a very simple table called 'map' with two fields: 'key' and 'value'. Not surprisingly, 'key' is the primary key. Let's further assume that every insert into this table should actually be a replace. Then all we need is this simple statement and we're done:

CREATE RULE "replace_map" AS ON INSERT TO "map_example"
WHERE EXISTS (SELECT 1 FROM map_example WHERE key = NEW.key)
DO INSTEAD
    (UPDATE map_example SET value = NEW.value WHERE key = NEW.key);
[...snip...]
Populate with some test data
============================
XMMS=> insert into map_example(key,value,value2) values (1,1,1);
INSERT 0 1
XMMS=> insert into map_example(key,value,value2) values (2,2,2);
INSERT 0 1
XMMS=> select * from map_example ;
 key | value | value2
-----+-------+--------
   1 |     1 |      1
   2 |     2 |      2
(2 rows)

Try to insert some duplicate data
=================================
XMMS=> insert into map_example(key,value,value2) values (2,20,20);
INSERT 0 0
XMMS=> select * from map_example ;
 key | value | value2
-----+-------+--------
   1 |     1 |      1
   2 |    20 |      2   <===== value2 NOT updated
(2 rows)

Add the new rule once we added the new columns
==============================================
XMMS=> CREATE OR REPLACE RULE "replace_map" AS ON INSERT TO "map_example"
       WHERE EXISTS (SELECT 1 FROM map_example WHERE key = NEW.key)
       DO INSTEAD
       (UPDATE map_example SET value = NEW.value, value2 = NEW.value2
        WHERE key = NEW.key);
XMMS=> insert into map_example(key,value,value2) values (2,20,20);
INSERT 0 0
XMMS=> select * from map_example ;
 key | value | value2
-----+-------+--------
   1 |     1 |      1
   2 |    20 |     20
(2 rows)

Can anyone tell me whether this will turn out to be a maintenance nightmare? So, the pertinent question is: is there a better mousetrap available?
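For comparison, the REPLACE semantics being emulated above can be reproduced with Python's built-in sqlite3 module, which supports an INSERT OR REPLACE extension much like MySQL's REPLACE. This is only a runnable illustration of the desired end behaviour, not PostgreSQL code (which, per this thread, has no such statement):

```python
import sqlite3

# Illustration of the REPLACE semantics the rule above emulates:
# inserting a duplicate primary key replaces the whole row instead
# of raising an error. sqlite3 is used only because it ships with
# Python and supports INSERT OR REPLACE.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE map_example (key INTEGER PRIMARY KEY, "
            "value INTEGER, value2 INTEGER)")
cur.execute("INSERT INTO map_example VALUES (1, 1, 1)")
cur.execute("INSERT INTO map_example VALUES (2, 2, 2)")

# A plain INSERT of key=2 would raise sqlite3.IntegrityError;
# INSERT OR REPLACE swaps in the new row instead.
cur.execute("INSERT OR REPLACE INTO map_example VALUES (2, 20, 20)")

print(cur.execute("SELECT * FROM map_example ORDER BY key").fetchall())
```

Note that REPLACE semantics replace the *entire* row, whereas the CREATE RULE workaround only updates the columns named in its SET list — which is exactly the maintenance hazard demonstrated in the transcript above.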
On 8/14/07, Ow Mun Heng <Ow.Mun.Heng@wdc.com> wrote:
> I'm seeing an obstacle in my aim to migrate from mysql to PG mainly from
> the manner in which PG handles duplicate entries either from primary
> keys or unique entries.
>
> Data is taken from perl DBI into (right now) CSV based files to be used
> via psql's \copy command to insert into the table.
>
> In MySql, I was using mysqlimport --replace which essentially provided
> the means to load data into the DB, while at the same time, would
> provide the necessary logic to replace the entire row if there was a
> duplicate instead of dying.
>
> Under PG, I've yet to found something similar to this functionality and
> searches via google has uncovered that this is one thing which _still_
> has not found its way into PG. (anyone knows why? Standards?)
>
> Anyway, I found a workaround, but, to me, even though it provides a
> means to an end, it still looks like it'll end up as a maintenance
> nightmare each time a table has any additional columns added.
>
> Solution is taken from this site:
>
> http://www.pointwriter.com/blog/index.php?/archives/6-REPLACE-in-PostgreSQL.html

Example code snipped for brevity

> Can anyone tell me if this won't turn out to be a maintenance nightmare?
> So, the pertinent question is, is there a better mousetrap available?

I don't see why it would be a maintenance nightmare. Looks pretty much like you just create it and go. Once it's in place it should just work. There are other ways to skin this particular cat, but that one seems as good as any.
On Tue, 2007-08-14 at 10:16 -0500, Scott Marlowe wrote:
> On 8/14/07, Ow Mun Heng <Ow.Mun.Heng@wdc.com> wrote:
> >
> > In MySql, I was using mysqlimport --replace which essentially provided
> > the means to load data into the DB, while at the same time, would
> > provide the necessary logic to replace the entire row if there was a
> > duplicate instead of dying.
> >
> > Anyway, I found a workaround, but, to me, even though it provides a
> > means to an end, it still looks like it'll end up as a maintenance
> > nightmare each time a table has any additional columns added.
> >
> > Solution is taken from this site:
> >
> > http://www.pointwriter.com/blog/index.php?/archives/6-REPLACE-in-PostgreSQL.html
>
> Example code snipped for brevity
>
> > Can anyone tell me if this won't turn out to be a maintenance nightmare?
> > So, the pertinent question is, is there a better mousetrap available?
>
> I don't see why it would be a maintenance nightmare. Looks pretty
> much like you just create it and go. Once it's in place it should
> just work. There are other ways to skin this particular cat, but that
> one seems as good as any.

That would be true only if I didn't have to (remember to) alter the rule each time a new column is added. At the rate things are going, that could be quite a frequent procedure. (Unless, of course, I script it, which is an idea in itself.)

PS: Is it this list's norm to have the OP/sender in the "To" list and the mailing list on the "CC" list?
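The "script it" idea above could look something like the following minimal sketch: a generator that rebuilds the CREATE OR REPLACE RULE statement from the table's key and non-key column lists. The function name and the hard-coded column lists are illustrative; in practice the columns would be read from information_schema.columns via DBI or similar:

```python
# Sketch: regenerate the MySQL-REPLACE-style ON INSERT rule for a table
# from its column lists, so the rule can be rebuilt automatically after
# every ALTER TABLE ... ADD COLUMN instead of being edited by hand.

def replace_rule_sql(table, key_cols, value_cols):
    """Build a CREATE OR REPLACE RULE statement that turns inserts of
    duplicate keys into updates of the non-key columns."""
    match = " AND ".join(f"{table}.{c} = NEW.{c}" for c in key_cols)
    sets = ", ".join(f"{c} = NEW.{c}" for c in value_cols)
    return (
        f'CREATE OR REPLACE RULE "replace_{table}" AS ON INSERT TO "{table}"\n'
        f"WHERE EXISTS (SELECT 1 FROM {table} WHERE {match})\n"
        f"DO INSTEAD UPDATE {table} SET {sets} WHERE {match};"
    )

if __name__ == "__main__":
    # Column lists taken from the map_example table earlier in the thread.
    print(replace_rule_sql("map_example", ["key"], ["value", "value2"]))
```

Running the generator after each schema change (or from a trigger in the deployment script) removes the "remember to" part, though the rule still has to be reapplied before the next load.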
Ow Mun Heng <Ow.Mun.Heng@wdc.com> writes:
> Ps : Is it this list's norm to have the OP/sender in the "to" list and
> mailing list on the "CC" list?

Yes. If you don't like that you can try including a "Reply-To: <list>" header in what you send to the list; or perhaps better, I think there's a way to tell the list bot to not send you copies of messages you're cc'd on. But the old-timers are all accustomed to this behavior. The PG lists are big enough that turnaround time is O(minutes) even when everything is going well, and sometimes it isn't --- so direct cc's to the people involved in a specific thread help keep the conversation moving. Also, some of us find a backup copy sent via the list to be a Good Thing.

I for one have a reputation of running spam filters that eat pets and small children ... so if you want to be sure to get through to me, don't forget to cc: the list.

			regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Ow Mun Heng <Ow.Mun.Heng@wdc.com> writes: >> Ps : Is it this list's norm to have the OP/sender in the "to" list and >> mailing list on the "CC" list? > > Yes. If you don't like that you can try including a "Reply-To: <list>" > header in what you send to the list; or perhaps better, I think there's > a way to tell the list bot to not send you copies of messages you're > cc'd on. Beware that setting Reply-To is actually "incorrect". It will cause people who think they're sending you private email to unexpectedly send their private mail to the list. There's a correct header to set, I think Mail-Followup-To or something like that. But sadly not all mail readers support it. > I for one have a reputation of running spam filters that eat pets and small > children ... so if you want to be sure to get through to me, don't forget to > cc: the list. They eat all my emails, but I'm sure that's intentional :) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On 8/15/07, Gregory Stark <stark@enterprisedb.com> wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
> > I for one have a reputation of running spam filters that eat pets and small
> > children ... so if you want to be sure to get through to me, don't forget to
> > cc: the list.
>
> They eat all my emails, but I'm sure that's intentional :)

I was wondering why my kids were so quiet...
On Tue, 2007-08-14 at 10:16 -0500, Scott Marlowe wrote:
> On 8/14/07, Ow Mun Heng <Ow.Mun.Heng@wdc.com> wrote:
> > I'm seeing an obstacle in my aim to migrate from mysql to PG mainly from
> > the manner in which PG handles duplicate entries either from primary
> > keys or unique entries.
> >
> > Data is taken from perl DBI into (right now) CSV based files to be used
> > via psql's \copy command to insert into the table.
> >
> > In MySql, I was using mysqlimport --replace which essentially provided
> > the means to load data into the DB, while at the same time, would
> > provide the necessary logic to replace the entire row if there was a
> > duplicate instead of dying.
> >
> > Under PG, I've yet to found something similar to this functionality and
> > searches via google has uncovered that this is one thing which _still_
> > has not found its way into PG. (anyone knows why? Standards?)
> >
> > Anyway, I found a workaround, but, to me, even though it provides a
> > means to an end, it still looks like it'll end up as a maintenance
> > nightmare each time a table has any additional columns added.
> >
> > Solution is taken from this site:
> >
> > http://www.pointwriter.com/blog/index.php?/archives/6-REPLACE-in-PostgreSQL.html
>
> Example code snipped for brevity
>
> > Can anyone tell me if this won't turn out to be a maintenance nightmare?
> > So, the pertinent question is, is there a better mousetrap available?
>
> I don't see why it would be a maintenance nightmare. Looks pretty
> much like you just create it and go. Once it's in place it should
> just work. There are other ways to skin this particular cat, but that
> one seems as good as any.

I just ran into trouble with this. The rule seems to work when I do simple inserts, but since what I will be doing is \copy bulkloads, it balks and fails.

Is there a suitable replacement rule which can be used for this purpose which will work bulkload-wise?
Now would be a good time to teach me how to skin this cat differently.
On Mon, 2007-08-27 at 11:55 +0800, Ow Mun Heng wrote:
> I just ran into trouble with this. This rule seems to work when I do
> simple inserts, but as what I will be doing will be doing \copy
> bulkloads, it will balk and fail.
>
> Now would be a good idea to teach me how to skin the cat differently.

Based on the archives, it seems this was raised as BUG 2437 close to a year ago, it was marked as "correct behaviour", and the suggested alternative is to either:

1. Use pgloader (DL'ed it, but have not gone in depth to see if it suits my needs), or
2. COPY into a temp table, and then perform an insert into the real table.

=> create table foo (a int, b int, c int, d varchar(1), e bigint);
=> alter table foo add primary key (a,b,c,e);
=> insert into foo values (1,1,1,'A',1);
=> create rule replace_foo as on insert to foo
   where (exists(select 1 from foo
                 where foo.a = new.a and foo.b = new.b
                   and foo.c = new.c and foo.e = new.e))
   do instead update foo set d = new.d
   where foo.a = new.a and foo.b = new.b
     and foo.c = new.c and foo.e = new.e;
=> \d foo
          Table "xmms.foo"
 Column |         Type         | Modifiers
--------+----------------------+-----------
 a      | integer              | not null
 b      | integer              | not null
 c      | integer              | not null
 d      | character varying(1) |
 e      | bigint               | not null
Indexes:
    "foo_pkey" PRIMARY KEY, btree (a, b, c, e)
Rules:
    replace_foo AS
    ON INSERT TO foo
    WHERE (EXISTS (SELECT 1 FROM foo
                   WHERE foo.a = new.a AND foo.b = new.b
                     AND foo.c = new.c AND foo.e = new.e))
    DO INSTEAD
    UPDATE foo SET d = new.d
    WHERE foo.a = new.a AND foo.b = new.b
      AND foo.c = new.c AND foo.e = new.e

=> create table temp_table_foo as select * from foo where 1=0;
=> select * from foo;
 a | b | c | d | e
---+---+---+---+---
 1 | 1 | 1 | A | 1

=> insert into temp_table_foo values (1,1,1,'Z',1);
=> insert into foo select * from temp_table_foo;
=> select * from foo;
 a | b | c | d | e
---+---+---+---+---
 1 | 1 | 1 | Z | 1

=> truncate table temp_table_foo;

That seems like an awful lot of steps, and it may make the \copy process even longer than it should be. Is there any other method?
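The temp-table approach above boils down to two statements after the bulk load: UPDATE the rows whose keys already exist in the target, then INSERT the rest. A minimal runnable sketch of that merge logic follows, using Python's built-in sqlite3 purely as a stand-in for PostgreSQL so the example runs standalone (table and column names follow the foo example; with psql you would \copy into temp_table_foo instead of the INSERTs shown):

```python
import sqlite3

# Sketch of the staging-table merge: bulk-load rows into a temp table,
# UPDATE the rows that already exist in the target, INSERT the rest.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE foo (a INT, b INT, c INT, d TEXT, e INT, "
            "PRIMARY KEY (a, b, c, e))")
cur.execute("CREATE TEMP TABLE temp_table_foo AS "
            "SELECT * FROM foo WHERE 1=0")
cur.execute("INSERT INTO foo VALUES (1, 1, 1, 'A', 1)")

# Pretend these rows arrived via the bulk load: one duplicate key, one new.
cur.executemany("INSERT INTO temp_table_foo VALUES (?, ?, ?, ?, ?)",
                [(1, 1, 1, 'Z', 1), (2, 2, 2, 'B', 2)])

# Step 1: replace the non-key column of rows whose keys already exist.
cur.execute("""
    UPDATE foo SET d = (SELECT t.d FROM temp_table_foo t
                        WHERE t.a = foo.a AND t.b = foo.b
                          AND t.c = foo.c AND t.e = foo.e)
    WHERE EXISTS (SELECT 1 FROM temp_table_foo t
                  WHERE t.a = foo.a AND t.b = foo.b
                    AND t.c = foo.c AND t.e = foo.e)""")

# Step 2: insert the rows that are genuinely new.
cur.execute("""
    INSERT INTO foo
    SELECT * FROM temp_table_foo t
    WHERE NOT EXISTS (SELECT 1 FROM foo f
                      WHERE f.a = t.a AND f.b = t.b
                        AND f.c = t.c AND f.e = t.e)""")
conn.commit()
print(sorted(cur.execute("SELECT * FROM foo").fetchall()))
```

Doing the merge as two set-based statements keeps the ON INSERT rule out of the picture entirely, so the \copy into the temp table runs at full speed and only the merge pays the duplicate-check cost.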
It seems to me pgloader also does more or less the same thing. What about pg_bulkload? Can anyone provide any new insights?

Thanks...

http://archives.postgresql.org/pgsql-bugs/2006-05/msg00073.php