Re: Bulk Load Ignore/Skip Feature - Mailing list pgsql-general

From Dimitri Fontaine
Subject Re: Bulk Load Ignore/Skip Feature
Msg-id 200711162336.14475.dfontaine@hi-media.com
In response to Re: Bulk Load Ignore/Skip Feature  (Willem Buitendyk <willem@pcfish.ca>)
List pgsql-general
Hi all,

On Friday 16 November 2007 18:04:44, Willem Buitendyk wrote:
> Martijn van Oosterhout wrote:
> > On Thu, Nov 15, 2007 at 08:09:46PM -0800, Willem Buitendyk wrote:
> >> Damn - so the unique constraint is still an issue.  What gives?  Why is
> >> it so hard to implement this in PostgreSQL?  sigh - if only I had more
> >> time.
> >
> > Can you explain? The server of course still generates error messages in
> > the logs, there's no way around that. However it looks to me that the
> > data ended up in the database correctly? Or did I miss something?

pgloader loads the non-conflicting data and produces two outputs for the rest:
a reject log with the error for each row that was not inserted (COPYied), and
a reject data file holding those input lines, ready to be processed again if
the operator so chooses.
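
As an illustration of the reject handling described above, here is a minimal
sketch. It is not pgloader's code: it assumes Python's built-in sqlite3 module
as a stand-in for PostgreSQL and psycopg, inserts row by row instead of using
COPY, and the function name load_with_rejects is hypothetical.

```python
import sqlite3


def load_with_rejects(conn, rows):
    """Insert rows one by one; return (loaded, reject_data, reject_log).

    Hypothetical helper, not part of pgloader. Rows that violate a
    constraint are kept in reject_data so they can be reprocessed later,
    and the matching error message goes into reject_log.
    """
    loaded = 0
    reject_data = []
    reject_log = []
    for row in rows:
        try:
            conn.execute("INSERT INTO unic(a) VALUES (?)", row)
            loaded += 1
        except sqlite3.IntegrityError as exc:
            # Only the failing statement is undone; earlier rows stay loaded.
            reject_data.append(row)
            reject_log.append(f"{row!r}: {exc}")
    conn.commit()
    return loaded, reject_data, reject_log


# Stand-in for the "unic" table used in this message.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unic(a INTEGER UNIQUE)")

loaded, reject_data, reject_log = load_with_rejects(
    conn, [(1,), (2,), (1,), (3,)]
)
```

The second (1,) ends up in reject_data with one line in reject_log, while the
three non-conflicting rows are loaded.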

But at the moment it does not provide any way to automate an UPDATE of the
rows that conflict on the primary key. I'm really hesitant to code this
option: what should it do when the conflict is on a unique constraint that is
not the primary key? Compare:

dim=# create table unic(a integer unique);
dim=# insert into unic values (1);
INSERT 2312559 1
dim=# insert into unic values (1);
ERROR:  duplicate key violates unique constraint "unic_a_key"

dim=# create table pk(a integer primary key);
dim=# insert into pk values (1);
INSERT 2312565 1
dim=# insert into pk values (1);
ERROR:  duplicate key violates unique constraint "pk_pkey"

I'm thinking that in the first case you may not want existing values to be
overwritten, while in the second case that is exactly what you want. Should it
be the user's responsibility to tell the two apart, by configuring pgloader
properly, or should the tool try hard to protect users from themselves?

And how should it act on a table with both a surrogate primary key and a
unique constraint, when you want to automatically update the surrogate key
but not the unique data, or the other way around?
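
To make the ambiguity concrete, here is a sketch of one possible policy, and
only one among several: update a row when the incoming line matches an
existing row on both the primary key and the unique column, and reject every
other conflict. It is not a pgloader feature. It assumes sqlite3 as a stand-in
for PostgreSQL, and the table "account", its columns, and the function name
load_update_on_pk are all hypothetical.

```python
import sqlite3


def load_update_on_pk(conn, rows):
    """Load (id, email, name) rows; return (inserted, updated, rejected).

    Hypothetical policy: on a duplicate-key error, overwrite only the
    non-key column, and only when both id (primary key) and email
    (unique) match the existing row. Anything else is rejected.
    """
    inserted = 0
    updated = 0
    rejected = []
    for id_, email, name in rows:
        try:
            conn.execute(
                "INSERT INTO account(id, email, name) VALUES (?, ?, ?)",
                (id_, email, name),
            )
            inserted += 1
            continue
        except sqlite3.IntegrityError:
            pass  # some unique or primary key constraint was violated
        cur = conn.execute(
            "UPDATE account SET name = ? WHERE id = ? AND email = ?",
            (name, id_, email),
        )
        if cur.rowcount == 1:
            updated += 1
        else:
            # Conflict on only one of the two constraints: leave the
            # existing data alone and hand the line back to the operator.
            rejected.append((id_, email, name))
    conn.commit()
    return inserted, updated, rejected


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account(id INTEGER PRIMARY KEY, email TEXT UNIQUE, name TEXT)"
)
conn.execute("INSERT INTO account VALUES (1, 'a@x', 'old')")

inserted, updated, rejected = load_update_on_pk(
    conn,
    [
        (1, "a@x", "new"),  # same pk and same unique value: updated
        (2, "b@x", "bob"),  # no conflict: inserted
        (3, "a@x", "eve"),  # unique conflict only: rejected
        (1, "c@x", "zed"),  # pk conflict with different unique value: rejected
    ],
)
```

Whether the last two cases should be rejected, or should overwrite the
existing row, is exactly the choice that would have to be left to
configuration.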

So I have two questions for the community:
 - should I provide a pgloader mailing list?
 - what do you think about adding the UPDATE-on-duplicate-key-error option?

> My apologies.  I misinterpreted that last post.  I have not been able to
> try pgloader as I am using the windows platform.

pgloader is a Python "script" which depends on psycopg for handling the
PostgreSQL connection, and otherwise only on standard Python modules. The
following link provides Windows binaries for psycopg:
  http://www.stickpeople.com/projects/python/win-psycopg/

I've gotten reports of pgloader running on Windows, even though I didn't make
any specific effort for this to happen and I don't have any proprietary
licensed OS to test pgloader on.

Hope this helps,
--
dim
