Thread: copy table from file: with row replacement?

copy table from file: with row replacement?

From: Michael Enke
Date:
Hello all,
I have a feature request, as I think this is not possible with the current version:

I want to load a huge amount of data, and I know that COPY is much faster than doing inserts.
But in my case I have an already-filled table, and some rows (not all) from this table
should be replaced. The table has a primary key on one column.
If I do a COPY table FROM file and a key value already exists, PostgreSQL tells me
that the import is not possible because of the PK violation.

Since Postgres is aware of such a violation, couldn't there be an option to the COPY command
to delete the existing rows, so that a COPY table FROM file never generates a PK violation message
but replaces existing rows instead?

If this is not possible, would the next fastest solution be to create a BEFORE trigger and
delete the conflicting rows in that trigger? Or is that no different from issuing an INSERT
for every line and, if it fails (because of the PK), an UPDATE?
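As a rough sketch of the trigger idea (the table name "target" and key column "id" are
only placeholders), I imagine something like:

    CREATE OR REPLACE FUNCTION replace_on_copy() RETURNS trigger AS $$
    BEGIN
        -- drop any existing row with the same primary key so the
        -- incoming row from COPY can be inserted without a conflict
        DELETE FROM target WHERE id = NEW.id;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER replace_before_insert
        BEFORE INSERT ON target
        FOR EACH ROW EXECUTE PROCEDURE replace_on_copy();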

Thank you,
Michael

PS: Please CC replies to my email address


Re: copy table from file: with row replacement?

From: Bruce Momjian
Date:
Michael Enke wrote:
> Hello all,
> I have a feature request, as I think this is not possible with the current version:
> 
> I want to load a huge amount of data, and I know that COPY is much faster than doing inserts.
> But in my case I have an already-filled table, and some rows (not all) from this table
> should be replaced. The table has a primary key on one column.
> If I do a COPY table FROM file and a key value already exists, PostgreSQL tells me
> that the import is not possible because of the PK violation.
> 
> Since Postgres is aware of such a violation, couldn't there be an option to the COPY command
> to delete the existing rows, so that a COPY table FROM file never generates a PK violation message
> but replaces existing rows instead?
> 
> If this is not possible, would the next fastest solution be to create a BEFORE trigger and
> delete the conflicting rows in that trigger? Or is that no different from issuing an INSERT
> for every line and, if it fails (because of the PK), an UPDATE?

I would just COPY into another table, remove any duplicates by joining
the two tables, and then do an INSERT INTO ... SELECT.
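For example (the table name "target", key column "id", and file path are only placeholders):

    -- load the incoming file into a staging table with the same columns
    CREATE TEMP TABLE staging (LIKE target);
    COPY staging FROM '/path/to/file';

    -- delete the rows that are about to be replaced
    DELETE FROM target USING staging WHERE target.id = staging.id;

    -- bring everything over in one batch
    INSERT INTO target SELECT * FROM staging;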

--
  Bruce Momjian    bruce@momjian.us
  EnterpriseDB     http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: copy table from file: with row replacement?

From: Michael Enke
Date:
This works for a small amount of data, but for a large amount of data
the join takes a lot of time.

Regards,
Michael

Bruce Momjian wrote:
> Michael Enke wrote:
> 
>>Hello all,
>>I have a feature request, as I think this is not possible with the current version:
>>
>>I want to load a huge amount of data, and I know that COPY is much faster than doing inserts.
>>But in my case I have an already-filled table, and some rows (not all) from this table
>>should be replaced. The table has a primary key on one column.
>>If I do a COPY table FROM file and a key value already exists, PostgreSQL tells me
>>that the import is not possible because of the PK violation.
>>
>>Since Postgres is aware of such a violation, couldn't there be an option to the COPY command
>>to delete the existing rows, so that a COPY table FROM file never generates a PK violation message
>>but replaces existing rows instead?
>>
>>If this is not possible, would the next fastest solution be to create a BEFORE trigger and
>>delete the conflicting rows in that trigger? Or is that no different from issuing an INSERT
>>for every line and, if it fails (because of the PK), an UPDATE?
> 
> 
> I would just COPY into another table, remove any duplicates by joining
> the two tables, and then do an INSERT INTO ... SELECT.
> 


Re: copy table from file: with row replacement?

From: "Florian G. Pflug"
Date:
Michael Enke wrote:
> This works for a small amount of data, but for a large amount of data
> the join takes a lot of time.

It certainly is faster than any algorithm that checks for duplicates
for each line of COPY input could ever be. Especially for joins, doing
them in one large batch allows the use of better algorithms than looping
over one table and searching for matching rows in the other - which is
exactly what COPY would need to do if it had a "replace on duplicate"
flag.

I think the fastest way to join two large tables would be a merge join.
Try doing an "explain select" (or "explain delete") to see which algorithm
postgres chooses. Check that you actually declared your primary key
in both tables - it might help postgres to know that the column you're
joining on is unique. Also check your work_mem setting - if it is set too
low, it often forces postgres to use inferior plans because it tries to
save memory.
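For example (again with placeholder table names), something like this shows the chosen
plan and the current work_mem:

    -- see which join strategy the planner picks for the delete step
    EXPLAIN DELETE FROM target USING staging WHERE target.id = staging.id;

    -- check, and if necessary raise, the memory available for sorts and hashes
    SHOW work_mem;
    SET work_mem = '64MB';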

greetings, Florian Pflug