Re: COPY FROM STDIN instead of INSERT - Mailing list pgsql-general

From Casey Duncan
Subject Re: COPY FROM STDIN instead of INSERT
Date
Msg-id 86D1B80D-CDAA-4F0A-BC71-AFA4A8C07629@pandora.com
Whole thread Raw
In response to Re: COPY FROM STDIN instead of INSERT  ("Ilja Golshtein" <ilejn@yandex.ru>)
Responses Re: COPY FROM STDIN instead of INSERT  ("Ilja Golshtein" <ilejn@yandex.ru>)
List pgsql-general
On Oct 18, 2006, at 5:20 AM, Ilja Golshtein wrote:

>> When starting a database from scratch it is much faster to import the
>> data and then create the indexes. The time to create index on a full
>> table is less than the extra time from each index update from the
>> inserts. The more indexes to update the more time updating indexes
>> takes.
>>
>> The problem with a live database is removing the indexes slows down
>> current users and if you are adding 2,000 rows to a table that
>> already
>> has 5,000,000 rows in it then you will loose the benefit.
>
> I am 100% agree with you. What you are describing is a very good
> and useful technique for some maintenance operations.
>
> My current goal is to increase performance in normal [almost ;)]
> OLTP mode of my application, so removing indexes for some time is
> not an option here.
>
> And my question remains.
> Is it Ok to use COPY BINARY FROM STDIN instead of multiple INSERTs?

I don't think I would use BINARY, it seems likely to be susceptible
to changes in the underlying data type storage. From the docs:

"To determine the appropriate binary format for the actual tuple data
you should consult the PostgreSQL source, in particular the *send and
*recv functions for each column's data type (typically these
functions are found in the src/backend/utils/adt/ directory of the
source distribution)."

Regular text COPY is quite a big performance win over INSERTs, and
doesn't require your application to know the vagaries of the data
storage. Also, if you have many indices, time to update them will
probably dominate anyhow, making the difference between binary and
text copy negligible.

A was mentioned, COPY can only insert static data, and does not
support rules (that you might use to support constraint exclusion,
etc). AFIAK, the locking semantics are the same as INSERT, i.e., it
does not lock the entire table or anything like that. Since it would
allow transactions that insert data to finish faster, it should
actually work better under high concurrency.

-Casey

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Stats Collector Won't Start
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] query log corrupted-looking entries