Re: inserting large number of rows was: Re: Increasing - Mailing list pgsql-performance

From Erki Kaldjärv
Subject Re: inserting large number of rows was: Re: Increasing
Msg-id 401F47A9.7070106@webware.ee
In response to inserting large number of rows was: Re: Increasing number of PG connections.  ("scott.marlowe" <scott.marlowe@ihs.com>)
List pgsql-performance
You could do high-speed inserts with the COPY command:
http://developer.postgresql.org/docs/postgres/sql-copy.html

Check whether your database adapter/client library supports it (I guess
it does).

Note that it doesn't help very much if there are foreign keys or
triggers on the target table.
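As a minimal sketch of what feeding COPY looks like: the function below formats rows into the tab-separated text that `COPY ... FROM STDIN` expects. The function name and the escaping helper are illustrative, not from the post; only the text-format rules (tab separator, `\N` for NULL, backslash escapes) come from the COPY documentation linked above.

```python
import io

def rows_to_copy_buffer(rows):
    """Format rows as the tab-separated text COPY ... FROM STDIN expects.

    Illustrative helper: escapes the characters COPY's text format
    treats specially and uses \\N as the NULL marker.
    """
    buf = io.StringIO()
    for row in rows:
        fields = []
        for value in row:
            if value is None:
                fields.append(r"\N")  # COPY's NULL marker in text format
            else:
                s = str(value)
                # Escape backslash, tab, and newline for the text format.
                s = s.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")
                fields.append(s)
        buf.write("\t".join(fields) + "\n")
    buf.seek(0)
    return buf
```

With psycopg2, such a buffer can be streamed in with `cur.copy_from(rows_to_copy_buffer(rows), "target_table")` (the connection and table name are assumed here); `copy_from`'s defaults are a tab separator and `\N` for NULL, matching the format above.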

Bill Moran wrote:

> I must have missed this post when it was made earlier.  Pardon the
> noise if
> my suggestion has already been made.
>
> Unlike MySQL (and possibly other database servers), PostgreSQL is
> faster when inserting inside a transaction, depending on the method
> by which you are actually adding the records.
>
> In my own experience (generating a list of INSERT statements from a perl
> script and using psql to execute them) the difference in performance was
> incredibly dramatic when I added a "BEGIN WORK" at the beginning and
> "COMMIT WORK" at the end.
>
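The BEGIN/COMMIT batching described above (a script emitting INSERT statements for psql to execute) can be sketched like this; all names here are illustrative, and the quoting helper is for demonstration only, since a real loader should use its driver's parameter binding:

```python
def sql_quote(value):
    """Minimal literal quoting, for illustration only."""
    if value is None:
        return "NULL"
    return "'" + str(value).replace("'", "''") + "'"

def batch_insert_script(table, rows):
    """Emit INSERT statements wrapped in a single transaction,
    mirroring the script-plus-psql approach described above."""
    stmts = ["BEGIN WORK;"]
    for row in rows:
        stmts.append("INSERT INTO %s VALUES (%s);"
                     % (table, ", ".join(sql_quote(v) for v in row)))
    stmts.append("COMMIT WORK;")
    return "\n".join(stmts)
```

Piping the resulting script into psql commits all rows in one transaction instead of one transaction per INSERT, which is where the speedup comes from.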
> scott.marlowe wrote:
>
>> On Mon, 2 Feb 2004, Qing Zhao wrote:
>>
>>
>>> I am new here. I have a question related to this in some way.
>>>
>>> Our web site needs to upload a large volume of data into Postgres at
>>> a time. The performance deterioates as number of rows becomes
>>> larger.  When it reaches 2500 rows, it never come back to GUI. Since
>>> the tests were run through GUI, my suspision is
>>> that it might be caused by the way the application server talking to
>>> Postgres server, the connections, etc.. What might be the factors
>>> involved here? Does anyone know?
>>
>>
>>
>> Actually, I'm gonna go out on a limb here and assume two things:
>>
>> 1. you've got lotsa fk/pk relationships set up.
>> 2. you analyzed the table while it was empty, before loading it up.
>>
>> What happens in this instance is that an analyze on an empty (or
>> nearly empty) table means that during the inserts, postgresql thinks
>> you have only a few rows.  At first this is fine, as pgsql will seq
>> scan the tables to make sure there is a proper key in both.  As the
>> number of rows increases, the planner needs to switch to index scans
>> but doesn't, because it doesn't know that the number of rows is
>> increasing.
>>
>> Fix:  insert a few hundred rows, run analyze, check to see if the
>> explain for inserts is showing index scans or not.  If not, load a
>> few more hundred rows, analyze, rinse, repeat.
>>
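The insert-a-few-hundred-rows, analyze, rinse-and-repeat loop above can be sketched as a small helper that interleaves ANALYZE into a stream of statements; the function name, table name, and batch size are all illustrative:

```python
def interleave_analyze(insert_stmts, table="target_table", batch_size=500):
    """After every batch of inserts, emit an ANALYZE so the planner's
    row estimates keep up with the growing table (per the fix above)."""
    out = []
    for i in range(0, len(insert_stmts), batch_size):
        out.extend(insert_stmts[i:i + batch_size])
        out.append("ANALYZE %s;" % table)  # refresh planner statistics
    return out
```

Between batches one would also run EXPLAIN on a representative insert, as suggested above, to confirm the plan has switched from seq scans to index scans.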
>> Also, look for fk/pk mismatches.  I.e. an int4 field pointing to an
>> int8 field.  That's a performance killer, so if the pk/fk types don't
>> match, see if you can change your field types to match and try again.
>>
>>
>>
>
>


