Re: inserting large number of rows was: Re: Increasing - Mailing list pgsql-performance

From Bill Moran
Subject Re: inserting large number of rows was: Re: Increasing
Msg-id 401EC2FE.3000500@potentialtech.com
In response to inserting large number of rows was: Re: Increasing number of PG connections.  ("scott.marlowe" <scott.marlowe@ihs.com>)
List pgsql-performance
I must have missed this post when it was made earlier.  Pardon the noise if
my suggestion has already been made.

Unlike MySQL (and possibly other database servers), PostgreSQL is much faster
when inserting inside a transaction, though how much faster depends on the
method by which you are actually adding the records.

In my own experience (generating a list of INSERT statements from a perl
script and using psql to execute them) the difference in performance was
incredibly dramatic when I added a "BEGIN WORK" at the beginning and
"COMMIT WORK" at the end.
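As a concrete sketch of the difference (the table "widgets" and its columns
are invented names for illustration), the generated statement stream changes
from one implicit transaction per INSERT to a single transaction around the
whole batch:

```sql
-- Hypothetical example: "widgets" and its columns are made-up names.
BEGIN WORK;
INSERT INTO widgets (id, name) VALUES (1, 'foo');
INSERT INTO widgets (id, name) VALUES (2, 'bar');
-- ... thousands more INSERTs ...
COMMIT WORK;
```

Without the explicit BEGIN/COMMIT, psql commits each INSERT as its own
transaction, so every row pays the full commit overhead.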

scott.marlowe wrote:
> On Mon, 2 Feb 2004, Qing Zhao wrote:
>
>
>>I am new here. I have a question related to this in some way.
>>
>>Our web site needs to upload a large volume of data into Postgres at a
>>time. The performance deteriorates as the number of rows becomes larger.
>>When it reaches 2500 rows, it never comes back to the GUI. Since the tests
>>were run through the GUI, my suspicion is
>>that it might be caused by the way the application server talks to the
>>Postgres server, the connections, etc. What might be the factors
>>involved here? Does anyone know?
>
>
> Actually, I'm gonna go out on a limb here and assume two things:
>
> 1. you've got lotsa fk/pk relationships setup.
> 2. you're analyzing the table empty before loading it up.
>
> What happens in this instance is that analyzing an empty, or nearly
> empty, table means that during the inserts, postgresql thinks you have only
> a few rows.  At first, this is fine, as pgsql will seq scan the
> tables to make sure there is a proper key in both.  As the number of
> rows increases, the planner needs to switch to index scans but doesn't,
> because it doesn't know that the number of rows is increasing.
>
> Fix:  insert a few hundred rows, run analyze, check to see if the explain
> for inserts is showing index scans or not.  If not, load a few more
> hundred rows, analyze, rinse, repeat.
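The check described above might look something like this (the tables
"orders" and "customers" are invented names for illustration):

```sql
-- After loading a few hundred rows, refresh the planner's statistics:
ANALYZE customers;

-- Each insert into the referencing table triggers a lookup against the
-- referenced key; an equivalent SELECT shows which plan that lookup gets:
EXPLAIN SELECT 1 FROM customers WHERE customer_id = 42;
-- If the output shows "Seq Scan" rather than "Index Scan", load another
-- batch, re-run ANALYZE, and check again.
```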
>
> Also, look for fk/pk mismatches.  I.e. an int4 field pointing to an int8
> field.  That's a performance killer, so if the pk/fk types don't match,
> see if you can change your field types to match and try again.
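For instance (hypothetical schema, invented names), a mismatch like this
defeats the index lookup on the referenced key:

```sql
-- Mismatched: an int4 column referencing an int8 primary key.
CREATE TABLE customers (customer_id int8 PRIMARY KEY);
CREATE TABLE orders (
    order_id    int4,
    customer_id int4 REFERENCES customers (customer_id)  -- type mismatch
);

-- Matched: declare the referencing column with the same type as the key,
-- e.g.  customer_id int8 REFERENCES customers (customer_id)
```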
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>


--
Bill Moran
Potential Technologies
http://www.potentialtech.com

