inserting large number of rows was: Re: Increasing number of PG connections. - Mailing list pgsql-performance

From scott.marlowe
Subject inserting large number of rows was: Re: Increasing number of PG connections.
Date
Msg-id Pine.LNX.4.33.0402021346040.20085-100000@css120.ihs.com
Whole thread Raw
In response to Re: Increasing number of PG connections.  (Qing Zhao <qzhao@quotefx.net>)
Responses Re: inserting large number of rows was: Re: Increasing
List pgsql-performance
On Mon, 2 Feb 2004, Qing Zhao wrote:

> I am new here. I have a question related to this in some way.
>
> Our web site needs to upload a large volume of data into Postgres at a
> time. The performance deterioates as number of rows becomes larger.
> When it reaches 2500 rows, it never come back to GUI. Since the tests
> were run through GUI, my suspision is
> that it might be caused by the way the application server talking to
> Postgres server, the connections, etc.. What might be the factors
> involved here? Does anyone know?

Actually, I'm gonna go out on a limb here and assume two things:

1. you've got lotsa fk/pk relationships setup.
2. you're analyzing the table empty before loading it up.

What happens in this instance is that the analyze on an empty, or nearly
so, table, means that during the inserts, postgresql thinks you have only
a few rows.  At first, this is fine, as pgsql will seq scan the
tables to make sure there is a proper key in both.  As the number of
rows increases, the planner needs to switch to index scans but doesn't,
because it doesn't know that the number of rows is increasing.

Fix:  insert a few hundred rows, run analyze, check to see if the explain
for inserts is showing index scans or not.  If not, load a few more
hundred rows, analyze, rinse, repeat.

Also, look for fk/pk mismatches.  I.e. an int4 field pointing to an int8
field.  That's a performance killer, so if the pk/fk types don't match,
see if you can change your field types to match and try again.


pgsql-performance by date:

Previous
From: Adam Ruth
Date:
Subject: Re: [pgsql-advocacy] MySQL+InnoDB vs. PostgreSQL test?
Next
From: Richard Huxton
Date:
Subject: Bulk Record upload (was Re: Increasing number of PG connections)