Re: Optimization required for multiple insertions in PostgreSQL - Mailing list pgsql-performance

From Kevin Grittner
Subject Re: Optimization required for multiple insertions in PostgreSQL
Date
Msg-id 4EB29800020000250004299D@gw.wicourts.gov
Whole thread Raw
In response to Optimization required for multiple insertions in PostgreSQL  (siva palanisamy <psivait@gmail.com>)
List pgsql-performance
[Please keep the list copied.]

siva palanisamy <psivait@gmail.com> wrote:

> Could you pls guide me on how to minimize time consumption? I've
> postgresql 8.1.4; Linux OS.

Well, the first thing to do is to use a supported version of
PostgreSQL.  More recent releases perform better, for starters.

http://wiki.postgresql.org/wiki/PostgreSQL_Release_Support_Policy

Whichever major release you use, you should be up-to-date on bug
fixes, some of which are fixes for bugs which cause performance
problems:

http://www.postgresql.org/support/versioning

> I'm yet to check its RAM and other memory capacity but I
> guess it would've the necessary stuffs.

Knowing what hardware you have, and what your current PostgreSQL
configuration setting are, would allow us to suggest what you might
reconfigure to tune your database.

> My master table's schema is
>
> CREATE TABLE contacts ( contact_id SERIAL PRIMARY KEY,
> contact_type INTEGER DEFAULT 0, display_name TEXT NOT NULL DEFAULT
> '', first_name TEXT DEFAULT '', last_name TEXT DEFAULT '',
> company_name TEXT DEFAULT '', last_updated TIMESTAMP NOT NULL
> DEFAULT current_timestamp, UNIQUE(display_name) ) WITHOUT OIDS;

Not that this is a performance issue, but you almost certainly will
expect the semantics provided by TIMESTAMP WITH TIME ZONE for your
last_updated column.  Just specifying TIMESTAMP is probably going to
give you an unpleasant surprise somewhere down the road.

> I've a sql function that is called from a C program where
> parameters are being passed. It is replicated for the other 2
> tables as well. Totally, I've 3 tables.

Which table is the source of your slowness, and how do you know
that?

> FYI, database connection is opened for the 1st and closed
> only after the last record is attempted. Do you think these
> constraints take a lot of time?

The only constraints you've shown are PRIMARY KEY and UNIQUE.  It is
somewhat slower to add rows with those constraints in place than to
blast in data without the constraints and then add the constraints;
but I understand that if the data is not known to be clean and free
of duplicates, that's not possible.  That certainly doesn't account
for the timings you describe.

> taking days to complete 20000 odd records are not encouraging!

I've seen PostgreSQL insert more rows than that per second, so it's
not like it is some inherent slowness of PostgreSQL.  There is
something you're doing with it that is that slow.  Getting onto a
modern version of PostgreSQL may help a lot, but most likely there's
something you're not telling us yet that is the thing that really
needs to change.

Just as one off-the-wall example of what *can* happen -- if someone
disabled autovacuum and had a function which did an update to all
rows in a table each time the function was called, they would see
performance like you describe.  How do I know, from what you've told
me, that you're *not* doing that?  Or one of a hundred other things
I could postulate?  (Hint, if you showed us your current PostgreSQL
settings I could probably have ruled this out.)

-Kevin


pgsql-performance by date:

Previous
From: Claudio Freire
Date:
Subject: Re: Blocking excessively in FOR UPDATE
Next
From: Tom Lane
Date:
Subject: Re: Predicates not getting pushed into SQL function?