Re: Slow Inserts on 1 table? - Mailing list pgsql-general

From Dan Armbrust
Subject Re: Slow Inserts on 1 table?
Msg-id 42EF7751.1080604@gmail.com
In response to Slow Inserts on 1 table?  (Dan Armbrust <daniel.armbrust.list@gmail.com>)
Responses Re: Slow Inserts on 1 table?  (Richard Huxton <dev@archonet.com>)
Re: Slow Inserts on 1 table?  ("John D. Burger" <john@mitre.org>)
Re: Slow Inserts on 1 table?  ("Joshua D. Drake" <jd@commandprompt.com>)
List pgsql-general
Dan Armbrust wrote:

> I have one particular insert query that is running orders of magnitude
> slower than other insert queries, and I cannot understand why.
> For example, inserts into "conceptProperty" (detailed below) are at
> least five times faster than inserts into "conceptPropertyMultiAttributes".
>
> When I am running the inserts, postmaster pegs one CPU at nearly 100%
> on the Fedora Core 3 server it is running on.
>
> Any advice is appreciated.  Here is a lot of info that may shed light
> on the issue to someone with more experience than me:
>
> Example Insert Query with data:
> INSERT INTO conceptPropertyMultiAttributes (codingSchemeName,
> conceptCode, propertyId, attributeName, attributeValue) VALUES ('NCI
> MetaThesaurus', 'C0000005', 'T-2', 'Source', 'MSH2005_2004_10_12')
>
> EXPLAIN ANALYZE output:
> QUERY PLAN
> Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.005..0.008
> rows=1 loops=1)
> Total runtime: 4.032 ms
>
> Table Structure:
> CREATE TABLE conceptpropertymultiattributes (
>    codingschemename character varying(70) NOT NULL,
>    conceptcode character varying(100) NOT NULL,
>    propertyid character varying(50) NOT NULL,
>    attributename character varying(50) NOT NULL,
>    attributevalue character varying(250) NOT NULL
> );
>
> Primary Key:
> ALTER TABLE ONLY conceptpropertymultiattributes
>    ADD CONSTRAINT conceptpropertymultiattributes_pkey PRIMARY KEY
> (codingschemename, conceptcode, propertyid, attributename,
> attributevalue);
>
> Foreign Key:
> ALTER TABLE ONLY conceptpropertymultiattributes
>    ADD CONSTRAINT f FOREIGN KEY (codingschemename, conceptcode,
> propertyid) REFERENCES conceptproperty(codingschemename, conceptcode,
> propertyid);
>
>
> Structure of Table Referenced by Foreign Key:
> CREATE TABLE conceptproperty (
>    codingschemename character varying(70) NOT NULL,
>    conceptcode character varying(100) NOT NULL,
>    propertyid character varying(50) NOT NULL,
>    property character varying(250) NOT NULL,
>    "language" character varying(32),
>    presentationformat character varying(50),
>    datatype character varying(50),
>    ispreferred boolean,
>    degreeoffidelity character varying(50),
>    matchifnocontext boolean,
>    representationalform character varying(50),
>    propertyvalue text NOT NULL
> );
>
> Primary Key:
> ALTER TABLE ONLY conceptproperty
>    ADD CONSTRAINT conceptproperty_pkey PRIMARY KEY (codingschemename,
> conceptcode, propertyid);
>
> Thanks,
>
> Dan
>
Well, I now have a further hunch on why the inserts are so slow on 1
table.  Most of the time, when I am doing bulk inserts, I am starting
with an empty database.  My insertion program creates the tables,
indexes and foreign keys.

The problem seems to be the foreign key - PostgreSQL apparently isn't
smart enough to use the indexes while loading and checking the foreign key
between two large tables - my guess is that because ANALYZE has not been
run yet, it still thinks all of the tables have size 0.  If I let the load
run for a while, then kill the load process, run ANALYZE, empty the tables,
and restart, things perform fine.  But that is kind of a ridiculous
sequence to have to go through just to load a database.
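
Put differently, what actually seems to matter is getting an ANALYZE on the
parent table in before the child-table inserts start.  A rough sketch of
what I think the loader could do instead of the kill/empty/reload dance,
reusing the example insert from above:

-- After conceptproperty has been bulk-loaded, give the planner real row
-- counts so the foreign-key lookups can use its primary key index:
ANALYZE conceptproperty;

-- The child-table inserts should then run at a sane speed again:
INSERT INTO conceptpropertymultiattributes
    (codingschemename, conceptcode, propertyid, attributename, attributevalue)
VALUES
    ('NCI MetaThesaurus', 'C0000005', 'T-2', 'Source', 'MSH2005_2004_10_12');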

Why can't postgres compile some rough statistics on tables without
running ANALYZE?  It seems like it would be pretty easy to keep track of
the number of inserts/deletions that have taken place since the last
ANALYZE run...  It may not be exactly the right number, but it would
certainly be smarter than continuing to assume that the tables have size
0, even though it has been doing constant inserts on the tables in
question....
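
(For what it's worth, the server already seems to track raw per-table
activity counters, at least when the stats collector is enabled - something
like the query below shows them, and they are maintained independently of
ANALYZE:)

-- Insert/update/delete counts since the statistics were last reset:
SELECT relname, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_user_tables
WHERE relname IN ('conceptproperty', 'conceptpropertymultiattributes');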

I have already had to disable sequential scans, since the planner is
almost _always_ wrong in deciding whether or not to use an index.  I put
the indexes on the columns I chose for a reason - because I KNOW the
index read will ALWAYS be faster, since I designed the indexes for the
queries I am running.  But it still must be doing a sequential scan on
these inserts...
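
As far as I can tell, the check the foreign key trigger performs for each
insert is roughly equivalent to the lookup below (an approximation, not the
exact internal query), and the last line is the planner knob I have been
flipping to disable sequential scans:

-- Roughly what each conceptpropertymultiattributes insert has to verify
-- against the parent table:
EXPLAIN SELECT 1 FROM conceptproperty
WHERE codingschemename = 'NCI MetaThesaurus'
  AND conceptcode = 'C0000005'
  AND propertyid = 'T-2';

-- The setting I have been using to force index use for my own queries:
SET enable_seqscan = off;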



--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/

