Thread: Slow Inserts on 1 table?
I have one particular insert query that is running orders of magnitude slower than other insert queries, and I cannot understand why. For example, inserts into "conceptProperty" (detailed below) are at least 5 times faster than inserts into "conceptPropertyMultiAttributes".

When I am running the inserts, postmaster shows as pegging one CPU on the Fedora Core 3 server it is running on at nearly 100%.

Any advice is appreciated. Here is a lot of info that may shed light on the issue to someone with more experience than me:

Example insert query with data:

INSERT INTO conceptPropertyMultiAttributes
    (codingSchemeName, conceptCode, propertyId, attributeName, attributeValue)
VALUES
    ('NCI MetaThesaurus', 'C0000005', 'T-2', 'Source', 'MSH2005_2004_10_12')

EXPLAIN ANALYZE output:

                        QUERY PLAN
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.005..0.008 rows=1 loops=1)
 Total runtime: 4.032 ms

Table structure:

CREATE TABLE conceptpropertymultiattributes (
    codingschemename character varying(70) NOT NULL,
    conceptcode      character varying(100) NOT NULL,
    propertyid       character varying(50) NOT NULL,
    attributename    character varying(50) NOT NULL,
    attributevalue   character varying(250) NOT NULL
);

Primary key:

ALTER TABLE ONLY conceptpropertymultiattributes
    ADD CONSTRAINT conceptpropertymultiattributes_pkey PRIMARY KEY
    (codingschemename, conceptcode, propertyid, attributename, attributevalue);

Foreign key:

ALTER TABLE ONLY conceptpropertymultiattributes
    ADD CONSTRAINT f FOREIGN KEY (codingschemename, conceptcode, propertyid)
    REFERENCES conceptproperty (codingschemename, conceptcode, propertyid);

Structure of the table referenced by the foreign key:

CREATE TABLE conceptproperty (
    codingschemename     character varying(70) NOT NULL,
    conceptcode          character varying(100) NOT NULL,
    propertyid           character varying(50) NOT NULL,
    property             character varying(250) NOT NULL,
    "language"           character varying(32),
    presentationformat   character varying(50),
    datatype             character varying(50),
    ispreferred          boolean,
    degreeoffidelity     character varying(50),
    matchifnocontext     boolean,
    representationalform character varying(50),
    propertyvalue        text NOT NULL
);

Primary key:

ALTER TABLE ONLY conceptproperty
    ADD CONSTRAINT conceptproperty_pkey PRIMARY KEY
    (codingschemename, conceptcode, propertyid);

Thanks,

Dan
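[Editor's note: each insert into conceptpropertymultiattributes must verify that the referenced row exists in conceptproperty. A rough way to see what that check costs is to run the equivalent keyed lookup by hand, using the values from the example above. This only approximates the internal RI-trigger query, which also takes a row lock:]

```sql
-- Approximates the per-row FK check; if this plans as a Seq Scan on a
-- large conceptproperty table, every insert is paying that scan.
EXPLAIN ANALYZE
SELECT 1
FROM conceptproperty
WHERE codingschemename = 'NCI MetaThesaurus'
  AND conceptcode      = 'C0000005'
  AND propertyid       = 'T-2';
```

If this shows an Index Scan using conceptproperty_pkey, the FK check itself is cheap and the slowdown lies elsewhere.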
What indexes are defined on both tables? Are there any triggers or rules?

On Wed, Jul 20, 2005 at 09:50:54AM -0500, Dan Armbrust wrote:
> I have one particular insert query that is running orders of magnitude
> slower than other insert queries, and I cannot understand why.
> For example, Inserts into "conceptProperty" (detailed below) are at
> least 5 times faster than inserts into "conceptPropertyMultiAttributes".
>
> When I am running the inserts, postmaster shows as pegging one CPU on
> the Fedora Core 3 server it is running on at nearly 100%.
> [snip]

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
Dan Armbrust wrote:
> I have one particular insert query that is running orders of magnitude
> slower than other insert queries, and I cannot understand why.
> For example, Inserts into "conceptProperty" (detailed below) are at
> least 5 times faster than inserts into "conceptPropertyMultiAttributes".
> [snip]

Well, I now have a further hunch on why the inserts are so slow on 1 table. Most of the time, when I am doing bulk inserts, I am starting with an empty database. My insertion program creates the tables, indexes, and foreign keys.

The problem seems to be the foreign key - PostgreSQL is apparently being too stupid to use the indexes while loading and checking the foreign key between two large tables - my guess is because analyze has not been run yet, so it thinks all of the tables are size 0. If I let it run for a while, then kill the load process, run Analyze, empty the tables, and then restart, things perform fine. But that is kind of a ridiculous sequence to have to use to load a database.

Why can't postgres compile some rough statistics on tables without running analyze? It seems that it would be pretty easy to keep track of the number of inserts/deletions that have taken place since the last Analyze execution. It may not be the exact right number, but it would certainly be smarter than continuing to assume that the tables are size 0, even though it has been doing constant inserts on the tables in question.

I have already had to disable sequential scans, since the planner is almost _always_ wrong in deciding whether or not to use an index. I put the indexes on the columns I chose for a reason - it is because I KNOW the index read will ALWAYS be faster, since I designed the indexes for the queries I am running. But it still must be doing a sequential scan on these inserts...
--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
Dan Armbrust wrote:
>> I have one particular insert query that is running orders of magnitude
>> slower than other insert queries, and I cannot understand why.
>
> Well, I now have a further hunch on why the inserts are so slow on 1
> table. Most of the time, when I am doing bulk inserts, I am starting
> with an empty database. My insertion program creates the tables,
> indexes and foreign keys.
> The problem seems to be the foreign key - PostgreSQL is apparently being
> too stupid to use the indexes while loading and checking the foreign key
> between two large tables - my guess is because analyze has not been run
> yet, so it thinks all of the tables are size 0.

If you haven't analysed them since creation, it should think size=1000, which is a safety measure to reduce this sort of problem.

> If I let it run for a while, then kill the load process, run Analyze,
> empty the tables, and then restart, things perform fine. But that is
> kind of a ridiculous sequence to have to use to load a database.
> Why can't postgres compile some rough statistics on tables without
> running analyze? Seems that it would be pretty easy to keep track of
> the number of inserts/deletions that have taken place since the last
> Analyze execution...

Yep, but it would have to do it all the time. That's overhead on every query.

> I have already had to disable sequential scans, since the planner is
> almost _always_ wrong in deciding whether or not to use an index.

Then either your stats are badly out, or your other configuration settings are.

> I put the indexes on the columns I choose for a reason - it is because
> I KNOW the index read will ALWAYS be faster since I designed the indexes
> for the queries I am running. But it still must be doing a sequential
> scan on these inserts...

What, ALWAYS faster, even for the first FK check when there's only one row in the target table and that's cached?

If you're really in a hurry doing your bulk loads:
1. Use COPY.
2. Drop/restore the foreign-key constraints before/after.

That will be hugely faster than INSERTs, although it's not always an applicable solution.

--
Richard Huxton
Archonet Ltd
> my guess is because analyze has not been run yet, so it thinks all of
> the tables are size 0. If I let it run for a while, then kill the
> load process, run Analyze, empty the tables, and then restart, things
> perform fine. But that is kind of a ridiculous sequence to have to
> use to load a database.

So automate it. After discovering exactly this behavior, I've developed an idiom for load scripts where I (optionally) commit at some linear interval, and (optionally) analyze at some exponential interval. I presume this has been invented countless times, but here's my basic idea in pseudo-code:

    commitInterval = 1000
    analyzeFactor = 2
    whenToCommit = whenToAnalyze = commitInterval
    nInserts = 0

    loop over input data
        if we decide to insert
            insert
            nInserts++
        if whenToCommit < nInserts
            commit
            whenToCommit += commitInterval
        if whenToAnalyze < nInserts
            analyze
            whenToAnalyze *= analyzeFactor

So (with these constants) we commit after 1000 total inserts, then after 2000, 3000, etc. And we analyze after 1000 inserts, then after 2000, 4000, etc. This is perhaps way too conservative - in particular, I suspect that it's only the first one or two analyzes that matter - but it works for me.

The only annoyance is that the interface I use most often, Python's pgdb, runs everything in a transaction, and you can't analyze in a transaction. I've gotten around this in a variety of ways, some less principled than others.

- John D. Burger
  MITRE
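[Editor's note: John's pseudo-code can be simulated without a database to see where the maintenance commands land. `plan_maintenance` is a hypothetical helper name; a real load script would issue COMMIT/ANALYZE through its DB driver at the insert counts it returns.]

```python
def plan_maintenance(total_inserts, commit_interval=1000, analyze_factor=2):
    """Return (commit_points, analyze_points): the insert counts at which
    the idiom above would issue COMMIT and ANALYZE, respectively."""
    commits, analyzes = [], []
    when_to_commit = when_to_analyze = commit_interval
    for n in range(1, total_inserts + 1):
        # COMMIT at a linear interval: thresholds 1000, 2000, 3000, ...
        if when_to_commit < n:
            commits.append(n)
            when_to_commit += commit_interval
        # ANALYZE at an exponential interval: thresholds 1000, 2000, 4000, ...
        if when_to_analyze < n:
            analyzes.append(n)
            when_to_analyze *= analyze_factor
    return commits, analyzes

commits, analyzes = plan_maintenance(5000)
print(commits)   # COMMIT fires on the first insert past each linear threshold
print(analyzes)  # ANALYZE fires less and less often as the load grows
```

For a 5000-row load this commits four times but analyzes only three, and the gap between analyzes keeps doubling, which matches John's observation that only the early analyzes matter much.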
> Why can't postgres compile some rough statistics on tables without
> running analyze?

Why can't you just run analyze? You don't have to empty the tables to do so, and you can alter the statistics on the fly. Heck, you can even run analyze while doing the inserts. Perhaps the fine manual would be of assistance?

http://www.postgresql.org/docs/8.0/interactive/sql-analyze.html
http://www.postgresql.org/docs/8.0/interactive/sql-altertable.html

> I have already had to disable sequential scans, since the planner is
> almost _always_ wrong in deciding whether or not to use an index.

Sounds again like you are not setting up your statistics correctly, or not running analyze as and when it should be run.

Sincerely,

Joshua D. Drake

--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org
>> What, ALWAYS faster, even for the first FK check when there's only one
>> row in the target table and that's cached?
>>
>> If you're really in a hurry doing your bulk loads:
>> 1. Use COPY.
>> 2. Drop/restore the foreign-key constraints before/after.
>> That will be hugely faster than INSERTs, although it's not always an
>> applicable solution.

It seems like the query planner goes to great lengths to avoid using indexes because it might take 5 ms longer to execute an index lookup on a table with one row. But then, when the table has 1 million rows, a full scan takes 3 minutes, and the index scan takes 3 seconds, it has no problem picking the 3 minute route. I'll gladly give up the 5 ms in turn for not having to wait 3 minutes, which is why I disabled the sequential scans. If I have a small table, where indexes won't speed things up, I won't build an index on it.

The other factor is that most of my tables have at least thousands, and usually millions, of rows. Sequential scans will never be faster for the queries that I am doing - like I said, that is why I created the indexes.

My loading is done programmatically, from another format, so COPY is not an option. Neither is removing foreign keys, as they are required to guarantee valid data. I don't really have a problem with the insert speed when it is working properly - it is on par with other DBs that I have on the same hardware. The problem is when it stops using the indexes, for no good reason.

Example: last night, I kicked off a load process - this morning, it had only managed to make it through about 600,000 rows (split across several tables). After restarting it this morning, it made it through the same data in 30 minutes. If that's not bad and buggy behavior, I don't know what is...

Dan

--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
Joshua D. Drake wrote:
>> Why can't postgres compile some rough statistics on tables without
>> running analyze?
>
> Why can't you just run analyze? You don't have to empty the tables to
> do so and you can alter the statistics on the fly. Heck you can even
> run analyze while doing the inserts.

I shouldn't have to manually run Analyze to make the DB capable of handling inserts involving tables with foreign keys correctly. My code that is doing the inserts is a Java application that works across multiple DBs - MySQL, PostgreSQL, DB2, MS Access, Sybase, etc. I shouldn't have to put custom code into it just to make postgres deal with inserts properly. No other database that I insert data into has problems like this.

This will look really nice in the instructions for my data loader:

* If you are using PostgreSQL for your database server, it has a bug that causes its performance to become abysmal unless you manually run this "Analyze" command a little while after you start the load process.

I will have users that don't even know what a database is, much less be able to go out and run manual sysadmin-level commands on it to make the thing work.

> Sounds again like you are not setting up your statistics correctly or
> running analyze as and when it should.

It doesn't matter if the statistics are perfectly up to date. It still doesn't use the indexes. If the default statistics are so poorly set up that the planner thinks a 5 million row table scan will be quicker for a query that is doing exact matches on indexed columns, I would say it is poorly implemented. So I just disabled that "feature". And it works fine with sequential scans disabled - I have no problem with it in this respect, since I can turn it off.

Is there any way that I can disable sequential scans for foreign key checks?

Dan

--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
Dan Armbrust wrote:
> It seems like the query planner goes to great lengths to avoid using
> indexes because it might take 5 ms longer to execute an index lookup on
> a table with one row.
> But then, when the table has 1 million rows, and a full scan takes 3
> minutes, and the index scan takes 3 seconds, it has no problem picking
> the 3 minute route.
> [snip]
> The other factor, is that most of my tables have at least thousands, and
> usually millions of rows. Sequential scans will never be faster for the
> queries that I am doing - like I said, that is why I created the indexes.

The issue is nothing to do with special "small table" handling code. It's all to do with not having up-to-date stats. Of course, once you've analysed your table the system knows your index is good.

> My loading is done programmatically, from another format, so COPY is not
> an option.

Why not? A lot of my bulk-loads are generated from other systems, and I go through a temporary-file/pipe via COPY when I can. When I don't, I block inserts into groups of e.g. 1000 and stick in an analyse/etc as required.

> Neither is removing foreign keys, as they are required to
> guarantee valid data.

Ah, but you can still guarantee your data. You can wrap the whole drop-FK, bulk-load, recreate-FK sequence in a single transaction, and it can still be faster. Obviously doing this on a high-activity table won't win though; you'll have to block everyone else doing updates.

> I don't really have a problem with the insert
> speed when it is working properly - it is on par with other DBs that I
> have on the same hardware. The problem is when it stops using the
> indexes, for no good reason.
>
> Example, last night, I kicked off a load process - this morning, it had
> only managed to make it through about 600,000 rows (split across several
> tables). After restarting it this morning, it made it through the same
> data in 30 minutes.
> If that's not bad and buggy behavior, I don't know what is....

So run ANALYSE in parallel with your load, or break the bulk-load into blocks and analyse in-line. I'm not sure ripping out PG's cost-based query analyser will be a popular solution just to address bulk-loads.

--
Richard Huxton
Archonet Ltd
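[Editor's note: a sketch of the drop-FK / bulk-load / recreate-FK sequence Richard describes, using the constraint names from the original post. The COPY file path is hypothetical; re-adding the constraint still validates every row, but as one pass over the table rather than one indexed (or worse, sequential) lookup per insert.]

```sql
BEGIN;

ALTER TABLE conceptpropertymultiattributes DROP CONSTRAINT f;

-- Bulk-load; COPY is far cheaper per row than individual INSERTs.
COPY conceptpropertymultiattributes FROM '/tmp/multiattributes.copy';

-- Recreate the FK; referential integrity is checked here, in one pass.
ALTER TABLE conceptpropertymultiattributes
    ADD CONSTRAINT f FOREIGN KEY (codingschemename, conceptcode, propertyid)
    REFERENCES conceptproperty (codingschemename, conceptcode, propertyid);

COMMIT;
```

Because the whole sequence runs in one transaction, no other session ever sees the table without its constraint, so the data guarantee is preserved.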
On Tue, Aug 02, 2005 at 10:01:50AM -0500, Dan Armbrust wrote:
> I shouldn't have to manually run Analyze to make the DB be capable of
> handling inserts involving tables with foreign keys correctly. My code
> that is doing the inserts is a java application that works across
> multiple DBS - MySQL, PostgreSQL, DB2, MS Access, Sybase, etc.
[etc, rant removed]

You don't _have_ to be rude. This is a known limitation, and people have suggested the usual workarounds. This is an open source project -- if you think you can make it better, please by all means post a patch.

HAND.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"The meaning of things does not come from the things themselves, but from
the intelligences that apply them to their daily problems in pursuit of
progress." (Ernesto Hernández-Novich)
>> My loading is done programmatically, from another format, so COPY is
>> not an option.
>
> Why not? A lot of my bulk-loads are generated from other systems and I
> go through a temporary-file/pipe via COPY when I can. When I don't I
> block inserts into groups of e.g. 1000 and stick in an analyse/etc as
> required.

I guess I should clarify - my inserts are done by a Java application running on a client machine. This isn't a bulk load in the normal definition. I don't have any problem with the speed of the inserts when they are working correctly. The only problem is that the query analyzer is making a really poor decision when it is executing insert statements on tables that have foreign keys.

> So run ANALYSE in parallel with your load, or break the bulk-load into
> blocks and analyse in-line. I'm not sure ripping out PG's cost-based
> query analyser will be a popular solution just to address bulk-loads.

I never suggested that it needed to be ripped out. It just seems that when it is looking to check foreign keys, and the statistics are not up to date (or have not yet been created), it should default to using the indexes rather than not using them. The time savings of using indexes when things are big is FAR bigger than the time savings of not using indexes when things are small.

Dan

--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
Alvaro Herrera wrote:
> On Tue, Aug 02, 2005 at 10:01:50AM -0500, Dan Armbrust wrote:
>> I shouldn't have to manually run Analyze to make the DB be capable of
>> handling inserts involving tables with foreign keys correctly.
>> [snip]
>
> You don't _have_ to be rude. This is a known limitation, and people
> have suggested the usual workarounds. This is an open source project --
> if you think you can make it better, please by all means post a patch.

My apologies, I don't intend to be rude. But it is rather easy to get into rant mode when the prevailing opinion is that not being able to insert rows into a table with a foreign key without running Analyze after X rows is a misuse of the DB, rather than a bug.

I did not know that this is a known limitation; I have not been able to find any documentation that talks about how foreign keys, indexes, and the query planner relate. My first assumption was that since foreign key creation implicitly creates the necessary indexes, those indexes would always be used for foreign key checks. I wouldn't have even guessed that the query planner was involved in this portion. But these are all (apparently wrong) guesses - I don't know the internals.

The performance probably is better on small tables when it skips these indexes. But it seems to me that if you know the statistics are out of date (which you should be able to know, if analyze hasn't been run since the tables were created), the safer choice would be to use the indexes rather than not use them.

Dan

--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
Force PostgreSQL to use indexes on foreign key lookups - Was: Slow Inserts on 1 table?
From: Dan Armbrust
Date:
An aha moment:

http://archives.postgresql.org/pgsql-bugs/2005-03/msg00183.php

Some of the ensuing conversation seemed to indicate that a change was made in the 8.0 branch in March that was intended to fix this issue. Any idea whether that fix would have made it into the 8.0.3 release? Or maybe the fix didn't fix the issue.

Dan

--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
"John D. Burger" <john@mitre.org> writes:
> The only annoyance is that the interface I use most often, Python's
> pgdb, runs everything in a transaction, and you can't analyze in a
> transaction.

Hm? We've allowed ANALYZE inside a transaction for a long time.

The real solution to Dan's problem, of course, is to throw away the cached plan for the FK check and re-plan it once the table sizes have changed enough to invalidate the plan. Neil Conway was working on infrastructure for this, but it didn't get done in time for 8.1 ... maybe it will be there in 8.2.

In the meantime, though, I don't see any mention in the thread of exactly which PG version Dan is using. If it's 8.0.0 or 8.0.1, an update would probably help --- we tweaked the rules for never-yet-vacuumed tables in 8.0.2.

regards, tom lane
On Tue, Aug 02, 2005 at 10:41:01AM -0500, Dan Armbrust wrote:
> But it is rather easy to get into rant mode when the prevailing
> opinion is that not being able to insert rows into a table with a
> foreign key without running Analyze after X rows is a misuse of the
> DB, rather than a bug.

I haven't seen it in this thread, but it reminds me of someone who had a similar problem a long time ago. Basically, he was doing:

    TRUNCATE
    ANALYZE
    <load database>

This screwed everything up, because the ANALYZE set the statistics to zero-size tables. The solution was: *don't* analyze the table when it's empty. If he left out the ANALYZE altogether, it worked.

It also works because just after a CREATE TABLE it defaults to using indexes too. The *only* time it starts worrying about seq scans is if you run ANALYZE on an empty table. So don't do that.

Hope this helps,

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
>> The only annoyance is that the interface I use most often, Python's
>> pgdb, runs everything in a transaction, and you can't analyze in a
>> transaction.
>
> Hm? We've allowed ANALYZE inside a transaction for a long time.

I'm stuck with using 7.2, for now, and I get this:

    ERROR: ANALYZE cannot run inside a BEGIN/END block

I'll be happy to find out that this restriction's been removed in later versions.

- John D. Burger
  MITRE
John D. Burger wrote:
> I'm stuck with using 7.2, for now, and I get this:
>
> ERROR: ANALYZE cannot run inside a BEGIN/END block
>
> I'll be happy to find out that this restriction's been removed in later
> versions.

You're running 7.2? That is all kinds of levels of... huh? Why?

Sincerely,

Joshua D. Drake

--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
> You're running 7.2? That is all kinds of levels of... huh? Why?

I'm not running it, my organization is. Not sure how to interpret "all kinds of levels of..." Are there any huge suckages that I can use to leverage an update? I'm familiar with some of the smaller ones.

- John D. Burger
"John D. Burger" <john@mitre.org> writes:
>> You're running 7.2? That is all kinds of levels of... huh? Why?
>
> I'm not running it, my organization is. Not sure how to interpret "all
> kinds of levels of..." Are there any huge suckages that I can use to
> leverage an update? I'm familiar with some of the smaller ones.

Lots; what does it take to get their attention?

Feature-wise, there are such small matters as schemas. Performance-wise, there are quite a lot of improvements since 7.2. Security-wise, there are unfixable holes in 7.2 (try "select cash_out(2)").

I hope you're at least on 7.2.8, else you are also vulnerable to a number of data-loss-grade bugs. I don't recall at the moment whether there were any data-loss issues that we deemed unfixable in 7.2.*, but it wouldn't surprise me.

Try perusing the release notes at http://developer.postgresql.org/docs/postgres/release.html for ammunition.

regards, tom lane
> I hope you're at least on 7.2.8, else you are also vulnerable to a
> number of data-loss-grade bugs.

Sadly, it seems to be exactly 7.2.

> Try perusing the release notes at
> http://developer.postgresql.org/docs/postgres/release.html
> for ammunition.

Excellent - the various security holes might get their attention. Thanks!

- John D. Burger
  MITRE
On Tue, 2005-08-02 at 15:52, John D. Burger wrote:
>> I hope you're at least on 7.2.8, else you are also vulnerable to a
>> number of data-loss-grade bugs.
>
> Sadly, it seems to be exactly 7.2.

Also, the index bloat issues in 7.2 can be quite a performance killer. Just sayin'...
On Aug 2, 2005, at 8:16 AM, Alvaro Herrera wrote:
> On Tue, Aug 02, 2005 at 10:01:50AM -0500, Dan Armbrust wrote:
>> I shouldn't have to manually run Analyze to make the DB be capable of
>> handling inserts involving tables with foreign keys correctly. My code
>> that is doing the inserts is a java application that works across
>> multiple DBS - MySQL, PostgreSQL, DB2, MS Access, Sybase, etc.
>> [etc, rant removed]
>
> You don't _have_ to be rude. This is a known limitation, and people
> have suggested the usual workarounds. This is an open source project --
> if you think you can make it better, please by all means post a patch.
Not to fan the flames, so to speak, but I do have one question. If this is a known limitation, I'd expect the behavior to be consistent. Instead, he is saying that the problem is intermittent. Sometimes it runs fine; other times it slows down to a snail's pace.
So, does that mean the known problem is indeed intermittent, and not something that happens every time? Or, is this an issue that can be mostly eliminated with appropriate tuning?
I ask, because I may soon be facing similar problems, and forewarned is forearmed. I'd like to anticipate this and tune things accordingly before it becomes an issue.
Thanks,
Greg
On Tue, Aug 02, 2005 at 10:55:22AM -0700, Gregory Youngblood wrote:
> Not to fan the flames, so to speak, but I do have one question. If
> this is a known limitation, I'd expect the behavior to be consistent.
> Instead, he is saying that the problem is intermittent. Sometimes it
> runs fine, other times it slows down to a snail's pace.
>
> So, does that mean the known problem is indeed intermittent, and not
> something that happens every time? Or, is this an issue that can be
> mostly eliminated with appropriate tuning?

The problem is basically that PostgreSQL chooses a seq scan if it thinks the table is empty, but a normal CREATE TABLE fudges the statistics so that it chooses an index scan. The problem mostly arises if you run an ANALYZE or VACUUM ANALYZE *after* you've deleted all the rows but *before* you load the data. Don't do that.

If you need to run VACUUM to clear out the table, do so; just don't analyze at the same time. Or use TRUNCATE. At least that's my experience.

Hope this helps,

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
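[Editor's note: Martijn's advice as a load-script skeleton, using the table name from earlier in the thread. The key point is simply the ordering: no ANALYZE runs while the table is empty.]

```sql
-- Clear out old rows; do NOT analyze here, while the table is empty.
TRUNCATE conceptpropertymultiattributes;

-- ... bulk-load the data here, via INSERTs or COPY ...

-- Only now, with the table populated, refresh the statistics.
ANALYZE conceptpropertymultiattributes;
```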