Re: [BUGS] BUG #1552: massive performance hit - Mailing list pgsql-performance

From Simon Riggs
Subject Re: [BUGS] BUG #1552: massive performance hit
Date
Msg-id 1111765105.11750.749.camel@localhost.localdomain
Whole thread Raw
In response to Re: [BUGS] BUG #1552: massive performance hit between 7.4 and 8.0.1  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-performance
On Fri, 2005-03-25 at 03:50 -0700, Karim Nassar wrote:
> On Fri, 2005-03-25 at 10:18 +0000, Simon Riggs wrote:
> > > When running with PostgreSQL 7.4 on a dual-CPU Athlon MP2400+ machine
> > > with a gigabyte of RAM, running Debian Linux version 2.6.8.1, we were
> > > able to insert all this data in 5-7 minutes.  It's taken a while to
> > > install Postgres 8.0.1 on the same machine, but now I have, and it's
> > > taking 40-45 minutes to run the same insert script.
>
> <snip>
>
> > OK. Not-a-bug.
> >
> > Your situation is covered in the manual with some sage advice
> > http://www.postgresql.org/docs/8.0/static/populate.html
> > It doesn't go into great lengths about all the reasons why those
> > recommendations are good ones - but they are clear.

> Simon, this begs the question: what changed from 7.4->8.0 to require he
> modify his script?

Good question. Clearly, some combination of stats-plus-index-selection
code changed but I suspect this is a case of more, not less accuracy,
affecting us here.

The FK code literally generates SQL statements, then prepares them.
AFAICS it should be possible to add more code to
src/backend/utils/adt/ritrigger.c to force the prepare of FK code to
avoid seq scans by executing "SET enable_seqscan = off;"
I'll have a play....

But, the wider point raised by this is whether Prepare should be more
conservative in the plan it generates. When we Execute a single query,
it is perfectly OK to go for the "best" plan, since it is being executed
only this once and we can tell, right now, which one the "best" is.

With a Prepared query, it is clearly going to be executed many times and
so we should consider that the optimal plan may change over time.

Index access has more overhead for small tables, but increases by (I
think) only logN as the number of rows in a table, N, increases.
Sequential scan access varies by N. Thus, as N increases from zero,
first of all Seq Scan is the best plan - but only marginally better than
Index access, then this changes at some value of N, then after that
index access is the best plan. As N increases, Seq Scan access clearly
diverges badly from Indexed access.

The conservative choice for unknown, or varying N would be index access,
rather than the best plan available when the query is prepared.

I propose a more general TODO item:

* Make Prepared queries always use indexed access, if it is available

Best Regards, Simon Riggs


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Delete query takes exorbitant amount of time
Next
From: Simon Riggs
Date:
Subject: Re: Delete query takes exorbitant amount of time