Thread: SELECT performance drop v 6.5 -> 7.0.3

SELECT performance drop v 6.5 -> 7.0.3

From
Pascal Hingamp
Date:
Hi!

We've been very satisfied customers of Postgres 6.5 for the past 12 months.
Our bioinformatics database is growing currently with around 100 tables and
a few dozen million records.
Our only gripe was to do with the rather slow importing of data using 'COPY
FROM' (36 hours for the above db on a 533Mhz Pentium III, Linux
2.2.17-21mdk/Mandrake 7.2, 256Mb memory). Otherwise in a friendly
performance competition with our industrial partners who share exactly the
same interface code but with an Oracle backend, we were rather proud
Postgres proponents - and slightly the richer.

So we decided to upgrade to Pg 7.0.2 wondering if we'd see faster data
import. Indeed the same total import was achieved in around 6 hours on the
same machine! However SELECT statements over big table joins that
previously took an instant (a few seconds at most) under v6.5 now took so
long our web interfaces would time out (minutes at least, we aborted rather
than wait)... We tried Linux RPM's for v7.0.2 and 7.0.3, as well as gcc
(v2.95.3) compiled these versions on our machine, but to no avail. We also
noticed strong incidence on performance depending on whether we built the
indexes before import straight after creating the tables (very very slow
SELECTS), or after data import (quicker import+index build but yet even
slower SELECTS, I don't think we were ever patient enough to wait for a
SELECT result).

We've reluctantly switched back to v6.5 in order to recover a workable db
interface. We can just about still handle a db update over the week end...
However with the current rate of data growth, this will very soon prove
impossible. We also regret not being able to take advantage of the
significant other improvements made in v7. Above all we're mystified by
what has happended in v7 and how come we seem the only ones to notice the
change. Maybe we're the only ones silly enough to think such a hefty db can
live on a modest Linux box...

Any suggestions? Anyone else noticed this dramatic change from v6.5 to v7.0.3?
Cheers,
Pascal
---------------------------------------------------------------------------
Pascal Hingamp                                     hingamp@ciml.univ-mrs.fr
Techniques Avancées pour le Génome et la Clinique   http://tagc.univ-mrs.fr
Centre d'Immunologie de Marseille Luminy          mob: (+33) 06 17 53 18 80
Parc Scientifique de Luminy, Case 906             lab: (+33) 04 91 26 94 96
13288 Marseille Cedex 09, FRANCE                  Fax: (+33) 04 91 26 94 30
---------------------------------------------------------------------------

RE: SELECT performance drop v 6.5 -> 7.0.3

From
Matthew
Date:
    [snip]

> We've reluctantly switched back to v6.5 in order to recover a workable db
> interface. We can just about still handle a db update over the week end...
>
> However with the current rate of data growth, this will very soon prove
> impossible. We also regret not being able to take advantage of the
> significant other improvements made in v7. Above all we're mystified by
> what has happended in v7 and how come we seem the only ones to notice the
> change. Maybe we're the only ones silly enough to think such a hefty db
> can
> live on a modest Linux box...
>
> Any suggestions? Anyone else noticed this dramatic change from v6.5 to
> v7.0.3?
>
    A couple of thoughts / suggestions.  First I assume that after you
imported the data to performed a vacuum analyze.  If that is true it would
help us to have an explain of the queries from both 6.5 and 7.0, so please
post them (and include the data structure and any other relevant information
so we can help).  There are some differences between 7.0 and 6.5 that can
cause the planner to make different decisions.  I would imagine that the
speed issue can be resolved fairly easily.  Also, you said you used 7.0.2,
you should use 7.0.3

    As for the import process taking so long, you might want to try
turning off fsync during the import.  7.1 improves the fsync on performance
but it's still in beta.  Dropping non-required indexes before doing the
import then re-creating them after import will also help speed it up.
Always make sure you vacuum analyze it after.

    Matt

RE: SELECT performance drop v 6.5 -> 7.0.3

From
"Creager, Robert S"
Date:
I've a question.  I have often seen the 'trick' of dropping an index,
importing large amounts of data, then re-creating the index to speed the
import.  The obvious problem with this is during the time from index drop to
the index finishing re-creation, a large db is going to be essentially
worthless to queries which use those indexes.  I know nothing about the
backend and how it does 'stuff', so I may be asking something absurd here.
Why, when using transactions, are indexes updated on every insert?  It seems
logical (to someone who doesn't know better), that the indexes could be
updated on the COMMIT.

Please don't hurt me too bad...
Rob

Robert Creager
Senior Software Engineer
Client Server Library
303.673.2365 V
303.661.5379 F
888.912.4458 P
StorageTek
INFORMATION made POWERFUL



> -----Original Message-----
>
>     As for the import process taking so long, you might want to try
> turning off fsync during the import.  7.1 improves the fsync
> on performance
> but it's still in beta.  Dropping non-required indexes before
> doing the
> import then re-creating them after import will also help speed it up.
> Always make sure you vacuum analyze it after.
>
>     Matt
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
http://www.postgresql.org/search.mpl

Re: SELECT performance drop v 6.5 -> 7.0.3

From
Joseph Shraibman
Date:
"Creager, Robert S" wrote:
>
> I've a question.  I have often seen the 'trick' of dropping an index,
> importing large amounts of data, then re-creating the index to speed the
> import.  The obvious problem with this is during the time from index drop to
> the index finishing re-creation, a large db is going to be essentially
> worthless to queries which use those indexes.  I know nothing about the
> backend and how it does 'stuff', so I may be asking something absurd here.
> Why, when using transactions, are indexes updated on every insert?  It seems
> logical (to someone who doesn't know better), that the indexes could be
> updated on the COMMIT.
>
> Please don't hurt me too bad...
> Rob
>

I imagine because the transaction might do a select on data it just
inserted/updated.

--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com