Thread: Performance difference between 7.2 and 7.3

Performance difference between 7.2 and 7.3

From
Paulo Jan
Date:
Hi all:

    I have here a table with the following schema:

                        Table "todocinetv"
    Column    |            Type             |      Modifiers
-------------+-----------------------------+----------------------
  id          | integer                     | not null default '0'
  datestamp   | timestamp without time zone | not null
  thread      | integer                     | not null default '0'
  parent      | integer                     | not null default '0'
  author      | character(37)               | not null default ''
  subject     | character(255)              | not null default ''
  email       | character(200)              | not null default ''
  attachment  | character(64)               | default ''
  host        | character(50)               | not null default ''
  email_reply | character(1)                | not null default 'N'
  approved    | character(1)                | not null default 'N'
  msgid       | character(100)              | not null default ''
  modifystamp | integer                     | not null default '0'
  userid      | integer                     | not null default '0'
Indexes: todocinetv_approved,
          todocinetv_author,
          todocinetv_datestamp,
          todocinetv_modifystamp,
          todocinetv_msgid,
          todocinetv_parent,
          todocinetv_subject,
          todocinetv_thread,
          todocinetv_userid,
          todocinetvpri_key


    (It's actually a table created by the discussion board application
Phorum (version 3.3)).
    This table has about 28000 rows, and is running with Postgres 7.2.3
under Red Hat 8.0, on a 2.4 GHz Pentium 4 with 512 MB of RAM.
    The problem I'm having is that, when you access the main page of the
discussion board, it takes forever to show you the list of posts. The
query that Phorum uses for doing so is:


phorum=# explain
phorum-#  SELECT thread, modifystamp, count(id) AS tcount,
datetime(modifystamp) AS latest, max(id) as maxid FROM todocinetv WHERE
approved='Y' GROUP BY thread, modifystamp ORDER BY modifystamp desc,
thread desc limit 30;
NOTICE:  QUERY PLAN:

Limit  (cost=40354.79..40354.79 rows=30 width=12)
   ->  Sort  (cost=40354.79..40354.79 rows=2879 width=12)
         ->  Aggregate  (cost=39901.43..40189.35 rows=2879 width=12)
               ->  Group  (cost=39901.43..40045.39 rows=28792 width=12)
                     ->  Sort  (cost=39901.43..39901.43 rows=28792 width=12)
                           ->  Seq Scan on todocinetv
(cost=0.00..37768.90 rows=28792 width=12)


    This query takes up to 3 minutes to execute. I have tried stripping it
down to its most vanilla form (without "count(id)" and such), and it's
still almost as slow:


phorum=# explain
phorum-# SELECT thread, modifystamp,  datetime(modifystamp) AS latest
from todocinetv WHERE approved='Y'  ORDER BY modifystamp desc, thread
desc limit 30;
NOTICE:  QUERY PLAN:

Limit  (cost=39901.43..39901.43 rows=30 width=8)
   ->  Sort  (cost=39901.43..39901.43 rows=28792 width=8)
         ->  Seq Scan on todocinetv  (cost=0.00..37768.90 rows=28792
width=8)


    But here is the weird thing: I dump the table, load it into another
machine running Postgres 7.3.2 (Celeron 1.7 GHz, 512 MB of memory), and
the query takes only 2 or 3 seconds to execute, even though the query
plan is almost the same:


provphorum=# explain
provphorum-#  SELECT thread, modifystamp, count(id) AS tcount,
modifystamp AS latest, max(id) as maxid FROM todocinetv WHERE
approved='Y' GROUP BY thread, modifystamp ORDER BY modifystamp desc,
thread desc limit 30 ;
                                            QUERY PLAN

------------------------------------------------------------------------------------------------
  Limit  (cost=5765.92..5765.99 rows=30 width=12)
    ->  Sort  (cost=5765.92..5772.96 rows=2817 width=12)
          Sort Key: modifystamp, thread
          ->  Aggregate  (cost=5252.34..5604.49 rows=2817 width=12)
                ->  Group  (cost=5252.34..5463.63 rows=28172 width=12)
                      ->  Sort  (cost=5252.34..5322.77 rows=28172 width=12)
                            Sort Key: thread, modifystamp
                            ->  Seq Scan on todocinetv
(cost=0.00..3170.15 rows=28172 width=12)
                                  Filter: (approved = 'Y'::bpchar)
(9 rows)


    (I took out the "datetime" function, since 7.3 didn't accept it and I
didn't think it was relevant to the performance problem. Am I wrong?)

    So my question is: what causes such a big difference (3 minutes vs. 3
seconds)? Does the version difference (7.2 vs. 7.3) account for all of
it? Or should I start looking at other factors? As I said, both machines
are roughly equivalent hardware-wise, and as for shared buffers, the
faster machine actually has fewer of them (the 7.3 machine has
"shared_buffers = 768", while the 7.2 one has "shared_buffers = 1024").



                    Paulo Jan.
                    DDnet.



Re: Performance difference between 7.2 and 7.3

From
Tom Lane
Date:
Paulo Jan <admin@digital.ddnet.es> writes:
>          ->  Seq Scan on todocinetv  (cost=0.00..37768.90 rows=28792
> width=8)

The estimated cost seems to be more than one disk page read per row
returned.  This suggests to me that you have a huge amount of dead space
in that table --- try a VACUUM FULL on it.  If that fixes the problem,
then you need to improve your housekeeping procedures on the 7.2
installation: run vacuums more often and ensure that your FSM settings
are large enough.

            regards, tom lane
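
To act on this advice, a minimal sketch using the table name from Paulo's
schema; the first query checks whether the table really is bloated (far more
pages than ~28000 narrow rows should need), and the VACUUM FULL reclaims the
space. This is an illustration of the suggestion above, not commands or
output from either machine:

-- how many pages does the planner think the table occupies?
-- tens of thousands of pages for ~28000 narrow rows indicates heavy bloat
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'todocinetv';

-- reclaim the dead space; note that VACUUM FULL takes an exclusive lock
-- on the table while it runs
VACUUM FULL ANALYZE todocinetv;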

Re: Performance difference between 7.2 and 7.3

From
Bruno Wolff III
Date:
On Wed, Nov 12, 2003 at 16:30:41 +0100,
  Paulo Jan <admin@digital.ddnet.es> wrote:
>     This table has about 28000 rows, and is running with Postgres 7.2.3
> under Red Hat 8.0, on a 2.4 GHz Pentium 4 with 512 MB of RAM.

You probably want to use 7.4 for this, since a new way to do aggregates
using hashes has been added. 7.4 is currently in release candidate status
and may be released as early as next Monday.

>
>     This query takes up to 3 minutes to execute. [...]
>
>     But here is the weird thing: I dump the table, load it into another
> machine running Postgres 7.3.2 (Celeron 1.7 GHz, 512 MB of memory), and
> the query takes only 2 or 3 seconds to execute, even though the query
> plan is almost the same:

This makes it sound like you haven't been properly vacuuming and/or
analyzing the database. You might want to run a vacuum full on the
production db and see if that speeds things up. Once you have done a
vacuum full, regular vacuums should keep the number of dead tuples
down (as long as the FSM is set high enough).
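
Concretely, the routine Bruno describes might look like the sketch below;
the free space map values are illustrative placeholders to be sized against
your own update/delete volume, not tuned recommendations:

-- run regularly (e.g. nightly via cron, with vacuumdb or psql); a plain
-- vacuum does not lock out readers and writers the way VACUUM FULL does
VACUUM ANALYZE todocinetv;

# postgresql.conf (7.2/7.3 era parameters): the free space map must be
# large enough to remember the space those vacuums free
max_fsm_relations = 1000
max_fsm_pages = 20000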

Re: Performance difference between 7.2 and 7.3

From
Paulo Jan
Date:
Tom Lane wrote:
> Paulo Jan <admin@digital.ddnet.es> writes:
>
>>         ->  Seq Scan on todocinetv  (cost=0.00..37768.90 rows=28792
>>width=8)
>
>
> The estimated cost seems to be more than one disk page read per row
> returned.  This suggests to me that you have a huge amount of dead space
> in that table --- try a VACUUM FULL on it.  If that fixes the problem,



    Argh!!! The thing is, I *had* run VACUUM several times before posting
to the list... but it was VACUUM ANALYZE, not VACUUM FULL. And here I
was, wondering why VACUUMing so much didn't have any effect...



                    Paulo Jan.
                    DDnet.



Re: simple question

From
"Rick Gigger"
Date:
Is this correct?

vacuum by itself just cleans out the old extraneous tuples so that they
aren't in the way anymore
vacuum analyze rebuilds indexes.  If you add an index to a table it won't be
used until you vacuum analyze it
vacuum full actually compresses the table on disk by reclaiming the space
from the old tuples after they have been removed.


----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Paulo Jan" <admin@digital.ddnet.es>
Cc: <pgsql-general@postgresql.org>
Sent: Wednesday, November 12, 2003 8:38 AM
Subject: Re: [GENERAL] Performance difference between 7.2 and 7.3


> Paulo Jan <admin@digital.ddnet.es> writes:
> >          ->  Seq Scan on todocinetv  (cost=0.00..37768.90 rows=28792
> > width=8)
>
> The estimated cost seems to be more than one disk page read per row
> returned.  This suggests to me that you have a huge amount of dead space
> in that table --- try a VACUUM FULL on it.  If that fixes the problem,
> then you need to improve your housekeeping procedures on the 7.2
> installation: run vacuums more often and ensure that your FSM settings
> are large enough.
>
> regards, tom lane
>


Re: simple question

From
Doug McNaught
Date:
"Rick Gigger" <rick@alpinenetworking.com> writes:

> Is this correct?
>
> vacuum by itself just cleans out the old extraneous tuples so that they
> aren't in the way anymore

Actually it puts the free space in each page on a list (the free space
map) so it can be reused for new tuples without having to allocate
fresh pages.  It finds free space by looking for tuples that can't be
seen any more by any transaction.

> vacuum analyze rebuilds indexes.  If you add an index to a table it won't be
> used until you vacuum analyze it

It doesn't rebuild indexes--REINDEX does that.  ANALYZE measures the
size and statistics of the data in the table, so the planner can do a
good job.

> vacuum full actually compresses the table on disk by reclaiming the space
> from the old tuples after they have been removed.

It moves tuples around and frees up pages at the end of the table,
thus compacting it.

So you're mostly wrong on all three.  :)

-Doug
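
To make the distinctions above concrete, a short sketch; "mytable" is a
hypothetical name:

VACUUM mytable;          -- marks dead tuples' space as reusable and records it in the free space map
ANALYZE mytable;         -- gathers column statistics so the planner can choose good plans
VACUUM ANALYZE mytable;  -- both of the above in one pass
VACUUM FULL mytable;     -- compacts the table by moving tuples and truncating empty pages at the end
REINDEX TABLE mytable;   -- the separate command that actually rebuilds the table's indexes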

Re: simple question

From
"scott.marlowe"
Date:
On Wed, 12 Nov 2003, Rick Gigger wrote:

> Is this correct?
>
> vacuum by itself just cleans out the old extraneous tuples so that they
> aren't in the way anymore
> vacuum analyze rebuilds indexes.  If you add an index to a table it won't be
> used until you vacuum analyze it
> vacuum full actually compresses the table on disk by reclaiming the space
> from the old tuples after they have been removed.

You don't have to analyze AFTER index creation, just at some point in
time.  I.e.:

create table test ...

import into table test 1000000 rows

analyze test;

create index test_field1_dx on test (id);

select * from test where id=4567;  <- this will likely use the index.


Re: simple question

From
"Rick Gigger"
Date:
> > Is this correct?
> >
> > vacuum by itself just cleans out the old extraneous tuples so that they
> > aren't in the way anymore
>
> Actually it puts the free space in each page on a list (the free space
> map) so it can be reused for new tuples without having to allocate
> fresh pages.  It finds free space by looking for tuples that can't be
> seen any more by any transaction.
>
> > vacuum analyze rebuilds indexes.  If you add an index to a table it
> > won't be used until you vacuum analyze it
>
> It doesn't rebuild indexes--REINDEX does that.  ANALYZE measures the
> size and statistics of the data in the table, so the planner can do a
> good job.

Is REINDEX something that needs to be done on a periodic basis?

> > vacuum full actually compresses the table on disk by reclaiming the
> > space from the old tuples after they have been removed.
>
> It moves tuples around and frees up pages at the end of the table,
> thus compacting it.
>
> So you're mostly wrong on all three.  :)
>
> -Doug
>

Thanks!

Rick


Re: simple question

From
Bruno Wolff III
Date:
On Thu, Nov 13, 2003 at 12:06:05 -0700,
  Rick Gigger <rick@alpinenetworking.com> wrote:
>
> Is REINDEX something that needs to be done on a periodic basis?

In versions prior to 7.4, some patterns of use will require periodic
reindexing. The problem case is when the indexed column monotonically
increases (or decreases) and old values eventually get deleted. In this
case the index blocks for the deleted values don't get reused and the
size of the index will continually grow.
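
A sketch of the pre-7.4 workaround for that pattern; the table and index
names below are hypothetical:

-- e.g. a queue-like table whose key only ever grows and whose old rows are
-- deleted: before 7.4 the emptied index pages are never reused, so the
-- index (or all of the table's indexes) has to be rebuilt periodically
REINDEX INDEX jobs_id_idx;
REINDEX TABLE jobs;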