Thread: Perfomance difference between 7.2 and 7.3
Hi all: I have here a table with the following schema: Table "todocinetv" Column | Type | Modifiers -------------+-----------------------------+---------------------- id | integer | not null default '0' datestamp | timestamp without time zone | not null thread | integer | not null default '0' parent | integer | not null default '0' author | character(37) | not null default '' subject | character(255) | not null default '' email | character(200) | not null default '' attachment | character(64) | default '' host | character(50) | not null default '' email_reply | character(1) | not null default 'N' approved | character(1) | not null default 'N' msgid | character(100) | not null default '' modifystamp | integer | not null default '0' userid | integer | not null default '0' Indexes: todocinetv_approved, todocinetv_author, todocinetv_datestamp, todocinetv_modifystamp, todocinetv_msgid, todocinetv_parent, todocinetv_subject, todocinetv_thread, todocinetv_userid, todocinetvpri_key (It's actually a table created by the discussion board application Phorum (version 3.3)). This table has about 28000 rows, and is running with Postgres 7.2.3 under Red Hat 8.0, in a 2.4 Ghz. Pentiun 4 with 512 Mb. of RAM. The problem I'm having is that, when you access the main page of the discussion board, it takes forever to show you the list of posts. The query that Phorum uses for doing so is: phorum=# explain phorum-# SELECT thread, modifystamp, count(id) AS tcount, datetime(modifystamp) AS latest, max(id) as maxid FROM todocinetv WHERE approved='Y' GROUP BY thread, modifystamp ORDER BY modifystamp desc, thread desc limit 30; NOTICE: QUERY PLAN: Limit (cost=40354.79..40354.79 rows=30 width=12) -> Sort (cost=40354.79..40354.79 rows=2879 width=12) -> Aggregate (cost=39901.43..40189.35 rows=2879 width=12) -> Group (cost=39901.43..40045.39 rows=28792 width=12) -> Sort (cost=39901.43..39901.43 rows=28792 width=12) -> Seq Scan on todocinetv (cost=0.00..37768.90 rows=28792 width=12) This query takes up to 3 minutes to execute. I have tried to strip it down and leaving it in its most vanilla form (without "count(id)" and such), and it's still almost as slow: phorum=# explain phorum-# SELECT thread, modifystamp, datetime(modifystamp) AS latest from todocinetv WHERE approved='Y' ORDER BY modifystamp desc, thread desc limit 30; NOTICE: QUERY PLAN: Limit (cost=39901.43..39901.43 rows=30 width=8) -> Sort (cost=39901.43..39901.43 rows=28792 width=8) -> Seq Scan on todocinetv (cost=0.00..37768.90 rows=28792 width=8) But here is the weird thing: I dump the table, export it into another machine running Postgres 7.3.2 (Celeron 1.7 Ghz, 512 Mb. of memory), and the query takes only 2 or 3 seconds to execute, even though the query plan is almost the same: provphorum=# explain provphorum-# SELECT thread, modifystamp, count(id) AS tcount, modifystamp AS latest, max(id) as maxid FROM todocinetv WHERE approved='Y' GROUP BY thread, modifystamp ORDER BY modifystamp desc, thread desc limit 30 ; QUERY PLAN ------------------------------------------------------------------------------------------------ Limit (cost=5765.92..5765.99 rows=30 width=12) -> Sort (cost=5765.92..5772.96 rows=2817 width=12) Sort Key: modifystamp, thread -> Aggregate (cost=5252.34..5604.49 rows=2817 width=12) -> Group (cost=5252.34..5463.63 rows=28172 width=12) -> Sort (cost=5252.34..5322.77 rows=28172 width=12) Sort Key: thread, modifystamp -> Seq Scan on todocinetv (cost=0.00..3170.15 rows=28172 width=12) Filter: (approved = 'Y'::bpchar) (9 rows) (I took out the "datetime" function, since 7.3 didn't accept it and I didn't think it was relevant to the performance problem (am I wrong?)) So my question is: what causes such a big difference? (3 min. vs. 3 seconds) Does the version difference (7.2 vs. 7.3) account for all of it? Or should I start looking at other factors? As I said, both machines are almost equivalent hardware-wise, and as for the number of shared buffers, the faster machine actually has less of them (the 7.3 machine has "shared_buffers = 768", while the 7.2 one has "shared_buffers = 1024"). Paulo Jan. DDnet.
Paulo Jan <admin@digital.ddnet.es> writes: > -> Seq Scan on todocinetv (cost=0.00..37768.90 rows=28792 > width=8) The estimated cost seems to be more than one disk page read per row returned. This suggests to me that you have a huge amount of dead space in that table --- try a VACUUM FULL on it. If that fixes the problem, then you need to improve your housekeeping procedures on the 7.2 installation: run vacuums more often and ensure that your FSM settings are large enough. regards, tom lane
On Wed, Nov 12, 2003 at 16:30:41 +0100, Paulo Jan <admin@digital.ddnet.es> wrote: > This table has about 28000 rows, and is running with Postgres 7.2.3 > under Red Hat 8.0, in a 2.4 Ghz. Pentiun 4 with 512 Mb. of RAM. You probably want to use 7.4 for this since a new way to do aggragates using hashes has been added. 7.4 is currently in release candidate status and maybe be released as early as next Monday. > > This query takes up to 3 minutes to execute. I have tried to strip > it down and leaving it in its most vanilla form (without "count(id)" and > But here is the weird thing: I dump the table, export it into > another machine running Postgres 7.3.2 (Celeron 1.7 Ghz, 512 Mb. of > memory), and the query takes only 2 or 3 seconds to execute, even though > the query plan is almost the same: This makes it sound like you haven't been properly vacuuming and/or analyzing the database. You might want to run a vacuum full on the production db and see if that speeds things up. Once you have done a vacuum full then regular vacuums should keep the number of dead tuples down (as long as FSM is set high enough).
Tom Lane wrote: > Paulo Jan <admin@digital.ddnet.es> writes: > >> -> Seq Scan on todocinetv (cost=0.00..37768.90 rows=28792 >>width=8) > > > The estimated cost seems to be more than one disk page read per row > returned. This suggests to me that you have a huge amount of dead space > in that table --- try a VACUUM FULL on it. If that fixes the problem, Argh!!! The thing is, I *had* run VACUUM several times before posting to the list... but it was VACUUM ANALYZE, not VACUUM FULL. And here I was, wondering why VACUUMing so much didn't have any effect... Paulo Jan. DDnet.
Is this correct? vacuum by itself just cleans out the old extraneous tuples so that they aren't in the way anymore vacuum analyze rebuilds indexes. If you add an index to a table it won't be used until you vacuum analyze it vacuum full actually compresses the table on disk by reclaiming the space from the old tuples after they have been removed. ----- Original Message ----- From: "Tom Lane" <tgl@sss.pgh.pa.us> To: "Paulo Jan" <admin@digital.ddnet.es> Cc: <pgsql-general@postgresql.org> Sent: Wednesday, November 12, 2003 8:38 AM Subject: Re: [GENERAL] Perfomance difference between 7.2 and 7.3 > Paulo Jan <admin@digital.ddnet.es> writes: > > -> Seq Scan on todocinetv (cost=0.00..37768.90 rows=28792 > > width=8) > > The estimated cost seems to be more than one disk page read per row > returned. This suggests to me that you have a huge amount of dead space > in that table --- try a VACUUM FULL on it. If that fixes the problem, > then you need to improve your housekeeping procedures on the 7.2 > installation: run vacuums more often and ensure that your FSM settings > are large enough. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly >
"Rick Gigger" <rick@alpinenetworking.com> writes: > Is this correct? > > vacuum by itself just cleans out the old extraneous tuples so that they > aren't in the way anymore Actually it puts the free space in each page on a list (the free space map) so it can be reused for new tuples without having to allocate fresh pages. It finds free space by looking for tuples that can't be seen any more by any transaction. > vacuum analyze rebuilds indexes. If you add an index to a table it won't be > used until you vacuum analyze it It doesn't rebuild indexes--REINDEX does that. ANALYZE measures the size and statistics of the data in the table, so the planner can do a good job. > vacuum full actually compresses the table on disk by reclaiming the space > from the old tuples after they have been removed. It moves tuples around and frees up pages at the end of the table, thus compacting it. So you're mostly wrong on all three. :) -Doug
On Wed, 12 Nov 2003, Rick Gigger wrote: > Is this correct? > > vacuum by itself just cleans out the old extraneous tuples so that they > aren't in the way anymore > vacuum analyze rebuilds indexes. If you add an index to a table it won't be > used until you vacuum analyze it > vacuum full actually compresses the table on disk by reclaiming the space > from the old tuples after they have been removed. You don't have to analyze AFTER index creation, just at some point in time. I.e.: create table test ... import into table test 1000000 rows analyze test; create index test_field1_dx on test (id); select * from test where id=4567; <- this will likely use the index.
> > Is this correct? > > > > vacuum by itself just cleans out the old extraneous tuples so that they > > aren't in the way anymore > > Actually it puts the free space in each page on a list (the free space > map) so it can be reused for new tuples without having to allocate > fresh pages. It finds free space by looking for tuples that can't be > seen any more by any transaction. > > > vacuum analyze rebuilds indexes. If you add an index to a table it won't be > > used until you vacuum analyze it > > It doesn't rebuild indexes--REINDEX does that. ANALYZE measures the > size and statistics of the data in the table, so the planner can do a > good job. Is REINDEX something that needs to be done on a periodic basis? > > vacuum full actually compresses the table on disk by reclaiming the space > > from the old tuples after they have been removed. > > It moves tuples around and frees up pages at the end of the table, > thus compacting it. > > So you're mostly wrong on all three. :) > > -Doug > Thanks! Rick
On Thu, Nov 13, 2003 at 12:06:05 -0700, Rick Gigger <rick@alpinenetworking.com> wrote: > > Is REINDEX something that needs to be done on a periodic basis? In version prior to 7.4 some patterns of use will require periodic reindexing. The problem case is when the index column monoticly increases (or decreases) and old values eventually get deleted. In this case the index blocks for the deleted values don't get reused and the size of the index will continually grow.