Thread: slow query
Hi, I am having problems with my master database now. It used to work extremely good just two days ago, but then I started playing with adding/dropping schemas and added another database and performance went down. I also have a replica database on the same server and when I run the same query on it, it runs good. Interestingly the planner statistics for this query are the same on the master and replica databases. However, query execution on the master database is about 4min. and on the replica database is 6 sec.! I VACUUM ANALYZED both databases and made sure they have same indexes on the tables. I don't know where else to look, but here are the schemas I have on the master and replica database. The temp schemas must be the ones that I created and then dropped. master=# select * from pg_namespace; nspname | nspowner | nspacl ------------+----------+-------- pg_catalog | 1 | {=U} pg_toast | 1 | {=} public | 1 | {=UC} pg_temp_1 | 1 | pg_temp_3 | 1 | pg_temp_10 | 1 | pg_temp_28 | 1 | replica=> select * from pg_namespace; nspname | nspowner | nspacl ------------+----------+-------- pg_catalog | 1 | {=U} pg_toast | 1 | {=} public | 1 | {=UC} pg_temp_1 | 1 | pg_temp_39 | 1 | india | 105 | Here is the query: SELECT * FROM media m, speccharacter c WHERE m.mediatype IN (SELECT objectid FROM mediatype WHERE medianame='Audio') AND m.mediachar = c.objectid AND (m.activity='178746' OR (EXISTS (SELECT ism.objectid FROM intsetmedia ism, set s WHERE ism.set = s.objectid AND ism.media = m.objectid AND s.activity='178746' ) ) OR (EXISTS (SELECT dtrm.objectid FROM dtrowmedia dtrm, dtrow dtr, dtcol dtc, datatable dt WHERE dtrm.dtrow = dtr.objectid AND dtrm.media = m.objectid AND dtr.dtcol = dtc.objectid AND dtc.datatable = dt.objectid AND dt.activity = '178746') ) ) ORDER BY medianame ASC, status DESC; ************************************* This email may contain privileged or confidential material intended for the named recipient only. 
If you are not the named recipient, delete this message and all attachments. Any review, copying, printing, disclosure or other use is prohibited. We reserve the right to monitor email sent through our network. *************************************
Oleg Lebedev <oleg.lebedev@waterford.org> writes:
> Interestingly the planner statistics for this query are the same on the
> master and replica databases.
> However, query execution on the master database is about 4min. and on
> the replica database is 6 sec.!

What does EXPLAIN ANALYZE show in each case?

			regards, tom lane
Oleg,

> I VACUUM ANALYZED both databases and made sure they have same indexes on
> the tables.

Have you VACUUM FULLed the main database? And how about REINDEX?

> Here is the query:
> SELECT * FROM media m, speccharacter c
> WHERE m.mediatype IN (SELECT objectid FROM mediatype WHERE
> medianame='Audio')

The above should use an EXISTS clause, not IN, unless you are absolutely sure that the subquery will never return more than 12 rows.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
On Sun, 2003-02-23 at 13:52, Josh Berkus wrote:
> Oleg,
>
> > I VACUUM ANALYZED both databases and made sure they have same indexes on
> > the tables.
>
> Have you VACUUM FULLed the main database? And how about REINDEX?
>
> > Here is the query:
> > SELECT * FROM media m, speccharacter c
> > WHERE m.mediatype IN (SELECT objectid FROM mediatype WHERE
> > medianame='Audio')
>
> The above should use an EXISTS clause, not IN, unless you are absolutely sure
> that the subquery will never return more than 12 rows.

I am assuming you said this because EXISTS is faster for > 12 rows?
Interesting :)

thanks,
- Ryan

-- 
Ryan Bradetich <rbradetich@uswest.net>
Ryan,

> > The above should use an EXISTS clause, not IN, unless you are absolutely
> > sure that the subquery will never return more than 12 rows.
>
> I am assuming you said this because EXISTS is faster for > 12 rows?
> Interesting :)

That's my rule of thumb, *NOT* any kind of relational-calculus-based truth.

Basically, one should only use IN for a subquery when one is *absolutely* sure that the subquery will only return a handful of records, *and* the subquery doesn't have to do any complex work like aggregating or custom function evaluation. You're safer using EXISTS for all subqueries, really.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
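Concretely, Josh's rewrite of the subquery from Oleg's query would look roughly like this (a sketch only; the table and column names are taken from the query upthread, and the two forms should return the same rows):

```sql
-- Original form, using IN:
SELECT *
FROM media m
WHERE m.mediatype IN (SELECT objectid FROM mediatype
                      WHERE medianame = 'Audio');

-- Equivalent correlated form, using EXISTS, which pre-7.4 planners
-- tend to handle better when the subquery can return many rows:
SELECT *
FROM media m
WHERE EXISTS (SELECT 1 FROM mediatype mt
              WHERE mt.objectid = m.mediatype
                AND mt.medianame = 'Audio');
```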
Josh Berkus <josh@agliodbs.com> writes:
>> I am assuming you said this because EXISTS is faster for > 12 rows?
> That's my rule of thumb, *NOT* any kind of relational-calculus-based truth.

Keep in mind also that the tradeoffs will change quite a lot when PG 7.4 hits the streets, because the optimizer has gotten a lot smarter about how to handle IN, but no smarter about EXISTS. Here's one rather silly example using CVS tip:

regression=# explain analyze select * from tenk1 a where
regression-# unique1 in (select hundred from tenk1 b);
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=486.32..504.11 rows=100 width=248) (actual time=453.19..468.86 rows=100 loops=1)
   Merge Cond: ("outer".unique1 = "inner".hundred)
   ->  Index Scan using tenk1_unique1 on tenk1 a  (cost=0.00..1571.87 rows=10000 width=244) (actual time=0.12..5.25 rows=101 loops=1)
   ->  Sort  (cost=486.32..486.57 rows=100 width=4) (actual time=452.91..453.83 rows=100 loops=1)
         Sort Key: b.hundred
         ->  HashAggregate  (cost=483.00..483.00 rows=100 width=4) (actual time=447.59..449.80 rows=100 loops=1)
               ->  Seq Scan on tenk1 b  (cost=0.00..458.00 rows=10000 width=4) (actual time=0.06..276.47 rows=10000 loops=1)
 Total runtime: 472.06 msec
(8 rows)

regression=# explain analyze select * from tenk1 a where
regression-# exists (select 1 from tenk1 b where b.hundred = a.unique1);
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tenk1 a  (cost=0.00..35889.66 rows=5000 width=244) (actual time=3.69..1591.78 rows=100 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Index Scan using tenk1_hundred on tenk1 b  (cost=0.00..354.32 rows=100 width=0) (actual time=0.10..0.10 rows=0 loops=10000)
           Index Cond: (hundred = $0)
 Total runtime: 1593.88 msec
(6 rows)

The EXISTS case takes about the same time in 7.3, but the IN case is off the charts (I got bored of waiting after 25 minutes...)

			regards, tom lane
Thanks everybody for your help.

VACUUM FULL did the job, and now the query performance is the same in both databases. I am surprised that the FULL option makes such a dramatic change to the query performance: from 4 min. to 5 sec.!!! It also changed the planner estimates from ~9 sec. to ~8 sec. I haven't tried REINDEX yet, though.

Regarding IN vs. EXISTS: the subquery in the IN clause will always return fewer than 12 records. I tried using EXISTS instead of IN with Postgres 7.2.1 and it slowed down query performance. With Postgres 7.3, when I use EXISTS instead of IN the planner returns the same stats and query performance does not improve. However, if I use

m.mediatype = (SELECT objectid FROM mediatype WHERE medianame='Audio')

the planner returns ~7 sec., which is the same as if the query is changed like this:

SELECT * FROM media m, speccharacter c, mediatype mt
WHERE mt.objectid = m.mediatype AND mt.medianame='Audio' ...

So, using a JOIN or =(SELECT ...) is better than using IN or EXISTS in this case.

Thanks.

Oleg
On Mon, 2003-02-24 at 10:59, Oleg Lebedev wrote:
> Thanks everybody for your help.
> VACUUM FULL did the job, and now the query performance is the same in
> both databases. I am surprised that FULL option makes such a dramatic
> change to the query performance: from 4min. to 5sec.!!! It also changed
> planner stats from ~9 sec to ~8sec.

If you're seeing wildly dramatic improvements from VACUUM FULL, you might want to look into running regular vacuums more often (especially for high-turnover tables), increasing your max_fsm_relations to 1000, and increasing your max_fsm_pages.

Robert Treat
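For reference, the two settings Robert mentions live in postgresql.conf. A hedged sketch (the values are illustrative starting points for a 7.3-era server, not recommendations for any particular workload):

```
# postgresql.conf -- free space map sizing (PostgreSQL 7.3 era)
max_fsm_relations = 1000    # max number of tables/indexes tracked in the FSM
max_fsm_pages = 20000       # max number of pages with free space to remember
# both settings take effect only on server restart
```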
I run VACUUM (not FULL though) every night, which I thought would be enough for good query performance. Moreover, the data in the table does not really change significantly, since database usage is pretty low right now; therefore I thought that VACUUM FULL was overkill. I think that creating, populating, and dropping schemas in the master database could have affected the query performance and required a VACUUM FULL. I will definitely look at the max_fsm_relations and max_fsm_pages parameter settings.

Thank you.

Oleg Lebedev
On 24 Feb 2003, Robert Treat wrote:
> If you're seeing wildly dramatic improvements from vacuum full, you might
> want to look into running regular vacuums more often (especially for
> high-turnover tables), increasing your max_fsm_relations to 1000, and
> increasing your max_fsm_pages.

I don't know about the settings you mention, but a frequent vacuum does not at all obviate a vacuum full. My database is vacuumed every night, but a while ago I found that a vacuum full changed a simple single-table query from well over 30 seconds to one or two. We now do a vacuum full every night.
What about REINDEXing and other DB tricks? Do you use any of these regularly?

Thanks.

Oleg
On Mon, 2003-02-24 at 12:27, Clarence Gardner wrote:
> I don't know about the settings you mention, but a frequent vacuum
> does not at all obviate a vacuum full. My database is vacuumed every
> night, but a while ago I found that a vacuum full changed a simple
> single-table query from well over 30 seconds to one or two. We now
> do a vacuum full every night.

Actually, if you are vacuuming frequently enough, it can (and should*) obviate a vacuum full. Be aware that frequently enough might mean really frequent; for instance, I have several tables in my database that update every row within a 15 minute timeframe, so I run a "lazy" vacuum on these tables every 10 minutes. This allows postgresql to reuse the space for these tables almost continuously, so I never have to vacuum full them.

(* this assumes your max_fsm_pages/relations settings are set correctly)

Robert Treat
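Robert's schedule could be sketched as a pair of cron jobs; the database and table names here are hypothetical stand-ins:

```
# Lazy (non-blocking) VACUUM on a high-turnover table every 10 minutes,
# so freed space is returned to the FSM continuously:
*/10 * * * *  psql -d mydb -c 'VACUUM hot_table'

# Nightly database-wide VACUUM ANALYZE to reclaim space everywhere
# and refresh planner statistics:
30 3 * * *    vacuumdb --analyze mydb
```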
Robert, > Actually if you are vacuuming frequently enough, it can (and should*) > obviate a vacuum full. Be aware that frequently enough might mean really > frequent, for instance I have several tables in my database that update > every row within a 15 minute timeframe, so I run a "lazy" vacuum on > these tables every 10 minutes. This allows postgresql to reuse the space > for these tables almost continuously so I never have to vacuum full > them. This would assume absolutely perfect FSM settings, and that the DB never gets thrown off by unexpected loads. I have never been so fortunate as to work with such a database. However, I agree that good FSM tuning and frequent regular VACUUMs can greatly extend the period required for running FULL. I have not found, though, that this does anything to prevent the need for REINDEX on frequently-updated tables. How about you, Robert? -- Josh Berkus Aglio Database Solutions San Francisco
On Mon, 2003-02-24 at 12:27, Clarence Gardner wrote:
> I don't know about the settings you mention, but a frequent vacuum
> does not at all obviate a vacuum full. My database is vacuumed every
> night, but a while ago I found that a vacuum full changed a simple
> single-table query from well over 30 seconds to one or two. We now
> do a vacuum full every night.

Sure it does. If your free space map (FSM) is up to date, tuples are not appended to the end of the table, so table growth will not occur (beyond a 'settling' point). Unless you remove more data from the table than you ever expect to have in the table again, this is enough.

That said, the instant the FSM is empty, your table starts to grow again and a VACUUM FULL will be required to re-shrink it.

-- 
Rod Taylor <rbt@rbt.ca>
PGP Key: http://www.rbt.ca/rbtpub.asc
On Mon, Feb 24, 2003 at 09:27:56AM -0800, Clarence Gardner wrote:
> I don't know about the settings you mention, but a frequent vacuum
> does not at all obviate a vacuum full. My database is vacuumed every
> night, but a while ago I found that a vacuum full changed a simple
> single-table query from well over 30 seconds to one or two. We now
> do a vacuum full every night.

This probably means that either some of your FSM settings should be different, or that you have long-running queries, or both. _Some_ advantage to vacuum full is expected, but 30 seconds down to one or two is a pretty big deal.

Except in cases where a large percentage of the table is a vacuum candidate, the standard vacuum should be more than adequate. But there are a couple of gotchas. You need to have room in your free space map to hold information about the bulk of the to-be-freed tables. So perhaps your FSM settings are not big enough, even though you tried to set them higher. (Of course, if you're replacing, say, more than half the table, setting the FSM high enough isn't practical.)

Another possibility is that you have multiple long-running transactions that are keeping non-blocking vacuum from being very effective. Since those transactions are possibly referencing old versions of a row, when the non-blocking vacuum comes around, it just skips the "dead" tuples which are nevertheless alive to someone. (You can see the effect of this by using the contrib/pgstattuple function.) Blocking vacuum doesn't have this problem, because it just waits on the table until everything ahead of it has committed or rolled back; so you pay in wait time for all transactions during the vacuum.

(I have encountered this very problem on a table which gets a lot of update activity on just one row. Even vacuuming every minute, the table grew and grew, because of another misbehaving application which was keeping a transaction open when it shouldn't have.)

A

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
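Andrew's pgstattuple suggestion looks roughly like this in practice (assuming the contrib/pgstattuple module has been installed; the table name is borrowed from Oleg's query, and the exact call syntax and output columns vary by PostgreSQL release):

```sql
-- Tuple-level statistics for one table, from contrib/pgstattuple.
-- A persistently high dead-tuple percentage despite regular vacuums
-- suggests an undersized FSM or long-running transactions pinning
-- old row versions.
SELECT pgstattuple('media');
-- On later releases the function returns a full row instead:
-- SELECT * FROM pgstattuple('media');
```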
On Mon, 2003-02-24 at 13:45, Josh Berkus wrote:
> This would assume absolutely perfect FSM settings, and that the DB never gets
> thrown off by unexpected loads. I have never been so fortunate as to work
> with such a database. However, I agree that good FSM tuning and frequent
> regular VACUUMs can greatly extend the period required for running FULL.

It's somewhat relative. One of my tables has about 600 rows, and each row gets updated within 15 minutes. I vacuum it every 10 minutes, which should leave me with around 1000 tuples (dead and alive) for that table. Even if something overloaded the updates on that table, chances are that I wouldn't see enough of a performance drop to warrant a vacuum full. Of course, it's a small table, so YMMV. I think the point is, though, that if you're running nightly vacuum fulls just to stay ahead of the game, you're not maintaining the database optimally.

> I have not found, though, that this does anything to prevent the need for
> REINDEX on frequently-updated tables. How about you, Robert?

Well, this touches on a different topic. On those tables where I get "index bloat", I do have to do REINDEXes. But that's not currently solvable with vacuum (remember, indexes don't even use the FSM), though IIRC Tom & Co. have done some work toward this for 7.4.

Robert Treat
Josh Berkus <josh@agliodbs.com> writes:
> ... However, I agree that good FSM tuning and frequent
> regular VACUUMs can greatly extend the period required for running FULL.
> I have not found, though, that this does anything to prevent the need for
> REINDEX on frequently-updated tables. How about you, Robert?

As of 7.3, FSM doesn't have anything to do with indexes. If you have index bloat, it's because of the inherent inability of btree indexes to reuse space when the data distribution changes over time. (Portions of the btree may become empty, but they aren't recycled.) You'll particularly get burnt by indexes that are on OIDs or sequentially assigned ID numbers, since the set of IDs in use just naturally tends to migrate higher over time. I don't think that the update rate per se has much to do with this; it's the insertion of new IDs and deletion of old ones that causes the statistical shift. The tree grows at the right edge, but doesn't shrink at the left.

As of CVS tip, however, the situation is different ;-). Btree indexes will recycle space using FSM in 7.4.

			regards, tom lane
On Mon, 2003-02-24 at 15:50, Tom Lane wrote:
> You'll particularly get burnt by indexes that are on OIDs or sequentially
> assigned ID numbers, since the set of IDs in use just naturally tends to
> migrate higher over time. I don't think that the update rate per se has
> much to do with this, it's the insertion of new IDs and deletion of old
> ones that causes the statistical shift.

Would it be safe to say that tables with high update rates where the updates do not change the indexed value would not suffer from index bloat? For example, updates to non-index columns, or updates that overwrite but don't change the value of indexed columns; do these even need to touch the index?

Robert Treat
Robert Treat <xzilla@users.sourceforge.net> writes:
> Would it be safe to say that tables with high update rates where the
> updates do not change the indexed value would not suffer from index
> bloat?

I would expect not. If you vacuum often enough to keep the main table size under control, the index should stay under control too.

> For example updates to non-index columns or updates that
> overwrite, but don't change the value of indexed columns; do these even
> need to touch the index?

Yes, they do. Think MVCC.

			regards, tom lane
On Tue, 25 Feb 2003, Tom Lane wrote:
> Robert Treat <xzilla@users.sourceforge.net> writes:
> > Would it be safe to say that tables with high update rates where the
> > updates do not change the indexed value would not suffer from index
> > bloat?
>
> I would expect not. If you vacuum often enough to keep the main table
> size under control, the index should stay under control too.
>
> > For example updates to non-index columns or updates that
> > overwrite, but don't change the value of indexed columns; do these even
> > need to touch the index?
>
> Yes, they do. Think MVCC.

Yes and no. From what I can work out, the index is a tree, and if many updates occur the tree can become unbalanced; eventually it gets so unbalanced it's no better than a seq scan. It's not big, it's just that all the data is on one side of the tree. Which is why reindexing is a good plan. What is really needed is a quicker way of rebalancing the tree, so that the database notices when the index is unbalanced, picks a new root node, and hangs the old root off that. (Makes for a very interesting algorithm, if I remember my university lectures...)

Now I'm trying to sort out a very large static table that I've just finished updating. I am beginning to think that the quickest way of sorting it out is to dump and reload it, but I'm trying a do-it-in-place method instead (reindex it, vacuum full, analyze). But what is the correct order to do this in: REINDEX then VACUUM, or VACUUM then REINDEX?

Peter Childs
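One hedged answer to Peter's ordering question, on the reasoning that VACUUM FULL moves tuples around (and so touches index entries), making it wasteful to compact an index that is about to be rebuilt anyway. This is a sketch for a 7.3-era server, with a hypothetical table name, not a definitive recipe:

```sql
VACUUM FULL bigtable;    -- shrink the heap first (takes an exclusive lock)
REINDEX TABLE bigtable;  -- then rebuild the bloated indexes from scratch
ANALYZE bigtable;        -- finally refresh the planner statistics
```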