Thread: How to influence the planner
I have a table, t1, with about 12 million rows The column foo is unique and indexed The column bar is not unique and is indexed "select * from t1 where length(bar) = 0;" runs about 2 minutes "select * from t1 where length(bar) = 0 order by foo ;" ran until I stopped it after about 20 minutes My simple solution is "select * into t2 from t1 where length(bar) = 0;" and "select * from t2 order by foo ;" Is there a way to make "select * from t1 where length(bar) = 0 order by foo ;" or something similar work Thanks Richard Ray
On Aug 31, 2007, at 13:32 , Richard Ray wrote: > "select * from t1 where length(bar) = 0;" runs about 2 minutes > "select * from t1 where length(bar) = 0 order by foo ;" ran until I > stopped it after about 20 minutes EXPLAIN ANALYZE will help you see what the planner is doing to produce the results. Have you recently ANALYZEd t1? If length(bar) = 0 is a common operation on this table, you might consider using an expression index on t1: create index t1_length_bar_idx on t1 (length(bar)); You might want to ask on the performance list as well, as this is right up their alley. Hope this gets you started on the right track. Michael Glaesemann grzm seespotcode net
On Fri, 31 Aug 2007, Michael Glaesemann wrote: > > On Aug 31, 2007, at 13:32 , Richard Ray wrote: > >> "select * from t1 where length(bar) = 0;" runs about 2 minutes >> "select * from t1 where length(bar) = 0 order by foo ;" ran until I >> stopped it after about 20 minutes > > > EXPLAIN ANALYZE will help you see what the planner is doing to produce the > results. mda=# EXPLAIN ANALYZE select * from t1 where length(bar) = 0 order by foo; QUERY PLAN --------------------------------------------------------------- Index Scan using t1_pkey on t1 (cost=0.00..46698478.18 rows=60038 width=334) (actual time=4612.740..2349582.520 rows=32705 loops=1) Filter: (length(bar) = 0) Total runtime: 2349614.258 ms (3 rows) mda=# EXPLAIN ANALYZE select * from t1 where length(bar) = 0; QUERY PLAN --------------------------------------------------------------- Seq Scan on t1 (cost=100000000.00..102020349.17 rows=60038 width=334) (actual time=39.065..108645.233 rows=32705 loops=1) Filter: (length(bar) = 0) Total runtime: 108677.759 ms (3 rows) mda=# The index for foo on t1 is the primary index t1_pkey Why is it slower using the index > Have you recently ANALYZEd t1? I run vacuum analyze nightly > If length(bar) = 0 is a common > operation on this table, you might consider using an expression index on t1: > > create index t1_length_bar_idx on t1 (length(bar)); This is a one time procedure to fix some data but I've had this problem before I'm running PostgreSQL 8.1.0 on Fedora Core 6 > > You might want to ask on the performance list as well, as this is right up > their alley. > > Hope this gets you started on the right track. > > Michael Glaesemann > grzm seespotcode net > > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings
On Aug 31, 2007, at 16:07 , Richard Ray wrote: > Total runtime: 2349614.258 ms > (3 rows) Wow. Nearly 40 minutes! What are your work_mem set at? You may want to increase work_mem, as it might help with the sort. > The index for foo on t1 is the primary index t1_pkey > Why is it slower using the index Using an index requires first scanning the index and then looking up the value in the table, so depending on the number of rows that need to be returned, using an index might have more overhead than just reading every row of the table (i.e., a sequential scan). >> Have you recently ANALYZEd t1? > > I run vacuum analyze nightly That might not be often enough. Looking at the number of rows in the cost estimate (60K rows) and the actual number of rows (~30K rows), it looks like there's a factor of two difference. >> If length(bar) = 0 is a common operation on this table, you might >> consider using an expression index on t1: >> >> create index t1_length_bar_idx on t1 (length(bar)); > > This is a one time procedure to fix some data but I've had this > problem before Depending on the time it takes to build the index, it might prove worthwhile even for a one-off query. You're pretty much doing this by using a temporary table though. > I'm running PostgreSQL 8.1.0 on Fedora Core 6 You should upgrade 8.1.9, the latest in the 8.1.x series. This may not help your performance issues, but there have been 9 point releases since the version you're running which include bug and security fixes. Even better, upgrade to 8.2.4, as there may very well be performance improvements in 8.2 which help you. You could look through the 8.2 release notes to see if any might apply. Hope this helps. Michael Glaesemann grzm seespotcode net
Richard Ray <rray@mstc.state.ms.us> writes: > On Fri, 31 Aug 2007, Michael Glaesemann wrote: >> EXPLAIN ANALYZE will help you see what the planner is doing to produce the >> results. > mda=# EXPLAIN ANALYZE select * from t1 where length(bar) = 0 order by foo; > QUERY PLAN > --------------------------------------------------------------- > Index Scan using t1_pkey on t1 (cost=0.00..46698478.18 > rows=60038 width=334) (actual time=4612.740..2349582.520 rows=32705 > loops=1) > Filter: (length(bar) = 0) > Total runtime: 2349614.258 ms > (3 rows) > mda=# EXPLAIN ANALYZE select * from t1 where length(bar) = 0; > QUERY PLAN > --------------------------------------------------------------- > Seq Scan on t1 (cost=100000000.00..102020349.17 rows=60038 > width=334) (actual time=39.065..108645.233 rows=32705 loops=1) > Filter: (length(bar) = 0) > Total runtime: 108677.759 ms > (3 rows) The problem here is you've got enable_seqscan = off. Don't do that. That will make it use an index if it possibly can, whether using one is a good idea or not. In this case, since the useful condition on length(bar) is not indexable, the best available index-using scan uses the index to implement order by foo ... which is pointless here in terms of saving runtime. > I'm running PostgreSQL 8.1.0 on Fedora Core 6 Please update. There are a *lot* of bugs fixed in the 8.1.x series since then. regards, tom lane
Michael Glaesemann <grzm@seespotcode.net> writes: > On Aug 31, 2007, at 16:07 , Richard Ray wrote: >>> If length(bar) = 0 is a common operation on this table, you might >>> consider using an expression index on t1: >> >>> create index t1_length_bar_idx on t1 (length(bar)); >> >> This is a one time procedure to fix some data but I've had this >> problem before Actually, I just noticed that the OP does have an index on bar, which means (assuming it's a string data type) that this query is equivalent toselect * from t1 where bar = '' which would be a far preferable way to do it because that condition can use the index. The Postgres planner is fairly data-type-agnostic and does not have the knowledge that these are equivalent queries, so you can't expect it to make that substitution for you. regards, tom lane
On Fri, 31 Aug 2007, Tom Lane wrote: > Richard Ray <rray@mstc.state.ms.us> writes: >> On Fri, 31 Aug 2007, Michael Glaesemann wrote: >>> EXPLAIN ANALYZE will help you see what the planner is doing to produce the >>> results. > >> mda=# EXPLAIN ANALYZE select * from t1 where length(bar) = 0 order by foo; >> QUERY PLAN >> --------------------------------------------------------------- >> Index Scan using t1_pkey on t1 (cost=0.00..46698478.18 >> rows=60038 width=334) (actual time=4612.740..2349582.520 rows=32705 >> loops=1) >> Filter: (length(bar) = 0) >> Total runtime: 2349614.258 ms >> (3 rows) > >> mda=# EXPLAIN ANALYZE select * from t1 where length(bar) = 0; >> QUERY PLAN >> --------------------------------------------------------------- >> Seq Scan on t1 (cost=100000000.00..102020349.17 rows=60038 >> width=334) (actual time=39.065..108645.233 rows=32705 loops=1) >> Filter: (length(bar) = 0) >> Total runtime: 108677.759 ms >> (3 rows) > > The problem here is you've got enable_seqscan = off. Don't do that. > That will make it use an index if it possibly can, whether using one > is a good idea or not. In this case, since the useful condition on > length(bar) is not indexable, the best available index-using scan > uses the index to implement order by foo ... which is pointless here > in terms of saving runtime. > >> I'm running PostgreSQL 8.1.0 on Fedora Core 6 > > Please update. There are a *lot* of bugs fixed in the 8.1.x series > since then. Changing to enable_seqscan = on does solve this problem, thanks Is there some method of crafting a query that will assert my wishes to the planner mda=# EXPLAIN ANALYZE select * from (select * from t1 where length(bar) = 0) a order by foo; QUERY PLAN ------------------------------------------------------------------------ Index Scan using t1_pkey on t1 (cost=0.00..46698482.18 rows=60038 width=334) (actual time=4784.869..2317363.298 rows=32705 loops=1) Filter: (length(bar) = 0) Total runtime: 2317395.137 ms (3 rows) When is enable_seqscan = off appropriate > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate >
Richard Ray wrote: > > When is enable_seqscan = off appropriate Never(*) It's not for normal usage, the various enable_xxx settings do let you experiment with different options if you think the planner is making a mistake. (*) OK, sooner or later, if you have enough systems and enough performance-critical queries then you might want to use it in production. But only if you really know what you're doing and you've exhausted you're other options. -- Richard Huxton Archonet Ltd
On 8/31/07, Richard Ray <rray@mstc.state.ms.us> wrote: > Changing to enable_seqscan = on does solve this problem, thanks > Is there some method of crafting a query that will assert my wishes to the planner > When is enable_seqscan = off appropriate enable_xxx = off are troubleshooting tools. They override the query planner. For instance, I had a query that was running slow, and using set enable_nestloop=off allowed the query to run fast. However, using explain analyze I could see that the estimated number of rows was off.Analyze didn't fix it, so I increased the stats targetfor the column I was working with, reanalyzed, and voila, the query ran fine with nestloop=on. So, enable_xxx=off is normally only appropriate when troubleshooting an issue, not as a fix all. That's doubly true for enable_seqscan=off. If you do have a query that nothing else seems to work on it, you can set one of the enable_xxx settings off for that connection only and not worry about messing up all the other sessions connecting to your db.
Thanks guys Lesson learned On Tue, 4 Sep 2007, Scott Marlowe wrote: > On 8/31/07, Richard Ray <rray@mstc.state.ms.us> wrote: > >> Changing to enable_seqscan = on does solve this problem, thanks >> Is there some method of crafting a query that will assert my wishes to the planner >> When is enable_seqscan = off appropriate > > enable_xxx = off are troubleshooting tools. They override the query planner. > > For instance, I had a query that was running slow, and using set > enable_nestloop=off allowed the query to run fast. However, using > explain analyze I could see that the estimated number of rows was off. > Analyze didn't fix it, so I increased the stats target for the column > I was working with, reanalyzed, and voila, the query ran fine with > nestloop=on. > > So, enable_xxx=off is normally only appropriate when troubleshooting > an issue, not as a fix all. That's doubly true for > enable_seqscan=off. > > If you do have a query that nothing else seems to work on it, you can > set one of the enable_xxx settings off for that connection only and > not worry about messing up all the other sessions connecting to your > db. >