Thread: bad estimates / non-scanning aggregates
I'm surprised at the effort pgsql requires to run one of my queries. I don't
know how to tune this query.

   Column   |     Type     | Modifiers
------------+--------------+-----------
 the_id     | integer      | not null
 the_date   | date         | not null
 num1       | numeric(9,4) |
 num2       | numeric(9,4) |
 num3       | numeric(9,4) |
 num4       | numeric(9,4) |
 int1       | integer      |
Indexes: "the_table_pkey" primary key, btree (the_id, the_date)

---------------------------------------

The query I want to run is

    select stock_id, min(price_date) from day_ends group by stock_id;

---------------------------------------

Here's the plan that I get.

 GroupAggregate  (cost=3711244.30..3838308.31 rows=6732 width=8)
   ->  Sort  (cost=3711244.30..3753593.36 rows=16939624 width=8)
         Sort Key: stock_id
         ->  Seq Scan on day_ends  (cost=0.00..361892.24 rows=16939624 width=8)

If I set enable_seqscan = false, the plan changes to

 GroupAggregate  (cost=0.00..67716299.91 rows=6732 width=8)
   ->  Index Scan using day_ends_pkey on day_ends
         (cost=0.00..67631584.96 rows=16939624 width=8)

---------------------------------------

Now... the first plan uses up tons of temporary space for sorting. The second
one just runs and runs and runs. I've tried setting the statistics target to
1000 with little effect.

The query can get everything it needs from the index, and a full scan of the
index should be faster (the index file is less than half the size of the data
file). So why does the optimizer estimate so high?

Also, to get the MIN for a given group, not all values of the index need to
be seen. Must pgsql do a full scan because it treats all aggregates in the
same way? Are MIN and MAX used often enough to justify special treatment, and
could that be cleanly implemented? Perhaps the aggregate function could
request the data in a certain order, be told that it is being passed data in
a certain order, and return before seeing the entire set of data. Food for
thought...

Thanks,

Ken Geis
On Thu, Aug 28, 2003 at 17:10:31 -0700, Ken Geis <kgeis@speakeasy.org> wrote:
> The query I want to run is
>
> select stock_id, min(price_date) from day_ends group by stock_id;

The fast way to do this is:

    select distinct on (stock_id) stock_id, price_date
      from day_ends
      order by stock_id, price_date;

> Also, to get the MIN for a given group, not all values of the index need
> to be seen. Must pgsql do a full scan because it treats all aggregates
> in the same way? Are MIN and MAX used often enough to justify special
> treatment, and could that be cleanly implemented? Perhaps the aggregate
> function could request the data in a certain order, be told that it is
> being passed data in a certain order, and return before seeing the
> entire set of data.

Yes, max and min are not treated specially, so they don't benefit from
indexes. This has been discussed repeatedly in the archives.
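For the maximum, the same DISTINCT ON trick works with the ordering reversed.
A sketch, using the column names from this thread (untested here):

    -- DISTINCT ON keeps the first row of each stock_id group in the given
    -- ordering, so sorting price_date descending yields the per-stock maximum.
    select distinct on (stock_id) stock_id, price_date as max_date
      from day_ends
      order by stock_id, price_date desc;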
Bruno Wolff III wrote:
> On Thu, Aug 28, 2003 at 17:10:31 -0700,
>  Ken Geis <kgeis@speakeasy.org> wrote:
>
>> The query I want to run is
>>
>> select stock_id, min(price_date) from day_ends group by stock_id;
>
> The fast way to do this is:
>
> select distinct on (stock_id) stock_id, price_date
>   from day_ends
>   order by stock_id, price_date;

Not according to the optimizer!  Plus, this is not guaranteed to return the
correct results.

 Unique  (cost=3711244.30..3795942.42 rows=6366 width=8)
   ->  Sort  (cost=3711244.30..3753593.36 rows=16939624 width=8)
         Sort Key: stock_id, price_date
         ->  Seq Scan on day_ends  (cost=0.00..361892.24 rows=16939624 width=8)
On Thu, Aug 28, 2003 at 19:50:38 -0700, Ken Geis <kgeis@speakeasy.org> wrote:
> Bruno Wolff III wrote:
>> On Thu, Aug 28, 2003 at 17:10:31 -0700,
>>  Ken Geis <kgeis@speakeasy.org> wrote:
>>
>>> The query I want to run is
>>>
>>> select stock_id, min(price_date) from day_ends group by stock_id;
>>
>> The fast way to do this is:
>>
>> select distinct on (stock_id) stock_id, price_date
>>   from day_ends
>>   order by stock_id, price_date;
>
> Not according to the optimizer!  Plus, this is not guaranteed to return
> the correct results.

For it to be fast you need an index on (stock_id, price_date) so that you can
use an index scan. The answers are guaranteed to be correct. See:
http://developer.postgresql.org/docs/postgres/sql-select.html#SQL-DISTINCT
Bruno Wolff III wrote:
>> Not according to the optimizer!  Plus, this is not guaranteed to return
>> the correct results.
>
> For it to be fast you need an index on (stock_id, price_date) so that
> you can use an index scan.

I already said that such an index existed. In fact, it is the primary key of
the table. And yes, I *am* analyzed!

> The answers are guaranteed to be correct. See:
> http://developer.postgresql.org/docs/postgres/sql-select.html#SQL-DISTINCT

That's good to know. Thanks!

Ken
On Thu, Aug 28, 2003 at 20:00:32 -0700, Ken Geis <kgeis@speakeasy.org> wrote:
> Bruno Wolff III wrote:
>>> Not according to the optimizer!  Plus, this is not guaranteed to return
>>> the correct results.
>>
>> For it to be fast you need an index on (stock_id, price_date) so that
>> you can use an index scan.
>
> I already said that such an index existed. In fact, it is the primary
> key of the table. And yes, I *am* analyzed!

Your original example didn't actually match the table you are showing
examples from. In that example, the second half of the primary key was the
date, not the end-of-day price. If this is the case for the real table, then
that is the reason the distinct on doesn't help.
Bruno Wolff III wrote:
> On Thu, Aug 28, 2003 at 20:00:32 -0700,
>  Ken Geis <kgeis@speakeasy.org> wrote:
>
>> Bruno Wolff III wrote:
>>
>>>> Not according to the optimizer!  Plus, this is not guaranteed to return
>>>> the correct results.
>>>
>>> For it to be fast you need an index on (stock_id, price_date) so that
>>> you can use an index scan.
>>
>> I already said that such an index existed. In fact, it is the primary
>> key of the table. And yes, I *am* analyzed!
>
> Your original example didn't actually match the table you are showing
> examples from. In that example, the second half of the primary key was
> the date, not the end-of-day price. If this is the case for the real
> table, then that is the reason the distinct on doesn't help.

I had obfuscated the table in the example and forgot to do the same with the
query. Serves me right for thinking I care about that.

A big problem is that the values I am working with are *only* the primary key
and the optimizer is choosing a table scan over an index scan. That is why I
titled the email "bad estimates." The table has (stock_id, price_date) as the
primary key, and a bunch of other columns. What I *really* want to do
efficiently is

    select stock_id, min(price_date), max(price_date)
      from day_ends
     group by stock_id;

It is not the table or the query that is wrong. It is either the db
parameters or the optimizer itself.

Ken
On Thu, Aug 28, 2003 at 20:46:00 -0700, Ken Geis <kgeis@speakeasy.org> wrote:
>
> A big problem is that the values I am working with are *only* the
> primary key and the optimizer is choosing a table scan over an index
> scan. That is why I titled the email "bad estimates." The table has
> (stock_id, price_date) as the primary key, and a bunch of other columns.
> What I *really* want to do efficiently is
>
>     select stock_id, min(price_date), max(price_date)
>       from day_ends
>      group by stock_id;
>
> It is not the table or the query that is wrong. It is either the db
> parameters or the optimizer itself.

If you want both the max and the min, then things are going to be a bit more
work. You are either going to want to do two separate selects, join two
selects, or use subselects. If there aren't enough prices per stock, the
sequential scan might be fastest, since you only need to go through the table
once and don't have to hit the index blocks.

It is still odd that you didn't get a big speed up for just the min, though.
Your example did have the stock id and the date as the primary key, which
would make sense, since the stock id and stock price on a day wouldn't be
guaranteed to be unique. Are you absolutely sure you have a combined key on
the stock id and the stock price?
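As a rough sketch of the "join two selects" idea (table and column names
taken from this thread; not tested against this data), each extreme can come
from its own DISTINCT ON pass, with the two joined on stock_id:

    -- One index-friendly pass per extreme, joined on stock_id.
    select mn.stock_id, mn.min_date, mx.max_date
      from (select distinct on (stock_id) stock_id, price_date as min_date
              from day_ends
             order by stock_id, price_date) as mn
      join (select distinct on (stock_id) stock_id, price_date as max_date
              from day_ends
             order by stock_id, price_date desc) as mx
     using (stock_id);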
Bruno Wolff III wrote:
> On Thu, Aug 28, 2003 at 20:46:00 -0700,
>  Ken Geis <kgeis@speakeasy.org> wrote:
>
>> It is not the table or the query that is wrong. It is either the db
>> parameters or the optimizer itself.
...
>
> It is still odd that you didn't get a big speed up for just the min, though.
> Your example did have the stock id and the date as the primary key, which
> would make sense, since the stock id and stock price on a day wouldn't be
> guaranteed to be unique. Are you absolutely sure you have a combined key
> on the stock id and the stock price?

I am positive! I can send a log if you want, but I won't post it to the list.

The arity on the data is roughly 1500 price_dates per stock_id.

I was able to get the query to return in a reasonable amount of time (still
~3 minutes) by forcing a nested loop path using SQL functions instead of min
and max.

I'm going to run comparisons on 7.3.3 and 7.4-beta2. I'll also look into the
optimizer source to try to figure out why it thinks scanning this index is so
expensive.

Ken
On Thu, Aug 28, 2003 at 21:09:00 -0700, Ken Geis <kgeis@speakeasy.org> wrote:
> Bruno Wolff III wrote:
>
> I am positive! I can send a log if you want, but I won't post it to the
> list.

Can you do a \d on the real table, or is that too sensitive? It still doesn't
make sense that you have a primary key that is a stock and its price. What
happens when the stock has the same price on two different dates? And I doubt
that you are looking for the minimum and maximum dates for which you have
price data. So it is hard to believe that the index for your primary key is
the one you need for your query.

> The arity on the data is roughly 1500 price_dates per stock_id.

Two index scans (one for min values and another for max values) should be
better than one sequential scan under those conditions.

I am calling it quits for tonight, but will check back tomorrow to see how
things turned out.
Bruno Wolff III wrote:
> Can you do a \d on the real table, or is that too sensitive?

It was silly of me to think of this as particularly sensitive.

stocks=> \d day_ends
            Table "public.day_ends"
   Column   |     Type     | Modifiers
------------+--------------+-----------
 stock_id   | integer      | not null
 price_date | date         | not null
 open       | numeric(9,4) |
 high       | numeric(9,4) |
 low        | numeric(9,4) |
 close      | numeric(9,4) |
 volume     | integer      |
Indexes: day_ends_pkey primary key btree (stock_id, price_date)
Triggers: RI_ConstraintTrigger_16558399

> It still doesn't make sense that you have a primary key that
> is a stock and its price. What happens when the stock has the
> same price on two different dates? And I doubt that you are looking
> for the minimum and maximum dates for which you have price data.
> So it is hard to believe that the index for your primary key is the
> one you need for your query.

I can see the naming being confusing. I used "price_date" because, of course,
"date" is not a legal name. "day_ends" is a horrible name for the table;
"daily_bars" would probably be better.

I *am* looking for the minimum and maximum dates for which I have price data.
I'm running this query to build a chart so I can see visually where the
majority of my data begins, to use as the start of a window for analysis.

When run on 7.3.3, forcing an index scan by setting enable_seqscan=false, the
query took 55 minutes to run. The index is about 660M in size, and the table
is 1G. As I mentioned before, with table scans enabled, it bombs, running out
of temporary space.

Hey Bruno, thanks for your attention here. I'm not a newbie, but I've never
really had performance issues with pgsql before. And I've been running this
database for a couple of years now, but I haven't run these queries against
it.

Ken
Sorry, all, to wipe out the context, but it was getting a little long.

Bruno Wolff III wrote:
> I am calling it quits for tonight, but will check back tomorrow
> to see how things turned out.

I went through the code (7.4 beta2) that estimates the cost of an index scan
path. What I need to be sure of is that when running a query in pgsql that
uses only the columns that are in an index, the underlying table need not be
accessed. I know that Oracle does this.

The cost_index function is assuming that after finding an entry in the index
it will be looking it up in the underlying table. That table is not well
correlated to the index, so it is assuming (in the worst case) a random page
lookup for each of 17 million records -- roughly 16.9 million rows times the
default random_page_cost of 4.0, which lines up with the estimate below. In
my case, if the underlying table is indeed not touched, the estimated cost is
1000 times the real cost.

    63388.624000     to scan the index
    67406506.915595  to scan the index and load a random page for each entry

Ken
Ken Geis wrote:
> I went through the code (7.4 beta2) that estimates the cost of an index
> scan path. What I need to be sure of is that when running a query in
> pgsql that uses only the columns that are in an index, the underlying
> table need not be accessed. I know that Oracle does this.

Thinking about it some more, it's obvious to me that a pgsql index scan must
be accessing the underlying table even though all of the information needed
is in the index itself. A linear scan of a 660M file should not take 55
minutes. I could confirm this with stats, but someone out there probably
already knows the answer here.

Ken
> > I went through the code (7.4 beta2) that estimates the cost of an index
> > scan path. What I need to be sure of is that when running a query in
> > pgsql that uses only the columns that are in an index, the underlying
> > table need not be accessed. I know that Oracle does this.

PostgreSQL absolutely does not do this. It is also not possible to do this
due to MVCC.

Chris
Ken Geis wrote:
> When run on 7.3.3, forcing an index scan by setting
> enable_seqscan=false, the query took 55 minutes to run. The index is
> about 660M in size, and the table is 1G. As I mentioned before, with
> table scans enabled, it bombs, running out of temporary space.

Man, I should wait a while before I send mails, because I keep having more to
say!

Some good news here. Doing the same as above on 7.4beta2 took 29 minutes.
Now, the 7.3.3 was on reiser and 7.4 on ext2, so take that as you will. 7.4's
index selectivity estimate seems much better; 7.3.3's estimated row count was
ten times the actual, while 7.4's is one half of the actual.

Ken
On Fri, 29 Aug 2003, Ken Geis wrote:

> Some good news here. Doing the same as above on 7.4beta2 took 29
> minutes. Now, the 7.3.3 was on reiser and 7.4 on ext2, so take that as
> you will. 7.4's index selectivity estimate seems much better; 7.3.3's
> estimated row count was ten times the actual, while 7.4's is one half
> of the actual.

Min() & Max() unfortunately suck on PG. It will be that way for a while;
perhaps at some point someone will make a "special" case and convince
-HACKERS it is a Good Thing(tm). (Like select count(*) from table being
'cached' - a lot of people probably get bad first impressions because of
that.)

Would it be possible to rewrite your queries, replacing min/max with a

    select stock_id from bigtable where blah = blorch
     order by stock_id (desc|asc) limit 1

? Because that would enable PG to use an index and magically "go fast". You
may need a subselect.

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
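Applied to the grouped query from this thread, that rewrite might look
something like the sketch below (untested here; the driving list of stock_ids
comes from a DISTINCT subquery, though a separate lookup table of stocks
would serve as well if one exists):

    -- Hypothetical rewrite of
    --   select stock_id, min(price_date), max(price_date)
    --     from day_ends group by stock_id;
    -- Each correlated subselect orders along the (stock_id, price_date)
    -- primary key and stops after one row, so it can walk the index
    -- instead of aggregating the whole table.
    select s.stock_id,
           (select price_date from day_ends d
             where d.stock_id = s.stock_id
             order by stock_id, price_date
             limit 1) as min_date,
           (select price_date from day_ends d
             where d.stock_id = s.stock_id
             order by stock_id desc, price_date desc
             limit 1) as max_date
      from (select distinct stock_id from day_ends) as s;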
On Fri, 29 Aug 2003, Ken Geis wrote:

> Ken Geis wrote:
> > I went through the code (7.4 beta2) that estimates the cost of an index
> > scan path. What I need to be sure of is that when running a query in
> > pgsql that uses only the columns that are in an index, the underlying
> > table need not be accessed. I know that Oracle does this.
>
> Thinking about it some more, it's obvious to me that a pgsql index scan
> must be accessing the underlying table even though all of the
> information needed is in the index itself. A linear scan of a 660M file
> should not take 55 minutes. I could confirm this with stats, but
> someone out there probably already knows the answer here.

Unfortunately, not all the information needed is in the index. You can't
currently tell from the index alone whether or not the row is visible to you.
Adding said information would be possible, but there are downsides to that as
well (there are some past discussions on the topic, but I'm too lazy to look
them up to give a link; check the archives ;) ).
Bruno Wolff III wrote:
> If you want both the max and the min, then things are going to be a bit
> more work. You are either going to want to do two separate selects,
> join two selects, or use subselects. If there aren't enough prices
> per stock, the sequential scan might be fastest, since you only need to
> go through the table once and don't have to hit the index blocks.
>
> It is still odd that you didn't get a big speed up for just the min, though.

I found I'm suffering from an effect detailed in a previous thread titled

    Does "correlation" mislead the optimizer on large tables?

Ken
> > If you want both the max and the min, then things are going to be a
> > bit more work. You are either going to want to do two separate
> > selects, join two selects, or use subselects. If there aren't
> > enough prices per stock, the sequential scan might be fastest, since
> > you only need to go through the table once and don't have to hit
> > the index blocks.
> >
> > It is still odd that you didn't get a big speed up for just the min, though.
>
> I found I'm suffering from an effect detailed in a previous thread titled
>
>     Does "correlation" mislead the optimizer on large tables?

I don't know about large tables, but this is a big problem and something I'm
going to spend some time validating later today. I think Manfred's patch is
pretty good and certainly better than where we are, but I haven't used it yet
to see if it's the magic ticket for many of these index problems.

-sc

--
Sean Chittenden
Sean Chittenden wrote:
>> I found I'm suffering from an effect detailed in a previous thread titled
>>
>>     Does "correlation" mislead the optimizer on large tables?
>
> I don't know about large tables, but this is a big problem and
> something I'm going to spend some time validating later today. I
> think Manfred's patch is pretty good and certainly better than where
> we are, but I haven't used it yet to see if it's the magic ticket for
> many of these index problems.

I had to dig through a lot of archives to find this. Is this the patch, from
last October?

http://members.aon.at/pivot/pg/16-correlation.diff

If so, I'll try it out and report my results.

Ken
> >> I found I'm suffering from an effect detailed in a previous thread titled
> >>
> >>     Does "correlation" mislead the optimizer on large tables?
> >
> > I don't know about large tables, but this is a big problem and
> > something I'm going to spend some time validating later today. I
> > think Manfred's patch is pretty good and certainly better than where
> > we are, but I haven't used it yet to see if it's the magic ticket for
> > many of these index problems.
>
> I had to dig through a lot of archives to find this. Is this the patch,
> from last October?
>
> http://members.aon.at/pivot/pg/16-correlation.diff
>
> If so, I'll try it out and report my results.

Same guy, but that patch is pretty out of date and has been replaced by some
newer work that's much better.

    From: Manfred Koizar <mkoi-pg@aon.at>
    Cc: pgsql-hackers@postgresql.org
    Subject: Re: [HACKERS] Correlation in cost_index()
    Date: Wed, 20 Aug 2003 19:57:12 +0200
    Message-ID: <lo97kvkmjatb0ain1e7ad69ccslripcafv@4ax.com>

and

    From: Manfred Koizar <mkoi-pg@aon.at>
    To: pgsql-hackers@postgresql.org
    Subject: [HACKERS] Again on index correlation
    Date: Wed, 20 Aug 2003 21:21:14 +0200
    Message-ID: <dhd7kvs4niqijnerr9mi38oeih1o7j2s28@4ax.com>

-sc

--
Sean Chittenden
I haven't come up with any great ideas for this one. It might be interesting to compare the explain analyze output from the distinct on query with and without seqscans enabled.
Bruno Wolff III wrote:
> I haven't come up with any great ideas for this one. It might be
> interesting to compare the explain analyze output from the distinct on
> query with and without seqscans enabled.

Can't do that comparison. Remember, with seqscan it fails. (Oh, and that
nested loops solution I thought was fast actually took 31 minutes versus 29
for index scan in 7.4b2.)

I ran another query across the same data:

    select price_date, count(*) from day_ends group by price_date;

It used a table scan and hashed aggregates, and it ran in 5.5 minutes.
Considering that, pgsql should be able to do the query that I had been
running in a little more time than that.

So... From what I've learned, we want to convince the optimizer to use a
table scan; that's a good thing. I want it to use hashed aggregates, but I
can't convince it to (unless maybe I removed all of the statistics). To use
group aggregates, it first sorts the results of the table scan (all 17
million rows!).

There ought to be some way to tell pgsql not to do sorts above a certain
size. In this case, if I set enable_sort=false, it goes back to the index
scan. If I then set enable_indexscan=false, it goes back to sorting.
Bruno Wolff III wrote:
> I haven't come up with any great ideas for this one. It might be
> interesting to compare the explain analyze output from the distinct on
> query with and without seqscans enabled.

After digging through planner code, I found that bumping up the sort_mem will
make the planner prefer a full table scan and hashed aggregation. The sort
memory is where the hash table is stored. In the end, the query runs in 4.5
minutes, which is reasonable.

I had planned to try Manfred's index correlation patch to see if it would
give better estimates for an index scan. The index scan method took maybe
6.5x as long, but the estimate was that it would take 1400x as long. I think
instead of trying out his patch I might actually work on my application!

Ken
Ken Geis <kgeis@speakeasy.org> writes:
> From what I've learned, we want to convince the optimizer to use a
> table scan; that's a good thing. I want it to use hashed aggregates,
> but I can't convince it to (unless maybe I removed all of the
> statistics).

You probably just need to increase sort_mem. Multiple aggregates take more
RAM to process in a hashtable style ...

			regards, tom lane
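For reference, a minimal sketch of that sort_mem approach as described in
this thread (the value shown is only an example, and in 7.3/7.4 sort_mem is
measured in kilobytes):

    -- Raise sort_mem for this session so the planner expects the
    -- HashAggregate hash table to fit in memory, then re-run the query.
    set sort_mem = 65536;   -- 64 MB, illustrative only

    explain
    select stock_id, min(price_date), max(price_date)
      from day_ends
     group by stock_id;
    -- With enough sort_mem the plan should switch from Sort + GroupAggregate
    -- to Seq Scan + HashAggregate.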