Thread: Query Optimizer makes a poor choice

Query Optimizer makes a poor choice

From
"Tyler Hains"
Date:

Hi,

 

We’ve got a strange situation where two queries get dramatically different performance because of how the Query Optimizer handles LIMIT.

 

# explain analyze select * from cards where card_set_id=2850 order by card_id;
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=86686.36..86755.40 rows=27616 width=40) (actual time=22.504..22.852 rows=5000 loops=1)
   Sort Key: card_id
   Sort Method:  quicksort  Memory: 583kB
   ->  Bitmap Heap Scan on cards  (cost=755.41..84649.24 rows=27616 width=40) (actual time=0.416..1.051 rows=5000 loops=1)
         Recheck Cond: (card_set_id = 2850)
         ->  Bitmap Index Scan on cards_card_set_id_indx  (cost=0.00..748.50 rows=27616 width=0) (actual time=0.399..0.399 rows=5000 loops=1)
               Index Cond: (card_set_id = 2850)
 Total runtime: 23.233 ms
(8 rows)

# explain analyze select * from cards where card_set_id=2850 order by card_id limit 1;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..105.19 rows=1 width=40) (actual time=6026.947..6026.948 rows=1 loops=1)
   ->  Index Scan using cards_pkey on cards  (cost=0.00..2904875.38 rows=27616 width=40) (actual time=6026.945..6026.945 rows=1 loops=1)
         Filter: (card_set_id = 2850)
 Total runtime: 6026.985 ms
(4 rows)


The only way we've found to stop the second query from using the PK index is to defeat the index -- sorting on a cast of the PK instead of the bare column. This doesn't work terribly well with our dataset. Is there a better way around this?
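(For illustration, the workaround currently looks roughly like the query below. A cast is just one way to do it; any expression over the PK works, since the planner can then no longer match the ORDER BY to cards_pkey. The empty-string concatenation shown here assumes card_id is a text-like column and is only a sketch:

  select *
    from cards
   where card_set_id = 2850
   order by (card_id || '')   -- expression over the PK, so cards_pkey can't provide the ordering
   limit 1;
)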

Tyler Hains
IT Director
ProfitPoint, Inc.
www.profitpointinc.com

Re: Query Optimizer makes a poor choice

From
Filip Rembiałkowski
Date:
2011/11/29 Tyler Hains <thains@profitpointinc.com>:
> # explain analyze select * from cards where card_set_id=2850 order by
> card_id limit 1;
>                                                                QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..105.19 rows=1 width=40) (actual time=6026.947..6026.948
> rows=1 loops=1)
>    ->  Index Scan using cards_pkey on cards  (cost=0.00..2904875.38
> rows=27616 width=40) (actual time=6026.945..6026.945 rows=1 loops=1)
>          Filter: (card_set_id = 2850)
>  Total runtime: 6026.985 ms

do you have autovacum enabled?

does the plan change when you push stats target for this column?
ALTER TABLE cards ALTER card_set_id SET STATISTICS 500;
ANALYZE cards ( card_set_id );

what happens when you do:
select * from ( select * from cards where card_set_id=2850 ) as c order by
card_id limit 1
?

Re: Query Optimizer makes a poor choice

From
Scott Marlowe
Date:
On Tue, Nov 29, 2011 at 11:21 AM, Tyler Hains <thains@profitpointinc.com> wrote:
> # explain analyze select * from cards where card_set_id=2850 order by
> card_id limit 1;
>                                                                QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..105.19 rows=1 width=40) (actual time=6026.947..6026.948
> rows=1 loops=1)
>    ->  Index Scan using cards_pkey on cards  (cost=0.00..2904875.38
> rows=27616 width=40) (actual time=6026.945..6026.945 rows=1 loops=1)

There's a huge disconnect here between what the query planner expects
(27k rows) and how many there are (1).  Also, getting a single row
from an index should be faster than this, even if the table and index
are quite large.  Have you checked for bloat on this index?
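
(A cheap first check, before reaching for anything heavier, is simply comparing the index size against what you'd expect for the row count, e.g.:

  SELECT pg_size_pretty(pg_relation_size('cards_pkey'));

If that comes back many times larger than a freshly built index on the same data would be, bloat is a good bet.)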

Re: Query Optimizer makes a poor choice

From
"Tyler Hains"
Date:
2011/11/29 Tyler Hains <thains@profitpointinc.com>:
> # explain analyze select * from cards where card_set_id=2850 order by
> card_id limit 1;
>                                                                QUERY
PLAN
>
------------------------------------------------------------------------
-----------------------------------------------------------------
>  Limit  (cost=0.00..105.19 rows=1 width=40) (actual
time=6026.947..6026.948
> rows=1 loops=1)
>    ->  Index Scan using cards_pkey on cards  (cost=0.00..2904875.38
> rows=27616 width=40) (actual time=6026.945..6026.945 rows=1 loops=1)
>          Filter: (card_set_id = 2850)
>  Total runtime: 6026.985 ms

do you have autovacum enabled?

does the plan change when you push stats target for this column?
ALTER TABLE cards ALTER card_set_id SET STATISTICS 500;
ANALYZE cards ( card_set_id );

what happens when you do:
select * from ( select * from cards where card_set_id=2850 ) as c order by
card_id limit 1
?
------------------------------------------------------------------------
--

Yes, I'm pretty sure autovacuum is enabled. Changing the query as shown
there uses the sub-optimal index.

I haven't had a chance to experiment with the SET STATISTICS, but that
got me going on something interesting...

Do these statistics look right?

# SELECT attname, n_distinct, most_common_vals, histogram_bounds FROM
pg_stats WHERE tablename = 'cards';

"initial_set_sequence"    31224
"{291,169,334,380,488,599,1752,2293,12584,4}"
"{5,806,2485,5394,9106,14071,18566,26521,41407,89905,534617}"
"initial_card_set_id"    901
"{5201,3203,3169,5679,5143,5204,5231,5655,4322,5236}"
"{4,3079,3896,4349,4677,5149,5445,5707,6003,6361,6784}"
"status"    5    "{Inventory,Activated}"
"{Closed,Expired,Suspended}"
"demo"    1    "{f}"    ""
"card_set_id"    905
"{5201,3203,3169,5679,5143,5204,5655,4322,5236,4513}"
"{4,3080,3896,4349,4701,5179,5445,5706,6003,6361,6784}"
"external_id"    1    "{""}"    ""
"card_id"    -1    ""
"{0267xxxxxxxxxx46,1000xxxxxxxxxx86,1000xxxxxxxxxx85,1000xxxxxxxxxx69,10
00xxxxxxxxxx04,1000xxxxxxxxxx11,1000xxxxxxxxxx84,1000xxxxxxxxxx65,600xxx
xxxxxxx4,6006xxxxxxxxxxxx279,998xxxxxxxxxx40}"
"pin"    9654    "{1234,1643,2392,6577,0085,0515,0729,1125,1801,1960}"
"{0000,1029,2012,2983,3965,4903,5878,6828,7821,8920,9992}"


Re: Query Optimizer makes a poor choice

From
"Tyler Hains"
Date:
On Tue, Nov 29, 2011 at 11:21 AM, Tyler Hains
<thains@profitpointinc.com> wrote:
> # explain analyze select * from cards where card_set_id=2850 order by
> card_id limit 1;
>                                                                QUERY
PLAN
>
------------------------------------------------------------------------
-----------------------------------------------------------------
>  Limit  (cost=0.00..105.19 rows=1 width=40) (actual
time=6026.947..6026.948
> rows=1 loops=1)
>    ->  Index Scan using cards_pkey on cards  (cost=0.00..2904875.38
> rows=27616 width=40) (actual time=6026.945..6026.945 rows=1 loops=1)

There's a huge disconnect here between what the query planner expects
(27k rows) and how many there are (1).  Also, getting a single row
from an index should be faster than this, even if the table and index
are quite large.  Have you checked for bloat on this index?
---------------------------------------------------------------------

There are actually more like 27 million rows in the table. That's why it
really should be filtering the rows using the index on the other column
before ordering for the limit.

The documentation does not seem to give a clear reason for changing the
value used in default_statistics_target or why you would override it
with ALTER TABLE SET STATISTICS. My gut is telling me that this may be
our answer if we can figure out how to tweak it.


Re: Query Optimizer makes a poor choice

From
Filip Rembiałkowski
Date:
2011/11/29 Tyler Hains <thains@profitpointinc.com>:


> I haven't had a chance to experiment with the SET STATISTICS, but that
> got me going on something interesting...
>
> Do these statistics look right?
>
> # SELECT attname, n_distinct, most_common_vals, histogram_bounds FROM
> pg_stats WHERE tablename = 'cards';
>
...
> "card_set_id"   905
> "{5201,3203,3169,5679,5143,5204,5655,4322,5236,4513}"
> "{4,3080,3896,4349,4701,5179,5445,5706,6003,6361,6784}"

This looks promising, because n_distinct is low enough that you can
cover almost all values with statistics.
raise the statistics and ANALYZE. should help.
(NOTE NOTE NOTE: assuming that the distribution is even)


...
but one thing we see for sure is that you have not tuned your
PostgreSQL instance :-)
I would recommend pgtune, -> pgfoundry.org/projects/pgtune/
it covers most important stuff, *including* default_statistics_target.



Filip

Re: Query Optimizer makes a poor choice

From
Tomas Vondra
Date:
Hi,

what PostgreSQL version is this? That's the first thing we need to know.

On 29.11.2011 22:28, Tyler Hains wrote:
> Yes, I'm pretty sure autovacuum is enabled. Changing the query as shown
> there uses the sub-optimal index.

That doesn't mean

> I haven't had a chance to experiment with the SET STATISTICS, but that
> got me going on something interesting...

If you execute this

SELECT count(*) FROM cards WHERE card_set_id=2850;

what number do you get? How far is that from 27616, expected by the planner?

> Do these statistics look right?

No idea, that depends on your data set. And you've missed the
most_common_freqs so it's almost impossible to analyze the stats.

Tomas

Re: Query Optimizer makes a poor choice

From
Tomas Vondra
Date:
On 29.11.2011 21:34, Scott Marlowe wrote:
> On Tue, Nov 29, 2011 at 11:21 AM, Tyler Hains <thains@profitpointinc.com> wrote:
>> # explain analyze select * from cards where card_set_id=2850 order by
>> card_id limit 1;
>>                                                                QUERY PLAN
>>
-----------------------------------------------------------------------------------------------------------------------------------------
>>  Limit  (cost=0.00..105.19 rows=1 width=40) (actual time=6026.947..6026.948
>> rows=1 loops=1)
>>    ->  Index Scan using cards_pkey on cards  (cost=0.00..2904875.38
>> rows=27616 width=40) (actual time=6026.945..6026.945 rows=1 loops=1)
>
> There's a huge disconnect here between what the query planner expects
> (27k rows) and how many there are (1).  Also, getting a single row
> from an index should be faster than this, even if the table and index
> are quite large.  Have you checked for bloat on this index?

No there isn't - the "1" is actually caused by the LIMIT clause. Once
the first row is returned, it does not fetch the following ones.

The bloat might be the cause, though. Tyler, run this and let us know
the results:

1) SELECT relpages, reltuples FROM pg_class WHERE relname = 'cards';

2) SELECT n_live_tup, n_dead_tup FROM pg_stat_all_tables
    WHERE relname = 'cards';

3) SELECT n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd
     FROM pg_stat_all_tables WHERE relname = 'cards';

If the table / indexes are bloated due to heavy modifications or
(auto)vacuum not being aggressive enough, you may try to cluster the
table. But CLUSTER obtains an exclusive lock on the table, preventing writes.
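
(For reference, it would look roughly like this -- which index to cluster on
is up to you, and the USING form assumes 8.3 or newer:

  CLUSTER cards USING cards_pkey;
  ANALYZE cards;

On pre-9.0 releases a plain VACUUM FULL tends to bloat the indexes further,
so CLUSTER or rebuilding the indexes is usually the better option.)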

Tomas

Re: Query Optimizer makes a poor choice

From
Tomas Vondra
Date:
On 29.11.2011 23:06, Filip Rembiałkowski wrote:
> 2011/11/29 Tyler Hains <thains@profitpointinc.com>:
>
>
>> I haven't had a chance to experiment with the SET STATISTICS, but that
>> got me going on something interesting...
>>
>> Do these statistics look right?
>>
>> # SELECT attname, n_distinct, most_common_vals, histogram_bounds FROM
>> pg_stats WHERE tablename = 'cards';
>>
> ...
>> "card_set_id"   905
>> "{5201,3203,3169,5679,5143,5204,5655,4322,5236,4513}"
>> "{4,3080,3896,4349,4701,5179,5445,5706,6003,6361,6784}"
>
> This looks promising, because n_distinct is low enough that you can
> cover almost all values with statistics.
> raise the statistics and ANALYZE. should help.
> (NOTE NOTE NOTE: assuming that the distribution is even)

Estimating ndistinct is very tricky; there are well-known failure cases
(skewed distributions etc.).

> ...
> but one thing we see for sure is that you have not tuned your
> PostgreSQL instance :-)
> I would recommend pgtune, -> pgfoundry.org/projects/pgtune/
> it covers most important stuff, *including* default_statistics_target.

How do we see that? The only thing you can derive from the above info is
that he's probably running 8.3 (or older), because the number of MCVs is
10 and newer releases use 100 by default.

But the statistics target is modified rather rarely, only when it's
actually needed - the default is usually enough and increasing it just
adds overhead to planning.

And pgtune can't reliably suggest a good value, because it's very
dependent on the data. It can merely recommend some reasonable values
(and it recommends 10 for most workloads anyway, except for DWH and
mixed). Don't touch default_statistics_target unless you're sure it
helps, and prefer raising statistics only for the columns that need it.

Tomas

Re: Query Optimizer makes a poor choice

From
Tomas Vondra
Date:
On 29.11.2011 22:43, Tyler Hains wrote:
> There are actually more like 27 million rows in the table. That's why it
> really should be filtering the rows using the index on the other column
> before ordering for the limit.

Well, the problem is that the PostgreSQL MVCC model is based on keeping
copies of the row. When you delete a row, it's actually marked as
deleted so that running transactions can still see it. An update is just
a delete+insert, so the consequences are the same.

This means there may be a lot of dead rows - easily many times the
number of live rows. So instead of 27 million rows the table may
actually contain 270 million.

That's what (auto)vacuum is for - it reclaims the space occupied by dead
rows once no transaction can see them any more. This space is then
reused for new rows (either created by INSERT or UPDATE).

But if autovacuum can't keep pace with the changes, e.g. because
you've repeatedly run a full-table update or because the table is
updated heavily and autovacuum is not aggressive enough, you've got a
problem.

And this affects indexes too - each new row (or new copy of a row) needs
a new entry in the index (unless it's a HOT update, but let's not
complicate things). That space is not reclaimed by plain (auto)vacuum,
so you may have a perfectly healthy table and a bloated index.

Check the size of your table and indexes, and see if it matches your
expectations. E.g. create a small table with 10000 rows and compute how
large the table would be with 27 million rows (just multiply by 2700).
Does that match the current size? Same thing for the index.
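
(The current sizes are cheap to get, e.g.:

  SELECT pg_size_pretty(pg_relation_size('cards'))      AS table_size,
         pg_size_pretty(pg_relation_size('cards_pkey')) AS pkey_size;

pg_relation_size only looks at the file sizes, so unlike pgstattuple below
it does not scan the data.)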

And run the three queries I've posted in my previous post - that should
give you more details.

You may also use pgstattuple contrib module - run this

  select * from pgstattuple('cards');
  select * from pgstatindex('cards_pkey');

High values of dead_tuple_percent/free_percent (for a table) or
leaf_fragmentation (for an index), and a low avg_leaf_density (index),
usually mean there's bloat.

But be careful - this actually reads the whole table / index.

> The documentation does not seem to give a clear reason for changing the
> value used in default_statistics_target or why you would override it
> with ALTER TABLE SET STATISTICS. My gut is telling me that this may be
> our answer if we can figure out how to tweak it.

That affects the estimates - when the distribution is skewed, the default
level of detail may not be sufficient for a precise enough estimate, so the
optimizer chooses bad plans. Increasing the statistics target means
"collect more detailed statistics", and that often fixes such issues.
But I don't think that's the case here - my guess is bloat.

Tomas

Re: Query Optimizer makes a poor choice

From
Tomas Vondra
Date:
On 29.11.2011 23:19, Tomas Vondra wrote:
> Hi,
>
> what PostgreSQL version is this? That's the first thing we need to know.
>
> On 29.11.2011 22:28, Tyler Hains wrote:
>> Yes, I'm pretty sure autovacuum is enabled. Changing the query as shown
>> there uses the sub-optimal index.
>
> That doesn't mean

Sorry, deleted this part by accident. It should be "That doesn't mean
the table / index is not bloated." See my other posts for more details.

Tomas

Re: Query Optimizer makes a poor choice

From
"Tyler Hains"
Date:
>> I haven't had a chance to experiment with the SET STATISTICS, but that
>> got me going on something interesting...
>>
>> Do these statistics look right?
>>
>> # SELECT attname, n_distinct, most_common_vals, histogram_bounds FROM
>> pg_stats WHERE tablename = 'cards';
>>
>...
>> "card_set_id"   905
>> "{5201,3203,3169,5679,5143,5204,5655,4322,5236,4513}"
>> "{4,3080,3896,4349,4701,5179,5445,5706,6003,6361,6784}"
>
>This looks promising, because n_distinct is low enough that you can
>cover almost all values with statistics.
>raise the statistics and ANALYZE. should help.
>(NOTE NOTE NOTE: assuming that the distribution is even)
>
>
>...
>but one thing we see for sure is that you have not tuned your
>PostgreSQL instance :-)
>I would recommend pgtune, -> pgfoundry.org/projects/pgtune/
>it covers most important stuff, *including* default_statistics_target.
>
>
>
>Filip
>

I just tried the set statistics on our test system with essentially the
same end result.

I'm beginning to think the answer is to just avoid LIMIT.

Tyler



Re: Query Optimizer makes a poor choice

From
Tomas Vondra
Date:
On 30.11.2011 23:22, Tyler Hains wrote:
>>> I haven't had a chance to experiment with the SET STATISTICS, but that
>>> got me going on something interesting...
>>>
>>> Do these statistics look right?
>>>
>>> # SELECT attname, n_distinct, most_common_vals, histogram_bounds FROM
>>> pg_stats WHERE tablename = 'cards';
>>>
>> ...
>>> "card_set_id"   905
>>> "{5201,3203,3169,5679,5143,5204,5655,4322,5236,4513}"
>>> "{4,3080,3896,4349,4701,5179,5445,5706,6003,6361,6784}"
>>
>> This looks promising, because n_distinct is low enough that you can
>> cover almost all values with statistics.
>> raise the statistics and ANALYZE. should help.
>> (NOTE NOTE NOTE: assuming that the distribution is even)
>>
>>
>> ...
>> but one thing we see for sure is that you have not tuned your
>> PostgreSQL instance :-)
>> I would recommend pgtune, -> pgfoundry.org/projects/pgtune/
>> it covers most important stuff, *including* default_statistics_target.
>>
>>
>>
>> Filip
>>
>
> I just tried the set statistics on our test system with essentially the
> same end result.

Can you describe the problem in a bit more detail? Because maybe you
just have the same problem as the OP.

Because with this (very simple) test case it works just fine.

========================================================================
create table test_tab (id int primary key, val int, txtval text);

insert into test_tab select i, mod(i, 10000), md5(i::text) from
generate_series(1,10000000) s(i);

create index test_tab_idx on test_tab (val);

analyze test_tab;
========================================================================

The table is about 730MB, the indexes are about 214MB each.

========================================================================
explain analyze select * from test_tab where val = 500 order by id;

1st execution (not cached): http://explain.depesz.com/s/1VQ (7786 ms)
2nd execution (cached):     http://explain.depesz.com/s/cnt (1 ms)

explain analyze select * from test_tab where val = 500 order by id limit 1;

1st execution (not cached): http://explain.depesz.com/s/nlE (66 ms)
2nd execution (cached):     http://explain.depesz.com/s/WNa (0.08 ms)
========================================================================

So in both cases the LIMIT (with index scan) is faster. Sure, there may
be cases when this does not work that well - maybe it's not well cached,
maybe there's some other issue.

But it clearly is not true that LIMIT is evil and should be avoided.

Tomas

Re: Query Optimizer makes a poor choice

From
Marcin Mańk
Date:
On Tue, Nov 29, 2011 at 7:21 PM, Tyler Hains <thains@profitpointinc.com> wrote:
> # explain analyze select * from cards where card_set_id=2850 order by
> card_id limit 1;
>                                                                QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..105.19 rows=1 width=40) (actual time=6026.947..6026.948
> rows=1 loops=1)
>    ->  Index Scan using cards_pkey on cards  (cost=0.00..2904875.38
> rows=27616 width=40) (actual time=6026.945..6026.945 rows=1 loops=1)
>          Filter: (card_set_id = 2850)
>  Total runtime: 6026.985 ms
> (4 rows)
>

I believe this is the old problem of the planner expecting the
card_set_id's to be randomly distributed over the card_ids. This is not
the case, I guess?

The planner expects to quickly hit a matching record while scanning
the primary key, and there is a nasty surprise.

It seems there is no perfect solution; things you might want to try
(a rough sketch of the first one follows below):
- fooling with random_page_cost/seq_page_cost/work_mem
- "order by card_id-1"
- an index on (card_set_id, card_id)
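
For example, just to see whether the plan flips (session-local, and the
value is only a starting point -- the default random_page_cost is 4.0):

  SET random_page_cost = 2.0;
  EXPLAIN ANALYZE
  SELECT * FROM cards WHERE card_set_id = 2850 ORDER BY card_id LIMIT 1;
  RESET random_page_cost;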

Greetings
Marcin

Re: Query Optimizer makes a poor choice

From
"Tyler Hains"
Date:
>>On Tue, Nov 29, 2011 at 7:21 PM, Tyler Hains
<thains@profitpointinc.com> wrote:
>> # explain analyze select * from cards where card_set_id=2850 order by
>> card_id limit 1;
>>                                                                QUERY
PLAN
>>
------------------------------------------------------------------------
-----------------------------------------------------------------
>>  Limit  (cost=0.00..105.19 rows=1 width=40) (actual
time=6026.947..6026.948
>> rows=1 loops=1)
>>    ->  Index Scan using cards_pkey on cards  (cost=0.00..2904875.38
>> rows=27616 width=40) (actual time=6026.945..6026.945 rows=1 loops=1)
>>          Filter: (card_set_id = 2850)
>>  Total runtime: 6026.985 ms
>> (4 rows)
>>
>
>I believe this is the old problem of the planner expecting the
>card_set_id's to be randomly distributed over the card_ids. This is not
>the case, I guess?
>
>The planner expects to quickly hit a matching record while scanning
>the primary key, and there is a nasty surprise.
>
>It seems there is no perfect solution; things you might want to try:
>- fooling with random_page_cost/seq_page_cost/work_mem
>- "order by card_id-1"
>- an index on (card_set_id, card_id)
>
>Greetings
>Marcin

Ahhh! This makes a lot of sense. Yes, a card set encompasses a specific
range of cards. With that, it might make a big difference to add an
index on the combination of the two fields...
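
(Roughly like this -- the index name is just a placeholder, and CONCURRENTLY
avoids blocking writes while it builds:

  CREATE INDEX CONCURRENTLY cards_set_id_card_id_idx
      ON cards (card_set_id, card_id);

With that index the planner can walk the card_set_id=2850 portion in card_id
order and stop at the first row, so the LIMIT 1 no longer has to crawl
cards_pkey.)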

Thanks!
Tyler