Re: Query Optimizer makes a poor choice - Mailing list pgsql-general

From Tomas Vondra
Subject Re: Query Optimizer makes a poor choice
Msg-id 4ED6BF1B.2080604@fuzzy.cz
In response to Re: Query Optimizer makes a poor choice  ("Tyler Hains" <thains@profitpointinc.com>)
List pgsql-general
On 30.11.2011 23:22, Tyler Hains wrote:
>>> I haven't had a chance to experiment with the SET STATISTICS, but
>>> that got me going on something interesting...
>>>
>>> Do these statistics look right?
>>>
>>> # SELECT attname, n_distinct, most_common_vals, histogram_bounds FROM
>>> pg_stats WHERE tablename = 'cards';
>>>
>> ...
>>> "card_set_id"   905
>>> "{5201,3203,3169,5679,5143,5204,5655,4322,5236,4513}"
>>> "{4,3080,3896,4349,4701,5179,5445,5706,6003,6361,6784}"
>>
>> This looks promising, because n_distinct is low enough that you can
>> cover almost all values with statistics.
>> Raising the statistics target and running ANALYZE should help.
>> (NOTE NOTE NOTE: assuming that the distribution is even)
>>
>>
>> ...
>> but one thing we see for sure is that you have not tuned your
>> PostgreSQL instance :-)
>> I would recommend pgtune, -> pgfoundry.org/projects/pgtune/
>> it covers most important stuff, *including* default_statistics_target.
>>
>>
>>
>> Filip
>>
>
> I just tried the set statistics on our test system with essentially the
> same end result.

Can you describe the problem in a bit more detail? Maybe you just have
the same problem as the OP, because with this (very simple) test case it
works just fine.

========================================================================
create table test_tab (id int primary key, val int, txtval text);

insert into test_tab select i, mod(i, 10000), md5(i::text) from
generate_series(1,10000000) s(i);

create index test_tab_idx on test_tab (val);

analyze test_tab;
========================================================================

The table is about 730MB, the indexes are about 214MB each.
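(FWIW, the sizes can be checked like this - pg_relation_size and
pg_size_pretty are standard PostgreSQL functions, the relation names are
from the test case above:)

========================================================================
select pg_size_pretty(pg_relation_size('test_tab'));      -- table size
select pg_size_pretty(pg_relation_size('test_tab_idx'));  -- index size
========================================================================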

========================================================================
explain analyze select * from test_tab where val = 500 order by id;

1st execution (not cached): http://explain.depesz.com/s/1VQ (7786 ms)
2nd execution (cached):     http://explain.depesz.com/s/cnt (1 ms)

explain analyze select * from test_tab where val = 500 order by id limit 1;

1st execution (not cached): http://explain.depesz.com/s/nlE (66 ms)
2nd execution (cached):     http://explain.depesz.com/s/WNa (0.08 ms)
========================================================================

So in both cases the LIMIT (with index scan) is faster. Sure, there may
be cases where this does not work so well - maybe the data is not well
cached, maybe there's some other issue.

But it clearly is not true that LIMIT is evil and should be avoided.
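BTW if you want to experiment with the statistics target on a single
column (instead of changing default_statistics_target globally), it can
be done like this - using the 'cards' table and 'card_set_id' column
from the original report, and 1000 as an example target:

========================================================================
alter table cards alter column card_set_id set statistics 1000;
analyze cards;
========================================================================

Then check pg_stats again to see whether most_common_vals now covers a
larger fraction of the distinct values.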

Tomas
