Thread: Incorrect row estimates in plan?

Incorrect row estimates in plan?

From

pgdba

Date:

26 September 2007, 11:22:52

Hi, I am having some trouble understanding a plan and was wondering if anyone
could guide me. The query in question here seems to be showing some
incorrect row counts. I have vacuumed and analyzed the table, but the
estimate versus the actual total seems to be way out (est 2870 vs actual
85k). Perhaps I am reading the plan incorrectly though. (hopefully the plan
below is readable)

db=# select version();
version
-------------------------------------------------------------------------------------------------------------
PostgreSQL 8.2.4 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2
20061115
(prerelease) (Debian 4.1.1-21)

db=# show shared_buffers ;
 shared_buffers
----------------
 300MB

#4GB ram, 2 SATA striped, XFS

db=# show default_statistics_target;
 default_statistics_target
---------------------------
 100

# stats have been raised to 1000 on both the destip and srcip columns
# create index slog_gri_idx on slog (gid,rule,(case when rule in (8,9) then
destip else srcip end)) WHERE (rule in (1, 2, 8, 9, 10));
# vacuum analyze verbose slog;

db=# show random_page_cost ;
 random_page_cost
------------------
 3

db=# select count(*) from slog
  count
---------
 1,019,121

db=#select  count(*) as total
from slog
where gid=10000::INTEGER
and rule in (1,2,8,9,10)
and (case when rule in (8,9) then destip else srcip
end)='192.168.10.23'::INET;
 total
-------
 83,538

# problematic query
explain analyze
select  coalesce(uri,host((case when rule in (8,9) then srcip else destip
end))) as
destip,
        case when rule in (8,9) then 'ext' else 'int' end as tp,
        count(*) as total,
        coalesce(sum(destbytes),0)+coalesce(sum(srcbytes),0) as bytes
from slog
where gid=10000::INTEGER
and rule in (1,2,8,9,10)
and (case when rule in (8,9) then destip else srcip
end)='192.168.10.23'::INET
group by destip,tp
order by bytes desc,total desc,destip limit 20


Limit  (cost=6490.18..6490.23 rows=20 width=61) (actual
time=2036.968..2037.220 rows=20 loops=1)
->  Sort  (cost=6490.18..6490.90 rows=288 width=61) (actual
time=2036.960..2037.027 rows=20 loops=1)
        Sort Key: (COALESCE(sum(destbytes), 0::numeric) +
COALESCE(sum(srcbytes), 0::numeric)), count(*), COALESCE(uri, host(CASE WHEN
(rule = ANY ('{8,9}'::integer[])) THEN srcip ELSE destip END))
        ->  HashAggregate  (cost=6470.50..6478.42 rows=288 width=61) (actual
time=2008.478..2022.125 rows=2057 loops=1)
            ->  Bitmap Heap Scan on slog  (cost=82.98..6434.62 rows=2870
width=61) (actual time=50.235..1237.948 rows=83538 loops=1)
                    Recheck Cond: ((gid = 10000) AND (rule = ANY
('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY
('{8,9}'::integer[])) THEN destip ELSE srcip END = '192.168.10.23'::inet))
                    ->  Bitmap Index Scan on slog_gri_idx  (cost=0.00..82.26
rows=2870 width=0) (actual time=41.306..41.306 rows=83538 loops=1)
                        Index Cond: ((gid = 10000) AND (rule = ANY
('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY
('{8,9}'::integer[])) THEN destip ELSE srcip END = '192.168.10.23'::inet))
Total runtime: 2037.585 ms

Does anyone have any suggestions?

Thanks!
--
View this message in context: http://www.nabble.com/Incorrect-row-estimates-in-plan--tf4522692.html#a12902068
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Re: Incorrect row estimates in plan?

From

Tom Lane

Date:

26 September 2007, 11:45:22

pgdba <postgresql@inbox.com> writes:
>             ->  Bitmap Heap Scan on slog  (cost=82.98..6434.62 rows=2870
> width=61) (actual time=50.235..1237.948 rows=83538 loops=1)
>                     Recheck Cond: ((gid = 10000) AND (rule = ANY
> ('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY
> ('{8,9}'::integer[])) THEN destip ELSE srcip END = '192.168.10.23'::inet))
>                     ->  Bitmap Index Scan on slog_gri_idx  (cost=0.00..82.26
> rows=2870 width=0) (actual time=41.306..41.306 rows=83538 loops=1)
>                         Index Cond: ((gid = 10000) AND (rule = ANY
> ('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY
> ('{8,9}'::integer[])) THEN destip ELSE srcip END = '192.168.10.23'::inet))

[ blink... ]  Pray tell, what is the definition of this index?

With such a bizarre scan condition, it's unlikely you'll get any really
accurate row estimate.

            regards, tom lane

Re: Incorrect row estimates in plan?

From

pgdba

Date:

26 September 2007, 12:24:25

Hi Tom,

Tom Lane-2 wrote:
>
> pgdba <postgresql@inbox.com> writes:
>>             ->  Bitmap Heap Scan on slog  (cost=82.98..6434.62 rows=2870
>> width=61) (actual time=50.235..1237.948 rows=83538 loops=1)
>>                     Recheck Cond: ((gid = 10000) AND (rule = ANY
>> ('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY
>> ('{8,9}'::integer[])) THEN destip ELSE srcip END =
>> '192.168.10.23'::inet))
>>                     ->  Bitmap Index Scan on slog_gri_idx
>> (cost=0.00..82.26
>> rows=2870 width=0) (actual time=41.306..41.306 rows=83538 loops=1)
>>                         Index Cond: ((gid = 10000) AND (rule = ANY
>> ('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY
>> ('{8,9}'::integer[])) THEN destip ELSE srcip END =
>> '192.168.10.23'::inet))
>
> [ blink... ]  Pray tell, what is the definition of this index?
>
> With such a bizarre scan condition, it's unlikely you'll get any really
> accurate row estimate.
>
>             regards, tom lane
>
>

Original index: "create index slog_gri_idx on slog (gid,rule,(case when rule
in (8,9) then
destip else srcip end)) WHERE (rule in (1, 2, 8, 9, 10))"

The purpose of that index is to match a specific query (one that gets run
frequently and needs to be fast). It is using the destip when rule 8/9, and
srcip when other, but only for a subset of the rules (1,2,8,9,10). There are
about 18 rules in total, but I'm only interested in those 5. I have tried a
couple of indices like:
create index test_destip_idx on slog (gid,destip) where rule in (8,9);
create index test_srcip_idx on slog (gid,srcip) where rule in (1,2,10);

But the original slog_gri_idx index was used instead. Is there a way that I
can rewrite that index then? Not that I'm a fan of a CASE statement in a
functional index, but I'm at a loss as to how else I can create this. Or
what else I can look into to make this faster?

--
View this message in context: http://www.nabble.com/Incorrect-row-estimates-in-plan--tf4522692.html#a12903194
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Re: Incorrect row estimates in plan?

From

Tom Lane

Date:

26 September 2007, 13:38:23

pgdba <postgresql@inbox.com> writes:
> Tom Lane-2 wrote:
> ->  Bitmap Index Scan on slog_gri_idx
> (cost=0.00..82.26
> rows=2870 width=0) (actual time=41.306..41.306 rows=83538 loops=1)
> Index Cond: ((gid = 10000) AND (rule = ANY
> ('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY
> ('{8,9}'::integer[])) THEN destip ELSE srcip END =
> '192.168.10.23'::inet))
>>
>> [ blink... ]  Pray tell, what is the definition of this index?

> Original index: "create index slog_gri_idx on slog (gid,rule,(case when rule
> in (8,9) then
> destip else srcip end)) WHERE (rule in (1, 2, 8, 9, 10))"

> The purpose of that index is to match a specific query (one that gets run
> frequently and needs to be fast).

Ah.  I didn't think you would've put such a specific thing into an index
definition, but if you're stuck supporting such badly written queries,
maybe there's no other way.

I rather doubt that you're going to be able to make this query any
faster than it is, short of buying enough RAM to keep the whole table
RAM-resident.  Pulling 80000 random rows in 1200 msec doesn't sound
all that slow to me.

The ultimate solution might be to rethink your table designs ...

            regards, tom lane

Re: Incorrect row estimates in plan?

From

pgdba

Date:

26 September 2007, 13:56:40



Tom Lane-2 wrote:
>
> pgdba <postgresql@inbox.com> writes:
>> Tom Lane-2 wrote:
>> ->  Bitmap Index Scan on slog_gri_idx
>> (cost=0.00..82.26
>> rows=2870 width=0) (actual time=41.306..41.306 rows=83538 loops=1)
>> Index Cond: ((gid = 10000) AND (rule = ANY
>> ('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY
>> ('{8,9}'::integer[])) THEN destip ELSE srcip END =
>> '192.168.10.23'::inet))
>>>
>>> [ blink... ]  Pray tell, what is the definition of this index?
>
>> Original index: "create index slog_gri_idx on slog (gid,rule,(case when
>> rule
>> in (8,9) then
>> destip else srcip end)) WHERE (rule in (1, 2, 8, 9, 10))"
>
>> The purpose of that index is to match a specific query (one that gets run
>> frequently and needs to be fast).
>
> Ah.  I didn't think you would've put such a specific thing into an index
> definition, but if you're stuck supporting such badly written queries,
> maybe there's no other way.
>
> I rather doubt that you're going to be able to make this query any
> faster than it is, short of buying enough RAM to keep the whole table
> RAM-resident.  Pulling 80000 random rows in 1200 msec doesn't sound
> all that slow to me.
>
> The ultimate solution might be to rethink your table designs ...
>
>             regards, tom lane
>

Badly written the query may be, but I do have the opportunity to change it.
Part of the problem is that I cannot come up with a better way of writing
it.

What about the discrepancy between the estimated row count and the actual
row count for that index access?
"Bitmap Index Scan on slog_gri_idx  (cost=0.00..82.26 rows=2870 width=0)
(actual time=41.306..41.306 rows=83538 loops=1)"

Is there anything I can do to influence that (not that it is likely to
change the plan, but...). I vacuumed and analyzed after I created the index,
so the stats should be at least be close (with stats target set to 1000
there).

--
View this message in context: http://www.nabble.com/Incorrect-row-estimates-in-plan--tf4522692.html#a12905186
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.