Thread: Query Plan choice with timestamps

Query Plan choice with timestamps

From
Giorgio Valoti
Date:
Hi, I have a timestamptz field that I want to use with a query, but I
don’t need the full timestamp resolution, so I’ve created a
day_trunc(timestamptz) immutable function which I’ll use with the
query and with a new index:

logs=> create index test_idx on blackbox (day_trunc(ts));

However, the query plan doesn’t use the index:

logs=>explain select count(*) from blackbox group by day_trunc(ts)
order by day_trunc(ts);
                                         QUERY PLAN
------------------------------------------------------------------------------------------
  GroupAggregate  (cost=98431.58..119773.92 rows=74226 width=8)
    ->  Sort  (cost=98431.58..99050.92 rows=247736 width=8)
          Sort Key: (day_trunc(ts))
          ->  Seq Scan on blackbox  (cost=0.00..72848.36 rows=247736
width=8)
(4 rows)

while with this index:

logs=>create index test_2_idx on blackbox (ts);

the query plan is the expected one:

logs=>explain select count(*) from blackbox group by ts order by ts;
                                         QUERY PLAN
------------------------------------------------------------------------------------------
  GroupAggregate  (cost=0.00..19109.66 rows=74226 width=8)
    ->  Index Scan using test_2_idx on blackbox  (cost=0.00..16943.16
rows=247736 width=8)

But I fail to see why. Any hints?

Thank you in advance
--
Giorgio Valoti

Re: Query Plan choice with timestamps

From
Richard Huxton
Date:
Giorgio Valoti wrote:
> Hi, I have a timestamptz field that I want to use with a query, but I
> don’t need the full timestamp resolution, so I’ve created a
> day_trunc(timestamptz) immutable function which I’ll use with the query
> and with a new index:
>
> logs=> create index test_idx on blackbox (day_trunc(ts));
>
> However, the query plan doesn’t use the index:

Does it use it ever? e.g. with
   SELECT * FROM blackbox WHERE day_trunk(ts) = '...'

--
   Richard Huxton
   Archonet Ltd

Re: Query Plan choice with timestamps

From
Giorgio Valoti
Date:
On 07/ago/08, at 10:35, Richard Huxton wrote:

> Giorgio Valoti wrote:
>> Hi, I have a timestamptz field that I want to use with a query, but
>> I don’t need the full timestamp resolution, so I’ve created a
>> day_trunc(timestamptz) immutable function which I’ll use with the
>> query and with a new index:
>> logs=> create index test_idx on blackbox (day_trunc(ts));
>> However, the query plan doesn’t use the index:
>
> Does it use it ever? e.g. with
>  SELECT * FROM blackbox WHERE day_trunk(ts) = '...'

It’s used:

logs=> explain select * from blackbox where day_trunc(ts) =
day_trunc(now());
                                 QUERY PLAN
---------------------------------------------------------------------------
  Bitmap Heap Scan on blackbox  (cost=22.38..3998.43 rows=1239
width=264)
    Recheck Cond: (day_trunc(ts) = day_trunc(now()))
    ->  Bitmap Index Scan on date_idx  (cost=0.00..22.07 rows=1239
width=0)
          Index Cond: (day_trunc(ts) = day_trunc(now()))

--
Giorgio Valoti

Re: Query Plan choice with timestamps

From
Richard Huxton
Date:
Giorgio Valoti wrote:
>
> On 07/ago/08, at 10:35, Richard Huxton wrote:
>
>> Giorgio Valoti wrote:
>>> Hi, I have a timestamptz field that I want to use with a query, but I
>>> don’t need the full timestamp resolution, so I’ve created a
>>> day_trunc(timestamptz) immutable function which I’ll use with the
>>> query and with a new index:
>>> logs=> create index test_idx on blackbox (day_trunc(ts));
>>> However, the query plan doesn’t use the index:
>>
>> Does it use it ever? e.g. with
>>  SELECT * FROM blackbox WHERE day_trunk(ts) = '...'
>
> It’s used:
[snip]

OK - so the index is working.

If you disable seq-scans before running the query, does it use it then?

SET enable_seqscan = off;

 > logs=>explain select count(*) from blackbox group by day_trunc(ts) order
 > by day_trunc(ts);
 >                                         QUERY PLAN
 >
------------------------------------------------------------------------------------------

 >
 >  GroupAggregate  (cost=98431.58..119773.92 rows=74226 width=8)

In particular:
1. Is the estimated cost more or less than 119773.92?
2. How does that match the actual time taken?

--
   Richard Huxton
   Archonet Ltd

Re: Query Plan choice with timestamps

From
Tom Lane
Date:
Giorgio Valoti <giorgio_v@mac.com> writes:
>   GroupAggregate  (cost=98431.58..119773.92 rows=74226 width=8)
>     ->  Sort  (cost=98431.58..99050.92 rows=247736 width=8)
>           Sort Key: (day_trunc(ts))
>           ->  Seq Scan on blackbox  (cost=0.00..72848.36 rows=247736 width=8)

>   GroupAggregate  (cost=0.00..19109.66 rows=74226 width=8)
>     ->  Index Scan using test_2_idx on blackbox  (cost=0.00..16943.16 rows=247736 width=8)

These numbers seem pretty bogus: there is hardly any scenario in which a
full-table indexscan should be costed as significantly cheaper than a
seqscan.  Have you put in silly values for random_page_cost?

If you haven't mucked with the cost parameters, the only way I can think
of to get this result is to have an enormously bloated table that's
mostly empty.  Maybe you need to review your vacuuming procedures.

            regards, tom lane

Re: Query Plan choice with timestamps

From
Giorgio Valoti
Date:
On 07/ago/08, at 17:50, Tom Lane wrote:

> Giorgio Valoti <giorgio_v@mac.com> writes:
>>  GroupAggregate  (cost=98431.58..119773.92 rows=74226 width=8)
>>    ->  Sort  (cost=98431.58..99050.92 rows=247736 width=8)
>>          Sort Key: (day_trunc(ts))
>>          ->  Seq Scan on blackbox  (cost=0.00..72848.36 rows=247736
>> width=8)
>
>>  GroupAggregate  (cost=0.00..19109.66 rows=74226 width=8)
>>    ->  Index Scan using test_2_idx on blackbox
>> (cost=0.00..16943.16 rows=247736 width=8)
>
> These numbers seem pretty bogus: there is hardly any scenario in
> which a
> full-table indexscan should be costed as significantly cheaper than a
> seqscan.  Have you put in silly values for random_page_cost?

No,

>
>
> If you haven't mucked with the cost parameters, the only way I can
> think
> of to get this result is to have an enormously bloated table that's
> mostly empty.  Maybe you need to review your vacuuming procedures.

I’ll review them.

Thank you
--
Giorgio Valoti

Re: Query Plan choice with timestamps

From
Giorgio Valoti
Date:
On 07/ago/08, at 14:36, Richard Huxton wrote:

> Giorgio Valoti wrote:
>> On 07/ago/08, at 10:35, Richard Huxton wrote:
>>> Giorgio Valoti wrote:
>>>> Hi, I have a timestamptz field that I want to use with a query,
>>>> but I don’t need the full timestamp resolution, so I’ve created a
>>>> day_trunc(timestamptz) immutable function which I’ll use with the
>>>> query and with a new index:
>>>> logs=> create index test_idx on blackbox (day_trunc(ts));
>>>> However, the query plan doesn’t use the index:
>>>
>>> Does it use it ever? e.g. with
>>> SELECT * FROM blackbox WHERE day_trunk(ts) = '...'
>> It’s used:
> [snip]
>
> OK - so the index is working.
>
> If you disable seq-scans before running the query, does it use it
> then?
>
> SET enable_seqscan = off;

Yes

> […]
>
> In particular:
> 1. Is the estimated cost more or less than 119773.92?

                                        QUERY PLAN
-----------------------------------------------------------------------------------------
  GroupAggregate  (cost=0.00..122309.32 rows=74226 width=8)
    ->  Index Scan using date_idx on blackbox  (cost=0.00..101586.31
rows=247736 width=8)

>
> 2. How does that match the actual time taken?

                                                                 QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
  GroupAggregate  (cost=0.00..122309.32 rows=74226 width=8) (actual
time=0.222..1931.651 rows=428 loops=1)
    ->  Index Scan using date_idx on blackbox  (cost=0.00..101586.31
rows=247736 width=8) (actual time=0.072..1861.367 rows=247736 loops=1)
  Total runtime: 1931.782 ms

But I haven’t revised the vacuum settings.

Thank you
--
Giorgio Valoti



Re: Query Plan choice with timestamps

From
Giorgio Valoti
Date:
On 07/ago/08, at 20:37, Giorgio Valoti wrote:

>
> […]
>
>>
>>
>> If you haven't mucked with the cost parameters, the only way I can
>> think
>> of to get this result is to have an enormously bloated table that's
>> mostly empty.  Maybe you need to review your vacuuming procedures.
>
> I’ll review them.

I’ve manually vacuum’ed the table:
logs=> VACUUM FULL verbose analyze blackbox;
INFO:  vacuuming "public.blackbox"
INFO:  "blackbox": found 0 removable, 247736 nonremovable row versions
in 8436 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 137 to 1210 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 894432 bytes.
0 pages are or will become empty, including 0 at the end of the table.
2926 pages containing 564212 free bytes are potential move destinations.
CPU 0.00s/0.04u sec elapsed 0.04 sec.
INFO:  index "blackbox_pkey" now contains 247736 row versions in 1602
pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.01 sec.
INFO:  index "vhost_idx" now contains 247736 row versions in 1226 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  index "remoteip_idx" now contains 247736 row versions in 682
pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "date_idx" now contains 247736 row versions in 547 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "test_2_idx" now contains 247736 row versions in 682 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "blackbox": moved 0 row versions, truncated 8436 to 8436 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_45532"
INFO:  "pg_toast_45532": found 0 removable, 0 nonremovable row
versions in 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 0 to 0 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 0 bytes.
0 pages are or will become empty, including 0 at the end of the table.
0 pages containing 0 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "pg_toast_45532_index" now contains 0 row versions in 1
pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.blackbox"
INFO:  "blackbox": scanned 3000 of 8436 pages, containing 87941 live
rows and 0 dead rows; 3000 rows in sample, 247290 estimated total rows
VACUUM

And here the explain results:
logs=> explain select count(*) from blackbox group by day_trunc(ts)
order by day_trunc(ts);
                                  QUERY PLAN
-----------------------------------------------------------------------------
  Sort  (cost=74210.52..74211.54 rows=407 width=8)
    Sort Key: (day_trunc(ts))
    ->  HashAggregate  (cost=74086.04..74192.88 rows=407 width=8)
          ->  Seq Scan on blackbox  (cost=0.00..72847.36 rows=247736
width=8)
(4 rows)

logs=> explain select count(*) from blackbox group by ts order by ts;
                                         QUERY PLAN
------------------------------------------------------------------------------------------
  GroupAggregate  (cost=0.00..18381.54 rows=77738 width=8)
    ->  Index Scan using test_2_idx on blackbox  (cost=0.00..16171.13
rows=247736 width=8)
(2 rows)

Maybe it’s the silly test queries that prove nothing:

logs=> explain select * from blackbox  where day_trunc(ts) =
day_trunc(now());
                                   QUERY PLAN
-------------------------------------------------------------------------------
  Index Scan using date_idx on blackbox  (cost=0.50..158.65 rows=569
width=237)
    Index Cond: (day_trunc(ts) = day_trunc(now()))
(2 rows)

Ciao
--
Giorgio Valoti

Re: Query Plan choice with timestamps

From
Tom Lane
Date:
Giorgio Valoti <giorgio_v@mac.com> writes:
> On 07/ago/08, at 17:50, Tom Lane wrote:
>> These numbers seem pretty bogus: there is hardly any scenario in
>> which a
>> full-table indexscan should be costed as significantly cheaper than a
>> seqscan.  Have you put in silly values for random_page_cost?

> No,

I looked at it more closely and realized that the cost discrepancy is
from the evaluation of the function: having to evaluate a SQL or plpgsql
function 247736 times more than explains the cost estimate differential
compared to a query that involves no function call.  Some experiments
here suggest that it hardly matters whether the query uses indexscan or
seqscan because the time is dominated by the function calls anyway.

            regards, tom lane

Re: Query Plan choice with timestamps

From
Giorgio Valoti
Date:
On 07/ago/08, at 23:01, Tom Lane wrote:

> Giorgio Valoti <giorgio_v@mac.com> writes:
>> On 07/ago/08, at 17:50, Tom Lane wrote:
>>> These numbers seem pretty bogus: there is hardly any scenario in
>>> which a
>>> full-table indexscan should be costed as significantly cheaper
>>> than a
>>> seqscan.  Have you put in silly values for random_page_cost?
>
>> No,
>
> I looked at it more closely and realized that the cost discrepancy is
> from the evaluation of the function: having to evaluate a SQL or
> plpgsql
> function 247736 times more than explains the cost estimate
> differential
> compared to a query that involves no function call.  Some experiments
> here suggest that it hardly matters whether the query uses indexscan
> or
> seqscan because the time is dominated by the function calls anyway.

I see, thank you Tom. Could it be a good idea adding some notes about
it in <http://www.postgresql.org/docs/8.3/interactive/indexes-expressional.html
 >? As you said, since the function call dominates the query cost, in
this case, I think there’s no point to use an index expression.

Ciao
--
Giorgio Valoti