Re: BRIN index which is much faster never chosen by planner - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: BRIN index which is much faster never chosen by planner
Date
Msg-id 20191011184620.fopsssbxwmcouitl@development
Whole thread Raw
In response to Re: BRIN index which is much faster never chosen by planner  (Jeremy Finzel <finzelj@gmail.com>)
List pgsql-hackers
On Fri, Oct 11, 2019 at 09:08:05AM -0500, Jeremy Finzel wrote:
>On Thu, Oct 10, 2019 at 7:22 PM David Rowley <david.rowley@2ndquadrant.com>
>wrote:
>
>> The planner might be able to get a better estimate on the number of
>> matching rows if the now() - interval '10 days' expression was
>> replaced with 'now'::timestamptz - interval '10 days'. However, care
>> would need to be taken to ensure the plan is never prepared since
>> 'now' is evaluated during parse. The same care must be taken when
>> creating views, functions, stored procedures and the like.
>>
>
>You are on to something here I think with the now() function, even if above
>suggestion is not exactly right as you said further down.  I am finding a
>hard-coded timestamp gives the right query plan.  I also tested same with
>even bigger window (last 16 days) and it yet still chooses the brin index.
>
>foo_prod=# EXPLAIN
>foo_prod-# SELECT
>foo_prod-#  category, source, MIN(rec_insert_time) OVER (partition by
>source order by rec_insert_time) AS first_source_time, MAX(rec_insert_time)
>OVER (partition by source order by rec_insert_time) AS last_source_time
>foo_prod-# FROM (SELECT DISTINCT ON (brand_id, last_change, log_id)
>foo_prod(# category, source(field1) AS source, rec_insert_time
>foo_prod(# FROM log_table l
>foo_prod(# INNER JOIN public.small_join_table filter ON filter.category =
>l.category
>foo_prod(# WHERE field1 IS NOT NULL AND l.category = 'music'
>foo_prod(# AND l.rec_insert_time >= now() - interval '10 days'
>foo_prod(# ORDER BY brand_id, last_change, log_id, rec_insert_time DESC)
>unique_cases;
>
>   QUERY PLAN

>-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> WindowAgg  (cost=24436329.10..24436343.56 rows=643 width=120)
>   ->  Sort  (cost=24436329.10..24436330.70 rows=643 width=104)
>         Sort Key: unique_cases.source, unique_cases.rec_insert_time
>         ->  Subquery Scan on unique_cases  (cost=24436286.24..24436299.10
>rows=643 width=104)
>               ->  Unique  (cost=24436286.24..24436292.67 rows=643
>width=124)
>                     ->  Sort  (cost=24436286.24..24436287.85 rows=643
>width=124)
>                           Sort Key: l.brand_id, l.last_change, l.log_id,
>l.rec_insert_time DESC
>                           ->  Nested Loop  (cost=0.00..24436256.25
>rows=643 width=124)
>                                 Join Filter: ((l.category)::text =
>filter.category)
>                                 ->  Seq Scan on small_join_table filter
> (cost=0.00..26.99 rows=1399 width=8)
>                                 ->  Materialize  (cost=0.00..24420487.02
>rows=643 width=99)
>                                       ->  Seq Scan on log_table l
> (cost=0.00..24420483.80 rows=643 width=99)
>                                             Filter: ((field1 IS NOT NULL)
>AND (category = 'music'::name) AND (rec_insert_time >= (now() - '10
>days'::interval)))
>(13 rows)
>
>foo_prod=# SELECT now() - interval '10 days';
>           ?column?
>-------------------------------
> 2019-10-01 08:20:38.115471-05
>(1 row)
>
>foo_prod=# EXPLAIN
>SELECT
> category, source, MIN(rec_insert_time) OVER (partition by source order by
>rec_insert_time) AS first_source_time, MAX(rec_insert_time) OVER (partition
>by source order by rec_insert_time) AS last_source_time
>FROM (SELECT DISTINCT ON (brand_id, last_change, log_id)
>category, source(field1) AS source, rec_insert_time
>FROM log_table l
>INNER JOIN public.small_join_table filter ON filter.category = l.category
>WHERE field1 IS NOT NULL AND l.category = 'music'
>AND l.rec_insert_time >= '2019-10-01 08:20:38.115471-05'
>ORDER BY brand_id, last_change, log_id, rec_insert_time DESC) unique_cases;
>
>  QUERY PLAN

>-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> WindowAgg  (cost=19664576.17..19664590.63 rows=643 width=120)
>   ->  Sort  (cost=19664576.17..19664577.77 rows=643 width=104)
>         Sort Key: unique_cases.source, unique_cases.rec_insert_time
>         ->  Subquery Scan on unique_cases  (cost=19664533.31..19664546.17
>rows=643 width=104)
>               ->  Unique  (cost=19664533.31..19664539.74 rows=643
>width=124)
>                     ->  Sort  (cost=19664533.31..19664534.92 rows=643
>width=124)
>                           Sort Key: l.brand_id, l.last_change, l.log_id,
>l.rec_insert_time DESC
>                           ->  Nested Loop  (cost=3181.19..19664503.32
>rows=643 width=124)
>                                 ->  Gather  (cost=3180.91..19662574.92
>rows=643 width=99)
>                                       Workers Planned: 3
>                                       ->  Parallel Bitmap Heap Scan on
>log_table l  (cost=2180.91..19661510.62 rows=207 width=99)
>                                             Recheck Cond: (rec_insert_time
>>= '2019-10-01 08:20:38.115471-05'::timestamp with time zone)
>                                             Filter: ((field1 IS NOT NULL)
>AND (category = 'music'::name))
>                                             ->  Bitmap Index Scan on
>rec_insert_time_brin_1000  (cost=0.00..2180.75 rows=142602171 width=0)
>                                                   Index Cond:
>(rec_insert_time >= '2019-10-01 08:20:38.115471-05'::timestamp with time
>zone)
>
>
>Let me know if this rings any bells!  I will respond to other comments with
>other replies.
>

My guess - it's (at least partially) due to cpu_operator_cost,
associated with the now() call. When replaced with a literal, this cost
disappears and so the total query cost decreases.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Connect as multiple users using single client certificate
Next
From: Kyle Bateman
Date:
Subject: Re: Connect as multiple users using single client certificate