Re: BRIN index which is much faster never chosen by planner - Mailing list pgsql-hackers
From: Jeremy Finzel
Subject: Re: BRIN index which is much faster never chosen by planner
Msg-id: CAMa1XUhdoKchMmbeSef7HhRM44-_00eMt8LPeBuwPA4UrSNc-Q@mail.gmail.com
In response to: Re: BRIN index which is much faster never chosen by planner (David Rowley <david.rowley@2ndquadrant.com>)
Responses: Re: BRIN index which is much faster never chosen by planner
List: pgsql-hackers
On Thu, Oct 10, 2019 at 7:22 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
> The planner might be able to get a better estimate on the number of
> matching rows if the now() - interval '10 days' expression was
> replaced with 'now'::timestamptz - interval '10 days'. However, care
> would need to be taken to ensure the plan is never prepared since
> 'now' is evaluated during parse. The same care must be taken when
> creating views, functions, stored procedures and the like.
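To spell out that form (a sketch only, against the same table and column as below; I did not run this):

-- 'now' is resolved to a constant as soon as the literal is parsed, so
-- the planner sees a concrete timestamp and can estimate selectivity
-- from the column statistics:
EXPLAIN
SELECT count(*)
FROM log_table
WHERE rec_insert_time >= 'now'::timestamptz - interval '10 days';

-- The pitfall David mentions: in a prepared statement (or a view or
-- function body), that constant is frozen at parse time and is not
-- re-evaluated on each execution:
PREPARE recent_rows AS
SELECT count(*)
FROM log_table
WHERE rec_insert_time >= 'now'::timestamptz - interval '10 days';

EXECUTE recent_rows;  -- run tomorrow, this still uses today's cutoff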
I think you are on to something here with the now() function, even if the above suggestion is not exactly right, as you said further down. I am finding that a hard-coded timestamp gives the right query plan. I also tested the same with an even bigger window (the last 16 days), and it still chooses the BRIN index.
foo_prod=# EXPLAIN
foo_prod-# SELECT
foo_prod-# category, source, MIN(rec_insert_time) OVER (partition by source order by rec_insert_time) AS first_source_time, MAX(rec_insert_time) OVER (partition by source order by rec_insert_time) AS last_source_time
foo_prod-# FROM (SELECT DISTINCT ON (brand_id, last_change, log_id)
foo_prod(# category, source(field1) AS source, rec_insert_time
foo_prod(# FROM log_table l
foo_prod(# INNER JOIN public.small_join_table filter ON filter.category = l.category
foo_prod(# WHERE field1 IS NOT NULL AND l.category = 'music'
foo_prod(# AND l.rec_insert_time >= now() - interval '10 days'
foo_prod(# ORDER BY brand_id, last_change, log_id, rec_insert_time DESC) unique_cases;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=24436329.10..24436343.56 rows=643 width=120)
   ->  Sort  (cost=24436329.10..24436330.70 rows=643 width=104)
         Sort Key: unique_cases.source, unique_cases.rec_insert_time
         ->  Subquery Scan on unique_cases  (cost=24436286.24..24436299.10 rows=643 width=104)
               ->  Unique  (cost=24436286.24..24436292.67 rows=643 width=124)
                     ->  Sort  (cost=24436286.24..24436287.85 rows=643 width=124)
                           Sort Key: l.brand_id, l.last_change, l.log_id, l.rec_insert_time DESC
                           ->  Nested Loop  (cost=0.00..24436256.25 rows=643 width=124)
                                 Join Filter: ((l.category)::text = filter.category)
                                 ->  Seq Scan on small_join_table filter  (cost=0.00..26.99 rows=1399 width=8)
                                 ->  Materialize  (cost=0.00..24420487.02 rows=643 width=99)
                                       ->  Seq Scan on log_table l  (cost=0.00..24420483.80 rows=643 width=99)
                                             Filter: ((field1 IS NOT NULL) AND (category = 'music'::name) AND (rec_insert_time >= (now() - '10 days'::interval)))
(13 rows)
foo_prod=# SELECT now() - interval '10 days';
?column?
-------------------------------
2019-10-01 08:20:38.115471-05
(1 row)
foo_prod=# EXPLAIN
SELECT
category, source, MIN(rec_insert_time) OVER (partition by source order by rec_insert_time) AS first_source_time, MAX(rec_insert_time) OVER (partition by source order by rec_insert_time) AS last_source_time
FROM (SELECT DISTINCT ON (brand_id, last_change, log_id)
category, source(field1) AS source, rec_insert_time
FROM log_table l
INNER JOIN public.small_join_table filter ON filter.category = l.category
WHERE field1 IS NOT NULL AND l.category = 'music'
AND l.rec_insert_time >= '2019-10-01 08:20:38.115471-05'
ORDER BY brand_id, last_change, log_id, rec_insert_time DESC) unique_cases;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=19664576.17..19664590.63 rows=643 width=120)
   ->  Sort  (cost=19664576.17..19664577.77 rows=643 width=104)
         Sort Key: unique_cases.source, unique_cases.rec_insert_time
         ->  Subquery Scan on unique_cases  (cost=19664533.31..19664546.17 rows=643 width=104)
               ->  Unique  (cost=19664533.31..19664539.74 rows=643 width=124)
                     ->  Sort  (cost=19664533.31..19664534.92 rows=643 width=124)
                           Sort Key: l.brand_id, l.last_change, l.log_id, l.rec_insert_time DESC
                           ->  Nested Loop  (cost=3181.19..19664503.32 rows=643 width=124)
                                 ->  Gather  (cost=3180.91..19662574.92 rows=643 width=99)
                                       Workers Planned: 3
                                       ->  Parallel Bitmap Heap Scan on log_table l  (cost=2180.91..19661510.62 rows=207 width=99)
                                             Recheck Cond: (rec_insert_time >= '2019-10-01 08:20:38.115471-05'::timestamp with time zone)
                                             Filter: ((field1 IS NOT NULL) AND (category = 'music'::name))
                                             ->  Bitmap Index Scan on rec_insert_time_brin_1000  (cost=0.00..2180.75 rows=142602171 width=0)
                                                   Index Cond: (rec_insert_time >= '2019-10-01 08:20:38.115471-05'::timestamp with time zone)
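Incidentally, one way to get the constant-literal plan without hand-editing the query each time would be to inline the computed cutoff as a literal and plan at execution time, e.g. from PL/pgSQL (a sketch only, with the count(*) standing in for the real query; table and column names as above, otherwise untested):

DO $$
DECLARE
  cutoff timestamptz := now() - interval '10 days';
  n      bigint;
BEGIN
  -- format(..., %L) embeds the cutoff as a quoted literal, and EXECUTE
  -- plans the statement fresh on each run, so the planner sees a
  -- constant just like the hand-edited query above.
  EXECUTE format(
    'SELECT count(*) FROM log_table WHERE rec_insert_time >= %L::timestamptz',
    cutoff)
  INTO n;
  RAISE NOTICE 'rows in the last 10 days: %', n;
END
$$;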
Let me know if this rings any bells! I will respond to the other comments in separate replies.
Thanks,
Jeremy