Re: BRIN index which is much faster never chosen by planner - Mailing list pgsql-hackers
From | Tomas Vondra
---|---
Subject | Re: BRIN index which is much faster never chosen by planner
Date |
Msg-id | 20191011184620.fopsssbxwmcouitl@development
In response to | Re: BRIN index which is much faster never chosen by planner (Jeremy Finzel <finzelj@gmail.com>)
List | pgsql-hackers
On Fri, Oct 11, 2019 at 09:08:05AM -0500, Jeremy Finzel wrote:
>On Thu, Oct 10, 2019 at 7:22 PM David Rowley <david.rowley@2ndquadrant.com>
>wrote:
>
>> The planner might be able to get a better estimate on the number of
>> matching rows if the now() - interval '10 days' expression was
>> replaced with 'now'::timestamptz - interval '10 days'. However, care
>> would need to be taken to ensure the plan is never prepared since
>> 'now' is evaluated during parse. The same care must be taken when
>> creating views, functions, stored procedures and the like.
>>
>
>You are on to something here I think with the now() function, even if above
>suggestion is not exactly right as you said further down. I am finding a
>hard-coded timestamp gives the right query plan. I also tested same with
>even bigger window (last 16 days) and it yet still chooses the brin index.
>
>foo_prod=# EXPLAIN
>foo_prod-# SELECT
>foo_prod-# category, source, MIN(rec_insert_time) OVER (partition by source order by rec_insert_time) AS first_source_time, MAX(rec_insert_time) OVER (partition by source order by rec_insert_time) AS last_source_time
>foo_prod-# FROM (SELECT DISTINCT ON (brand_id, last_change, log_id)
>foo_prod(# category, source(field1) AS source, rec_insert_time
>foo_prod(# FROM log_table l
>foo_prod(# INNER JOIN public.small_join_table filter ON filter.category = l.category
>foo_prod(# WHERE field1 IS NOT NULL AND l.category = 'music'
>foo_prod(# AND l.rec_insert_time >= now() - interval '10 days'
>foo_prod(# ORDER BY brand_id, last_change, log_id, rec_insert_time DESC) unique_cases;
>
>                                                                  QUERY PLAN
>-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> WindowAgg  (cost=24436329.10..24436343.56 rows=643 width=120)
>   ->  Sort  (cost=24436329.10..24436330.70 rows=643 width=104)
>         Sort Key: unique_cases.source, unique_cases.rec_insert_time
>         ->  Subquery Scan on unique_cases  (cost=24436286.24..24436299.10 rows=643 width=104)
>               ->  Unique  (cost=24436286.24..24436292.67 rows=643 width=124)
>                     ->  Sort  (cost=24436286.24..24436287.85 rows=643 width=124)
>                           Sort Key: l.brand_id, l.last_change, l.log_id, l.rec_insert_time DESC
>                           ->  Nested Loop  (cost=0.00..24436256.25 rows=643 width=124)
>                                 Join Filter: ((l.category)::text = filter.category)
>                                 ->  Seq Scan on small_join_table filter  (cost=0.00..26.99 rows=1399 width=8)
>                                 ->  Materialize  (cost=0.00..24420487.02 rows=643 width=99)
>                                       ->  Seq Scan on log_table l  (cost=0.00..24420483.80 rows=643 width=99)
>                                             Filter: ((field1 IS NOT NULL) AND (category = 'music'::name) AND (rec_insert_time >= (now() - '10 days'::interval)))
>(13 rows)
>
>foo_prod=# SELECT now() - interval '10 days';
>           ?column?
>-------------------------------
> 2019-10-01 08:20:38.115471-05
>(1 row)
>
>foo_prod=# EXPLAIN
>SELECT
> category, source, MIN(rec_insert_time) OVER (partition by source order by rec_insert_time) AS first_source_time, MAX(rec_insert_time) OVER (partition by source order by rec_insert_time) AS last_source_time
>FROM (SELECT DISTINCT ON (brand_id, last_change, log_id)
>category, source(field1) AS source, rec_insert_time
>FROM log_table l
>INNER JOIN public.small_join_table filter ON filter.category = l.category
>WHERE field1 IS NOT NULL AND l.category = 'music'
>AND l.rec_insert_time >= '2019-10-01 08:20:38.115471-05'
>ORDER BY brand_id, last_change, log_id, rec_insert_time DESC) unique_cases;
>
>                                                                 QUERY PLAN
>-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> WindowAgg  (cost=19664576.17..19664590.63 rows=643 width=120)
>   ->  Sort  (cost=19664576.17..19664577.77 rows=643 width=104)
>         Sort Key: unique_cases.source, unique_cases.rec_insert_time
>         ->  Subquery Scan on unique_cases  (cost=19664533.31..19664546.17 rows=643 width=104)
>               ->  Unique  (cost=19664533.31..19664539.74 rows=643 width=124)
>                     ->  Sort  (cost=19664533.31..19664534.92 rows=643 width=124)
>                           Sort Key: l.brand_id, l.last_change, l.log_id, l.rec_insert_time DESC
>                           ->  Nested Loop  (cost=3181.19..19664503.32 rows=643 width=124)
>                                 ->  Gather  (cost=3180.91..19662574.92 rows=643 width=99)
>                                       Workers Planned: 3
>                                       ->  Parallel Bitmap Heap Scan on log_table l  (cost=2180.91..19661510.62 rows=207 width=99)
>                                             Recheck Cond: (rec_insert_time >= '2019-10-01 08:20:38.115471-05'::timestamp with time zone)
>                                             Filter: ((field1 IS NOT NULL) AND (category = 'music'::name))
>                                             ->  Bitmap Index Scan on rec_insert_time_brin_1000  (cost=0.00..2180.75 rows=142602171 width=0)
>                                                   Index Cond: (rec_insert_time >= '2019-10-01 08:20:38.115471-05'::timestamp with time zone)
>
>
>Let me know if this rings any bells! I will respond to other comments with
>other replies.
>

My guess: it's (at least partially) due to the cpu_operator_cost associated
with the now() call. When the call is replaced with a literal, that cost
disappears and so the total query cost decreases.

regards

-- 
Tomas Vondra                      http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
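PS. A rough way to sanity-check that guess (just a sketch reusing the table
and column names from your query, not something I've run against your data):
force both variants down the seq scan path and see whether the cost gap is
in the ballpark of cpu_operator_cost times the number of rows scanned.

SET enable_bitmapscan = off;
SET enable_indexscan = off;

-- per-row filter cost here includes the now() call and the subtraction
EXPLAIN SELECT * FROM log_table
WHERE rec_insert_time >= now() - interval '10 days';

-- same predicate with a precomputed literal; only the >= comparison remains
EXPLAIN SELECT * FROM log_table
WHERE rec_insert_time >= '2019-10-01 08:20:38.115471-05';

-- the planner's per-row charge for each operator/function call (0.0025 by default)
SHOW cpu_operator_cost;

RESET enable_bitmapscan;
RESET enable_indexscan;

If the two seq scan estimates differ by roughly that amount, it's the cost
arithmetic rather than a different row estimate that tips the comparison
against the BRIN bitmap scan.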