Home > mailing lists

Subplan result caching - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Subplan result caching
Date	May 23, 2018 12:31:40
Msg-id	daceb327-9a20-51f4-fe6c-60b898692305@iki.fi Whole thread Raw
Responses	Re: Subplan result caching Re: Subplan result caching Re: Subplan result caching Re: Subplan result caching Re: Subplan result caching
List	pgsql-hackers

Tree view

Hi,

I've been working on a patch to add a little cache to SubPlans, to speed 
up queries with correlated subqueries, where the same subquery is 
currently executed multiple times with the same parameters. The idea is 
to cache the result of the subplan, with the correlation vars as the 
cache key.

That helps a lot, if you happen to have that kind of a query. I bumped 
into this while looking at TPC-DS query 6:

select a.ca_state state, count(*) cnt
  from customer_address a
      ,customer c
      ,store_sales s
      ,date_dim d
      ,item i
  where       a.ca_address_sk = c.c_current_addr_sk
         and c.c_customer_sk = s.ss_customer_sk
         and s.ss_sold_date_sk = d.d_date_sk
         and s.ss_item_sk = i.i_item_sk
         and d.d_month_seq =
              (select distinct (d_month_seq)
               from date_dim
                where d_year = 2000
                 and d_moy = 5 )
         and i.i_current_price > 1.2 *
              (select avg(j.i_current_price)
              from item j
              where j.i_category = i.i_category)
  group by a.ca_state
  having count(*) >= 10
  order by cnt

The first subquery is uncorrelated, and is already handled efficiently 
as an InitPlan. This patch helps with the second subquery. There are 
only 11 different categories, but we currently re-execute it for every 
row of the outer query, over 26000 times. (I think I have about 1 GB of 
data in my little database I've been testing with, I'm not sure how this 
would scale with the amount of data.) With this patch, it's only 
executed 11 times, the cache avoids the rest of the executions. That 
brings the runtime, on my laptop, from about 30 s to 120 ms.

For this particular query, I actually wish we could pull up that 
subquery, instead. I did some investigation into that last summer, 
https://www.postgresql.org/message-id/67e353e8-c20e-7980-a282-538779edf25b%40iki.fi, 
but that's a much bigger project. In any case, even if the planner was 
able to pull up subqueries in more cases, a cache like this would still 
be helpful for those cases where pulling up was still not possible.

Thoughts?

- Heikki

Attachment

pgsql-hackers by date:

From: Magnus Hagander
Date: 23 May 2018, 12:15:28
Subject: Re: SCRAM with channel binding downgrade attack

From: Laurenz Albe
Date: 23 May 2018, 13:03:30
Subject: Re: Subplan result caching

Subplan result caching - Mailing list pgsql-hackers

Attachment

Previous

Next