Subplan result caching - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Subplan result caching
Date
Msg-id daceb327-9a20-51f4-fe6c-60b898692305@iki.fi
Whole thread Raw
Responses Re: Subplan result caching  (Laurenz Albe <laurenz.albe@cybertec.at>)
Re: Subplan result caching  (David Rowley <david.rowley@2ndquadrant.com>)
Re: Subplan result caching  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Subplan result caching  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Re: Subplan result caching  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
Hi,

I've been working on a patch to add a little cache to SubPlans, to speed 
up queries with correlated subqueries, where the same subquery is 
currently executed multiple times with the same parameters. The idea is 
to cache the result of the subplan, with the correlation vars as the 
cache key.

That helps a lot, if you happen to have that kind of a query. I bumped 
into this while looking at TPC-DS query 6:

select a.ca_state state, count(*) cnt
  from customer_address a
      ,customer c
      ,store_sales s
      ,date_dim d
      ,item i
  where       a.ca_address_sk = c.c_current_addr_sk
         and c.c_customer_sk = s.ss_customer_sk
         and s.ss_sold_date_sk = d.d_date_sk
         and s.ss_item_sk = i.i_item_sk
         and d.d_month_seq =
              (select distinct (d_month_seq)
               from date_dim
                where d_year = 2000
                 and d_moy = 5 )
         and i.i_current_price > 1.2 *
              (select avg(j.i_current_price)
              from item j
              where j.i_category = i.i_category)
  group by a.ca_state
  having count(*) >= 10
  order by cnt

The first subquery is uncorrelated, and is already handled efficiently 
as an InitPlan. This patch helps with the second subquery. There are 
only 11 different categories, but we currently re-execute it for every 
row of the outer query, over 26000 times. (I think I have about 1 GB of 
data in my little database I've been testing with, I'm not sure how this 
would scale with the amount of data.) With this patch, it's only 
executed 11 times, the cache avoids the rest of the executions. That 
brings the runtime, on my laptop, from about 30 s to 120 ms.

For this particular query, I actually wish we could pull up that 
subquery, instead. I did some investigation into that last summer, 
https://www.postgresql.org/message-id/67e353e8-c20e-7980-a282-538779edf25b%40iki.fi, 
but that's a much bigger project. In any case, even if the planner was 
able to pull up subqueries in more cases, a cache like this would still 
be helpful for those cases where pulling up was still not possible.

Thoughts?

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: SCRAM with channel binding downgrade attack
Next
From: Laurenz Albe
Date:
Subject: Re: Subplan result caching