Re: Any way to get nested loop index joins on CTEs? - Mailing list pgsql-performance

From Renan Alves Fonseca
Subject Re: Any way to get nested loop index joins on CTEs?
Date
Msg-id 87seharskl.fsf@gmail.com
Whole thread Raw
List pgsql-performance
Hi,

it is definitively possible to get nested loop joins on successively
aggregated CTEs. However, for the index to be used, it must exist. And
you can only create the index on a real table, not on the intermediate
CTEs.

> WITH series1h AS MATERIALIZED (SELECT generate_series AS ts FROM
> generate_series('1990-01-01 00:00', '1990-12-31 23:59', INTERVAL '1
> hour') ORDER BY ts),
> series15m AS MATERIALIZED (SELECT generate_series AS ts FROM
> generate_series('1990-01-01 00:00', '1990-12-31 23:59', INTERVAL '15
> minutes') ORDER BY ts)
> SELECT count(*) FROM (SELECT h1.ts, count(*) FROM series1h h1 JOIN
> series15m m15 ON (m15.ts > (h1.ts - INTERVAL '1 hour') AND  m15.ts <=
> h1.ts ) GROUP BY h1.ts ORDER BY h1.ts);

Here is an example based on the query above:


CREATE TEMP TABLE series15m AS
    (SELECT
        generate_series('1990-01-01 00:00',
                        '1990-12-31 23:59',
                        INTERVAL '15 minutes')::timestamp as ts,
        random() as your_data
    ORDER BY 1);

CREATE INDEX short_period ON series15m (ts);
CREATE INDEX long_period ON series15m (date_trunc('hour',ts));

EXPLAIN (costs 'off')
WITH series1h AS (SELECT date_trunc('hour',ts) as ts FROM series15m)
SELECT
  h1.ts,
  sum(your_data)
FROM series1h h1
JOIN series15m m15 ON
  (m15.ts >(h1.ts - INTERVAL '1 hour') AND  m15.ts <= h1.ts )
GROUP BY h1.ts
ORDER BY h1.ts ;

                        QUERY PLAN                                                                       
-------------------------------------------------------------------
 GroupAggregate
   Group Key: date_trunc('hour'::text, series15m.ts)
   ->  Nested Loop
         ->  Index Scan using long_period on series15m
         ->  Index Scan using short_period on series15m m15
               Index Cond: (...)


So, the general idea is to create functional indexes (long_period in
this example) on the base table that will cover the aggregate keys of
intermediate CTEs. This approach works as long as the aggregate keys
depend only on one table.

Best regards,
Renan Fonseca



pgsql-performance by date:

Previous
From: Laurenz Albe
Date:
Subject: Re: Keeping some tables in cache
Next
From: Алексей Борщёв
Date:
Subject: Planner makes sub-optimal execution plan