Re: CTE materializing sets? - Mailing list pgsql-general

From Виктор Егоров
Subject Re: CTE materializing sets?
Date
Msg-id CAGnEbohkbvxQegibq5SZj+qoD_JERWNbq7xcnihqf+wbcjG-Kw@mail.gmail.com
Whole thread Raw
In response to Re: CTE materializing sets?  (Serge Fonville <serge.fonville@gmail.com>)
List pgsql-general
2012/10/9 Serge Fonville <serge.fonville@gmail.com>:
> This indeed is a very interesting question.
>
> At http://wiki.postgresql.org/wiki/CTEReadme it seems to suggest that a CTE
> is just rewritten and the resulting query is executed.

As was mentioned a couple of times in this list, CTE do have
optimization fence feature (per SQL Standard).
I asked on the #postgresql channel and was pointed, that typically you
get benefits of this feature
when you have to join grouping subquery to itself.

I went and did some tests. Table "attempt" contains e-mail delivery
attempts for the postfix:

# select relname,relpages,reltuples::numeric,pg_size_pretty(pg_relation_size(oid))
from pg_class where relname='attempt';
 relname | relpages | reltuples | pg_size_pretty
---------+----------+-----------+----------------
 attempt |   145117 |   4252530 | 1134 MB


My default work_mem is 1MB on this instance.

First, plain query with 2 subqueries:

# explain (analyze, buffers)
select a.eid, b.eid from
  (select recipient_email_id eid, count(*) cnt, min(tstamp) as minmsg,
max(tstamp) as maxmsg from attempt group by recipient_email_id) a,
  (select recipient_email_id eid, count(*) cnt, min(tstamp) as minmsg,
max(tstamp) as maxmsg from attempt group by recipient_email_id) b
where a.minmsg = b.maxmsg;

 QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1861911.11..1953183.16 rows=6067386 width=16)
(actual time=65758.378..66115.400 rows=59845 loops=1)
   Merge Cond: (a.minmsg = b.maxmsg)
   Buffers: shared hit=1590 read=288644, temp read=103129 written=103134
   ->  Sort  (cost=930955.56..931042.64 rows=34835 width=16) (actual
time=30242.503..30370.379 rows=212434 loops=1)
         Sort Key: a.minmsg
         Sort Method: external merge  Disk: 5400kB
         Buffers: shared hit=779 read=144338, temp read=51481 written=51481
         ->  Subquery Scan on a  (cost=873875.76..927729.06 rows=34835
width=16) (actual time=26744.434..30008.996 rows=212434 loops=1)
               Buffers: shared hit=779 read=144338, temp read=50561
written=50561
               ->  GroupAggregate  (cost=873875.76..927380.71
rows=34835 width=16) (actual time=26744.433..29951.390 rows=212434
loops=1)
                     Buffers: shared hit=779 read=144338, temp
read=50561 written=50561
                     ->  Sort  (cost=873875.76..884507.08 rows=4252528
width=16) (actual time=26744.273..28296.850 rows=4255749 loops=1)
                           Sort Key: public.attempt.recipient_email_id
                           Sort Method: external merge  Disk: 108168kB
                           Buffers: shared hit=779 read=144338, temp
read=50561 written=50561
                           ->  Seq Scan on attempt
(cost=0.00..187642.28 rows=4252528 width=16) (actual
time=0.010..13618.612 rows=4255749 loops=1)
                                 Buffers: shared hit=779 read=144338
   ->  Materialize  (cost=930955.56..931129.73 rows=34835 width=16)
(actual time=35515.860..35640.974 rows=214271 loops=1)
         Buffers: shared hit=811 read=144306, temp read=51648 written=51653
         ->  Sort  (cost=930955.56..931042.64 rows=34835 width=16)
(actual time=35515.853..35586.598 rows=210800 loops=1)
               Sort Key: b.maxmsg
               Sort Method: external merge  Disk: 5384kB
               Buffers: shared hit=811 read=144306, temp read=51648
written=51653
               ->  Subquery Scan on b  (cost=873875.76..927729.06
rows=34835 width=16) (actual time=31879.743..35251.218 rows=212434
loops=1)
                     Buffers: shared hit=811 read=144306, temp
read=50561 written=50561
                     ->  GroupAggregate  (cost=873875.76..927380.71
rows=34835 width=16) (actual time=31879.741..35184.965 rows=212434
loops=1)
                           Buffers: shared hit=811 read=144306, temp
read=50561 written=50561
                           ->  Sort  (cost=873875.76..884507.08
rows=4252528 width=16) (actual time=31879.577..33460.975 rows=4255749
loops=1)
                                 Sort Key: public.attempt.recipient_email_id
                                 Sort Method: external merge  Disk: 108168kB
                                 Buffers: shared hit=811 read=144306,
temp read=50561 written=50561
                                 ->  Seq Scan on attempt
(cost=0.00..187642.28 rows=4252528 width=16) (actual
time=0.012..17637.516 rows=4255749 loops=1)
                                       Buffers: shared hit=811 read=144306
 Total runtime: 67611.657 ms
(34 rows)

The source relation is scanned twice. Now, using CTE and it's
materialization feature:

# explain (analyze, buffers)
with msgs as (select recipient_email_id eid, count(*) cnt, min(tstamp)
as minmsg, max(tstamp) as maxmsg from attempt group by
recipient_email_id)
select a.eid, b.eid from msgs a, msgs b where a.minmsg=b.maxmsg;
                                                                QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=935227.10..1026499.15 rows=6067386 width=16)
(actual time=41104.560..41414.873 rows=59845 loops=1)
   Merge Cond: (a.minmsg = b.maxmsg)
   Buffers: shared hit=747 read=144370, temp read=53659 written=53663
   CTE msgs
     ->  GroupAggregate  (cost=873875.76..927380.71 rows=34835
width=16) (actual time=36786.597..40186.758 rows=212434 loops=1)
           Buffers: shared hit=747 read=144370, temp read=50561 written=50561
           ->  Sort  (cost=873875.76..884507.08 rows=4252528 width=16)
(actual time=36786.430..38434.272 rows=4255749 loops=1)
                 Sort Key: attempt.recipient_email_id
                 Sort Method: external merge  Disk: 108168kB
                 Buffers: shared hit=747 read=144370, temp read=50561
written=50561
                 ->  Seq Scan on attempt  (cost=0.00..187642.28
rows=4252528 width=16) (actual time=0.028..23059.290 rows=4255749
loops=1)
                       Buffers: shared hit=747 read=144370
   ->  Sort  (cost=3923.20..4010.28 rows=34835 width=16) (actual
time=40756.464..40836.892 rows=212434 loops=1)
         Sort Key: a.minmsg
         Sort Method: external merge  Disk: 5400kB
         Buffers: shared hit=747 read=144370, temp read=51481 written=52570
         ->  CTE Scan on msgs a  (cost=0.00..696.70 rows=34835
width=16) (actual time=36786.604..40439.648 rows=212434 loops=1)
               Buffers: shared hit=747 read=144370, temp read=50561
written=51650
   ->  Materialize  (cost=3923.20..4097.37 rows=34835 width=16)
(actual time=348.080..473.192 rows=214271 loops=1)
         Buffers: temp read=2178 written=1093
         ->  Sort  (cost=3923.20..4010.28 rows=34835 width=16) (actual
time=348.073..418.229 rows=210800 loops=1)
               Sort Key: b.maxmsg
               Sort Method: external merge  Disk: 5384kB
               Buffers: temp read=2178 written=1093
               ->  CTE Scan on msgs b  (cost=0.00..696.70 rows=34835
width=16) (actual time=0.055..75.441 rows=212434 loops=1)
                     Buffers: temp read=1091 written=1
 Total runtime: 43122.214 ms
(27 rows)

The source relation is scanned only once, but system does a lot of IO
with temp files.
Now small tweak and another try:

# set work_mem = '300MB';
SET
# explain (analyze, buffers)
with msgs as (select recipient_email_id eid, count(*) cnt, min(tstamp)
as minmsg, max(tstamp) as maxmsg from attempt group by
recipient_email_id)
select a.eid, b.eid from msgs a, msgs b where a.minmsg=b.maxmsg;
                                                             QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=237165.30..328350.27 rows=6067386 width=16) (actual
time=31494.137..31768.606 rows=59845 loops=1)
   Merge Cond: (a.minmsg = b.maxmsg)
   Buffers: shared hit=843 read=144274
   CTE msgs
     ->  HashAggregate  (cost=230167.56..230515.91 rows=34835
width=16) (actual time=30396.065..30520.867 rows=212434 loops=1)
           Buffers: shared hit=843 read=144274
           ->  Seq Scan on attempt  (cost=0.00..187642.28 rows=4252528
width=16) (actual time=0.077..25349.466 rows=4255749 loops=1)
                 Buffers: shared hit=843 read=144274
   ->  Sort  (cost=3324.70..3411.78 rows=34835 width=16) (actual
time=31115.205..31247.966 rows=212434 loops=1)
         Sort Key: a.minmsg
         Sort Method: quicksort  Memory: 16102kB
         Buffers: shared hit=843 read=144274
         ->  CTE Scan on msgs a  (cost=0.00..696.70 rows=34835
width=16) (actual time=30396.129..30780.369 rows=212434 loops=1)
               Buffers: shared hit=843 read=144274
   ->  Sort  (cost=3324.70..3411.78 rows=34835 width=16) (actual
time=377.917..424.277 rows=214271 loops=1)
         Sort Key: b.maxmsg
         Sort Method: quicksort  Memory: 16102kB
         ->  CTE Scan on msgs b  (cost=0.00..696.70 rows=34835
width=16) (actual time=0.005..205.626 rows=212434 loops=1)
 Total runtime: 31816.000 ms
(19 rows)

As long as I read the output, source table is scanned only once, all
further operations are done in RAM.
P.S. Not a perfect query, but shows the feature.


--
Victor Y. Yegorov


pgsql-general by date:

Previous
From: Craig Ringer
Date:
Subject: Re: CTE materializing sets?
Next
From: Willy-Bas Loos
Date:
Subject: something better than pgtrgm?