Re: CTE materializing sets? - Mailing list pgsql-general
From | Виктор Егоров |
---|---|
Subject | Re: CTE materializing sets? |
Date | |
Msg-id | CAGnEbohkbvxQegibq5SZj+qoD_JERWNbq7xcnihqf+wbcjG-Kw@mail.gmail.com Whole thread Raw |
In response to | Re: CTE materializing sets? (Serge Fonville <serge.fonville@gmail.com>) |
List | pgsql-general |
2012/10/9 Serge Fonville <serge.fonville@gmail.com>: > This indeed is a very interesting question. > > At http://wiki.postgresql.org/wiki/CTEReadme it seems to suggest that a CTE > is just rewritten and the resulting query is executed. As was mentioned a couple of times in this list, CTE do have optimization fence feature (per SQL Standard). I asked on the #postgresql channel and was pointed, that typically you get benefits of this feature when you have to join grouping subquery to itself. I went and did some tests. Table "attempt" contains e-mail delivery attempts for the postfix: # select relname,relpages,reltuples::numeric,pg_size_pretty(pg_relation_size(oid)) from pg_class where relname='attempt'; relname | relpages | reltuples | pg_size_pretty ---------+----------+-----------+---------------- attempt | 145117 | 4252530 | 1134 MB My default work_mem is 1MB on this instance. First, plain query with 2 subqueries: # explain (analyze, buffers) select a.eid, b.eid from (select recipient_email_id eid, count(*) cnt, min(tstamp) as minmsg, max(tstamp) as maxmsg from attempt group by recipient_email_id) a, (select recipient_email_id eid, count(*) cnt, min(tstamp) as minmsg, max(tstamp) as maxmsg from attempt group by recipient_email_id) b where a.minmsg = b.maxmsg; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=1861911.11..1953183.16 rows=6067386 width=16) (actual time=65758.378..66115.400 rows=59845 loops=1) Merge Cond: (a.minmsg = b.maxmsg) Buffers: shared hit=1590 read=288644, temp read=103129 written=103134 -> Sort (cost=930955.56..931042.64 rows=34835 width=16) (actual time=30242.503..30370.379 rows=212434 loops=1) Sort Key: a.minmsg Sort Method: external merge Disk: 5400kB Buffers: shared hit=779 read=144338, temp read=51481 written=51481 -> Subquery Scan on a (cost=873875.76..927729.06 rows=34835 width=16) (actual time=26744.434..30008.996 rows=212434 loops=1) Buffers: shared hit=779 read=144338, temp read=50561 written=50561 -> GroupAggregate (cost=873875.76..927380.71 rows=34835 width=16) (actual time=26744.433..29951.390 rows=212434 loops=1) Buffers: shared hit=779 read=144338, temp read=50561 written=50561 -> Sort (cost=873875.76..884507.08 rows=4252528 width=16) (actual time=26744.273..28296.850 rows=4255749 loops=1) Sort Key: public.attempt.recipient_email_id Sort Method: external merge Disk: 108168kB Buffers: shared hit=779 read=144338, temp read=50561 written=50561 -> Seq Scan on attempt (cost=0.00..187642.28 rows=4252528 width=16) (actual time=0.010..13618.612 rows=4255749 loops=1) Buffers: shared hit=779 read=144338 -> Materialize (cost=930955.56..931129.73 rows=34835 width=16) (actual time=35515.860..35640.974 rows=214271 loops=1) Buffers: shared hit=811 read=144306, temp read=51648 written=51653 -> Sort (cost=930955.56..931042.64 rows=34835 width=16) (actual time=35515.853..35586.598 rows=210800 loops=1) Sort Key: b.maxmsg Sort Method: external merge Disk: 5384kB Buffers: shared hit=811 read=144306, temp read=51648 written=51653 -> Subquery Scan on b (cost=873875.76..927729.06 rows=34835 width=16) (actual time=31879.743..35251.218 rows=212434 loops=1) Buffers: shared hit=811 read=144306, temp read=50561 written=50561 -> GroupAggregate (cost=873875.76..927380.71 rows=34835 width=16) (actual time=31879.741..35184.965 rows=212434 loops=1) Buffers: shared hit=811 read=144306, temp read=50561 written=50561 -> Sort (cost=873875.76..884507.08 rows=4252528 width=16) (actual time=31879.577..33460.975 rows=4255749 loops=1) Sort Key: public.attempt.recipient_email_id Sort Method: external merge Disk: 108168kB Buffers: shared hit=811 read=144306, temp read=50561 written=50561 -> Seq Scan on attempt (cost=0.00..187642.28 rows=4252528 width=16) (actual time=0.012..17637.516 rows=4255749 loops=1) Buffers: shared hit=811 read=144306 Total runtime: 67611.657 ms (34 rows) The source relation is scanned twice. Now, using CTE and it's materialization feature: # explain (analyze, buffers) with msgs as (select recipient_email_id eid, count(*) cnt, min(tstamp) as minmsg, max(tstamp) as maxmsg from attempt group by recipient_email_id) select a.eid, b.eid from msgs a, msgs b where a.minmsg=b.maxmsg; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=935227.10..1026499.15 rows=6067386 width=16) (actual time=41104.560..41414.873 rows=59845 loops=1) Merge Cond: (a.minmsg = b.maxmsg) Buffers: shared hit=747 read=144370, temp read=53659 written=53663 CTE msgs -> GroupAggregate (cost=873875.76..927380.71 rows=34835 width=16) (actual time=36786.597..40186.758 rows=212434 loops=1) Buffers: shared hit=747 read=144370, temp read=50561 written=50561 -> Sort (cost=873875.76..884507.08 rows=4252528 width=16) (actual time=36786.430..38434.272 rows=4255749 loops=1) Sort Key: attempt.recipient_email_id Sort Method: external merge Disk: 108168kB Buffers: shared hit=747 read=144370, temp read=50561 written=50561 -> Seq Scan on attempt (cost=0.00..187642.28 rows=4252528 width=16) (actual time=0.028..23059.290 rows=4255749 loops=1) Buffers: shared hit=747 read=144370 -> Sort (cost=3923.20..4010.28 rows=34835 width=16) (actual time=40756.464..40836.892 rows=212434 loops=1) Sort Key: a.minmsg Sort Method: external merge Disk: 5400kB Buffers: shared hit=747 read=144370, temp read=51481 written=52570 -> CTE Scan on msgs a (cost=0.00..696.70 rows=34835 width=16) (actual time=36786.604..40439.648 rows=212434 loops=1) Buffers: shared hit=747 read=144370, temp read=50561 written=51650 -> Materialize (cost=3923.20..4097.37 rows=34835 width=16) (actual time=348.080..473.192 rows=214271 loops=1) Buffers: temp read=2178 written=1093 -> Sort (cost=3923.20..4010.28 rows=34835 width=16) (actual time=348.073..418.229 rows=210800 loops=1) Sort Key: b.maxmsg Sort Method: external merge Disk: 5384kB Buffers: temp read=2178 written=1093 -> CTE Scan on msgs b (cost=0.00..696.70 rows=34835 width=16) (actual time=0.055..75.441 rows=212434 loops=1) Buffers: temp read=1091 written=1 Total runtime: 43122.214 ms (27 rows) The source relation is scanned only once, but system does a lot of IO with temp files. Now small tweak and another try: # set work_mem = '300MB'; SET # explain (analyze, buffers) with msgs as (select recipient_email_id eid, count(*) cnt, min(tstamp) as minmsg, max(tstamp) as maxmsg from attempt group by recipient_email_id) select a.eid, b.eid from msgs a, msgs b where a.minmsg=b.maxmsg; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=237165.30..328350.27 rows=6067386 width=16) (actual time=31494.137..31768.606 rows=59845 loops=1) Merge Cond: (a.minmsg = b.maxmsg) Buffers: shared hit=843 read=144274 CTE msgs -> HashAggregate (cost=230167.56..230515.91 rows=34835 width=16) (actual time=30396.065..30520.867 rows=212434 loops=1) Buffers: shared hit=843 read=144274 -> Seq Scan on attempt (cost=0.00..187642.28 rows=4252528 width=16) (actual time=0.077..25349.466 rows=4255749 loops=1) Buffers: shared hit=843 read=144274 -> Sort (cost=3324.70..3411.78 rows=34835 width=16) (actual time=31115.205..31247.966 rows=212434 loops=1) Sort Key: a.minmsg Sort Method: quicksort Memory: 16102kB Buffers: shared hit=843 read=144274 -> CTE Scan on msgs a (cost=0.00..696.70 rows=34835 width=16) (actual time=30396.129..30780.369 rows=212434 loops=1) Buffers: shared hit=843 read=144274 -> Sort (cost=3324.70..3411.78 rows=34835 width=16) (actual time=377.917..424.277 rows=214271 loops=1) Sort Key: b.maxmsg Sort Method: quicksort Memory: 16102kB -> CTE Scan on msgs b (cost=0.00..696.70 rows=34835 width=16) (actual time=0.005..205.626 rows=212434 loops=1) Total runtime: 31816.000 ms (19 rows) As long as I read the output, source table is scanned only once, all further operations are done in RAM. P.S. Not a perfect query, but shows the feature. -- Victor Y. Yegorov
pgsql-general by date: