Re: Poor performance using CTE - Mailing list pgsql-performance
From | Merlin Moncure |
---|---|
Subject | Re: Poor performance using CTE |
Date | |
Msg-id | CAHyXU0xeEVX2P=HaNxPiKayEhkWp3GobgAXCQPrfsqbawXVV-w@mail.gmail.com Whole thread Raw |
In response to | Poor performance using CTE (David Greco <David_Greco@harte-hanks.com>) |
Responses |
Re: Poor performance using CTE
|
List | pgsql-performance |
On Tue, Nov 13, 2012 at 2:57 PM, David Greco <David_Greco@harte-hanks.com> wrote: > Have a query using a CTE that is performing very poorly. The equivalent > query against the same data in an Oracle database runs in under 1 second, in > Postgres it takes 2000 seconds. > > > > The smp_pkg.get_invoice_charges queries fedexinvoices for some data and > normalizes it into a SETOF some record type. It is declared to be STABLE. > Fedexinvoices consists of about 1.3M rows of medium width. Fedexinvoices.id > is the primary key on that table, and trim(fedexinvoices.trackno) is indexed > via the function trim. > > > > The plan for the equivalent query in Oracle is much smaller and simpler. No > sequential (or full table) scans on fedexinvoices. > > > > > > > > WITH charges as ( > > SELECT fi2.id, smp_pkg.get_invoice_charges(fi2.id) > charge_info from fedexinvoices fi2 > > ) > > select fedexinvoices.* from > > fedexinvoices > > inner join charges on charges.id = fedexinvoices.id AND > (charges.charge_info).charge_name IN ('ADDRESS CORRECTION CHARGE','ADDRESS > CORRECTION') > > where > > trim(fedexinvoices.trackno)='799159791643' > > ; > > > > Explain Analyze output, I abbreviated some of the column lists for brevity: > > > > Nested Loop (cost=457380.38..487940.77 rows=1 width=1024) (actual > time=1978019.858..1978019.858 rows=0 loops=1) > > Output: fedexinvoices.id, ……… > > Join Filter: (fedexinvoices.id = charges.id) > > Buffers: shared hit=20387611, temp written=94071 > > CTE charges > > -> Seq Scan on hits.fedexinvoices fi2 (cost=0.00..457380.38 > rows=1350513 width=8) (actual time=0.613..1964632.763 rows=9007863 loops=1) > > Output: fi2.id, smp_pkg.get_invoice_charges(fi2.id, > NULL::character varying) > > Buffers: shared hit=20387606 > > -> Index Scan using fedexinvoices_trim_track_idx on hits.fedexinvoices > (cost=0.00..5.46 rows=1 width=1024) (actual time=0.024..0.026 rows=1 > loops=1) > > Output: fedexinvoices.id, ……… > > Index Cond: (btrim((fedexinvoices.trackno)::text) = > '799159791643'::text) > > Buffers: shared hit=5 > > -> CTE Scan on charges (cost=0.00..30386.54 rows=13471 width=8) (actual > time=1978019.827..1978019.827 rows=0 loops=1) > > Output: charges.id, charges.charge_info > > Filter: (((charges.charge_info).charge_name)::text = ANY ('{"ADDRESS > CORRECTION CHARGE","ADDRESS CORRECTION"}'::text[])) > > Buffers: shared hit=20387606, temp written=94071 > > Total runtime: 1978214.743 ms The problem here is very clear. Oracle is optimizing through the CTE. PostgreSQL does not do this by design -- CTE's are used as a forced materialization step. merlin
pgsql-performance by date: