Re: Poor performance using CTE - Mailing list pgsql-performance

From Merlin Moncure
Subject Re: Poor performance using CTE
Date
Msg-id CAHyXU0xeEVX2P=HaNxPiKayEhkWp3GobgAXCQPrfsqbawXVV-w@mail.gmail.com
In response to Poor performance using CTE  (David Greco <David_Greco@harte-hanks.com>)
Responses Re: Poor performance using CTE
List pgsql-performance
On Tue, Nov 13, 2012 at 2:57 PM, David Greco
<David_Greco@harte-hanks.com> wrote:
> Have a query using a CTE that is performing very poorly. The equivalent
> query against the same data in an Oracle database runs in under 1 second,
> in Postgres it takes 2000 seconds.
>
> The smp_pkg.get_invoice_charges queries fedexinvoices for some data and
> normalizes it into a SETOF some record type. It is declared to be STABLE.
> Fedexinvoices consists of about 1.3M rows of medium width. Fedexinvoices.id
> is the primary key on that table, and trim(fedexinvoices.trackno) is
> indexed via the function trim.
>
> The plan for the equivalent query in Oracle is much smaller and simpler.
> No sequential (or full table) scans on fedexinvoices.
>
> WITH charges as (
>     SELECT fi2.id, smp_pkg.get_invoice_charges(fi2.id) charge_info
>     from fedexinvoices fi2
> )
> select fedexinvoices.*
> from fedexinvoices
> inner join charges on charges.id = fedexinvoices.id
>     AND (charges.charge_info).charge_name IN ('ADDRESS CORRECTION CHARGE','ADDRESS CORRECTION')
> where trim(fedexinvoices.trackno)='799159791643';
>
> Explain Analyze output, I abbreviated some of the column lists for brevity:
>
> Nested Loop  (cost=457380.38..487940.77 rows=1 width=1024) (actual time=1978019.858..1978019.858 rows=0 loops=1)
>   Output: fedexinvoices.id, ………
>   Join Filter: (fedexinvoices.id = charges.id)
>   Buffers: shared hit=20387611, temp written=94071
>   CTE charges
>     ->  Seq Scan on hits.fedexinvoices fi2  (cost=0.00..457380.38 rows=1350513 width=8) (actual time=0.613..1964632.763 rows=9007863 loops=1)
>           Output: fi2.id, smp_pkg.get_invoice_charges(fi2.id, NULL::character varying)
>           Buffers: shared hit=20387606
>   ->  Index Scan using fedexinvoices_trim_track_idx on hits.fedexinvoices  (cost=0.00..5.46 rows=1 width=1024) (actual time=0.024..0.026 rows=1 loops=1)
>         Output: fedexinvoices.id, ………
>         Index Cond: (btrim((fedexinvoices.trackno)::text) = '799159791643'::text)
>         Buffers: shared hit=5
>   ->  CTE Scan on charges  (cost=0.00..30386.54 rows=13471 width=8) (actual time=1978019.827..1978019.827 rows=0 loops=1)
>         Output: charges.id, charges.charge_info
>         Filter: (((charges.charge_info).charge_name)::text = ANY ('{"ADDRESS CORRECTION CHARGE","ADDRESS CORRECTION"}'::text[]))
>         Buffers: shared hit=20387606, temp written=94071
> Total runtime: 1978214.743 ms

The problem here is very clear: Oracle is optimizing through the CTE,
while PostgreSQL by design does not. CTEs are treated as a forced
materialization step (an optimization fence), so smp_pkg.get_invoice_charges
is evaluated for every row of fedexinvoices before the trackno filter is
ever applied.
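
One possible rewrite (untested, just a sketch against the schema as you
described it) is to skip the CTE and call the function only for the rows
that survive the trackno filter, for example:

-- untested sketch: apply the trim(trackno) filter first, then call the
-- set-returning function only for the matching rows instead of
-- materializing it for the whole table inside a CTE
SELECT *
FROM (
    SELECT fi.*,
           smp_pkg.get_invoice_charges(fi.id) AS charge_info  -- one output row per charge
    FROM fedexinvoices fi
    WHERE trim(fi.trackno) = '799159791643'
) x
WHERE (x.charge_info).charge_name IN ('ADDRESS CORRECTION CHARGE',
                                      'ADDRESS CORRECTION');

Note this also exposes the charge_info column and, like your original join,
can return one row per matching charge; adjust the outer select list if you
need exactly fedexinvoices.*.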

merlin

