Thread: Poor performance using CTE

Poor performance using CTE

From
David Greco
Date:

Have a query using a CTE that is performing very poorly. The equivalent query against the same data in an Oracle database runs in under 1 second, in Postgres  it takes 2000 seconds.

 

The smp_pkg.get_invoice_charges queries fedexinvoices for some data and normalizes it into a SETOF some record type. It is declared to be STABLE. Fedexinvoices consists of about 1.3M rows of medium width. Fedexinvoices.id is the primary key on that table, and trim(fedexinvoices.trackno) is indexed via the function trim.

 

The plan for the equivalent query in Oracle is much smaller and simpler. No sequential (or full table) scans on fedexinvoices.

 

 

 

WITH charges as (

                SELECT fi2.id, smp_pkg.get_invoice_charges(fi2.id) charge_info from fedexinvoices fi2

)

select fedexinvoices.* from

fedexinvoices

inner join charges on charges.id = fedexinvoices.id AND (charges.charge_info).charge_name IN ('ADDRESS CORRECTION CHARGE','ADDRESS CORRECTION')

where

trim(fedexinvoices.trackno)='799159791643'

;

 

Explain Analyze output, I abbreviated some of the column lists for brevity:

 

Nested Loop  (cost=457380.38..487940.77 rows=1 width=1024) (actual time=1978019.858..1978019.858 rows=0 loops=1)

  Output: fedexinvoices.id, ………

  Join Filter: (fedexinvoices.id = charges.id)

  Buffers: shared hit=20387611, temp written=94071

  CTE charges

    ->  Seq Scan on hits.fedexinvoices fi2  (cost=0.00..457380.38 rows=1350513 width=8) (actual time=0.613..1964632.763 rows=9007863 loops=1)

          Output: fi2.id, smp_pkg.get_invoice_charges(fi2.id, NULL::character varying)

          Buffers: shared hit=20387606

  ->  Index Scan using fedexinvoices_trim_track_idx on hits.fedexinvoices  (cost=0.00..5.46 rows=1 width=1024) (actual time=0.024..0.026 rows=1 loops=1)

        Output: fedexinvoices.id, ………

        Index Cond: (btrim((fedexinvoices.trackno)::text) = '799159791643'::text)

        Buffers: shared hit=5

  ->  CTE Scan on charges  (cost=0.00..30386.54 rows=13471 width=8) (actual time=1978019.827..1978019.827 rows=0 loops=1)

        Output: charges.id, charges.charge_info

        Filter: (((charges.charge_info).charge_name)::text = ANY ('{"ADDRESS CORRECTION CHARGE","ADDRESS CORRECTION"}'::text[]))

        Buffers: shared hit=20387606, temp written=94071

Total runtime: 1978214.743 ms

 

 

Re: Poor performance using CTE

From
Andrew Dunstan
Date:
On 11/14/2012 10:23 AM, David Greco wrote:
>
> Have a query using a CTE that is performing very poorly. The
> equivalent query against the same data in an Oracle database runs in
> under 1 second, in Postgres  it takes 2000 seconds.
>
> The smp_pkg.get_invoice_charges queries fedexinvoices for some data
> and normalizes it into a SETOF some record type. It is declared to be
> STABLE. Fedexinvoices consists of about 1.3M rows of medium width.
> Fedexinvoices.id is the primary key on that table, and
> trim(fedexinvoices.trackno) is indexed via the function trim.
>
> The plan for the equivalent query in Oracle is much smaller and
> simpler. No sequential (or full table) scans on fedexinvoices.
>
> WITH charges as (
>
>                 SELECT fi2.id, smp_pkg.get_invoice_charges(fi2.id)
> charge_info from fedexinvoices fi2
>
> )
>
> select fedexinvoices.* from
>
> fedexinvoices
>
> inner join charges on charges.id = fedexinvoices.id AND
> (charges.charge_info).charge_name IN ('ADDRESS CORRECTION
> CHARGE','ADDRESS CORRECTION')
>
> where
>
> trim(fedexinvoices.trackno)='799159791643'
>
> ;
>


Can you explain what you're actually trying to do here? The query looks
rather odd. Why are you joining this table (or an extract from it) to
itself?


In any case, you could almost certainly recast it and have it run fast
by first filtering on the tracking number.


cheers

andrew


Re: Poor performance using CTE

From
David Greco
Date:
You're right. I was translating an oracle query , but looks like PG will allow some syntax that is different. Trying to
findentries in fedexinvoices where smp_pkg.get_invoice_charges(id) returns a record containing charge_name in ('ADDRESS
CORRECTIONCHARGE','ADDRESS CORRECTION'). Should return the fedexinvoices row and the row from
smp_pkg.get_invoice_chargesthat contains the address correction. 


Something like this, though this is syntactically incorrect as smp_pkg.get_invoice_charges returns a set:


select fedexinvoices.*, (smp_pkg.get_invoice_charges(id)).*
from fedexinvoices
WHERE
trim(fedexinvoices.trackno)='799159791643'
and
(smp_pkg.get_invoice_charges(id)).charge_name IN ('ADDRESS CORRECTION CHARGE','ADDRESS CORRECTION')




-----Original Message-----
From: Andrew Dunstan [mailto:andrew@dunslane.net]
Sent: Wednesday, November 14, 2012 10:51 AM
To: David Greco
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Poor performance using CTE


On 11/14/2012 10:23 AM, David Greco wrote:
>
> Have a query using a CTE that is performing very poorly. The
> equivalent query against the same data in an Oracle database runs in
> under 1 second, in Postgres  it takes 2000 seconds.
>
> The smp_pkg.get_invoice_charges queries fedexinvoices for some data
> and normalizes it into a SETOF some record type. It is declared to be
> STABLE. Fedexinvoices consists of about 1.3M rows of medium width.
> Fedexinvoices.id is the primary key on that table, and
> trim(fedexinvoices.trackno) is indexed via the function trim.
>
> The plan for the equivalent query in Oracle is much smaller and
> simpler. No sequential (or full table) scans on fedexinvoices.
>
> WITH charges as (
>
>                 SELECT fi2.id, smp_pkg.get_invoice_charges(fi2.id)
> charge_info from fedexinvoices fi2
>
> )
>
> select fedexinvoices.* from
>
> fedexinvoices
>
> inner join charges on charges.id = fedexinvoices.id AND
> (charges.charge_info).charge_name IN ('ADDRESS CORRECTION
> CHARGE','ADDRESS CORRECTION')
>
> where
>
> trim(fedexinvoices.trackno)='799159791643'
>
> ;
>


Can you explain what you're actually trying to do here? The query looks rather odd. Why are you joining this table (or
anextract from it) to itself? 


In any case, you could almost certainly recast it and have it run fast by first filtering on the tracking number.


cheers

andrew




Re: Poor performance using CTE

From
Andrew Dunstan
Date:
On 11/14/2012 10:56 AM, David Greco wrote:
> You're right. I was translating an oracle query , but looks like PG will allow some syntax that is different. Trying
tofind entries in fedexinvoices where smp_pkg.get_invoice_charges(id) returns a record containing charge_name in
('ADDRESSCORRECTION CHARGE','ADDRESS CORRECTION'). Should return the fedexinvoices row and the row from
smp_pkg.get_invoice_chargesthat contains the address correction. 
>
>
> Something like this, though this is syntactically incorrect as smp_pkg.get_invoice_charges returns a set:
>
>
> select fedexinvoices.*, (smp_pkg.get_invoice_charges(id)).*
> from fedexinvoices
> WHERE
> trim(fedexinvoices.trackno)='799159791643'
> and
> (smp_pkg.get_invoice_charges(id)).charge_name IN ('ADDRESS CORRECTION CHARGE','ADDRESS CORRECTION')


First, please don't top-post when someone has replied underneath your
post. It makes the thread totally unreadable. See
<http://idallen.com/topposting.html>

You could do something like this:

WITH invoices as
(
    select *
    from fedexinvoices
    where trim(fedexinvoices.trackno)='799159791643'
),

charges as
(
    SELECT fi2.id, smp_pkg.get_invoice_charges(fi2.id) charge_info
    from fedexinvoices fi2 join invoices i on i.id = f12.id
)

select invoices.*
from invoices
inner join charges on charges.id = invoices.id
     AND (charges.charge_info).charge_name IN ('ADDRESS CORRECTION
CHARGE','ADDRESS CORRECTION')

;


Or probably something way simpler but I just did this fairly quickly and
mechanically


cheers

andrew