Thread: Breaking up a query

Breaking up a query

From
Saad Anis
Date:
Hi Guys,

A fellow at work has written the SQL below to retrieve some data from
multiple tables. Obviously it is inefficient and unnecessarily complex, and
I am trying to break it into 2 or more queries so as to enhance performance.

Can you please help me do so? I would appreciate any help you can provide.

I have also attached the output of the "explain analyze" of this query.

Thank you.
Saad

SELECT    v.xcvr_id as xcvr_id
    , v.bumper_number as bumper_number
    , v.vehicle_type as vehicle_type
    , p.epoch as epoch
    , p.latitude as latitude
    , p.longitude as longitude
    , p.fom as fom
    , i.version as version
    , i.rfid_status as rfid_status
    , t.tag_id as tag_id
    , t.tag_status as tag_status
FROM    positions p
    LEFT OUTER JOIN data_transfers dt
        ON p.id = dt.position_id
    INNER JOIN vehicles v
        ON p.vehicle_id = v.id
    LEFT OUTER JOIN interrogations i
        ON p.id = i.position_id
        AND v.id = i.vehicle_id
    LEFT OUTER JOIN tags t
        ON i.id = t.interrogation_id
WHERE    p.id NOT IN (
        SELECT dt.position_id
        FROM data_transfers
        WHERE dt.target_id = ?
    )
ORDER BY v.xcvr_id
    , v.bumper_number
    , v.vehicle_type
    , i.version
    , i.rfid_status
    , p.epoch;



Attachment

Re: Breaking up a query

From
"Aaron Bono"
Date:
On 8/10/06, Saad Anis <saad.anis@comtechmobile.com> wrote:
Hi Guys,

A fellow at work has written the SQL below to retrieve some data from
multiple tables. Obviously it is inefficient and unnecessarily complex, and
I am trying to break it into 2 or more queries so as to enhance performance.

Can you please help me do so? I would appreciate any help you can provide.

I have also attached the output of the "explain analyze" of this query.

Thank you.
Saad

SELECT    v.xcvr_id as xcvr_id
    , v.bumper_number as bumper_number
    , v.vehicle_type as vehicle_type
    , p.epoch as epoch
    , p.latitude as latitude
    , p.longitude as longitude
    , p.fom as fom
    , i.version as version
    , i.rfid_status as rfid_status
    , t.tag_id as tag_id
    , t.tag_status as tag_status
FROM    positions p
    LEFT OUTER JOIN data_transfers dt
        ON p.id = dt.position_id
    INNER JOIN vehicles v
        ON p.vehicle_id = v.id
    LEFT OUTER JOIN interrogations i
        ON p.id = i.position_id
        AND v.id = i.vehicle_id
    LEFT OUTER JOIN tags t
        ON i.id = t.interrogation_id
WHERE    p.id NOT IN (
        SELECT dt.position_id
        FROM data_transfers
        WHERE dt.target_id = ?
    )
ORDER BY v.xcvr_id
    , v.bumper_number
    , v.vehicle_type
    , i.version
    , i.rfid_status
    , p.epoch;

 
On an surface scan of what you have I don't see anything obviously wrong.  Do you have your foreign keys defined along all the joins?  What kind of indexes do you have defined on the tables?

My guess is your problem is occurring here:

->  Merge Left Join  (cost=0.00..11334.00 rows=1000 width=28) (actual time=705.104..706.789 rows=55 loops=1)
      Merge Cond: ("outer".id = "inner".position_id)
       Filter: (NOT (subplan))
       ->  Index Scan using positions_pkey on positions p  (cost=0.00..32.00 rows=1000 width=28) (actual time=0.019..90.920 rows=13958 loops=1)
       ->  Index Scan using data_transfers_position_id_idx on data_transfers dt  (cost= 0.00..32.00 rows=1000 width=8) (actual time=0.015..91.859 rows=13903 loops=1)

You see that the cost jumps significantly.

==================================================================
   Aaron Bono
   Aranya Software Technologies, Inc.
   http://www.aranya.com
   http://codeelixir.com
==================================================================

Re: Breaking up a query

From
Andrew Sullivan
Date:
On Thu, Aug 10, 2006 at 06:53:33PM -0400, Saad Anis wrote:
> Hi Guys,
> 
> A fellow at work has written the SQL below to retrieve some data from
> multiple tables. Obviously it is inefficient and unnecessarily complex, and

Why is that obvious?  Why do you think that joining in your
application is going to give better performance than joining in the
database?  This is what RDBMS _do_, for heaven's sake.

(I agree that your plan shows it's taking a long time.  But that's a
different problem.  We don't actually know what you want from your
query.)

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
I remember when computers were frustrating because they *did* exactly what 
you told them to.  That actually seems sort of quaint now.    --J.D. Baldwin


Re: Breaking up a query

From
Scott Marlowe
Date:
On Thu, 2006-08-10 at 17:53, Saad Anis wrote:
> Hi Guys,
> 
> A fellow at work has written the SQL below to retrieve some data from
> multiple tables. Obviously it is inefficient and unnecessarily complex, and
> I am trying to break it into 2 or more queries so as to enhance performance.

Nope, that's not true in PostgreSQL.  It is for some databases with
relatively simplistic query planners, but not postgresql.

I'd check that you have indexes where you need them (generally when you
see a seq scan on a small set) including, especially, the foreign key
columns (i.e. the ones pointing to another table's primary keys).

On to your explain analyze, I noticed a lot of lines like this:

Index Scan using positions_pkey on positions p  (cost=0.00..32.00
rows=1000 width=28) (actual time=0.019..90.920 rows=13958 loop

seeing as how the statistical default for a new, unanalyzed table is
1000, and you've got 1000 peppered all through your explain analyze, I'd
guess you've not analyzed your database.  Which means you've likely not
read the admin docs.  which means you've likely not vacuumed the
database.  

Read the admin docs (they're not that thick, and there's lots of good
info in there) and apply things like vacuum and analyze, and get back to
us on how things are doing then.