Thread: Breaking up a query
Hi Guys,

A fellow at work has written the SQL below to retrieve some data from multiple tables. Obviously it is inefficient and unnecessarily complex, and I am trying to break it into 2 or more queries so as to enhance performance. Can you please help me do so? I would appreciate any help you can provide.

I have also attached the output of the "explain analyze" of this query.

Thank you.
Saad

SELECT v.xcvr_id as xcvr_id
     , v.bumper_number as bumper_number
     , v.vehicle_type as vehicle_type
     , p.epoch as epoch
     , p.latitude as latitude
     , p.longitude as longitude
     , p.fom as fom
     , i.version as version
     , i.rfid_status as rfid_status
     , t.tag_id as tag_id
     , t.tag_status as tag_status
  FROM positions p
  LEFT OUTER JOIN data_transfers dt
    ON p.id = dt.position_id
 INNER JOIN vehicles v
    ON p.vehicle_id = v.id
  LEFT OUTER JOIN interrogations i
    ON p.id = i.position_id
   AND v.id = i.vehicle_id
  LEFT OUTER JOIN tags t
    ON i.id = t.interrogation_id
 WHERE p.id NOT IN (
       SELECT dt.position_id
         FROM data_transfers
        WHERE dt.target_id = ?
       )
 ORDER BY v.xcvr_id
     , v.bumper_number
     , v.vehicle_type
     , i.version
     , i.rfid_status
     , p.epoch;
On 8/10/06, Saad Anis <saad.anis@comtechmobile.com> wrote:
On a surface scan of what you have I don't see anything obviously wrong. Do you have your foreign keys defined along all the joins? What kind of indexes do you have defined on the tables?
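If any of those are missing, the index DDL for the foreign key columns used in your joins would look something like this (a sketch only — I'm guessing the table and column names from your query, some of these may already exist, and `data_transfers.position_id` is clearly already indexed per your plan; check with \d tablename in psql first):

```sql
-- Foreign key columns from the joins in the posted query:
CREATE INDEX positions_vehicle_id_idx       ON positions (vehicle_id);
CREATE INDEX interrogations_position_id_idx ON interrogations (position_id);
CREATE INDEX interrogations_vehicle_id_idx  ON interrogations (vehicle_id);
CREATE INDEX tags_interrogation_id_idx      ON tags (interrogation_id);

-- The NOT IN subquery filters on target_id, so this one may help too:
CREATE INDEX data_transfers_target_id_idx   ON data_transfers (target_id);
```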
My guess is your problem is occurring here:
-> Merge Left Join (cost=0.00..11334.00 rows=1000 width=28) (actual time=705.104..706.789 rows=55 loops=1)
Merge Cond: ("outer".id = "inner".position_id)
Filter: (NOT (subplan))
-> Index Scan using positions_pkey on positions p (cost=0.00..32.00 rows=1000 width=28) (actual time=0.019..90.920 rows=13958 loops=1)
-> Index Scan using data_transfers_position_id_idx on data_transfers dt (cost=0.00..32.00 rows=1000 width=8) (actual time=0.015..91.859 rows=13903 loops=1)
You can see that the cost (and the actual time) jumps significantly at that step.
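One thing worth trying, independent of indexing, is rewriting the NOT IN as a NOT EXISTS, which PostgreSQL can often plan more efficiently. This is an untested sketch against your query as posted — note the subquery gets its own alias here, because as written your inner `dt` actually resolves to the `dt` from the outer LEFT JOIN:

```sql
-- Replace the WHERE clause of the original query with:
WHERE NOT EXISTS (
      SELECT 1
        FROM data_transfers dt2
       WHERE dt2.position_id = p.id
         AND dt2.target_id = ?
)
```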
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
http://codeelixir.com
==================================================================
On Thu, Aug 10, 2006 at 06:53:33PM -0400, Saad Anis wrote:
> Hi Guys,
>
> A fellow at work has written the SQL below to retrieve some data from
> multiple tables. Obviously it is inefficient and unnecessarily complex, and

Why is that obvious? Why do you think that joining in your application is going to give better performance than joining in the database? This is what RDBMS _do_, for heaven's sake.

(I agree that your plan shows it's taking a long time. But that's a different problem. We don't actually know what you want from your query.)

A

--
Andrew Sullivan | ajs@crankycanuck.ca
I remember when computers were frustrating because they *did* exactly
what you told them to. That actually seems sort of quaint now.
    --J.D. Baldwin
On Thu, 2006-08-10 at 17:53, Saad Anis wrote:
> Hi Guys,
>
> A fellow at work has written the SQL below to retrieve some data from
> multiple tables. Obviously it is inefficient and unnecessarily complex, and
> I am trying to break it into 2 or more queries so as to enhance performance.

Nope, that's not true in PostgreSQL. It is for some databases with relatively simplistic query planners, but not PostgreSQL.

I'd check that you have indexes where you need them (generally when you see a seq scan on a small set), including, especially, the foreign key columns (i.e. the ones pointing to another table's primary keys).

On to your explain analyze, I noticed a lot of lines like this:

    Index Scan using positions_pkey on positions p (cost=0.00..32.00 rows=1000 width=28) (actual time=0.019..90.920 rows=13958 loops=1)

Seeing as how the statistical default for a new, unanalyzed table is 1000, and you've got 1000 peppered all through your explain analyze, I'd guess you've not analyzed your database. Which means you've likely not read the admin docs. Which means you've likely not vacuumed the database.

Read the admin docs (they're not that thick, and there's lots of good info in there) and apply things like VACUUM and ANALYZE, and get back to us on how things are doing then.
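Concretely, from psql, VACUUM ANALYZE does both in one pass (whether you run it database-wide or per table is up to you):

```sql
-- Whole database:
VACUUM ANALYZE;

-- Or just the tables involved in your query:
VACUUM ANALYZE positions;
VACUUM ANALYZE data_transfers;
VACUUM ANALYZE interrogations;
VACUUM ANALYZE tags;
VACUUM ANALYZE vehicles;
```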