Query performance and understanding explain analyze - Mailing list pgsql-performance
From:    Benjamin Wragg
Subject: Query performance and understanding explain analyze
Date:
Msg-id:  20050120032352.673F9CCC8@vscan02.westnet.com.au
List:    pgsql-performance
Hi,

I'm trying to tune a query that is taking too long to execute. I haven't done much SQL tuning and have only had a little exposure to EXPLAIN and EXPLAIN ANALYZE, but from what I've read on the list and in books the following is generally true:

- Seq scans are almost always evil (except when a table has only a few rows).
- Nested loop joins are generally evil, as every node below one is executed the number of times its "loops=" value says.
- Hash joins are extremely quick. This is because when Postgres uses a hash join it builds an in-memory hash table from one table's values and then probes that hash table from the other table.

Is that correct? If so, I'm after some help on the following query, which I feel is taking too long. At the outset I want to apologise for the length of this email; I just wanted to provide as much info as possible. I just can't seem to make sense of it and have been trying for days!

SELECT abs(item.area - userpolygon.area) AS area,
       item.title AS item_title,
       item.id AS item_id,
       item.collection_id AS item_collection_id,
       item.type_id AS item_type_id,
       item.scale AS item_scale,
       publisher.publisher AS publisher_publisher,
       publisher.description AS publisher_description,
       language.language AS language_language,
       language.description AS language_description,
       language.code2 AS language_code2,
       language.code3 AS language_code3,
       collection.collection AS collection_collection,
       collection.description AS collection_description,
       item_base_type.type AS item_type_combination_type,
       item_subtype.subtype AS item_type_combination_subtype,
       item_format.format AS item_type_combination_format,
       status.status AS status_status,
       status.description AS status_description,
       currency.code AS currency_code,
       currency.description AS currency_description,
       item.subtitle AS item_subtitle,
       item.description AS item_description,
       item.item_number AS item_item_number,
       item.edition AS item_edition,
       item.h_datum AS item_h_datum,
       item.v_datum AS item_v_datum,
       item.projection AS item_projection,
       item.isbn AS item_isbn,
       client_item_field.stock AS client_item_field_stock,
       client_item_field.price AS client_item_field_price,
       client_freight.freight AS client_freight_freight,
       client_freight.description AS client_freight_description
FROM item
     INNER JOIN (client INNER JOIN client_item ON (client.id = client_item.client_id))
         ON (client_item.item_id = item.id)
     INNER JOIN publisher ON (item.publisher_id = publisher.id)
     INNER JOIN language ON (item.language_id = language.id)
     LEFT OUTER JOIN collection ON (item.collection_id = collection.id)
     INNER JOIN item_base_type ON (item.type_id = item_base_type.id)
     INNER JOIN item_subtype ON (item.subtype_id = item_subtype.id)
     INNER JOIN item_format ON (item.format_id = item_format.id)
     INNER JOIN status ON (item.status_id = status.id)
     INNER JOIN currency ON (item.publisher_currency_id = currency.id)
     LEFT OUTER JOIN client_item_field
         ON (client_item.client_id = client_item_field.client_id)
        AND (client_item.item_id = client_item_field.item_id)
     LEFT OUTER JOIN client_item_freight
         ON (client_item.client_id = client_item_freight.client_id)
        AND (client_item.item_id = client_item_freight.item_id)
     LEFT OUTER JOIN client_freight
         ON (client_freight.id = client_item_freight.client_freight_id),
     userpolygon
WHERE item.the_geom && userpolygon.the_geom
  AND distance(item.the_geom, userpolygon.the_geom) = 0
  AND userpolygon.session_id = 'TestQuery'
  AND client.id = 1
ORDER BY area ASC

When I EXPLAIN ANALYZE it I get:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Sort (cost=4793.89..4793.91 rows=7 width=622) (actual time=4066.52..4067.79 rows=4004 loops=1)
  Sort Key: abs((item.area - userpolygon.area))
  -> Nested Loop (cost=533.45..4793.79 rows=7 width=622) (actual time=66.89..4054.01 rows=4004 loops=1)
       Join Filter: (("outer".the_geom && "inner".the_geom) AND (distance("outer".the_geom, "inner".the_geom) = 0::double precision))
    -> Hash Join (cost=533.45..4548.30 rows=14028 width=582) (actual time=63.79..3826.16 rows=14028 loops=1)
         Hash Cond: ("outer".client_freight_id = "inner".id)
      -> Hash Join (cost=532.38..4437.64 rows=14028 width=540) (actual time=63.52..3413.48 rows=14028 loops=1)
           Hash Cond: ("outer".item_id = "inner".item_id)
           Join Filter: ("outer".client_id = "inner".client_id)
        -> Hash Join (cost=532.38..4367.49 rows=14028 width=528) (actual time=62.95..2993.37 rows=14028 loops=1)
             Hash Cond: ("outer".item_id = "inner".item_id)
             Join Filter: ("outer".client_id = "inner".client_id)
          -> Hash Join (cost=532.38..4297.33 rows=14028 width=508) (actual time=62.48..2576.46 rows=14028 loops=1)
               Hash Cond: ("outer".publisher_currency_id = "inner".id)
            -> Hash Join (cost=528.23..4047.69 rows=14028 width=476) (actual time=61.64..2189.57 rows=14028 loops=1)
                 Hash Cond: ("outer".status_id = "inner".id)
              -> Hash Join (cost=527.17..3766.07 rows=14028 width=430) (actual time=61.30..1846.30 rows=14028 loops=1)
                   Hash Cond: ("outer".format_id = "inner".id)
                -> Hash Join (cost=526.02..3519.43 rows=14028 width=417) (actual time=60.62..1537.19 rows=14028 loops=1)
                     Hash Cond: ("outer".subtype_id = "inner".id)
                  -> Hash Join (cost=524.67..3272.59 rows=14028 width=400) (actual time=60.09..1258.45 rows=14028 loops=1)
                       Hash Cond: ("outer".type_id = "inner".id)
                    -> Hash Join (cost=523.60..2990.96 rows=14028 width=388) (actual time=59.53..1009.52 rows=14028 loops=1)
                         Hash Cond: ("outer".collection_id = "inner".id)
                      -> Hash Join (cost=522.35..2709.15 rows=14028 width=329) (actual time=59.21..785.50 rows=14028 loops=1)
                           Hash Cond: ("outer".language_id = "inner".id)
                        -> Hash Join (cost=513.30..2419.54 rows=14028 width=269) (actual time=57.65..582.34 rows=14028 loops=1)
                             Hash Cond: ("outer".publisher_id = "inner".id)
                          -> Hash Join (cost=510.85..2171.60 rows=14028 width=220) (actual time=57.03..414.43 rows=14028 loops=1)
                               Hash Cond: ("outer".id = "inner".item_id)
                            -> Seq Scan on item (cost=0.00..924.28 rows=14028 width=208) (actual time=0.03..211.81 rows=14028 loops=1)
                            -> Hash (cost=475.78..475.78 rows=14028 width=12) (actual time=56.47..56.47 rows=0 loops=1)
                              -> Nested Loop (cost=0.00..475.78 rows=14028 width=12) (actual time=0.06..43.86 rows=14028 loops=1)
                                -> Seq Scan on client (cost=0.00..1.05 rows=1 width=4) (actual time=0.01..0.03 rows=1 loops=1)
                                     Filter: (id = 1)
                                -> Index Scan using client_item_client_id_idx on client_item (cost=0.00..299.38 rows=14028 width=8) (actual time=0.03..27.45 rows=14028 loops=1)
                                     Index Cond: ("outer".id = client_item.client_id)
                          -> Hash (cost=2.21..2.21 rows=97 width=49) (actual time=0.33..0.33 rows=0 loops=1)
                            -> Seq Scan on organisation (cost=0.00..2.21 rows=97 width=49) (actual time=0.02..0.22 rows=97 loops=1)
                                 Filter: (type_id = 1)
                        -> Hash (cost=8.04..8.04 rows=404 width=60) (actual time=1.27..1.27 rows=0 loops=1)
                          -> Seq Scan on "language" (cost=0.00..8.04 rows=404 width=60) (actual time=0.01..0.81 rows=404 loops=1)
                      -> Hash (cost=1.20..1.20 rows=20 width=59) (actual time=0.06..0.06 rows=0 loops=1)
                        -> Seq Scan on collection (cost=0.00..1.20 rows=20 width=59) (actual time=0.01..0.04 rows=20 loops=1)
                    -> Hash (cost=1.05..1.05 rows=5 width=12) (actual time=0.02..0.02 rows=0 loops=1)
                      -> Seq Scan on item_base_type (cost=0.00..1.05 rows=5 width=12) (actual time=0.01..0.02 rows=5 loops=1)
                  -> Hash (cost=1.28..1.28 rows=28 width=17) (actual time=0.07..0.07 rows=0 loops=1)
                    -> Seq Scan on item_subtype (cost=0.00..1.28 rows=28 width=17) (actual time=0.01..0.05 rows=28 loops=1)
                -> Hash (cost=1.12..1.12 rows=12 width=13) (actual time=0.05..0.05 rows=0 loops=1)
                  -> Seq Scan on item_format (cost=0.00..1.12 rows=12 width=13) (actual time=0.01..0.03 rows=12 loops=1)
              -> Hash (cost=1.05..1.05 rows=5 width=46) (actual time=0.02..0.02 rows=0 loops=1)
                -> Seq Scan on status (cost=0.00..1.05 rows=5 width=46) (actual time=0.01..0.02 rows=5 loops=1)
            -> Hash (cost=3.72..3.72 rows=172 width=32) (actual time=0.45..0.45 rows=0 loops=1)
              -> Seq Scan on currency (cost=0.00..3.72 rows=172 width=32) (actual time=0.02..0.28 rows=172 loops=1)
          -> Hash (cost=0.00..0.00 rows=1 width=20) (actual time=0.01..0.01 rows=0 loops=1)
            -> Seq Scan on client_item_field (cost=0.00..0.00 rows=1 width=20) (actual time=0.00..0.00 rows=0 loops=1)
        -> Hash (cost=0.00..0.00 rows=1 width=12) (actual time=0.01..0.01 rows=0 loops=1)
          -> Seq Scan on client_item_freight (cost=0.00..0.00 rows=1 width=12) (actual time=0.00..0.00 rows=0 loops=1)
      -> Hash (cost=1.05..1.05 rows=5 width=42) (actual time=0.03..0.03 rows=0 loops=1)
        -> Seq Scan on client_freight (cost=0.00..1.05 rows=5 width=42) (actual time=0.01..0.02 rows=5 loops=1)
    -> Seq Scan on userpolygon (cost=0.00..0.00 rows=1 width=40) (actual time=0.01..0.01 rows=1 loops=14028)
         Filter: (session_id = 'TestQuery'::character varying)
Total runtime: 4070.87 msec
(63 rows)

(If you have trouble reading it I can send it in a formatted txt file.)

So, from my basic knowledge of EXPLAIN ANALYZE, am I correct in saying that Postgres is deciding to attack the query in the following way:

1) All the small tables which I join to item are loaded into hashes (e.g. currency, status, collection, language, etc.)?

2) The following indicates that the client table is joined to the client_item table and a hash is created in memory?

   -> Hash (cost=475.78..475.78 rows=14028 width=12) (actual time=56.47..56.47 rows=0 loops=1)
     -> Nested Loop (cost=0.00..475.78 rows=14028 width=12) (actual time=0.06..43.86 rows=14028 loops=1)
       -> Seq Scan on client (cost=0.00..1.05 rows=1 width=4) (actual time=0.01..0.03 rows=1 loops=1)
            Filter: (id = 1)
       -> Index Scan using client_item_client_id_idx on client_item (cost=0.00..299.38 rows=14028 width=8) (actual time=0.03..27.45 rows=14028 loops=1)
            Index Cond: ("outer".id = client_item.client_id)

3) All the records in the item table are selected with:

   -> Seq Scan on item (cost=0.00..924.28 rows=14028 width=208) (actual time=0.03..211.81 rows=14028 loops=1)
This is ok as I am selecting everything from the item table.

4) Then the hash from step 2 is joined to the records from step 3:

   -> Hash Join (cost=510.85..2171.60 rows=14028 width=220) (actual time=57.03..414.43 rows=14028 loops=1)
        Hash Cond: ("outer".id = "inner".item_id)

5) All the hashes created in the first step are joined to the items returned in the second step (e.g. currency, status, collection, language, etc.)?

6) A nested loop runs at the end (actually a spatial operation for PostGIS) and a sort occurs.

Am I correct???

If I am correct about the execution path, doesn't this show that what is slowing the query down is all the hash joins of the small tables??? I say this because at the start of step 4 the time taken so far is 57.03 ms and at the end of step 4 it is 414.43 ms, so that part of the query took 414.43 - 57.03 ms. But then, just before step 6, the last hash join reports:

   -> Hash Join (cost=533.45..4548.30 rows=14028 width=582) (actual time=63.79..3826.16 rows=14028 loops=1)

So does this mean that at that point 3826.16 ms had passed, and that if we take 414.43 from 3826.16 it shows that all the hash joins took about 3.4 seconds? This doesn't seem right, as I thought hash joins were the quickest way to do a join.

My final question is that on the last nested loop, as well as on some of the hash joins, I see "actual time" reported as:

   -> Nested Loop (cost=533.45..4793.79 rows=7 width=622) (actual time=66.89..4054.01 rows=4004 loops=1)

What does the first time, 66.89, represent? It can't be the time taken so far in the query, because the node below it reports that 3826.16 ms had passed. How do I interpret this?

Thanks,

Benjamin Wragg

P.S. I have indexes on all the primary and foreign keys and have vacuum analyzed.
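A note for readers of this thread: the hash-join behaviour described at the top of the post can be sketched in a few lines. This is a simplified illustration of the general technique, not PostgreSQL's executor code, and the tiny `currency`/`item` rows are invented for the example. The point it makes: the build side is hashed once, but the probe side is still walked row by row, so each of the stacked hash joins in the plan above touches all 14028 probe rows.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, key):
    """Join two lists of dicts on `key`: hash the small side, probe with the big side."""
    # Build phase: one pass over the (small) build relation.
    table = defaultdict(list)
    for row in build_rows:
        table[row[key]].append(row)
    # Probe phase: one hash lookup per row of the (large) probe relation.
    joined = []
    for row in probe_rows:
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined

# Invented sample data, loosely modelled on the post's currency/item join.
currency = [{"currency_id": 1, "code": "AUD"}, {"currency_id": 2, "code": "USD"}]
item = [{"item_id": i, "currency_id": 1 + i % 2} for i in range(6)]

rows = hash_join(currency, item, "currency_id")
print(len(rows))  # 6: every item matched exactly one currency
```

Even though each lookup is O(1), a chain of eleven such joins still does eleven full passes over the 14028-row stream, which is consistent with the steadily growing "actual time" totals in the plan.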
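On the two timing questions at the end, the arithmetic can be made explicit. This sketch assumes the standard reading of EXPLAIN ANALYZE output: "actual time=A..B" means the node returned its first row at A ms and its last row at B ms, with each node's total including its children. All figures are copied from the plan above.

```python
# Figures copied from the EXPLAIN ANALYZE output in this post.
step4_start, step4_total = 57.03, 414.43  # hash join of item to the client/client_item hash
last_hash_total = 3826.16                 # topmost hash join in the chain of small-table joins
nested_loop_startup, nested_loop_total = 66.89, 4054.01
hash_join_startup = 63.79                 # startup time of that topmost hash join

# Time inside step 4 itself, and in the rest of the hash-join chain:
print(round(step4_total - step4_start, 2))      # 357.4
print(round(last_hash_total - step4_total, 2))  # 3411.73

# The 66.89 on the nested loop is its STARTUP time: the time until its
# FIRST row. It can be far smaller than its input's total (3826.16 ms)
# because the hash join below began streaming matches at 63.79 ms; it is
# the totals, not the startups, that accumulate down the plan.
assert hash_join_startup < nested_loop_startup < nested_loop_total
```

So the ~3.4 s does sit in the join chain, but note the loops=1 rows=14028 on every level: the cost is dominated by pushing 14028 wide rows through eleven joins, not by the hash lookups themselves.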