Re: PostgreSQL and Linux 2.6 kernel. - Mailing list pgsql-performance
From | Gary Doades |
---|---|
Subject | Re: PostgreSQL and Linux 2.6 kernel. |
Date | |
Msg-id | E7C9E6AB2B6A@gpdnet.co.uk Whole thread Raw |
In response to | PostgreSQL and Linux 2.6 kernel. ("Gary Doades" <gpd@gpdnet.co.uk>) |
List | pgsql-performance |
Following on from Josh's response and my previous reply on SQLServer planning. The main problem query is this one: SELECT VS.*,VL.TEL1,SC.CONTRACT_ID,SC.CONTRACT_REF, SC.MAX_HOURS, SC.MIN_HOURS, (SELECT COUNT(*) FROM TIMESHEET_DETAIL JOIN MAIN_ORDER ON (MAIN_ORDER.ORDER_ID = TIMESHEET_DETAIL.ORDER_ID AND MAIN_ORDER.CLIENT_ID = 6) WHERE TIMESHEET_DETAIL.CONTRACT_ID = SC.CONTRACT_ID) AS VISITS, (SELECT (SUM(R.DURATION+1))/60.0 FROM ORDER_REQT R JOIN STAFF_BOOKING B ON (B.REQT_ID = R.REQT_ID) JOIN BOOKING_PLAN BP ON (BP.BOOKING_ID = B.BOOKING_ID) WHERE B.CONTRACT_ID = SC.CONTRACT_ID AND BP.BOOKING_DATE BETWEEN '2004-06-12' AND '2004-06-18') AS RHOURS FROM VSTAFF VS JOIN STAFF_CONTRACT SC ON (SC.STAFF_ID = VS.STAFF_ID) JOIN VLOCATION VL ON (VL.LOCATION_ID = VS.LOCATION_ID) JOIN SEARCH_REQT_RESULT SR ON (SR.STAFF_ID = VS.STAFF_ID) WHERE SR.SEARCH_ID = 1 AND SC.CONTRACT_ID IN (SELECT C.CONTRACT_ID FROM STAFF_PRODUCT P,STAFF_CONTRACT C WHERE P.CONTRACT_ID=C.CONTRACT_ID AND C.STAFF_ID = VS.STAFF_ID AND P.PRODUCT_ID IN (SELECT PRODUCT_ID FROM SEARCH_ORDER_REQT WHERE SEARCH_ID = 1) AND C.AVAIL_DATE_FROM <= '2004-06-12' AND C.AVAIL_DATE_TO >= '2004-06-18' GROUP BY C.CONTRACT_ID HAVING (COUNT(C.CONTRACT_ID) = (SELECT COUNT(DISTINCT PRODUCT_ID) FROM SEARCH_ORDER_REQT WHERE SEARCH_ID = 1))) The explain analyze is: QUERY PLAN Nested Loop (cost=101.54..1572059.57 rows=135 width=152) (actual time=13749.100..1304586.501 rows=429 loops=1) InitPlan -> Index Scan using fk_idx_wruserarea on wruserarea (cost=3.26..6.52 rows=1 width=4) (actual time=0.944..0.944 rows=1 loops=1) Index Cond: (area_id = 1) Filter: (uid = $4) InitPlan -> Seq Scan on wruser (cost=0.00..3.26 rows=1 width=4) (actual time=0.686..0.691 rows=1 loops=1) Filter: ((username)::name = "current_user"()) -> Hash Join (cost=95.02..3701.21 rows=215 width=138) (actual time=100.476..1337.392 rows=429 loops=1) Hash Cond: ("outer".staff_id = "inner".staff_id) Join Filter: (subplan) -> Seq Scan on staff_contract sc (cost=0.00..33.24 rows=1024 width=37) (actual time=0.114..245.366 rows=1024 loops=1) -> Hash (cost=93.95..93.95 rows=430 width=109) (actual time=38.563..38.563 rows=0 loops=1) -> Hash Join (cost=47.47..93.95 rows=430 width=109) (actual time=15.502..36.627 rows=429 loops=1) Hash Cond: ("outer".staff_id = "inner".staff_id) -> Seq Scan on staff (cost=34.61..66.48 rows=1030 width=105) (actual time=9.655..15.264 rows=1030 loops=1) Filter: ((hashed subplan) OR $5) SubPlan -> Seq Scan on staff_area (cost=10.73..33.38 rows=493 width=4) (actual time=8.452..8.452 rows=0 loops=1) Filter: ((hashed subplan) OR (area_id = 1)) SubPlan -> Seq Scan on wruserarea (cost=3.26..10.72 rows=5 width=4) (actual time=0.977..1.952 rows=1 loops=1) Filter: (uid = $1) InitPlan -> Seq Scan on wruser (cost=0.00..3.26 rows=1 width=4) (actual time=0.921..0.926 rows=1 loops=1) Filter: ((username)::name = "current_user"()) -> Hash (cost=11.79..11.79 rows=430 width=4) (actual time=5.705..5.705 rows=0 loops=1) -> Index Scan using fk_idx_search_reqt_result on search_reqt_result sr (cost=0.00..11.79 rows=430 width=4) (actual time=0.470..4.482 rows=429 loops=1) Index Cond: (search_id = 1) SubPlan -> HashAggregate (cost=8.32..8.32 rows=1 width=4) (actual time=2.157..2.157 rows=1 loops=429) Filter: (count(contract_id) = $9) InitPlan -> Aggregate (cost=1.04..1.04 rows=1 width=4) (actual time=0.172..0.173 rows=1 loops=1) -> Seq Scan on search_order_reqt (cost=0.00..1.04 rows=1 width=4) (actual time=0.022..0.038 rows=1 loops=1) Filter: (search_id = 1) -> Hash IN Join (cost=1.04..7.27 rows=1 width=4) (actual time=2.064..2.117 rows=1 loops=429) Hash Cond: ("outer".product_id = "inner".product_id) -> Nested Loop (cost=0.00..6.19 rows=7 width=8) (actual time=1.112..2.081 rows=8 loops=429) -> Index Scan using fk_idx_staff_contract_2 on staff_contract c (cost=0.00..3.03 rows=1 width=4) (actual time=0.206..0.245 rows=1 loops=429) Index Cond: (staff_id = $8) Filter: ((avail_date_from <= '2004-06-12'::date) AND (avail_date_to >= '2004-06-18'::date)) -> Index Scan using fk_idx_staff_product on staff_product p (cost=0.00..3.08 rows=6 width=8) (actual time=0.873..1.764 rows=8 loops=429) Index Cond: (p.contract_id = "outer".contract_id) -> Hash (cost=1.04..1.04 rows=1 width=4) (actual time=0.086..0.086 rows=0 loops=1) -> Seq Scan on search_order_reqt (cost=0.00..1.04 rows=1 width=4) (actual time=0.037..0.050 rows=1 loops=1) Filter: (search_id = 1) -> Index Scan using location_pkey on "location" (cost=0.00..12.66 rows=1 width=18) (actual time=0.876..0.887 rows=1 loops=429) Index Cond: ("location".location_id = "outer".location_id) Filter: ((area_id = 1) OR (subplan)) SubPlan -> Index Scan using fk_idx_wruserarea, fk_idx_wruserarea on wruserarea (cost=3.26..9.64 rows=1 width=4) (never executed) Index Cond: ((area_id = 1) OR (area_id = $7)) Filter: (uid = $6) InitPlan -> Seq Scan on wruser (cost=0.00..3.26 rows=1 width=4) (never executed) Filter: ((username)::name = "current_user"()) SubPlan -> Aggregate (cost=11233.28..11233.29 rows=1 width=2) (actual time=3036.814..3036.815 rows=1 loops=429) -> Nested Loop (cost=10391.71..11233.21 rows=30 width=2) (actual time=2817.923..3036.516 rows=34 loops=429) -> Hash Join (cost=10391.71..11142.43 rows=30 width=4) (actual time=2813.349..3007.936 rows=34 loops=429) Hash Cond: ("outer".booking_id = "inner".booking_id) -> Index Scan using booking_plan_idx2 on booking_plan bp (cost=0.00..572.52 rows=23720 width=4) (actual time=0.070..157.028 rows=24613 loops=429) Index Cond: ((booking_date >= '2004-06-12'::date) AND (booking_date <= '2004-06-18'::date)) -> Hash (cost=10382.78..10382.78 rows=3571 width=8) (actual time=2746.122..2746.122 rows=0 loops=429) -> Index Scan using fk_idx_staff_booking on staff_booking b (cost=0.00..10382.78 rows=3571 width=8) (actual time=14.168..2733.315 rows=3815 loops=429) Index Cond: (contract_id = $0) -> Index Scan using order_reqt_pkey on order_reqt r (cost=0.00..3.01 rows=1 width=6) (actual time=0.826..0.832 rows=1 loops=14401) Index Cond: ("outer".reqt_id = r.reqt_id) -> Aggregate (cost=363.94..363.94 rows=1 width=0) (actual time=0.057..0.058 rows=1 loops=429) -> Nested Loop (cost=0.00..363.94 rows=1 width=0) (actual time=0.034..0.034 rows=0 loops=429) -> Index Scan using fk_idx_main_order on main_order (cost=0.00..4.99 rows=1 width=4) (actual time=0.031..0.031 rows=0 loops=429) Index Cond: (client_id = 6) -> Index Scan using fk_idx_timesheet_detail_3 on timesheet_detail (cost=0.00..358.93 rows=1 width=4) (never executed) Index Cond: ("outer".order_id = timesheet_detail.order_id) Filter: (contract_id = $0) Total runtime: 1304591.861 ms Long Time! The main issue here is that the RHOURS subselect is executed as a nested join 429 times. unfortunately this is an expensive subquery. SQLServer executed this in just over 1 second on comparable hardware. Looking at its execution plan it flattens out the two subselects with a merge join. So I manually rewrote the query using derived tables and joins as: SELECT VS.*,VL.TEL1,SC.CONTRACT_ID,SC.CONTRACT_REF, SC.MAX_HOURS, SC.MIN_HOURS, TBOOK.RHOURS, TVIS.VISITS FROM SEARCH_REQT_RESULT SR JOIN STAFF_CONTRACT SC ON (SR.STAFF_ID = SC.STAFF_ID) AND SC.AVAIL_DATE_FROM <= '2004-06-12' AND SC.AVAIL_DATE_TO >= '2004-06-18' JOIN VSTAFF VS ON (VS.STAFF_ID = SC.STAFF_ID) JOIN VLOCATION VL ON (VL.LOCATION_ID = VS.LOCATION_ID) LEFT OUTER JOIN (SELECT B.CONTRACT_ID, SUM(R.DURATION+1)/60.0 AS RHOURS FROM STAFF_BOOKING B JOIN BOOKING_PLAN BP ON (BP.BOOKING_ID = B.BOOKING_ID) AND BP.BOOKING_DATE BETWEEN '2004-06-12' AND '2004-06-18' JOIN ORDER_REQT R ON (R.REQT_ID = B.REQT_ID) GROUP BY B.CONTRACT_ID) AS TBOOK ON (SC.CONTRACT_ID = TBOOK.CONTRACT_ID) LEFT OUTER JOIN (SELECT CONTRACT_ID,COUNT(*) AS VISITS FROM TIMESHEET_DETAIL JOIN MAIN_ORDER ON (MAIN_ORDER.ORDER_ID = TIMESHEET_DETAIL.ORDER_ID) WHERE MAIN_ORDER.CLIENT_ID = 6 GROUP BY CONTRACT_ID) AS TVIS ON (TVIS.CONTRACT_ID = SC.CONTRACT_ID) JOIN (SELECT P.CONTRACT_ID FROM STAFF_PRODUCT P, SEARCH_ORDER_REQT SR WHERE P.PRODUCT_ID = SR.PRODUCT_ID AND SR.SEARCH_ID = 1 GROUP BY P.CONTRACT_ID HAVING (COUNT(P.CONTRACT_ID) = (SELECT COUNT(DISTINCT PRODUCT_ID) FROM SEARCH_ORDER_REQT WHERE SEARCH_ID = 1))) AS TCONT ON (TCONT.CONTRACT_ID = SC.CONTRACT_ID) WHERE SR.SEARCH_ID = 1 With the explain analyze as: QUERY PLAN Hash Join (cost=137054.42..137079.74 rows=159 width=192) (actual time=6228.354..6255.058 rows=429 loops=1) Hash Cond: ("outer".contract_id = "inner".contract_id) InitPlan -> Index Scan using fk_idx_wruserarea on wruserarea (cost=3.26..6.52 rows=1 width=4) (actual time=0.850..0.850 rows=1 loops=1) Index Cond: (area_id = 1) Filter: (uid = $3) InitPlan -> Seq Scan on wruser (cost=0.00..3.26 rows=1 width=4) (actual time=0.670..0.675 rows=1 loops=1) Filter: ((username)::name = "current_user"()) -> Subquery Scan tcont (cost=152.63..161.81 rows=612 width=4) (actual time=36.312..42.268 rows=612 loops=1) -> HashAggregate (cost=152.63..155.69 rows=612 width=4) (actual time=36.301..40.040 rows=612 loops=1) Filter: (count(contract_id) = $7) InitPlan -> Aggregate (cost=1.04..1.04 rows=1 width=4) (actual time=0.107..0.108 rows=1 loops=1) -> Seq Scan on search_order_reqt (cost=0.00..1.04 rows=1 width=4) (actual time=0.025..0.037 rows=1 loops=1) Filter: (search_id = 1) -> Hash Join (cost=1.04..148.53 rows=612 width=4) (actual time=0.419..32.284 rows=612 loops=1) Hash Cond: ("outer".product_id = "inner".product_id) -> Seq Scan on staff_product p (cost=0.00..109.91 rows=6291 width=8) (actual time=0.117..17.943 rows=6291 loops=1) -> Hash (cost=1.04..1.04 rows=1 width=4) (actual time=0.190..0.190 rows=0 loops=1) -> Seq Scan on search_order_reqt sr (cost=0.00..1.04 rows=1 width=4) (actual time=0.165..0.177 rows=1 loops=1) Filter: (search_id = 1) -> Hash (cost=136894.61..136894.61 rows=266 width=192) (actual time=6191.923..6191.923 rows=0 loops=1) -> Merge Left Join (cost=136886.03..136894.61 rows=266 width=192) (actual time=6143.315..6189.685 rows=429 loops=1) Merge Cond: ("outer".contract_id = "inner".contract_id) -> Merge Left Join (cost=136517.64..136525.04 rows=266 width=184) (actual time=6142.896..6171.676 rows=429 loops=1) Merge Cond: ("outer".contract_id = "inner".contract_id) -> Sort (cost=5529.68..5530.34 rows=266 width=152) (actual time=129.548..130.027 rows=429 loops=1) Sort Key: sc.contract_id -> Nested Loop (cost=88.35..5518.96 rows=266 width=152) (actual time=33.213..121.666 rows=429 loops=1) -> Hash Join (cost=88.35..143.88 rows=424 width=138) (actual time=32.739..76.357 rows=429 loops=1) Hash Cond: ("outer".staff_id = "inner".staff_id) -> Hash Join (cost=47.47..93.95 rows=430 width=109) (actual time=15.232..40.040 rows=429 loops=1) Hash Cond: ("outer".staff_id = "inner".staff_id) -> Seq Scan on staff (cost=34.61..66.48 rows=1030 width=105) (actual time=9.412..16.105 rows=1030 loops=1) Filter: ((hashed subplan) OR $4) SubPlan -> Seq Scan on staff_area (cost=10.73..33.38 rows=493 width=4) (actual time=8.380..8.380 rows=0 loops=1) Filter: ((hashed subplan) OR (area_id = 1)) SubPlan -> Seq Scan on wruserarea (cost=3.26..10.72 rows=5 width=4) (actual time=0.953..1.941 rows=1 loops=1) Filter: (uid = $0) InitPlan -> Seq Scan on wruser (cost=0.00..3.26 rows=1 width=4) (actual time=0.902..0.908 rows=1 loops=1) Filter: ((username)::name = "current_user"()) -> Hash (cost=11.79..11.79 rows=430 width=4) (actual time=5.670..5.670 rows=0 loops=1) -> Index Scan using fk_idx_search_reqt_result on search_reqt_result sr (cost=0.00..11.79 rows=430 width=4) (actual time=0.448..4.516 rows=429 loops=1) Index Cond: (search_id = 1) -> Hash (cost=38.36..38.36 rows=1008 width=37) (actual time=17.386..17.386 rows=0 loops=1) -> Seq Scan on staff_contract sc (cost=0.00..38.36 rows=1008 width=37) (actual time=0.222..14.063 rows=1008 loops=1) Filter: ((avail_date_from <= '2004-06-12'::date) AND (avail_date_to >= '2004-06-18'::date)) -> Index Scan using location_pkey on "location" (cost=0.00..12.66 rows=1 width=18) (actual time=0.043..0.050 rows=1 loops=429) Index Cond: ("location".location_id = "outer".location_id) Filter: ((area_id = 1) OR (subplan)) SubPlan -> Index Scan using fk_idx_wruserarea, fk_idx_wruserarea on wruserarea (cost=3.26..9.64 rows=1 width=4) (never executed) Index Cond: ((area_id = 1) OR (area_id = $6)) Filter: (uid = $5) InitPlan -> Seq Scan on wruser (cost=0.00..3.26 rows=1 width=4) (never executed) Filter: ((username)::name = "current_user"()) -> Sort (cost=130987.97..130989.96 rows=797 width=36) (actual time=6013.254..6014.112 rows=746 loops=1) Sort Key: tbook.contract_id -> Subquery Scan tbook (cost=130933.62..130949.56 rows=797 width=36) (actual time=5993.070..6007.677 rows=746 loops=1) -> HashAggregate (cost=130933.62..130941.59 rows=797 width=6) (actual time=5993.055..6004.099 rows=746 loops=1) -> Merge Join (cost=74214.90..130815.02 rows=23720 width=6) (actual time=4950.951..5807.985 rows=24613 loops=1) Merge Cond: ("outer".reqt_id = "inner".reqt_id) -> Index Scan using order_reqt_pkey on order_reqt r (cost=0.00..50734.20 rows=2206291 width=6) (actual time=0.444..2753.374 rows=447439 loops=1) -> Sort (cost=74214.90..74274.20 rows=23720 width=8) (actual time=1822.405..1856.081 rows=24613 loops=1) Sort Key: b.reqt_id -> Nested Loop (cost=0.00..72491.19 rows=23720 width=8) (actual time=1.955..1633.124 rows=24613 loops=1) -> Index Scan using booking_plan_idx2 on booking_plan bp (cost=0.00..572.52 rows=23720 width=4) (actual time=1.468..243.827 rows=24613 loops=1) Index Cond: ((booking_date >= '2004-06- 12'::date) AND (booking_date <= '2004-06-18'::date)) -> Index Scan using staff_booking_pkey on staff_booking b (cost=0.00..3.02 rows=1 width=12) (actual time=0.037..0.042 rows=1 loops=24613) Index Cond: ("outer".booking_id = b.booking_id) -> Sort (cost=368.38..368.55 rows=68 width=12) (actual time=0.338..0.338 rows=0 loops=1) Sort Key: tvis.contract_id -> Subquery Scan tvis (cost=365.46..366.31 rows=68 width=12) (actual time=0.307..0.307 rows=0 loops=1) -> HashAggregate (cost=365.46..365.63 rows=68 width=4) (actual time=0.302..0.302 rows=0 loops=1) -> Nested Loop (cost=0.00..365.12 rows=68 width=4) (actual time=0.290..0.290 rows=0 loops=1) -> Index Scan using fk_idx_main_order on main_order (cost=0.00..4.99 rows=1 width=4) (actual time=0.286..0.286 rows=0 loops=1) Index Cond: (client_id = 6) -> Index Scan using fk_idx_timesheet_detail_3 on timesheet_detail (cost=0.00..358.63 rows=120 width=8) (never executed) Index Cond: ("outer".order_id = timesheet_detail.order_id) Total runtime: 6266.205 ms This now gives me the same results, but with orders of magnitude better execution times! Oddly enough, SQLServer really struggles with the second query, taking longer then PostgreSQL!!!! Regards, Gary. On 3 Apr 2004 at 10:59, Josh Berkus wrote: Gary, > There are no indexes on the columns involved in the update, they are > not required for my usual select statements. This is an attempt to > slightly denormalise the design to get the performance up comparable > to SQL Server 2000. We hope to move some of our databases over to > PostgreSQL later in the year and this is part of the ongoing testing. > SQLServer's query optimiser is a bit smarter that PostgreSQL's (yet) > so I am hand optimising some of the more frequently used > SQL and/or tweaking the database design slightly. Hmmm ... that hasn't been my general experience on complex queries. However, it may be due to a difference in ANALYZE statistics. I'd love to see you increase your default_stats_target, re-analyze, and see if PostgreSQL gets "smarter". -- -Josh Berkus Aglio Database Solutions San Francisco ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly -- Incoming mail is certified Virus Free. Checked by AVG Anti-Virus (http://www.grisoft.com). Version: 7.0.230 / Virus Database: 262.6.5 - Release Date: 31/03/2004
pgsql-performance by date: