Thread: Re: PostgreSQL and Linux 2.6 kernel.
On 2 Apr 2004 at 22:36, pgsql-performance@postgresql. wrote:

OK, some more detail:

Before wiping 2.4 off my test box for the second time:

SQL statement for the update:

update staff_booking set time_from = r.time_from
from order_reqt r
where r.reqt_id = staff_booking.reqt_id;

Explain (on 2.4):

QUERY PLAN
Merge Join  (cost=0.00..185731.30 rows=2845920 width=92)
  Merge Cond: ("outer".reqt_id = "inner".reqt_id)
  ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..53068.20 rows=2206291 width=6)
  ->  Index Scan using staff_book_idx2 on staff_booking  (cost=0.00..99579.21 rows=2845920 width=90)

Total execution time: 18 hours 12 minutes

vacuum full analyze: total time 3 hours 22 minutes

Wait 2 hours to re-install 2.6, set params etc., restore database.

Same SQL statement. Explain (on 2.6):

QUERY PLAN
Merge Join  (cost=0.00..209740.24 rows=2845920 width=92)
  Merge Cond: ("outer".reqt_id = "inner".reqt_id)
  ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..50734.20 rows=2206291 width=6)
  ->  Index Scan using staff_book_idx2 on staff_booking  (cost=0.00..117921.92 rows=2845920 width=90)

Total execution time: 2 hours 53 minutes

vacuum full analyze: total time 1 hour 6 minutes

Table definitions for the two tables involved:

CREATE TABLE ORDER_REQT
(
    REQT_ID            SERIAL,
    ORDER_ID           integer NOT NULL,
    DAYOFWEEK          smallint NOT NULL CHECK (DAYOFWEEK BETWEEN 0 AND 6),
    TIME_FROM          smallint NOT NULL CHECK (TIME_FROM BETWEEN 0 AND 1439),
    DURATION           smallint NOT NULL CHECK (DURATION BETWEEN 0 AND 1439),
    PRODUCT_ID         integer NOT NULL,
    NUMBER_REQT        smallint NOT NULL DEFAULT (1),
    WROPTIONS          integer NOT NULL DEFAULT 0,
    UID_REF            integer NOT NULL,
    DT_STAMP           timestamp NOT NULL DEFAULT current_timestamp,
    Sentinel_Priority  integer NOT NULL DEFAULT 0,
    PERIOD             smallint NOT NULL DEFAULT 1 CHECK (PERIOD BETWEEN -2 AND 4),
    FREQUENCY          smallint NOT NULL DEFAULT 1,
    PRIMARY KEY (REQT_ID)
);

CREATE TABLE STAFF_BOOKING
(
    BOOKING_ID         SERIAL,
    REQT_ID            integer NOT NULL,
    ENTITY_TYPE        smallint NOT NULL DEFAULT 3 CHECK (ENTITY_TYPE IN (3,4)),
    STAFF_ID           integer NOT NULL,
    CONTRACT_ID        integer NOT NULL,
    TIME_FROM          smallint NOT NULL CHECK (TIME_FROM BETWEEN 0 AND 1439),
    DURATION           smallint NOT NULL CHECK (DURATION BETWEEN 0 AND 1439),
    PERIOD             smallint NOT NULL DEFAULT 1 CHECK (PERIOD BETWEEN -2 AND 4),
    FREQUENCY          smallint NOT NULL DEFAULT 1,
    TRAVEL_TO          smallint NOT NULL DEFAULT 0,
    UID_REF            integer NOT NULL,
    DT_STAMP           timestamp NOT NULL DEFAULT current_timestamp,
    SELL_PRICE         numeric(10,4) NOT NULL DEFAULT 0,
    COST_PRICE         numeric(10,4) NOT NULL DEFAULT 0,
    MIN_SELL_PRICE     numeric(10,4) NOT NULL DEFAULT 0,
    MIN_COST_PRICE     numeric(10,4) NOT NULL DEFAULT 0,
    Sentinel_Priority  integer NOT NULL DEFAULT 0,
    CHECK_INTERVAL     smallint NOT NULL DEFAULT 0,
    STATUS             smallint NOT NULL DEFAULT 0,
    WROPTIONS          integer NOT NULL DEFAULT 0,
    PRIMARY KEY (BOOKING_ID)
);

Foreign keys:

ALTER TABLE ORDER_REQT ADD FOREIGN KEY (ORDER_ID)
    REFERENCES MAIN_ORDER (ORDER_ID) ON DELETE CASCADE;
ALTER TABLE ORDER_REQT ADD FOREIGN KEY (PRODUCT_ID)
    REFERENCES PRODUCT (PRODUCT_ID);
ALTER TABLE STAFF_BOOKING ADD FOREIGN KEY (CONTRACT_ID)
    REFERENCES STAFF_CONTRACT (CONTRACT_ID);
ALTER TABLE STAFF_BOOKING ADD FOREIGN KEY (STAFF_ID)
    REFERENCES STAFF (STAFF_ID);

Indexes:

CREATE INDEX FK_IDX_ORDER_REQT ON ORDER_REQT (ORDER_ID);
CREATE INDEX FK_IDX_ORDER_REQT_2 ON ORDER_REQT (PRODUCT_ID);
CREATE INDEX ORDER_REQT_IDX ON ORDER_REQT (ORDER_ID, PRODUCT_ID);
CREATE INDEX ORDER_REQT_IDX4 ON ORDER_REQT (REQT_ID, TIME_FROM, DURATION);
CREATE INDEX FK_IDX_STAFF_BOOKING ON STAFF_BOOKING (CONTRACT_ID);
CREATE INDEX FK_IDX_STAFF_BOOKING_2 ON STAFF_BOOKING (STAFF_ID);
CREATE INDEX STAFF_BOOK_IDX1 ON STAFF_BOOKING (STAFF_ID, REQT_ID);
CREATE INDEX STAFF_BOOK_IDX2 ON STAFF_BOOKING (REQT_ID);
CREATE INDEX STAFF_BOOK_IDX3 ON STAFF_BOOKING (BOOKING_ID, REQT_ID);
CREATE INDEX STAFF_BOOK_IDX4 ON STAFF_BOOKING (BOOKING_ID, CONTRACT_ID);

There are no indexes on the columns involved in the update; they are not required for my usual select statements. This is an attempt to slightly denormalise the design to get the performance up comparable to SQL Server 2000. We hope to move some of our databases over to PostgreSQL later in the year and this is part of the ongoing testing. SQLServer's query optimiser is a bit smarter than PostgreSQL's (yet), so I am hand-optimising some of the more frequently used SQL and/or tweaking the database design slightly.

Later, after deciphering SQLServer's graphical plans, I will attempt to post comparative performance/access plans, using the same data of course, if anyone would be interested....

Cheers,
Gary.

On 2 Apr 2004 at 1:32, Tom Lane wrote:

> "Gary Doades" <gpd@gpdnet.co.uk> writes:
> > As a test in PostgreSQL I issued a statement to update a single column
> > of a table containing 2.8 million rows with the values of a column in
> > a table with similar rowcount. Using the above spec I had to stop the
> > server after 17 hours. The poor thing was thrashing the hard disk and
> > doing more swapping than useful work.
>
> This statement is pretty much content-free, since you did not show us
> the table schemas, the query, or the EXPLAIN output for the query.
> (I'll forgive you the lack of EXPLAIN ANALYZE, but you could easily
> have provided all the other hard facts.)  There's really no way to tell
> where the bottleneck is.  Maybe it's a kernel-level issue, but I would
> not bet on that without more evidence.  I'd definitely not bet on it
> without direct confirmation that the same query plan was used in both
> setups.
>
> 			regards, tom lane
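A practical aside on Tom's EXPLAIN ANALYZE point: unlike plain EXPLAIN, EXPLAIN ANALYZE actually executes the statement, so for a 2.8-million-row UPDATE it is worth wrapping in a transaction that gets rolled back. A minimal sketch, using the statement from above:

BEGIN;
EXPLAIN ANALYZE
UPDATE staff_booking
   SET time_from = r.time_from
  FROM order_reqt r
 WHERE r.reqt_id = staff_booking.reqt_id;
ROLLBACK;  -- discard the row changes; the plan and actual timings are still reported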
Gary,

> There are no indexes on the columns involved in the update, they are
> not required for my usual select statements. This is an attempt to
> slightly denormalise the design to get the performance up comparable
> to SQL Server 2000. We hope to move some of our databases over to
> PostgreSQL later in the year and this is part of the ongoing testing.
> SQLServer's query optimiser is a bit smarter than PostgreSQL's (yet),
> so I am hand-optimising some of the more frequently used
> SQL and/or tweaking the database design slightly.

Hmmm ... that hasn't been my general experience on complex queries. However, it may be due to a difference in ANALYZE statistics. I'd love to see you increase your default_stats_target, re-analyze, and see if PostgreSQL gets "smarter".

--
-Josh Berkus
Aglio Database Solutions
San Francisco
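For anyone wanting to try Josh's suggestion: the setting is default_statistics_target (10 by default in this era), and it can be raised for a session before re-analyzing, or persistently per column. A sketch against the tables in this thread:

-- Session-wide: raise the default, then recollect statistics
SET default_statistics_target = 100;
ANALYZE order_reqt;
ANALYZE staff_booking;

-- Or persistently, targeting just the join column:
ALTER TABLE staff_booking ALTER COLUMN reqt_id SET STATISTICS 100;
ANALYZE staff_booking;

Higher targets make ANALYZE a little slower but give the planner better row-count estimates to work with.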
Actually it hasn't been my experience either. Most of my queries against the database, large and small, are either a little quicker or show no real difference. I have only really noticed big differences under stress, when memory (RAM) is being squeezed. The main winner on 2.6 seems to be write performance and memory management.

Unfortunately I only have one test machine and I can't really keep switching between 2.4 and 2.6 to do the comparisons. I had written down 27 timings from a set of SQL of varying complexity using the 2.4 kernel. Each SQL statement was executed 10 times and the average of the last 5 was used. I can only really compare those timings against the new installation on 2.6. I know that this is not ideal "real world" testing, but it is good enough for me at the moment. Unless anyone has contradictory indications, I will proceed with 2.6.

I did increase the default stats target from 10 to 50 and re-analysed. The explain numbers are slightly different, but the time to run was almost the same. Not surprising, since the plan was the same:

QUERY PLAN
Merge Join  (cost=0.00..192636.20 rows=2845920 width=92)
  Merge Cond: ("outer".reqt_id = "inner".reqt_id)
  ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..52662.40 rows=2206291 width=6)
  ->  Index Scan using staff_book_idx2 on staff_booking  (cost=0.00..102529.28 rows=2845920 width=90)

On 3 Apr 2004 at 10:59, Josh Berkus wrote:

> Hmmm ... that hasn't been my general experience on complex queries.
> However, it may be due to a difference in ANALYZE statistics. I'd love
> to see you increase your default_stats_target, re-analyze, and see if
> PostgreSQL gets "smarter".
Sorry, I think I misread your post in my last reply. I thought you were still talking about the big update....

The main thing I have noticed about SQLServer is that it seems more willing to do hash or merge joins than PostgreSQL. I have experimented with various postgresql.conf parameters and even turned off nested loops to see the difference. When actually getting a merge join out of PostgreSQL when it wanted to do a nested loop, it, not surprisingly, took longer to execute.

Looking at the SQLServer plan, it seemed to be spending MUCH less time in the sort operations than PostgreSQL. This is probably what leads SQLServer to go for hash/merge joins more often. The other problem is that the SQLServer timings are skewed by its query plan caching. For one query the SQLServer plan said it spent 2% of its time in a big sort; the same query in PostgreSQL, when a hash join was forced, spent 23% of its time on the sort (from explain analyse actual stats). I have played about with sort_mem, but it doesn't make much difference.

I have also noticed that SQLServer tends to fold more complex IN subselects into the main query using merge joins, maybe for the same reason as above. SQLServer seems to have some more "exotic" joins ("nested loop/left semi join", "nested loop/left anti semi join"). These are probably just variants of nested loops, but I don't know enough about it to say if they make a difference. Clustered indexes and clustered index seeks also seem to be a big player in the more complex queries.

I still have quite a lot of comparative testing and tuning to do before I can nail it down further, but I will let you know when I have some hard stats to go on.

On 3 Apr 2004 at 10:59, Josh Berkus wrote:

> Hmmm ... that hasn't been my general experience on complex queries.
> However, it may be due to a difference in ANALYZE statistics. I'd love
> to see you increase your default_stats_target, re-analyze, and see if
> PostgreSQL gets "smarter".
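For reference, the experiments described above (disabling nested loops, nudging sort memory) are all session-level settings, so they can be tried without touching postgresql.conf. A sketch of the sort of thing meant here, with illustrative values rather than recommendations:

-- Discourage nested loops so the planner considers hash/merge joins
SET enable_nestloop = off;

-- Give sorts more memory (7.x parameter name; value in KB, here 64MB)
SET sort_mem = 65536;

-- re-run the query under test, e.g.:
-- EXPLAIN ANALYZE SELECT ... ;

-- Restore the defaults for the rest of the session
SET enable_nestloop = on;
RESET sort_mem;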
Following on from Josh's response and my previous reply on SQLServer planning, the main problem query is this one:

SELECT VS.*, VL.TEL1, SC.CONTRACT_ID, SC.CONTRACT_REF, SC.MAX_HOURS, SC.MIN_HOURS,
  (SELECT COUNT(*) FROM TIMESHEET_DETAIL
     JOIN MAIN_ORDER ON (MAIN_ORDER.ORDER_ID = TIMESHEET_DETAIL.ORDER_ID
       AND MAIN_ORDER.CLIENT_ID = 6)
   WHERE TIMESHEET_DETAIL.CONTRACT_ID = SC.CONTRACT_ID) AS VISITS,
  (SELECT (SUM(R.DURATION+1))/60.0 FROM ORDER_REQT R
     JOIN STAFF_BOOKING B ON (B.REQT_ID = R.REQT_ID)
     JOIN BOOKING_PLAN BP ON (BP.BOOKING_ID = B.BOOKING_ID)
   WHERE B.CONTRACT_ID = SC.CONTRACT_ID
     AND BP.BOOKING_DATE BETWEEN '2004-06-12' AND '2004-06-18') AS RHOURS
FROM VSTAFF VS
JOIN STAFF_CONTRACT SC ON (SC.STAFF_ID = VS.STAFF_ID)
JOIN VLOCATION VL ON (VL.LOCATION_ID = VS.LOCATION_ID)
JOIN SEARCH_REQT_RESULT SR ON (SR.STAFF_ID = VS.STAFF_ID)
WHERE SR.SEARCH_ID = 1
  AND SC.CONTRACT_ID IN
    (SELECT C.CONTRACT_ID FROM STAFF_PRODUCT P, STAFF_CONTRACT C
     WHERE P.CONTRACT_ID = C.CONTRACT_ID
       AND C.STAFF_ID = VS.STAFF_ID
       AND P.PRODUCT_ID IN (SELECT PRODUCT_ID FROM SEARCH_ORDER_REQT WHERE SEARCH_ID = 1)
       AND C.AVAIL_DATE_FROM <= '2004-06-12' AND C.AVAIL_DATE_TO >= '2004-06-18'
     GROUP BY C.CONTRACT_ID
     HAVING (COUNT(C.CONTRACT_ID) =
       (SELECT COUNT(DISTINCT PRODUCT_ID) FROM SEARCH_ORDER_REQT WHERE SEARCH_ID = 1)))

The explain analyze is:

QUERY PLAN
Nested Loop  (cost=101.54..1572059.57 rows=135 width=152) (actual time=13749.100..1304586.501 rows=429 loops=1)
  InitPlan
    ->  Index Scan using fk_idx_wruserarea on wruserarea  (cost=3.26..6.52 rows=1 width=4) (actual time=0.944..0.944 rows=1 loops=1)
          Index Cond: (area_id = 1)
          Filter: (uid = $4)
          InitPlan
            ->  Seq Scan on wruser  (cost=0.00..3.26 rows=1 width=4) (actual time=0.686..0.691 rows=1 loops=1)
                  Filter: ((username)::name = "current_user"())
  ->  Hash Join  (cost=95.02..3701.21 rows=215 width=138) (actual time=100.476..1337.392 rows=429 loops=1)
        Hash Cond: ("outer".staff_id = "inner".staff_id)
        Join Filter: (subplan)
        ->  Seq Scan on staff_contract sc  (cost=0.00..33.24 rows=1024 width=37) (actual time=0.114..245.366 rows=1024 loops=1)
        ->  Hash  (cost=93.95..93.95 rows=430 width=109) (actual time=38.563..38.563 rows=0 loops=1)
              ->  Hash Join  (cost=47.47..93.95 rows=430 width=109) (actual time=15.502..36.627 rows=429 loops=1)
                    Hash Cond: ("outer".staff_id = "inner".staff_id)
                    ->  Seq Scan on staff  (cost=34.61..66.48 rows=1030 width=105) (actual time=9.655..15.264 rows=1030 loops=1)
                          Filter: ((hashed subplan) OR $5)
                          SubPlan
                            ->  Seq Scan on staff_area  (cost=10.73..33.38 rows=493 width=4) (actual time=8.452..8.452 rows=0 loops=1)
                                  Filter: ((hashed subplan) OR (area_id = 1))
                                  SubPlan
                                    ->  Seq Scan on wruserarea  (cost=3.26..10.72 rows=5 width=4) (actual time=0.977..1.952 rows=1 loops=1)
                                          Filter: (uid = $1)
                                          InitPlan
                                            ->  Seq Scan on wruser  (cost=0.00..3.26 rows=1 width=4) (actual time=0.921..0.926 rows=1 loops=1)
                                                  Filter: ((username)::name = "current_user"())
                    ->  Hash  (cost=11.79..11.79 rows=430 width=4) (actual time=5.705..5.705 rows=0 loops=1)
                          ->  Index Scan using fk_idx_search_reqt_result on search_reqt_result sr  (cost=0.00..11.79 rows=430 width=4) (actual time=0.470..4.482 rows=429 loops=1)
                                Index Cond: (search_id = 1)
        SubPlan
          ->  HashAggregate  (cost=8.32..8.32 rows=1 width=4) (actual time=2.157..2.157 rows=1 loops=429)
                Filter: (count(contract_id) = $9)
                InitPlan
                  ->  Aggregate  (cost=1.04..1.04 rows=1 width=4) (actual time=0.172..0.173 rows=1 loops=1)
                        ->  Seq Scan on search_order_reqt  (cost=0.00..1.04 rows=1 width=4) (actual time=0.022..0.038 rows=1 loops=1)
                              Filter: (search_id = 1)
                ->  Hash IN Join  (cost=1.04..7.27 rows=1 width=4) (actual time=2.064..2.117 rows=1 loops=429)
                      Hash Cond: ("outer".product_id = "inner".product_id)
                      ->  Nested Loop  (cost=0.00..6.19 rows=7 width=8) (actual time=1.112..2.081 rows=8 loops=429)
                            ->  Index Scan using fk_idx_staff_contract_2 on staff_contract c  (cost=0.00..3.03 rows=1 width=4) (actual time=0.206..0.245 rows=1 loops=429)
                                  Index Cond: (staff_id = $8)
                                  Filter: ((avail_date_from <= '2004-06-12'::date) AND (avail_date_to >= '2004-06-18'::date))
                            ->  Index Scan using fk_idx_staff_product on staff_product p  (cost=0.00..3.08 rows=6 width=8) (actual time=0.873..1.764 rows=8 loops=429)
                                  Index Cond: (p.contract_id = "outer".contract_id)
                      ->  Hash  (cost=1.04..1.04 rows=1 width=4) (actual time=0.086..0.086 rows=0 loops=1)
                            ->  Seq Scan on search_order_reqt  (cost=0.00..1.04 rows=1 width=4) (actual time=0.037..0.050 rows=1 loops=1)
                                  Filter: (search_id = 1)
  ->  Index Scan using location_pkey on "location"  (cost=0.00..12.66 rows=1 width=18) (actual time=0.876..0.887 rows=1 loops=429)
        Index Cond: ("location".location_id = "outer".location_id)
        Filter: ((area_id = 1) OR (subplan))
        SubPlan
          ->  Index Scan using fk_idx_wruserarea, fk_idx_wruserarea on wruserarea  (cost=3.26..9.64 rows=1 width=4) (never executed)
                Index Cond: ((area_id = 1) OR (area_id = $7))
                Filter: (uid = $6)
                InitPlan
                  ->  Seq Scan on wruser  (cost=0.00..3.26 rows=1 width=4) (never executed)
                        Filter: ((username)::name = "current_user"())
  SubPlan
    ->  Aggregate  (cost=11233.28..11233.29 rows=1 width=2) (actual time=3036.814..3036.815 rows=1 loops=429)
          ->  Nested Loop  (cost=10391.71..11233.21 rows=30 width=2) (actual time=2817.923..3036.516 rows=34 loops=429)
                ->  Hash Join  (cost=10391.71..11142.43 rows=30 width=4) (actual time=2813.349..3007.936 rows=34 loops=429)
                      Hash Cond: ("outer".booking_id = "inner".booking_id)
                      ->  Index Scan using booking_plan_idx2 on booking_plan bp  (cost=0.00..572.52 rows=23720 width=4) (actual time=0.070..157.028 rows=24613 loops=429)
                            Index Cond: ((booking_date >= '2004-06-12'::date) AND (booking_date <= '2004-06-18'::date))
                      ->  Hash  (cost=10382.78..10382.78 rows=3571 width=8) (actual time=2746.122..2746.122 rows=0 loops=429)
                            ->  Index Scan using fk_idx_staff_booking on staff_booking b  (cost=0.00..10382.78 rows=3571 width=8) (actual time=14.168..2733.315 rows=3815 loops=429)
                                  Index Cond: (contract_id = $0)
                ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..3.01 rows=1 width=6) (actual time=0.826..0.832 rows=1 loops=14401)
                      Index Cond: ("outer".reqt_id = r.reqt_id)
    ->  Aggregate  (cost=363.94..363.94 rows=1 width=0) (actual time=0.057..0.058 rows=1 loops=429)
          ->  Nested Loop  (cost=0.00..363.94 rows=1 width=0) (actual time=0.034..0.034 rows=0 loops=429)
                ->  Index Scan using fk_idx_main_order on main_order  (cost=0.00..4.99 rows=1 width=4) (actual time=0.031..0.031 rows=0 loops=429)
                      Index Cond: (client_id = 6)
                ->  Index Scan using fk_idx_timesheet_detail_3 on timesheet_detail  (cost=0.00..358.93 rows=1 width=4) (never executed)
                      Index Cond: ("outer".order_id = timesheet_detail.order_id)
                      Filter: (contract_id = $0)
Total runtime: 1304591.861 ms

Long time! The main issue here is that the RHOURS subselect is executed as a nested join 429 times. Unfortunately this is an expensive subquery. SQLServer executed this in just over 1 second on comparable hardware. Looking at its execution plan, it flattens out the two subselects with a merge join.
So I manually rewrote the query using derived tables and joins as:

SELECT VS.*, VL.TEL1, SC.CONTRACT_ID, SC.CONTRACT_REF, SC.MAX_HOURS, SC.MIN_HOURS,
       TBOOK.RHOURS, TVIS.VISITS
FROM SEARCH_REQT_RESULT SR
JOIN STAFF_CONTRACT SC ON (SR.STAFF_ID = SC.STAFF_ID)
     AND SC.AVAIL_DATE_FROM <= '2004-06-12' AND SC.AVAIL_DATE_TO >= '2004-06-18'
JOIN VSTAFF VS ON (VS.STAFF_ID = SC.STAFF_ID)
JOIN VLOCATION VL ON (VL.LOCATION_ID = VS.LOCATION_ID)
LEFT OUTER JOIN
  (SELECT B.CONTRACT_ID, SUM(R.DURATION+1)/60.0 AS RHOURS
   FROM STAFF_BOOKING B
   JOIN BOOKING_PLAN BP ON (BP.BOOKING_ID = B.BOOKING_ID)
        AND BP.BOOKING_DATE BETWEEN '2004-06-12' AND '2004-06-18'
   JOIN ORDER_REQT R ON (R.REQT_ID = B.REQT_ID)
   GROUP BY B.CONTRACT_ID) AS TBOOK
  ON (SC.CONTRACT_ID = TBOOK.CONTRACT_ID)
LEFT OUTER JOIN
  (SELECT CONTRACT_ID, COUNT(*) AS VISITS
   FROM TIMESHEET_DETAIL
   JOIN MAIN_ORDER ON (MAIN_ORDER.ORDER_ID = TIMESHEET_DETAIL.ORDER_ID)
   WHERE MAIN_ORDER.CLIENT_ID = 6
   GROUP BY CONTRACT_ID) AS TVIS
  ON (TVIS.CONTRACT_ID = SC.CONTRACT_ID)
JOIN
  (SELECT P.CONTRACT_ID
   FROM STAFF_PRODUCT P, SEARCH_ORDER_REQT SR
   WHERE P.PRODUCT_ID = SR.PRODUCT_ID AND SR.SEARCH_ID = 1
   GROUP BY P.CONTRACT_ID
   HAVING (COUNT(P.CONTRACT_ID) =
     (SELECT COUNT(DISTINCT PRODUCT_ID) FROM SEARCH_ORDER_REQT WHERE SEARCH_ID = 1))) AS TCONT
  ON (TCONT.CONTRACT_ID = SC.CONTRACT_ID)
WHERE SR.SEARCH_ID = 1

With the explain analyze as:

QUERY PLAN
Hash Join  (cost=137054.42..137079.74 rows=159 width=192) (actual time=6228.354..6255.058 rows=429 loops=1)
  Hash Cond: ("outer".contract_id = "inner".contract_id)
  InitPlan
    ->  Index Scan using fk_idx_wruserarea on wruserarea  (cost=3.26..6.52 rows=1 width=4) (actual time=0.850..0.850 rows=1 loops=1)
          Index Cond: (area_id = 1)
          Filter: (uid = $3)
          InitPlan
            ->  Seq Scan on wruser  (cost=0.00..3.26 rows=1 width=4) (actual time=0.670..0.675 rows=1 loops=1)
                  Filter: ((username)::name = "current_user"())
  ->  Subquery Scan tcont  (cost=152.63..161.81 rows=612 width=4) (actual time=36.312..42.268 rows=612 loops=1)
        ->  HashAggregate  (cost=152.63..155.69 rows=612 width=4) (actual time=36.301..40.040 rows=612 loops=1)
              Filter: (count(contract_id) = $7)
              InitPlan
                ->  Aggregate  (cost=1.04..1.04 rows=1 width=4) (actual time=0.107..0.108 rows=1 loops=1)
                      ->  Seq Scan on search_order_reqt  (cost=0.00..1.04 rows=1 width=4) (actual time=0.025..0.037 rows=1 loops=1)
                            Filter: (search_id = 1)
              ->  Hash Join  (cost=1.04..148.53 rows=612 width=4) (actual time=0.419..32.284 rows=612 loops=1)
                    Hash Cond: ("outer".product_id = "inner".product_id)
                    ->  Seq Scan on staff_product p  (cost=0.00..109.91 rows=6291 width=8) (actual time=0.117..17.943 rows=6291 loops=1)
                    ->  Hash  (cost=1.04..1.04 rows=1 width=4) (actual time=0.190..0.190 rows=0 loops=1)
                          ->  Seq Scan on search_order_reqt sr  (cost=0.00..1.04 rows=1 width=4) (actual time=0.165..0.177 rows=1 loops=1)
                                Filter: (search_id = 1)
  ->  Hash  (cost=136894.61..136894.61 rows=266 width=192) (actual time=6191.923..6191.923 rows=0 loops=1)
        ->  Merge Left Join  (cost=136886.03..136894.61 rows=266 width=192) (actual time=6143.315..6189.685 rows=429 loops=1)
              Merge Cond: ("outer".contract_id = "inner".contract_id)
              ->  Merge Left Join  (cost=136517.64..136525.04 rows=266 width=184) (actual time=6142.896..6171.676 rows=429 loops=1)
                    Merge Cond: ("outer".contract_id = "inner".contract_id)
                    ->  Sort  (cost=5529.68..5530.34 rows=266 width=152) (actual time=129.548..130.027 rows=429 loops=1)
                          Sort Key: sc.contract_id
                          ->  Nested Loop  (cost=88.35..5518.96 rows=266 width=152) (actual time=33.213..121.666 rows=429 loops=1)
                                ->  Hash Join  (cost=88.35..143.88 rows=424 width=138) (actual time=32.739..76.357 rows=429 loops=1)
                                      Hash Cond: ("outer".staff_id = "inner".staff_id)
                                      ->  Hash Join  (cost=47.47..93.95 rows=430 width=109) (actual time=15.232..40.040 rows=429 loops=1)
                                            Hash Cond: ("outer".staff_id = "inner".staff_id)
                                            ->  Seq Scan on staff  (cost=34.61..66.48 rows=1030 width=105) (actual time=9.412..16.105 rows=1030 loops=1)
                                                  Filter: ((hashed subplan) OR $4)
                                                  SubPlan
                                                    ->  Seq Scan on staff_area  (cost=10.73..33.38 rows=493 width=4) (actual time=8.380..8.380 rows=0 loops=1)
                                                          Filter: ((hashed subplan) OR (area_id = 1))
                                                          SubPlan
                                                            ->  Seq Scan on wruserarea  (cost=3.26..10.72 rows=5 width=4) (actual time=0.953..1.941 rows=1 loops=1)
                                                                  Filter: (uid = $0)
                                                                  InitPlan
                                                                    ->  Seq Scan on wruser  (cost=0.00..3.26 rows=1 width=4) (actual time=0.902..0.908 rows=1 loops=1)
                                                                          Filter: ((username)::name = "current_user"())
                                            ->  Hash  (cost=11.79..11.79 rows=430 width=4) (actual time=5.670..5.670 rows=0 loops=1)
                                                  ->  Index Scan using fk_idx_search_reqt_result on search_reqt_result sr  (cost=0.00..11.79 rows=430 width=4) (actual time=0.448..4.516 rows=429 loops=1)
                                                        Index Cond: (search_id = 1)
                                      ->  Hash  (cost=38.36..38.36 rows=1008 width=37) (actual time=17.386..17.386 rows=0 loops=1)
                                            ->  Seq Scan on staff_contract sc  (cost=0.00..38.36 rows=1008 width=37) (actual time=0.222..14.063 rows=1008 loops=1)
                                                  Filter: ((avail_date_from <= '2004-06-12'::date) AND (avail_date_to >= '2004-06-18'::date))
                                ->  Index Scan using location_pkey on "location"  (cost=0.00..12.66 rows=1 width=18) (actual time=0.043..0.050 rows=1 loops=429)
                                      Index Cond: ("location".location_id = "outer".location_id)
                                      Filter: ((area_id = 1) OR (subplan))
                                      SubPlan
                                        ->  Index Scan using fk_idx_wruserarea, fk_idx_wruserarea on wruserarea  (cost=3.26..9.64 rows=1 width=4) (never executed)
                                              Index Cond: ((area_id = 1) OR (area_id = $6))
                                              Filter: (uid = $5)
                                              InitPlan
                                                ->  Seq Scan on wruser  (cost=0.00..3.26 rows=1 width=4) (never executed)
                                                      Filter: ((username)::name = "current_user"())
                    ->  Sort  (cost=130987.97..130989.96 rows=797 width=36) (actual time=6013.254..6014.112 rows=746 loops=1)
                          Sort Key: tbook.contract_id
                          ->  Subquery Scan tbook  (cost=130933.62..130949.56 rows=797 width=36) (actual time=5993.070..6007.677 rows=746 loops=1)
                                ->  HashAggregate  (cost=130933.62..130941.59 rows=797 width=6) (actual time=5993.055..6004.099 rows=746 loops=1)
                                      ->  Merge Join  (cost=74214.90..130815.02 rows=23720 width=6) (actual time=4950.951..5807.985 rows=24613 loops=1)
                                            Merge Cond: ("outer".reqt_id = "inner".reqt_id)
                                            ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..50734.20 rows=2206291 width=6) (actual time=0.444..2753.374 rows=447439 loops=1)
                                            ->  Sort  (cost=74214.90..74274.20 rows=23720 width=8) (actual time=1822.405..1856.081 rows=24613 loops=1)
                                                  Sort Key: b.reqt_id
                                                  ->  Nested Loop  (cost=0.00..72491.19 rows=23720 width=8) (actual time=1.955..1633.124 rows=24613 loops=1)
                                                        ->  Index Scan using booking_plan_idx2 on booking_plan bp  (cost=0.00..572.52 rows=23720 width=4) (actual time=1.468..243.827 rows=24613 loops=1)
                                                              Index Cond: ((booking_date >= '2004-06-12'::date) AND (booking_date <= '2004-06-18'::date))
                                                        ->  Index Scan using staff_booking_pkey on staff_booking b  (cost=0.00..3.02 rows=1 width=12) (actual time=0.037..0.042 rows=1 loops=24613)
                                                              Index Cond: ("outer".booking_id = b.booking_id)
              ->  Sort  (cost=368.38..368.55 rows=68 width=12) (actual time=0.338..0.338 rows=0 loops=1)
                    Sort Key: tvis.contract_id
                    ->  Subquery Scan tvis  (cost=365.46..366.31 rows=68 width=12) (actual time=0.307..0.307 rows=0 loops=1)
                          ->  HashAggregate  (cost=365.46..365.63 rows=68 width=4) (actual time=0.302..0.302 rows=0 loops=1)
                                ->  Nested Loop  (cost=0.00..365.12 rows=68 width=4) (actual time=0.290..0.290 rows=0 loops=1)
                                      ->  Index Scan using fk_idx_main_order on main_order  (cost=0.00..4.99 rows=1 width=4) (actual time=0.286..0.286 rows=0 loops=1)
                                            Index Cond: (client_id = 6)
                                      ->  Index Scan using fk_idx_timesheet_detail_3 on timesheet_detail  (cost=0.00..358.63 rows=120 width=8) (never executed)
                                            Index Cond: ("outer".order_id = timesheet_detail.order_id)
Total runtime: 6266.205 ms

This now gives me the same results, but with orders of magnitude better execution times! Oddly enough, SQLServer really struggles with the second query, taking longer than PostgreSQL!

Regards,
Gary.

On 3 Apr 2004 at 10:59, Josh Berkus wrote:

> Hmmm ... that hasn't been my general experience on complex queries.
> However, it may be due to a difference in ANALYZE statistics. I'd love
> to see you increase your default_stats_target, re-analyze, and see if
> PostgreSQL gets "smarter".
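The transformation Gary applied generalises. Distilled to a minimal sketch with hypothetical orders/order_detail tables (not from his schema): a correlated scalar subquery that the planner re-executes once per outer row becomes a derived table that is aggregated once and then joined:

-- Before: the subquery runs once per row of orders
SELECT o.order_id,
       (SELECT SUM(d.qty)
          FROM order_detail d
         WHERE d.order_id = o.order_id) AS total_qty
  FROM orders o;

-- After: aggregate once, then let the planner merge/hash join the result
SELECT o.order_id, t.total_qty
  FROM orders o
  LEFT OUTER JOIN (SELECT order_id, SUM(qty) AS total_qty
                     FROM order_detail
                    GROUP BY order_id) AS t
    ON t.order_id = o.order_id;

One semantic caveat: for aggregates like COUNT(*) the correlated form returns 0 where the outer-joined form returns NULL, so a COALESCE may be needed to keep results identical.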
Thanks, I know about SET SHOWPLAN_TEXT, but it is only the equivalent of EXPLAIN, not EXPLAIN ANALYZE. The graphical plan gives full statistics: runtime, percentage cost, loop execution counts etc., which is much more useful. I don't know of a way of getting the graphical plan content in text form.

Cheers,
Gary.

On 3 Apr 2004 at 6:50, @g v t c wrote:

> Use "Set Show_Plan" or something of the sort in Query Analyzer. Then
> run your SQL. This will change the graphical plan to a text plan
> similar to Postgresql or at least something close to readable.
>
> [Gary's earlier message with the schema and plans, quoted in full, snipped]
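For what it's worth, SQL Server 2000 does have a text-mode counterpart to EXPLAIN ANALYZE, if memory of that era serves (treat this as a hedged pointer rather than gospel): SET STATISTICS PROFILE actually executes the statement and appends per-operator actual row and execution counts to the text plan.

-- Estimated plan only; the query is compiled but not run:
SET SHOWPLAN_TEXT ON;
GO
-- <query under test here>
GO
SET SHOWPLAN_TEXT OFF;
GO

-- Executes the query and reports actual Rows/Executes per operator,
-- the closest T-SQL analogue of EXPLAIN ANALYZE:
SET STATISTICS PROFILE ON;
GO
-- <query under test here>
GO
SET STATISTICS PROFILE OFF;
GO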
Almost any cross-DBMS migration shows a drop in performance. The engine effectively trains developers and administrators in what works and what doesn't. The initial migration thus compares a tuned to an untuned version.

/Aaron

----- Original Message -----
From: "Josh Berkus" <josh@agliodbs.com>
To: "Gary Doades" <gpd@gpdnet.co.uk>; <pgsql-performance@postgresql.org>
Sent: Saturday, April 03, 2004 1:59 PM
Subject: Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.

> Hmmm ... that hasn't been my general experience on complex queries.
> However, it may be due to a difference in ANALYZE statistics. I'd love
> to see you increase your default_stats_target, re-analyze, and see if
> PostgreSQL gets "smarter".
On Sat, 2004-04-03 at 03:50, Gary Doades wrote:

> OK, some more detail:
>
> Before wiping 2.4 off my test box for the second time:

Perhaps I missed it, but which io scheduler are you using under 2.6?
> Almost any cross-DBMS migration shows a drop in performance. The engine
> effectively trains developers and administrators in what works and what
> doesn't. The initial migration thus compares a tuned to an untuned version.

I think it is also possible that Microsoft has more programmers working on tuning issues for SQL Server than PostgreSQL has working on the whole project.
--
Mike Nolan
Hi Friends,

Does anybody know the substitute for the Oracle function 'connect by prior' in PostgreSQL? The query is basically being used to get a tree structure of records. The query in Oracle is:

select pkmsgid
  from mstmessage
connect by prior pkmsgid = msgparentid
  start with msgparentid = 1

Kindly suggest.

regards
Kamal
Possibly. A lot of my queries show comparable performance, some a little slower and a few a little faster. There are a few, however, that really grind on PostgreSQL. I am learning patterns from these to try and target the most likely performance problems to come, and hand-tune those types of SQL.

I'm not complaining about PostgreSQL or saying that SQLServer is better; in most cases it is not. SQLServer seems to be more predictable and forgiving in performance, which tends to make for lazy SQL programming. It also has implications when the SQL is dynamically created based on user input: there are more chances of PostgreSQL hitting a performance problem than SQLServer.

Overall I'm still very impressed with PostgreSQL. Given the $7000-per-processor licence for SQLServer, the case for PostgreSQL is even stronger!

Cheers,
Gary.

On 3 Apr 2004 at 17:43, Aaron Werman wrote:

> Almost any cross-DBMS migration shows a drop in performance. The engine
> effectively trains developers and administrators in what works and what
> doesn't. The initial migration thus compares a tuned to an untuned version.
Unfortunately I don't understand the question!

My background is primarily Win32. The last time I used a *nix OS was about 20 years ago, apart from occasional dips into the Linux OS over the past few years. If you can tell me how to find out what you want, I will gladly give you the information.

Regards,
Gary.

On 3 Apr 2004 at 16:52, Cott Lang wrote:

> Perhaps I missed it, but which io scheduler are you using under 2.6?
Unfortunately I have to try and keep both SQLServer and PostgreSQL compatibility. Our main web application is currently SQLServer, but we want to migrate customers who don't care what the DB server is over to PostgreSQL. Some of our larger customers demand SQLServer, you know how it is! I don't want to maintain two sets of code or SQL, so I am trying to find common ground. The code is not a problem, but the SQL sometimes is.

Cheers,
Gary.

On 3 Apr 2004 at 17:43, Aaron Werman wrote:

> Almost any cross-DBMS migration shows a drop in performance. The engine
> effectively trains developers and administrators in what works and what
> doesn't. The initial migration thus compares a tuned to an untuned version.
On 3 Apr 2004 at 21:23, Mike Nolan wrote:

> I think it is also possible that Microsoft has more programmers working
> on tuning issues for SQL Server than PostgreSQL has working on the
> whole project.

Agreed. Also, considering the high price of SQLServer, it is in their interest to spend a lot of resources on tuning/performance to give it a commercial edge over its rivals and in silly benchmark scores.

Cheers,
Gary.
On 04/04/2004 09:56 Gary Doades wrote:

> Unfortunately I don't understand the question!
>
> My background is primarily Win32. The last time I used a *nix OS was
> about 20 years ago apart from occasional dips into the Linux OS over
> the past few years. If you can tell me how to find out what you want,
> I will gladly give you the information.

Googling threw up http://spider.tm/apr2004/cstory2.html

Interesting and possibly relevant quote: "Benchmarks have shown that in certain conditions the anticipatory algorithm is almost 10 times faster than what 2.4 kernel supports".

HTH

--
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for Business             |
| Computer Consultants         | http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+
Hi,

Try looking at the contrib/tablefunc add-in module.

Chris

Kamalraj Singh Madhan wrote:
> Does anybody know the substitute for the Oracle function 'connect by
> prior' in PostgreSQL? The query is basically being used to get a tree
> structure of records. The query in Oracle is:
>
> select pkmsgid
>   from mstmessage
> connect by prior pkmsgid = msgparentid
>   start with msgparentid = 1
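For the archives: the relevant function in contrib/tablefunc is connectby(). A sketch against Kamal's mstmessage table, assuming pkmsgid/msgparentid are integers (check the contrib README for the exact signature in your version; the column definition list is required):

-- contrib/tablefunc must be installed first (run tablefunc.sql into the database)
SELECT pkmsgid, msgparentid, level
  FROM connectby('mstmessage',    -- table holding the tree
                 'pkmsgid',       -- key column
                 'msgparentid',   -- parent-key column
                 '1',             -- start_with value (as text)
                 0)               -- max_depth (0 = unlimited)
       AS t(pkmsgid int, msgparentid int, level int);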
On Sun, 2004-04-04 at 01:56, Gary Doades wrote:

> Unfortunately I don't understand the question!
>
> My background is primarily Win32. The last time I used a *nix OS was
> about 20 years ago apart from occasional dips into the Linux OS over
> the past few years. If you can tell me how to find out what you want,
> I will gladly give you the information.

There are two available io schedulers in 2.6 (a new feature), deadline and anticipatory. It should be listed in the boot messages:

dmesg | grep scheduler

I've seen people arguing for each of the two schedulers, saying one is better than the other for databases. I'm curious which one you're using. :)
It says:

Using anticipatory io scheduler.

This then fits with the earlier post's observations of up to 10 times better performance, which is what I was seeing in certain circumstances.

Cheers,
Gary.

On 4 Apr 2004 at 6:04, Cott Lang wrote:

> There are two available io schedulers in 2.6 (a new feature), deadline
> and anticipatory. It should be listed in the boot messages:
>
>   dmesg | grep scheduler
Mike,

> I think it is also possible that Microsoft has more programmers working
> on tuning issues for SQL Server than PostgreSQL has working on the
> whole project.

Ah, but quantity != quality. Or they wouldn't be trolling our mailing lists trying to hire PostgreSQL programmers for the SQL Server project (really!). And we had nearly 200 contributors between 7.3 and 7.4 ... a respectable development staff for even a large corporation.

Point taken, though: SQL Server has done a better job in optimizing for "dumb" queries. This is something that PostgreSQL needs to work on, as is self-referential updates for large tables, which also tend to be really slow. Mind you, in SQL Server 7 I used to be able to crash the server with a big self-referential update, so this is a common database problem.

Unfortunately, these days only Tom and Neil seem to be seriously working on the query planner (beg pardon in advance if I've missed someone), so I think the real answer is that we need another person interested in this kind of optimization before it's going to get much better.

--
Josh Berkus
Aglio Database Solutions
San Francisco
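To pin down the term: a self-referential update joins a table to itself, something like the sketch below (table and column names invented for illustration, following the UPDATE ... FROM style used earlier in this thread):

-- Each row pulls a value from its own parent row in the same table
UPDATE category
   SET path_name = p.name || '/' || category.name
  FROM category p
 WHERE p.category_id = category.parent_id;

On a large table this pattern forces the planner to join two full-size scans of the same relation while the update is rewriting it, which is why it tends to be a stress test for any engine.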
On 4 Apr, Cott Lang wrote:

> I've seen people arguing for each of the two schedulers, saying one is
> better than the other for databases. I'm curious which one you're
> using. :)

Our database tests (TPC fair-use implementations) show that the deadline scheduler has an edge on the anticipatory scheduler. Depending on the current state of the AS scheduler, it can be within a few percent to 10% or so. I have some data with one of our tests here:

http://developer.osdl.org/markw/fs/dbt2_project_results.html

Mark
On 5 Apr 2004 at 8:36, Josh Berkus wrote:

> Point taken, though: SQL Server has done a better job in optimizing for
> "dumb" queries. This is something that PostgreSQL needs to work on, as
> is self-referential updates for large tables, which also tend to be
> really slow. Mind you, in SQL Server 7 I used to be able to crash the
> server with a big self-referential update, so this is a common database
> problem.

I agree about the "dumb" queries (I'm not saying mine are *that* dumb :)).

When you can write SQL that looks right, feels right, and gives the right answers during testing, and SQLServer runs it really fast, you stop there and tend not to tinker with the SQL further. You *can* (I certainly do) achieve comparable performance with PostgreSQL, but you just have to work harder for it. Now that I have learned the characteristics of both servers I can write SQL that is pretty good on both.

I suspect that there are people who evaluate PostgreSQL by executing their favourite SQLServer queries against it, see that it is slower, and never bother to go further.

Cheers,
Gary.
On Mon, 2004-04-05 at 11:36, Josh Berkus wrote:

> Unfortunately, these days only Tom and Neil seem to be seriously working
> on the query planner (beg pardon in advance if I've missed someone)

Actually, Tom is the only person actively working on the planner -- while I hope to contribute to it in the future, I haven't done so yet.

-Neil
> Josh Berkus wrote:
> Unfortunately, these days only Tom and Neil seem to be seriously working
> on the query planner (beg pardon in advance if I've missed someone) so I
> think the real answer is that we need another person interested in this
> kind of optimization before it's going to get much better.

Hmmmm. Interesting line of thought. Is the problem "a person interested", or is there another issue there?

I was thinking the other day that maybe removing the ability to control join order through explicitly manipulating the FROM clause might actually be counterproductive in terms of longer-term improvement of the optimizer. Treating the optimizer as a black box is something I'm very used to from other RDBMS. My question is: how can you explicitly re-write a query now to "improve" it? If there's no way of manipulating queries without actually re-writing the optimizer, we're now in a position where we aren't able to diagnose when the optimizer isn't working effectively.

To my mind, all the people on this list are potential "optimizer developers" in the sense that we can all look at queries and see whether there is a problem with particular join plans. Providing good cases of poor optimization is just what's needed to assist those few that do understand the internals to continue improving things.

I guess what I'm saying is it's not how many people you've got working on the optimizer, it's how many accurate field reports of less-than-perfect optimization reach them. In that case, PostgreSQL is likely in a better position than Microsoft, since the accessibility of the pg discussion lists makes such cases much more likely to get aired.

Any thoughts?

Best Regards, Simon Riggs
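For context, the FROM-clause control Simon refers to does still exist in PostgreSQL of this vintage: with explicit JOIN syntax, and join_collapse_limit (added in 7.4) set to 1, the planner keeps the join order exactly as written, so a query can be hand-ordered without touching the optimizer itself. A sketch with illustrative table names:

-- With join_collapse_limit = 1, explicit JOINs are planned in written order
SET join_collapse_limit = 1;

SELECT *
  FROM a
  JOIN b ON (b.a_id = a.a_id)    -- forced to happen first
  JOIN c ON (c.b_id = b.b_id);   -- forced to happen second

RESET join_collapse_limit;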
On Wed, Apr 14, 2004 at 21:12:18 +0100, Simon Riggs <simon@2ndquadrant.com> wrote:

> I guess what I'm saying is it's not how many people you've got working
> on the optimizer, it's how many accurate field reports of less-than-
> perfect optimization reach them. In that case, PostgreSQL is likely in
> a better position than Microsoft, since the accessibility of the pg
> discussion lists makes such cases much more likely to get aired.
>
> Any thoughts?

I have seen exactly this happen a number of times over the last several years. However, there is still only one Tom Lane implementing the improvements.
> Bruno Wolff:
> I have seen exactly this happen a number of times over the last several
> years. However, there is still only one Tom Lane implementing the
> improvements.

...and very few Mr. Microsofts too.

[I'm uncomfortable discussing such an issue with direct reference to particular individuals, and it was not my intent to do so. There is no intent to criticise or malign anybody named.]

Regards, Simon
Simon,

> Is the problem "a person interested", or is there another issue there?

IMHO, it's "a person interested".

> Treating the optimizer as a black box is something I'm very used to
> from other RDBMS. My question is: how can you explicitly re-write a
> query now to "improve" it? If there's no way of manipulating queries
> without actually re-writing the optimizer, we're now in a position
> where we aren't able to diagnose when the optimizer isn't working
> effectively.

Well, there is ... all of the various query cost parameters.

> To my mind, all the people on this list are potential "optimizer
> developers" in the sense that we can all look at queries and see
> whether there is a problem with particular join plans. Providing good
> cases of poor optimization is just what's needed to assist those few
> that do understand the internals to continue improving things.

... which is what this list is for. But, ultimately, improvements on the planner are still bottlenecked by having only one developer actually hacking the changes.

--
Josh Berkus
Aglio Database Solutions
San Francisco
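The cost parameters Josh means are the planner settings, which can be changed per session to see how plan choice shifts. A sketch with illustrative values only -- sensible settings depend entirely on the hardware:

-- Lower the penalty for non-sequential reads (default 4.0); favours index scans
SET random_page_cost = 2.0;

-- Tell the planner how much data the OS is likely to cache
-- (units are 8KB disk pages, so 50000 is roughly 400MB)
SET effective_cache_size = 50000;

-- then re-check the plan for the query under test:
-- EXPLAIN ANALYZE SELECT ... ;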
Bruno Wolff III <bruno@wolff.to> writes: > I have seen exactly this happen a number of times over the last several > years. However there is still only one Tom Lane implementing the > improvements. Ob: Well clearly the problem is we need more Tom Lanes. -- greg
Greg Stark wrote:

> Bruno Wolff III <bruno@wolff.to> writes:
>
>> I have seen exactly this happen a number of times over the last several
>> years. However there is still only one Tom Lane implementing the
>> improvements.
>
> Ob: Well clearly the problem is we need more Tom Lanes.

my $pgGuru = "Tom Lane";
my @morepgGurus;
my $howmany = 10;

while($howmany--) { push @morepgGurus, $pgGuru; }

--
Until later, Geoffrey		Registered Linux User #108567
Building secure systems in spite of Microsoft
Greg Stark <gsstark@mit.edu> writes:

> Ob: Well clearly the problem is we need more Tom Lanes.

ObHHGReference: "Haven't you heard? I come in six-packs!"

			regards, tom lane
> > my $pgGuru = "Tom Lane"; my @morepgGurus; my $howmany = 10;
> >
> > while($howmany--) { push @morepgGurus, $pgGuru; }

This is just wrong...

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL
"Joshua D. Drake" <jd@commandprompt.com> writes: > > while($howmany--) { push @morepgGurus, $pgGuru; } > > This is just wrong... yeah, it would have been much clearer written as: push @morepgGurus, ($pgGuru)x$howmany; Or at least the perlish: for (1..$howmany) instead of C style while syntax. Ok. I stop now. -- greg
> Josh Berkus wrote:
> > Treating the optimizer as a black box is something I'm very used to
> > from other RDBMS. My question is: how can you explicitly re-write a
> > query now to "improve" it? If there's no way of manipulating queries
> > without actually re-writing the optimizer, we're now in a position
> > where we aren't able to diagnose when the optimizer isn't working
> > effectively.
>
> Well, there is ... all of the various query cost parameters.

They are very blunt instruments for such a delicate task. Surely someone of your experience might benefit from something more? My feeling is I would, though I want those tools as *a developer* rather than for tuning specific queries for people, which is always so sensitive to upgrades etc.

> But, ultimately, improvements on the planner are still bottlenecked by
> having only one developer actually hacking the changes.

Do we have a clear list of optimizations we'd like to be working on? The TODO items aren't very related to specific optimizations... The only one I was aware of was deferred subselect evaluation for DBT-3.

...sounds like there's more to discuss here, so I'll duck out now and get back to my current project...

Best Regards, Simon Riggs