7.4beta2 vs 7.3.3 - Mailing list pgsql-hackers
From | Gaetano Mendola |
---|---|
Subject | 7.4beta2 vs 7.3.3 |
Date | |
Msg-id | 3F6A0775.5040600@bigfoot.com Whole thread Raw |
Responses |
Re: 7.4beta2 vs 7.3.3
|
List | pgsql-hackers |
Hi all, my impression on 7.4 performances: ===== Good news ===== How expected the use of clausole 'IN' with Postgres 7.4beta2 now is really fast ( see the result below ) ===== Bad news ===== I posted time ago about a slow query: SELECT ul.* FROM user_logs ul, user_data ud, class_default cd WHERE ul.id_user = ud.id_user AND ud.id_class = cd.id_class AND cd.id_provider = 39; these are the information about the tables involved: user_logs: ~1.5 Milion rows user_data: ~10000 rows class_default ~100 rows and I found also that was better do this query in three steps: SELECT id_class FROM class_default WHERE id_provider = 39; SELECT id_user FROM user_data WHERE id_class IN ( 48 ); <= result of query above SELECT * FROM user_logs WHERE id_user IN ( 11126, ...., 11769 ); <= 43 values result This last query runs for 10.30 msec !!! Here the comparison between Postgres7.3.3 and Postgres7.4beta2 for that original query: Postgres 7.3.3 test=# explain analyze select ul.* test-# from user_logs ul, test-# user_data ud, test-# class_default cd test-# where ul.id_user = ud.id_user and test-# ud.id_class = cd.id_class and test-# cd.id_provider = 39; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------- HashJoin (cost=325.79..36234.20 rows=41957 width=60) (actual time=6151.29..7022.29 rows=702 loops=1) Hash Cond: ("outer".id_user = "inner".id_user) -> Seq Scan on user_logs ul (cost=0.00..28251.30 rows=1426530 width=48) (actual time=0.02..5427.07 rows=1426530 loops=1) -> Hash (cost=324.97..324.97 rows=329 width=12) (actual time=320.97..320.97 rows=0 loops=1) -> Nested Loop (cost=0.00..324.97 rows=329 width=12) (actual time=0.24..320.89 rows=43 loops=1) -> Seq Scan on class_default cd (cost=0.00..1.43 rows=1 width=4) (actual time=0.05..0.07 rows=1 loops=1) Filter: (id_provider = 39) -> Index Scan using idx_user_data_class on user_data ud (cost=0.00..318.55 rows=400 width=8) (actual time=0.19..320.72 rows=43 loops=1) Index Cond: (ud.id_class = "outer".id_class) Total runtime: 7023.15 msec (10 rows) Postgres 7.4beta2 test=# explain analyze select ul.* test-# from user_logs ul, test-# user_data ud, test-# class_default cd test-# where ul.id_user = ud.id_user and test-# ud.id_class = cd.id_class and test-# cd.id_provider = 39; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------- HashJoin (cost=322.94..36127.70 rows=42081 width=47) (actual time=2629.84..2990.00 rows=702 loops=1) Hash Cond: ("outer".id_user = "inner".id_user) -> Seq Scan on user_logs ul (cost=0.00..28251.30 rows=1426530 width=47) (actual time=0.03..1738.65 rows=1426530 loops=1) -> Hash (cost=322.12..322.12 rows=330 width=4) (actual time=0.78..0.78 rows=0 loops=1) -> Nested Loop (cost=0.00..322.12 rows=330 width=4) (actual time=0.19..0.71 rows=43 loops=1) -> Seq Scan on class_default cd (cost=0.00..1.43 rows=1 width=4) (actual time=0.07..0.08 rows=1 loops=1) Filter: (id_provider = 39) -> Index Scan using idx_user_data_class on user_data ud (cost=0.00..315.87 rows=386 width=8) (actual time=0.11..0.54 rows=43 loops=1) Index Cond: (ud.id_class = "outer".id_class) Total runtime: 2990.70 msec (10 rows) The performance are really improved but not yet 10 msecs. I tried nesting the 3 queries that I was speak about, I did it only with 7.4 due the fact with 7.3.3 I never seen the result. test=# EXPLAIN ANALYZE SELECT * test-# FROM user_logs test-# WHERE id_user in ( test(# SELECT id_user test(# FROM user_data test(# WHERE id_class in ( test(# SELECT id_class FROM class_default WHERE id_provider = 39 test(# ) test(# ); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------- HashIN Join (cost=322.94..36127.70 rows=42081 width=47) (actual time=2626.20..2985.53 rows=702 loops=1) Hash Cond: ("outer".id_user = "inner".id_user) -> Seq Scan on user_logs (cost=0.00..28251.30rows=1426530 width=47) (actual time=0.03..1731.59 rows=1426530 loops=1) -> Hash (cost=322.12..322.12 rows=330 width=4) (actual time=0.80..0.80 rows=0 loops=1) -> Nested Loop (cost=1.43..322.12 rows=330 width=4) (actual time=0.22..0.72 rows=43 loops=1) -> HashAggregate (cost=1.43..1.43 rows=1 width=4) (actual time=0.10..0.10 rows=1 loops=1) -> Seq Scan on class_default (cost=0.00..1.43 rows=1 width=4) (actual time=0.07..0.08 rows=1 loops=1) Filter: (id_provider = 39) -> Index Scan using idx_user_data_class on user_data (cost=0.00..315.87 rows=386 width=8) (actual time=0.11..0.54 rows=43 loops=1) Index Cond: (user_data.id_class = "outer".id_class) Total runtime: 2986.33 msec (11 rows) How you can see with 7.4 the two queries ( the original with the join and with the IN nested ) are performing at the same way. Regards Gaetano Mendola
pgsql-hackers by date: