7.4beta2 vs 7.3.3 - Mailing list pgsql-hackers

From Gaetano Mendola
Subject 7.4beta2 vs 7.3.3
Date
Msg-id 3F6A0775.5040600@bigfoot.com
Whole thread Raw
Responses Re: 7.4beta2 vs 7.3.3
List pgsql-hackers
Hi all,
my impression on 7.4 performances:


===== Good news =====

How expected the use of clausole 'IN' with Postgres 7.4beta2
now is really fast ( see the result below )


===== Bad news =====

I posted time ago about a slow query:

SELECT ul.*
FROM user_logs ul,     user_data ud,     class_default cd
WHERE ul.id_user = ud.id_user AND      ud.id_class = cd.id_class AND      cd.id_provider = 39;


these are the information about the tables involved:

user_logs:    ~1.5 Milion rows
user_data:    ~10000 rows
class_default ~100 rows

and I found also that was better do this query
in three steps:

SELECT id_class
FROM class_default
WHERE id_provider = 39;

SELECT id_user
FROM user_data
WHERE id_class IN ( 48 ); <= result of query above

SELECT *
FROM user_logs
WHERE id_user IN ( 11126, ...., 11769 );   <= 43 values result


This last query runs for 10.30 msec !!!



Here the comparison between Postgres7.3.3 and
Postgres7.4beta2 for that original query:



Postgres 7.3.3

test=# explain analyze select ul.*
test-# from user_logs ul,
test-#      user_data ud,
test-#      class_default cd
test-# where ul.id_user = ud.id_user and
test-#       ud.id_class = cd.id_class and
test-#       cd.id_provider = 39;
QUERY PLAN 


---------------------------------------------------------------------------------------------------------------------------------------------------------
HashJoin  (cost=325.79..36234.20 rows=41957 width=60) (actual 
 
time=6151.29..7022.29 rows=702 loops=1)   Hash Cond: ("outer".id_user = "inner".id_user)   ->  Seq Scan on user_logs ul
(cost=0.00..28251.30 rows=1426530 
 
width=48) (actual time=0.02..5427.07 rows=1426530 loops=1)   ->  Hash  (cost=324.97..324.97 rows=329 width=12) (actual

time=320.97..320.97 rows=0 loops=1)         ->  Nested Loop  (cost=0.00..324.97 rows=329 width=12) (actual 
time=0.24..320.89 rows=43 loops=1)               ->  Seq Scan on class_default cd  (cost=0.00..1.43 
rows=1 width=4) (actual time=0.05..0.07 rows=1 loops=1)                     Filter: (id_provider = 39)               ->
Index Scan using idx_user_data_class on user_data ud  (cost=0.00..318.55 rows=400 width=8) (actual time=0.19..320.72
rows=43
 
loops=1)                     Index Cond: (ud.id_class = "outer".id_class) Total runtime: 7023.15 msec
(10 rows)


Postgres 7.4beta2

test=# explain analyze select ul.*
test-# from user_logs ul,
test-#      user_data ud,
test-#      class_default cd
test-# where ul.id_user = ud.id_user and
test-#       ud.id_class = cd.id_class and
test-#       cd.id_provider = 39;
QUERY PLAN 


-------------------------------------------------------------------------------------------------------------------------------------------------------
HashJoin  (cost=322.94..36127.70 rows=42081 width=47) (actual 
 
time=2629.84..2990.00 rows=702 loops=1)   Hash Cond: ("outer".id_user = "inner".id_user)   ->  Seq Scan on user_logs ul
(cost=0.00..28251.30 rows=1426530 
 
width=47) (actual time=0.03..1738.65 rows=1426530 loops=1)   ->  Hash  (cost=322.12..322.12 rows=330 width=4) (actual 
time=0.78..0.78 rows=0 loops=1)         ->  Nested Loop  (cost=0.00..322.12 rows=330 width=4) (actual 
time=0.19..0.71 rows=43 loops=1)               ->  Seq Scan on class_default cd  (cost=0.00..1.43 
rows=1 width=4) (actual time=0.07..0.08 rows=1 loops=1)                     Filter: (id_provider = 39)               ->
Index Scan using idx_user_data_class on user_data ud  (cost=0.00..315.87 rows=386 width=8) (actual time=0.11..0.54
rows=43
 
loops=1)                     Index Cond: (ud.id_class = "outer".id_class) Total runtime: 2990.70 msec
(10 rows)



The performance are really improved but not yet 10 msecs.




I tried nesting the 3 queries that I was speak about,
I did it only with 7.4 due the fact with 7.3.3 I never
seen the result.



test=# EXPLAIN ANALYZE SELECT *
test-# FROM user_logs
test-# WHERE id_user in (
test(#    SELECT id_user
test(#    FROM user_data
test(#    WHERE id_class in (
test(#        SELECT id_class FROM class_default WHERE id_provider = 39
test(#    )
test(# );
QUERY PLAN 


----------------------------------------------------------------------------------------------------------------------------------------------------
HashIN Join  (cost=322.94..36127.70 rows=42081 width=47) (actual 
 
time=2626.20..2985.53 rows=702 loops=1)   Hash Cond: ("outer".id_user = "inner".id_user)   ->  Seq Scan on user_logs
(cost=0.00..28251.30rows=1426530 
 
width=47) (actual time=0.03..1731.59 rows=1426530 loops=1)   ->  Hash  (cost=322.12..322.12 rows=330 width=4) (actual 
time=0.80..0.80 rows=0 loops=1)         ->  Nested Loop  (cost=1.43..322.12 rows=330 width=4) (actual 
time=0.22..0.72 rows=43 loops=1)               ->  HashAggregate  (cost=1.43..1.43 rows=1 width=4) 
(actual time=0.10..0.10 rows=1 loops=1)                     ->  Seq Scan on class_default  (cost=0.00..1.43 
rows=1 width=4) (actual time=0.07..0.08 rows=1 loops=1)                           Filter: (id_provider = 39)
  ->  Index Scan using idx_user_data_class on user_data 
 
(cost=0.00..315.87 rows=386 width=8) (actual time=0.11..0.54 rows=43 
loops=1)                     Index Cond: (user_data.id_class = "outer".id_class) Total runtime: 2986.33 msec
(11 rows)


How you can see with 7.4 the two queries ( the original with the join
and with the IN nested ) are performing at the same way.



Regards
Gaetano Mendola














pgsql-hackers by date:

Previous
From: "Paulo Scardine"
Date:
Subject: Re: Killing the backend to cancel a long waiting query
Next
From: Tom Lane
Date:
Subject: Re: Killing the backend to cancel a long waiting query