Plan for relatively simple query seems to be very inefficient - Mailing list pgsql-performance
From | Arjen van der Meijden |
---|---|
Subject | Plan for relatively simple query seems to be very inefficient |
Date | |
Msg-id | 425413D3.5030304@vulcanus.its.tudelft.nl Whole thread Raw |
Responses |
Re: Plan for relatively simple query seems to be very inefficient
Re: Plan for relatively simple query seems to be very inefficient Re: Plan for relatively simple query seems to be very inefficient |
List | pgsql-performance |
Hi list, I noticed on a forum a query taking a surprisingly large amount of time in MySQL. Of course I wanted to prove PostgreSQL 8.0.1 could do it much better. To my surprise PostgreSQL was ten times worse on the same machine! And I don't understand why. I don't really need this query to be fast since I don't use it, but the range-thing is not really an uncommon query I suppose. So I'm wondering why it is so slow and this may point to a wrong plan being chosen or generated. Here are table definitions: Table "public.postcodes" Column | Type | Modifiers -------------+---------------+----------- postcode_id | smallint | not null range_from | smallint | range_till | smallint | Indexes: "postcodes_pkey" PRIMARY KEY, btree (postcode_id) "range" UNIQUE, btree (range_from, range_till) Table "public.data_main" Column | Type | Modifiers --------+----------+----------- userid | integer | not null range | smallint | Indexes: "data_main_pkey" PRIMARY KEY, btree (userid) And here's the query I ran: SELECT COUNT(*) FROM data_main AS dm, postcodes AS p WHERE dm.range BETWEEN p.range_from AND p.range_till QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=332586.85..332586.85 rows=1 width=0) (actual time=22712.038..22712.039 rows=1 loops=1) -> Nested Loop (cost=3.76..328945.96 rows=1456356 width=0) (actual time=0.054..22600.826 rows=82688 loops=1) Join Filter: (("outer".range >= "inner".range_from) AND ("outer".range <= "inner".range_till)) -> Seq Scan on data_main dm (cost=0.00..1262.20 rows=81920 width=2) (actual time=0.020..136.930 rows=81920 loops=1) -> Materialize (cost=3.76..5.36 rows=160 width=4) (actual time=0.001..0.099 rows=160 loops=81920) -> Seq Scan on postcodes p (cost=0.00..3.60 rows=160 width=4) (actual time=0.010..0.396 rows=160 loops=1) Total runtime: 22712.211 ms When I do something completely bogus, which will result in coupling the data per record from data_main on one record from postcodes, it still not very fast but acceptable: SELECT COUNT(*) FROM data_main AS dm, postcodes AS p WHERE dm.range / 10 = p.postcode_id QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=10076.98..10076.98 rows=1 width=0) (actual time=1456.016..1456.017 rows=1 loops=1) -> Merge Join (cost=8636.81..9913.13 rows=65537 width=0) (actual time=1058.105..1358.571 rows=81920 loops=1) Merge Cond: ("outer".postcode_id = "inner"."?column2?") -> Index Scan using postcodes_pkey on postcodes p (cost=0.00..5.76 rows=160 width=2) (actual time=0.034..0.507 rows=160 loops=1) -> Sort (cost=8636.81..8841.61 rows=81920 width=2) (actual time=1057.698..1169.879 rows=81920 loops=1) Sort Key: (dm.range / 10) -> Seq Scan on data_main dm (cost=0.00..1262.20 rows=81920 width=2) (actual time=0.020..238.886 rows=81920 loops=1) Total runtime: 1461.156 ms Doing something similarily bogus, but with less results is much faster, even though it should have basically the same plan: SELECT COUNT(*) FROM data_main AS dm, postcodes AS p WHERE dm.range = p.postcode_id QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=2138.63..2138.63 rows=1 width=0) (actual time=180.667..180.668 rows=1 loops=1) -> Hash Join (cost=4.00..2087.02 rows=20642 width=0) (actual time=180.645..180.645 rows=0 loops=1) Hash Cond: ("outer".range = "inner".postcode_id) -> Seq Scan on data_main dm (cost=0.00..1262.20 rows=81920 width=2) (actual time=0.005..105.548 rows=81920 loops=1) -> Hash (cost=3.60..3.60 rows=160 width=2) (actual time=0.592..0.592 rows=0 loops=1) -> Seq Scan on postcodes p (cost=0.00..3.60 rows=160 width=2) (actual time=0.025..0.349 rows=160 loops=1) Total runtime: 180.807 ms (7 rows) If you like to toy around with the datasets on your heavily optimized postgresql-installs, let me know. The data is just generated for testing-purposes and I'd happily send a copy to anyone interested. Best regards, Arjen van der Meijden
pgsql-performance by date: