Re: Plan for relatively simple query seems to be very inefficient - Mailing list pgsql-performance

From Arjen van der Meijden
Subject Re: Plan for relatively simple query seems to be very inefficient
Date
Msg-id 425423A9.9000504@vulcanus.its.tudelft.nl
Whole thread Raw
In response to Re: Plan for relatively simple query seems to be very inefficient  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Plan for relatively simple query seems to be very inefficient  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On 6-4-2005 19:42, Tom Lane wrote:
> Arjen van der Meijden <acmmailing@vulcanus.its.tudelft.nl> writes:
>
>>I noticed on a forum a query taking a surprisingly large amount of time
>>in MySQL. Of course I wanted to prove PostgreSQL 8.0.1 could do it much
>>better. To my surprise PostgreSQL was ten times worse on the same
>>machine! And I don't understand why.
>
>
> Wrong index ... what you probably could use here is an index on
> data_main.range, so that the query could run with postcodes as the
> outer side.  I get such a plan by default with empty tables:
>
>  Aggregate  (cost=99177.80..99177.80 rows=1 width=0)
>    ->  Nested Loop  (cost=0.00..98021.80 rows=462400 width=0)
>          ->  Seq Scan on postcodes p  (cost=0.00..30.40 rows=2040 width=4)
>          ->  Index Scan using rangei on data_main dm  (cost=0.00..44.63 rows=227 width=2)
>                Index Cond: ((dm.range >= "outer".range_from) AND (dm.range <= "outer".range_till))
>
> but I'm not sure if the planner would prefer it with the tables loaded
> up.  (It might not be the right thing anyway ... but seems worth
> trying.)

No it didn't prefer it.

> Given the relatively small size of the postcodes table, and the fact
> that each data_main row seems to join to about one postcodes row,
> it's possible that what the planner did for you was actually the
> optimal thing anyhow.  I'm not sure that any range-capable index would
> be faster than just scanning through 160 entries in memory ...
>
>             regards, tom lane

Yep, there is only one or in corner cases two postcode-ranges per
postcode. Actually it should be only one, but my generated data is not
perfect.
But the sequential scan per record is not really what surprises me,
especially since the postcode table is only two pages of data, I didn't
really expect otherwise.
It is the fact that it takes 22 seconds that surprises me. Especially
since  the two other examples on the same data which consider about the
same amount of records per table/record only take 1.4 and 0.18 seconds.

Best regards,

Arjen

pgsql-performance by date:

Previous
From: "Dave Held"
Date:
Subject: Re: Plan for relatively simple query seems to be very inefficient
Next
From: Tom Lane
Date:
Subject: Re: Plan for relatively simple query seems to be very inefficient