Re: Simple join doesn't use index - Mailing list pgsql-performance

From Alex Vinnik
Subject Re: Simple join doesn't use index
Date
Msg-id CALd8TVHs9UNrQgX5+DF9hrVjTv_6OpAYi-A29sFjhATrXa01Xw@mail.gmail.com
Whole thread Raw
In response to Re: Simple join doesn't use index  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-performance
index definition
CREATE INDEX views_visit_id_visit_buoy_index ON views USING btree (visit_id, visit_buoy)



On Tue, Jan 29, 2013 at 1:35 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Tue, Jan 29, 2013 at 12:59 PM, Alex Vinnik <alvinnik.g@gmail.com> wrote:
>
>
>
> On Tue, Jan 29, 2013 at 11:39 AM, Ben Chobot <bench@silentmedia.com> wrote:
>>
>> On Jan 29, 2013, at 6:24 AM, Alex Vinnik wrote:
>>
>>> random_page_cost=1 might be not what you really want.
>>> it would mean that random reads are as fast as as sequential reads, which
>>> probably is true only for SSD
>>
>> What randon_page_cost would be more appropriate for EC2 EBS Provisioned
>> volume that can handle 2,000 IOPS?
>>
>>
>> For EC2 Provisioned IOPS volumes - not standard EBS - random_page_cost=1
>> is exactly what you want.
>>
> Well... after some experimentation it turned out that random_page_cost=0.6
> gives me fast query
>
> QUERY PLAN
> Sort  (cost=754114.96..754510.46 rows=158199 width=8) (actual
> time=1839.324..2035.405 rows=209401 loops=1)
>   Sort Key: visits.id, views.id
>   Sort Method: quicksort  Memory: 15960kB
>   ->  Nested Loop  (cost=0.00..740453.38 rows=158199 width=8) (actual
> time=0.048..1531.592 rows=209401 loops=1)
>         ->  Index Scan using visits_created_at_index on visits
> (cost=0.00..5929.82 rows=115492 width=4) (actual time=0.032..161.488
> rows=131311 loops=1)
>               Index Cond: ((created_at >= '2013-01-15 00:00:00'::timestamp
> without time zone) AND (created_at < '2013-01-16 00:00:00'::timestamp
> without time zone))
>         ->  Index Scan using views_visit_id_index on views  (cost=0.00..6.26
> rows=10 width=8) (actual time=0.003..0.005 rows=2 loops=131311)
>               Index Cond: (visit_id = visits.id)
> Total runtime: 2234.142 ms
>
> random_page_cost=0.7 slows it down 16 times
>
> Sort  (cost=804548.42..804943.92 rows=158199 width=8) (actual
> time=37011.337..37205.449 rows=209401 loops=1)
>   Sort Key: visits.id, views.id
>   Sort Method: quicksort  Memory: 15960kB
>   ->  Merge Join  (cost=15871.37..790886.85 rows=158199 width=8) (actual
> time=35673.602..36714.056 rows=209401 loops=1)
>         Merge Cond: (visits.id = views.visit_id)
>         ->  Sort  (cost=15824.44..16113.17 rows=115492 width=4) (actual
> time=335.486..463.085 rows=131311 loops=1)
>               Sort Key: visits.id
>               Sort Method: quicksort  Memory: 12300kB
>               ->  Index Scan using visits_created_at_index on visits
> (cost=0.00..6113.04 rows=115492 width=4) (actual time=0.034..159.326
> rows=131311 loops=1)
>                     Index Cond: ((created_at >= '2013-01-15
> 00:00:00'::timestamp without time zone) AND (created_at < '2013-01-16
> 00:00:00'::timestamp without time zone))

>         ->  Index Scan using views_visit_id_visit_buoy_index on views
> (cost=0.00..757596.22 rows=6122770 width=8) (actual time=0.017..30765.316
> rows=5145902 loops=1)

Something is awry here. pg is doing an index scan via
views_visit_id_visit_buoy_index with no matching condition.  What's
the definition of that index? The reason why the random_page_cost
adjustment is working is that you are highly penalizing sequential
type scans so that the database is avoiding the merge (sort A, sort B,
stepwise compare).

SQL server is doing a nestloop/index scan, just like the faster pg
plan, but is a bit faster because it's parallelizing.

 merlin

pgsql-performance by date:

Previous
From: Alex Vinnik
Date:
Subject: Re: Simple join doesn't use index
Next
From: Jeff Janes
Date:
Subject: Re: Simple join doesn't use index