Re: Simple join doesn't use index - Mailing list pgsql-performance

From Alex Vinnik
Subject Re: Simple join doesn't use index
Date
Msg-id CALd8TVEo3Gude58qVjk6EUHoWTiQ2969pfbQsugx3gGbpwruNA@mail.gmail.com
Whole thread Raw
In response to Re: Simple join doesn't use index  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: Simple join doesn't use index
List pgsql-performance
Setting work_mem to 64MB triggers in memory sort but look what happens with views look up. PG goes through all records there "Seq Scan on views" instead of using visitor_id index and I have only subset of real data to play around. Can imagine what cost would be running it against bigger dataset. Something else is in play here that makes planner to take this route. Any ideas how to gain more insight into planner's inner workings?

QUERY PLAN
Sort  (cost=960280.46..960844.00 rows=225414 width=8) (actual time=23328.040..23537.126 rows=209401 loops=1)
  Sort Key: visits.id, views.id
  Sort Method: quicksort  Memory: 15960kB
  ->  Hash Join  (cost=8089.16..940238.66 rows=225414 width=8) (actual time=6622.072..22995.890 rows=209401 loops=1)
        Hash Cond: (views.visit_id = visits.id)
        ->  Seq Scan on views  (cost=0.00..831748.05 rows=8724205 width=8) (actual time=0.093..10552.306 rows=6995893 loops=1)
        ->  Hash  (cost=6645.51..6645.51 rows=115492 width=4) (actual time=307.389..307.389 rows=131311 loops=1)
              Buckets: 16384  Batches: 1  Memory Usage: 4617kB
              ->  Index Scan using visits_created_at_index on visits  (cost=0.00..6645.51 rows=115492 width=4) (actual time=0.040..163.151 rows=131311 loops=1)
                    Index Cond: ((created_at >= '2013-01-15 00:00:00'::timestamp without time zone) AND (created_at < '2013-01-16 00:00:00'::timestamp without time zone))
Total runtime: 23733.045 ms


On Mon, Jan 28, 2013 at 8:31 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Mon, Jan 28, 2013 at 5:43 PM, Alex Vinnik <alvinnik.g@gmail.com> wrote:
> It sure turned out that default settings are not a good fit. Setting
> random_page_cost to 1.0 made query to run in 2.6 seconds and I clearly see
> that indexes are being used in explain plan and IO utilization is close to
> 0.
>
> QUERY PLAN
> Sort  (cost=969787.23..970288.67 rows=200575 width=8) (actual
> time=2176.045..2418.162 rows=241238 loops=1)
>   Sort Key: visits.id, views.id
>   Sort Method: external sort  Disk: 4248kB
>   ->  Nested Loop  (cost=0.00..950554.81 rows=200575 width=8) (actual
> time=0.048..1735.357 rows=241238 loops=1)
>         ->  Index Scan using visits_created_at_index on visits
> (cost=0.00..5459.16 rows=82561 width=4) (actual time=0.032..178.591
> rows=136021 loops=1)
>               Index Cond: ((created_at >= '2012-12-15 00:00:00'::timestamp
> without time zone) AND (created_at < '2012-12-16 00:00:00'::timestamp
> without time zone))
>         ->  Index Scan using views_visit_id_index on views
> (cost=0.00..11.33 rows=12 width=8) (actual time=0.004..0.006 rows=2
> loops=136021)
>               Index Cond: (visit_id = visits.id)
> Total runtime: 2635.169 ms
>
> However I noticed that sorting is done using disk("external sort  Disk:
> 4248kB") which prompted me to take a look at work_mem. But it turned out
> that small increase to 4MB from default 1MB turns off index usage and query
> gets x10 slower. IO utilization jumped to 100% from literally nothing. so
> back to square one...
>
> QUERY PLAN
> Sort  (cost=936642.75..937144.19 rows=200575 width=8) (actual
> time=33200.762..33474.443 rows=241238 loops=1)
>   Sort Key: visits.id, views.id
>   Sort Method: external merge  Disk: 4248kB
>   ->  Hash Join  (cost=6491.17..917410.33 rows=200575 width=8) (actual
> time=7156.498..32723.221 rows=241238 loops=1)
>         Hash Cond: (views.visit_id = visits.id)
>         ->  Seq Scan on views  (cost=0.00..832189.95 rows=8768395 width=8)
> (actual time=0.100..12126.342 rows=8200704 loops=1)
>         ->  Hash  (cost=5459.16..5459.16 rows=82561 width=4) (actual
> time=353.683..353.683 rows=136021 loops=1)
>               Buckets: 16384  Batches: 2 (originally 1)  Memory Usage:
> 4097kB
>               ->  Index Scan using visits_created_at_index on visits
> (cost=0.00..5459.16 rows=82561 width=4) (actual time=0.032..175.051
> rows=136021 loops=1)
>                     Index Cond: ((created_at >= '2012-12-15
> 00:00:00'::timestamp without time zone) AND (created_at < '2012-12-16
> 00:00:00'::timestamp without time zone))
> Total runtime: 33698.000 ms
>
> Basically PG is going through all views again and not using "Index Scan
> using views_visit_id_index on views". Looks like setting work_mem confuses
> planner somehow. Any idea what can be done to do sorting in memory. I
> suspect it should make query even more faster. Thanks -Alex

hm, what happens when you set work_mem a fair amount higher? (say,
64mb).   You can set it for one session by going "set work_mem='64mb';
" as opposed to the entire server in postgresql.conf.

merlin

pgsql-performance by date:

Previous
From: Alex Vinnik
Date:
Subject: Re: Simple join doesn't use index
Next
From: Filip Rembiałkowski
Date:
Subject: Re: Simple join doesn't use index