Home > mailing lists

Re: how to change the provoke table in hash join - Mailing list pgsql-performance

From	Jeff Janes
Subject	Re: how to change the provoke table in hash join
Date	September 11, 2014 21:09:32
Msg-id	CAMkU=1z3j2DAzkQ4tSDYNkZR0HX1Z8Z9_u_fMpJ9tg3myq5gCw@mail.gmail.com Whole thread Raw
In response to	Re: how to change the provoke table in hash join (Matheus de Oliveira <matioli.matheus@gmail.com>)
Responses	Re: how to change the provoke table in hash join ("Huang, Suya" <Suya.Huang@au.experian.com>)
List	pgsql-performance

Tree view

On Thu, Sep 11, 2014 at 7:09 AM, Matheus de Oliveira <matioli.matheus@gmail.com> wrote:

On Wed, Sep 10, 2014 at 10:05 PM, Huang, Suya <Suya.Huang@au.experian.com> wrote:
--plan 1, 10 seconds were spent on sequential scan on term_weekly table.

dev=# explain analyze select distinct cs_id from lookup_weekly n inner join term_weekly s on s.b_id=n.b_id and s.date=n.date where term in ('cat'::text);

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=2100211.06..2100211.11 rows=5 width=4) (actual time=27095.470..27095.487 rows=138 loops=1)...

--plan 2, only 1 second spent on index scan of term_weekly table, however, as it selects the big table to do the hashing, it takes 22 seconds for the hash to complete. The advantage get from index has been totally lost because of this join order.

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1429795.17..1429795.22 rows=5 width=4) (actual time=22991.289..22991.307 rows=138 loops=1)...

Am I reading something wrong here? I haven't looked all the plan, but the second is faster (overall), so why do you think you need a hint or change what the planner choose? For me looks like using the index is the best for this situation. Could you try running this multiple times and taking the min/max/avg time of both?

The difference in time could be a caching effect, not a reproducible difference.

The 2nd plan uses 3GB of memory, and there might be better uses for that memory.

Currently memory is un-costed, other than "cliff costing" once you thinks it will exceed work_mem, which I think is a problem. Just because I will let you use 4GB of memory if you will really benefit from it, doesn't mean you should use 4GB gratuitously.

Suya, what happens if you lower work_mem setting? Does it revert to the plan you want?

Cheers,

Jeff

pgsql-performance by date:

From: Matheus de Oliveira
Date: 11 September 2014, 17:10:20
Subject: Re: how to change the provoke table in hash join

From: "Huang, Suya"
Date: 12 September 2014, 05:27:40
Subject: weird execution plan

Re: how to change the provoke table in hash join - Mailing list pgsql-performance

Previous

Next