Re: hash join vs nested loop join - Mailing list pgsql-performance

From Huan Ruan
Subject Re: hash join vs nested loop join
Date
Msg-id CAD1stZtH9WbvHnKtcRVD_w5OYabXh9k18vUhygvYBe8yozPk6A@mail.gmail.com
Whole thread Raw
In response to Re: hash join vs nested loop join  ("Kevin Grittner" <kgrittn@mail.com>)
List pgsql-performance
On 21 December 2012 01:06, Kevin Grittner <kgrittn@mail.com> wrote:
Huan Ruan wrote:
> Kevin Grittner wrote:

>> Frankly, at 12 microseconds per matched pair of rows, I think
>> you're doing OK.
>
> This plan is the good one, I want the indexscan nested loop join and this
> is only achieved after making all these costing factors change. Before
> that, it was hash join and was very slow.
>
> However, I'm worried about the config changes being too 'extreme', i.e.
> both sequential I/O and random I/O have the same cost and being only 0.1.
> So, I was more wondering why I have to make such dramatic changes to
> convince the optimiser to use NL join instead of hash join. And also, I'm
> not sure what impact will these changes have on other queries yet. e.g.
> will a query that's fine with hash join now choose NL join and runs slower?

I understand the concern, but PostgreSQL doesn't yet have a knob to
turn for "cache hit ratio". You essentially need to build that into
the page costs. Since your cache hit ratio (between shared buffers
and the OS) is so high, the cost of page access relative to CPU
costs has declined and there isn't any effective difference between
sequential and random access. As the level of caching changes, you
may need to adjust. In one production environment where there was
significant caching, but far enough from 100% to matter, we tested
various configurations and found the fastest plans being chosen
with seq_page_cost = 0.3 and random_page_cost = 0.5. Tune to your
workload.


Thanks Kevin. I think I get some ideas now that I can try on the production server when we switch. 

pgsql-performance by date:

Previous
From: Charles Gomes
Date:
Subject: Re: Performance on Bulk Insert to Partitioned Table
Next
From: Jeff Janes
Date:
Subject: Re: Performance on Bulk Insert to Partitioned Table