Thread: How to avoid hashjoin and mergejoin

How to avoid hashjoin and mergejoin

From

"Carlo Stonebanks"

Date:

01 November 2007, 18:27:11

I am comparing the same query on two different PG 8.2 servers, one Linux
(8GB RAM) and one Windows (32GB RAM). Both have similar drives and CPU's.

The Windows posgrestsql.config is pretty well tuned but it looks like
someone had wiped out the Linux config so the default one was re-installed.
All performance-related memory allocation values seem to be set to the
defaults, but mods have been made: max_connections = 100 and shared_buffers
= 32MB.

The performance for this query is terrible on the Linux server, and good on
the Windows server - presumably because the original Linux PG config has
been lost. This query requires: that "set enable_seqscan to 'off';"

Still, the Linux server did not create the same, fast plan as the Windows
server. In order to get the same plan we had to:

set enable_hashjoin to 'off';
set enable_mergejoin to 'off';

The plans were now similar, using nested loops and bitmapped heap scans. Now
the Linux query outperformed the Windows query.

Question: Can anyone tell me which config values would have made PG select
hash join and merge joins when the nested loop/bitmap heap scan combination
was faster?

Carlo

Re: How to avoid hashjoin and mergejoin

From

"Scott Marlowe"

Date:

01 November 2007, 18:59:31

On 11/1/07, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:
> I am comparing the same query on two different PG 8.2 servers, one Linux
> (8GB RAM) and one Windows (32GB RAM). Both have similar drives and CPU's.
>
> The Windows posgrestsql.config is pretty well tuned but it looks like
> someone had wiped out the Linux config so the default one was re-installed.
> All performance-related memory allocation values seem to be set to the
> defaults, but mods have been made: max_connections = 100 and shared_buffers
> = 32MB.
>
> The performance for this query is terrible on the Linux server, and good on
> the Windows server - presumably because the original Linux PG config has
> been lost. This query requires: that "set enable_seqscan to 'off';"

Have you run analyze on the server yet?

A few general points on performance tuning.  With 8.2 you should set
shared_buffers to a pretty big chunk of memory on linux, up to 25% or
so.  That means 32 Meg shared buffers is REAL low for a linux server.
Try running anywhere from 512Meg up to 1Gig for starters and see if
that helps too.  Also turn up work_mem to something like 16 to 32 meg
then restart the server after making these changes.

Then give us the explain analyze output with all the enable_xxx set to ON.

summary: analyze, increase shared_buffers and work_mem, give us explain analyze.

Re: How to avoid hashjoin and mergejoin

From

Tom Lane

Date:

01 November 2007, 19:00:25

"Carlo Stonebanks" <stonec.register@sympatico.ca> writes:
> Still, the Linux server did not create the same, fast plan as the Windows
> server. In order to get the same plan we had to:

> set enable_hashjoin to 'off';
> set enable_mergejoin to 'off';

This is just about never the appropriate way to solve a performance
problem, as it will inevitably create performance problems in other
queries.

What I'm wondering is whether the tables have been ANALYZEd recently,
and also whether there are any nondefault postgresql.conf settings in
use on the other server.

            regards, tom lane

Re: How to avoid hashjoin and mergejoin

From

"Carlo Stonebanks"

Date:

02 November 2007, 01:11:52

<<This is just about never the appropriate way to solve a performance
problem, as it will inevitably create performance problems in other
queries.
>>

In this particular example, this was done to "force" the query on the Linux
box to use the same plan as on the Windows box to prove that - once the
correct plan was chosen - the Linux box could at least MATCH the Windows
box.

That being said, I should mention this: we take certain "core" queries that
we know are essential and embed them in a plpgsql SRF's that save the
various settings, modify them as required for the query, then restore them
after the rows are returned.

Does this address the problem you mentioned?

<< What I'm wondering is whether the tables have been ANALYZEd recently,>>

This is SUPPOSED to be done after a restore - but I will verify, thanks for
the reminder.

<< and also whether there are any nondefault postgresql.conf settings in
use on the other server.>>

Definitely - this is what alerted me to the fact that there was something
suspicious. We try to optimize our memory settings (based on various tuning
docs, advice from here, and good old trial-and-error). Since the new config
had barely any changes, I knew something was wrong.

Carlo

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: November 1, 2007 5:42 PM
To: Carlo Stonebanks
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] How to avoid hashjoin and mergejoin

"Carlo Stonebanks" <stonec.register@sympatico.ca> writes:
> Still, the Linux server did not create the same, fast plan as the Windows
> server. In order to get the same plan we had to:

> set enable_hashjoin to 'off';
> set enable_mergejoin to 'off';

This is just about never the appropriate way to solve a performance
problem, as it will inevitably create performance problems in other
queries.

What I'm wondering is whether the tables have been ANALYZEd recently,
and also whether there are any nondefault postgresql.conf settings in
use on the other server.

            regards, tom lane

Re: How to avoid hashjoin and mergejoin

From

"Carlo Stonebanks"

Date:

09 November 2007, 02:55:11

Larry,

Considering these recommendations, let's try setting shared_buffers to 2GB and work_mem to 16MB. The thing is that work_mem is per connection, and if we get too aggressive and we get a lot of simultaneous users, we can potentially eat up a lot of memory.

So 2GB + (100 * 16MB) = 3.6GB total RAM eaten up under peak load for these two values alone.

If we wanted to get more aggressive, we can raise work_mem.

Carlo

-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: November 1, 2007 5:39 PM
To: Carlo Stonebanks
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] How to avoid hashjoin and mergejoin

On 11/1/07, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:

> I am comparing the same query on two different PG 8.2 servers, one Linux

> (8GB RAM) and one Windows (32GB RAM). Both have similar drives and CPU's.

> The Windows posgrestsql.config is pretty well tuned but it looks like

> someone had wiped out the Linux config so the default one was re-installed.

> All performance-related memory allocation values seem to be set to the

> defaults, but mods have been made: max_connections = 100 and shared_buffers

> = 32MB.

> The performance for this query is terrible on the Linux server, and good on

> the Windows server - presumably because the original Linux PG config has

> been lost. This query requires: that "set enable_seqscan to 'off';"

Have you run analyze on the server yet?

A few general points on performance tuning. With 8.2 you should set

shared_buffers to a pretty big chunk of memory on linux, up to 25% or

so. That means 32 Meg shared buffers is REAL low for a linux server.

Try running anywhere from 512Meg up to 1Gig for starters and see if

that helps too. Also turn up work_mem to something like 16 to 32 meg

then restart the server after making these changes.

Then give us the explain analyze output with all the enable_xxx set to ON.

summary: analyze, increase shared_buffers and work_mem, give us explain analyze.