Re: Configuration Recommendations - Mailing list pgsql-performance

From Greg Sabino Mullane
Subject Re: Configuration Recommendations
Date
Msg-id 66570518e99910b4b4509564331e87c2@biglumber.com
Whole thread Raw
In response to Re: Configuration Recommendations  (Robert Klemme <shortcutter@googlemail.com>)
Responses Re: Configuration Recommendations  (Robert Klemme <shortcutter@googlemail.com>)
Re: Configuration Recommendations  (Jan Nielsen <jan.sture.nielsen@gmail.com>)
List pgsql-performance
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


>>> Is it established practice in the Postgres world to separate indexes
>>> from tables? I would assume that the reasoning of Richard Foote -
>>> albeit for Oracle databases - is also true for Postgres:
>
>> Yes, it's an established practice. I'd call it something just short of
>> a best practice though, as it really depends on your situation.
>
> What are the benefits?

Disk seeks, basically. Yes, there are a lot of complications regarding
all the various hardware/OS/PG level cachings, but at the end of the
day, it's less work to have each drive concentrate on a single area
(especially as we always require a heap scan at the moment).

>> I also find his examples a bit contrived, and the whole "multi-user"
>> argument irrelevant for common cases.
>
> Why is that?

Because most Postgres servers are dedicated to serving the same data
or sets of data, and the number of "other users" calling ad-hoc queries
against lots of different tables (per his example) is small. So this
sentence just doesn't ring true to me:

    " ... by the time weâve read the index leaf block, processed and
    read all the associated table blocks referenced by the index leaf
    block, the chances of there being no subsequent physical activity
    in the index tablespace due to another user session is virtually
    nil. We would still need to re-scan the disk to physically access
    the next index leaf block (or table block) anyways."

That's certainly not true for Postgres servers, and I doubt if it
is quite that bad on Oracle either.

>> I lean towards using separate tablespaces in Postgres, as the
>> performance outweighs the additional>> complexity.

> What about his argument with regards to access patterns (i.e.
> interleaving index and table access during an index scan)?  Also,
> Shaun's advice to have more spindles available sounds convincing to
> me, too.

I don't buy his arguments. To do so, you'd have to buy a key point:

    "when most physical I/Os in both index and table segments are
     effectively random, single block reads"

They are not; hence, the rest of his argument falls apart. Certainly,
if things were as truly random and uncached as he states, there would
be no benefit to separation.

As far as spindles, yes: like RAM, it's seldom the case to have
too litte :) But as with all things, one should get some benchmarks
on your specific workload before making hardware changes. (Well, RAM
may be an exception to that, up to a point).

>> It's down on the tuning list however: much more important
>> is getting your kernel/volumes configured correctly, allocating
>> shared_buffers sanely, separating pg_xlog, etc.

> That does make a lot of sense.  Separating pg_xlog would probably the
> first thing I'd do especially since the IO pattern is so dramatically
> different from tablespace IO access patterns.

Yep - moving pg_xlog to something optimized for small, constantly
written files is one of the biggest and easiest wins. Other than
fsync = off ;)

- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201205151351
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAk+yl8YACgkQvJuQZxSWSshB+QCghfweMspFIqmP4rLv6/tcGPot
jscAn1SZAP1/KBcu/FEpWXilSnWjlA6Z
=FX7j
-----END PGP SIGNATURE-----



pgsql-performance by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: SSD selection
Next
From: Rosser Schwarz
Date:
Subject: Re: SSD selection