Thread: Why does splitting $PGDATA and xlog yield a performance benefit?


From

David Kerr

Date:

25 August 2015, 17:08:53

Howdy All,

For a very long time I've held the belief that splitting PGDATA and xlog on Linux systems fairly universally gives a
decent performance benefit for many common workloads.
(I've seen up to 20% personally).

I was under the impression that this had to do with regular fsync()'s from the WAL
interfering with and over-reaching writing out the filesystem buffers.

Basically, I think I was conflating fsync() with sync().
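
To make the fsync() versus sync() distinction concrete, here is a minimal Python sketch. It is an illustration only and not how PostgreSQL itself writes WAL: the function names and the file name are hypothetical, and how much extra work a given filesystem does on fsync() is not modelled.

```python
import os
import tempfile


def durable_append(path: str, payload: bytes) -> int:
    """Append payload to path and flush only that file to stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        written = os.write(fd, payload)
        # fsync() targets this one file descriptor; dirty pages that belong
        # to other files are not part of this request.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


def flush_everything() -> None:
    """Ask the kernel to write out all dirty buffers, for every file."""
    # sync() is system-wide; os.sync() is available on Unix (Python 3.3+).
    os.sync()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        wal_like = os.path.join(d, "wal_segment")  # hypothetical file name
        durable_append(wal_like, b"commit record\n")
        print(os.path.getsize(wal_like), "bytes flushed with fsync()")
```

The point of the sketch is scope: `os.fsync(fd)` is a request about one file, while `os.sync()` is a request about everything the kernel has buffered.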

So if it's not that, then that just leaves bandwidth (ignoring all of the other best practice reasons for reliability,
etc.). So, in theory if you're not swamping your disk I/O then you won't really benefit from relocating your XLOGs.

However, I know from experience that's not entirely true, (although it's not always easy to measure all aspects of your
I/O bandwidth).

Am I missing something?

Thanks

Re: Why does splitting $PGDATA and xlog yield a performance benefit?

From

Andomar

Date:

25 August 2015, 17:16:43

> However, I know from experience that's not entirely true, (although it's not always easy to measure all aspects of
> your I/O bandwidth).
>
> Am I missing something?
>
Two things I can think of:

Transaction writes are entirely sequential.  If you have disks assigned
for just this purpose, then the heads will always be in the right spot,
and the writes go through more quickly.

A database server process waits until the transaction logs are written
and then returns control to the client. The data writes can be done in
the background while the client goes on to do other things.  Splitting
up data and logs means that there is less chance the disk controller will
cause data writes to interfere with log files.
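
The first point can be illustrated with a toy model of head movement on a spinning disk. This is a sketch under loud assumptions: block addresses stand in for physical head positions, the log lives at the start of the disk, data page writes are uniformly random, and there is no write cache or I/O scheduler. All names and sizes are hypothetical.

```python
import random


def head_travel(requests, start=0):
    """Total distance the head moves to service block addresses in order."""
    position, travelled = start, 0
    for block in requests:
        travelled += abs(block - position)
        position = block
    return travelled


def simulate(n_commits=1000, disk_blocks=1_000_000, seed=42):
    """Compare one shared disk against a dedicated log disk plus a data disk."""
    rng = random.Random(seed)
    wal = list(range(n_commits))  # sequential log appends, starting at block 0
    data = [rng.randrange(disk_blocks) for _ in range(n_commits)]  # random pages

    # Shared disk: every log append is followed by one data page write,
    # so the head keeps leaving the log region and coming back.
    shared = [block for pair in zip(wal, data) for block in pair]

    return {
        "shared_total": head_travel(shared),
        "dedicated_wal": head_travel(wal),
        "dedicated_data": head_travel(data),
    }


if __name__ == "__main__":
    for name, distance in simulate().items():
        print(f"{name}: {distance} blocks of head travel")
```

In this model the dedicated log disk barely moves its head at all, while on the shared disk every log append costs a long trip back from wherever the last data write landed. Real hardware with caches, command queuing, or no heads at all (SSDs) will behave differently.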

Kind regards,
Andomar

Re: Why does splitting $PGDATA and xlog yield a performance benefit?

From

Bill Moran

Date:

25 August 2015, 17:45:40

On Tue, 25 Aug 2015 10:08:48 -0700
David Kerr <dmk@mr-paradox.net> wrote:

> Howdy All,
>
> For a very long time I've held the belief that splitting PGDATA and xlog on Linux systems fairly universally gives a
> decent performance benefit for many common workloads.
> (I've seen up to 20% personally).
>
> I was under the impression that this had to do with regular fsync()'s from the WAL
> interfering with and over-reaching writing out the filesystem buffers.
>
> Basically, I think I was conflating fsync() with sync().
>
> So if it's not that, then that just leaves bandwidth (ignoring all of the other best practice reasons for reliability,
> etc.). So, in theory if you're not swamping your disk I/O then you won't really benefit from relocating your XLOGs.

Disk performance can be a bit more complicated than just "swamping." Even if
you're not maxing out the IO bandwidth, you could be getting enough that some
writes are waiting on other writes before they can be processed. Consider the
fact that old-style ethernet was only able to hit ~80% of its theoretical
capacity in the real world, because the chance of collisions increased with
the amount of data, and each collision slowed down the overall transfer speed.
Contrasted with modern ethernet that doesn't do collisions, you can get much
closer to 100% of the rated bandwidth because the communications are effectively
partitioned from each other.

In the worst case scenario, if two processes (due to horrible luck) _always_
try to write at the same time, the overall responsiveness will be lousy, even
if the bandwidth usage is only a small percent of the available. Of course,
that worst case doesn't happen in actual practice, but as the usage goes up,
the chance of hitting that interference increases, and the effective response
goes down, even when there's bandwidth still available.

Separate the competing processes, and the chance of conflict is 0. So your
responsiveness is pretty much at best-case all the time.
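
One way to put rough numbers on "response gets worse before bandwidth runs out" is a textbook M/M/1 queue, where mean response time is R = S / (1 - rho) for service time S and utilization rho. This model is an assumption brought in for illustration; nothing in this thread says a disk or SAN actually behaves like M/M/1, and the 5 ms service time is hypothetical.

```python
def mean_response_time(service_time_ms: float, utilization: float) -> float:
    """Mean time in system for an M/M/1 queue: R = S / (1 - rho)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1.0 - utilization)


if __name__ == "__main__":
    service_ms = 5.0  # hypothetical per-request service time
    for rho in (0.1, 0.5, 0.8, 0.9):
        r = mean_response_time(service_ms, rho)
        print(f"{rho:.0%} busy -> {r:.1f} ms mean response")
```

Under this model a device that is only half busy already doubles the mean response time of each request, which is the kind of interference that moving the log to its own device takes away.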

> However, I know from experience that's not entirely true, (although it's not always easy to measure all aspects of
> your I/O bandwidth).
>
> Am I missing something?

--
Bill Moran

Re: Why does splitting $PGDATA and xlog yield a performance benefit?

From

David Kerr

Date:

25 August 2015, 17:54:12

On Tue, Aug 25, 2015 at 10:16:37AM PDT, Andomar wrote:
> >However, I know from experience that's not entirely true, (although it's not always easy to measure all aspects of
> >your I/O bandwidth).
> >
> >Am I missing something?
> >
> Two things I can think of:
>
> Transaction writes are entirely sequential.  If you have disks
> assigned for just this purpose, then the heads will always be in the
> right spot, and the writes go through more quickly.
>
> A database server process waits until the transaction logs are
> written and then returns control to the client. The data writes can
> be done in the background while the client goes on to do other
> things.  Splitting up data and logs means that there is less chance
> the disk controller will cause data writes to interfere with log
> files.
>
> Kind regards,
> Andomar
>

hmm, yeah those are both what I'd lump into "I/O bandwidth".
If your disk subsystem is fast enough, or you're on a RAIDed SAN
or EBS you'd either overcome that, or not necessarily be able to.

Re: Why does splitting $PGDATA and xlog yield a performance benefit?

From

David Kerr

Date:

25 August 2015, 18:14:47

> On Aug 25, 2015, at 10:45 AM, Bill Moran <wmoran@potentialtech.com> wrote:
>
> On Tue, 25 Aug 2015 10:08:48 -0700
> David Kerr <dmk@mr-paradox.net> wrote:
>
>> Howdy All,
>>
>> For a very long time I've held the belief that splitting PGDATA and xlog on Linux systems fairly universally gives a
>> decent performance benefit for many common workloads.
>> (I've seen up to 20% personally).
>>
>> I was under the impression that this had to do with regular fsync()'s from the WAL
>> interfering with and over-reaching writing out the filesystem buffers.
>>
>> Basically, I think I was conflating fsync() with sync().
>>
>> So if it's not that, then that just leaves bandwidth (ignoring all of the other best practice reasons for reliability,
>> etc.). So, in theory if you're not swamping your disk I/O then you won't really benefit from relocating your XLOGs.
>
> Disk performance can be a bit more complicated than just "swamping." Even if

Funny, on revision of my question, I left out basically that exact line for simplicity's sake. =)

> you're not maxing out the IO bandwidth, you could be getting enough that some
> writes are waiting on other writes before they can be processed. Consider the
> fact that old-style ethernet was only able to hit ~80% of its theoretical
> capacity in the real world, because the chance of collisions increased with
> the amount of data, and each collision slowed down the overall transfer speed.
> Contrasted with modern ethernet that doesn't do collisions, you can get much
> closer to 100% of the rated bandwidth because the communications are effectively
> partitioned from each other.
>
> In the worst case scenario, if two processes (due to horrible luck) _always_
> try to write at the same time, the overall responsiveness will be lousy, even
> if the bandwidth usage is only a small percent of the available. Of course,
> that worst case doesn't happen in actual practice, but as the usage goes up,
> the chance of hitting that interference increases, and the effective response
> goes down, even when there's bandwidth still available.
>
> Separate the competing processes, and the chance of conflict is 0. So your
> responsiveness is pretty much at best-case all the time.

Understood. Now in my previous delve into this issue, I showed minimal/no disk queuing, the SAN showed nothing on its
queues and no retries. (of course #NeverTrustTheSANGuy) but I still yielded a 20% performance increase by splitting the
WAL and $PGDATA.

But that's beside the point and my data on that environment is long gone.

I'm content to leave this at "I/O is complicated". I just wanted to make sure that I wasn't correct but for a slightly
wrong reason.

Thanks!

Re: Why does splitting $PGDATA and xlog yield a performance benefit?

From

Gavin Flower

Date:

25 August 2015, 20:32:12

On 26/08/15 05:54, David Kerr wrote:
> On Tue, Aug 25, 2015 at 10:16:37AM PDT, Andomar wrote:
>>> However, I know from experience that's not entirely true, (although it's not always easy to measure all aspects of
>>> your I/O bandwidth).
>>>
>>> Am I missing something?
>>>
>> Two things I can think of:
>>
>> Transaction writes are entirely sequential.  If you have disks
>> assigned for just this purpose, then the heads will always be in the
>> right spot, and the writes go through more quickly.
>>
>> A database server process waits until the transaction logs are
>> written and then returns control to the client. The data writes can
>> be done in the background while the client goes on to do other
>> things.  Splitting up data and logs means that there is less chance
>> the disk controller will cause data writes to interfere with log
>> files.
>>
>> Kind regards,
>> Andomar
>>
> hmm, yeah those are both what I'd lump into "I/O bandwidth".
> If your disk subsystem is fast enough, or you're on a RAIDed SAN
> or EBS you'd either overcome that, or not necessarily be able to.
>
>
>
Back when I actually understood the various timings of disc accessing on
a mainframe system, back in the 1980s (disc layout & accessing is way
more complicated now!), I found that there was a considerable difference
between mainly sequential & mostly random access - easily greater than a
factor of 5 (from memory) in terms of throughput.

That comes from the time to move heads between tracks and from rotational
latency (caused by not reading sequential blocks on the same track).  There
are other complications, which I have glossed over!
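
The two costs named above can be turned into back-of-the-envelope arithmetic. The sketch below assumes a random access pays seek time plus half a revolution plus transfer time, and a sequential access pays transfer time only. The drive figures are hypothetical and are not the 1980s mainframe numbers, so the resulting ratio says nothing about the factor of 5 remembered above.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the target sector: half a revolution, in ms."""
    return 0.5 * 60_000.0 / rpm


def mean_access_ms(fraction_random: float, seek_ms: float,
                   rpm: float, transfer_ms: float) -> float:
    """Mean time per block when a given fraction of accesses is random.

    A random access pays seek + rotational latency + transfer; a sequential
    one is assumed to pay the transfer time only.
    """
    if not 0.0 <= fraction_random <= 1.0:
        raise ValueError("fraction_random must be in [0, 1]")
    random_ms = seek_ms + avg_rotational_latency_ms(rpm) + transfer_ms
    return fraction_random * random_ms + (1.0 - fraction_random) * transfer_ms


if __name__ == "__main__":
    # Hypothetical drive figures, chosen only for illustration.
    seek_ms, rpm, transfer_ms = 8.0, 7200.0, 0.1
    mainly_seq = mean_access_ms(0.1, seek_ms, rpm, transfer_ms)
    mostly_rand = mean_access_ms(0.9, seek_ms, rpm, transfer_ms)
    print(f"mainly sequential: {mainly_seq:.2f} ms per block")
    print(f"mostly random:     {mostly_rand:.2f} ms per block")
    print(f"ratio:             {mostly_rand / mainly_seq:.1f}x")
```

The ratio depends entirely on the assumed seek, rotation, and transfer figures and on the mix of access types, so treat the output as a shape rather than a measurement.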


Cheers,
Gavin

Re: Why does splitting $PGDATA and xlog yield a performance benefit?

From

Joseph Kregloh

Date:

25 August 2015, 20:36:57

On Tue, Aug 25, 2015 at 4:31 PM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:

> On 26/08/15 05:54, David Kerr wrote:
>> On Tue, Aug 25, 2015 at 10:16:37AM PDT, Andomar wrote:
>>>> However, I know from experience that's not entirely true, (although it's not always easy to measure all aspects of your I/O bandwidth).
>>>>
>>>> Am I missing something?
>>>
>>> Two things I can think of:
>>>
>>> Transaction writes are entirely sequential. If you have disks
>>> assigned for just this purpose, then the heads will always be in the
>>> right spot, and the writes go through more quickly.
>>>
>>> A database server process waits until the transaction logs are
>>> written and then returns control to the client. The data writes can
>>> be done in the background while the client goes on to do other
>>> things. Splitting up data and logs means that there is less chance
>>> the disk controller will cause data writes to interfere with log
>>> files.
>>>
>>> Kind regards,
>>> Andomar
>>
>> hmm, yeah those are both what I'd lump into "I/O bandwidth".
>> If your disk subsystem is fast enough, or you're on a RAIDed SAN
>> or EBS you'd either overcome that, or not necessarily be able to.
>
> Back when I actually understood the various timings of disc accessing on a mainframe system, back in the 1980s (disc layout & accessing is way more complicated now!), I found that there was a considerable difference between mainly sequential & mostly random access - easily greater than a factor of 5 (from memory) in terms of throughput.
>
> That comes from the time to move heads between tracks and from rotational latency (caused by not reading sequential blocks on the same track). There are other complications, which I have glossed over!

It can go even further now with the use of SSDs. You can put the xlogs on an SSD and the rest of the database on a mechanical drive. The same can be said about partitions: you can place the most accessed partition on an SSD and the rest of the db on a mechanical drive.

-Joseph Kregloh

> Cheers,
> Gavin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general