Thread: survey of WAL blocksize changes

survey of WAL blocksize changes

From
Mark Wong
Date:
Hi all,

A long long time ago (in 2004) I ran a series of tests surveying the
results of changing BLCKSZ when it was used for both the WAL logs and
the rest of the database system:

http://archives.postgresql.org/pgsql-hackers/2004-03/msg01194.php

Now more than 5 years later and now being able to set the WAL log and
the rest of the database to different block sizes, I have a set of
test results with DBT-2 showing the effects of changing the WAL log
block size on OLTP transaction throughput on ext2, ranging from 1KB to
64KB:

BS notpm % Change from default
-- ----- ----------1 14673 -4.8%2 15864 2.9%4 15774 2.3%8 15413 (default)
16 16118 4.6%
32 16051 4.1%
64 14874 -3.5%

Pointers to raw data:

BS url
-- ---1 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.1/2
http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.2/4
http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.4/8
http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.2/
16 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.16/
32 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.32/
64 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.64/


It appears for this workload using a 16KB or 32KB gets more than 4%
throughput improvement, but some of that could be noise.  Nothing
quite jaw dropping yet.  It'll be interesting to see if the
combination of changing the table block size can further improve the
performance.  It will probably be interesting to try different
filesystems and filesystem blocksizes too.

Regards,
Mark Wong


Re: survey of WAL blocksize changes

From
Simon Riggs
Date:
On Tue, 2009-05-26 at 19:51 -0700, Mark Wong wrote:
> It appears for this workload using a 16KB or 32KB gets more than 4%
> throughput improvement, but some of that could be noise. 

The baseline appears to have a significant jump in txn response time
after 77 mins on the baseline test. I think you should rerun that. My
guess would be it will reduce any gains shown with higher settings.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: survey of WAL blocksize changes

From
Greg Stark
Date:
Why on earth would there be a dip precisely at 8k with both smaller  
and larger block sizes being faster??

-- 
Greg


On 27 May 2009, at 03:51, Mark Wong <markwkm@gmail.com> wrote:

> Hi all,
>
> A long long time ago (in 2004) I ran a series of tests surveying the
> results of changing BLCKSZ when it was used for both the WAL logs and
> the rest of the database system:
>
> http://archives.postgresql.org/pgsql-hackers/2004-03/msg01194.php
>
> Now more than 5 years later and now being able to set the WAL log and
> the rest of the database to different block sizes, I have a set of
> test results with DBT-2 showing the effects of changing the WAL log
> block size on OLTP transaction throughput on ext2, ranging from 1KB to
> 64KB:
>
> BS notpm % Change from default
> -- ----- ----------
> 1 14673 -4.8%
> 2 15864 2.9%
> 4 15774 2.3%
> 8 15413 (default)
> 16 16118 4.6%
> 32 16051 4.1%
> 64 14874 -3.5%
>
> Pointers to raw data:
>
> BS url
> -- ---
> 1 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.1/
> 2 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.2/
> 4 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.4/
> 8 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.2/
> 16 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.16/
> 32 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.32/
> 64 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.64/
>
>
> It appears for this workload using a 16KB or 32KB gets more than 4%
> throughput improvement, but some of that could be noise.  Nothing
> quite jaw dropping yet.  It'll be interesting to see if the
> combination of changing the table block size can further improve the
> performance.  It will probably be interesting to try different
> filesystems and filesystem blocksizes too.
>
> Regards,
> Mark Wong
>
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Re: survey of WAL blocksize changes

From
Peter Eisentraut
Date:
On Wednesday 27 May 2009 05:51:05 Mark Wong wrote:
> BS notpm % Change from default
> -- ----- ----------
>  1 14673 -4.8%
>  2 15864 2.9%
>  4 15774 2.3%
>  8 15413 (default)
> 16 16118 4.6%
> 32 16051 4.1%
> 64 14874 -3.5%

This means that both somewhat larger and somewhat smaller than 8 give better 
results?!?



Re: survey of WAL blocksize changes

From
Mark Wong
Date:
On Wed, May 27, 2009 at 1:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On Tue, 2009-05-26 at 19:51 -0700, Mark Wong wrote:
>> It appears for this workload using a 16KB or 32KB gets more than 4%
>> throughput improvement, but some of that could be noise.
>
> The baseline appears to have a significant jump in txn response time
> after 77 mins on the baseline test. I think you should rerun that. My
> guess would be it will reduce any gains shown with higher settings.

Oopsies.  I've rerun, but now that there is no dip, the average
throughput still didn't change much:

BS notpm % Change from default
-- ----- ----------1 14673 -5.1%2 15864 2.7%4 15774 2.1%8 15454 (default)
16 16118 4.3%
32 16051 3.9%
64 14874 -3.8%

Pointers to raw data:

BS url
-- ---1 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.1/2
http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.2/4
http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.4/8
http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.8/report/
16 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.16/
32 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.32/
64 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.64/


Regards,
Mark Wong


Re: survey of WAL blocksize changes

From
Tom Lane
Date:
Mark Wong <markwkm@gmail.com> writes:
> Oopsies.  I've rerun, but now that there is no dip, the average
> throughput still didn't change much:

> BS notpm % Change from default
> -- ----- ----------
>  1 14673 -5.1%
>  2 15864 2.7%
>  4 15774 2.1%
>  8 15454 (default)
> 16 16118 4.3%
> 32 16051 3.9%
> 64 14874 -3.8%

So, if we assume that these numbers are real and not artifacts, it seems
we have to postulate at least four distinct block-size-dependent
performance effects:

1. A strong penalty for smaller block sizes, which becomes dominant
below 2KB.

2. A strong penalty for larger block sizes, which becomes dominant
above 32KB.

3. A weak benefit for smaller block sizes, which is visible at 2-4KB
but fades away at 8KB.

4. A weak benefit for larger block sizes, which only becomes visible
above 8KB.

It's not too hard to believe any of those individually, and even to
think of plausible mechanisms.  But it seems a bit unlikely that effects
3 and 4 would exist but consistently cross over right at our traditional
choice of block size.

I'm suspecting that this curve is heavily dependent on details of the
DBT2 test and/or the hardware used.  It would be interesting to see if
anyone can replicate it using a different benchmark.
        regards, tom lane


Re: survey of WAL blocksize changes

From
Greg Stark
Date:
I find it pretty hard to beleive that 8k is precisely where a drop in  
performance shows up unless there's some peculiar reason.

The only peculiar reason I can imagine is full page writes. If the  
dbt2 workload is modifying already full pages then the full page  
writes will always be just shy of a complete page and with the xlog  
record might consistently be just larger than a full block.

I'm not immediately sure why that would cause a problem but it's been  
a while since I traced through the xlog code.

-- 
Greg


On 28 May 2009, at 02:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Mark Wong <markwkm@gmail.com> writes:
>> Oopsies.  I've rerun, but now that there is no dip, the average
>> throughput still didn't change much:
>
>> BS notpm % Change from default
>> -- ----- ----------
>> 1 14673 -5.1%
>> 2 15864 2.7%
>> 4 15774 2.1%
>> 8 15454 (default)
>> 16 16118 4.3%
>> 32 16051 3.9%
>> 64 14874 -3.8%
>
> So, if we assume that these numbers are real and not artifacts, it  
> seems
> we have to postulate at least four distinct block-size-dependent
> performance effects:
>
> 1. A strong penalty for smaller block sizes, which becomes dominant
> below 2KB.
>
> 2. A strong penalty for larger block sizes, which becomes dominant
> above 32KB.
>
> 3. A weak benefit for smaller block sizes, which is visible at 2-4KB
> but fades away at 8KB.
>
> 4. A weak benefit for larger block sizes, which only becomes visible
> above 8KB.
>
> It's not too hard to believe any of those individually, and even to
> think of plausible mechanisms.  But it seems a bit unlikely that  
> effects
> 3 and 4 would exist but consistently cross over right at our  
> traditional
> choice of block size.
>
> I'm suspecting that this curve is heavily dependent on details of the
> DBT2 test and/or the hardware used.  It would be interesting to see if
> anyone can replicate it using a different benchmark.
>
>            regards, tom lane
>
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Re: survey of WAL blocksize changes

From
Simon Riggs
Date:
On Wed, 2009-05-27 at 21:09 -0400, Tom Lane wrote:

> So, if we assume that these numbers are real and not artifacts, it seems
> we have to postulate at least four distinct block-size-dependent
> performance effects:

Two performance effects would be sufficient to explain the results.

* Optimal performance for small WAL changes is reached at around 4kB.
Anything smaller or larger lessens the benefit from this.

* Optimal performance for full page writes is reached at a WAL block
size 2-4 times larger than db block size, corresponding to sizes of WAL
records generated by test.

The two effects have a tail off on either side, giving the four effects
you spoke of.

> It's not too hard to believe any of those individually, and even to
> think of plausible mechanisms.  But it seems a bit unlikely that effects
> 3 and 4 would exist but consistently cross over right at our traditional
> choice of block size.

I could believe two, but we would need some careful instrumentation to
reveal at what times we got benefit. We will never achieve improvements
if we look at figures averaged over longer periods. 

We should be trying to improve specific parts of the checkpoint cycle,
which I would break down like this:
* ramp-up
* checkpoint spike
* post-checkpoint trough
* normal running
There is very clear modal behaviour showing in the tests and we should
look at the effects of patches in each case. I could well believe that
we make a gain at one stage and lose on another.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: survey of WAL blocksize changes

From
Simon Riggs
Date:
On Wed, 2009-05-27 at 17:51 -0700, Mark Wong wrote:
> On Wed, May 27, 2009 at 1:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >
> > On Tue, 2009-05-26 at 19:51 -0700, Mark Wong wrote:
> >> It appears for this workload using a 16KB or 32KB gets more than 4%
> >> throughput improvement, but some of that could be noise.
> >
> > The baseline appears to have a significant jump in txn response time
> > after 77 mins on the baseline test. I think you should rerun that. My
> > guess would be it will reduce any gains shown with higher settings.
> 
> Oopsies.  I've rerun, but now that there is no dip, the average
> throughput still didn't change much:
> 
> BS notpm % Change from default
> -- ----- ----------
>  1 14673 -5.1%
>  2 15864 2.7%
>  4 15774 2.1%
>  8 15454 (default)
> 16 16118 4.3%
> 32 16051 3.9%
> 64 14874 -3.8%
> 
> Pointers to raw data:
> 
> BS url
> -- ---
>  1 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.1/
>  2 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.2/
>  4 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.4/
>  8 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.8/report/
> 16 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.16/
> 32 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.32/
> 64 http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.64/

Look at these graphs, in this order
http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.16/report/rt_d.png
http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.2/report/rt_d.png
http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.wal.8/report/rt_d.png

BS 16 and 2 look very similar, though with 16 clearly a better curve.
BS=8 looks very strange in comparison. Still something wrong, I suspect.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support