Thread: wal_level=minimal produces more data than archive level

wal_level=minimal produces more data than archive level

From

Tomas Vondra

Date:

02 April 2011, 20:06:00

Hi everyone,

I've been playing with various wal_level settings, mainly to find out
what is the overhead of hot standby, and I've noticed a strange thing.
In some cases the wal_level=minimal produces signigicantly more xlog
data than wal_level=archive (and hot_standby).

=====================================================================

Example:

1) with wal_level=minimal

$ psql -c "select pg_xlogfile_name_offset(pg_current_xlog_location())"

      pg_xlogfile_name_offset
------------------------------------
 (000000010000000000000001,5799664)

$ pgbench -i -s 10

NOTICE:  table "pgbench_branches" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_history" does not exist, skipping
creating tables...
10000 tuples done.
20000 tuples done.
...
990000 tuples done.
1000000 tuples done.
set primary key...
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
"pgbench_branches_pkey" for table "pgbench_branches"
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
"pgbench_tellers_pkey" for table "pgbench_tellers"
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
"pgbench_accounts_pkey" for table "pgbench_accounts"
vacuum...done.

$ psql -c "select pg_xlogfile_name_offset(pg_current_xlog_location())"

      pg_xlogfile_name_offset
------------------------------------
 (000000010000000000000001,5945832)

$ pgbench -c 1 -t 10000

starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10000
number of transactions actually processed: 10000/10000
tps = 515.508932 (including connections establishing)
tps = 515.623415 (excluding connections establishing)

$ psql -c "select pg_xlogfile_name_offset(pg_current_xlog_location())"

      pg_xlogfile_name_offset
------------------------------------
 (000000010000000000000006,3395840)

=====================================================================

2) with wal_level=archive

$ psql -c "select pg_xlogfile_name_offset(pg_current_xlog_location())"

      pg_xlogfile_name_offset
------------------------------------
 (000000010000000000000001,5799708)

$ pgbench -i -s 10

NOTICE:  table "pgbench_branches" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_history" does not exist, skipping
creating tables...
10000 tuples done.
20000 tuples done.
...
990000 tuples done.
1000000 tuples done.
set primary key...
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
"pgbench_branches_pkey" for table "pgbench_branches"
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
"pgbench_tellers_pkey" for table "pgbench_tellers"
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
"pgbench_accounts_pkey" for table "pgbench_accounts"
vacuum...done.

$ psql -c "select pg_xlogfile_name_offset(pg_current_xlog_location())"

      pg_xlogfile_name_offset
------------------------------------
 (00000001000000000000000B,8772044)

$ pgbench -c 1 -t 10000

starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10000
number of transactions actually processed: 10000/10000
tps = 748.476327 (including connections establishing)
tps = 748.664607 (excluding connections establishing)

$ psql -c "select pg_xlogfile_name_offset(pg_current_xlog_location())"

       pg_xlogfile_name_offset
-------------------------------------
 (00000001000000000000000B,13937304)

=====================================================================

Which means the pgbench init takes about 142kB with 'minimal' level and
about 163MB with 'archive' level (which is expected).

But the actual pgbench run produces much more xlog data with minimal wal
level compared to archive level. With minimal level it produces about
90MB and with archive level it produces just about 5MB.

I've check the docs and the only possible explanation I've noticed when
reading the docs is that while wal_level=minimal allows to skip logging
of some bulk operations (e.g. COPY, which is exactly what pgbench does
when initializing the DB), it may need to log more data later (when
actually running the bench).

Is that a correct conclusion, or am I missing something?

regards
Tomas

Re: wal_level=minimal produces more data than archive level

From

Tom Lane

Date:

02 April 2011, 21:45:39

Tomas Vondra <tv@fuzzy.cz> writes:
> I've been playing with various wal_level settings, mainly to find out
> what is the overhead of hot standby, and I've noticed a strange thing.
> In some cases the wal_level=minimal produces signigicantly more xlog
> data than wal_level=archive (and hot_standby).
> ...
> Which means the pgbench init takes about 142kB with 'minimal' level and
> about 163MB with 'archive' level (which is expected).

> But the actual pgbench run produces much more xlog data with minimal wal
> level compared to archive level. With minimal level it produces about
> 90MB and with archive level it produces just about 5MB.

> I've check the docs and the only possible explanation I've noticed when
> reading the docs is that while wal_level=minimal allows to skip logging
> of some bulk operations (e.g. COPY, which is exactly what pgbench does
> when initializing the DB), it may need to log more data later (when
> actually running the bench).

I wonder whether the discrepancy relates to having to produce
full-page-image WAL entries during the first touch of a page during the
pgbench run, versus having already done so in initialization.  If you
force a checkpoint after the init step, do the results change?

            regards, tom lane

Re: wal_level=minimal produces more data than archive level

From

Tomas Vondra

Date:

02 April 2011, 22:27:02

Dne 3.4.2011 02:45, Tom Lane napsal(a):
> Tomas Vondra <tv@fuzzy.cz> writes:
>> I've been playing with various wal_level settings, mainly to find out
>> what is the overhead of hot standby, and I've noticed a strange thing.
>> In some cases the wal_level=minimal produces signigicantly more xlog
>> data than wal_level=archive (and hot_standby).
>> ...
>> Which means the pgbench init takes about 142kB with 'minimal' level and
>> about 163MB with 'archive' level (which is expected).
>
>> But the actual pgbench run produces much more xlog data with minimal wal
>> level compared to archive level. With minimal level it produces about
>> 90MB and with archive level it produces just about 5MB.
>
>> I've check the docs and the only possible explanation I've noticed when
>> reading the docs is that while wal_level=minimal allows to skip logging
>> of some bulk operations (e.g. COPY, which is exactly what pgbench does
>> when initializing the DB), it may need to log more data later (when
>> actually running the bench).
>
> I wonder whether the discrepancy relates to having to produce
> full-page-image WAL entries during the first touch of a page during the
> pgbench run, versus having already done so in initialization.  If you
> force a checkpoint after the init step, do the results change?

Yes, a forced CHECKPOINT results in a much more xlog data in case of the
archive level. Without the checkpoint there was about 5MB, now there is
about about 90MB (a fet kB more than with the minimal wal level).

Hmmm, I'm wondering which of these two cases is more apropriate when
comparing wal levels - with the checkpoint or without it? I believe the
one with checkpoint, as checkpoints happen all the time anyway.

regards
Tomas

Re: wal_level=minimal produces more data than archive level

From

Tom Lane

Date:

03 April 2011, 03:04:11

Tomas Vondra <tv@fuzzy.cz> writes:
> Dne 3.4.2011 02:45, Tom Lane napsal(a):
>> I wonder whether the discrepancy relates to having to produce
>> full-page-image WAL entries during the first touch of a page during the
>> pgbench run, versus having already done so in initialization.  If you
>> force a checkpoint after the init step, do the results change?

> Yes, a forced CHECKPOINT results in a much more xlog data in case of the
> archive level. Without the checkpoint there was about 5MB, now there is
> about about 90MB (a fet kB more than with the minimal wal level).

> Hmmm, I'm wondering which of these two cases is more apropriate when
> comparing wal levels - with the checkpoint or without it? I believe the
> one with checkpoint, as checkpoints happen all the time anyway.

Well, both of them are boundary cases --- in a realistic situation you'd
have some but not all pages already dirty since the last checkpoint.

The important point here is that you had non-comparable starting
conditions.

            regards, tom lane