Thread: data on devel code perf dip
I have an example of runs that illustrate a performance problem that occurred between installing the 7/18 and 8/1 development release codes.

I'm running on a PPC64 8-way system with 16GB of memory (on a 15GB virtual machine), using Linux 2.6.11/gentoo. The workload is DBT2 configured with 16 warehouses and 16 db connections, no think time.

Overview:

Before picture:
    run 42, code updated from CVS July 18 12:18pm Pacific
    metric: 8374.71 NOTPM

After picture (same config params, db size, etc):
    run 59, code downloaded from CVS (all of it) Aug 1 9:58am Pacific
    metric: 4787.61 NOTPM

Oprofile data is available for 59 (top part copied at the end of this message) but not for 42 (although the binary is still around and I could in theory rerun and get it).

I'd like to first get back to the throughput of run 42, and then remove whatever is keeping me from getting higher than 45% user with the original July 18 drop.

************************************************************
Here are more details:

(RUN 42) The previous performance: 8374.71 new-order transactions per minute (NOTPM) and %user around 45% on 8 processors. Even this level of user utilization is not good, as there is still some bandwidth left in the log device (not much, but some).
http://developer.osdl.org/maryedie/DBT2_PGSQL/42/

We got this development code on July 18 12:18pm Pacific, tuned it, and got to run 42 on July 26.

(RUN 59) This is an example of a run with code downloaded (whole) on Aug 1 9:58am Pacific: 4787.61 new-order transactions per minute (NOTPM) and ~22% user over 8 cpus. This has the same config parameters as 42, with autovacuum off. However, I turned on oprofile for this run (runs without oprofile are equivalent).
http://developer.osdl.org/maryedie/DBT2_PGSQL/59/

Top entries for PostgreSQL in the oprofile are copied below.

CPU: ppc64 POWER5, speed 1656.38 MHz (estimated)
Counted CYCLES events (Processor cycles) with a unit mask of 0x00 (No unit mask) count 100000

samples    %        app name         symbol name
164623113  70.5372  kernel-2.6.11.3  .shared_idle
6641569     2.8458  libc-2.3.4.so    (no symbols)
4011874     1.7190  postgres         .AllocSetAlloc
2798355     1.1990  postgres         .CreateLWLocks
1731687     0.7420  postgres         .SearchCatCache
1701936     0.7292  postgres         .MemoryContextAllocZeroAligned
1509903     0.6470  postgres         .FunctionCall2
1320467     0.5658  postgres         .hash_search
1275278     0.5464  oprofiled        (no symbols)
1271477     0.5448  postgres         .yyparse
1099396     0.4711  postgres         .OpernameGetCandidates
794802      0.3406  postgres         .expression_tree_walker
731969      0.3136  postgres         .MemoryContextAlloc
730266      0.3129  postgres         ._bt_compare
709793      0.3041  postgres         .DirectFunctionCall1
689405      0.2954  postgres         .base_yylex
683776      0.2930  postgres         .hash_any
635680      0.2724  postgres         .DLMoveToFront
632098      0.2708  kernel-2.6.11.3  .___dst_free
562111      0.2409  kernel-2.6.11.3  .sys_setrlimit
508608      0.2179  postgres         .AtEOXact_CatCache
499318      0.2139  postgres         .heap_release_fetch
498136      0.2134  kernel-2.6.11.3  .tty_write
494020      0.2117  postgres         .XLogInsert

--
Mary Edie Meredith
Initiative Manager
Open Source Development Labs
maryedie@hotmail.com
503-906-1942
Mary Edie Meredith <maryedie@osdl.org> writes:
> I have an example of runs that illustrate a performance
> problem that occurred between installing the 7/18 and 8/1
> development release codes.

I dug through the CVS logs to see what had changed, and I'm afraid there is just one plausible-looking candidate:

2005-07-28 23:22  momjian

	* src/backend/access/transam/xlog.c: Use O_DIRECT if available
	when using O_SYNC for wal_sync_method.  Also, write multiple WAL
	buffers out in one write() operation.  ITAGAKI Takahiro

Most of the CVS activity in that time period had to do with stuff like roles and the interval datatype. It's conceivable that these things had some marginal performance cost, but if so I'd have expected it to show up as extra CPU effort (more time checking permissions, say). This figure:

> samples    %        app name         symbol name
> 164623113  70.5372  kernel-2.6.11.3  .shared_idle

says pretty clearly that your problem is all I/O wait, and there are no other commits that might have increased our tendency to wait for I/O.

I am sure I will get some pushback if I propose reverting the O_DIRECT patch, so could you try to get some more-specific evidence? Like pull the CVS tree from just before and just after this patch and compare performance?

BTW I did check that both runs are using wal_sync_method = fdatasync and wal_buffers = 1000, so it's not a problem of those parameters having been changed by the patch.

			regards, tom lane
On Thu, Aug 11, 2005 at 09:02:03PM -0400, Tom Lane wrote:

> I am sure I will get some pushback if I propose reverting the O_DIRECT
> patch, so could you try to get some more-specific evidence?  Like pull
> the CVS tree from just before and just after this patch and compare
> performance?

Quoth the open(2) manpage:

       O_DIRECT
              Try to minimize cache effects of the I/O to and from this
              file.  In general this will degrade performance, but it is
              useful in special situations, such as when applications do
              their own caching.  File I/O is done directly to/from user
              space buffers.  The I/O is synchronous, i.e., at the
              completion of the read(2) or write(2) system call, data is
              guaranteed to have been transferred.

In light of this, may I ask whether it makes sense to compare the performance of two runs with similar shared_buffers settings? With O_DIRECT, I understand from this manpage that the OS is going to do little or no page caching, so shared_buffers should be increased to account for this fact.

Am I missing something?

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"One must remember that existence in the cosmos, and particularly the
development of civilizations within it, are, unfortunately, far from
idyllic" (Ijon Tichy)
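For context, here is a minimal sketch of what O_DIRECT asks of the caller on Linux, per the manpage quoted above: the flag is passed at open() time, and the user buffer (and usually the offset and transfer length) must be block-aligned, which is why the xlog.c patch forces alignment of the WAL buffers. The file name, alignment, and 8K transfer size below are illustrative assumptions, not values taken from the patch.

    /* Illustrative only: an O_DIRECT write with an aligned buffer (not PostgreSQL code). */
    #define _GNU_SOURCE             /* needed on Linux for O_DIRECT */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define XFER_SIZE 8192          /* assumed WAL block size for this sketch */

    int
    main(void)
    {
        void   *buf;
        int     fd;

        /* O_DIRECT typically requires the user buffer to be block-aligned. */
        if (posix_memalign(&buf, 4096, XFER_SIZE) != 0)
            return 1;
        memset(buf, 0, XFER_SIZE);

        /* Hypothetical file name; O_DIRECT bypasses the kernel page cache. */
        fd = open("walfile.test", O_WRONLY | O_CREAT | O_DIRECT, 0600);
        if (fd < 0)
            return 1;

        /* The write goes straight to the device; offset and length must be aligned too. */
        if (write(fd, buf, XFER_SIZE) != XFER_SIZE)
            return 1;

        close(fd);
        free(buf);
        return 0;
    }

Because O_DIRECT bypasses the kernel page cache entirely, it is mainly attractive for files the server syncs itself, which is why the replies below stress that it was wired up only for the WAL.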
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> In light of this, may I ask whether it makes sense to compare the
> performance of two runs with similar shared_buffers settings?  With
> O_DIRECT, I understand from this manpage that the OS is going to do
> little or no page caching, so shared_buffers should be increased to
> account for this fact.

> Am I missing something?

O_DIRECT is only being used for WAL page writes (or I sure hope so anyway), so shared_buffers should be irrelevant.

			regards, tom lane
Tom Lane wrote:
> Most of the CVS activity in that time period had to do with stuff like
> roles and the interval datatype.  It's conceivable that these things
> had some marginal performance cost, but if so I'd have expected it to
> show up as extra CPU effort (more time checking permissions, say).
> This figure:
>
> > samples    %        app name         symbol name
> > 164623113  70.5372  kernel-2.6.11.3  .shared_idle
>
> says pretty clearly that your problem is all I/O wait, and there are
> no other commits that might have increased our tendency to wait for I/O.
>
> I am sure I will get some pushback if I propose reverting the O_DIRECT
> patch, so could you try to get some more-specific evidence?  Like pull
> the CVS tree from just before and just after this patch and compare
> performance?
>
> BTW I did check that both runs are using wal_sync_method = fdatasync
> and wal_buffers = 1000, so it's not a problem of those parameters having
> been changed by the patch.

We can supply a patch with just the O_DIRECT change for you to test. The O_DIRECT patch also had grouped WAL writes, so that might be an issue too. Also, O_DIRECT is only used for the open_* wal_sync_method settings, so I don't see how it would affect this, but the grouped WAL writes might.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  + If your life is a hard drive,      |  13 Roberts Road
  + Christ can be your backup.         |  Newtown Square, Pennsylvania 19073
On 2005-08-12, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> In light of this, may I ask whether it makes sense to compare the
> performance of two runs with similar shared_buffers settings?  With
> O_DIRECT, I understand from this manpage that the OS is going to do
> little or no page caching, so shared_buffers should be increased to
> account for this fact.
>
> Am I missing something?

O_DIRECT should only be used for the WAL, not for buffer I/O.

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > In light of this, may I ask whether it makes sense to compare the
> > performance of two runs with similar shared_buffers settings?  With
> > O_DIRECT, I understand from this manpage that the OS is going to do
> > little or no page caching, so shared_buffers should be increased to
> > account for this fact.
>
> > Am I missing something?
>
> O_DIRECT is only being used for WAL page writes (or I sure hope so
> anyway), so shared_buffers should be irrelevant.

Uh, O_DIRECT is really only enabled when open_sync is used, and I assume that is not used for writing dirty buffers during a checkpoint.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  + If your life is a hard drive,      |  13 Roberts Road
  + Christ can be your backup.         |  Newtown Square, Pennsylvania 19073
Andrew - Supernews wrote:
> On 2005-08-12, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > In light of this, may I ask whether it makes sense to compare the
> > performance of two runs with similar shared_buffers settings?  With
> > O_DIRECT, I understand from this manpage that the OS is going to do
> > little or no page caching, so shared_buffers should be increased to
> > account for this fact.
> >
> > Am I missing something?
>
> O_DIRECT should only be used for the WAL, not for buffer I/O.

And only when open_sync or open_datasync is used. The grouped-write part of the patch, however, is used in all cases, though again only for WAL.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  + If your life is a hard drive,      |  13 Roberts Road
  + Christ can be your backup.         |  Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> O_DIRECT is only being used for WAL page writes (or I sure hope so
>> anyway), so shared_buffers should be irrelevant.

> Uh, O_DIRECT is really only enabled when open_sync is used, and I assume
> that is not used for writing dirty buffers during a checkpoint.

I double-checked that O_DIRECT is really just used for WAL, and only when the sync mode is open_sync or open_datasync. So it seems impossible that it affected a run with mode fdatasync. What seems the best theory at the moment is that the grouped-WAL-write part of the patch doesn't work so well as we thought.

			regards, tom lane
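To make the point above concrete, here is a hedged sketch of the flag-selection behavior being described: O_DIRECT would only be added to the WAL file's open flags for the open_sync and open_datasync methods, so a run with wal_sync_method = fdatasync never opens the WAL with O_DIRECT. The enum and function names are illustrative assumptions, not the actual xlog.c symbols.

    /*
     * Illustrative sketch of the behavior described in this thread: O_DIRECT
     * is added to the WAL file's open flags only for the open_* sync methods.
     * Names here are assumptions for illustration, not xlog.c's own.
     */
    #define _GNU_SOURCE             /* for O_DIRECT on Linux */
    #include <fcntl.h>

    typedef enum
    {
        WAL_SYNC_FSYNC,
        WAL_SYNC_FDATASYNC,
        WAL_SYNC_OPEN_SYNC,         /* wal_sync_method = open_sync     -> O_SYNC  */
        WAL_SYNC_OPEN_DATASYNC      /* wal_sync_method = open_datasync -> O_DSYNC */
    } WalSyncMethod;

    int
    wal_open_flags(WalSyncMethod method)
    {
        int     flags = O_RDWR;

        switch (method)
        {
            case WAL_SYNC_OPEN_SYNC:
                flags |= O_SYNC;
    #ifdef O_DIRECT
                flags |= O_DIRECT;  /* only the open_* methods get O_DIRECT */
    #endif
                break;
            case WAL_SYNC_OPEN_DATASYNC:
    #ifdef O_DSYNC
                flags |= O_DSYNC;
    #endif
    #ifdef O_DIRECT
                flags |= O_DIRECT;
    #endif
                break;
            default:
                /* fsync/fdatasync methods: plain open(); syncing happens later */
                break;
        }
        return flags;
    }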
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> O_DIRECT is only being used for WAL page writes (or I sure hope so
> >> anyway), so shared_buffers should be irrelevant.
>
> > Uh, O_DIRECT is really only enabled when open_sync is used, and I assume
> > that is not used for writing dirty buffers during a checkpoint.
>
> I double-checked that O_DIRECT is really just used for WAL, and only
> when the sync mode is open_sync or open_datasync.  So it seems
> impossible that it affected a run with mode fdatasync.  What seems the
> best theory at the moment is that the grouped-WAL-write part of the
> patch doesn't work so well as we thought.

Yes, that's my only guess. Let us know if you want the patch to test, rather than pulling CVS before and after the patch was applied.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  + If your life is a hard drive,      |  13 Roberts Road
  + Christ can be your backup.         |  Newtown Square, Pennsylvania 19073
On Thu, 11 Aug 2005 22:11:42 -0400 (EDT)
Bruce Momjian <pgman@candle.pha.pa.us> wrote:

> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > >> O_DIRECT is only being used for WAL page writes (or I sure hope so
> > >> anyway), so shared_buffers should be irrelevant.
> >
> > > Uh, O_DIRECT is really only enabled when open_sync is used, and I assume
> > > that is not used for writing dirty buffers during a checkpoint.
> >
> > I double-checked that O_DIRECT is really just used for WAL, and only
> > when the sync mode is open_sync or open_datasync.  So it seems
> > impossible that it affected a run with mode fdatasync.  What seems the
> > best theory at the moment is that the grouped-WAL-write part of the
> > patch doesn't work so well as we thought.
>
> Yes, that's my only guess. Let us know if you want the patch to test,
> rather than pulling CVS before and after the patch was applied.

Yeah, a patch would be a little easier. :)

Thanks,
Mark
Mary Edie Meredith <maryedie@osdl.org> writes:
> I'm still very concerned about what I'm seeing in the oprofile:
> namely: .CreateLWLocks is the second highest entry for postgres.
> http://developer.osdl.org/maryedie/DBT2_PGSQL/59/oprofile.txt

This says there's something wrong with your oprofile setup. CreateLWLocks is run *once* at postmaster startup, and it doesn't take all that long. Check to see if maybe your executables are out of sync with what oprofile is looking at?

			regards, tom lane
On Thu, 2005-08-11 at 22:11 -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > >> O_DIRECT is only being used for WAL page writes (or I sure hope so
> > >> anyway), so shared_buffers should be irrelevant.
> >
> > > Uh, O_DIRECT is really only enabled when open_sync is used, and I assume
> > > that is not used for writing dirty buffers during a checkpoint.
> >
> > I double-checked that O_DIRECT is really just used for WAL, and only
> > when the sync mode is open_sync or open_datasync.  So it seems
> > impossible that it affected a run with mode fdatasync.  What seems the
> > best theory at the moment is that the grouped-WAL-write part of the
> > patch doesn't work so well as we thought.
>
> Yes, that's my only guess. Let us know if you want the patch to test,
> rather than pulling CVS before and after the patch was applied.

I tried the pull we have from 8/28/05, which should not have the WAL O_DIRECT patch or the WAL grouping patch (I believe markw checked this for me). Unfortunately the performance is just as bad as run 59. So it looks like those patches are not my immediate problem.

While you think about that, I'm going to pull the development code from July 18 and run it again with oprofile.

I'm still very concerned about what I'm seeing in the oprofile: namely, .CreateLWLocks is the second highest entry for postgres.
http://developer.osdl.org/maryedie/DBT2_PGSQL/59/oprofile.txt

It is still the second highest entry in the run I just did. Why are there so many "Creates"? Shouldn't there just be mostly calls that grab a LW lock and release it? Mark and I looked through the code at what the CreateLWLocks subroutine does and found the note inserted below. Since the connections are established at the very beginning, and this run went on for 30 more minutes, I don't understand why so many creates happened. Can you explain this?

 * This is called by the postmaster or by a standalone backend.
 * It is also called by a backend forked from the postmaster in the
 * EXEC_BACKEND case.  In the latter case, the shared memory segment
 * already exists and has been physically attached to, but we have to
 * initialize pointers in local memory that reference the shared structures,
 * because we didn't inherit the correct pointer values from the postmaster
 * as we do in the fork() scenario.  The easiest way to do that is to run
 * through the same code as before.  (Note that the called routines mostly
 * check IsUnderPostmaster, rather than EXEC_BACKEND, to detect this case.
 * This is a bit code-wasteful and could be cleaned up.)
 *
 * If "makePrivate" is true then we only need private memory, not shared
 * memory.  This is true for a standalone backend, false for a postmaster.

TIA.

--
Mary Edie Meredith
Initiative Manager
Open Source Development Labs
maryedie@hotmail.com
503-906-1942
On Tue, 2005-08-16 at 18:53 -0400, Tom Lane wrote:
> Mary Edie Meredith <maryedie@osdl.org> writes:
> > I'm still very concerned about what I'm seeing in the oprofile:
> > namely: .CreateLWLocks is the second highest entry for postgres.
> > http://developer.osdl.org/maryedie/DBT2_PGSQL/59/oprofile.txt
>
> This says there's something wrong with your oprofile setup.
> CreateLWLocks is run *once* at postmaster startup, and it doesn't
> take all that long.  Check to see if maybe your executables are
> out of sync with what oprofile is looking at?
>
> 			regards, tom lane

OK, Tom, sounds correct. Thanks.
On Tue, 16 Aug 2005 18:53:55 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Mary Edie Meredith <maryedie@osdl.org> writes:
> > I'm still very concerned about what I'm seeing in the oprofile:
> > namely: .CreateLWLocks is the second highest entry for postgres.
> > http://developer.osdl.org/maryedie/DBT2_PGSQL/59/oprofile.txt
>
> This says there's something wrong with your oprofile setup.
> CreateLWLocks is run *once* at postmaster startup, and it doesn't
> take all that long.  Check to see if maybe your executables are
> out of sync with what oprofile is looking at?

It is a POWER5 platform, if that has anything to do with it. It certainly doesn't look sane, but for user apps oprofile is supposed to be getting the symbols directly from the binaries it's using. I'm inclined to blame the platform support. ;)

Mark
On Tue, 2005-08-16 at 16:07 -0700, Mark Wong wrote:
> On Tue, 16 Aug 2005 18:53:55 -0400
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > Mary Edie Meredith <maryedie@osdl.org> writes:
> > > I'm still very concerned about what I'm seeing in the oprofile:
> > > namely: .CreateLWLocks is the second highest entry for postgres.
> > > http://developer.osdl.org/maryedie/DBT2_PGSQL/59/oprofile.txt
> >
> > This says there's something wrong with your oprofile setup.
> > CreateLWLocks is run *once* at postmaster startup, and it doesn't
> > take all that long.  Check to see if maybe your executables are
> > out of sync with what oprofile is looking at?
>
> It is a POWER5 platform, if that has anything to do with it.  It
> certainly doesn't look sane, but for user apps oprofile is supposed to
> be getting the symbols directly from the binaries it's using.  I'm
> inclined to blame the platform support. ;)

Could it have to do with the fact that I'm running in a Virtual Machine? Maybe they forgot to change the symbols to reflect the VM's "virtual physical address", if you get my drift. Who do you know on the IBM side, Mark, who could help us sort this out?

>
> Mark
> On Tue, 2005-08-16 at 16:07 -0700, Mark Wong wrote:
>> On Tue, 16 Aug 2005 18:53:55 -0400
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> > Mary Edie Meredith <maryedie@osdl.org> writes:
>> > > I'm still very concerned about what I'm seeing in the oprofile:
>> > > namely: .CreateLWLocks is the second highest entry for postgres.
>> > > http://developer.osdl.org/maryedie/DBT2_PGSQL/59/oprofile.txt
>> >
>> > This says there's something wrong with your oprofile setup.
>> > CreateLWLocks is run *once* at postmaster startup, and it doesn't
>> > take all that long.  Check to see if maybe your executables are
>> > out of sync with what oprofile is looking at?
>>
>> It is a POWER5 platform, if that has anything to do with it.  It
>> certainly doesn't look sane, but for user apps oprofile is supposed to
>> be getting the symbols directly from the binaries it's using.  I'm
>> inclined to blame the platform support. ;)
>
> Could it have to do with the fact that I'm running in a Virtual Machine?
> Maybe they forgot to change the symbols to reflect the VM's "virtual
> physical address", if you get my drift.  Who do you know on the IBM side,
> Mark, who could help us sort this out?

Not likely; that would imply the system knew it was running in a virtual machine, and you're not supposed to know you're in a virtual machine. I don't know who at IBM to ask about the hypervisor. As for oprofile, the oprofile list would be a good place to start.

Mark
I wrote:
> Mary Edie Meredith <maryedie@osdl.org> writes:
>> I have an example of runs that illustrate a performance
>> problem that occurred between installing the 7/18 and 8/1
>> development release codes.

> I dug through the CVS logs to see what had changed, and I'm afraid there
> is just one plausible-looking candidate:

> * src/backend/access/transam/xlog.c:
> Use O_DIRECT if available when using O_SYNC for wal_sync_method.
> Also, write multiple WAL buffers out in one write() operation.

I've been sniffing around that patch and not really finding any smoking gun about why it would make things slower when you're not using O_DIRECT.

I noticed that it forces 8K alignment of the WAL buffers on any machine that has O_DIRECT defined, whether you use O_DIRECT or not --- but it's pretty hard to see why that would make things slower. (Indeed, the older code only guaranteed MAXALIGN alignment of the WAL buffers, and we *know* that's wrong --- on Intel machines you want cache-line alignment to make kernel-to-userspace transfers fast. BTW, does anyone know if the current definition of BUFFERALIGN == 32 bytes needs to be increased for newer Intel machines?)

I've got a bunch of minor gripes about the coding style of the gather-write part of the patch, but I really can't see that it's causing any performance issue. AFAICS it just replaces several closely spaced write() syscalls with one larger one. It's not possible that the kernel you're using is less efficient for larger transfers than smaller ones, is it?

The whole thing's pretty bizarre.

			regards, tom lane
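As a reference point for the gather-write discussion, here is a minimal sketch of the idea Tom describes: several closely spaced page-sized write() calls replaced by a single larger write() over the same contiguous buffer. It is illustrative only (assumed 8K page size, hypothetical function names), not the code from the patch.

    /*
     * Sketch of the gather-write idea: instead of issuing one write() per
     * dirty WAL page, contiguous pages are written with a single larger
     * write().  Illustrative only; not the xlog.c implementation.
     */
    #include <sys/types.h>
    #include <unistd.h>

    #define WAL_BLCKSZ 8192         /* assumed WAL page size for this sketch */

    /* Before: one syscall per page. */
    ssize_t
    write_pages_individually(int fd, const char *pages, int npages)
    {
        ssize_t total = 0;

        for (int i = 0; i < npages; i++)
        {
            ssize_t n = write(fd, pages + (size_t) i * WAL_BLCKSZ, WAL_BLCKSZ);

            if (n < 0)
                return -1;
            total += n;
        }
        return total;
    }

    /* After: the pages are contiguous in the WAL buffer, so one write() suffices. */
    ssize_t
    write_pages_grouped(int fd, const char *pages, int npages)
    {
        return write(fd, pages, (size_t) npages * WAL_BLCKSZ);
    }

Either way the same bytes reach the kernel; the grouped form just trades several syscalls for one larger transfer, which is why Tom finds it hard to see how it alone could cost throughput.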
On Sun, 2005-08-21 at 20:37 -0400, Tom Lane wrote:
> The whole thing's pretty bizarre.

I hate to sound obvious, but does the missing performance return if you back the patch out? It seemed to have been decided on Tue, 16 Aug 2005 15:45:30 -0700 that the performance was the same before and after. However, there also seems to be version confusion, since maryedie@osdl.org also claimed to be testing a tree from the future. Basically I think there's more confusion here than evidence.

On an admittedly smaller x86 configuration, source trees with this patch are faster than without, not slower.

And finally, not to start a flamewar, but the initial report was on a machine running gentoo, the distribution that should be renamed Irreproducible Benchmarks Linux.

-jwb
I wrote:
> I've been sniffing around that patch and not really finding any smoking
> gun about why it would make things slower when you're not using O_DIRECT.

While rewriting the patch to fit more naturally into xlog.c, I found a bug that might possibly be related to your performance issue. The if-test in this fragment is wrong:

    /*
     * If we just wrote the whole last page of a logfile segment,
     * fsync the segment immediately.  This avoids having to go back
     * and re-open prior segments when an fsync request comes along
     * later.  Doing it here ensures that one and only one backend will
     * perform this fsync.
     *
     * This is also the right place to notify the Archiver that the
     * segment is ready to copy to archival storage.
     */
    if (openLogOff + pages.size >= XLogSegSize && !ispartialpage)
    {
        XLogPageFlush(&pages, currentIndex);
        issue_xlog_fsync();
        ...

Because of the patch's logic changes, openLogOff is not the right thing to be examining here --- we have not yet done a seek to ensure that it matches the write start point. (pages.offset + pages.size would have worked instead.)

The implication of this is that the code might either fail to do a write and fsync when it needs to, or do extra fsyncs that it doesn't need to do. I am wondering if the latter could explain your issue. Right offhand it doesn't seem like this would lead to enough extra fsyncs to account for the amount of slowdown you report --- but there just doesn't seem to be anything else in there that could account for it.

I've committed a patch that fixes this and makes some other minor improvements (which you probably won't notice given that you're using such a large wal_buffers setting). You might like to update to CVS tip (make sure you get xlog.c 1.218 or later) and see if things are any better or not.

			regards, tom lane
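Read against the fragment quoted above, the corrected test Tom describes would key off the gather-write buffer's own start offset rather than openLogOff, which may not yet match the write start point. The following is only a sketch of that description, reusing the symbols from the quoted fragment; it is not the change actually committed in xlog.c 1.218.

    /*
     * Sketch only: the condition as described above, using the buffer's own
     * offset (pages.offset) instead of openLogOff.  Symbols are those of the
     * quoted fragment; this is not the committed xlog.c change.
     */
    if (pages.offset + pages.size >= XLogSegSize && !ispartialpage)
    {
        XLogPageFlush(&pages, currentIndex);
        issue_xlog_fsync();
        ...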
Thanks, Tom, for keeping on this. I was on vacation last week. Will try the latest ASAP.

On Mon, 2005-08-22 at 20:20 -0400, Tom Lane wrote:
> I wrote:
> > I've been sniffing around that patch and not really finding any smoking
> > gun about why it would make things slower when you're not using O_DIRECT.
>
> While rewriting the patch to fit more naturally into xlog.c, I found a
> bug that might possibly be related to your performance issue.  The
> if-test in this fragment is wrong:
>
>     /*
>      * If we just wrote the whole last page of a logfile segment,
>      * fsync the segment immediately.  This avoids having to go back
>      * and re-open prior segments when an fsync request comes along
>      * later.  Doing it here ensures that one and only one backend will
>      * perform this fsync.
>      *
>      * This is also the right place to notify the Archiver that the
>      * segment is ready to copy to archival storage.
>      */
>     if (openLogOff + pages.size >= XLogSegSize && !ispartialpage)
>     {
>         XLogPageFlush(&pages, currentIndex);
>         issue_xlog_fsync();
>         ...
>
> Because of the patch's logic changes, openLogOff is not the right thing
> to be examining here --- we have not yet done a seek to ensure that it
> matches the write start point.  (pages.offset + pages.size would have
> worked instead.)
>
> The implication of this is that the code might either fail to do a write
> and fsync when it needs to, or do extra fsyncs that it doesn't need to
> do.  I am wondering if the latter could explain your issue.  Right
> offhand it doesn't seem like this would lead to enough extra fsyncs
> to account for the amount of slowdown you report --- but there just
> doesn't seem to be anything else in there that could account for it.
>
> I've committed a patch that fixes this and makes some other minor
> improvements (which you probably won't notice given that you're using
> such a large wal_buffers setting).  You might like to update to CVS
> tip (make sure you get xlog.c 1.218 or later) and see if things are
> any better or not.
>
> 			regards, tom lane