Re: measuring lwlock-related latency spikes - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: measuring lwlock-related latency spikes
Msg-id CAMkU=1xa9DBbscMUVyD+2KNAFLAAmTO+1dqQwPzJvei97SA7YQ@mail.gmail.com
In response to Re: measuring lwlock-related latency spikes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Apr 2, 2012 at 12:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Long story short, when a CLOG-related stall happens,
>> essentially all the time is being spent in this here section of code:
>
>>     /*
>>      * If not part of Flush, need to fsync now.  We assume this happens
>>      * infrequently enough that it's not a performance issue.
>>      */
>>     if (!fdata) // fsync and close the file
>
> Seems like basically what you've proven is that this code path *is* a
> performance issue, and that we need to think a bit harder about how to
> avoid doing the fsync while holding locks.

And why is the fsync needed at all upon merely evicting a dirty page
so a replacement can be loaded?

If the system crashes between the write and the (eventual) fsync, you
are in the same position as if the system crashed while the page was
dirty in shared memory.  Either way, you have to be able to recreate
it from WAL, right?
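
To make the question concrete, here is a minimal sketch (not the actual slru.c code) of what eviction could look like if the fsync were deferred rather than done on the spot. PendingSync, evict_page, checkpoint_sync and SKETCH_BLCKSZ are hypothetical names made up for illustration, and the sketch assumes that the WAL needed to recreate the page is kept until the deferred fsync has completed.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_BLCKSZ 8192      /* hypothetical page size for this sketch */

/* Hypothetical bookkeeping: a file that has writes not yet fsync'd. */
typedef struct PendingSync
{
    int  fd;        /* descriptor with un-fsynced writes, or -1 */
    bool dirty;     /* true if written since the last fsync */
} PendingSync;

/*
 * Evict a dirty page: hand it to the kernel with pwrite() but do NOT
 * fsync here.  If the system crashes before the later fsync, the page
 * is assumed to be recoverable from WAL, just as if it had still been
 * dirty in shared memory.  Returns 0 on success, -1 on error.
 */
int
evict_page(PendingSync *ps, int fd, long pageno, const char *page)
{
    off_t offset = (off_t) pageno * SKETCH_BLCKSZ;

    if (pwrite(fd, page, SKETCH_BLCKSZ, offset) != SKETCH_BLCKSZ)
        return -1;

    /* Remember that this file still needs an fsync later. */
    ps->fd = fd;
    ps->dirty = true;
    return 0;
}

/*
 * Deferred sync, e.g. at checkpoint time and outside any page lock:
 * make the earlier writes durable before the WAL that covers them
 * may be recycled.  Returns 0 on success, -1 on error.
 */
int
checkpoint_sync(PendingSync *ps)
{
    if (!ps->dirty)
        return 0;               /* nothing written since last sync */
    if (fsync(ps->fd) != 0)
        return -1;
    ps->dirty = false;
    return 0;
}
```

The point of the split is only that the slow call (fsync) moves out of the eviction path; whether the real code can safely do this is exactly what I am asking.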


Cheers,

Jeff

