Thread: posix_fadvise() and pg_receivexlog
Hi,

The WAL files that pg_receivexlog writes will basically not be re-read soon,
so we can advise the OS to release any cached pages when a WAL file is
closed. I feel inclined to change pg_receivexlog that way. Thoughts?

Regards,

--
Fujii Masao
On Wed, Aug 6, 2014 at 1:39 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> The WAL files that pg_receivexlog writes will not be re-read soon basically,
> so we can advise the OS to release any cached pages when WAL file is
> closed. I feel inclined to change pg_receivexlog that way. Thought?

How do we know that the user doesn't plan to read them soon?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 08/06/2014 08:39 PM, Fujii Masao wrote:
> Hi,
>
> The WAL files that pg_receivexlog writes will not be re-read soon basically,
> so we can advise the OS to release any cached pages when WAL file is
> closed. I feel inclined to change pg_receivexlog that way. Thought?

-1. The OS should be smart enough to not thrash the cache with files that are written sequentially and never read. If we go down this path, we'd need to sprinkle posix_fadvise calls into many, many places.

Anyway, who are we to say that they won't be re-read soon? You might, e.g., have a secondary backup site to which you copy the files received by pg_receivexlog as soon as they're completed.

- Heikki
On Thu, Aug 7, 2014 at 3:59 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 08/06/2014 08:39 PM, Fujii Masao wrote:
>>
>> Hi,
>>
>> The WAL files that pg_receivexlog writes will not be re-read soon
>> basically,
>> so we can advise the OS to release any cached pages when WAL file is
>> closed. I feel inclined to change pg_receivexlog that way. Thought?
>
>
> -1. The OS should be smart enough to not thrash the cache by files that are
> written sequentially and never read.

Yep, the OS should be so smart, but I'm not sure if it actually is. Maybe not, so I was thinking that posix_fadvise is called when the server closes a WAL file.

> If we go down this path, we'd need to
> sprinkle posix_fadvises into many, many places.

Yes, that's a valid concern. But if we can prove that adding posix_fadvise to a certain place noticeably improves performance, I'm inclined to do that.

> Anyway, who are we to say that they won't be re-read soon? You might e.g
> have a secondary backup site where you copy the files received by
> pg_receivexlog, as soon as they're completed.

So whether posix_fadvise is called or not needs to be exposed as a user-configurable option. We would need to measure how useful exposing that is, though.

Regards,

--
Fujii Masao
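As a rough illustration of the idea under discussion (not the actual pg_receivexlog change, which would be written in C), the following Python sketch shows the system calls involved when a finished WAL file is closed. The function name `close_wal_file` and the `drop_cache` flag are hypothetical; the flag stands in for the user-configurable option mentioned above. It assumes a Unix platform where `os.posix_fadvise` is available.

```python
import os
import tempfile


def close_wal_file(fd: int, drop_cache: bool = False) -> None:
    """Flush a finished file and optionally advise the OS to drop its cached pages.

    Hypothetical helper for illustration only; not part of PostgreSQL.
    """
    # Flush first: pages that are still dirty generally cannot be discarded.
    os.fsync(fd)
    if drop_cache:
        # offset=0 and length=0 cover the whole file. This is advice only;
        # the kernel is free to ignore it.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    os.close(fd)


if __name__ == "__main__":
    # Stand-in for a completed WAL segment.
    fd, path = tempfile.mkstemp()
    os.write(fd, b"\0" * 8192)
    close_wal_file(fd, drop_cache=True)
    os.remove(path)
```

Leaving `drop_cache` off by default reflects the objection raised in the thread: a user who copies the files elsewhere right after they are completed would want them to stay cached.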
Hi,

2014-08-07 13:47 GMT+09:00 Fujii Masao <masao.fujii@gmail.com>:
> On Thu, Aug 7, 2014 at 3:59 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> On 08/06/2014 08:39 PM, Fujii Masao wrote:
>>> The WAL files that pg_receivexlog writes will not be re-read soon
>>> basically,
>>> so we can advise the OS to release any cached pages when WAL file is
>>> closed. I feel inclined to change pg_receivexlog that way. Thought?
>>
>>
>> -1. The OS should be smart enough to not thrash the cache by files that are
>> written sequentially and never read.

The OS's buffer strategy is optimized for the general case. Have you forgotten the OS hackers' discussion from the last half year?

> Yep, the OS should be so smart, but I'm not sure if it actually is. Maybe not,
> so I was thinking that posix_fadvise is called when the server closes WAL file.

That's right.

>> If we go down this path, we'd need to
>> sprinkle posix_fadvises into many, many places.

Why do you aim to be perfect from the beginning?
It is the same as the history of Postgres; your concern doesn't make sense.

>> Anyway, who are we to say that they won't be re-read soon? You might e.g
>> have a secondary backup site where you copy the files received by
>> pg_receivexlog, as soon as they're completed.
>
> So whether posix_fadvise is called or not needs to be exposed as an
> user-configurable option. We would need to measure how useful exposing
> that is, though.

By the way, does the pg_receivexlog process call fsync() on every WAL commit?
If yes, I think we need an option for no or fewer fsync() calls, for better performance. That is common in NoSQL stores.
If no, we need an fsync() option to get more reliability and data integrity.
--
Mitsumasa KONDO
On 08/07/2014 10:10 AM, Mitsumasa KONDO wrote:
> 2014-08-07 13:47 GMT+09:00 Fujii Masao <masao.fujii@gmail.com>:
>
>> On Thu, Aug 7, 2014 at 3:59 AM, Heikki Linnakangas
>> <hlinnakangas@vmware.com> wrote:
>>> On 08/06/2014 08:39 PM, Fujii Masao wrote:
>>>> The WAL files that pg_receivexlog writes will not be re-read soon
>>>> basically,
>>>> so we can advise the OS to release any cached pages when WAL file is
>>>> closed. I feel inclined to change pg_receivexlog that way. Thought?
>>>
>>>
>>> -1. The OS should be smart enough to not thrash the cache by files that
>>> are written sequentially and never read.
>>
> OS's buffer strategy is optimized for general situation. Do you forget OS
> hackers discussion last a half of year?
>
>> Yep, the OS should be so smart, but I'm not sure if it actually is. Maybe
>> not,
>> so I was thinking that posix_fadvise is called when the server closes WAL
>> file.
>
> That's right.

Well, I'd like to hear someone from the field complaining that pg_receivexlog is thrashing the cache and thus reducing the performance of some other process. Or at least a synthetic test case that demonstrates that happening.

> By the way, does pg_receivexlog process have fsync() in every WAL commit?

It fsyncs each file after it has finished writing it, i.e., each WAL file is fsync'd once.

> If yes, I think that we need no or less fsync() option for the better
> performance. It is general in NOSQL storages.
> If no, we need fsync() option for more getting reliability and data
> integrity.

Hmm. An fsync=off style option might make sense, although I doubt the one fsync at end of file is causing a performance problem for anyone in practice. Haven't heard any complaints, anyway.

An option to fsync after every commit record might make sense if you use pg_receivexlog with synchronous replication. Doing that would require parsing the WAL, though, to see where the commit records are. But then again, the fsyncs wouldn't need to correspond to commit records. We could fsync just before we go to sleep to wait for more WAL to be received.

- Heikki
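The last idea above, flushing only when the receiver is about to wait for more data, can be sketched roughly as follows. This is a simplified Python model, not pg_receivexlog code: `receive_wal` is a hypothetical name, and a `None` item in the input stands in for "no data available right now, the receiver would go to sleep".

```python
import os
from typing import Iterable, Optional


def receive_wal(fd: int, chunks: Iterable[Optional[bytes]]) -> int:
    """Write incoming chunks to fd, calling fsync only before waiting for more.

    Returns the number of fsync calls issued. Hypothetical helper for
    illustration; a None chunk models the point where a real receiver
    would block waiting for more WAL.
    """
    fsyncs = 0
    dirty = False  # True when bytes were written since the last fsync
    for chunk in chunks:
        if chunk is None:
            if dirty:
                os.fsync(fd)
                fsyncs += 1
                dirty = False
            continue  # a real receiver would sleep here until data arrives
        view = memoryview(chunk)
        while len(view) > 0:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
            dirty = True
    if dirty:
        # End of stream: flush whatever is still pending.
        os.fsync(fd)
        fsyncs += 1
    return fsyncs
```

With this shape, a burst of several chunks costs one fsync rather than one per chunk, and no WAL parsing is needed to find commit records.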
On Thu, Aug 7, 2014 at 5:02 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 08/07/2014 10:10 AM, Mitsumasa KONDO wrote:
>>
>> 2014-08-07 13:47 GMT+09:00 Fujii Masao <masao.fujii@gmail.com>:
>>
>>> On Thu, Aug 7, 2014 at 3:59 AM, Heikki Linnakangas
>>> <hlinnakangas@vmware.com> wrote:
>>>>
>>>> On 08/06/2014 08:39 PM, Fujii Masao wrote:
>>>>>
>>>>> The WAL files that pg_receivexlog writes will not be re-read soon
>>>>> basically,
>>>>> so we can advise the OS to release any cached pages when WAL file is
>>>>> closed. I feel inclined to change pg_receivexlog that way. Thought?
>>>>
>>>>
>>>> -1. The OS should be smart enough to not thrash the cache by files that
>>>> are written sequentially and never read.
>>
>> OS's buffer strategy is optimized for general situation. Do you forget OS
>> hackers discussion last a half of year?
>>
>>> Yep, the OS should be so smart, but I'm not sure if it actually is. Maybe
>>> not,
>>> so I was thinking that posix_fadvise is called when the server closes WAL
>>> file.
>>
>>
>> That's right.
>
>
> Well, I'd like to hear someone from the field complaining that
> pg_receivexlog is thrashing the cache and thus reducing the performance of
> some other process. Or at least a synthetic test case that demonstrates that
> happening.

Yeah, I will test that by measuring the performance of PostgreSQL running on the same server as pg_receivexlog. We can just compare the performance with the normal pg_receivexlog against the performance with the patched one (i.e., one that calls posix_fadvise).

>> By the way, does pg_receivexlog process have fsync() in every WAL commit?
>
>
> It fsync's each file after finishing to write it. Ie. each WAL file is
> fsync'd once.
>
>
>> If yes, I think that we need no or less fsync() option for the better
>> performance. It is general in NOSQL storages.
>> If no, we need fsync() option for more getting reliability and data
>> integrity.
>
>
> Hmm. An fsync=off style option might make sense, although I doubt the one
> fsync at end of file is causing a performance problem for anyone in
> practice. Haven't heard any complaints, anyway.
>
> An option to fsync after every commit record might make sense if you use
> pg_receivexlog with synchronous replication. Doing that would require
> parsing the WAL, though, to see where the commit records are. But then
> again, the fsync's wouldn't need to correspond to commit records. We could
> fsync just before we go to sleep to wait for more WAL to be received.

That's what Furuya-san proposed in the last CommitFest.

Regards,

--
Fujii Masao
Hi

> Well, I'd like to hear someone from the field complaining that
> pg_receivexlog is thrashing the cache and thus reducing the performance of
> some other process. Or at least a synthetic test case that demonstrates that
> happening.

It's not with pg_receivexlog but it's related.

On a small box, performance was good enough without a replication server connected, but not with one connected: there was 1GB worth of WAL sitting in RAM vs. next to nothing without a slave!

setup:
8GB RAM
2GB shared_buffers (smaller has other issues)
checkpoint_segments 40 (smaller values trigger too many xlog checkpoints)
checkpoints spread over 10 min and write 30 to 50% of shared buffers.
live data set fits in RAM.
constant load.

On startup (1 or 2/hour), applications were running requests on cold data, which were now saturating IO.
I'm not sure it's an OS bug, as the WAL files were 'hotter' than the cold data.

A cron task every minute with vmtouch -e for evicting old WAL files from memory has solved the issue.

Regards
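For readers without vmtouch, a periodic eviction job of this kind can be approximated with `posix_fadvise`. The Python sketch below is an assumption-laden stand-in, not the cron job used above: `evict_old_files` and its age threshold are hypothetical, it selects files by modification time (which may not be the right criterion for recycled WAL segments), and it requires a Unix platform with `os.posix_fadvise`. The call is advisory, and pages that are still dirty may stay cached.

```python
import os
import time


def evict_old_files(directory: str, min_age_seconds: float) -> list:
    """Advise the OS to drop cached pages of files not modified recently.

    Returns the sorted names of the files the advice was issued for.
    Hypothetical helper for illustration only.
    """
    advised = []
    now = time.time()
    for entry in os.scandir(directory):
        if not entry.is_file(follow_symlinks=False):
            continue
        if now - entry.stat().st_mtime < min_age_seconds:
            continue  # recently written, likely still in use
        fd = os.open(entry.path, os.O_RDONLY)
        try:
            # offset=0 and length=0 cover the whole file.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
            advised.append(entry.name)
        finally:
            os.close(fd)
    return sorted(advised)
```

Run once a minute against the WAL directory, this would have roughly the effect described, at the cost of re-reading from disk anything that is later archived or streamed.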
On Tue, Sep 9, 2014 at 9:07 PM, didier <did447@gmail.com> wrote:
> Hi
>
>> Well, I'd like to hear someone from the field complaining that
>> pg_receivexlog is thrashing the cache and thus reducing the performance of
>> some other process. Or at least a synthetic test case that demonstrates that
>> happening.
> It's not with pg_receivexlog but it's related.
>
> On a small box without replication server connected perfs were good
> enough but not so with a replication server connected, there was 1GB
> worth of WAL sitting in RAM vs next to nothing without slave!

After a WAL file is filled up and closed, it will not be re-read if wal_level is set to minimal (i.e., neither archiving nor replication is enabled). So, in this case, PostgreSQL advises the OS to release any cached pages of that WAL file. But it does not do so if archiving or replication is enabled, and then the WAL file stays cached even after it's closed. I guess this is probably the cause of what you observed.

Regards,

--
Fujii Masao
On Tue, Sep 9, 2014 at 8:07 AM, didier <did447@gmail.com> wrote:
>> Well, I'd like to hear someone from the field complaining that
>> pg_receivexlog is thrashing the cache and thus reducing the performance of
>> some other process. Or at least a synthetic test case that demonstrates that
>> happening.
> It's not with pg_receivexlog but it's related.
>
> On a small box without replication server connected perfs were good
> enough but not so with a replication server connected, there was 1GB
> worth of WAL sitting in RAM vs next to nothing without slave!
> setup:
> 8GB RAM
> 2GB shared_buffers (smaller has other issues)
> checkpoint_segments 40 (smaller value trigger too much xlog checkpoint)
> checkpoints spread over 10 mn and write 30 to 50% of shared buffers.
> live data set fit in RAM.
> constant load.
>
> On startup (1 or 2/hour) applications were running requests on cold
> data which were now saturating IO.
> I'm not sure it's an OS bug as the WAL were 'hotter' than the cold data.
>
> A cron task every minute with vmtouch -e for evicting old WAL files
> from memory has solved the issue.

That seems like pretty good evidence that it might be worth doing something here. But I still think maybe it should be optional, because if the user plans to reread those files and, say, copy them somewhere else, then they won't want this behavior.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company