Re: Streaming base backups - Mailing list pgsql-hackers

From Stefan Kaltenbrunner
Subject Re: Streaming base backups
Msg-id 4D2B709E.4000507@kaltenbrunner.cc
In response to Re: Streaming base backups  (Cédric Villemain <cedric.villemain.debian@gmail.com>)
Responses Re: Streaming base backups  (Cédric Villemain <cedric.villemain.debian@gmail.com>)
List pgsql-hackers
On 01/10/2011 08:13 PM, Cédric Villemain wrote:
> 2011/1/10 Magnus Hagander<magnus@hagander.net>:
>> On Sun, Jan 9, 2011 at 23:33, Cédric Villemain
>> <cedric.villemain.debian@gmail.com>  wrote:
>>> 2011/1/7 Magnus Hagander<magnus@hagander.net>:
>>>> On Fri, Jan 7, 2011 at 01:47, Cédric Villemain
>>>> <cedric.villemain.debian@gmail.com>  wrote:
>>>>> 2011/1/5 Magnus Hagander<magnus@hagander.net>:
>>>>>> On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine<dimitri@2ndquadrant.fr>  wrote:
>>>>>>> Magnus Hagander<magnus@hagander.net>  writes:
>>>>>>>> * Stefan mentioned it might be useful to put some
>>>>>>>> posix_fadvise(POSIX_FADV_DONTNEED)
>>>>>>>>    in the process that streams all the files out. Seems useful, as long as that
>>>>>>>>    doesn't kick them out of the cache *completely*, for other backends as well.
>>>>>>>>    Do we know if that is the case?
>>>>>>>
>>>>>>> Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
>>>>>>> not already in SHM?
>>>>>>
>>>>>> I think that's way more complex than we want to go here.
>>>>>>
>>>>>
>>>>> DONTNEED will remove the block from the OS buffer cache every time.
>>>>
>>>> Then we definitely don't want to use it - because some other backend
>>>> might well want the file. Better leave it up to the standard logic in
>>>> the kernel.
>>>
>>> Looking at the patch, it is (very) easy to add the support for that in
>>> basebackup.c
>>> That supposes allowing mincore(), so mmap(), and so probably switching
>>> the fopen() to an open() (or adding an open() just for the mmap
>>> requirement...)
>>>
>>> Let's go ?
>>
>> Per above, I still don't think we *should* do this. We don't want to
>> kick things out of the cache underneath other backends, and since we
>
> we are dropping stuff underneath other backends  anyway but I
> understand your point.
>
>> can't control that. Either way, it shouldn't happen in the beginning,
>> and if it does, should be backed with proper benchmarks.
>
> I agree.

Well, I want to point out that the link I provided upthread actually
describes a (Linux-centric) way to get the property of interest here:

* if the data blocks are already in the OS buffer cache, just leave them
alone; if they are NOT, tell the OS that "this current user" is not
interested in having them there
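
A rough sketch of how that could look on Linux, assuming the usual mincore() plus posix_fadvise() approach. The helper names snapshot_residency() and drop_pages_we_pulled_in() are made up for illustration and come from neither the patch nor the linked code. The idea is to record which pages are resident before reading the file, stream it, and afterwards advise DONTNEED only for the pages that were not cached beforehand:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes mincore() and posix_fadvise() on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: record which pages of fd are resident in the OS
 * cache *before* we read the file. Returns a malloc'd vector with one
 * byte per page (bit 0 set means resident), or NULL on error or for an
 * empty file. mmap() alone does not fault pages in, so taking the
 * snapshot should not disturb the cache.
 */
static unsigned char *
snapshot_residency(int fd, off_t size, size_t *npages)
{
    long        page_size = sysconf(_SC_PAGESIZE);
    unsigned char *vec;
    void       *map;
    int         rc;

    assert(page_size > 0);
    if (size <= 0)
        return NULL;

    *npages = ((size_t) size + (size_t) page_size - 1) / (size_t) page_size;
    vec = malloc(*npages);
    if (vec == NULL)
        return NULL;

    map = mmap(NULL, (size_t) size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
    {
        free(vec);
        return NULL;
    }
    rc = mincore(map, (size_t) size, vec);
    munmap(map, (size_t) size);
    if (rc != 0)
    {
        free(vec);
        return NULL;
    }
    return vec;
}

/*
 * Hypothetical helper: after the file has been streamed, advise DONTNEED
 * only for runs of pages that were NOT resident in the snapshot, leaving
 * pages other processes had already cached alone. Returns the number of
 * pages advised, or -1 if posix_fadvise() reports an error.
 */
static long
drop_pages_we_pulled_in(int fd, const unsigned char *vec, size_t npages)
{
    long        page_size = sysconf(_SC_PAGESIZE);
    long        dropped = 0;
    size_t      i = 0;

    while (i < npages)
    {
        size_t      start;

        if (vec[i] & 1)         /* was cached before: leave it alone */
        {
            i++;
            continue;
        }
        start = i;
        while (i < npages && !(vec[i] & 1))
            i++;
        if (posix_fadvise(fd, (off_t) start * page_size,
                          (off_t) (i - start) * page_size,
                          POSIX_FADV_DONTNEED) != 0)
            return -1;
        dropped += (long) (i - start);
    }
    return dropped;
}
```

The caller would take the snapshot right after open(), read and send the file as usual, then call drop_pages_we_pulled_in() before close(). Note that a page some other backend faults in while we are streaming would still be dropped, so this is best effort only, and DONTNEED is merely a hint to the kernel.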

I would like to see something like that implemented in the backend
at some point, maybe even as a GUC of some sort. That way we could
also use it for, say, a pg_dump run. I have seen the
response times of big boxes tank not because of the CPU and lock load
pg_dump imposes, but because of the way it can cause the
OS buffer cache to get polluted with not-really-important data.



Anyway, I agree that the (positive and/or negative) effect of something
like that needs to be measured, but this effect is not too easy to see in
very simple setups...


Stefan

