Re: Streaming base backups - Mailing list pgsql-hackers

From Cédric Villemain
Subject Re: Streaming base backups
Date
Msg-id AANLkTi=xtpjd2_Wd4C+YBK3vJxPqYFAgdy3f_v1ygybN@mail.gmail.com
In response to Re: Streaming base backups  (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
List pgsql-hackers
2011/1/10 Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>:
> On 01/10/2011 08:13 PM, Cédric Villemain wrote:
>>
>> 2011/1/10 Magnus Hagander<magnus@hagander.net>:
>>>
>>> On Sun, Jan 9, 2011 at 23:33, Cédric Villemain
>>> <cedric.villemain.debian@gmail.com>  wrote:
>>>>
>>>> 2011/1/7 Magnus Hagander<magnus@hagander.net>:
>>>>>
>>>>> On Fri, Jan 7, 2011 at 01:47, Cédric Villemain
>>>>> <cedric.villemain.debian@gmail.com>  wrote:
>>>>>>
>>>>>> 2011/1/5 Magnus Hagander<magnus@hagander.net>:
>>>>>>>
>>>>>>> On Wed, Jan 5, 2011 at 22:58, Dimitri
>>>>>>> Fontaine<dimitri@2ndquadrant.fr>  wrote:
>>>>>>>>
>>>>>>>> Magnus Hagander<magnus@hagander.net>  writes:
>>>>>>>>>
>>>>>>>>> * Stefan mentioned it might be useful to put some
>>>>>>>>> posix_fadvise(POSIX_FADV_DONTNEED)
>>>>>>>>>   in the process that streams all the files out. Seems useful, as
>>>>>>>>> long as that
>>>>>>>>>   doesn't kick them out of the cache *completely*, for other
>>>>>>>>> backends as well.
>>>>>>>>>   Do we know if that is the case?
>>>>>>>>
>>>>>>>> Maybe have a look at pgfincore to only tag DONTNEED for blocks that
>>>>>>>> are
>>>>>>>> not already in SHM?
>>>>>>>
>>>>>>> I think that's way more complex than we want to go here.
>>>>>>>
>>>>>>
>>>>>> DONTNEED will remove the block from OS buffer every time.
>>>>>
>>>>> Then we definitely don't want to use it - because some other backend
>>>>> might well want the file. Better leave it up to the standard logic in
>>>>> the kernel.
>>>>
>>>> Looking at the patch, it is (very) easy to add the support for that in
>>>> basebackup.c
>>>> That supposes allowing mincore(), so mmap(), and so probably switching
>>>> the fopen() to an open() (or adding an open() just for the mmap
>>>> requirement...)
>>>>
>>>> Let's go ?
>>>
>>> Per above, I still don't think we *should* do this. We don't want to
>>> kick things out of the cache underneath other backends, and since we
>>
>> we are dropping stuff underneath other backends  anyway but I
>> understand your point.
>>
>>> can't control that. Either way, it shouldn't happen in the beginning,
>>> and if it does, should be backed with proper benchmarks.
>>
>> I agree.
>
> well I want to point out that the link I provided upthread actually provides
> a (Linux-centric) way to get the property of interest for this:

Yes, that is exactly what we are talking about here:
mincore() and posix_fadvise().

FreeBSD should allow that later; at least it is on the TODO list.
Windows may allow that too, with a different API.
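
To make the mincore() half concrete, here is a minimal, Linux-only sketch of how the residency of a file's pages could be probed. It is not code from the patch or from pgfincore; the names pages_for_size() and residency_snapshot() are hypothetical and chosen only for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc: exposes mincore() */
#endif
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Number of pages needed to cover `size` bytes. */
size_t pages_for_size(size_t size, size_t page_size)
{
    return (size + page_size - 1) / page_size;
}

/*
 * Hypothetical helper (an assumption, not taken from the patch): allocate
 * one byte per page of the file behind `fd` and let mincore() set bit 0 of
 * each byte whose page is currently in the OS cache.
 * Returns the page count, 0 for an empty file, or -1 on error.
 * The caller frees *vec_out.
 */
long residency_snapshot(int fd, unsigned char **vec_out)
{
    struct stat    st;
    size_t         len, npages;
    void          *map;
    unsigned char *vec;

    *vec_out = NULL;
    if (fstat(fd, &st) != 0)
        return -1;
    if (st.st_size == 0)
        return 0;

    len = (size_t) st.st_size;
    npages = pages_for_size(len, (size_t) sysconf(_SC_PAGESIZE));

    /* mincore() works on a mapping, hence the mmap() of the whole file. */
    map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    vec = malloc(npages);
    if (vec == NULL || mincore(map, len, vec) != 0)
    {
        free(vec);
        munmap(map, len);
        return -1;
    }

    munmap(map, len);
    *vec_out = vec;
    return (long) npages;
}
```

Mapping the file does not by itself read any page in, so the probe should not disturb the state it is measuring. The signature of mincore() differs slightly between platforms, which is one reason this sketch is labeled Linux-only.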

>
> * if the datablocks are in the OS buffercache just leave them alone, if they
> are NOT tell the OS that "this current user" is not interested in having it
> there

My experience is that posix_fadvise() on a specific block behaves more
brutally than flagging a whole file. In the latter case it may not do
what you want if the kernel judges the hint unwelcome (because of other
I/O requests).
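
For reference, the two call shapes being compared look roughly like this. This is only a sketch: drop_block() and drop_whole_file() are hypothetical names, and block_size stands in for whatever block size the caller uses (BLCKSZ in PostgreSQL). It shows how the calls are made, not how a given kernel reacts to them.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hint that a single block-sized range is no longer needed.
 * posix_fadvise() returns 0 on success and an error number on failure
 * (it does not set errno).
 */
int drop_block(int fd, off_t block_no, off_t block_size)
{
    return posix_fadvise(fd, block_no * block_size, block_size,
                         POSIX_FADV_DONTNEED);
}

/*
 * Hint for the whole file: a length of 0 means "from offset to end of file".
 */
int drop_whole_file(int fd)
{
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

Both calls are advisory, so any difference in how aggressively pages are evicted comes from the kernel and would have to be measured rather than assumed.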

What Magnus points out is that other backends execute queries and
request blocks (loading them into PostgreSQL's shared buffers), and it
is *hard* to be sure we don't remove blocks just loaded by another
backend (the worst case being flushing prefetched blocks that are not
yet in shared buffers, cf. effective_io_concurrency).

>
> I would like to see something like that implemented in the backend sometime
> and maybe even as a guc of some sort, that way we actually could use that
> for say a pg_dump run as well, I have seen the response times of big boxes
> tank not because of the CPU and lock-load pg_dump imposes but because of the
> way that it can cause the OS-buffercache to get spoiled with
> not-really-important data.

Glad to hear that; pgfincore is also a POC on those topics.
The best solution would be to mmap in postgres, but that is not possible,
so we have to take a snapshot of the objects and restore them afterwards
(again, *that is* what Tobias does with his rsync). Side note: because of
readahead, inspecting block by block while you read the file gives bad
results (or you need to fadvise POSIX_FADV_RANDOM to disable readahead,
which is not good at all).

>
> anyway I agree that the (positive and/or negative) effect of something like
> that needs to be measured but this effect is not too easy to see in very
> simple setups...

Yes. And with pgbase_backup, copying 1GB over the network takes longer
than 2 seconds, so we will probably need a specific strategy.


--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

