Re: Streaming base backups - Mailing list pgsql-hackers

From Garick Hamlin
Subject Re: Streaming base backups
Date
Msg-id 20110107154746.GB10074@isc.upenn.edu
In response to Re: Streaming base backups  (Garick Hamlin <ghamlin@isc.upenn.edu>)
List pgsql-hackers
On Fri, Jan 07, 2011 at 10:26:29AM -0500, Garick Hamlin wrote:
> On Thu, Jan 06, 2011 at 07:47:39PM -0500, Cédric Villemain wrote:
> > 2011/1/5 Magnus Hagander <magnus@hagander.net>:
> > > On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> > >> Magnus Hagander <magnus@hagander.net> writes:
> > >>> * Stefan mentioned it might be useful to put some
> > >>> posix_fadvise(POSIX_FADV_DONTNEED)
> > >>>   in the process that streams all the files out. Seems useful, as long as that
> > >>>   doesn't kick them out of the cache *completely*, for other backends as well.
> > >>>   Do we know if that is the case?
> > >>
> > >> Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
> > >> not already in SHM?
> > >
> > > I think that's way more complex than we want to go here.
> > >
> > 
> > DONTNEED will remove the block from the OS buffer every time.
> > 
> > It should not be that hard to implement a snapshot (it needs mincore())
> > and to restore previous state. I don't know how basebackup is
> > performed exactly...so perhaps I am wrong.
> > 
> > posix_fadvise support is already in postgresql core...we can start by
> > just doing a snapshot of the files before starting, or at some point
> > in the basebackup, it will need only 256kB per GB of data...
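
As a rough check of that figure: mincore() fills in one byte per page, so the 256kB per GB number works out if pages are 4 KiB. The page size is an assumption here (it is not stated above), and snapshot_bytes is just a name made up for this sketch. It also shows what a packed one-bit-per-page map would take:

```python
# Assumption: 4 KiB pages; the thread does not state the page size.
PAGE_SIZE = 4096
GIB = 1024 ** 3

def snapshot_bytes(data_bytes, bits_per_page=8):
    """Size in bytes of a residency map covering data_bytes of files.

    mincore() fills one byte per page (bits_per_page=8); packing the
    residency flag into a bitmap needs only one bit per page.
    """
    pages = -(-data_bytes // PAGE_SIZE)      # ceiling division
    return -(-pages * bits_per_page // 8)    # ceiling division again

print(snapshot_bytes(GIB) // 1024, "kB per GB, one byte per page")
print(snapshot_bytes(GIB, bits_per_page=1) // 1024, "kB per GB, one bit per page")
```
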
> 
> It is actually possible to be more scalable than the simple solution you
> outline here (although that solution works pretty well).  
> 
> I've written a program that synchronizes the OS cache state using
> mmap()/mincore() between two computers.  I haven't actually tested its
> impact on performance yet, but I was surprised by how fast it actually runs
> and how compact cache maps can be.
> 
> If one encodes the data so one remembers the number of zeros between 1s 
> one, storage scale by the amount of memory in each size rather than the 

Sorry for the typos, that should read:

the storage scales by the number of pages resident in memory rather than the 
total dataset size.

> dataset size.  I actually played with doing that, then doing huffman 
> encoding of that.  I get around 1.2-1.3 bits / page of _physical memory_ 
> on my tests.
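
A minimal sketch of the gap encoding described above, assuming the cache map is available as a list of 0/1 flags, one per page. This is an illustration only, not the program mentioned in this mail; encode_gaps and decode_gaps are hypothetical names. It emits one symbol per resident page, which is why the storage follows the number of resident pages rather than the dataset size:

```python
def encode_gaps(resident):
    """Encode a per-page residency map as zero-run lengths.

    Returns (gaps, tail): gaps[i] is the number of non-resident pages
    immediately before the i-th resident page, and tail is the number
    of non-resident pages after the last resident one.
    """
    gaps, run = [], 0
    for bit in resident:
        if bit:
            gaps.append(run)
            run = 0
        else:
            run += 1
    return gaps, run

def decode_gaps(gaps, tail):
    """Rebuild the 0/1 residency map from the output of encode_gaps()."""
    out = []
    for g in gaps:
        out.extend([0] * g)
        out.append(1)
    out.extend([0] * tail)
    return out

cache_map = [0, 0, 1, 1, 0, 0, 0, 1, 0]
gaps, tail = encode_gaps(cache_map)
print(gaps, tail)  # [2, 0, 3] 1
assert decode_gaps(gaps, tail) == cache_map
```
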
> 
> I don't have my notes handy, but here are some numbers from memory...
> 
> The obvious worst cases are 1 bit per page of _dataset_ or 19 bits per page
> of physical memory in the machine.  The latter limit gets better, however,
> since there are < 1024 symbols possible for the encoder (since in this
> case symbols are spans of zeros that need to fit in a file that is 1 GB in
> size).  So the real worst case is actually much closer to 1 bit per page of
> the dataset or ~10 bits per page of physical memory.  The real performance
> I see with huffman is more like 1.3 bits per page of physical memory.  All the
> encoding and decoding is actually very fast.  zlib would actually compress even
> better than huffman, but the huffman encoder/decoder is actually pretty good and
> very straightforward code.
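
To illustrate how the bits-per-page figure can be measured, here is a small sketch that builds Huffman code lengths over a list of gap symbols and averages them per resident page. It is not the proof-of-concept code mentioned in this mail; the function names and the sample gap list are made up for illustration, and the cost of storing the code table is ignored:

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries are (weight, tiebreak, {symbol: depth}); the unique
    # tiebreak keeps Python from ever comparing the dicts.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level down.
        merged = {s: d + 1 for s, d in a.items()}
        merged.update({s: d + 1 for s, d in b.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def bits_per_resident_page(gaps):
    """Average Huffman-coded size of one gap symbol, i.e. one resident page."""
    lengths = huffman_code_lengths(gaps)
    return sum(lengths[g] for g in gaps) / len(gaps)

# Made-up input: mostly contiguous resident pages (gap 0), a few holes.
sample_gaps = [0] * 90 + [1] * 6 + [7] * 3 + [300]
print(bits_per_resident_page(sample_gaps))
```

When most resident pages are contiguous, the zero-length gap dominates and gets a one-bit code, so the average stays close to 1 bit per resident page.
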
> 
> I would like to integrate something like this into PG or perhaps even into
> something like rsync, but it was written as a proof of concept and I haven't
> had time to work on it recently.
> 
> Garick
> 
> > -- 
> > Cédric Villemain               2ndQuadrant
> > http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
> > 
> > -- 
> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-hackers
> 

