Re: Streaming base backups - Mailing list pgsql-hackers

From Garick Hamlin
Subject Re: Streaming base backups
Date
Msg-id 20110107152629.GA10074@isc.upenn.edu
In response to Re: Streaming base backups  (Cédric Villemain <cedric.villemain.debian@gmail.com>)
Responses Re: Streaming base backups  (Garick Hamlin <ghamlin@isc.upenn.edu>)
Re: Streaming base backups  (Cédric Villemain <cedric.villemain.debian@gmail.com>)
List pgsql-hackers
On Thu, Jan 06, 2011 at 07:47:39PM -0500, Cédric Villemain wrote:
> 2011/1/5 Magnus Hagander <magnus@hagander.net>:
> > On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> >> Magnus Hagander <magnus@hagander.net> writes:
> >>> * Stefan mentiond it might be useful to put some
> >>> posix_fadvise(POSIX_FADV_DONTNEED)
> >>>   in the process that streams all the files out. Seems useful, as long as that
> >>>   doesn't kick them out of the cache *completely*, for other backends as well.
> >>>   Do we know if that is the case?
> >>
> >> Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
> >> not already in SHM?
> >
> > I think that's way more complex than we want to go here.
> >
> 
> DONTNEED will remove the block from the OS buffer every time.
> 
> It should not be that hard to implement a snapshot (it needs mincore())
> and to restore the previous state. I don't know how basebackup is
> performed exactly... so perhaps I am wrong.
> 
> posix_fadvise support is already in postgresql core... we can start by
> just doing a snapshot of the files before starting, or at some point
> in the basebackup; it will need only 256 kB per GB of data...
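
For reference, the "restore the previous state" half of that would just
be a posix_fadvise() pass driven by the saved mincore() map.  A rough
sketch only, untested, with made-up names and no batching of contiguous
runs:

#include <fcntl.h>
#include <sys/types.h>

/*
 * Sketch: push the OS cache for one file back toward a previously
 * saved mincore() map (one byte per page, bit 0 set = was resident).
 * A real version would batch contiguous runs instead of issuing one
 * posix_fadvise() call per page.
 */
static void
cache_restore(int fd, const unsigned char *map, size_t npages, long pagesize)
{
    size_t      i;

    for (i = 0; i < npages; i++)
    {
        off_t       off = (off_t) i * pagesize;

        if (map[i] & 1)
            posix_fadvise(fd, off, pagesize, POSIX_FADV_WILLNEED);
        else
            posix_fadvise(fd, off, pagesize, POSIX_FADV_DONTNEED);
    }
}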

It is actually possible to be more scalable than the simple solution you
outline here (although that solution works pretty well).  

I've written a program that synchronizes the OS cache state between two
computers using mmap()/mincore().  I haven't actually tested its impact
on performance yet, but I was surprised by how fast it runs and how
compact the cache maps can be.
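
The snapshot side is essentially just mmap() + mincore().  A minimal
sketch of that part (illustrative names, not the actual program, most
error handling trimmed):

#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Sketch: return a one-byte-per-page residency map for a file, as
 * filled in by mincore() (bit 0 of each byte set means "in core").
 * This is the raw 256 kB-per-GB form; packing/encoding comes later.
 */
static unsigned char *
cache_snapshot(const char *path, size_t *npages_out)
{
    int         fd;
    struct stat st;
    long        pagesize = sysconf(_SC_PAGESIZE);
    size_t      npages;
    void       *map;
    unsigned char *vec = NULL;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size == 0)
    {
        close(fd);
        return NULL;
    }

    npages = (st.st_size + pagesize - 1) / pagesize;

    map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map != MAP_FAILED)
    {
        vec = malloc(npages);
        if (vec != NULL && mincore(map, st.st_size, vec) != 0)
        {
            free(vec);
            vec = NULL;
        }
        munmap(map, st.st_size);
    }
    close(fd);

    *npages_out = npages;
    return vec;
}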

If one encodes the data by remembering the number of zeros between 1s,
storage scales with the amount of memory on each side rather than with
the dataset size.  I actually played with doing that, then doing Huffman
encoding of that.  I get around 1.2-1.3 bits per page of _physical
memory_ in my tests.

I don't have my notes handy, but here are some numbers from memory...

The obvious worst cases are 1 bit per page of _dataset_ or 19 bits per
page of physical memory in the machine.  The latter limit gets better,
however, since there are < 1024 symbols possible for the encoder (since
in this case symbols are spans of zeros that need to fit in a file that
is 1 GB in size).  So the real worst case is actually much closer to
1 bit per page of the dataset or ~10 bits per page of physical memory.
The real performance I see with Huffman is more like 1.3 bits per page
of physical memory.  All the encoding/decoding is actually very fast.
zlib would actually compress even better than Huffman, but the Huffman
encoder/decoder is pretty good and very straightforward code.
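
The run-length step itself is nothing special.  Roughly (a sketch only,
taking the one-byte-per-page mincore() map as input, with the Huffman
coding of the gap values left out):

#include <stddef.h>
#include <stdint.h>

/*
 * Sketch: turn a one-byte-per-page residency map into a list of gaps,
 * i.e. for each resident page, the number of non-resident pages since
 * the previous resident one.  The output has one entry per *resident*
 * page, so it scales with physical memory, not with the dataset; the
 * gap values are what then get Huffman (or zlib) encoded.
 */
static size_t
residency_to_gaps(const unsigned char *vec, size_t npages, uint32_t *gaps)
{
    size_t      i;
    size_t      ngaps = 0;
    uint32_t    gap = 0;

    for (i = 0; i < npages; i++)
    {
        if (vec[i] & 1)
        {
            gaps[ngaps++] = gap;    /* zeros seen since the last 1 */
            gap = 0;
        }
        else
            gap++;
    }
    return ngaps;
}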

I would like to integrate something like this into PG, or perhaps even
into something like rsync, but it was written as a proof of concept and
I haven't had time to work on it recently.

Garick

> -- 
> Cédric Villemain               2ndQuadrant
> http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

