Re: finding changed blocks using WAL scanning - Mailing list pgsql-hackers

From: Bruce Momjian
Subject: Re: finding changed blocks using WAL scanning
Date:
Msg-id: 20190415203114.pb4e2vgbtbhopcdw@momjian.us
In response to: Re: finding changed blocks using WAL scanning  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: finding changed blocks using WAL scanning
List: pgsql-hackers
On Wed, Apr 10, 2019 at 08:11:11PM -0400, Robert Haas wrote:
> On Wed, Apr 10, 2019 at 5:49 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > There is one thing that does worry me about the file-per-LSN-range
> > approach, and that is memory consumption when trying to consume the
> > information.  Suppose you have a really high velocity system.  I don't
> > know exactly what the busiest systems around are doing in terms of
> > data churn these days, but let's say just for kicks that we are
> > dirtying 100GB/hour.  That means, roughly 12.5 million block
> > references per hour.  If each block reference takes 12 bytes, that's
> > maybe 150MB/hour in block reference files.  If you run a daily
> > incremental backup, you've got to load all the block references for
> > the last 24 hours and deduplicate them, which means you're going to
> > need about 3.6GB of memory.  If you run a weekly incremental backup,
> > you're going to need about 25GB of memory.  That is not ideal.  One
> > can keep the memory consumption to a more reasonable level by using
> > temporary files.  For instance, say you realize you're going to need
> > 25GB of memory to store all the block references you have, but you
> > only have 1GB of memory that you're allowed to use.  Well, just
> > hash-partition the data 32 ways by dboid/tsoid/relfilenode/segno,
> > writing each batch to a separate temporary file, and then process each
> > of those 32 files separately.  That does add some additional I/O, but
> > it's not crazily complicated and doesn't seem too terrible, at least
> > to me.  Still, it's something not to like.
> 
> Oh, I'm being dumb.  We should just have the process that writes out
> these files sort the records first.  Then when we read them back in to
> use them, we can just do a merge pass like MergeAppend would do.  Then
> you never need very much memory at all.
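
To make that merge pass concrete, here is a rough Python sketch of
deduplicating several pre-sorted block-reference files in one pass; it
assumes each file holds one "dboid tsoid relfilenode blockno" reference
per line, already sorted on that key when written out, which is not a
format anyone has settled on:

    import heapq

    def read_refs(path):
        # One block reference per line: "dboid tsoid relfilenode blockno",
        # assumed already sorted on that four-column key.
        with open(path) as f:
            for line in f:
                yield tuple(int(x) for x in line.split())

    def merge_dedup(paths):
        # Merge the pre-sorted files and drop adjacent duplicates; memory
        # use is bounded by one pending record per input file, not by the
        # total number of block references.
        prev = None
        for ref in heapq.merge(*(read_refs(p) for p in paths)):
            if ref != prev:
                yield ref
                prev = ref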

Can I throw out a simple idea?  What if, when we finish writing a WAL
file, we create a new file 000000010000000000000001.modblock which
lists all the heap/index files and block numbers modified in that WAL
file?  How much does that help with the list I posted earlier?

    I think there is some interesting complexity brought up in this thread.
    Which options are going to minimize storage I/O and network I/O, impose
    only background overhead, allow parallel operation, and integrate with
    pg_basebackup?  Eventually we will need to evaluate the incremental
    backup options against these criteria.

I am thinking tools could retain modblock files along with the WAL, and
could pull full-page writes either from the WAL or from PGDATA.  This
avoids the need to scan 16MB WAL files just to find the modified blocks,
and the WAL files and modblock files could be expired independently.
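
Just to illustrate, a minimal sketch of what writing such a per-segment
modblock file could look like, using a made-up fixed-width record of
(dboid, tsoid, relfilenode, forknum, blockno); nothing about the actual
format is decided here:

    import struct

    # Hypothetical layout: five unsigned 32-bit ints per modified block.
    MODBLOCK_REC = struct.Struct("!IIIII")

    def write_modblock(segment_name, block_refs, directory="."):
        # block_refs: (dboid, tsoid, relfilenode, forknum, blockno) tuples
        # collected while the WAL segment was written.  Sort and dedup here
        # so consumers can merge many modblock files cheaply later.
        path = "%s/%s.modblock" % (directory, segment_name)
        with open(path, "wb") as f:
            for ref in sorted(set(block_refs)):
                f.write(MODBLOCK_REC.pack(*ref))
        return path

A backup tool could then merge the modblock files covering its LSN range
instead of decoding the WAL segments themselves.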

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


