Re: finding changed blocks using WAL scanning - Mailing list pgsql-hackers
From: Bruce Momjian
Subject: Re: finding changed blocks using WAL scanning
Date:
Msg-id: 20190415203114.pb4e2vgbtbhopcdw@momjian.us
In response to: Re: finding changed blocks using WAL scanning (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: finding changed blocks using WAL scanning
List: pgsql-hackers
On Wed, Apr 10, 2019 at 08:11:11PM -0400, Robert Haas wrote:
> On Wed, Apr 10, 2019 at 5:49 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > There is one thing that does worry me about the file-per-LSN-range
> > approach, and that is memory consumption when trying to consume the
> > information.  Suppose you have a really high velocity system.  I don't
> > know exactly what the busiest systems around are doing in terms of
> > data churn these days, but let's say just for kicks that we are
> > dirtying 100GB/hour.  That means, roughly, 12.5 million block
> > references per hour.  If each block reference takes 12 bytes, that's
> > maybe 150MB/hour in block reference files.  If you run a daily
> > incremental backup, you've got to load all the block references for
> > the last 24 hours and deduplicate them, which means you're going to
> > need about 3.6GB of memory.  If you run a weekly incremental backup,
> > you're going to need about 25GB of memory.  That is not ideal.  One
> > can keep the memory consumption to a more reasonable level by using
> > temporary files.  For instance, say you realize you're going to need
> > 25GB of memory to store all the block references you have, but you
> > only have 1GB of memory that you're allowed to use.  Well, just
> > hash-partition the data 32 ways by dboid/tsoid/relfilenode/segno,
> > writing each batch to a separate temporary file, and then processing
> > each of those 32 files separately.  That does add some additional
> > I/O, but it's not crazily complicated and doesn't seem too terrible,
> > at least to me.  Still, it's something not to like.
>
> Oh, I'm being dumb.  We should just have the process that writes out
> these files sort the records first.  Then when we read them back in to
> use them, we can just do a merge pass like MergeAppend would do.  Then
> you never need very much memory at all.

Can I throw out a simple idea?  What if, when we finish writing a WAL
file, we create a new file 000000010000000000000001.modblock which
lists all the heap/index files and block numbers modified in that WAL
file?  How much does that help with the list I posted earlier?

I think there is some interesting complexity brought up in this thread:
which options minimize storage I/O and network I/O, impose only
background overhead, allow parallel operation, and integrate with
pg_basebackup.  Eventually we will need to evaluate the incremental
backup options against these criteria.

I am thinking tools could retain modblock files along with the WAL, and
could pull full-page writes from the WAL or from PGDATA.  This approach
avoids the need to scan 16MB WAL files, and the WAL files and modblock
files could be expired independently.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
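For illustration, here is a minimal sketch of how an external tool might
consume per-segment .modblock files in the sort-then-merge style suggested
above.  None of the file layout, record width, or function names below come
from the thread; it simply assumes each .modblock file stores fixed-width
block references (dboid, tsoid, relfilenode, segno, blockno) in sorted order,
so deduplication across any number of segments needs memory proportional to
the number of input files rather than the number of references.

    # Hedged sketch only -- hypothetical .modblock format, not PostgreSQL code.
    import glob
    import heapq
    import struct

    # Assumed record layout: five 32-bit unsigned ints, network byte order.
    RECORD = struct.Struct("!IIIII")

    def read_block_refs(path):
        """Yield (dboid, tsoid, relfilenode, segno, blockno) tuples from one
        sorted .modblock file."""
        with open(path, "rb") as f:
            while True:
                raw = f.read(RECORD.size)
                if len(raw) < RECORD.size:
                    break
                yield RECORD.unpack(raw)

    def merged_block_refs(paths):
        """K-way merge of many sorted .modblock files, dropping duplicates,
        holding only one record per input file in memory at a time."""
        last = None
        for ref in heapq.merge(*(read_block_refs(p) for p in paths)):
            if ref != last:
                yield ref
                last = ref

    if __name__ == "__main__":
        # e.g. every modblock file retained since the last full backup
        files = sorted(glob.glob("archive/*.modblock"))
        for ref in merged_block_refs(files):
            print(*ref)

Because the inputs are already sorted, the merge never materializes the full
set of block references, which is the point of writing the records sorted in
the first place.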