Re: finding changed blocks using WAL scanning - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: finding changed blocks using WAL scanning |
Date | |
Msg-id | 20190422234445.s7mxt6xwfmumzlge@momjian.us |
In response to | Re: finding changed blocks using WAL scanning (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: finding changed blocks using WAL scanning
|
List | pgsql-hackers |
On Tue, Apr 23, 2019 at 01:21:27AM +0200, Tomas Vondra wrote:
> On Sat, Apr 20, 2019 at 04:21:52PM -0400, Robert Haas wrote:
> > On Sat, Apr 20, 2019 at 12:42 AM Stephen Frost <sfrost@snowman.net> wrote:
> > > > Oh. Well, I already explained my algorithm for doing that upthread,
> > > > which I believe would be quite cheap.
> > > >
> > > > 1. When you generate the .modblock files, stick all the block
> > > > references into a buffer. qsort(). Dedup. Write out in sorted
> > > > order.
> > >
> > > Having all of the block references in a sorted order does seem like it
> > > would help, but would also make those potentially quite a bit larger
> > > than necessary (I had some thoughts about making them smaller elsewhere
> > > in this discussion). That might be worth it though. I suppose it might
> > > also be possible to line up the bitmaps suggested elsewhere to do
> > > essentially a BitmapOr of them to identify the blocks changed (while
> > > effectively de-duping at the same time).
> >
> > I don't see why this would make them bigger than necessary. If you
> > sort by relfilenode/fork/blocknumber and dedup, then references to
> > nearby blocks will be adjacent in the file. You can then decide what
> > format will represent that most efficiently on output. Whether or not
> > a bitmap is better idea than a list of block numbers or something else
> > depends on what percentage of blocks are modified and how clustered
> > they are.
> >
>
> Not sure I understand correctly - do you suggest to deduplicate and sort
> the data before writing them into the .modblock files? Because that the
> the sorting would make this information mostly useless for the recovery
> prefetching use case I mentioned elsewhere. For that to work we need
> information about both the LSN and block, in the LSN order.
>
> So if we want to allow that use case to leverage this infrastructure, we
> need to write the .modfiles kinda "raw" and do this processing in some
> later step.
>
> Now, maybe the incremental backup use case is so much more important the
> right thing to do is ignore this other use case, and I'm OK with that -
> as long as it's a conscious choice.

I think the concern is that the more granular the modblock files are
(with less de-duping), the larger they will be.

--
  Bruce Momjian <bruce@momjian.us> http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
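
For readers following the thread, the sort-and-dedup step quoted above
("stick all the block references into a buffer. qsort(). Dedup.") could
look roughly like the C sketch below. This is only an illustration, not
code from any patch: the BlockRef struct, its field names and widths,
and the sort_and_dedup() function are all hypothetical, and the sort key
simply follows the relfilenode/fork/blocknumber order mentioned in the
quoted text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical block reference; names and field widths are assumptions. */
typedef struct BlockRef
{
    uint32_t relfilenode;
    uint32_t fork;
    uint32_t blkno;
} BlockRef;

/* Order by relfilenode, then fork, then block number. */
int
blockref_cmp(const void *a, const void *b)
{
    const BlockRef *x = a;
    const BlockRef *y = b;

    if (x->relfilenode != y->relfilenode)
        return x->relfilenode < y->relfilenode ? -1 : 1;
    if (x->fork != y->fork)
        return x->fork < y->fork ? -1 : 1;
    if (x->blkno != y->blkno)
        return x->blkno < y->blkno ? -1 : 1;
    return 0;
}

/*
 * Sort the buffer in place, squeeze out duplicates, and return the
 * number of unique references left at the front of the array.
 */
size_t
sort_and_dedup(BlockRef *refs, size_t n)
{
    size_t out = 0;

    assert(refs != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(refs, n, sizeof(BlockRef), blockref_cmp);

    /* After sorting, duplicates are adjacent, so one pass is enough. */
    for (size_t i = 1; i < n; i++)
    {
        if (blockref_cmp(&refs[out], &refs[i]) != 0)
            refs[++out] = refs[i];
    }
    return out + 1;
}
```

Note that the sketch carries no LSN at all: once the references are
sorted by block and de-duplicated, the LSN order that the recovery
prefetching use case needs is gone. Keeping a "raw" LSN-ordered file
preserves that order at the cost of the larger files discussed above.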