Re: finding changed blocks using WAL scanning - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: finding changed blocks using WAL scanning |
Date | |
Msg-id | 20190423001329.ibbvdqg2f7totn35@development |
In response to | Re: finding changed blocks using WAL scanning (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: finding changed blocks using WAL scanning |
List | pgsql-hackers |
On Mon, Apr 22, 2019 at 07:44:45PM -0400, Bruce Momjian wrote:
>On Tue, Apr 23, 2019 at 01:21:27AM +0200, Tomas Vondra wrote:
>> On Sat, Apr 20, 2019 at 04:21:52PM -0400, Robert Haas wrote:
>> > On Sat, Apr 20, 2019 at 12:42 AM Stephen Frost <sfrost@snowman.net> wrote:
>> > > > Oh. Well, I already explained my algorithm for doing that upthread,
>> > > > which I believe would be quite cheap.
>> > > >
>> > > > 1. When you generate the .modblock files, stick all the block
>> > > > references into a buffer. qsort(). Dedup. Write out in sorted
>> > > > order.
>> > >
>> > > Having all of the block references in a sorted order does seem like it
>> > > would help, but would also make those potentially quite a bit larger
>> > > than necessary (I had some thoughts about making them smaller elsewhere
>> > > in this discussion). That might be worth it though. I suppose it might
>> > > also be possible to line up the bitmaps suggested elsewhere to do
>> > > essentially a BitmapOr of them to identify the blocks changed (while
>> > > effectively de-duping at the same time).
>> >
>> > I don't see why this would make them bigger than necessary. If you
>> > sort by relfilenode/fork/blocknumber and dedup, then references to
>> > nearby blocks will be adjacent in the file. You can then decide what
>> > format will represent that most efficiently on output. Whether or not
>> > a bitmap is better idea than a list of block numbers or something else
>> > depends on what percentage of blocks are modified and how clustered
>> > they are.
>> >
>>
>> Not sure I understand correctly - do you suggest to deduplicate and sort
>> the data before writing them into the .modblock files? Because then
>> the sorting would make this information mostly useless for the recovery
>> prefetching use case I mentioned elsewhere. For that to work we need
>> information about both the LSN and block, in the LSN order.
>>
>> So if we want to allow that use case to leverage this infrastructure, we
>> need to write the .modblock files kinda "raw" and do this processing in
>> some later step.
>>
>> Now, maybe the incremental backup use case is so much more important the
>> right thing to do is ignore this other use case, and I'm OK with that -
>> as long as it's a conscious choice.
>
>I think the concern is that the more granular the modblock files are
>(with less de-duping), the larger they will be.
>

Well, I understand that concern - all I'm saying is that it makes this
useless for some use cases (that may or may not be important enough).

However, it seems to me those files are guaranteed to be much smaller than
the WAL segments, so I don't see how size alone could be an issue as long
as we do the merging and deduplication when recycling the segments. At
that point the standby can't request the WAL from the primary anyway, so
it won't need the raw .modblock files either.

And we probably only care about the size of the data we need to keep for a
long time. And that we can deduplicate/reorder any way we want.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
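The two-stage approach described in this message (keep raw block references in LSN order first, then deduplicate and sort them when the WAL segments are recycled) might look roughly like the sketch below. It is only an illustration under stated assumptions: the `BlockRef` record layout, the `compact` function, and the sample values are hypothetical, and no actual .modblock file format is implied.

```python
from typing import Iterable, List, NamedTuple, Tuple


class BlockRef(NamedTuple):
    """Hypothetical raw entry: one block reference per WAL record, kept in LSN order."""
    lsn: int
    relfilenode: int
    fork: int
    block: int


def compact(raw: Iterable[BlockRef]) -> List[Tuple[int, int, int]]:
    """Drop the LSN, remove duplicates, and sort by (relfilenode, fork, block).

    This mirrors the "qsort(). Dedup. Write out in sorted order." step, applied
    later (for example when segments are recycled) rather than at write time.
    """
    return sorted({(r.relfilenode, r.fork, r.block) for r in raw})


# Raw form: LSN order is preserved, so a consumer such as recovery
# prefetching could still walk the references in replay order.
raw_refs = [
    BlockRef(lsn=100, relfilenode=16384, fork=0, block=7),
    BlockRef(lsn=108, relfilenode=16384, fork=0, block=2),
    BlockRef(lsn=116, relfilenode=16390, fork=0, block=0),
    BlockRef(lsn=124, relfilenode=16384, fork=0, block=7),  # same block touched again
]

# Compacted form: smaller, with nearby blocks adjacent, but the LSN order is gone.
compacted = compact(raw_refs)
```

The compacted list is what an incremental backup would want to keep long term. The raw list only needs to live as long as the WAL it describes.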