On 10 October 2017 at 23:50, Stephen Frost <sfrost@snowman.net> wrote:
> Yeah, it sounds interesting, but I was just chatting w/ David about it
> and we were thinking about how frequently checkpoints really happen, so
> you end up with quite a few of these lists out there.
>
> Now, if the lists were always kept in a sorted fashion, then perhaps we
> would be able to essentially merge-sort them all back together and
> de-dup that way, but even then you're talking about an awful lot of
> data if you're looking at daily incrementals: that's 288 standard
> 5-minute checkpoints, each with some 128k pages changed, assuming a
> max_wal_size of 1GB (and I think we can all agree that the default
> max_wal_size is for rather small systems). That ends up being
> something around 2MB per checkpoint to store the page list, or half
> a gig per day just to keep track of which pages changed in each
> checkpoint.
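(Spelling out that arithmetic, assuming each entry is a RelFileNode
plus a block number, i.e. four 4-byte fields: 1GB of WAL covers up to
1GB / 8kB = 131072 ≈ 128k dirty pages, 131072 * 16 bytes = 2MB per
checkpoint, and 288 * 2MB ≈ 576MB per day.)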
I was actually imagining a bitmap, probably one for each 1GB segment of
each table; that's probably how you would maintain this data in memory
anyway. After compression it should be fairly small, since you'll
mostly be modifying the same blocks repeatedly or doing bulk loads that
touch a consecutive range of blocks.
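Very roughly, a sketch of the data structure I'm imagining, in C (the
names and sizes here are mine, just for illustration, not anything
that exists in the backend): one bit per 8kB block, so a full 1GB
segment needs 16kB of bitmap before compression.

    #include <stdint.h>

    #define BLCKSZ          8192    /* standard 8kB block size */
    #define RELSEG_BLOCKS   ((1024 * 1024 * 1024) / BLCKSZ)
                                    /* 131072 blocks per 1GB segment */
    #define BITMAP_BYTES    (RELSEG_BLOCKS / 8)
                                    /* 16kB of bitmap per segment */

    typedef struct SegmentChangeMap
    {
        uint32_t    segno;              /* which 1GB segment of the rel */
        uint8_t     bits[BITMAP_BYTES]; /* one bit per block */
    } SegmentChangeMap;

    /* Mark a block in this segment as changed since the last checkpoint. */
    static void
    changemap_set(SegmentChangeMap *map, uint32_t blkno)
    {
        map->bits[blkno / 8] |= (uint8_t) (1u << (blkno % 8));
    }

    /* Did this block change during the interval the map covers? */
    static int
    changemap_test(const SegmentChangeMap *map, uint32_t blkno)
    {
        return (map->bits[blkno / 8] >> (blkno % 8)) & 1;
    }

Since modified blocks cluster (hot blocks get set over and over, bulk
loads set long consecutive runs), the bitmap is mostly long runs of
0x00 or 0xff bytes, and even trivial RLE should shrink it a lot.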
That's still about the same amount of data, though, and you probably
don't want to keep every checkpoint anyway. The nice thing about the
changelists is that they tend toward a maximum size regardless of how
long a time range they span, so if you keep one merged changelist for
every 10 or every 100 checkpoints you reduce the storage needs and
lose only time precision.
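Coalescing two maps (again just a sketch, building on the struct
above) is a bitwise OR, which is also why a merged changelist stops
growing: the union over any number of checkpoints can never exceed
the 16kB ceiling per segment.

    /* Fold src's changes into dst, so dst covers both intervals.
     * Assumes both maps describe the same 1GB segment. */
    static void
    changemap_merge(SegmentChangeMap *dst, const SegmentChangeMap *src)
    {
        int     i;

        for (i = 0; i < BITMAP_BYTES; i++)
            dst->bits[i] |= src->bits[i];
    }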
--
greg