On Mon, Aug 28, 2023 at 3:09 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I've long emphasized the importance of designs that just try to avoid
> disaster. With that in mind, I wonder: have you thought about
> conditioning page freezing on whether or not there are already some
> frozen tuples on the page? You could perhaps give some weight to
> whether or not the page already has at least one or two preexisting
> frozen tuples when deciding on whether to freeze it once again now.
> You'd be more eager about freezing pages that have no frozen tuples
> whatsoever, compared to what you'd do with an otherwise equivalent
> page that has no unfrozen tuples.
I'm sure this could be implemented, but it's unclear to me why you
would expect it to perform well. Freezing a page that has no frozen
tuples yet isn't cheaper than freezing one that does, so for this idea
to be a win, the presence of frozen tuples on the page would have to
be a signal that the page is likely to be modified again in the near
future. In general, I don't see any reason why we should expect that
to be the case. One could easily construct a workload where it is the
case -- for instance, set up one table T1 where 90% of the tuples are
repeatedly updated and the other 10% are never touched, and another
table T2 that is insert-only. Once frozen, the never-updated tuples in
T1 become sentinels that we can use to know that the table isn't
insert-only. But I don't think that's very interesting: you can
construct a test case like this for any proposed criterion, just by
structuring the test workload so that whatever criterion is being
tested is a perfect predictor of whether the page will be modified
soon.
What really matters here is finding a criterion that is likely to
perform well in general, on a test case not known to us beforehand.
This isn't an entirely feasible goal, because just as you can
construct a test case where any given criterion performs well, so you
can also construct one where any given criterion performs poorly. But
I think a rule that has a clear theory of operation must be preferable
to one that doesn't. The theory that Melanie and Andres are advancing
is that a page that has been modified recently (in insert-LSN-time) is
more likely to be modified again soon than one that has not i.e. the
near future will be like the recent past. I'm not sure what the theory
behind the rule you propose here might be; if you articulated it
somewhere in your email, I seem to have missed it.
--
Robert Haas
EDB: http://www.enterprisedb.com