Re: Eager page freeze criteria clarification - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: Eager page freeze criteria clarification
Msg-id: CA+TgmoZoCWXoZ8uu8QimTwM3e_b7wCz=EwCvPKpLng=14rR-8g@mail.gmail.com
In response to: Re: Eager page freeze criteria clarification (Melanie Plageman <melanieplageman@gmail.com>)
List: pgsql-hackers

On Mon, Aug 28, 2023 at 10:00 AM Melanie Plageman <melanieplageman@gmail.com> wrote:
> For the second goal, I've relied on past data to predict future
> behavior, so I tried several criteria to estimate the likelihood that a
> page will not be imminently modified. What was most effective was
> Andres' suggestion of comparing the page LSN to the insert LSN at the
> end of the last vacuum of that table; this approximates whether the page
> has been recently modified, which is a decent proxy for whether it'll be
> modified in the future. To do this, we need to save that insert LSN
> somewhere. In the attached WIP patch, I saved it in the table stats, for
> now -- knowing that those are not crash-safe.

I wonder what the real plan here is for where to store this. It's not obvious that we need this to be crash-safe; it's after all only for use by a heuristic, and there's no actual breakage if the heuristic goes wrong. At the same time, it doesn't exactly feel like a statistic.

Then there's the question of whether it's the right metric. My first reaction is to think that it sounds pretty good. One thing I really like about it is that if the table is being vacuumed frequently, then we freeze less aggressively, and if the table is being vacuumed infrequently, then we freeze more aggressively. That seems like a very desirable property. It also seems broadly good that this metric doesn't really care about reads. If there are a lot of reads on the system, or no reads at all, it doesn't really change the chances that a certain page is going to be written again soon, and since reads don't change the insert LSN, here again it seems to do the right thing.

I'm a little less clear about whether it's good that it doesn't really depend on wall-clock time. Certainly, that's desirable from the point of view of not wanting to have to measure wall-clock time in places where we otherwise wouldn't have to, which tends to end up being expensive. However, if I were making all of my freezing decisions manually, I might be more freeze-positive on a low-velocity system where writes are more stretched out across time than on a high-velocity system where we're blasting through the LSN space at a higher rate. But maybe that's not a very important consideration, and I don't know what we'd do about it anyway.

> Page Freezes/Page Frozen (less is better)
>
> |   | Master |     (1) |     (2) |     (3) |     (4) |     (5) |
> |---+--------+---------+---------+---------+---------+---------|
> | A |  28.50 |    3.89 |    1.08 |    1.15 |    1.10 |    1.10 |
> | B |   1.00 |    1.06 |    1.65 |    1.03 |    1.59 |    1.00 |
> | C |    N/A |    1.00 |    1.00 |    1.00 |    1.00 |    1.00 |
> | D |   2.00 | 5199.15 | 5276.85 | 4830.45 | 5234.55 | 2193.55 |
> | E |   7.90 |    3.21 |    2.73 |    2.70 |    2.69 |    2.43 |
> | F |    N/A |    1.00 |    1.00 |    1.00 |    1.00 |    1.00 |
> | G |    N/A |    1.00 |    1.00 |    1.00 |    1.00 |    1.00 |
> | H |    N/A |    1.00 |    1.00 |    1.00 |    1.00 |    1.00 |
> | I |    N/A |   42.00 |   42.00 |     N/A |   41.00 |     N/A |

Hmm. I would say that the interesting rows here are A, D, and I, with rows C and E deserving honorable mention. In row A, master is bad. In row D, your algorithms are all bad, really bad. I don't quite understand how it can be that bad, actually. Row I looks bad for algorithms 1, 2, and 4: they freeze pages because it looks cheap, but the work doesn't really pay off.
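As an aside, to keep the criterion itself concrete while we argue about the numbers: as I understand it, the comparison being proposed boils down to something like the sketch below. This is only a standalone illustration with made-up names (page_lsn, last_vacuum_insert_lsn, page_is_freeze_candidate), not the actual WIP patch, which stores the saved LSN in the table stats and works with the server's own types.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for PostgreSQL's XLogRecPtr, a 64-bit WAL location. */
typedef uint64_t XLogRecPtr;

/*
 * Hypothetical criterion: a page counts as "not recently modified" if its
 * LSN is older than the insert LSN captured at the end of the previous
 * vacuum of the table, i.e. nothing has written to it since that vacuum.
 * Such pages are candidates for eager freezing; newer pages are skipped.
 */
static bool
page_is_freeze_candidate(XLogRecPtr page_lsn, XLogRecPtr last_vacuum_insert_lsn)
{
    return page_lsn < last_vacuum_insert_lsn;
}

int
main(void)
{
    XLogRecPtr last_vacuum_insert_lsn = 0x3000000;  /* saved at end of last vacuum */
    XLogRecPtr cold_page_lsn = 0x2FFFF00;           /* not written since that vacuum */
    XLogRecPtr hot_page_lsn  = 0x3000F00;           /* written after that vacuum */

    printf("cold page: %s\n",
           page_is_freeze_candidate(cold_page_lsn, last_vacuum_insert_lsn)
           ? "freeze eagerly" : "leave unfrozen");
    printf("hot page: %s\n",
           page_is_freeze_candidate(hot_page_lsn, last_vacuum_insert_lsn)
           ? "freeze eagerly" : "leave unfrozen");
    return 0;
}

The appeal, as noted above, is that this threshold adapts to how often the table is vacuumed and is unaffected by read-only activity.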
> % Frozen at end of run
>
> |   | Master | (1) | (2) | (3) | (4) | (5) |
> |---+--------+-----+-----+-----+-----+-----|
> | A |      0 |   1 |  99 |   0 |  81 |   0 |
> | B |     71 |  96 |  99 |   3 |  98 |   2 |
> | C |      0 |   9 | 100 |   6 |  92 |   5 |
> | D |      0 |   1 |   1 |   1 |   1 |   1 |
> | E |      0 |  63 | 100 |  68 | 100 |  67 |
> | F |      0 |   5 |  14 |   6 |  14 |   5 |
> | G |      0 | 100 | 100 |  92 | 100 |  67 |
> | H |      0 |  11 | 100 |   9 |  86 |   5 |
> | I |      0 | 100 | 100 |   0 | 100 |   0 |

So all of the algorithms here, but especially 1, 2, and 4, freeze a lot more often than master. If I understand correctly, we'd like to see small numbers for B, D, and I, and large numbers for the other workloads. None of the algorithms seem to achieve that. (3) and (5) seem like they always behave as well as or better than master, but they produce small numbers for A, C, F, and H. (1), (2), and (4) regress B and I relative to master but do better than (3) and (5) on A and C, and the latter two of them also do better on E. B is such an important benchmarking workload that I'd be loath to regress it, so if I had to pick on the basis of this data, my vote would be (3) or (5), provided whatever is happening with D in the previous metric is not as bad as it looks. What's your reason for preferring (4) and (5) over (2) and (3)? I'm not clear that these numbers give us much of an idea whether 10% or 33% or something else is better in general.

To be honest, having now spent more time looking at the benchmark results, I feel slightly less good about using the LSN as a metric here. These results, to me, clearly suggest that some recency metric is needed. But they don't seem to make a compelling case for this particular one. Neither do they make a case that this is the wrong one. They just don't seem that revealing either way.

--
Robert Haas
EDB: http://www.enterprisedb.com