Re: Eager page freeze criteria clarification - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Eager page freeze criteria clarification
Msg-id CAAKRu_YfyOUK8Ne9=6CrqiNPNTfsP76-Gmcv-0p=KQiN1nM14A@mail.gmail.com
In response to Re: Eager page freeze criteria clarification  (Joe Conway <mail@joeconway.com>)
List pgsql-hackers
On Sat, Dec 9, 2023 at 9:24 AM Joe Conway <mail@joeconway.com> wrote:
>
> On 12/8/23 23:11, Melanie Plageman wrote:
> >
> > I'd be delighted to receive any feedback, ideas, questions, or review.
>
>
> This is well thought out, well described, and a fantastic improvement in
> my view -- well done!

Thanks, Joe! That means a lot! I often see work by hackers on the
mailing list that makes me think, "hey, that's
cool/clever/awesome!" but I don't give that feedback. I appreciate you
doing that!

> I do think we will need to consider distributions other than normal, but
> I don't know offhand what they will be.

Agreed. I plan to test with another distribution, though determining
which ones are useful is probably the more challenging exercise.
I imagine we will have to choose one distribution (as opposed to
supporting different distributions and choosing based on data access
patterns for a table). Even with a normal distribution, though, I
think it should be an improvement.

> However, even if we assume a more-or-less normal distribution, we should
> consider using subgroups in a way similar to Statistical Process
> Control[1]. The reasoning is explained in this quote:
>
>      The Math Behind Subgroup Size
>
>      The Central Limit Theorem (CLT) plays a pivotal role here. According
>      to CLT, as the subgroup size (n) increases, the distribution of the
>      sample means will approximate a normal distribution, regardless of
>      the shape of the population distribution. Therefore, as your
>      subgroup size increases, your control chart limits will narrow,
>      making the chart more sensitive to special cause variation and more
>      prone to false alarms.

I hadn't read anything about statistical process control until you
mentioned it. I read the link you sent and also googled around a
bit. I was under the impression that the more samples we have, the
better, but it seems like that may not be the assumption in
statistical process control?
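
For reference, the narrowing described in the quote can be sketched
numerically. This is a minimal sketch, not anything from my patch: it
assumes the process standard deviation `sigma` is known and uses the
textbook 3-sigma limits for subgroup means, center +/- 3 * sigma / sqrt(n).
The function name and the example numbers are made up for illustration.

```python
import math


def xbar_control_limits(center, sigma, n, k=3.0):
    """Return (lower, upper) k-sigma limits for the mean of subgroups of size n.

    Assumes sigma (the per-observation standard deviation) is known; real
    control charts usually estimate it from the subgroups themselves.
    """
    if n < 1:
        raise ValueError("subgroup size must be at least 1")
    # The standard deviation of a mean of n independent observations
    # is sigma / sqrt(n), so the limits tighten as n grows.
    half_width = k * sigma / math.sqrt(n)
    return center - half_width, center + half_width


if __name__ == "__main__":
    for n in (1, 4, 5, 25, 100):
        lower, upper = xbar_control_limits(center=100.0, sigma=10.0, n=n)
        print(f"n={n:>3}: limits = ({lower:.2f}, {upper:.2f})")
```

With sigma = 10, the limits go from +/- 15 around the center at n = 4 to
+/- 6 at n = 25, so a larger subgroup flags smaller shifts in the mean,
which is the extra sensitivity (and the false-alarm risk) the quote
mentions.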

It may help us to get more specific. I'm not sure what the
relationship between "unsets" in my code and subgroup members would
be. The article you linked suggests that each subgroup should be of
size 5 or smaller. Translating that to my code, were you imagining
subgroups of "unsets" (where an unset is each time we modify a page
that was previously all-visible)?
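
To make the question concrete, here is a rough sketch of what I think
"subgroups of unsets" could mean. Everything in it is hypothetical: it
assumes each unset yields a single number (the values below are
placeholders, not data from my patch), groups consecutive unsets into
subgroups of 5, and keeps only the subgroup means.

```python
from statistics import mean


def subgroup_means(values, size=5):
    """Split values into consecutive full subgroups and return each mean.

    `values` stands in for one hypothetical measurement per unset.
    A trailing partial subgroup is dropped rather than averaged.
    """
    if size < 1:
        raise ValueError("subgroup size must be at least 1")
    full = len(values) - len(values) % size
    return [mean(values[i:i + size]) for i in range(0, full, size)]


if __name__ == "__main__":
    # Placeholder numbers, one per unset, in arrival order.
    unsets = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 7.0]
    print(subgroup_means(unsets))
```

Here the eleven placeholder values produce two subgroup means, and the
last value waits for a future subgroup. Is that roughly the shape you
had in mind?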

Thanks for the feedback!

- Melanie


