On Wed, Mar 30, 2022 at 2:20 AM Anton A. Melnikov <aamelnikov@inbox.ru> wrote:
> > Can the test failures be encountered without such an elaborate setup? If not,
> > then I don't really see why we need to do anything here?
>
> There was a real bug report from our test department. They run long,
> repetitive tests and sometimes hit this failure.
> So I suppose there is a non-zero probability that such an error can
> occur in a one-shot test as well.
> The sequence given in the first letter helps to catch this failure quickly.

I don't think that the idea of "extra" WAL records is very principled.
It's pretty vague what "extra" means, and your definition seems to be
basically "whatever would be needed to make this test case pass." I
think the problem is really with the test case's assumption that the
number of WAL records and the number of table rows ought to be equal. I
think that's just
false. In general, we'd also have to worry about index insertions,
which would provoke variable numbers of WAL records depending on
whether they cause a page split. And we'd have to worry about TOAST
table insertions, which could produce different numbers of records
depending on the size of the data, the configured block size and TOAST
threshold, and whether the TOAST table index incurs a page split. So
even if we added a mechanism like what you propose here, we would only
be fixing this particular test case, not creating infrastructure of
any general utility.
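
To make that concrete, here's a quick sketch (the table and values are
invented, and reading the counts back through pg_stat_statements's
wal_records column assumes that extension is loaded, on v13 or later):

CREATE TABLE t (id int PRIMARY KEY, payload text);

-- One row, but the payload is large enough that even after compression
-- it's stored out of line in the TOAST table, so we get WAL records
-- for the heap insert, the index insert, every TOAST chunk, and every
-- TOAST index insert -- plus possibly page splits and full-page
-- images, depending on timing.
INSERT INTO t VALUES (1, repeat('x', 1000000));

SELECT rows, wal_records
  FROM pg_stat_statements
 WHERE query LIKE 'INSERT INTO t%';

Here wal_records comes out well above rows, through no fault of the
statement being tested.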
If it's true that this test case sometimes randomly fails, then we
ought to fix that somehow, maybe by just removing this particular
check from the test case, or changing it to >=, or something like
that. But I don't think adding a new counter is the right idea.
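
For instance, if the check is an SQL-level comparison of
pg_stat_statements's wal_records against rows (I'm guessing at the
exact shape of the test here), the relaxed version could be as simple
as:

SELECT query, calls, rows,
       wal_records >= rows AS wal_records_ge_rows
  FROM pg_stat_statements
 ORDER BY query;

That stays stable no matter how many additional records index page
splits, TOAST traffic, or full-page writes happen to contribute.
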
--
Robert Haas
EDB: http://www.enterprisedb.com