Re: Sync scan & regression tests - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Sync scan & regression tests |
Date | |
Msg-id | 3a69be6b-ecda-7cc0-2aef-f28de49693d3@iki.fi Whole thread Raw |
In response to | Re: Sync scan & regression tests (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Sync scan & regression tests
Re: Sync scan & regression tests |
List | pgsql-hackers |
On 05/09/2023 06:16, Tom Lane wrote: > Heikki Linnakangas <hlinnaka@iki.fi> writes: >> With shared_buffers='20MB', the tests passed. I'm going to change it >> back to 10MB now, so that we continue to cover that case. > > So chipmunk is getting through the core tests now, but instead it > is failing in contrib/pg_visibility [1]: > > diff -U3 /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/expected/pg_visibility.out /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/results/pg_visibility.out > --- /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/expected/pg_visibility.out 2022-10-08 19:00:15.905074105+0300 > +++ /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/results/pg_visibility.out 2023-09-02 00:25:51.814148116+0300 > @@ -218,7 +218,8 @@ > 0 | t | t > 1 | t | t > 2 | t | t > -(3 rows) > + 3 | f | f > +(4 rows) > > select * from pg_check_frozen('copyfreeze'); > t_ctid > > I find this easily reproducible by setting shared_buffers=10MB. > But I'm confused about why, because the affected test case > dates to Tomas' commit 7db0cd214 of 2021-01-17, and chipmunk > passed many times after that. Might be worth bisecting in > the interval where chipmunk wasn't reporting? I bisected it to this: commit 82a4edabd272f70d044faec8cf7fd1eab92d9991 (HEAD) Author: Andres Freund <andres@anarazel.de> Date: Mon Aug 14 09:54:03 2023 -0700 hio: Take number of prior relation extensions into account The new relation extension logic, introduced in 00d1e02be24, could lead to slowdowns in some scenarios. E.g., when loading narrow rows into a table using COPY, the caller of RelationGetBufferForTuple() will only request a small number of pages. Without concurrency, we just extended using pwritev() in that case. However, if there is *some* concurrency, we switched between extending by a small number of pages and a larger number of pages, depending on the number of waiters for the relation extension logic. However, some filesystems, XFS in particular, do not perform well when switching between extending files using fallocate() and pwritev(). To avoid that issue, remember the number of prior relation extensions in BulkInsertState and extend more aggressively if there were prior relation extensions. That not just avoids the aforementioned slowdown, but also leads to noticeable performance gains in other situations, primarily due to extending more aggressively when there is no concurrency. I should have done it this way from the get go. Reported-by: Masahiko Sawada <sawada.mshk@gmail.com> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAD21AoDvDmUQeJtZrau1ovnT_smN940=Kp6mszNGK3bq9yRN6g@mail.gmail.com Backpatch: 16-, where the new relation extension code was added Before this patch, the test table was 3 pages long, now it is 4 pages with a small shared_buffers setting. In this test, the relation needs to be at least 3 pages long to hold all the COPYed rows. With a larger shared_buffers, the table is extended to three pages in a single call to heap_multi_insert(). With shared_buffers='10 MB', the table is extended twice, because LimitAdditionalPins() restricts how many pages are extended in one go to two pages. With the logic that that commit added, we first extend the table with 2 pages, then with 2 pages again. I think the behavior is fine. The reasons given in the commit message make sense. But it would be nice to silence the test failure. Some alternatives: a) Add an alternative expected output file b) Change the pg_visibilitymap query so that it passes even if the table has four pages. "select * from pg_visibility_map('copyfreeze') where blkno <= 3"; c) Change the extension logic so that we don't extend so much when the table is small. The efficiency of bulk extension doesn't matter when the table is tiny, so arguably we should rather try to minimize the table size. If you have millions of tiny tables, allocating one extra block on each adds up. d) Copy fewer rows to the table in the test. If we copy only 6 rows, for example, the table will have only two pages, regardless of shared_buffers. I'm leaning towards d). The whole test is a little fragile, it will also fail with a non-default block size, for example. But c) seems like a simple fix and wouldn't look too out of place in the test. -- Heikki Linnakangas Neon (https://neon.tech)
pgsql-hackers by date: