Thread: Sync scan & regression tests
Hi hackers,

Is it OK that the regression tests do not pass with a small value of
shared_buffers (for example 1MB)? Two tests fail because of sync scan:
cluster.sql and portals.sql perform a seqscan without an explicit ORDER BY
and expect the data to be returned in a particular order, but because of
sync scan that doesn't happen. A small shared_buffers is needed to satisfy
the syncscan criterion in heapam.c, `scan->rs_nblocks > NBuffers / 4`, for
the tenk1 table.

A more general question: is it really good that, when there are actually no
concurrent queries, a seqscan does not start from the first page?
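For reference, a quick back-of-the-envelope model of that threshold (a
standalone sketch, not the heapam.c code itself; the tenk1 page count used
here is an assumption of roughly 345 pages):

#include <stdio.h>

#define BLCKSZ 8192                     /* default PostgreSQL block size */

/* number of shared buffers for a given shared_buffers setting in MB */
static int
nbuffers_for_mb(int mb)
{
    return (mb * 1024 * 1024) / BLCKSZ;
}

int
main(void)
{
    int tenk1_pages = 345;              /* assumed size of the tenk1 table */
    int settings_mb[] = {1, 10, 20};

    for (int i = 0; i < 3; i++)
    {
        int nbuffers = nbuffers_for_mb(settings_mb[i]);
        int threshold = nbuffers / 4;   /* models "rs_nblocks > NBuffers / 4" */

        printf("shared_buffers=%2dMB: NBuffers=%4d, threshold=%4d pages -> syncscan %s\n",
               settings_mb[i], nbuffers, threshold,
               tenk1_pages > threshold ? "allowed" : "not allowed");
    }
    return 0;
}

Under those assumptions, both 1MB and 10MB make tenk1 eligible for
synchronized scans, while 20MB does not, which matches the behavior
reported later in the thread.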
Konstantin Knizhnik <knizhnik@garret.ru> writes:
> Is it OK that the regression tests do not pass with a small value of
> shared_buffers (for example 1MB)?

There are quite a few GUC settings with which you can break the regression
tests. I'm not especially bothered by this one.

> A more general question: is it really good that, when there are actually
> no concurrent queries, a seqscan does not start from the first page?

You're going to have race-condition issues if you try to make that (not)
happen, I should think.

			regards, tom lane
On Mon, Aug 7, 2023 at 7:21 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
> Two tests fail because of sync scan: cluster.sql and portals.sql perform
> a seqscan without an explicit ORDER BY and expect the data to be returned
> in a particular order, but because of sync scan that doesn't happen. A
> small shared_buffers is needed to satisfy the syncscan criterion in
> heapam.c, `scan->rs_nblocks > NBuffers / 4`, for the tenk1 table.

I wondered the same thing while working on the tests in commit 8ab0ebb9a84,
which explicitly care about physical order, so they *say so* with ORDER BY
ctid. But the problem seems quite widespread, so I didn't volunteer to try
to do something like that everywhere when Tom committed cbf4177f for
027_stream_regress.pl.

FWIW here's another discussion of that cluster test, in which I was still
figuring out some surprising ways this feature can introduce
non-determinism even without concurrent access to the same table.

https://www.postgresql.org/message-id/flat/CA%2BhUKGLTK6ZuEkpeJ05-MEmvmgZveCh%2B_w013m7%2ByKWFSmRcDA%40mail.gmail.com
Thomas Munro <thomas.munro@gmail.com> writes:
> On Mon, Aug 7, 2023 at 7:21 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
>> Two tests fail because of sync scan: cluster.sql and portals.sql perform
>> a seqscan without an explicit ORDER BY and expect the data to be
>> returned in a particular order, but because of sync scan that doesn't
>> happen. A small shared_buffers is needed to satisfy the syncscan
>> criterion in heapam.c, `scan->rs_nblocks > NBuffers / 4`, for the tenk1
>> table.

> I wondered the same thing while working on the tests in commit
> 8ab0ebb9a84, which explicitly care about physical order, so they *say
> so* with ORDER BY ctid. But the problem seems quite widespread, so I
> didn't volunteer to try to do something like that everywhere when Tom
> committed cbf4177f for 027_stream_regress.pl.

Our customary theory about that is explained in regress.sgml:

    You might wonder why we don't order all the regression test queries
    explicitly to get rid of this issue once and for all. The reason is
    that that would make the regression tests less useful, not more, since
    they'd tend to exercise query plan types that produce ordered results
    to the exclusion of those that don't.

Having said that ... I just noticed that chipmunk, which I'd been ignoring
because it had been having configuration-related failures ever since it
came back to life about three months ago, has gotten past those problems
and is now failing with what looks suspiciously like syncscan problems:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2023-08-06%2008%3A20%3A21

This is possibly explained by the fact that it uses (per its extra_config)

    'shared_buffers = 10MB',

although it's done that for a long time and portals.out hasn't changed
since before chipmunk's latest success. Perhaps some change in an earlier
test script affected this?

I think it'd be worth running to ground exactly what is causing these
failures and when they started. But assuming that they are triggered by
syncscan, my first thought about dealing with it is to disable syncscan
within the affected tests. However ... do we exercise the syncscan logic
at all within the regression tests, normally? Is there a coverage issue to
be dealt with?

			regards, tom lane
(noticed this thread just now)

On 07/08/2023 03:55, Tom Lane wrote:
> Having said that ... I just noticed that chipmunk, which I'd been
> ignoring because it had been having configuration-related failures
> ever since it came back to life about three months ago, has gotten
> past those problems

Yes, I finally got around to fixing the configuration...

> and is now failing with what looks suspiciously like syncscan problems:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2023-08-06%2008%3A20%3A21
>
> This is possibly explained by the fact that it uses (per its
> extra_config) 'shared_buffers = 10MB', although it's done that for a
> long time and portals.out hasn't changed since before chipmunk's latest
> success. Perhaps some change in an earlier test script affected this?

I changed the config yesterday to 'shared_buffers = 20MB', before seeing
this thread. If we expect the regression tests to work with
'shared_buffers = 10MB' - and I think that would be nice - I'll change it
back. But let's see what happens with 'shared_buffers = 20MB'. It will
take a few days for the tests to complete.

> I think it'd be worth running to ground exactly what is causing these
> failures and when they started.

I bisected it to this commit:

commit 7ae0ab0ad9704b10400a611a9af56c63304c27a5
Author: David Rowley <drowley@postgresql.org>
Date:   Fri Feb 3 16:20:43 2023 +1300

    Reduce code duplication between heapgettup and heapgettup_pagemode

    The code to get the next block number was exactly the same between
    these two functions, so let's just put it into a helper function and
    call that from both locations.

    Author: Melanie Plageman
    Reviewed-by: Andres Freund, David Rowley
    Discussion: https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_T8QEqZqOL02rpi2bw@mail.gmail.com

From that description, that was supposed to be just code refactoring, with
no change in behavior. Looking at the new heapgettup_advance_block()
function and the code that it replaced, it's now skipping this
ss_report_location() on the last call, when it has reached the end of the
scan:

>   /*
>    * Report our new scan position for synchronization purposes. We
>    * don't do that when moving backwards, however. That would just
>    * mess up any other forward-moving scanners.
>    *
>    * Note: we do this before checking for end of scan so that the
>    * final state of the position hint is back at the start of the
>    * rel. That's not strictly necessary, but otherwise when you run
>    * the same query multiple times the starting position would shift
>    * a little bit backwards on every invocation, which is confusing.
>    * We don't guarantee any specific ordering in general, though.
>    */
>   if (scan->rs_base.rs_flags & SO_ALLOW_SYNC)
>       ss_report_location(scan->rs_base.rs_rd, block);

The comment explicitly says that we do that before checking for
end-of-scan, but that commit moved it to after. I propose the attached to
move it back to before the end-of-scan checks.

Melanie, David, any thoughts?

> But assuming that they are triggered by syncscan, my first thought
> about dealing with it is to disable syncscan within the affected
> tests. However ... do we exercise the syncscan logic at all within
> the regression tests, normally? Is there a coverage issue to be
> dealt with?

Good question...

--
Heikki Linnakangas
Neon (https://neon.tech)
Attachment
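To make the ordering under discussion concrete, here is a small standalone
sketch (hypothetical names; not the attached patch and not the heapam.c
code) in which the position hint is reported before the end-of-scan check,
so that the hint ends up back at the block where the scan started rather
than drifting one block backwards on every invocation:

#include <stdbool.h>
#include <stdio.h>

#define InvalidBlockNumber ((unsigned) 0xFFFFFFFF)

/* stand-in for the shared position hint updated by ss_report_location() */
static unsigned shared_hint;

/*
 * Advance a forward scan to the next block, wrapping at the end of the
 * relation.  The new position is reported *before* the end-of-scan check,
 * mirroring the ordering the quoted comment describes.
 */
static unsigned
advance_block(unsigned block, unsigned nblocks, unsigned startblock,
              bool allow_sync)
{
    block++;
    if (block >= nblocks)
        block = 0;

    if (allow_sync)
        shared_hint = block;            /* report first ... */

    if (block == startblock)            /* ... then check for end of scan */
        return InvalidBlockNumber;

    return block;
}

int
main(void)
{
    unsigned nblocks = 8;               /* relation size in blocks */
    unsigned startblock = 6;            /* where the sync-scan hint pointed */
    unsigned block = startblock;

    do
    {
        printf("scanning block %u\n", block);
        block = advance_block(block, nblocks, startblock, true);
    } while (block != InvalidBlockNumber);

    /* the hint is left at startblock, not one block before it */
    printf("scan done, shared hint left at block %u\n", shared_hint);
    return 0;
}

If the report were skipped on the final call instead, the hint would be
left one block before the start, and repeated runs of the same query would
begin a little earlier each time.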
On Tue, 29 Aug 2023 at 22:35, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Looking at the new heapgettup_advance_block() function and the code that
> it replaced, it's now skipping this ss_report_location() on the last
> call, when it has reached the end of the scan:
>
> >   /*
> >    * Report our new scan position for synchronization purposes. We
> >    * don't do that when moving backwards, however. That would just
> >    * mess up any other forward-moving scanners.
> >    *
> >    * Note: we do this before checking for end of scan so that the
> >    * final state of the position hint is back at the start of the
> >    * rel. That's not strictly necessary, but otherwise when you run
> >    * the same query multiple times the starting position would shift
> >    * a little bit backwards on every invocation, which is confusing.
> >    * We don't guarantee any specific ordering in general, though.
> >    */
> >   if (scan->rs_base.rs_flags & SO_ALLOW_SYNC)
> >       ss_report_location(scan->rs_base.rs_rd, block);
>
> The comment explicitly says that we do that before checking for
> end-of-scan, but that commit moved it to after. I propose the attached
> to move it back to before the end-of-scan checks.
>
> Melanie, David, any thoughts?

I just looked at v15's code and I agree that the ss_report_location()
would be called even when the scan is finished. It wasn't intentional that
that was changed in v16, so I'm happy for your patch to be applied and
backpatched to 16. Thanks for digging into that.

David
On Wed, Aug 30, 2023 at 5:15 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Tue, 29 Aug 2023 at 22:35, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > Looking at the new heapgettup_advance_block() function and the code
> > that it replaced, it's now skipping this ss_report_location() on the
> > last call, when it has reached the end of the scan:
> >
> > >   /*
> > >    * Report our new scan position for synchronization purposes. We
> > >    * don't do that when moving backwards, however. That would just
> > >    * mess up any other forward-moving scanners.
> > >    *
> > >    * Note: we do this before checking for end of scan so that the
> > >    * final state of the position hint is back at the start of the
> > >    * rel. That's not strictly necessary, but otherwise when you run
> > >    * the same query multiple times the starting position would shift
> > >    * a little bit backwards on every invocation, which is confusing.
> > >    * We don't guarantee any specific ordering in general, though.
> > >    */
> > >   if (scan->rs_base.rs_flags & SO_ALLOW_SYNC)
> > >       ss_report_location(scan->rs_base.rs_rd, block);
> >
> > The comment explicitly says that we do that before checking for
> > end-of-scan, but that commit moved it to after. I propose the attached
> > to move it back to before the end-of-scan checks.
> >
> > Melanie, David, any thoughts?
>
> I just looked at v15's code and I agree that the ss_report_location()
> would be called even when the scan is finished. It wasn't intentional
> that that was changed in v16, so I'm happy for your patch to be
> applied and backpatched to 16. Thanks for digging into that.

Yes, thanks for catching my mistake.
LGTM.
On 29/08/2023 13:35, Heikki Linnakangas wrote:
> On 07/08/2023 03:55, Tom Lane wrote:
>> This is possibly explained by the fact that it uses (per its
>> extra_config) 'shared_buffers = 10MB', although it's done that for a
>> long time and portals.out hasn't changed since before chipmunk's latest
>> success. Perhaps some change in an earlier test script affected this?
>
> I changed the config yesterday to 'shared_buffers = 20MB', before seeing
> this thread. If we expect the regression tests to work with
> 'shared_buffers = 10MB' - and I think that would be nice - I'll change it
> back. But let's see what happens with 'shared_buffers = 20MB'. It will
> take a few days for the tests to complete.

With shared_buffers='20MB', the tests passed. I'm going to change it back
to 10MB now, so that we continue to cover that case.

--
Heikki Linnakangas
Neon (https://neon.tech)
On 31/08/2023 02:37, Melanie Plageman wrote:
> On Wed, Aug 30, 2023 at 5:15 PM David Rowley <dgrowleyml@gmail.com> wrote:
>>
>> I just looked at v15's code and I agree that the ss_report_location()
>> would be called even when the scan is finished. It wasn't intentional
>> that that was changed in v16, so I'm happy for your patch to be applied
>> and backpatched to 16. Thanks for digging into that.
>
> Yes, thanks for catching my mistake.
> LGTM.

Pushed, thanks for the reviews!

--
Heikki Linnakangas
Neon (https://neon.tech)
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> With shared_buffers='20MB', the tests passed. I'm going to change it
> back to 10MB now, so that we continue to cover that case.

So chipmunk is getting through the core tests now, but instead it is
failing in contrib/pg_visibility [1]:

diff -U3 /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/expected/pg_visibility.out /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/results/pg_visibility.out
--- /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/expected/pg_visibility.out	2022-10-08 19:00:15.905074105+0300
+++ /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/results/pg_visibility.out	2023-09-02 00:25:51.814148116+0300
@@ -218,7 +218,8 @@
  0 | t | t
  1 | t | t
  2 | t | t
-(3 rows)
+ 3 | f | f
+(4 rows)

 select * from pg_check_frozen('copyfreeze');
  t_ctid

I find this easily reproducible by setting shared_buffers=10MB. But I'm
confused about why, because the affected test case dates to Tomas' commit
7db0cd214 of 2021-01-17, and chipmunk passed many times after that. Might
be worth bisecting in the interval where chipmunk wasn't reporting?

			regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2023-09-01%2015%3A15%3A56
On 05/09/2023 06:16, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> With shared_buffers='20MB', the tests passed. I'm going to change it
>> back to 10MB now, so that we continue to cover that case.
>
> So chipmunk is getting through the core tests now, but instead it
> is failing in contrib/pg_visibility [1]:
>
> diff -U3 /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/expected/pg_visibility.out /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/results/pg_visibility.out
> --- /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/expected/pg_visibility.out	2022-10-08 19:00:15.905074105+0300
> +++ /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/results/pg_visibility.out	2023-09-02 00:25:51.814148116+0300
> @@ -218,7 +218,8 @@
>   0 | t | t
>   1 | t | t
>   2 | t | t
> -(3 rows)
> + 3 | f | f
> +(4 rows)
>
>  select * from pg_check_frozen('copyfreeze');
>   t_ctid
>
> I find this easily reproducible by setting shared_buffers=10MB.
> But I'm confused about why, because the affected test case
> dates to Tomas' commit 7db0cd214 of 2021-01-17, and chipmunk
> passed many times after that. Might be worth bisecting in
> the interval where chipmunk wasn't reporting?

I bisected it to this:

commit 82a4edabd272f70d044faec8cf7fd1eab92d9991 (HEAD)
Author: Andres Freund <andres@anarazel.de>
Date:   Mon Aug 14 09:54:03 2023 -0700

    hio: Take number of prior relation extensions into account

    The new relation extension logic, introduced in 00d1e02be24, could lead
    to slowdowns in some scenarios. E.g., when loading narrow rows into a
    table using COPY, the caller of RelationGetBufferForTuple() will only
    request a small number of pages. Without concurrency, we just extended
    using pwritev() in that case. However, if there is *some* concurrency,
    we switched between extending by a small number of pages and a larger
    number of pages, depending on the number of waiters for the relation
    extension logic. However, some filesystems, XFS in particular, do not
    perform well when switching between extending files using fallocate()
    and pwritev().

    To avoid that issue, remember the number of prior relation extensions
    in BulkInsertState and extend more aggressively if there were prior
    relation extensions. That not just avoids the aforementioned slowdown,
    but also leads to noticeable performance gains in other situations,
    primarily due to extending more aggressively when there is no
    concurrency. I should have done it this way from the get go.

    Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
    Author: Andres Freund <andres@anarazel.de>
    Discussion: https://postgr.es/m/CAD21AoDvDmUQeJtZrau1ovnT_smN940=Kp6mszNGK3bq9yRN6g@mail.gmail.com
    Backpatch: 16-, where the new relation extension code was added

Before this patch, the test table was 3 pages long; now it is 4 pages with
a small shared_buffers setting.

In this test, the relation needs to be at least 3 pages long to hold all
the COPYed rows. With a larger shared_buffers, the table is extended to
three pages in a single call to heap_multi_insert(). With
shared_buffers='10 MB', the table is extended twice, because
LimitAdditionalPins() restricts how many pages are extended in one go to
two pages. With the logic that that commit added, we first extend the
table with 2 pages, then with 2 pages again.

I think the behavior is fine. The reasons given in the commit message make
sense. But it would be nice to silence the test failure. Some alternatives:

a) Add an alternative expected output file.

b) Change the pg_visibility_map query so that it passes even if the table
has four pages: "select * from pg_visibility_map('copyfreeze') where
blkno <= 3";

c) Change the extension logic so that we don't extend so much when the
table is small. The efficiency of bulk extension doesn't matter when the
table is tiny, so arguably we should rather try to minimize the table
size. If you have millions of tiny tables, allocating one extra block on
each adds up.

d) Copy fewer rows to the table in the test. If we copy only 6 rows, for
example, the table will have only two pages, regardless of shared_buffers.

I'm leaning towards d). The whole test is a little fragile; it will also
fail with a non-default block size, for example. But c) seems like a
simple fix and wouldn't look too out of place in the test.

--
Heikki Linnakangas
Neon (https://neon.tech)
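As a rough illustration of that "2 pages, then 2 pages again" behavior,
here is a toy model (not hio.c; the pin limit, the page counts, and the
"ask for at least as much as last time" rule are all simplifying
assumptions made for the example):

#include <stdio.h>

/* models LimitAdditionalPins() clamping an extension request */
static unsigned
limit_additional_pins(unsigned want, unsigned pin_limit)
{
    return want > pin_limit ? pin_limit : want;
}

int
main(void)
{
    unsigned pages_needed = 3;      /* pages of tuples the COPY produces */
    unsigned pin_limit = 2;         /* assumed limit with shared_buffers=10MB */
    unsigned nblocks = 0;
    unsigned prior_extensions = 0;

    while (nblocks < pages_needed)
    {
        unsigned want = pages_needed - nblocks;

        /* after a prior extension, extend at least as aggressively again */
        if (prior_extensions > 0 && want < pin_limit)
            want = pin_limit;

        unsigned got = limit_additional_pins(want, pin_limit);

        nblocks += got;
        prior_extensions++;
        printf("extension %u: +%u pages, table is now %u pages\n",
               prior_extensions, got, nblocks);
    }
    printf("final size: %u pages, leaving one trailing empty page\n", nblocks);
    return 0;
}

Under these assumptions the first extension adds 2 pages and the second
adds 2 more, giving the 4-page table (and the trailing all-zero page) that
the failing pg_visibility test now sees.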
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 05/09/2023 06:16, Tom Lane wrote:
>> So chipmunk is getting through the core tests now, but instead it
>> is failing in contrib/pg_visibility [1]:

> I bisected it to this:
> commit 82a4edabd272f70d044faec8cf7fd1eab92d9991 (HEAD)
> Author: Andres Freund <andres@anarazel.de>
> Date:   Mon Aug 14 09:54:03 2023 -0700
>     hio: Take number of prior relation extensions into account

Yeah, I came to the same conclusion after discovering that I could
reproduce it locally with small shared_buffers.

> I think the behavior is fine. The reasons given in the commit message
> make sense. But it would be nice to silence the test failure. Some
> alternatives:
> ...
> c) Change the extension logic so that we don't extend so much when the
> table is small. The efficiency of bulk extension doesn't matter when the
> table is tiny, so arguably we should rather try to minimize the table
> size. If you have millions of tiny tables, allocating one extra block on
> each adds up.

I think your alternative (c) is plenty attractive. IIUC, the current
change has at one stroke doubled the amount of disk space eaten by a table
that formerly needed two pages. I don't think we should be adding more
than one page at a time until the table size reaches perhaps 10 pages.

			regards, tom lane
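A sketch of what such a cap might look like (purely illustrative; the
function name and the 10-page threshold are assumptions, not the hio.c
implementation):

#include <stdio.h>

#define SMALL_REL_THRESHOLD 10      /* assumed cutoff, in pages */

/*
 * Clamp a bulk-extension request: while the relation is still tiny,
 * extend strictly one page at a time, and only get aggressive once it
 * has grown past the threshold.
 */
static unsigned
clamp_extend_by(unsigned current_nblocks, unsigned want)
{
    if (current_nblocks < SMALL_REL_THRESHOLD)
        return 1;
    return want;
}

int
main(void)
{
    printf("2-page table, wants 2 more  -> extend by %u\n", clamp_extend_by(2, 2));
    printf("64-page table, wants 8 more -> extend by %u\n", clamp_extend_by(64, 8));
    return 0;
}

With such a rule the copyfreeze table in the test would stay at 3 pages,
at the cost of an extra extension call for small relations.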
Hi,

On 2023-09-18 13:49:24 +0300, Heikki Linnakangas wrote:
> On 05/09/2023 06:16, Tom Lane wrote:
> > Heikki Linnakangas <hlinnaka@iki.fi> writes:
> > > With shared_buffers='20MB', the tests passed. I'm going to change it
> > > back to 10MB now, so that we continue to cover that case.
> >
> > So chipmunk is getting through the core tests now, but instead it
> > is failing in contrib/pg_visibility [1]:
> >
> > diff -U3 /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/expected/pg_visibility.out /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/results/pg_visibility.out
> > --- /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/expected/pg_visibility.out	2022-10-08 19:00:15.905074105+0300
> > +++ /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/results/pg_visibility.out	2023-09-02 00:25:51.814148116+0300
> > @@ -218,7 +218,8 @@
> >   0 | t | t
> >   1 | t | t
> >   2 | t | t
> > -(3 rows)
> > + 3 | f | f
> > +(4 rows)
> >
> >  select * from pg_check_frozen('copyfreeze');
> >   t_ctid
> >
> > I find this easily reproducible by setting shared_buffers=10MB.
> > But I'm confused about why, because the affected test case
> > dates to Tomas' commit 7db0cd214 of 2021-01-17, and chipmunk
> > passed many times after that. Might be worth bisecting in
> > the interval where chipmunk wasn't reporting?
>
> I bisected it to this:
>
> commit 82a4edabd272f70d044faec8cf7fd1eab92d9991 (HEAD)
> Author: Andres Freund <andres@anarazel.de>
> Date:   Mon Aug 14 09:54:03 2023 -0700
>
>     hio: Take number of prior relation extensions into account
>
>     The new relation extension logic, introduced in 00d1e02be24, could
>     lead to slowdowns in some scenarios. E.g., when loading narrow rows
>     into a table using COPY, the caller of RelationGetBufferForTuple()
>     will only request a small number of pages. Without concurrency, we
>     just extended using pwritev() in that case. However, if there is
>     *some* concurrency, we switched between extending by a small number
>     of pages and a larger number of pages, depending on the number of
>     waiters for the relation extension logic. However, some filesystems,
>     XFS in particular, do not perform well when switching between
>     extending files using fallocate() and pwritev().
>
>     To avoid that issue, remember the number of prior relation extensions
>     in BulkInsertState and extend more aggressively if there were prior
>     relation extensions. That not just avoids the aforementioned
>     slowdown, but also leads to noticeable performance gains in other
>     situations, primarily due to extending more aggressively when there
>     is no concurrency. I should have done it this way from the get go.
>
>     Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
>     Author: Andres Freund <andres@anarazel.de>
>     Discussion: https://postgr.es/m/CAD21AoDvDmUQeJtZrau1ovnT_smN940=Kp6mszNGK3bq9yRN6g@mail.gmail.com
>     Backpatch: 16-, where the new relation extension code was added

This issue is also discussed at:
https://www.postgresql.org/message-id/20230916000011.2ugpkkkp7bpp4cfh%40awork3.anarazel.de

> Before this patch, the test table was 3 pages long; now it is 4 pages
> with a small shared_buffers setting.
>
> In this test, the relation needs to be at least 3 pages long to hold all
> the COPYed rows. With a larger shared_buffers, the table is extended to
> three pages in a single call to heap_multi_insert(). With
> shared_buffers='10 MB', the table is extended twice, because
> LimitAdditionalPins() restricts how many pages are extended in one go to
> two pages. With the logic that that commit added, we first extend the
> table with 2 pages, then with 2 pages again.
>
> I think the behavior is fine. The reasons given in the commit message
> make sense. But it would be nice to silence the test failure. Some
> alternatives:
>
> a) Add an alternative expected output file.
>
> b) Change the pg_visibility_map query so that it passes even if the
> table has four pages: "select * from pg_visibility_map('copyfreeze')
> where blkno <= 3";

I think the test already encodes the tuple and page size too much; this'd
go further down that road. It's too bad we don't have an easy way for
testing if a page is empty - if we did, it'd be simple to write this in a
robust way. Instead, the query I came up with in the other thread:

select * from pg_visibility_map('copyfreeze')
where
  (not all_visible or not all_frozen)
  -- deal with trailing empty pages due to potentially bulk-extending too aggressively
  and exists(SELECT * FROM copyfreeze WHERE ctid >= ('('||blkno||', 0)')::tid)
;

> c) Change the extension logic so that we don't extend so much when the
> table is small. The efficiency of bulk extension doesn't matter when the
> table is tiny, so arguably we should rather try to minimize the table
> size. If you have millions of tiny tables, allocating one extra block on
> each adds up.

We could also adjust LimitAdditionalPins() to be a bit more aggressive, by
actually counting instead of using REFCOUNT_ARRAY_ENTRIES (only when it
might matter, for efficiency reasons).

> d) Copy fewer rows to the table in the test. If we copy only 6 rows, for
> example, the table will have only two pages, regardless of
> shared_buffers.
>
> I'm leaning towards d). The whole test is a little fragile; it will also
> fail with a non-default block size, for example. But c) seems like a
> simple fix and wouldn't look too out of place in the test.

Hm, what do you mean with the last sentence? Oh, is the test you're
referencing the relation-extension logic?

Greetings,

Andres Freund
On 19/09/2023 01:57, Andres Freund wrote:
> On 2023-09-18 13:49:24 +0300, Heikki Linnakangas wrote:
>> d) Copy fewer rows to the table in the test. If we copy only 6 rows, for
>> example, the table will have only two pages, regardless of
>> shared_buffers.
>>
>> I'm leaning towards d). The whole test is a little fragile; it will also
>> fail with a non-default block size, for example. But c) seems like a
>> simple fix and wouldn't look too out of place in the test.
>
> Hm, what do you mean with the last sentence? Oh, is the test you're
> referencing the relation-extension logic?

Sorry, I said "c) seems like a simple fix ...", but I meant "d) seems like
a simple fix ...".

I meant the attached.

--
Heikki Linnakangas
Neon (https://neon.tech)
Attachment
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 19/09/2023 01:57, Andres Freund wrote:
>> On 2023-09-18 13:49:24 +0300, Heikki Linnakangas wrote:
>>> d) Copy fewer rows to the table in the test. If we copy only 6 rows,
>>> for example, the table will have only two pages, regardless of
>>> shared_buffers.
>>>
>>> I'm leaning towards d). The whole test is a little fragile; it will
>>> also fail with a non-default block size, for example. But c) seems
>>> like a simple fix and wouldn't look too out of place in the test.

>> Hm, what do you mean with the last sentence? Oh, is the test you're
>> referencing the relation-extension logic?

> Sorry, I said "c) seems like a simple fix ...", but I meant "d) seems
> like a simple fix ...".
> I meant the attached.

This thread stalled out months ago, but chipmunk is still failing in HEAD
and v16. Can we please have a fix? I'm good with Heikki's adjustment to
the pg_visibility test case.

			regards, tom lane
Hi,

On 2024-03-24 11:28:12 -0400, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
> > On 19/09/2023 01:57, Andres Freund wrote:
> >> On 2023-09-18 13:49:24 +0300, Heikki Linnakangas wrote:
> >>> d) Copy fewer rows to the table in the test. If we copy only 6 rows,
> >>> for example, the table will have only two pages, regardless of
> >>> shared_buffers.
> >>>
> >>> I'm leaning towards d). The whole test is a little fragile; it will
> >>> also fail with a non-default block size, for example. But c) seems
> >>> like a simple fix and wouldn't look too out of place in the test.
>
> >> Hm, what do you mean with the last sentence? Oh, is the test you're
> >> referencing the relation-extension logic?
>
> > Sorry, I said "c) seems like a simple fix ...", but I meant "d) seems
> > like a simple fix ...".
> > I meant the attached.
>
> This thread stalled out months ago, but chipmunk is still failing in
> HEAD and v16. Can we please have a fix? I'm good with Heikki's
> adjustment to the pg_visibility test case.

I pushed Heikki's adjustment. Thanks for the "fix" and the reminder.

Greetings,

Andres Freund