On Wed, Aug 13, 2025 at 1:01 PM Tomas Vondra <tomas@vondra.me> wrote:
> This seems rather bizarre, considering the two tables are exactly the
> same, except that in t2 the first column is negative, and the rows are
> fixed-length. Even heap_page_items says the tables are exactly the same.
>
> So why would the index get so different like this?
In the past, when I required *perfectly* deterministic results for
INSERT INTO test_table ... SELECT * FROM source_table bulk inserts
(which was important during the Postgres 12 and 13 nbtree work), I
found it necessary to "set synchronize_seqscans=off". If I was writing
a test such as this, I'd probably do that defensively, even if it
wasn't clear that it mattered. (I'm also in the habit of using
unlogged tables, because VACUUM tends to set their pages all-visible
more reliably than equivalent logged tables, which I notice that
you're also doing here.)
That said, I *think* that the "locally shuffled" heap TID pattern that
we see with "t2"/"idx2" is mostly (perhaps entirely) caused by the way
that you're inverting the indexed column's value when initially
generating "t2". A given range of values such as "1 through to 4"
becomes "-4 through to -1" as their tuples are inserted into t2.
You're effectively inverting the order of the bigint indexed column
"a" -- but you're *not* inverting the order of the imaginary
tie-breaker heap column (it *remains* in ASC heap TID order in "t2").
In general, when doing this sort of analysis, I find it useful to
manually verify that the data that I generated matches my
expectations. Usually a quick check with pageinspect is enough. I'll
just randomly select 2 - 3 leaf pages, and make sure that they all
more or less match my expectations.
--
Peter Geoghegan