On 8/13/25 23:36, Peter Geoghegan wrote:
> On Wed, Aug 13, 2025 at 1:01 PM Tomas Vondra <tomas@vondra.me> wrote:
>> This seems rather bizarre, considering the two tables are exactly the
>> same, except that in t2 the first column is negative, and the rows are
>> fixed-length. Even heap_page_items says the tables are exactly the same.
>>
>> So why would the index get so different like this?
>
> In the past, when I required *perfectly* deterministic results for
> INSERT INTO test_table ... SELECT * FROM source_table bulk inserts
> (which was important during the Postgres 12 and 13 nbtree work), I
> found it necessary to "set synchronize_seqscans=off". If I was writing
> a test such as this, I'd probably do that defensively, even if it
> wasn't clear that it mattered. (I'm also in the habit of using
> unlogged tables, because VACUUM tends to set their pages all-visible
> more reliably than equivalent logged tables, which I notice that
> you're also doing here.)
>
The tables are *exactly* the same, block by block. I double checked that
by looking at a couple pages, and the only difference is the inverted
value of the "a" column.
> That said, I *think* that the "locally shuffled" heap TID pattern that
> we see with "t2"/"idx2" is mostly (perhaps entirely) caused by the way
> that you're inverting the indexed column's value when initially
> generating "t2". A given range of values such as "1 through to 4"
> becomes "-4 through to -1" as their tuples are inserted into t2.
Right.
> You're effectively inverting the order of the bigint indexed column
> "a" -- but you're *not* inverting the order of the imaginary
> tie-breaker heap column (it *remains* in ASC heap TID order in "t2").
>
I have no idea what I'm supposed to do about that. As you say the
tie-breaker is imaginary, selected by the system on my behalf. If it
works like this, doesn't that mean it'll have this unfortunate effect on
all data sets with negative correlation?
> In general, when doing this sort of analysis, I find it useful to
> manually verify that the data that I generated matches my
> expectations. Usually a quick check with pageinspect is enough. I'll
> just randomly select 2 - 3 leaf pages, and make sure that they all
> more or less match my expectations.
>
I did that for the heap, and that's just as I expected. But the effect
on the index surprised me.
regards
--
Tomas Vondra