Re: Yet another fast GiST build - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Yet another fast GiST build
Msg-id 7386285b-0e2f-e89e-81f4-f63775becb2e@iki.fi
In response to Re: Yet another fast GiST build  (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses Re: Yet another fast GiST build
List pgsql-hackers
On 07/04/2021 15:12, Andrey Borodin wrote:
>> On 7 Apr 2021, at 14:56, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> 
>> Ok, I think I understand that now. In btree_gist, the *_cmp()
>> function operates on non-leaf values, and *_lt(), *_gt() et al
>> operate on leaf values. For all other datatypes, the leaf and
>> non-leaf representation is the same, but for bit/varbit, the
>> non-leaf representation is different. The leaf representation is
>> VarBit, and non-leaf is just the bits without the 'bit_len' field.
>> That's why it is indeed correct for gbt_bitcmp() to just use
>> byteacmp(), whereas gbt_bitlt() et al compare the 'bit_len' field
>> separately. That's subtle, and 100% uncommented.
>> 
>> What that means for this patch is that gbt_bit_sort_build_cmp()
>> should *not* call byteacmp(), but bitcmp(). Because it operates on
>> the original datatype stored in the table.
> 
> +1 Thanks for investigating this. If I understand things right,
> adding test values with different lengths of bit sequences would not
> uncover the problem anyway?

That's right, the only consequence of a "wrong" sort order is that the 
quality of the tree suffers, and scans need to scan more pages 
unnecessarily.
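
To make the difference in sort order concrete, here is a minimal standalone sketch. It is not the btree_gist code: ToyVarBit, toy_bitcmp() and toy_payload_bytecmp() are hypothetical names, and the layout (a 'bit_len' field followed directly by the bit data) is a simplified assumption about what a byte-wise comparison of a leaf value would see. The first function orders values the way bitcmp() does, as far as I understand it; the second compares the whole payload, 'bit_len' included, byte by byte.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a leaf varbit value. */
typedef struct ToyVarBit
{
    int32_t bit_len;   /* number of valid bits */
    uint8_t bits[4];   /* bit data, most significant bit first */
} ToyVarBit;

/* Number of bytes needed to hold bit_len bits. */
static size_t
toy_nbytes(const ToyVarBit *v)
{
    assert(v->bit_len >= 0 && v->bit_len <= 32);
    return (size_t) (v->bit_len + 7) / 8;
}

/* Normalize any comparison result to -1, 0 or 1. */
static int
toy_sign(int x)
{
    return (x > 0) - (x < 0);
}

/* bitcmp()-style order: bit data first, bit length only as a tie-breaker. */
static int
toy_bitcmp(const ToyVarBit *a, const ToyVarBit *b)
{
    size_t na = toy_nbytes(a);
    size_t nb = toy_nbytes(b);
    int    cmp = memcmp(a->bits, b->bits, na < nb ? na : nb);

    if (cmp == 0 && a->bit_len != b->bit_len)
        cmp = (a->bit_len < b->bit_len) ? -1 : 1;
    return toy_sign(cmp);
}

/*
 * Byte-wise order over the whole payload. The payload starts with the
 * bit_len field, so the length ends up dominating the comparison.
 */
static int
toy_payload_bytecmp(const ToyVarBit *a, const ToyVarBit *b)
{
    uint8_t pa[sizeof(int32_t) + 4];
    uint8_t pb[sizeof(int32_t) + 4];
    size_t  la = sizeof(int32_t) + toy_nbytes(a);
    size_t  lb = sizeof(int32_t) + toy_nbytes(b);
    int     cmp;

    memcpy(pa, &a->bit_len, sizeof(int32_t));
    memcpy(pa + sizeof(int32_t), a->bits, toy_nbytes(a));
    memcpy(pb, &b->bit_len, sizeof(int32_t));
    memcpy(pb + sizeof(int32_t), b->bits, toy_nbytes(b));

    cmp = memcmp(pa, pb, la < lb ? la : lb);
    if (cmp == 0 && la != lb)
        cmp = (la < lb) ? -1 : 1;
    return toy_sign(cmp);
}
```

For B'1' (bit_len 1, data byte 0x80) and B'01' (bit_len 2, data byte 0x40), toy_bitcmp() puts B'1' after B'01', while toy_payload_bytecmp() puts it before, because the length field is compared first. Either order still produces a valid tree; the byte-wise one just groups values by length rather than by their leading bits.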

I tried to investigate this by creating a varbit index with and without 
sorting, and compared them with pageinspect, but in quick testing, I 
wasn't able to find cases where the sorted version was badly ordered. I 
guess I didn't find the right data set yet.

- Heikki