Re: Yet another fast GiST build - Mailing list pgsql-hackers

From Andrey M. Borodin
Subject Re: Yet another fast GiST build
Date
Msg-id 3520A18A-5C38-4697-A2E3-F3BDE3496CD5@yandex-team.ru
Whole thread Raw
In response to Re: Yet another fast GiST build  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Yet another fast GiST build  (Heikki Linnakangas <hlinnaka@iki.fi>)
Re: Yet another fast GiST build  (Andrey Borodin <x4mmm@yandex-team.ru>)
List pgsql-hackers

> 28 сент. 2020 г., в 13:12, Heikki Linnakangas <hlinnaka@iki.fi> написал(а):
>
> On 21/09/2020 17:19, Andrey M. Borodin wrote:
>>> 21 сент. 2020 г., в 18:29, Andrey M. Borodin <x4mmm@yandex-team.ru> написал(а):
>>>
>>> It was a conscious decision with incorrect motivation. I was thinking that it will help to reduce number of "false
positive"inspecting right pages. But now I see that: 
>>> 1. There should be no such "false positives" that we can avoid
>>> 2. Valid rightlinks could help to do amcheck verification in future
>> Well, point number 2 here is invalid. There exist one leaf page p, so that if we start traversing rightlink from p
wewill reach all leaf pages. But we practically have no means to find this page. This makes rightlinks not very helpful
inamcheck for GiST. 
>
> Well, if you store all the right links in a hash table or something, you can "connect the dots" after you have
scannedall the pages to see that the chain is unbroken. Probably would not be worth the trouble, since the rightlinks
arenot actually needed after concurrent scans have completed. 
>
>> But for consistency I think it worth to install them.
>
> I agree. I did some testing with your patch. It seems that the rightlinks are still not always set. I didn't try to
debugwhy. 
>
> I wrote a couple of 'pageinspect' function to inspect GiST pages for this. See attached. I then created a test table
andindex like this: 
>
> create table points (p point);
> insert into points select point(x,y) from generate_series(-2000, 2000) x, generate_series(-2000, 2000) y;
> create index points_idx on points using gist (p);
>
> And this is what the root page looks like:
>
> postgres=# select * from gist_page_items(get_raw_page('points_idx', 0));
> itemoffset |     ctid      | itemlen
> ------------+---------------+---------
>          1 | (27891,65535) |      40
>          2 | (55614,65535) |      40
>          3 | (83337,65535) |      40
>          4 | (97019,65535) |      40
> (4 rows)
>
> And the right links on the next level:
>
> postgres=# select * from (VALUES (27891), (55614), (83337), (97019)) b (blkno), lateral
gist_page_opaque_info(get_raw_page('points_idx',blkno)); 
> blkno | lsn | nsn | rightlink  | flags
> -------+-----+-----+------------+-------
> 27891 | 0/1 | 0/0 | 4294967295 | {}
> 55614 | 0/1 | 0/0 | 4294967295 | {}
> 83337 | 0/1 | 0/0 |      27891 | {}
> 97019 | 0/1 | 0/0 |      55614 | {}
> (4 rows)
>
> I expected there to be only one page with invalid right link, but there are two.

Yes, there is a bug. Now it seems to me so obvious, yet it took some time to understand that links were shifted by one
extrajump. PFA fixed rightlinks installation. 

BTW some one more small thing: we initialise page buffers with palloc() and palloc0(), while first one is sufficient
forgistinitpage(). 
Also I'm working on btree_gist opclasses and found out that new sortsupport function is not mentioned in
gistadjustmembers().I think this can be fixed with gist_btree patch. 

Your pageinspect patch seems very useful. How do you think, should we provide a way to find invalid tuples in GiST
withingist_page_items()? At some point we will have to ask user to reindex GiSTs with invalid tuples. 


> 28 сент. 2020 г., в 11:53, Heikki Linnakangas <hlinnaka@iki.fi> написал(а):
>
> On 21/09/2020 16:29, Andrey M. Borodin wrote:
>> But thing that bothers me now: when we vacuum leaf page, we bump it's
>> NSN. But we do not bump internal page LSN. Does this means we will
>> follow rightlinks after vacuum? It seems superflous.
>
> Sorry, I did not understand what you said above. Vacuum doesn't update any NSNs, only LSNs. Can you elaborate?
I've misunderstood difference between NSN and LSN. Seems like everything is fine.

>
>> And btw we did not adjust internal page tuples after vacuum...
> What do you mean by that?

When we delete rows from table internal tuples in GiST stay wide.


Thanks!

Best regards, Andrey Borodin.


Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #16419: wrong parsing BC year in to_date() function
Next
From: Tom Lane
Date:
Subject: Re: BUG #16419: wrong parsing BC year in to_date() function