RE: Index Skip Scan - Mailing list pgsql-hackers

From Floris Van Nee
Subject RE: Index Skip Scan
Date
Msg-id 1e59d800b38141c5af2ad3a64cf9758e@opammb0562.comp.optiver.com
Whole thread Raw
In response to Index Skip Scan  (Jesper Pedersen <jesper.pedersen@redhat.com>)
Responses Re: Index Skip Scan  (Jesper Pedersen <jesper.pedersen@redhat.com>)
List pgsql-hackers
Hi all,

I reviewed the latest version of the patch. Overall some good improvements I think. Please find my feedback below.

- I think I mentioned this before - it's not that big of a deal, but it just looks weird and inconsistent to me:
create table t2 as (select a, b, c, 10 d from generate_series(1,5) a, generate_series(1,100) b,
generate_series(1,10000)c); create index on t2 (a,b,c desc); 

postgres=# explain select distinct on (a,b) a,b,c from t2 where a=2 and b>=5 and b<=5 order by a,b,c desc;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Index Only Scan using t2_a_b_c_idx on t2  (cost=0.43..216.25 rows=500 width=12)
   Skip scan: true
   Index Cond: ((a = 2) AND (b >= 5) AND (b <= 5))
(3 rows)

postgres=# explain select distinct on (a,b) a,b,c from t2 where a=2 and b=5 order by a,b,c desc;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Unique  (cost=0.43..8361.56 rows=500 width=12)
   ->  Index Only Scan using t2_a_b_c_idx on t2  (cost=0.43..8361.56 rows=9807 width=12)
         Index Cond: ((a = 2) AND (b = 5))
(3 rows)

When doing a distinct on (params) and having equality conditions for all params, it falls back to the regular index
scaneven though there's no reason not to use the skip scan here. It's much faster to write b between 5 and 5 now rather
thanwriting b=5. I understand this was a limitation of the unique-keys patch at the moment which could be addressed in
thefuture. I think for the sake of consistency it would make sense if this eventually gets addressed. 

- nodeIndexScan.c, line 126
This sets xs_want_itup to true in all cases (even for non skip-scans). I don't think this is acceptable given the
side-effectsthis has (page will never be unpinned in between returned tuples in _bt_drop_lock_and_maybe_pin) 

- nbsearch.c, _bt_skip, line 1440
_bt_update_skip_scankeys(scan, indexRel); This function is called twice now - once in the else {} and immediately after
thatoutside of the else. The second call can be removed I think. 

- nbtsearch.c _bt_skip line 1597
                LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
                scan->xs_itup = (IndexTuple) PageGetItem(page, itemid);

This is an UNLOCK followed by a read of the unlocked page. That looks incorrect?

- nbtsearch.c _bt_skip line 1440
if (BTScanPosIsValid(so->currPos) &&
        _bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))

Is it allowed to look at the high key / low key of the page without have a read lock on it?

- nbtsearch.c line 1634
if (_bt_readpage(scan, indexdir, offnum))  ...
else
 error()

Is it really guaranteed that a match can be found on the page itself? Isn't it possible that an extra index condition,
notpart of the scan key, makes none of the keys on the page match? 

- nbtsearch.c in general
Most of the code seems to rely quite heavily on the fact that xs_want_itup forces _bt_drop_lock_and_maybe_pin to never
releasethe buffer pin. Have you considered that compacting of a page may still happen even if you hold the pin? [1]
I'vebeen trying to come up with cases in which this may break the patch, but I haven't able to produce such a scenario
-so it may be fine. But it would be good to consider again. One thing I was thinking of was a scenario where page
splitsand/or compacting would happen in between returning tuples. Could this break the _bt_scankey_within_page check
suchthat it thinks the scan key is within the current page, while it actually isn't? Mainly for backward and/or cursor
scans.Forward scans shouldn't be a problem I think. Apologies for being a bit vague as I don't have a clear example
readywhen it would go wrong. It may well be fine, but it was one of the things on my mind. 

[1] https://postgrespro.com/list/id/1566683972147.11682@Optiver.com

-Floris



pgsql-hackers by date:

Previous
From: Imai Yoshikazu
Date:
Subject: Re: [Proposal] Add accumulated statistics for wait event
Next
From: Mahendra Singh Thalor
Date:
Subject: Re: [HACKERS] Block level parallel vacuum