Re: Index Skip Scan - Mailing list pgsql-hackers

From Jesper Pedersen
Subject Re: Index Skip Scan
Date
Msg-id 802e46fa-dc2c-4c32-e55b-884e04aa7f12@redhat.com
Whole thread Raw
In response to RE: Index Skip Scan  (Floris Van Nee <florisvannee@Optiver.com>)
Responses Re: Index Skip Scan  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
Hi Floris,

On 1/15/20 8:33 AM, Floris Van Nee wrote:
> I reviewed the latest version of the patch. Overall some good improvements I think. Please find my feedback below.
>

Thanks for your review !

> - I think I mentioned this before - it's not that big of a deal, but it just looks weird and inconsistent to me:
> create table t2 as (select a, b, c, 10 d from generate_series(1,5) a, generate_series(1,100) b,
generate_series(1,10000)c); create index on t2 (a,b,c desc);
 
> 
> postgres=# explain select distinct on (a,b) a,b,c from t2 where a=2 and b>=5 and b<=5 order by a,b,c desc;
>                                     QUERY PLAN
> ---------------------------------------------------------------------------------
>   Index Only Scan using t2_a_b_c_idx on t2  (cost=0.43..216.25 rows=500 width=12)
>     Skip scan: true
>     Index Cond: ((a = 2) AND (b >= 5) AND (b <= 5))
> (3 rows)
> 
> postgres=# explain select distinct on (a,b) a,b,c from t2 where a=2 and b=5 order by a,b,c desc;
>                                         QUERY PLAN
> -----------------------------------------------------------------------------------------
>   Unique  (cost=0.43..8361.56 rows=500 width=12)
>     ->  Index Only Scan using t2_a_b_c_idx on t2  (cost=0.43..8361.56 rows=9807 width=12)
>           Index Cond: ((a = 2) AND (b = 5))
> (3 rows)
> 
> When doing a distinct on (params) and having equality conditions for all params, it falls back to the regular index
scaneven though there's no reason not to use the skip scan here. It's much faster to write b between 5 and 5 now rather
thanwriting b=5. I understand this was a limitation of the unique-keys patch at the moment which could be addressed in
thefuture. I think for the sake of consistency it would make sense if this eventually gets addressed.
 
> 

Agreed, that it is an improvement that should be made. I would like 
David's view on this since it relates to the UniqueKey patch.

> - nodeIndexScan.c, line 126
> This sets xs_want_itup to true in all cases (even for non skip-scans). I don't think this is acceptable given the
side-effectsthis has (page will never be unpinned in between returned tuples in _bt_drop_lock_and_maybe_pin)
 
>

Correct - fixed.

> - nbsearch.c, _bt_skip, line 1440
> _bt_update_skip_scankeys(scan, indexRel); This function is called twice now - once in the else {} and immediately
afterthat outside of the else. The second call can be removed I think.
 
> 

Yes, removed the "else" call site.

> - nbtsearch.c _bt_skip line 1597
>                 LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
>                 scan->xs_itup = (IndexTuple) PageGetItem(page, itemid);
> 
> This is an UNLOCK followed by a read of the unlocked page. That looks incorrect?
> 

Yes, that needed to be changed.

> - nbtsearch.c _bt_skip line 1440
> if (BTScanPosIsValid(so->currPos) &&
>         _bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
> 
> Is it allowed to look at the high key / low key of the page without have a read lock on it?
> 

In case of a split the page will still contain a high key and a low key, 
so this should be ok.

> - nbtsearch.c line 1634
> if (_bt_readpage(scan, indexdir, offnum))  ...
> else
>   error()
> 
> Is it really guaranteed that a match can be found on the page itself? Isn't it possible that an extra index
condition,not part of the scan key, makes none of the keys on the page match?
 
> 

The logic for this has been changed.

> - nbtsearch.c in general
> Most of the code seems to rely quite heavily on the fact that xs_want_itup forces _bt_drop_lock_and_maybe_pin to
neverrelease the buffer pin. Have you considered that compacting of a page may still happen even if you hold the pin?
[1]I've been trying to come up with cases in which this may break the patch, but I haven't able to produce such a
scenario- so it may be fine. But it would be good to consider again. One thing I was thinking of was a scenario where
pagesplits and/or compacting would happen in between returning tuples. Could this break the _bt_scankey_within_page
checksuch that it thinks the scan key is within the current page, while it actually isn't? Mainly for backward and/or
cursorscans. Forward scans shouldn't be a problem I think. Apologies for being a bit vague as I don't have a clear
exampleready when it would go wrong. It may well be fine, but it was one of the things on my mind.
 
> 

There is a BT_READ lock in place when finding the correct leaf page, or 
searching within the leaf page itself. _bt_vacuum_one_page deletes only 
LP_DEAD tuples, but those are already ignored in _bt_readpage. Peter, do 
you have some feedback for this ?


Please, find the updated patches attached that Dmitry and I made.

Thanks again !

Best regards,
  Jesper

Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Physical replication slot advance is not persistent
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] kqueue