Thread: Don't overwrite scan key in systable_beginscan()
When systable_beginscan() and systable_beginscan_ordered() choose an index scan, they remap the attribute numbers in the passed-in scan keys to the attribute numbers of the index, and then write those remapped attribute numbers back into the scan key passed by the caller. This second part is surprising and gratuitous. It means that a scan key cannot safely be used more than once (but it might sometimes work, depending on circumstances). Also, there is no value in providing these remapped attribute numbers back to the caller, since they can't do anything with that. I propose to fix that by making a copy of the scan keys passed by the caller and make the modifications there. In order to prove to myself that there are no other cases where caller-provided scan keys are modified, I went through and const-qualified all the APIs. This works out correctly. Several levels down in the stack, the access methods make their own copy of the scan keys that they store in their scan descriptors, and they use those in non-const-clean ways, but that's ok, that's their business. As far as the top-level callers are concerned, they can rely on their scan keys to be const after this. I'm not proposing this second patch for committing at this time, since that would modify the public access method APIs in an incompatible way. I've made a proposal of a similar nature in [0]. At some point, it might be worth batching these and other changes together and make the change. I might come back to that later. [0]: https://www.postgresql.org/message-id/flat/14c31f4a-0347-0805-dce8-93a9072c05a5%40eisentraut.org While researching how the scan keys get copied around, I noticed that the index access methods all use memmove() to make the above-mentioned copy into their own scan descriptor. This is fine, but memmove() is usually only used when something special is going on that would prevent memcpy() from working, which is not the case there. So to avoid the confusion for future readers, I changed those to memcpy(). I suspect that this code has been copied between the different index AM over time. (The nbtree version of this code is literally unchanged since July 1996.)
Attachment
On Thu, Aug 8, 2024 at 2:46 AM Peter Eisentraut <peter@eisentraut.org> wrote: > When systable_beginscan() and systable_beginscan_ordered() choose an > index scan, they remap the attribute numbers in the passed-in scan keys > to the attribute numbers of the index, and then write those remapped > attribute numbers back into the scan key passed by the caller. This > second part is surprising and gratuitous. It means that a scan key > cannot safely be used more than once (but it might sometimes work, > depending on circumstances). Also, there is no value in providing these > remapped attribute numbers back to the caller, since they can't do > anything with that. > > I propose to fix that by making a copy of the scan keys passed by the > caller and make the modifications there. This does have the disadvantage of adding more palloc overhead. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Aug 8, 2024 at 2:46 AM Peter Eisentraut <peter@eisentraut.org> wrote: >> I propose to fix that by making a copy of the scan keys passed by the >> caller and make the modifications there. > This does have the disadvantage of adding more palloc overhead. It seems hard to believe that one more palloc + memcpy is going to be noticeable in the context of an index scan. regards, tom lane
On Thu, Aug 08, 2024 at 08:46:35AM +0200, Peter Eisentraut wrote: > When systable_beginscan() and systable_beginscan_ordered() choose an index > scan, they remap the attribute numbers in the passed-in scan keys to the > attribute numbers of the index, and then write those remapped attribute > numbers back into the scan key passed by the caller. This second part is > surprising and gratuitous. It means that a scan key cannot safely be used > more than once (but it might sometimes work, depending on circumstances). > Also, there is no value in providing these remapped attribute numbers back > to the caller, since they can't do anything with that. > > I propose to fix that by making a copy of the scan keys passed by the caller > and make the modifications there. No objection, but this would obsolete at least some of these comments (the catcache.c ones if nothing else): $ git grep -in scankey | grep -i copy src/backend/access/gist/gistscan.c:257: * Copy consistent support function to ScanKey structure instead src/backend/access/gist/gistscan.c:330: * Copy distance support function to ScanKey structure instead of src/backend/access/nbtree/nbtutils.c:281: ScanKey arrayKeyData; /* modified copy of scan->keyData */ src/backend/access/nbtree/nbtutils.c:3303: * We copy the appropriate indoption value into the scankey sk_flags src/backend/access/nbtree/nbtutils.c:3318: * It's a bit ugly to modify the caller's copy of the scankey but in practice src/backend/access/spgist/spgscan.c:385: /* copy scankeys into local storage */ src/backend/utils/cache/catcache.c:1474: * Ok, need to make a lookup in the relation, copy the scankey and src/backend/utils/cache/catcache.c:1794: * Ok, need to make a lookup in the relation, copy the scankeyand src/backend/utils/cache/relfilenumbermap.c:192: /* copy scankey to local copy, it will be modified during the scan*/ > In order to prove to myself that there are no other cases where > caller-provided scan keys are modified, I went through and const-qualified > all the APIs. This works out correctly. Several levels down in the stack, > the access methods make their own copy of the scan keys that they store in > their scan descriptors, and they use those in non-const-clean ways, but > that's ok, that's their business. As far as the top-level callers are > concerned, they can rely on their scan keys to be const after this. We do still have code mutating IndexScanDescData.keyData, but that's fine. We must be copying function args to form IndexScanDescData.keyData, or else your proof patch would have noticed an assignment of const to non-const.
Noah Misch <noah@leadboat.com> writes: > On Thu, Aug 08, 2024 at 08:46:35AM +0200, Peter Eisentraut wrote: >> I propose to fix that by making a copy of the scan keys passed by the caller >> and make the modifications there. > No objection, but this would obsolete at least some of these comments (the > catcache.c ones if nothing else): That ties into something I forgot to ask: aren't there any copy steps or other overhead that we could remove, given this new API constraint? That would help address Robert's concern. regards, tom lane
On 09.08.24 06:55, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: >> On Thu, Aug 08, 2024 at 08:46:35AM +0200, Peter Eisentraut wrote: >>> I propose to fix that by making a copy of the scan keys passed by the caller >>> and make the modifications there. > >> No objection, but this would obsolete at least some of these comments (the >> catcache.c ones if nothing else): > > That ties into something I forgot to ask: aren't there any copy steps > or other overhead that we could remove, given this new API constraint? > That would help address Robert's concern. I added two more patches to the series here. First (or 0004), some additional cleanup for code that had to workaround systable_beginscan() overwriting the scan keys, along the lines suggested by Noah. Second (or 0005), an alternative to palloc is to make the converted scan keys a normal local variable. Then it's just a question of whether a smaller palloc is preferred over an over-allocated local variable. I think I still prefer the palloc version, but it doesn't matter much either way I think.
Attachment
- v2-0001-Don-t-overwrite-scan-key-in-systable_beginscan.patch
- v2-0002-Augment-ScanKey-arguments-with-const-qualifiers.patch
- v2-0003-Replace-gratuitous-memmove-with-memcpy.patch
- v2-0004-Update-some-code-that-handled-systable_beginscan-.patch
- v2-0005-Alternative-Store-converted-key-in-local-variable.patch