Re: Bogus collation version recording in recordMultipleDependencies - Mailing list pgsql-hackers
From | Julien Rouhaud |
---|---|
Subject | Re: Bogus collation version recording in recordMultipleDependencies |
Date | |
Msg-id | 20210418112333.5cbcamht2b7grfnm@nol Whole thread Raw |
In response to | Re: Bogus collation version recording in recordMultipleDependencies (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: Bogus collation version recording in recordMultipleDependencies
Re: Bogus collation version recording in recordMultipleDependencies |
List | pgsql-hackers |
On Sat, Apr 17, 2021 at 10:01:53AM +1200, Thomas Munro wrote: > On Sat, Apr 17, 2021 at 8:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Per the changes in collate.icu.utf8.out, this gets rid of > > a lot of imaginary collation dependencies, but it also gets > > rid of some arguably-real ones. In particular, calls of > > record_eq and its siblings will be considered not to have > > any collation dependencies, although we know that internally > > those will look up per-column collations of their input types. > > We could imagine special-casing record_eq etc here, but that > > sure seems like a hack. > > [...] > > > So I wonder if, rather than continuing to pursue this right now, > > we shouldn't revert 257836a75 and try again later with a new design > > that doesn't try to commandeer the existing dependency infrastructure. > > We might have a better idea about what to do on Windows by the time > > that's done, too. > > It seems to me that there are two things that would be needed to > salvage this for PG14: (1) deciding that we're unlikely to come up > with a better idea than using pg_depend for this (following the > argument that it'd only create duplication to have a parallel > dedicated catalog), (2) fixing any remaining flaws in the dependency > analyser code. I'll look into the details some more on Monday. So IIUC the issue here is that the code could previously record useless collation version dependencies in somes cases, which could lead to false positive possible corruption messages (and of course additional bloat on pg_depend). False positive messages can't be avoided anyway, as a collation version update may not corrupt the actually indexed set of data, especially for glibc. But with the infrastructure as-is advanced user can look into the new version changes and choose to ignore changes for a specific set of collation, which is way easier to do with the recorded dependencies. The new situation is now that the code can record too few version dependencies leading to false negative detection, which is way more problematic. This was previously discussed around [1]. Quoting Thomas: > To state more explicitly what's happening here, we're searching the > expression trees for subexpresions that have a collation as part of > their static type. We don't know which functions or operators are > actually affected by the collation, though. For example, if an > expression says "x IS NOT NULL" and x happens to be a subexpression of > a type with a particular collation, we don't now that this > expression's value can't possibly be affected by the collation version > changing. So, the system will nag you to rebuild an index just > because you mentioned it, even though the index can't be corrupted. > To do better than that, I suppose we'd need declarations in the > catalog to say which functions/operators are collation sensitive. We agreed that having possible false positive dependencies was acceptable for the initial implementation and that we will improve it in later versions, as otherwise the alternative is to reindex everything without getting any warning, which clearly isn't better anyway. FTR was had the same agreement to not handle specific AMs that don't care about collation (like hash or bloom) in [2], even though I provided a patch to handle that case ([3]) which was dropped later on ([4]). Properly and correctly handling collation version dependency in expressions is a hard problem and will definitely require additional fields in pg_proc, so we clearly can't add that in pg14. So yes we have to decide whether we want to keep the feature in pg14 with the known limitations (and in that case probably revert f24b15699, possibly improving documentation on the possibility of false positive) or revert it entirely. Unsurprisingly, I think that the feature as-is is already a significant improvement, which can be easily improved, so my vote is to keep it in pg14. And just to be clear I'm volunteering to work on the expression problem and all other related improvements for the next version, whether the current feature is reverted or not. [1]: https://www.postgresql.org/message-id/CA%2BhUKGK8CwBcTcXWL2kUjpHT%2B6t2hEFCzkcZ-Z7xXbz%3DC4NLCQ%40mail.gmail.com [2]: https://www.postgresql.org/message-id/13b0c950-80f9-4c10-7e0f-f59feac56a98%402ndquadrant.com [3]: https://www.postgresql.org/message-id/20200908144507.GA57691%40nol [4]: https://www.postgresql.org/message-id/CA%2BhUKGKHj4aYmmwKZdZjkD%3DCWRmn%3De6UsS7S%2Bu6oLrrp0orgsw%40mail.gmail.com
pgsql-hackers by date: