Thread: Hide exposed impl detail of wchar.c
Hello, I work on pgrx, a Rust crate (really, a set of them) that allows people to use Rust to write extensions against Postgres, exporting what Postgres sees as ordinary "C" dynamic libraries. Unfortunately, the build system for this is a touch complicated, as it cannot simply run pgxs.mk, and as-of Postgres 16 it has been periodically failing on platforms it used to do fine on, due to troubles involved with the SIMD extension headers. I have root-caused the exact problem, but the bug is... social, rather than technical in nature: people with inadequate options at their disposal have implemented increasingly clever educated guesses that are increasingly prone to going wrong, rather than just asking anyone to help them increase their options. Rather than continuing this trend, I figured I would simply start doing things to hopefully draw the line here. I will be looking to follow up with the bindgen tools that fail to handle this correctly, but it would be nice if this stopped shipping in Postgres 16."${PG_NEXT_MINOR}", as pgrx does need the definitions in pg_wchar.h to have enough data to correctly determine database encoding and preserve certain Rust library invariants ("all Rust strings are correctly-formed UTF-8, anything else is just a sequence of bytes") without also obliterating performance. On the off-chance that everyone agrees with me without reserve, the attached patch moves the relevant code around (and includes the gory details). This seems to be unlikely to be the only mildly-exotic build system failure caused by such an overexposed implementation detail, so while I'm not married to this particular code motion, it seems best to improve this some way.
Attachment
Jubilee Young <workingjubilee@gmail.com> writes: > I have root-caused the exact problem, but the bug is... social, rather > than technical in nature: people with inadequate options at their > disposal have implemented increasingly clever educated guesses that > are increasingly prone to going wrong, rather than just asking anyone > to help them increase their options. Rather than continuing this > trend, I figured I would simply start doing things to hopefully draw > the line here. I will be looking to follow up with the bindgen tools > that fail to handle this correctly, but it would be nice if this > stopped shipping in Postgres 16."${PG_NEXT_MINOR}", as pgrx does need > the definitions in pg_wchar.h to have enough data to correctly > determine database encoding and preserve certain Rust library > invariants ("all Rust strings are correctly-formed UTF-8, anything > else is just a sequence of bytes") without also obliterating > performance. It would be nice if you would state your problem straightforwardly, rather than burying us in irrelevant-to-us details; but apparently what you are unhappy about is that pg_wchar.h now #include's simd.h. That evidently stems from commit 121d2d3d7 trying to make is_valid_ascii() faster. Currently the only caller of is_valid_ascii() is in wchar.c, and so we could easily fix this by moving is_valid_ascii() into wchar.c as your patch proposes. However ... I suppose the point of having it as a "static inline" in a header file is to be able to optimize other call sites too. So I wonder if there used to be some, or this was just somebody's over-eagerness to expose stuff they thought possibly might be useful. And maybe more to the point, are we worried about there being other callers in future? I'm really not sure. regards, tom lane
On Thu, Nov 16, 2023 at 12:10:59PM -0800, Jubilee Young wrote: > On the off-chance that everyone agrees with me without reserve, the > attached patch moves the relevant code around (and includes the gory > details). This seems to be unlikely to be the only mildly-exotic build > system failure caused by such an overexposed implementation detail, so > while I'm not married to this particular code motion, it seems best to > improve this some way. It looks like is_valid_ascii() was originally added to pg_wchar.h so that it could easily be used elsewhere [0] [1], but that doesn't seem to have happened yet. Would moving this definition to a separate header file be a viable option? That'd still break any existing projects that are using it, but at least there'd be an easy fix. I'm not sure there _are_ any other projects using it, anyway. However, both of these proposals feel like they might be slightly beyond what we'd ordinarily consider back-patching. That being said, it's not unheard of for Postgres to make adjustments for third-party code (search for "pljava" in commits 62aba76 and f4aa3a1). I read the description of the pgrx problem [2], and I'm still trying to understand the scope of the issue. I don't think it's reasonable to expect someone building an extension to always use the exact same compiler that was used by the packager, but I also don't fully understand why different compilations of an inline function would cause problems. [0] https://postgr.es/m/CAFBsxsHG%3Dg6W8Mie%2B_NO8dV6O0pO2stxrnS%3Dme5ZmGqk--fd5g%40mail.gmail.com [1] https://postgr.es/m/CAFBsxsH1Yutrmu%2B6LLHKK8iXY%2BvG--Do6zN%2B2900spHXQNNQKQ%40mail.gmail.com [2] https://github.com/pgcentralfoundation/pgrx/issues/1298 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Nathan Bossart <nathandbossart@gmail.com> writes: > It looks like is_valid_ascii() was originally added to pg_wchar.h so that > it could easily be used elsewhere [0] [1], but that doesn't seem to have > happened yet. It seems to be new as of v15, so there wouldn't have been a lot of time for external code to adopt it. As far as I can tell from Debian Code Search, nobody has yet. > Would moving this definition to a separate header file be a viable option? > That'd still break any existing projects that are using it, but at least > there'd be an easy fix. That would provide a little bit of cover, at least, compared to just hiding it in the .c file. I'm generally sympathetic to the idea that simd.h was a rather large dependency to add to something as widely used as pg_wchar.h. So I'd favor getting it out of there just on compilation-time grounds, independently of whether it's causing active problems. That argument wouldn't justify a back-patch, but "it's causing problems" might. regards, tom lane
On Thu, Nov 16, 2023 at 2:54 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > That being said, it's not unheard of for Postgres to make adjustments for > third-party code (search for "pljava" in commits 62aba76 and f4aa3a1). I > read the description of the pgrx problem [2], and I'm still trying to > understand the scope of the issue. I don't think it's reasonable to expect > someone building an extension to always use the exact same compiler that > was used by the packager, but I also don't fully understand why different > compilations of an inline function would cause problems. > > [0] https://postgr.es/m/CAFBsxsHG%3Dg6W8Mie%2B_NO8dV6O0pO2stxrnS%3Dme5ZmGqk--fd5g%40mail.gmail.com > [1] https://postgr.es/m/CAFBsxsH1Yutrmu%2B6LLHKK8iXY%2BvG--Do6zN%2B2900spHXQNNQKQ%40mail.gmail.com > [2] https://github.com/pgcentralfoundation/pgrx/issues/1298 > We don't directly `#include` C into Rust, but use libclang to preprocess and compile a wrapping C header into a list of symbols Rust will look for at link time. Our failure is in libclang and how we steer it: - The Clang-C API (libclang.so) cannot determine where system headers are. - A clang executable can determine where system headers are, but our bindgen may be asked to use a libclang.so without a matching clang executable! - This is partly because system packagers do not agree on what clang parts must be installed together, nor even on the clang directory's layout. - Thus, it is currently impossible to, given a libclang.so, determine with 100% accuracy where version-appropriate system headers are and include them, nor does it do so implicitly. - Users cannot be expected to always have reasonable directory layouts, nor always have one clang + libclang.so + clang/"$MAJOR"/include on the system. - We have tried various solutions and have had several users report, by various channels, that their builds are breaking, even after they charitably try out the patches I offer in their CI. Especially after system updates. The clang-sys and rust-bindgen crates committed a series of unfortunate hacks that surprisingly work. But the only real solution is actually exposing the C++ API for header searches to Clang-C, and then move that up the deps chain. Perhaps libclang-18.so will not have this problem? - Jubilee
On Thu, Nov 16, 2023 at 06:06:30PM -0500, Tom Lane wrote: > I'm generally sympathetic to the idea that simd.h was a rather large > dependency to add to something as widely used as pg_wchar.h. So I'd > favor getting it out of there just on compilation-time grounds, > independently of whether it's causing active problems. That argument > wouldn't justify a back-patch, but "it's causing problems" might. Given the lack of evidence of anyone else using is_valid_ascii(), I'm leaning towards back-patching being the better option in this case. I don't know if it'll be feasible to keep simd.h out of all headers that third-party code might want to use forever, but that's not an argument against doing this right now for pgrx. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Fri, Nov 17, 2023 at 5:54 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > It looks like is_valid_ascii() was originally added to pg_wchar.h so that > it could easily be used elsewhere [0] [1], but that doesn't seem to have > happened yet. > > Would moving this definition to a separate header file be a viable option? Seems fine to me. (I believe the original motivation for making it an inline function was for in pg_mbstrlen_with_len(), but trying that hasn't been a priority.)
On Fri, Nov 17, 2023 at 2:26 AM John Naylor <johncnaylorls@gmail.com> wrote: > > On Fri, Nov 17, 2023 at 5:54 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > > > It looks like is_valid_ascii() was originally added to pg_wchar.h so that > > it could easily be used elsewhere [0] [1], but that doesn't seem to have > > happened yet. > > > > Would moving this definition to a separate header file be a viable option? > > Seems fine to me. (I believe the original motivation for making it an > inline function was for in pg_mbstrlen_with_len(), but trying that > hasn't been a priority.) In that case, I took a look across the codebase and saw a utils/ascii.h that doesn't seem to have gotten much love, but I suppose one could argue that it's intended to be a backend-only header file? As the codebase is growing some enhanced UTF-8 support, you'll want somewhere that contains the optimized US-ASCII routines, because, as US-ASCII is a subset of UTF-8, and often faster to handle, it's typical for such codepaths to look like ```c while (i < len && no_multibyte_chars) { i = i + ascii_op_version(i, buffer, &no_multibyte_chars); } while (i < len) { i = i + utf8_op_version(i, buffer); } ``` So it should probably end up living somewhere near the UTF-8 support, and the easiest way to make it not go into something pgrx currently includes would be to make it a new header file, though there's a fair amount of API we don't touch. From the pgrx / Rust perspective, Postgres function calls are passed via callback to a "guard function" that guarantees that longjmp and setjmp don't cause trouble (and makes sure we participate in that). So we only want to call Postgres functions if we "can't replace" them, as the overhead is quite a lot. That means UTF-8-per-se functions aren't very interesting to us as the Rust language already supports it, but we do benefit from access to transcoding to/from UTF-8. —Jubilee
On Mon, Nov 20, 2023 at 10:50:36AM -0800, Jubilee Young wrote: > In that case, I took a look across the codebase and saw a > utils/ascii.h that doesn't > seem to have gotten much love, but I suppose one could argue that it's intended > to be a backend-only header file? That might work. It's not #included in very many files, so adding port/simd.h shouldn't be too bad. And ascii.h is also pretty inexpensive, so including it in wchar.c seems permissible, too. I'm not certain this doesn't cause problems with libpgcommon, but I don't see why it would, either. > So it should probably end up living somewhere near the UTF-8 support, and > the easiest way to make it not go into something pgrx currently > includes would be > to make it a new header file, though there's a fair amount of API we > don't touch. Does pgrx use ascii.h at all? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Hi, On 2023-11-16 17:11:03 -0800, Jubilee Young wrote: > We don't directly `#include` C into Rust, but use libclang to preprocess and > compile a wrapping C header into a list of symbols Rust will look for at link > time. Our failure is in libclang and how we steer it: > - The Clang-C API (libclang.so) cannot determine where system headers are. > - A clang executable can determine where system headers are, but our bindgen > may be asked to use a libclang.so without a matching clang executable! > - This is partly because system packagers do not agree on what clang parts > must be installed together, nor even on the clang directory's layout. > - Thus, it is currently impossible to, given a libclang.so, determine with > 100% accuracy where version-appropriate system headers are and include them, > nor does it do so implicitly. I remember battling this in the past, independent of rust :( What I don't quite get is why SIMD headers are particularly more problematic than others - there's other headers that are compiler specific? Greetings, Andres Freund
On Mon, Nov 20, 2023 at 05:14:17PM -0800, Jubilee Young wrote: > On Mon, Nov 20, 2023 at 2:52 PM Nathan Bossart <nathandbossart@gmail.com> wrote: >> Does pgrx use ascii.h at all? > > We don't use utils/ascii.h, no. Alright. The next minor release isn't until February, so I'll let this one sit a little while longer in case anyone objects to back-patching something like this [0]. [0] https://postgr.es/m/attachment/152305/move_is_valid_ascii_v2.patch -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
> On Nov 20, 2023, at 7:10 PM, Andres Freund <andres@anarazel.de> wrote: > > > What I don't quite get is why SIMD headers are particularly more problematic > than others - there's other headers that are compiler specific? The short answer is the rust-based bindings generation tool pgrx uses (bindgen) is a little brain dead and gets confusedwhen there’s multiple compiler builtins headers on the host. This simd header is the first of its kind we’ve run across that’s exposed via Postgres’ “public api”. And bindgen wants tofollow all the includes, it gets confused, picks the wrong one, and then errors happen. And I don’t know that it makes much sense for Postgres to include such a header into 3rd-party code anyways. I think Jubilee is also working with them to fix this, but we’re hoping Jubilee’s patch here (or similar) can get mergedso we can clear our build drama. eric
(I hope you don't mind I'm reposting your reply -- I accidentally replied directly to you b/c phone) > On Nov 21, 2023, at 11:56 AM, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2023-11-21 10:11:18 -0500, Eric Ridge wrote: >> On Mon, Nov 20, 2023 at 7:10 PM Andres Freund <andres@anarazel.de> wrote: <snip> >> And I don’t know that it makes much sense for Postgres to include such a >> header into 3rd-party code anyways. > > Well, we want to expose such functionality to extensions. For some cases using > full functions would to be too expensive, hence using static inlines. In case > of exposing simd stuff, that means we need to include headers. Sure. Probably not my place to make that kind of broad statement anyways. The "static inlines" are problematic for us inpgrx-land too, but that's a different problem for another day. > I'm not against splitting this out of pg_wchar.h, to be clear - that's a too > widely used header for, so there's a good independent reason for such a > change. I just don't really believe that moving simd.h out of there will end > the issue, we'll add more inlines using simd over time... Yeah and that's why Jubilee is working with the bindgen folks to tighten this up for good. (We tracked all of the pg16 betas and didn't run into this until after pg16 went gold. I personally haven't groveled throughthe git logs to see when this header/static inline was added, but we'd have reported this sooner had we found it sooner.) eric
On Mon, Nov 20, 2023 at 10:39:43PM -0600, Nathan Bossart wrote: > Alright. The next minor release isn't until February, so I'll let this one > sit a little while longer in case anyone objects to back-patching something > like this [0]. > > [0] https://postgr.es/m/attachment/152305/move_is_valid_ascii_v2.patch Barring objections, I plan to commit this and back-patch it to v16 in the next few days. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Thu, Jan 04, 2024 at 04:43:29PM -0600, Nathan Bossart wrote: > On Mon, Nov 20, 2023 at 10:39:43PM -0600, Nathan Bossart wrote: >> Alright. The next minor release isn't until February, so I'll let this one >> sit a little while longer in case anyone objects to back-patching something >> like this [0]. >> >> [0] https://postgr.es/m/attachment/152305/move_is_valid_ascii_v2.patch > > Barring objections, I plan to commit this and back-patch it to v16 in the > next few days. Apologies for the delay. We're getting close to the February release, so I should probably take care of this one soon... I see that I was planning on back-patching this to v16, but since is_valid_ascii() was introduced in v15, I'm wondering if it'd be better to back-patch it there so that is_valid_ascii() lives in the same file for all versions where it exists. Thoughts? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Nathan Bossart <nathandbossart@gmail.com> writes: > I see that I was planning on back-patching this to v16, but since > is_valid_ascii() was introduced in v15, I'm wondering if it'd be better to > back-patch it there so that is_valid_ascii() lives in the same file for all > versions where it exists. Thoughts? Yeah, if we're going to back-patch at all, that probably makes sense. regards, tom lane
On Fri, Jan 26, 2024 at 01:24:19PM -0500, Tom Lane wrote: > Nathan Bossart <nathandbossart@gmail.com> writes: >> I see that I was planning on back-patching this to v16, but since >> is_valid_ascii() was introduced in v15, I'm wondering if it'd be better to >> back-patch it there so that is_valid_ascii() lives in the same file for all >> versions where it exists. Thoughts? > > Yeah, if we're going to back-patch at all, that probably makes sense. Committed/back-patched. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com