Re: Hide exposed impl detail of wchar.c - Mailing list pgsql-hackers

From	Jubilee Young
Subject	Re: Hide exposed impl detail of wchar.c
Date	November 20, 2023 18:50:36
Msg-id	CAPNHn3pGK2cvg4xACegaXjbRkun26F--7d2T4aT5ASvExMZpjg@mail.gmail.com
In response to	Re: Hide exposed impl detail of wchar.c (John Naylor <johncnaylorls@gmail.com>)
Responses	Re: Hide exposed impl detail of wchar.c
List	pgsql-hackers

On Fri, Nov 17, 2023 at 2:26 AM John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Fri, Nov 17, 2023 at 5:54 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >
> > It looks like is_valid_ascii() was originally added to pg_wchar.h so that
> > it could easily be used elsewhere [0] [1], but that doesn't seem to have
> > happened yet.
> >
> > Would moving this definition to a separate header file be a viable option?
>
> Seems fine to me. (I believe the original motivation for making it an
> inline function was for use in pg_mbstrlen_with_len(), but trying that
> hasn't been a priority.)

In that case, I took a look across the codebase and saw a utils/ascii.h that
doesn't seem to have gotten much love, but I suppose one could argue that it's
intended to be a backend-only header file?

As the codebase is growing some enhanced UTF-8 support, you'll want somewhere
that contains the optimized US-ASCII routines. US-ASCII is a subset of UTF-8
and is often faster to handle, so it's typical for such codepaths to look like

```c
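/* Fast path: stay here while only ASCII bytes have been seen. */
while (i < len && no_multibyte_chars) {
    i = i + ascii_op_version(i, buffer, &no_multibyte_chars);
}

/* Slow path: handle whatever remains as general UTF-8. */
while (i < len) {
    i = i + utf8_op_version(i, buffer);
}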
```
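
To make that shape concrete, here is a minimal standalone sketch that counts
code points with an ASCII fast path. The names chunk_is_ascii() and
count_codepoints() are made up for illustration and are not Postgres
functions; the sketch also assumes the input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the 8 bytes at p all have the high bit clear. */
static int chunk_is_ascii(const unsigned char *p)
{
    uint64_t chunk;

    memcpy(&chunk, p, sizeof(chunk));   /* memcpy avoids unaligned reads */
    return (chunk & UINT64_C(0x8080808080808080)) == 0;
}

/* Hypothetical example: count code points in a buffer assumed to be valid UTF-8. */
static size_t count_codepoints(const unsigned char *s, size_t len)
{
    size_t i = 0;
    size_t count = 0;

    assert(s != NULL || len == 0);

    /* Fast path: in an all-ASCII chunk every byte is one character. */
    while (len - i >= 8 && chunk_is_ascii(s + i))
    {
        i += 8;
        count += 8;
    }

    /* Slow path: count every byte that is not a continuation byte (10xxxxxx). */
    for (; i < len; i++)
    {
        if ((s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

As in the loops above, the fast path is abandoned at the first chunk that
contains a multibyte sequence, and the general path finishes the buffer.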

So it should probably end up living somewhere near the UTF-8 support, and the
easiest way to make it not go into something pgrx currently includes would be
to make it a new header file, though there's a fair amount of API we don't
touch.

From the pgrx / Rust perspective, Postgres function calls are passed via
callback to a "guard function" that guarantees that longjmp and setjmp don't
cause trouble (and makes sure we participate in that). So we only want to call
Postgres functions if we "can't replace" them, as the overhead is quite a lot.
That means UTF-8-per-se functions aren't very interesting to us, as the Rust
language already supports UTF-8, but we do benefit from access to transcoding
to/from UTF-8.
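
For anyone unfamiliar with the pattern, a rough C sketch of the guard idea
follows. This is not pgrx's implementation (that lives in Rust) and it does
not use Postgres's real error machinery; current_handler, pending_error,
raise_error() and call_guarded() are all hypothetical names for illustration.
It only shows why every guarded call pays for a setjmp.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-ins for the callee's error-handling state. */
static jmp_buf *current_handler = NULL;
static int pending_error = 0;

typedef void (*guarded_fn)(void *arg);

/* Hypothetical error raise: record the code, then jump to the innermost handler. */
static void raise_error(int code)
{
    assert(current_handler != NULL && code != 0);
    pending_error = code;
    longjmp(*current_handler, 1);
}

/* Run fn(arg) under a guard. Returns 0 on normal return, else the error code. */
static int call_guarded(guarded_fn fn, void *arg)
{
    jmp_buf local;
    jmp_buf *saved = current_handler;   /* unchanged after setjmp, so safe to read later */
    int result;

    if (setjmp(local) == 0)
    {
        current_handler = &local;       /* errors raised by fn now land here */
        fn(arg);
        result = 0;
    }
    else
    {
        result = pending_error;         /* reached via longjmp from raise_error */
    }
    current_handler = saved;            /* always restore the outer handler */
    return result;
}

/* Example callbacks: one returns normally, one raises. */
static void ok_callback(void *arg)
{
    *(int *) arg += 1;
}

static void failing_callback(void *arg)
{
    (void) arg;
    raise_error(42);
}
```

Calling call_guarded(ok_callback, &n) returns 0, while
call_guarded(failing_callback, NULL) turns the jump into an ordinary return
value that the caller can unwind from safely.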

—Jubilee
