Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | CAD21AoBwxgfkMYxgPWyrLG-r8-ptVKjd=jhncY_QAaVJYhQQdw@mail.gmail.com Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Michael Paquier <michael@paquier.xyz>) |
List | pgsql-hackers |
On Wed, Jun 11, 2025 at 7:34 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, May 26, 2025 at 10:04:05AM +0900, Sutou Kouhei wrote: > > As I already said, I don't have a strong opinion on which > > approach is better. My opinion for the (important) second > > point is no. I feel that the pros of a. isn't realistic. If > > users want to improve text/csv/binary performance (or > > something), they should improve PostgreSQL itself instead of > > replacing it as an extension. (Or they should create another > > custom copy format such as "faster_text" not "text".) > > Patches welcome. Andres may have a TODO board regarding that, I > think. > > > So I'm OK with the approach b. > > Here is an opinion. Thank you for the comments. > > Approach (b), that uses _PG_init() a function to register a custom > format has the merit to be simple to implement and secure by "design", > because it depends only on the fact that we can do a lookup based on > the string defined in one or more DefElems. Adding a dependendy to > search_path as you say could lead to surprising results. > > Using a shared ID when a COPY method is registered (like extension > wait events) or an ID that's static in a backend (like EXPLAIN > extensibility does) is an implementation difference that can be useful > for monitoring, and only that AFAIK. If you want to implement > method-based statistics for COPY, you will want to allocate one stats > kind for each COPY method, because the stats stored will be aggregates > of the COPY methods. The stats kind ID is something that should not > be linked to the COPY method ID, because the stats kind ID is > registered in its own dedicated path, and it would be hardcoded in the > library where the COPY callbacks are defined. So you could have a > stats kind with a fixed ID, and a COPY method ID that's linked to each > backend like EXPLAIN does. Good point. > > One factor to take into account is how much freedom we are OK with > giving to users when it comes to the deployment of custom COPY > methods, and how popular these would be. Cloud is popular these days, > so folks may want to be able to define pointers to functions that are > run in something else than C, as long as the language is trusted. My > take on this part is that we are not going to see many formats out > there that would benefit from these callbacks, so asking for people to > deploy a .so on disk that can only be LOAD'ed or registered with one > of the preloading GUCs should be enough to satisfy most users, even if > the barrier entry to get that only a cloud instead like RDS or Azure > is higher. This has also the benefit in giving more control on the > COPY internals to cloud providers, as they are the ones who would be > in charge of saying if they're OK with a dedicated .so or not. Not > the users themselves. We've had a lot of bad PR and false CVEs in the > past with COPY FROM/TO PROGRAM and the fact that it requires > superusers. Having something in this area that gives more freedom to > the user with something like approach (a) (SQL functions allowed to > define the callback) will, I suspect, bite us back hard. That's a valid point and I agree. > > So, my opinion is to rely on _PG_init(), with a shared ID if you want > to expose the method used somewhere for monitoring tools. You could > as well implement the simpler set of APIs that allocates IDs local to > each backend, like EXPLAIN, then consider later if shared IDs are > really needed. The registration APIs don't have to be fixed in time > across releases, they can be always improved in steps as required. > What matters is ABI compatibility in the same major version once it is > released. +1 to start with a simpler set of APIs. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: