Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Date
Msg-id CAD21AoAY_h-9nuhs14e3cyO_A2rH7==zuq+NPHkn9ggwyaXnPQ@mail.gmail.com
In response to Re: Make COPY format extendable: Extract COPY TO format implementations  (Sutou Kouhei <kou@clear-code.com>)
List pgsql-hackers
On Fri, May 9, 2025 at 1:51 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoD9CBjh4u6jdiE0tG-jvejw-GJN8fUPoQSVhKh36HW2NQ@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 2 May 2025 23:37:46 -0700,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > The progress information is stored in PgBackendStatus defined in
> > backend_status.h:
> >
> >     /*
> >      * Command progress reporting.  Any command which wishes can advertise
> >      * that it is running by setting st_progress_command,
> >      * st_progress_command_target, and st_progress_param[].
> >      * st_progress_command_target should be the OID of the relation which the
> >      * command targets (we assume there's just one, as this is meant for
> >      * utility commands), but the meaning of each element in the
> >      * st_progress_param array is command-specific.
> >      */
> >     ProgressCommandType st_progress_command;
> >     Oid         st_progress_command_target;
> >     int64       st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
> >
> > Then the progress view maps the numbers to the corresponding strings:
> >
> > CREATE VIEW pg_stat_progress_copy AS
> >     SELECT
> >         S.pid AS pid, S.datid AS datid, D.datname AS datname,
> >         S.relid AS relid,
> >         CASE S.param5 WHEN 1 THEN 'COPY FROM'
> >                       WHEN 2 THEN 'COPY TO'
> >                       END AS command,
> >         CASE S.param6 WHEN 1 THEN 'FILE'
> >                       WHEN 2 THEN 'PROGRAM'
> >                       WHEN 3 THEN 'PIPE'
> >                       WHEN 4 THEN 'CALLBACK'
> >                       END AS "type",
> >         S.param1 AS bytes_processed,
> >         S.param2 AS bytes_total,
> >         S.param3 AS tuples_processed,
> >         S.param4 AS tuples_excluded,
> >         S.param7 AS tuples_skipped
> >     FROM pg_stat_get_progress_info('COPY') AS S
> >         LEFT JOIN pg_database D ON S.datid = D.oid;
>
> Thanks. I didn't know about how to implement
> pg_stat_progress_copy.
>
> > So the idea is that the backend process sets the format ID somewhere
> > in st_progress_param, and then the progress view calls a SQL function,
> > say pg_stat_get_copy_format_name(), with the format ID that returns
> > the corresponding format name.
>
> Does it work when we use session_preload_libraries or the
> LOAD command? If we have 2 sessions and both of them load
> "jsonlines" COPY FORMAT extensions, what will be happened?
>
> For example:
>
> 1. Session 1: Register "jsonlines"
> 2. Session 2: Register "jsonlines"
>               (Should global format ID <-> format name mapping
>               be updated?)
> 3. Session 2: Close this session.
>               Unregister "jsonlines".
>               (Can we unregister COPY FORMAT extension?)
>               (Should global format ID <-> format name mapping
>               be updated?)
> 4. Session 1: Close this session.
>               Unregister "jsonlines".
>               (Can we unregister COPY FORMAT extension?)
>               (Should global format ID <-> format name mapping
>               be updated?)

For progress reporting alone, I think sessions 1 and 2 can have
different format IDs for the same 'jsonlines' format if they load it
with the LOAD command. Each session can advertise its format ID in
shared memory, and we can provide a SQL function that the progress
view uses to get the format name from the format ID.
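
As a rough illustration of the lookup side of this idea, here is a
minimal standalone C sketch. It is not PostgreSQL code: the slot
index, the table contents (including the IDs given to the built-in
formats), and every sketch_* name are hypothetical. A real
implementation would keep the mapping in shared memory and expose the
lookup as a SQL function.

```c
/* Standalone sketch; none of these names exist in PostgreSQL. */
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NUM_PROGRESS_PARAM   20  /* arbitrary array size for the sketch */
#define SKETCH_PROGRESS_COPY_FORMAT 7   /* hypothetical slot for the format ID */

typedef struct SketchFormatEntry
{
    int64_t     id;     /* format ID advertised by the backend */
    const char *name;   /* human-readable format name */
} SketchFormatEntry;

/* Hypothetical ID <-> name mapping; the IDs are assumptions. */
static const SketchFormatEntry sketch_formats[] = {
    {1, "text"},
    {2, "csv"},
    {3, "binary"},
    {128, "jsonlines"},
};

/* Stand-in for the per-backend st_progress_param[] array. */
static int64_t sketch_progress_param[SKETCH_NUM_PROGRESS_PARAM];

/* Backend side: store the format ID in the progress array. */
void
sketch_report_copy_format(int64_t format_id)
{
    sketch_progress_param[SKETCH_PROGRESS_COPY_FORMAT] = format_id;
}

/* Reader side: fetch the advertised format ID. */
int64_t
sketch_reported_copy_format(void)
{
    return sketch_progress_param[SKETCH_PROGRESS_COPY_FORMAT];
}

/* View side: map a format ID to its name, or NULL if unknown. */
const char *
sketch_copy_format_name(int64_t format_id)
{
    size_t n = sizeof(sketch_formats) / sizeof(sketch_formats[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (sketch_formats[i].id == format_id)
            return sketch_formats[i].name;
    }
    return NULL;
}
```

The progress view would then call the lookup on the stored value, in
the same way it currently maps param5 and param6 with CASE
expressions.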

Considering that we might also want to use the format ID in the
cumulative statistics, we might want to strictly guarantee a unique
format ID for each custom format, because the format IDs are
serialized to the pgstat file. One possible way to implement this is
to manage the custom format IDs on a wiki page, as we do for custom
cumulative statistics and custom RMGRs[1][2]. That is, a custom
format extension registers its format name along with a format ID
that is either pre-registered on the wiki page or the ID (e.g. 128)
reserved for formats under development. If either the format name or
the format ID conflicts with an already-registered custom format
extension, the registration function raises an error. We would also
preallocate enough format IDs for the built-in formats.
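
To make the conflict rule concrete, here is a minimal standalone C
sketch of such a registration function. All sketch_* names, the table
size, and the assumption that IDs below 128 are the ones reserved for
built-in formats are hypothetical. A real version would presumably
report conflicts with ereport(ERROR, ...) rather than a return code.

```c
/* Standalone sketch; none of these names exist in PostgreSQL. */
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_FORMATS   256    /* arbitrary table size for the sketch */
#define SKETCH_MIN_CUSTOM_ID 128    /* assumption: lower IDs are built-in */

typedef enum SketchRegResult
{
    SKETCH_REG_OK,
    SKETCH_REG_BAD_ID,          /* NULL name or ID outside the custom range */
    SKETCH_REG_ID_CONFLICT,     /* ID already taken by another format */
    SKETCH_REG_NAME_CONFLICT    /* name already registered under another ID */
} SketchRegResult;

/* Registered names indexed by format ID; NULL means the ID is free. */
static const char *sketch_names[SKETCH_MAX_FORMATS];

/*
 * Register a custom format. The name pointer is stored as-is, so the
 * caller must keep the string alive (e.g. pass a string literal).
 */
SketchRegResult
sketch_register_copy_format(int id, const char *name)
{
    if (name == NULL || id < SKETCH_MIN_CUSTOM_ID || id >= SKETCH_MAX_FORMATS)
        return SKETCH_REG_BAD_ID;

    /* Reject an ID that is already in use. */
    if (sketch_names[id] != NULL)
        return SKETCH_REG_ID_CONFLICT;

    /* Reject a name that is already registered under a different ID. */
    for (int i = SKETCH_MIN_CUSTOM_ID; i < SKETCH_MAX_FORMATS; i++)
    {
        if (sketch_names[i] != NULL && strcmp(sketch_names[i], name) == 0)
            return SKETCH_REG_NAME_CONFLICT;
    }

    sketch_names[id] = name;
    return SKETCH_REG_OK;
}
```

With this, registering (128, "jsonlines") succeeds once; a later
(128, "arrow") fails on the ID and (129, "jsonlines") fails on the
name.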

As for unregistration, even if we provide an unregistration API, I
think it ultimately depends on whether custom format extensions call
it in _PG_fini().

[1] https://wiki.postgresql.org/wiki/CustomCumulativeStats
[2] https://wiki.postgresql.org/wiki/CustomWALResourceManagers

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


