Thread: Adding Support for Copy callback functionality on COPY TO api

Adding Support for Copy callback functionality on COPY TO api

From
"Sanaba, Bilva"
Date:

Hi hackers,

 

Currently, the COPY TO api does not support callback functions, while the COPY FROM api does. The COPY TO code does, however, include placeholders for supporting callbacks in the future. 

 

Rounding out the support of callback functions to both could be very beneficial for extension development. In particular, supporting callbacks for COPY TO will allow developers to utilize the preexisting command in order to create tools that give users more support for moving data for storage, backup, analytics, etc. 

 

We are aiming to get the support in core PostgreSQL and add COPY TO callback support in the next commitfest.  The attached patch contains a change to COPY TO api to support callbacks. 

 

Best,

Bilva

 

Attachment

Re: Adding Support for Copy callback functionality on COPY TO api

From
Soumyadeep Chakraborty
Date:
Hi Bilva,

Thank you for registering this patch!

I had a few suggestions:

1. Please run pg_indent[1] on your code. Make sure you add
copy_data_destination_cb to src/tools/pgindent/typedefs.list. Please
run pg_indent on only the files you changed (it will take files as
command line args)

2. For features such as this, it is often helpful to find a use case
within backend/utility/extension code that demonstrate thes callback and
to include the code to exercise it with the patch. Refer how
copy_read_data() is used as copy_data_source_cb, to copy the data from
the query results from the WAL receiver (Refer: copy_table()). Finding
a similar use case in the source tree will make a stronger case
for this patch.

3. Wouldn't we want to return the number of bytes written from
copy_data_destination_cb? (Similar to copy_data_source_cb) We should
also think about how to represent failure. Looking at CopySendEndOfRow(),
we should error out like we do for the other copy_dests after checking the
return value for the callback invocation.

4.
> bool pipe = (cstate->filename == NULL) && (cstate->data_destination_cb == NULL);

I think a similar change should also be applied to BeginCopyFrom() and
CopyFrom(). Or even better, such code could be refactored to have a
separate destination type COPY_PIPE. This of course, will be a separate
patch. I think the above line is okay for this patch.

Regards,
Soumyadeep



Re: Adding Support for Copy callback functionality on COPY TO api

From
Michael Paquier
Date:
On Mon, Sep 14, 2020 at 04:28:12PM -0700, Soumyadeep Chakraborty wrote:
> I think a similar change should also be applied to BeginCopyFrom() and
> CopyFrom(). Or even better, such code could be refactored to have a
> separate destination type COPY_PIPE. This of course, will be a separate
> patch. I think the above line is okay for this patch.

This feedback has not been answered after two weeks, so I have marked
the patch as returned with feedback.
--
Michael

Attachment

Re: Adding Support for Copy callback functionality on COPY TO api

From
"Andrey V. Lepikhov"
Date:
On 7/2/20 2:41 AM, Sanaba, Bilva wrote:
> Hi hackers,
> 
> Currently, the COPY TO api does not support callback functions, while 
> the COPY FROM api does. The COPY TO code does, however, include 
> placeholders for supporting callbacks in the future.
> 
> Rounding out the support of callback functions to both could be very 
> beneficial for extension development. In particular, supporting 
> callbacks for COPY TO will allow developers to utilize the preexisting 
> command in order to create tools that give users more support for moving 
> data for storage, backup, analytics, etc.
> 
> We are aiming to get the support in core PostgreSQL and add COPY TO 
> callback support in the next commitfest.The attached patch contains a 
> change to COPY TO api to support callbacks.
> 
Your code almost exactly the same as proposed in [1] as part of 'Fast 
COPY FROM' command. But it seems there are differences.

[1] 
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

-- 
regards,
Andrey Lepikhov
Postgres Professional



Re: Adding Support for Copy callback functionality on COPY TO api

From
Nathan Bossart
Date:
On Wed, Sep 30, 2020 at 04:41:51PM +0900, Michael Paquier wrote:
> This feedback has not been answered after two weeks, so I have marked
> the patch as returned with feedback.

I've rebased this patch and will register it in the next commitfest
shortly.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Adding Support for Copy callback functionality on COPY TO api

From
Michael Paquier
Date:
On Tue, Aug 02, 2022 at 04:49:19PM -0700, Nathan Bossart wrote:
> I've rebased this patch and will register it in the next commitfest
> shortly.

Perhaps there should be a module in src/test/modules/ to provide a
short, still useful, example of what this could achieve?
--
Michael

Attachment

Re: Adding Support for Copy callback functionality on COPY TO api

From
Nathan Bossart
Date:
On Fri, Oct 07, 2022 at 03:49:31PM +0900, Michael Paquier wrote:
> Perhaps there should be a module in src/test/modules/ to provide a
> short, still useful, example of what this could achieve?

Here is an attempt at adding such a test module.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Adding Support for Copy callback functionality on COPY TO api

From
Michael Paquier
Date:
On Fri, Oct 07, 2022 at 02:48:24PM -0700, Nathan Bossart wrote:
> Here is an attempt at adding such a test module.

Using an ereport(NOTICE) to show the data reported in the callback is
fine by me.  How about making the module a bit more modular, by
passing as argument a regclass and building a list of arguments with
it?  You may want to hold the ShareAccessLock on the relation until
the end of the transaction in this example.
--
Michael

Attachment

Re: Adding Support for Copy callback functionality on COPY TO api

From
Nathan Bossart
Date:
On Sat, Oct 08, 2022 at 02:11:38PM +0900, Michael Paquier wrote:
> Using an ereport(NOTICE) to show the data reported in the callback is
> fine by me.  How about making the module a bit more modular, by
> passing as argument a regclass and building a list of arguments with
> it?  You may want to hold the ShareAccessLock on the relation until
> the end of the transaction in this example.

Yeah, that makes more sense.  It actually simplifies things a bit, too.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Adding Support for Copy callback functionality on COPY TO api

From
Nathan Bossart
Date:
On Sat, Oct 08, 2022 at 10:37:41AM -0700, Nathan Bossart wrote:
> Yeah, that makes more sense.  It actually simplifies things a bit, too.

Sorry for the noise.  There was an extra #include in v4 that I've removed
in v5.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Adding Support for Copy callback functionality on COPY TO api

From
Bharath Rupireddy
Date:
On Sun, Oct 9, 2022 at 2:44 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> Sorry for the noise.  There was an extra #include in v4 that I've removed
> in v5.

IIUC, COPY TO callback helps move a table's data out of postgres
server. Just wondering, how is it different from existing solutions
like COPY TO ... PROGRAM/FILE, logical replication, pg_dump etc. that
can move a table's data out? I understandb that the COPY FROM callback
was needed for logical replication 7c4f52409. Mentioning a concrete
use-case helps here.

I'm not quite sure if we need a separate module to just tell how to
use this new callback. I strongly feel that it's not necessary. It
unnecessarily creates extra code (actual code is 25 LOC with v1 patch
but 150 LOC with v5 patch) and can cause maintenance burden. These
callback APIs are simple enough to understand for those who know
BeginCopyTo() or BeginCopyFrom() and especially for those who know how
to write extensions. These are not APIs that an end-user uses. The
best would be to document both COPY FROM and COPY TO callbacks,
perhaps with a pseudo code specifying just the essence [1], and their
possible usages somewhere here
https://www.postgresql.org/docs/devel/sql-copy.html.

The order of below NOTICE messages isn't guaranteed and it can change
depending on platforms. Previously, we've had to suppress such
messages in the test output 6adc5376d.

+SELECT test_copy_to_callback('public.test'::pg_catalog.regclass);
+NOTICE:  COPY TO callback called with data "1    2    3" and length 5
+NOTICE:  COPY TO callback called with data "12    34    56" and length 8
+NOTICE:  COPY TO callback called with data "123    456    789" and length 11
+ test_copy_to_callback

[1]
+    Relation    rel = table_open(PG_GETARG_OID(0), AccessShareLock);
+    CopyToState cstate;
+
+    cstate = BeginCopyTo(NULL, rel, NULL, RelationGetRelid(rel), NULL, NULL,
+                         to_cb, NIL, NIL);
+    (void) DoCopyTo(cstate);
+    EndCopyTo(cstate);
+
+    table_close(rel, AccessShareLock);

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Adding Support for Copy callback functionality on COPY TO api

From
Nathan Bossart
Date:
On Mon, Oct 10, 2022 at 12:41:40PM +0530, Bharath Rupireddy wrote:
> IIUC, COPY TO callback helps move a table's data out of postgres
> server. Just wondering, how is it different from existing solutions
> like COPY TO ... PROGRAM/FILE, logical replication, pg_dump etc. that
> can move a table's data out? I understandb that the COPY FROM callback
> was needed for logical replication 7c4f52409. Mentioning a concrete
> use-case helps here.

This new callback allows the use of COPY TO's machinery in extensions.  A
couple of generic use-cases are listed upthread [0], and one concrete
use-case is the aws_s3 extension [1].

> I'm not quite sure if we need a separate module to just tell how to
> use this new callback. I strongly feel that it's not necessary. It
> unnecessarily creates extra code (actual code is 25 LOC with v1 patch
> but 150 LOC with v5 patch) and can cause maintenance burden. These
> callback APIs are simple enough to understand for those who know
> BeginCopyTo() or BeginCopyFrom() and especially for those who know how
> to write extensions. These are not APIs that an end-user uses. The
> best would be to document both COPY FROM and COPY TO callbacks,
> perhaps with a pseudo code specifying just the essence [1], and their
> possible usages somewhere here
> https://www.postgresql.org/docs/devel/sql-copy.html.
> 
> The order of below NOTICE messages isn't guaranteed and it can change
> depending on platforms. Previously, we've had to suppress such
> messages in the test output 6adc5376d.

I really doubt that this small test case is going to cause anything
approaching undue maintenance burden.  I think it's important to ensure
this functionality continues to work as expected long into the future.

[0] https://postgr.es/m/253C21D1-FCEB-41D9-A2AF-E6517015B7D7%40amazon.com
[1] https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/postgresql-s3-export.html#aws_s3.export_query_to_s3

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Adding Support for Copy callback functionality on COPY TO api

From
Michael Paquier
Date:
On Mon, Oct 10, 2022 at 09:38:59AM -0700, Nathan Bossart wrote:
> This new callback allows the use of COPY TO's machinery in extensions.  A
> couple of generic use-cases are listed upthread [0], and one concrete
> use-case is the aws_s3 extension [1].

FWIW, I understand that the proposal is to have an easier control of
how, what and where to the data is processed.  COPY TO PROGRAM
provides that with exactly the same kind of interface (data input, its
length) once you have a program able to process the data piped out the
same way.  However, it is in the shape of an external process that
receives the data through a pipe hence it provides a much wider attack
surface which is something that all cloud provider care about.  The
thing is that this allows extension developers to avoid arbitrary
commands on the backend as the OS user running the Postgres instance,
while still being able to process the data the way they want
(auditing, analytics, whatever) within the strict context of the
process running an extension code.  I'd say that this is a very cheap
change to allow people to have more fun with the backend engine
(similar to the recent changes with archive libraries for
archive_command, but much less complex):
 src/backend/commands/copy.c   |  2 +-
 src/backend/commands/copyto.c | 18 +++++++++++++++---
 2 files changed, 16 insertions(+), 4 deletions(-)

(Not to mention that we've had our share of CVEs regarding COPY
PROGRAM even if it is superuser-only).

> I really doubt that this small test case is going to cause anything
> approaching undue maintenance burden.  I think it's important to ensure
> this functionality continues to work as expected long into the future.

I like these toy modules, they provide test coverage while acting as a
template for new developers.  I am wondering whether it should have
something for the copy from callback, actually, as it is named
"test_copy_callbacks" but I see no need to extend the module more than
necessary in the context of this thread (logical decoding uses it,
anyway).
--
Michael

Attachment

Re: Adding Support for Copy callback functionality on COPY TO api

From
Nathan Bossart
Date:
On Tue, Oct 11, 2022 at 09:01:41AM +0900, Michael Paquier wrote:
> I like these toy modules, they provide test coverage while acting as a
> template for new developers.  I am wondering whether it should have
> something for the copy from callback, actually, as it is named
> "test_copy_callbacks" but I see no need to extend the module more than
> necessary in the context of this thread (logical decoding uses it,
> anyway).

Yeah, I named it that way because I figured we might want a test for the
COPY FROM callback someday.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Adding Support for Copy callback functionality on COPY TO api

From
Michael Paquier
Date:
On Wed, Sep 30, 2020 at 01:48:12PM +0500, Andrey V. Lepikhov wrote:
> Your code almost exactly the same as proposed in [1] as part of 'Fast COPY
> FROM' command. But it seems there are differences.
>
> [1] https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

I have been looking at what you have here while reviewing the contents
of this thread, and it seems to me that you should basically be able
to achieve the row-level control that your patch is doing with the
callback to do the per-row processing posted here.  The main
difference, though, is that you want to have more control at the
beginning and the end of the COPY TO processing which explains the
split of DoCopyTo().  I am a bit surprised to see this much footprint
in the backend code once there are two FDW callbacks to control the
beginning and the end of the COPY TO, to be honest, sacrifying a lot
the existing symmetry between the COPY TO and COPY FROM code paths
where there is currently a strict control on the pre-row and post-row
processing like the per-row memory context.
--
Michael

Attachment

Re: Adding Support for Copy callback functionality on COPY TO api

From
Michael Paquier
Date:
On Mon, Oct 10, 2022 at 05:06:39PM -0700, Nathan Bossart wrote:
> Yeah, I named it that way because I figured we might want a test for the
> COPY FROM callback someday.

Okay.  So, I have reviewed the whole thing, added a description of all
the fields of BeginCopyTo() in its top comment, tweaked a few things
and added in the module an extra NOTICE with the number of processed
rows.  The result seemed fine, so applied.
--
Michael

Attachment

Re: Adding Support for Copy callback functionality on COPY TO api

From
Nathan Bossart
Date:
On Tue, Oct 11, 2022 at 11:52:03AM +0900, Michael Paquier wrote:
> Okay.  So, I have reviewed the whole thing, added a description of all
> the fields of BeginCopyTo() in its top comment, tweaked a few things
> and added in the module an extra NOTICE with the number of processed
> rows.  The result seemed fine, so applied.

Thanks!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com