Re: [GENERAL] Why "copy ... from stdio" does not return immediately when reading invalid data? - Mailing list pgsql-hackers

From Nicolas Grilly
Subject Re: [GENERAL] Why "copy ... from stdio" does not return immediately when reading invalid data?
Date
Msg-id AANLkTimca1JWHAf1XGRtNAzMG4fNKqjyXN0qDjmA5Qdr@mail.gmail.com
Whole thread Raw
List pgsql-hackers
I have analyzed the PostgreSQL protocol using Wireshark (an open source packet analyzer), and I observed that the PostgreSQL backend, while doing a COPY ... FROM STDIN, reports errors as soon as possible (especially errors related to invalid data).

Therefore, the "late" reporting of errors while doing a COPY ... FROM STDIN is not a limitation of the underlying protocol; it is a limitation (or a design choice) of the libpq library.

It looks like this is a well known issue because it is listed on the todo list:

And was discussed before:

Do you think it is possible to change that behavior, or work around it?

While reading libpq source code, I noticed the function pqParseInput3 (file fe-protocol3.c) ignores error responses while the connection is in PGASYNC_COPY_IN state. Maybe we can make a special case for the COPY FROM subprotocol and handle errors early, in order to make them available to PQgetResult? Is is feasible in a simple way or is it a bad idea?

Regards,

Nicolas Grilly

On Wed, Feb 2, 2011 at 20:06, John R Pierce <pierce@hogranch.com> wrote:
On 02/02/11 10:20 AM, Nicolas Grilly wrote:
Is the copy protocol (aka PQputCopyData and PQputCopyEnd) designed to send gigabytes of data with just one "copy ... from stdio" query, and is there a way to be notified of a potential error before calling PQputCopyEnd? Or do I have to send my data in small chunks (for example batch of 10000 rows), issue a PQputCopyEnd, check for errors, and continue with the next chunk?

I would batch the data, maybe 1000 lines or even 100 lines at a time if these errors are at all frequent.  put the errored batches in an exception list or something so you can sort them out later.


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: little mistakes in HS/SR
Next
From: Tom Lane
Date:
Subject: Re: A different approach to extension NO USER DATA feature