Re: Small fix on COPY ON_ERROR document - Mailing list pgsql-hackers
From | David G. Johnston |
---|---|
Subject | Re: Small fix on COPY ON_ERROR document |
Date | |
Msg-id | CAKFQuwZJZ6uaS2B7qpL2FJzWBsnDdzgtbsW3pH9zuT6vC3fH+g@mail.gmail.com Whole thread Raw |
In response to | Re: Small fix on COPY ON_ERROR document (Yugo NAGATA <nagata@sraoss.co.jp>) |
Responses |
Re: Small fix on COPY ON_ERROR document
|
List | pgsql-hackers |
On Thu, Jan 25, 2024 at 10:40 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Fri, 26 Jan 2024 13:59:09 +0900
Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> >
> > Hi,
> >
> > I found that the documentation of COPY ON_ERROR said
> > COPY stops operation at the first error when
> > "ON_ERROR is not specified.", but it also stop when
> > ON_ERROR is specified to the default value, "stop".
> >
I'm finding the current wording surrounding ON_ERROR to not fit in with the rest of the page. I've tried to improve things by being a bit more invasive.
This first hunk updates the description of the COPY command to include describing the purpose of the ON_ERROR clause.
I too am concerned about the word "parsed" here, and "malformed" in the more detailed descriptions; this would need to be modified to better reflect the circumstances under which ignore happens.
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 21a5c4a052..5fe4c9f747 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -90,6 +90,14 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
in the <structname>pg_stat_progress_copy</structname> view. See
<xref linkend="copy-progress-reporting"/> for details.
</para>
+
+ <para>
+ By default, <command>COPY</command> will abort if it encounters errors.
+ For non-binary format <command>COPY FROM</command> the user can specify
+ to instead ignore input rows that cannot be parsed by specifying
+ the <literal>ignore</literal> option to the <literal>ON_ERROR</literal>
+ clause.
+ </para>
</refsect1>
<refsect1>
index 21a5c4a052..5fe4c9f747 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -90,6 +90,14 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
in the <structname>pg_stat_progress_copy</structname> view. See
<xref linkend="copy-progress-reporting"/> for details.
</para>
+
+ <para>
+ By default, <command>COPY</command> will abort if it encounters errors.
+ For non-binary format <command>COPY FROM</command> the user can specify
+ to instead ignore input rows that cannot be parsed by specifying
+ the <literal>ignore</literal> option to the <literal>ON_ERROR</literal>
+ clause.
+ </para>
</refsect1>
<refsect1>
The use of the word "Currently" stood out to me when reading the next hunk. We always document current reality and don't call out that fact. We do not imply some future feature may change how this all works. On a related note, right now we have stop and ignore, which are basically enums just like we have for format (csv, text, binary). Unlike format, though, this option requires single quotes. Why is that? I did keep with not specifying the enum in the main syntax block since format doesn't as well; though writing { stop | ignore } seemed quite appealing.
The rest of the page indicates what the default is in a sentence by itself, not as a parenthetical next to the option.
The command limitations seemed worthy of a separate paragraph, though it repeats the description which isn't great. I'm going to sleep on this one.
@@ -379,17 +387,16 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<listitem>
<para>
Specifies which <replaceable class="parameter">
- error_action</replaceable> to perform when there is malformed data in the input.
- Currently, only <literal>stop</literal> (default) and <literal>ignore</literal>
- values are supported.
- If the <literal>stop</literal> value is specified,
- <command>COPY</command> stops operation at the first error.
- If the <literal>ignore</literal> value is specified,
- <command>COPY</command> skips malformed data and continues copying data.
- The option is allowed only in <command>COPY FROM</command>.
- Only <literal>stop</literal> value is allowed when
- using <literal>binary</literal> format.
+ error_action</replaceable> to perform when encountering a malformed input row:
+ <literal>stop</literal> or <literal>ignore</literal>.
+ A value of <literal>stop</literal> means fail the command, while
+ <literal>ignore</literal> means discard the input row and continue with the next one.
+ The default is <literal>stop</literal>
</para>
+ <para>
+ The ignore option is applicable only for <command>COPY FROM</command>
+ when the <literal>FORMAT</literal> is <literal>text</literal>
+ or <literal>csv</literal>.
+ </para>
</listitem>
</varlistentry>
The following was, IMO, too much text about something not to worry about, COPY TO and ignore mode. The point is dead tuples on error and running vacuum. It doesn't really even matter what caused the error, though if the error was even before rows started to be imported then obviously there would be no dead tuples.
</listitem>
</varlistentry>
The following was, IMO, too much text about something not to worry about, COPY TO and ignore mode. The point is dead tuples on error and running vacuum. It doesn't really even matter what caused the error, though if the error was even before rows started to be imported then obviously there would be no dead tuples.
Oh, and the word "first" really isn't adding anything here that we cannot reasonably leave to common sense, IMO. We either ignore all errors or stop on the first one. There isn't a stop mode that is capable of bypassing errors and the ignore mode doesn't have a "first n" option so one assumes all errors are ignored.
@@ -576,15 +583,12 @@ COPY <replaceable class="parameter">count</replaceable>
</para>
<para>
- <command>COPY</command> stops operation at the first error when
- <literal>ON_ERROR</literal> is not specified. This
- should not lead to problems in the event of a <command>COPY
- TO</command>, but the target table will already have received
- earlier rows in a <command>COPY FROM</command>. These rows will not
- be visible or accessible, but they still occupy disk space. This might
- amount to a considerable amount of wasted disk space if the failure
- happened well into a large copy operation. You might wish to invoke
- <command>VACUUM</command> to recover the wasted space.
+ A failed <command>COPY FROM</command> command will have performed
+ physical insertion of all rows prior to the malformed one.
+ While these rows will not be visible or accessible, they still occupy
+ disk space. This might amount to a considerable amount of wasted disk
+ space if the failure happened well into a large copy operation.
+ <command>VACUUM</command> should be used to recover the wasted space.
</para>
<para>
David J.
pgsql-hackers by date: