Thread: on_error table, saving error info to a table

on_error table, saving error info to a table

From

jian he

Date:

03 February 2024, 06:22:34

Hi.
I previously did some work in COPY FROM save error information to a table.
still based on this suggestion:
https://www.postgresql.org/message-id/752672.1699474336%40sss.pgh.pa.us
Now I refactored it.

the syntax:
ON_ERROR 'table', TABLE 'error_saving_tbl'

if ON_ERROR is not specified with 'table', TABLE is specified, then error.
if ON_ERROR is specified with 'table', TABLE is not specified or
error_saving_tbl does not exist, then error.

In BeginCopyFrom, we check the data definition of error_saving_table,
we also check if the user has INSERT privilege to error_saving_table
(all the columns).
We also did a preliminary check of the lock condition of error_saving_table.

if it does not meet these conditions, we quickly error out.
error_saving_table will be the same schema as the copy from table.

Because "table" is a keyword, I have to add the following changes to gram.y.
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3420,6 +3420,10 @@ copy_opt_item:
  {
  $$ = makeDefElem("null", (Node *) makeString($3), @1);
  }
+ | TABLE opt_as Sconst
+ {
+ $$ = makeDefElem("table", (Node *) makeString($3), @1);
+ }

since "table" is already a keyword, so there is no influence on the
parsing speed?

demo:

create table err_tbl(
    userid oid, -- the user oid while copy generated this entry
    copy_tbl oid, --copy table
    filename text,
    lineno  int8,
    line    text,
    colname text,
    raw_field_value text,
    err_message text,
    err_detail text,
    errorcode text
);
create table t_copy_tbl(a int, b int, c int);
COPY t_copy_tbl FROM STDIN WITH (delimiter ',', on_error 'table',
table err_tbl);
1,2,a
\.

table err_tbl \gx
-[ RECORD 1 ]---+-------------------------------------------
userid          | 10
copy_tbl        | 17920
filename        | STDIN
lineno          | 1
line            | 1,2,a
colname         | c
raw_field_value | a
err_message     | invalid input syntax for type integer: "a"
err_detail      |
errorcode       | 22P02

Attachment

v1-0001-on_error-table-saving-error-info-to-a-table.patch

Re: on_error table, saving error info to a table

From

Nishant Sharma

Date:

15 July 2024, 05:42:40

Thanks for the patch!

I was not able to understand why we need a special error table for COPY?

Can't we just log it in a new log error file for COPY in a proper format? Like

using table approach, PG would be unnecessarily be utilising its resources like

extra CPU to format the data in pages to store in its table format, writing and

keeping the table in its store (which may or may not be required), the user

would be required to make sure it creates the error table with proper columns

to be used in COPY, etc..

Meanwhile, please find some quick review comments:-

1) Patch needs re-base.

2) If the columns of the error table are fixed, then why not create it internally using

some function or something instead of making the user create the table correctly

with all the columns?

3) I think, error messages can be improved, which looks big to me.

4) I think no need of below pstrdup, As CStringGetTextDatum creates a text copy for

the same:-
err_code = pstrdup(unpack_sql_state(cstate->escontext->error_data->sqlerrcode));

t_values[9] = CStringGetTextDatum(err_code);

5) Should 'on_error_rel' as not NULL be checked along with below, as I can see it is

passed as NULL from two locations?
+ if (cstate->opts.on_error == COPY_ON_ERROR_TABLE)
+ {

6) Below declarations can be shifted to the if block, where they are used. Instead of

keeping them as global in function?
+ HeapTuple on_error_tup;
+ TupleDesc on_error_tupDesc;

+ if (cstate->opts.on_error == COPY_ON_ERROR_TABLE)
+ {

7) Below comment going beyond 80 char width:-
* if on_error is specified with 'table', then on_error_rel is the error saving table

8) Need space after 'false'
err_detail ? false: true;

9) Below call can fit in a single line. No need to keep the 2nd and 3rd parameter in

nextlines.
+ on_error_tup = heap_form_tuple(on_error_tupDesc,
+ t_values,
+ t_isnull);

10) Variable declarations Tab Spacing issue at multiple places.

11) Comments in the patch are not matched to PG comment style.

Kindly note I have not tested the patch properly yet. Only checked it with a positive test

case. As I will wait for a conclusion on my opinion of the proposed patch.

Regards,

Nishant Sharma.

EnterpriseDB, Pune.

On Sat, Feb 3, 2024 at 11:52 AM jian he <jian.universality@gmail.com> wrote:

Hi.
I previously did some work in COPY FROM save error information to a table.
still based on this suggestion:
https://www.postgresql.org/message-id/752672.1699474336%40sss.pgh.pa.us
Now I refactored it.

the syntax:
ON_ERROR 'table', TABLE 'error_saving_tbl'

if ON_ERROR is not specified with 'table', TABLE is specified, then error.
if ON_ERROR is specified with 'table', TABLE is not specified or
error_saving_tbl does not exist, then error.

In BeginCopyFrom, we check the data definition of error_saving_table,
we also check if the user has INSERT privilege to error_saving_table
(all the columns).
We also did a preliminary check of the lock condition of error_saving_table.

if it does not meet these conditions, we quickly error out.
error_saving_table will be the same schema as the copy from table.

Because "table" is a keyword, I have to add the following changes to gram.y.
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3420,6 +3420,10 @@ copy_opt_item:
{
$$ = makeDefElem("null", (Node *) makeString($3), @1);
}
+ | TABLE opt_as Sconst
+ {
+ $$ = makeDefElem("table", (Node *) makeString($3), @1);
+ }

since "table" is already a keyword, so there is no influence on the
parsing speed?

demo:

create table err_tbl(
userid oid, -- the user oid while copy generated this entry
copy_tbl oid, --copy table
filename text,
lineno int8,
line text,
colname text,
raw_field_value text,
err_message text,
err_detail text,
errorcode text
);
create table t_copy_tbl(a int, b int, c int);
COPY t_copy_tbl FROM STDIN WITH (delimiter ',', on_error 'table',
table err_tbl);
1,2,a
\.

table err_tbl \gx
-[ RECORD 1 ]---+-------------------------------------------
userid | 10
copy_tbl | 17920
filename | STDIN
lineno | 1
line | 1,2,a
colname | c
raw_field_value | a
err_message | invalid input syntax for type integer: "a"
err_detail |
errorcode | 22P02

Re: on_error table, saving error info to a table

From

Nishant Sharma

Date:

05 November 2024, 13:30:30

Thanks for the v2 patch!

I see v1 review comments got addressed in v2 along with some

further improvements.

1) v2 Patch again needs re-base.

2) I think we need not worry whether table name is unique or not,

table name can be provided by user and we can check if it does

not exists then simply we can create it with appropriate columns,

if it exists we use it to check if its correct on_error table and

proceed.

3) Using #define in between the code? I don't see that style in

copyfromparse.c file. I do see such style in other src file. So, not

sure if committer would allow it or not.

#define ERROR_TBL_COLUMNS 10

4) Below appears redundant to me, it was not the case in v1 patch

set, where it had only one return and one increment of error as new

added code was at the end of the block:-

+ cstate->num_errors++;
+ return true;
+ }
cstate->num_errors++;

I was not able to test the v2 due to conflicts in v2, I did attempt to

resolve but I saw many failures in make world.

Regards,

Nishant.

On Tue, Aug 20, 2024 at 5:31 AM jian he <jian.universality@gmail.com> wrote:

On Mon, Jul 15, 2024 at 1:42 PM Nishant Sharma
<nishant.sharma@enterprisedb.com> wrote:
>
> Thanks for the patch!
>
> I was not able to understand why we need a special error table for COPY?
> Can't we just log it in a new log error file for COPY in a proper format? Like
> using table approach, PG would be unnecessarily be utilising its resources like
> extra CPU to format the data in pages to store in its table format, writing and
> keeping the table in its store (which may or may not be required), the user
> would be required to make sure it creates the error table with proper columns
> to be used in COPY, etc..
>
>
> Meanwhile, please find some quick review comments:-
>
> 1) Patch needs re-base.
>
> 2) If the columns of the error table are fixed, then why not create it internally using
> some function or something instead of making the user create the table correctly
> with all the columns?

I'll think about it more.
previously, i tried to use SPI to create tables, but at that time, i
thought that's kind of excessive.
you need to create the table, check whether the table name is unique,
check the privilege.
now we quickly error out if the error saving table definition does not
meet. I guess that's less work to do.

> 3) I think, error messages can be improved, which looks big to me.
>
> 4) I think no need of below pstrdup, As CStringGetTextDatum creates a text copy for
> the same:-
> err_code = pstrdup(unpack_sql_state(cstate->escontext->error_data->sqlerrcode));
>
> t_values[9] = CStringGetTextDatum(err_code);
>
> 5) Should 'on_error_rel' as not NULL be checked along with below, as I can see it is
> passed as NULL from two locations?
> + if (cstate->opts.on_error == COPY_ON_ERROR_TABLE)
> + {
>
> 6) Below declarations can be shifted to the if block, where they are used. Instead of
> keeping them as global in function?
> + HeapTuple on_error_tup;
> + TupleDesc on_error_tupDesc;
>
> + if (cstate->opts.on_error == COPY_ON_ERROR_TABLE)
> + {
>
> 7) Below comment going beyond 80 char width:-
> * if on_error is specified with 'table', then on_error_rel is the error saving table
>
> 8) Need space after 'false'
> err_detail ? false: true;
>
> 9) Below call can fit in a single line. No need to keep the 2nd and 3rd parameter in
> nextlines.
> + on_error_tup = heap_form_tuple(on_error_tupDesc,
> + t_values,
> + t_isnull);
>
> 10) Variable declarations Tab Spacing issue at multiple places.
>
> 11) Comments in the patch are not matched to PG comment style.
>
>
> Kindly note I have not tested the patch properly yet. Only checked it with a positive test
> case. As I will wait for a conclusion on my opinion of the proposed patch.
>
almost all these issues have been addressed.
The error messages are still not so good. I need to polish it.

Re: on_error table, saving error info to a table

From

Kirill Reshke

Date:

03 December 2024, 13:28:40

On Tue, 3 Dec 2024 at 09:29, jian he <jian.universality@gmail.com> wrote:
>
> On Tue, Nov 5, 2024 at 6:30 PM Nishant Sharma
> <nishant.sharma@enterprisedb.com> wrote:
> >
> > Thanks for the v2 patch!
> >
> > I see v1 review comments got addressed in v2 along with some
> > further improvements.
> >
> > 1) v2 Patch again needs re-base.
> >
> > 2) I think we need not worry whether table name is unique or not,
> > table name can be provided by user and we can check if it does
> > not exists then simply we can create it with appropriate columns,
> > if it exists we use it to check if its correct on_error table and
> > proceed.
>
> "simply we can create it with appropriate columns,"
> that would be more work.
> so i stick to if there is a table can use to
> error saving then use it, otherwise error out.
>
>
> >
> > 3) Using #define in between the code? I don't see that style in
> > copyfromparse.c file. I do see such style in other src file. So, not
> > sure if committer would allow it or not.
> > #define ERROR_TBL_COLUMNS   10
> >
> let's wait and see.
>
> > 4) Below appears redundant to me, it was not the case in v1 patch
> > set, where it had only one return and one increment of error as new
> > added code was at the end of the block:-
> > +                   cstate->num_errors++;
> > +                   return true;
> > +               }
> >                 cstate->num_errors++;
> >
> changed per your advice.
>
> > I was not able to test the v2 due to conflicts in v2, I did attempt to
> > resolve but I saw many failures in make world.
> >
> I get rid of all the SPI code.
>
> Instead, now I iterate through AttributeRelationId to check if the
> error saving table is ok or not,
> using DirectFunctionCall3 to do the privilege check.
> removed gram.y change, turns out it is not necessary.
> and other kinds of refactoring.
>
> please check attached.


Hi!

1)
> + switch (attForm->attnum)
> + {
> + case 1:
> + (.....)
> + case 2:

case 1,2,3 ... Is too random. Other parts of core tend to use `#define
Anum_<relname>_<columname> <num>`. Can we follow this style?

2)
>+ /*
> + * similar to commit a9cf48a
> + * (https://postgr.es/m/7bcfc39d4176faf85ab317d0c26786953646a411.camel@cybertec.at)
> + * in COPY FROM keep error saving table locks until the transaction end.
> + */

I can rarely see other comments referencing commits, and even few
referencing a mail archive thread.
Can we just write proper comment explaining the reasons?


===== overall

Patch design is a little dubious for me. We give users some really
incomprehensible API. To use on_error *relation* feature user must
create tables with proper schema.
Maybe a better design will be to auto-create on_error table if this
table does not exist.


Thoughts?

--
Best regards,
Kirill Reshke

Re: on_error table, saving error info to a table

From

Nishant Sharma

Date:

11 December 2024, 14:41:02

Thanks for the v3 patch!

Please find review comments on v3:-

1) I think no need to change the below if condition, we can keep

it the way it was before i.e with

"cstate->opts.on_error != COPY_ON_ERROR_STOP" and we

add a new error ereport the way v3 has. Because for

cstate->opts.on_error as COPY_ON_ERROR_STOP cases we

can avoid two if conditions inside upper if.

+ if (cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)

2) No need for the below "if" check for maxattnum. We can simply

increment it with "++maxattnum" and later check if we have exactly

10 attributes for the error table. Because even if we drop any

attribute and maxattnum is 10 in pg_attribute for that rel, we should

still error out. Maybe we can rename it to "totalatts"?

+ if (maxattnum <= attForm->attnum)
+ maxattnum = attForm->attnum;

3) #define would be better, also as mentioned by Kirill switch

condition with proper #define would be better.

+ if (maxattnum != 10)
+ on_error_tbl_ok = false;

4)

>
> that would be more work.
> so i stick to if there is a table can use to
> error saving then use it, otherwise error out.

>
YES. but that would lead to a better design with an error table.

Also, I think Krill mentions the same. That is to auto create, if it

does not exist.

Regards,

Nishant Sharma (EDB).

On Tue, Dec 3, 2024 at 3:58 PM Kirill Reshke <reshkekirill@gmail.com> wrote:

On Tue, 3 Dec 2024 at 09:29, jian he <jian.universality@gmail.com> wrote:
>
> On Tue, Nov 5, 2024 at 6:30 PM Nishant Sharma
> <nishant.sharma@enterprisedb.com> wrote:
> >
> > Thanks for the v2 patch!
> >
> > I see v1 review comments got addressed in v2 along with some
> > further improvements.
> >
> > 1) v2 Patch again needs re-base.
> >
> > 2) I think we need not worry whether table name is unique or not,
> > table name can be provided by user and we can check if it does
> > not exists then simply we can create it with appropriate columns,
> > if it exists we use it to check if its correct on_error table and
> > proceed.
>
> "simply we can create it with appropriate columns,"
> that would be more work.
> so i stick to if there is a table can use to
> error saving then use it, otherwise error out.
>
>
> >
> > 3) Using #define in between the code? I don't see that style in
> > copyfromparse.c file. I do see such style in other src file. So, not
> > sure if committer would allow it or not.
> > #define ERROR_TBL_COLUMNS 10
> >
> let's wait and see.
>
> > 4) Below appears redundant to me, it was not the case in v1 patch
> > set, where it had only one return and one increment of error as new
> > added code was at the end of the block:-
> > + cstate->num_errors++;
> > + return true;
> > + }
> > cstate->num_errors++;
> >
> changed per your advice.
>
> > I was not able to test the v2 due to conflicts in v2, I did attempt to
> > resolve but I saw many failures in make world.
> >
> I get rid of all the SPI code.
>
> Instead, now I iterate through AttributeRelationId to check if the
> error saving table is ok or not,
> using DirectFunctionCall3 to do the privilege check.
> removed gram.y change, turns out it is not necessary.
> and other kinds of refactoring.
>
> please check attached.

Hi!

1)
> + switch (attForm->attnum)
> + {
> + case 1:
> + (.....)
> + case 2:

case 1,2,3 ... Is too random. Other parts of core tend to use `#define
Anum_<relname>_<columname> <num>`. Can we follow this style?

2)
>+ /*
> + * similar to commit a9cf48a
> + * (https://postgr.es/m/7bcfc39d4176faf85ab317d0c26786953646a411.camel@cybertec.at)
> + * in COPY FROM keep error saving table locks until the transaction end.
> + */

I can rarely see other comments referencing commits, and even few
referencing a mail archive thread.
Can we just write proper comment explaining the reasons?

===== overall

Patch design is a little dubious for me. We give users some really
incomprehensible API. To use on_error *relation* feature user must
create tables with proper schema.
Maybe a better design will be to auto-create on_error table if this
table does not exist.

Thoughts?

--
Best regards,
Kirill Reshke

Re: on_error table, saving error info to a table

From

Andrew Dunstan

Date:

11 December 2024, 17:52:19

On 2024-12-02 Mo 11:28 PM, jian he wrote:

On Tue, Nov 5, 2024 at 6:30 PM Nishant Sharma
<nishant.sharma@enterprisedb.com> wrote:

Thanks for the v2 patch!

I see v1 review comments got addressed in v2 along with some
further improvements.

1) v2 Patch again needs re-base.

2) I think we need not worry whether table name is unique or not,
table name can be provided by user and we can check if it does
not exists then simply we can create it with appropriate columns,
if it exists we use it to check if its correct on_error table and
proceed.

"simply we can create it with appropriate columns,"
that would be more work.
so i stick to if there is a table can use to
error saving then use it, otherwise error out.

3) Using #define in between the code? I don't see that style in
copyfromparse.c file. I do see such style in other src file. So, not
sure if committer would allow it or not.
#define ERROR_TBL_COLUMNS   10

let's wait and see.

4) Below appears redundant to me, it was not the case in v1 patch
set, where it had only one return and one increment of error as new
added code was at the end of the block:-
+                   cstate->num_errors++;
+                   return true;
+               }                cstate->num_errors++;

changed per your advice.

I was not able to test the v2 due to conflicts in v2, I did attempt to
resolve but I saw many failures in make world.

I get rid of all the SPI code.

Instead, now I iterate through AttributeRelationId to check if the
error saving table is ok or not,
using DirectFunctionCall3 to do the privilege check.
removed gram.y change, turns out it is not necessary.
and other kinds of refactoring.

please check attached.

I don't have a comment on the patch in general, but wouldn't it be better to use a typed table? Then all you would need to check is the reloftype to make sure it's the right type, instead of checking all the column names and types. Creating the table would be simpler, too. Just

CREATE TABLE my_copy_errors OF TYPE pg_copy_error;

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: on_error table, saving error info to a table

From

Nishant Sharma

Date:

16 December 2024, 14:50:24

On Fri, Dec 13, 2024 at 1:57 PM jian he <jian.universality@gmail.com> wrote:

On Wed, Dec 11, 2024 at 7:41 PM Nishant Sharma
<nishant.sharma@enterprisedb.com> wrote:
>
> Thanks for the v3 patch!
>
> Please find review comments on v3:-
>
> 1) I think no need to change the below if condition, we can keep
> it the way it was before i.e with
> "cstate->opts.on_error != COPY_ON_ERROR_STOP" and we
> add a new error ereport the way v3 has. Because for
> cstate->opts.on_error as COPY_ON_ERROR_STOP cases we
> can avoid two if conditions inside upper if.
>
> + if (cstate->num_errors > 0 &&
> cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)

> 2) No need for the below "if" check for maxattnum. We can simply
> increment it with "++maxattnum" and later check if we have exactly
> 10 attributes for the error table. Because even if we drop any
> attribute and maxattnum is 10 in pg_attribute for that rel, we should
> still error out. Maybe we can rename it to "totalatts"?
>
> + if (maxattnum <= attForm->attnum)
> + maxattnum = attForm->attnum;
>
> 3) #define would be better, also as mentioned by Kirill switch
> condition with proper #define would be better.
>
> + if (maxattnum != 10)
> + on_error_tbl_ok = false;
>
> 4)

hi. Thanks for the review.
The attached v4 patch addressed these two issues.

> > that would be more work.
> > so i stick to if there is a table can use to
> > error saving then use it, otherwise error out.
> >
> YES. but that would lead to a better design with an error table.
> Also, I think Krill mentions the same. That is to auto create, if it
> does not exist.
>
I decided not to auto-create the table.
main reason not to do it:
1. utility COPY command with another SPI utility CREATE TABLE command may work.
but there is no precedent.

2. if we auto-create the on_error table with BeginCopyFrom.
then later we have to use get_relname_relid to get the newly created table Oid,
I think it somehow counts as repeating name lookups, see relevant
linke [1], [2].

[1] https://postgr.es/m/20240808171351.a9.nmisch@google.com
[2] https://postgr.es/m/CA+TgmobHYix=Nn8D4RUHa6fhUVPR88KGAMq1pBfnGfOfEjRixA@mail.gmail.com

Thanks for the v4 patch!

Review comment on v4:-

1) The new switch logic does not look correct to me. It will pass for

a failing scenario. I think you can use v3's logic instead with below

changes:-

a)
while (HeapTupleIsValid(atup = systable_getnext(ascan))) -->
while (HeapTupleIsValid(atup = systable_getnext(ascan)) && on_error_tbl_ok)

b)
attcnt++; --> just before the "switch (attForm->attnum)".

Thats it.

Also, I think Andrew's suggestion can resolve the concern me and Krill

had on forcing users to create tables with correct column names and

numbers. Also, will make error table checking simpler. No need for the

above kind of checks.

Regards,

Nishant.

Re: on_error table, saving error info to a table

From

Kirill Reshke

Date:

17 December 2024, 07:31:46

On Mon, 16 Dec 2024 at 16:50, Nishant Sharma
<nishant.sharma@enterprisedb.com> wrote:
> Also, I think Andrew's suggestion can resolve the concern me and Krill
> had on forcing users to create tables with correct column names and
> numbers. Also, will make error table checking simpler. No need for the
> above kind of checks.

+1 on that.

-- 
Best regards,
Kirill Reshke

Re: on_error table, saving error info to a table

From

jian he

Date:

25 April, 16:46:39

On Mon, Dec 16, 2024 at 7:50 PM Nishant Sharma
<nishant.sharma@enterprisedb.com> wrote:
>
>
> 1) The new switch logic does not look correct to me. It will pass for
> a failing scenario. I think you can use v3's logic instead with below
> changes:-
>
> a)
> while (HeapTupleIsValid(atup = systable_getnext(ascan))) -->
> while (HeapTupleIsValid(atup = systable_getnext(ascan)) && on_error_tbl_ok)
>
> b)
> attcnt++; --> just before the "switch (attForm->attnum)".
>
> Thats it.
>
You are right about this.

On Tue, Dec 17, 2024 at 12:31 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
>
> On Mon, 16 Dec 2024 at 16:50, Nishant Sharma
> <nishant.sharma@enterprisedb.com> wrote:
> > Also, I think Andrew's suggestion can resolve the concern me and Krill
> > had on forcing users to create tables with correct column names and
> > numbers. Also, will make error table checking simpler. No need for the
> > above kind of checks.
>
> +1 on that.
>

Syntax: COPY (on_error table, table error_saving_tbl);
seems not ideal.

but auto-create on_error table if this table does not exist, seems way
more harder.

Since we can not use SPI interface here, maybe we can use DefineRelation
also, to auto-create a table, what if table already exists, then
our operation would be stuck for not COPY related reason.
also auto-create means we need to come up with a magic table name for
all COPY (on_error table)
operations, which seems not ideal IMO.

i realized we should error out case like:
COPY err_tbl FROM STDIN WITH (DELIMITER ',', on_error table, table err_tbl);

also by changing copy_generic_opt_arg, now we can
COPY err_tbl FROM STDIN WITH (DELIMITER ',', on_error table, table  x);
previously, we can only do
COPY err_tbl FROM STDIN WITH (DELIMITER ',', on_error 'table', table  x);

Attachment

v5-0001-COPY-FROM-on_error-table.patch