Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) - Mailing list pgsql-hackers

From Alena Rybakina
Subject Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date
Msg-id f03bc3de-9769-4106-80cf-158c5f184a1e@yandex.ru
Whole thread Raw
In response to Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)  (jian he <jian.universality@gmail.com>)
Responses Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)  (jian he <jian.universality@gmail.com>)
List pgsql-hackers

Hi! Thank you for your work. Your patch looks better!

On 10.12.2023 13:32, jian he wrote:
On Fri, Dec 8, 2023 at 3:09 PM Alena Rybakina <lena.ribackina@yandex.ru> wrote:
Thank you for your work. Unfortunately, your code contained errors during the make installation:

'SAVEPOINT' after 'SAVE_ERROR' in unreserved_keyword list is misplaced
'SAVEPOINT' after 'SAVE_ERROR' in bare_label_keyword list is misplaced
make[2]: *** [../../../src/Makefile.global:783: gram.c] Error 1
make[1]: *** [Makefile:131: parser/gram.h] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [src/Makefile.global:383: submake-generated-headers] Error 2

I have ubuntu 22.04 operation system.

On 06.12.2023 13:47, jian he wrote:

On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina <lena.ribackina@yandex.ru> wrote:

Hi!

Thank you for your contribution to this thread.


I reviewed it and have a few questions.

1. I have seen that you delete a table before creating it, to which you want to add errors due to a failed "copy from" operation. I think this is wrong because this table can save useful data for the user.
At a minimum, we should warn the user about this, but I think we can just add some number at the end of the name, such as name_table1, name_table_2.

Sorry. I don't understand this part.
Currently, if the error table name already exists, then the copy will
fail, an error will be reported.
I try to first create a table, if no error then the error table will be dropped.

To be honest, first of all, I misunderstood this part of the code. Now I see that it works the way you mentioned.

However, I didn't see if you dealt with cases where we already had a table with the same name as the table error.
I mean, when is he trying to create for the first time, or will we never be able to face such a problem?

Can you demo the expected behavior?

Unfortunately, I was unable to launch it due to a build issue.

Hopefully attached will work.

Yes, thank you! It works fine, and I see that the regression tests have been passed. 🙂


However, when I ran 'copy from with save_error' operation with simple csv files (copy_test.csv, copy_test1.csv) for tables test, test1 (how I created it, I described below):
postgres=# create table test (x int primary key, y int not null);
postgres=# create table test1 (x int, z int, CONSTRAINT fk_x       
      FOREIGN KEY(x)
          REFERENCES test(x));

I did not find a table with saved errors after operation, although I received a log about it:postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV save_error
NOTICE:  2 rows were skipped because of error. skipped row saved to table public.test_error
ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (x)=(2) already exists.
CONTEXT:  COPY test, line 3

postgres=# select * from public.test_error;
ERROR:  relation "public.test_error" does not exist
LINE 1: select * from public.test_error;

postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV save_error
NOTICE:  2 rows were skipped because of error. skipped row saved to table public.test1_error
ERROR:  insert or update on table "test1" violates foreign key constraint "fk_x"
DETAIL:  Key (x)=(2) is not present in table "test".

postgres=# select * from public.test1_error;
ERROR:  relation "public.test1_error" does not exist
LINE 1: select * from public.test1_error;

Two lines were written correctly in the csv files, therefore they should have been added to the tables, but they were not added to the tables test and test1. 

If I leave only the correct rows, everything works fine and the rows are added to the tables.

in copy_test.csv:

2,0

1,1

in copy_test1.csv:

2,0

2,1

1,1
postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV
COPY 2
postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV save_error
NOTICE:  No error happened.Error holding table public.test1_error will be droped
COPY 3

Maybe I'm launching it the wrong way. If so, let me know about it.


I also notice interesting behavior if the table was previously created by the user. When I was creating an error_table before the 'copy from' operation,
I received a message saying that it is impossible to create a table with the same name (it is shown below) during the 'copy from' operation.
I think you should add information about this in the documentation, since this seems to be normal behavior to me.
postgres=# CREATE TABLE test_error (LINENO BIGINT, LINE TEXT,
FIELD TEXT, SOURCE TEXT, ERR_MESSAGE TEXT,
ERR_DETAIL TEXT, ERRORCODE TEXT);
CREATE TABLE
postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV save_error
ERROR:  Error save table public.test_error already exists. Cannot use it for COPY FROM error saving


2. I noticed that you are forming a table name using the type of errors that prevent rows from being added during 'copy from' operation.
I think it would be better to use the name of the source file that was used while 'copy from' was running.
In addition, there may be several such files, it is also worth considering.

Another column added.
now it looks like:

SELECT * FROM save_error_csv_error; filename | lineno |                        line | field | source |                 err_message                 |
err_detail | errorcode
----------+--------+----------------------------------------------------+-------+--------+---------------------------------------------+------------+----------- STDIN    |      1 | 2002    232     40      50      60      70
80 | NULL  | NULL   | extra data after last expected column       |
NULL       | 22P04 STDIN    |      1 | 2000    230     23 | d     | NULL   | missing data for column "d"                 | NULL      | 22P04 STDIN    |      1 | z,,"" | a     | z      | invalid input syntax for type integer: "z"  | NULL      | 22P02 STDIN    |      2 | \0,, | a     | \0     | invalid input syntax for type integer: "\0" | NULL      | 22P02

Yes, I see the "filename" column, and this will solve the problem, but "STDIN" is unclear to me.
please see comment in struct CopyFromStateData:
char    *filename; /* filename, or NULL for STDIN */

Yes, I can see that.

I haven't figured out how to fix it yet either.

 */

Maybe we can rewrite it like this:

/* Check, the err_nsp.error_rel table has already existed
* and if it is, check its column name and data types.

refactored.

Fine)

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Synchronizing slots from primary to standby
Next
From: Robert Haas
Date:
Subject: Re: [CAUTION!! freemail] Re: Partial aggregates pushdown