Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row - Mailing list pgsql-hackers
From | jian he |
---|---|
Subject | Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row |
Date | |
Msg-id | CACJufxGEHmijmP-QqvrmqU6cxmhgpdjY7ewQBQ=E9NmdyEcqmw@mail.gmail.com Whole thread Raw |
In response to | Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-hackers |
On Tue, Apr 8, 2025 at 6:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > BTW have you measured the overheads of calling InputFunctionCallSafe > twice? If it's significant, we might want to find other ways to > achieve it as it would not be good to incur overhead just for > relatively rare cases. > Please check the attached two patches v17-0001-COPY-on_error-set_null.original, v17-0001-COPY-on_error-set_null.patch for non-domain types, (on_error set_null), the performance of these two are the same. for domain type with or without constraint, (on_error set_null): v17.original is slower than v17.patch. test script: create unlogged table t2(a text); insert into t2 select 'a' from generate_Series(1, 10_000_000) g; copy t2 to '/tmp/2.txt'; CREATE DOMAIN d1 AS INT ; CREATE DOMAIN d2 AS INT check (value > 0); create unlogged table t3(a int); create unlogged table t4(a d1); create unlogged table t5(a d2); performance result: v17-0001-COPY-on_error-set_null.patch -- 764.903 ms copy t3 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1 -- 779.253 ms copy t4 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1 -- Time: 750.390 ms copy t5 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1 v17-0001-COPY-on_error-set_null.original -- 774.943 ms copy t3 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1 -- 867.671 ms copy t4 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1 -- 927.685 ms copy t5 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1 > Here are some comments: > > + if (InputFunctionCallSafe(&in_functions[m], > + NULL, > + typioparams[m], > + att->atttypmod, > + NULL, > + &values[m])) > > Given that we pass NULL to escontext, does this function return false > in an error case? Or can we use InputFunctionCall instead? > > I think we should mention that SET_NULL still could fail if the data > type of the column doesn't accept NULL. > > How about restructuring the codes around handling data incompatibility > errors like: > > else if (!InputFunctionCallSafe(...)) > { > if (cstate->opts.on_error == IGNORE) > { > cstate->num_errors++; > if (cstate->opts.log_verbosity == VERBOSE) > write a NOTICE message; > return true; // ignore whole row. > } > else if (cstate->opts.on_error == SET_NULL) > { > current_row_erroneous = true; > set NULL to the column; > if (cstate->opts.log_verbosity == VERBOSE) > write a NOTICE message; > continue; // go to the next column. > } > > That way, we have similar structures for both on_error handling and > don't need to reset cstate->cur_attname at the end of SET_NULL > handling. > I think we still need to reset cstate->cur_attname. the current code structure is `` foreach(cur, cstate->attnumlist) { if (condition x) continue; cstate->cur_attname = NULL; cstate->cur_attval = NULL; } `` In some cases (last column , condition x is satisfied), once we reach the ``continue``, then we cannot reach. `` cstate->cur_attname = NULL; cstate->cur_attval = NULL; `` > --- > From the regression tests: > > --fail, column a is domain with not-null constraint > COPY t_on_error_null FROM STDIN WITH (on_error set_null); > a 11 14 > \. > ERROR: domain d_int_not_null does not allow null values > CONTEXT: COPY t_on_error_null, line 1, column a: "a" > > I guess that the log messages could confuse users since while the > actual error was caused by setting NULL to the non-NULL domain type > column, the context message says the data 'a' was erroneous. > if the second function is InputFunctionCall, then we cannot customize the error message. we can't have both. I guess we need a second InputFunctionCallSafe with escontext NOT NULL. now i change it to if (!cstate->domain_with_constraint[m] || InputFunctionCallSafe(&in_functions[m], NULL, typioparams[m], att->atttypmod, (Node *) cstate->escontext, &values[m])) else if (string == NULL) ereport(ERROR, errcode(ERRCODE_NOT_NULL_VIOLATION), errmsg("domain %s does not allow null values", format_type_be(typioparams[m])), errdatatype(typioparams[m])); else ereport(ERROR, errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("invalid input value for domain %s: \"%s\"", format_type_be(typioparams[m]), string)); do these ``ELSE IF``, ``ELSE`` error report messages make sense to you?
Attachment
pgsql-hackers by date: