Re: Add on_error and log_verbosity options to file_fdw - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Add on_error and log_verbosity options to file_fdw |
Date | |
Msg-id | CAD21AoCt4UTGs-V_sGewCdjvmjhgnnOUG3T83tRF=VtbdW-dPg@mail.gmail.com Whole thread Raw |
In response to | Add on_error and log_verbosity options to file_fdw (torikoshia <torikoshia@oss.nttdata.com>) |
List | pgsql-hackers |
Hi, On Mon, Sep 30, 2024 at 8:36 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2024/09/26 21:57, torikoshia wrote: > > Updated the patches. > > Thanks for updating the patches! I’ve made some changes based on your work, which are attached. > Barring any objections, I'm thinking to push these patches. > > For patches 0001 and 0003, I ran pgindent and updated the commit message. > > Regarding patch 0002: > > - I updated the regression test to run ANALYZE on the file_fdw foreign table > since the on_error option also affects the ANALYZE command. To ensure test > stability, the test now runs ANALYZE with log_verbosity = 'silent'. > > - I removed the code that updated the count of skipped rows for > the pg_stat_progress_copy view. As far as I know, file_fdw doesn’t > currently support tracking pg_stat_progress_copy.tuples_processed. > Supporting only tuples_skipped seems inconsistent, so I suggest creating > a separate patch to extend file_fdw to track both tuples_processed and > tuples_skipped in this view. > > - I refactored the for-loop handling on_error = 'ignore' in fileIterateForeignScan() > by replacing it with a goto statement for improved readability. > > - I modified file_fdw to log a NOTICE message about skipped rows at the end of > ANALYZE if any rows are skipped due to the on_error = 'ignore' setting. > > Regarding the "file contains XXX rows" message reported by the ANALYZE VERBOSE > command on the file_fdw foreign table, what number should be reflected in XXX, > especially when some rows are skipped due to on_error = 'ignore'? > Currently, the count only includes valid rows, excluding any skipped rows. > I haven't modified this code yet. Should we instead count all rows > (valid and erroneous) and report that total? > > I noticed the code for reporting the number of skipped rows due to > on_error = 'ignore' appears in three places. I’m considering creating > a common function for this reporting to eliminate redundancy but haven’t > implemented it yet. > > - I’ve updated the commit message and run pgindent. Sorry for being late in joining the review of this patch. Both 0001 and 0003 look good to me. I have two comments on the 0002 patch: - found = NextCopyFrom(festate->cstate, econtext, - slot->tts_values, slot->tts_isnull); - if (found) + +retry: + if (NextCopyFrom(cstate, econtext, slot->tts_values, slot->tts_isnull)) + { + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE && + cstate->escontext->error_occurred) + { + /* + * Soft error occurred, skip this tuple and just make + * ErrorSaveContext ready for the next NextCopyFrom. Since we + * don't set details_wanted and error_data is not to be filled, + * just resetting error_occurred is enough. + */ + cstate->escontext->error_occurred = false; + + /* Repeat NextCopyFrom() until no soft error occurs */ + goto retry; + } + ExecStoreVirtualTuple(slot); + } I think that while scanning a file_fdw foreign table with log_verbosity='silent' the query is not interruptible. Also, we don't switch to the per-tuple memory context when retrying due to a soft error. I'm not sure it's okay as in CopyFrom(), a similar function for COPY command, we switch to the per-tuple memory context every time before parsing an input line. Would it be problematic if we switch to another memory context while parsing an input line? In CopyFrom() we also call ResetPerTupleExprContext() and ExecClearTuple() for every input, so we might want to consider calling them for every input. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: