Re: Add on_error and log_verbosity options to file_fdw - Mailing list pgsql-hackers

From torikoshia
Subject Re: Add on_error and log_verbosity options to file_fdw
Date
Msg-id 79a336ccebc70401625743686c41b81a@oss.nttdata.com
Whole thread Raw
In response to Re: Add on_error and log_verbosity options to file_fdw  (torikoshia <torikoshia@oss.nttdata.com>)
List pgsql-hackers
On 2024-07-24 19:43, torikoshia wrote:
> On 2024-07-23 08:57, Michael Paquier wrote:
>> On Mon, Jul 22, 2024 at 03:07:46PM -0700, Masahiko Sawada wrote:
>>> I'm slightly concerned that users might not want to see the NOTICE
>>> message for every scan. Unlike COPY FROM, scanning a file via 
>>> file_fdw
>>> could be frequent.
> 
> Agreed.
> 
>> Yeah, I also have some concerns regarding the noise that this could
>> produce if called on a foreign table on a regular basis.  The verbose
>> mode is disabled by default so I don't see why we should not allow it
>> if the relation owner wants to show it.
>> 
>> Perhaps we should first do a silence mode for log_verbosity to skip
>> the NOTICE produced at the end of the COPY FROM summarizing the whole?
> 
> I like this idea.
> If there are no objections, I'm going to make a patch for this.

Attached patches.
0001 adds new option 'silent' to log_verbosity and 0002 adds on_error 
and log_verbosity options to file_fdw.


> I'm going to continue developing the patch(e.g. add doc, measure
performance degradation) when people also think this feature is worth 
adding.

Here is a quick performance test result.

I loaded 1,000,000 rows using pgbench_accounts to a file and counted the 
number of rows using file_fdw on different conditions and compared the 
execution times on my laptop.

The changed conditions are below:
- source code: HEAD/patch applied
- data: no error data/all row occur data conversion error at the 1st 
column
- file_fdw options: on_error=stop/on_error=ignore

There seems no significant difference in performance between HEAD and 
the patch applied with on_error option specified as either ignore/stop 
when data has no error.
OTOH when all rows occur data conversion error, it is significantly 
faster than other cases:

# HAED(e950fe58bd0)
## data:no error

=# create foreign table t1 (a int, b int, c int, t text) server f_fdw 
options (filename 'pgb_ac', format 'csv');
=# select count(*) from t1;

1567.569 ms
1675.112 ms
1555.782 ms
1547.676 ms
1660.221 ms

# patch applied
## data:no error, on_error:stop

=# create foreign table t1 (a int, b int, c int, t text) server f_fdw 
options (filename 'pgb_ac', format 'csv', on_error 'stop');
=# select count(*) from t1;

1580.656 ms
1623.784 ms
1596.947 ms
1652.307 ms
1613.607 ms

## data:no error, on_error:ignore

=# create foreign table t1 (a int, b int, c int, t text) server f_fdw 
options (filename 'pgb_ac', format 'csv', on_error 'ignore');
=# select count(*) from t1;

1575.718 ms
1597.464 ms
1596.540 ms
1665.818 ms
1595.453 ms

#### data:all rows contain error, on_error:ignore

=# create foreign table t1 (a int, b int, c int, t text) server f_fdw 
options (filename 'pgb_ac', format 'csv', on_error 'ignore');
=# select count(*) from t1;

914.537 ms
907.506 ms
912.768 ms
913.769 ms
914.327 ms


-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
Attachment

pgsql-hackers by date:

Previous
From: Nazir Bilal Yavuz
Date:
Subject: Using read stream in autoprewarm
Next
From: "cca5507"
Date:
Subject: Re: Historic snapshot doesn't track txns committed in BUILDING_SNAPSHOT state