Home > mailing lists

Re: proposal: possibility to read dumped table's name from file - Mailing list pgsql-hackers

From	Daniel Gustafsson
Subject	Re: proposal: possibility to read dumped table's name from file
Date	September 21, 2021 15:37:55
Msg-id	60D482DD-2F63-4627-B154-D6C46A7DAEBE@yesql.se Whole thread Raw
In response to	Re: proposal: possibility to read dumped table's name from file (Pavel Stehule <pavel.stehule@gmail.com>)
Responses	Re: proposal: possibility to read dumped table's name from file Re: proposal: possibility to read dumped table's name from file Re: proposal: possibility to read dumped table's name from file Re: proposal: possibility to read dumped table's name from file
List	pgsql-hackers

Tree view

> On 21 Sep 2021, at 08:50, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> po 20. 9. 2021 v 14:10 odesílatel Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> napsal:

> +       printf(_("  --filter=FILENAME            dump objects and data based on the filter expressions\n"
> +                        "                               from the filter file\n"));
> Before we settle on --filter I think we need to conclude whether this file is
> intended to be included from a config file, or used on it's own.  If we gow tih
> the former then we might not want a separate option for just --filter.
>
> I prefer to separate two files. Although there is some intersection, I think it is good to have two simple separate
filesfor two really different tasks. 
> It does filtering, and it should be controlled by option "--filter". When the implementation will be changed, then
thisoption can be changed too.  
> Filtering is just a pg_dump related feature. Revision of client application configuration is a much more generic
task,and if we mix it to one, we can be 
> in a trap. It can be hard to find one good format for large script generated content, and possibly hand written
structuredcontent. For practical 
> reasons it can be good to have two files too. Filters and configurations can have different life cycles.

I'm not convinced that we can/should change or remove a commandline parameter
in a coming version when there might be scripts expecting it to work in a
specific way.  Having a --filter as well as a --config where the configfile can
refer to the filterfile also passed via --filter sounds like problem waiting to
happen, so I think we need to settle how we want to interact with this file
before anything goes in.

Any thoughts from those in the thread who have had strong opinions on config
files etc?

> +    if (filter_is_keyword(keyword, size, "include"))
> I would prefer if this function call was replaced by just the pg_strcasecmp()
> call in filter_is_keyword() and the strlen optimization there removed.  The is
> not a hot-path, we can afford the string comparison in case of errors.  Having
> the string comparison done inline here will improve readability saving the
> reading from jumping to another function to see what it does.
>
> I agree that this is not a hot-path, just I don't feel well if I need to make a zero end string just for comparison
pg_strcasecmp.Current design reduces malloc/free cycles. It is used in more places, when Postgres parses strings - SQL
parser,plpgsql parser. I am not sure about the benefits and costs - pg_strcasecmp can be more readable, but for any
keywordI have to call pstrdup and pfree. Is it necessary? My opinion in this part is not too strong - it is a minor
issue,maybe I have a little bit different feelings about benefits and costs in this specific case, and if you really
thinkthe benefits of rewriting are higher, I'll do it 

Sorry, I typoed my response.  What I meant was to move the pg_strncasecmp call
inline and not do the strlen check, to save readers from jumping around.  So
basically end up with the below in read_filter_item():

+    /* Now we expect sequence of two keywords */
+    if (pg_strncasecmp(keyword, "include", size) == 0)
+        *is_include = true;

> +    initStringInfo(&line);
> Why is this using a StringInfo rather than a PQExpBuffer as the rest of pg_dump
> does?
>
> The StringInfo is used because I use the pg_get_line_buf function, and this function uses this API.

Ah, of course.

A few other comments from another pass over this:

+    exit_nicely(-1);
Why -1?  pg_dump (and all other binaries) exits with 1 on IMO even more serious
errors so I think this should use 1 as well.


+    if (!pg_get_line_buf(fstate->fp, line))
+    {
+        if (ferror(fstate->fp))
+            fatal("could not read from file \"%s\": %m", fstate->filename);
+
+        exit_invalid_filter_format(fstate,"unexpected end of file");
+    }
In the ferror() case this codepath isn't running fclose() on the file pointer
(unless stdin) which we do elsewhere, so this should use pg_log_error and
exit_nicely instead.


+    pg_log_fatal("could not read from file \"%s\": %m", fstate->filename);
Based on how other errors are treated in pg_dump I think this should be
downgraded to a pg_log_error.

The above comments are fixed in the attached, as well as a pass over the docs
and extended tests to actually test matching a foreign server.  What do think
about this version?  I'm still not convinced that there aren't more quoting
bugs in the parser, but I've left that intact for now.

--
Daniel Gustafsson        https://vmware.com/

Attachment

0001-Implement-filter-for-pg_dump.patch

pgsql-hackers by date:

From: Greg Nancarrow
Date: 21 September 2021, 15:35:30
Subject: Re: Added schema level support for publication.

From: torikoshia
Date: 21 September 2021, 15:43:14
Subject: Re: EXPLAIN(VERBOSE) to CTE with SEARCH BREADTH FIRST fails

Re: proposal: possibility to read dumped table's name from file - Mailing list pgsql-hackers

Attachment

Previous

Next