Re: proposal: possibility to read dumped table's name from file - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: proposal: possibility to read dumped table's name from file
Date
Msg-id CAFj8pRAZbCvM3v8jp5wkPpZHJvAopWpbF=+v+84Pns=qWtvEcg@mail.gmail.com
Whole thread Raw
In response to Re: proposal: possibility to read dumped table's name from file  (Daniel Gustafsson <daniel@yesql.se>)
Responses Re: proposal: possibility to read dumped table's name from file
Re: proposal: possibility to read dumped table's name from file
List pgsql-hackers
Hi

po 13. 9. 2021 v 15:01 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
> On 28 Jul 2021, at 09:28, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> út 13. 7. 2021 v 1:16 odesílatel Tom Lane <tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>> napsal:

> Hence I suggest
>
>         include table PATTERN
>         exclude table PATTERN
>
> which ends up being the above but with words not [+-].

One issue with this syntax is that the include keyword can be quite misleading
as it's semantic interpretion of "include table t" can be different from
"--table=t".  The former is less clear about the fact that it means "exclude
all other tables than " then the latter.  It can be solved with documentation,
but I think that needs be to be made clearer.

I invite any documentation enhancing and fixing

> Here is an updated implementation of filter's file, that implements syntax proposed by you.

While it's not the format I would prefer, it does allow for most (all?) use
cases expressed in this thread with ample armtwisting applied so let's go ahead
from this point and see if we can agree on it (or a version of it).

A few notes on the patch after a first pass over it:

+(include|exclude)[table|schema|foreign_data|data] <replaceable class="parameter">objectname</replaceable>
Lacks whitespace between keyword and object type.  Also, since these are
mandatory parameters, shouldn't they be within '{..}' ?

yes, fixed

 

+       /* skip initial white spaces */
+       while (isblank(*ptr))
+               ptr += 1;
We don't trust isblank() as of 3fd5faed5 due to portability concerns, this
should probably use a version of the pg_isblank() we already have (and possibly
move that to src/common/string.c as there now are more consumers).


I rewrote this part, and I don't use function isblank ever
 

+static bool
+isblank_line(const char *line)
This could be replaced with a single call to strspn() as we already do for
parsing the TOC file.


+       /* when first char is hash, ignore whole line */
+       if (*str == '#')
+               continue;
I think we should strip leading whitespace before this to allow commentlines to
start with whitespace, it's easy enough and will make life easier for users.

now, the comments can be used as first non blank char or after filter


+       pg_log_error("invalid format of filter file \"%s\": %s",
+                                filename,
+                                message);
+
+       fprintf(stderr, "%d: %s\n", lineno, line);
Can't we just include the lineno in the error logging and skip dumping the
offending line?  Fast-forwarding the pointer to print the offending part is
less useful than a context marker, and in some cases suboptimal.  With this
coding, if a pattern is omitted for example the below error message is given:
 
    pg_dump: error: invalid format of filter file "filter.txt": missing object name
    1:

The errormessage and the linenumber in the file should be enough for the user
to figure out what to fix.

I did it like you proposed, but still, I think the content can be useful. More times you read dynamically generated files, or you read data from stdin, and in complex environments it can be hard regenerate new content for debugging.



+       if (keyword && is_keyword(keyword, size, "table"))
+               objecttype = 't';
Should this use an enum, or at least a struct translation the literal keyword
to the internal representation?  Magic constants without explicit connection to
their token counterparts can easily be cause of bugs.


fixed


If I create a table called "a\nb" and try to dump it I get an error in parsing
the file.  Isn't this supposed to work?
    $ cat filter.txt
    include table "a
    b"
    $ ./bin/pg_dump --filter=filter.txt
    pg_dump: error: invalid format of filter file "filter.txt": unexpected chars after object name
    2:

probably there was some issue, because it should work. I tested a new version and this is tested in new regress tests. Please, check



Did you consider implementing this in Bison to abstract some of the messier
parsing logic?

Initially not, but now, when I am thinking about it, I don't think so Bison helps. The syntax of the filter file is nicely linear. Now, the code of the parser is a little bit larger than minimalistic, but it is due to nicer error's messages. The raw implementation in Bison raised just "syntax error" and positions. I did code refactoring, and now the scanning, parsing and processing are divided into separated routines. Parsing related code has 90 lines. In this case, I don't think using a parser grammar file can carry any benefit. grammar is more readable, sure, but we need to include bison, we need to handle errors, and if we want to raise more helpful errors than just "syntax error", then the code will be longer.

please, check attached patch

Regards

Pavel



--
Daniel Gustafsson               https://vmware.com/

Attachment

pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: Hook for extensible parsing.
Next
From: Julien Rouhaud
Date:
Subject: Re: Hook for extensible parsing.