Re: proposal: possibility to read dumped table's name from file - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: proposal: possibility to read dumped table's name from file
Date
Msg-id 20221111081125.bxtndzqlt5tpc5qw@jrouhaud
Whole thread Raw
In response to Re: proposal: possibility to read dumped table's name from file  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: proposal: possibility to read dumped table's name from file
List pgsql-hackers
Hi,

On Sat, Nov 05, 2022 at 08:54:57PM +0100, Pavel Stehule wrote:
>
> pá 4. 11. 2022 v 14:28 odesílatel Julien Rouhaud <rjuju123@gmail.com>
> napsal:
>
> > > But one thing I noticed is that "optarg" looks wrong here:
> > >
> > > simple_string_list_append(&opts->triggerNames, optarg);
> >
> > Ah indeed, good catch!  Maybe there should be an explicit test for every
> > (include|exclude) / objtype combination?  It would be a bit verbose (and
> > possibly hard to maintain).
> >
>
> yes - pg_restore is not well covered by  tests, fixed
>
> I found another issue. The pg_restore requires a full signature of the
> function and it is pretty sensitive on white spaces (pg_restore).

Argh, indeed.  It's a good thing to have expanded the regression tests :)

> I made a
> mistake when I partially parsed patterns like SQL identifiers. It can work
> for simple cases, but when I parse the function's signature it stops
> working. So I rewrote the parsing pattern part. Now, I just read an input
> string and I try to reduce spaces. Still multiline identifiers are
> supported. Against the previous method of pattern parsing, I needed to
> change just one regress test - now I am not able to detect garbage after
> pattern :-/.

I'm not sure it's really problematic.  It looks POLA-violation compatible with
regular pg_dump options, for instance:

$ echo "include table t1()" | pg_dump --filter - | ag CREATE
CREATE TABLE public.t1 (

$ pg_dump -t "t1()" | ag CREATE
CREATE TABLE public.t1 (

$ echo "include table t1()blabla" | pg_dump --filter - | ag CREATE
pg_dump: error: no matching tables were found

$ pg_dump -t "t1()blabla" | ag CREATE
pg_dump: error: no matching tables were found

I don't think the file parsing code should try to be smart about checking the
found patterns.

> * function's filtering doesn't support schema - when the name of function
> is specified with schema, then the function is not found

Ah I didn't know that.  Indeed it only expect a non-qualified identifier, and
would restore any function that matches the name (and arguments), possibly
multiple ones if there are variants in different schema.  That's unrelated to
this patch though.

> * the function has to be specified with an argument type list - the
> separator has to be exactly ", " string. Without space or with one space
> more, the filtering doesn't work (new implementation of pattern parsing
> reduces white spaces sensitivity). This is not a bug, but it is not well
> documented.

Agreed.

> attached updated patch

It looks overall good to me!  I just have a few minor nitpicking complaints:

- you removed the pg_strip_clrf() calls and declared everything as "const char
  *", so there's no need to explicitly cast the filter_get_keyword() arguments
  anymore

Note also that the code now relies on the fact that there are some non-zero
bytes after a pattern to know that no errors happened.  It's not a problem as
you should find an EOF marker anyway if CLRF were stripped.

+ * Following routines are called from pg_dump, pg_dumpall and pg_restore.
+ * Unfortunately, implementation of exit_nicely in pg_dump and pg_restore
+ * is different from implementation of this routine in pg_dumpall. So instead
+ * of directly calling exit_nicely we have to return some error flag (in this
+ * case NULL), and exit_nicelly will be executed from caller's routine.

Slight improvement:
[...]
Unfortunately, the implementation of exit_nicely in pg_dump and pg_restore is
different from the one in pg_dumpall, so instead of...

+ * read_pattern - reads an pattern from input. The pattern can be mix of
+ * single line or multi line subpatterns. Single line subpattern starts first
+ * non white space char, and ending last non space char on line or by char
+ * '#'. The white spaces inside are removed (around char ".()"), or reformated
+ * around char ',' or reduced (the multiple spaces are replaced by one).
+ * Multiline subpattern starts by double quote and ending by this char too.
+ * The escape rules are same like for SQL quoted literal.
+ *
+ * Routine signalizes error by returning NULL. Otherwise returns pointer
+ * to next char after last processed char in input string.


typo: reads "a" pattern from input...

I don't fully understand the part about subpatterns, but is that necessary to
describe it?  Simply saying that any valid and possibly-quoted identifier can
be parsed should make it clear that identifiers containing \n characters should
work too.  Maybe also just mention that whitespaces are removed and special
care is taken to output routines in exactly the same way calling code will
expect it (that is comma-and-single-space type delimiter).



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Typo about subxip in comments
Next
From: "houzj.fnst@fujitsu.com"
Date:
Subject: RE: Perform streaming logical transactions by background workers and parallel apply