Thread: \dn [PATTERN] handling not quite right...

\dn [PATTERN] handling not quite right...

From
Sean Chittenden
Date:
I haven't looked in great detail into why this is happpening, but it
seems as though processNamePattern() doesn't handle ?'s correctly in
the negative lookahead context correctly.

1) \dn [pattern] strips ?'s and replaces them with periods.  This may
be intentional (as the comment in describe.c suggests, converting input
from shell-style wildcards gets converted into regexp notation), but is
quite annoying.  Ex:

test=# \dn foo(?!_log|_shadow)
********* QUERY **********
SELECT n.nspname AS "Name",
        u.usename AS "Owner"
FROM pg_catalog.pg_namespace n LEFT JOIN pg_catalog.pg_user u
        ON n.nspowner=u.usesysid
WHERE   (n.nspname NOT LIKE 'pg\\_temp\\_%' OR
                  n.nspname = (pg_catalog.current_schemas(true))[1])
       AND n.nspname ~ '^foo(.!_log|_shadow)$'
ORDER BY 1;
**************************

Which is incorrect, IMHO.  Instead the last bit of the query should be:

    AND n.nspname ~ '^foo(?!_log|_shadow)$'

2) This brings up a large deficiency with the way that \d? [pattern]
handling is done in psql(1).  It'd be slick if there was a way to have
psql's pattern routine look at the first non-whitespace character or
two to change change the structure of the query.  Something like \dn
!.*_shadow% would change the RE operator from ~ to !~ and \dn %bar%
would translate to LIKE('bar%').  Doing the regexp equiv of
!LIKE('%_shadow') isn't trivial because '^.*(?!_shadow)$' doesn't
return the expected result for various reasons.  Oh!  This'd be a "gun
pointed at foot" feature, but having the first character being an =
would, without escaping, drop the remainder of the input directly into
the query (ex: \dn =nspname != (LIKE('%_log') OR LIKE('%_shadow'))).
Maybe a psql(1) variable that changes the behavior of the pattern
queries from using an RE to a LIKE statement could also be a
possibility.  The more I think about this, a leading pipe could be used
to pipe the output to a utility, so that \dn | egrep -v '(log|shadow)
would work and would be the easiest solution.

Maybe a better "bug report" would be, what's the suggested way of doing:

    n.nspname !~ '_(log|shadow)$'?

from a list pattern?

-sc

--
Sean Chittenden

Re: \dn [PATTERN] handling not quite right...

From
Tom Lane
Date:
Sean Chittenden <sean@chittenden.org> writes:
> I haven't looked in great detail into why this is happpening, but it
> seems as though processNamePattern() doesn't handle ?'s correctly in
> the negative lookahead context correctly.

Negative lookahead context!?  You are several sigmas beyond the subset
of regex functionality that \d and friends are intended to support.
Given that we're defining * and ? as shell-style wildcards, it's not
going to be feasible to handle very much of ordinary regex usage let
alone advanced.

> The more I think about this, a leading pipe could be used
> to pipe the output to a utility, so that \dn | egrep -v '(log|shadow)
> would work and would be the easiest solution.

This on the other hand seems more like a potentially useful feature,
although I'm unclear on what you expect to get sent through the pipe
exactly --- you want column headers for instance?  What if you're using
a nondefault display layout?

            regards, tom lane

Re: \dn [PATTERN] handling not quite right...

From
Sean Chittenden
Date:
>> I haven't looked in great detail into why this is happpening, but it
>> seems as though processNamePattern() doesn't handle ?'s correctly in
>> the negative lookahead context correctly.
>
> Negative lookahead context!?  You are several sigmas beyond the subset
> of regex functionality that \d and friends are intended to support.
> Given that we're defining * and ? as shell-style wildcards, it's not
> going to be feasible to handle very much of ordinary regex usage let
> alone advanced.

I was worried you'd say as much.  I'm in a situation where I've got a
few hundred schemas floating around now and about half of them end with
_log or _shadow and I was surprised at how non-trivial it was to filter
out the _log or _shadow schemas with \dn.  I tried thinking up the psql
equiv of tcsh's fignore but had no luck (ex:  set fignore = (\~ .o
.bak)).

>> The more I think about this, a leading pipe could be used
>> to pipe the output to a utility, so that \dn | egrep -v '(log|shadow)
>> would work and would be the easiest solution.
>
> This on the other hand seems more like a potentially useful feature,
> although I'm unclear on what you expect to get sent through the pipe
> exactly --- you want column headers for instance?  What if you're using
> a nondefault display layout?

Instead of using printf(), fprintf(), fwrite(), or whatever it is that
psql(1) uses internally for displaying result sets, have it use the
following chunk of pseudo code:

if (pipe_symbol_found) {
    char buf[8192];
    size_t len = 0;
    memset(&buf, 0, sizeof(buf));

    fh = popen(..., "r+"); /* or setsocketpair() + fork() */
    fwrite(formatted_output_buffer, strlen(formatted_output_buffer), 1,
fh);

    while((len = read(fileno(fh), buf, sizeof(buf)))) {
        fwrite(buf, len, 1, stdout);
    }
} else {
    /* whatever the current code does */
}

That doesn't take into account the set option that lets you write the
output to a file, but that is trivial to handle.  To answer your
question, I'd send _everything_ through the pipe and use the pipe as a
blanket IO filter.  I haven't thought about this, but would it be
possible to hand the data off to sh(1) and have it handle the
pipe/redirection foo that way psql doesn't have to have any
pipe/redirection brains?  If so, I think that'd be slick since you
could do things like '\dn | tail -n +3 | grep -v blah' to handle your
concern about having the header sent through and a utility not wanting
it.

Too bad tee(1) doesn't support a -p option to have tee(1)'s argument
sent to sh(1) or a pipe instead of a file, then there'd be some real
interesting things that one could script.  Ex:

    \dn | tee -p 'head -n 3 >> /dev/stdout' | tail -n +3 | egrep -v
'_(log|shadow)$'

Which'd show you the header, but everything after the header would be
sent to egrep(1). I can't understand why win32 users think *NIX's CLI
can be confusing.... *grin*.

Just some thoughts.  -sc

--
Sean Chittenden