Thread: psql/pg_dump vs. dollar signs in identifiers

From: Tom Lane
Date:

An example being discussed on the jdbc list led me to try this:

regression=# create table a$b$c (f1 int);
CREATE TABLE
regression=# \d a$b$c
Did not find any relation named "a$b$c".

It works if you use quotes:

regression=# \d "a$b$c"
    Table "public.a$b$c"
 Column |  Type   | Modifiers
--------+---------+-----------
 f1     | integer |

The reason it doesn't work without quotes is that processSQLNamePattern()
thinks this:
            * Inside double quotes, or at all times if force_escape is true,
            * quote regexp special characters with a backslash to avoid
            * regexp errors.  Outside quotes, however, let them pass through
            * as-is; this lets knowledgeable users build regexp expressions
            * that are more powerful than shell-style patterns.

and of course $ is a regexp special character, so it bollixes up the
match.
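
The failure can be reproduced outside psql. What follows is a minimal sketch, assuming Python's `re` module as a stand-in for the PostgreSQL regexp engine (the two should agree on how `$` behaves in a case this simple) and assuming the pattern text ends up wrapped in `^(...)$`:

```python
import re

name = "a$b$c"

# Unquoted: the pattern text passes through as-is, so each $ is an
# end-of-string anchor in the middle of the regexp and can never match.
unquoted = re.compile(r"^(a$b$c)$")

# Quoted: regexp special characters get a backslash, so $ is a literal.
quoted = re.compile(r"^(a\$b\$c)$")

unquoted_matches = unquoted.search(name) is not None
quoted_matches = quoted.search(name) is not None
print(unquoted_matches, quoted_matches)  # False True
```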

Now, because we surround the pattern with ^...$ anyway, I can't offhand
see a use-case for putting $ with its regexp meaning into the pattern.
And since we do allow $ as a non-first character of identifiers, there
is a use-case for expecting it to be treated like an ordinary character.

So I'm thinking that $ ought to be quoted whether it's inside double
quotes or not.  This change would affect psql's describe commands as
well as pg_dump -t and -n patterns.
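
As an illustration of the proposal, here is a rough sketch in Python rather than the actual C code. The helper `name_pattern_to_regex`, its `always_escape` parameter, and the `REGEX_SPECIALS` set are all hypothetical names made up for this example; the real processSQLNamePattern() also handles schema qualification, which is left out here.

```python
import re

# Characters escaped inside double quotes in this sketch (an assumed set,
# not copied from the psql source).
REGEX_SPECIALS = set("|*+?()[]{}.^$\\")


def name_pattern_to_regex(pattern: str, always_escape: str = "$") -> str:
    """Translate a psql-style name pattern into an anchored regexp string.

    Hypothetical helper: schema-qualified patterns (the "." separator) are
    deliberately not handled here.
    """
    out = []
    in_quotes = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == '"':
            if in_quotes and pattern[i + 1:i + 2] == '"':
                out.append('"')  # "" inside quotes is a literal quote
                i += 1
            else:
                in_quotes = not in_quotes
        elif not in_quotes and ch == "*":
            out.append(".*")  # shell-style wildcard
        elif not in_quotes and ch == "?":
            out.append(".")
        elif ch in always_escape or (in_quotes and ch in REGEX_SPECIALS):
            out.append("\\" + ch)
        else:
            # Unquoted identifiers are folded to lower case.
            out.append(ch if in_quotes else ch.lower())
        i += 1
    return "^(" + "".join(out) + ")$"


old = name_pattern_to_regex("a$b$c", always_escape="")  # behavior described above
new = name_pattern_to_regex("a$b$c")                     # with $ always escaped
print(old)  # ^(a$b$c)$
print(new)  # ^(a\$b\$c)$
print(bool(re.search(old, "a$b$c")), bool(re.search(new, "a$b$c")))  # False True
```

With `$` always escaped, the quoted and unquoted spellings of the pattern produce the same regexp, which is the behavior being proposed.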

Comments?
        regards, tom lane


Re: psql/pg_dump vs. dollar signs in identifiers

From: Gregory Stark
Date:

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Now, because we surround the pattern with ^...$ anyway, I can't offhand
> see a use-case for putting $ with its regexp meaning into the pattern.

It's possible to still usefully use $ in the regexp, but its existence at the
end means there should always be a way to write the regexp without needing
another one inside.

Incidentally, are these really regexps? I always thought they were globs. 
And experiments seem to back up my memory:

postgres=# \d foo*
   Table "public.foo^bar"
 Column |  Type   | Modifiers
--------+---------+-----------
 i      | integer |

postgres=# \d foo.*
Did not find any relation named "foo.*".


> Comments?

The first half of the logic applies to ^ as well. There's no use case for
regexps using ^ inside. You would have to use quotes to create the table but
we could have \d foo^* work:

postgres=# \d foo^*
Did not find any relation named "foo^*".
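
The same effect can be shown for ^ with a small sketch. It assumes Python's `re` module as a stand-in for the PostgreSQL regexp engine, and assumes that `*` is rewritten to `.*` and the result wrapped in `^(...)$`:

```python
import re

table = "foo^bar"

# \d foo*  becomes ^(foo.*)$ : the .* happens to cover "^bar", so it matches.
glob_matches = re.search(r"^(foo.*)$", table) is not None

# \d foo^* becomes ^(foo^.*)$ : the inner ^ is a start-of-string anchor
# that cannot succeed after "foo", so nothing matches.
caret_unescaped_matches = re.search(r"^(foo^.*)$", table) is not None

# With ^ escaped the same way as proposed for $, the pattern matches.
caret_escaped_matches = re.search(r"^(foo\^.*)$", table) is not None

print(glob_matches, caret_unescaped_matches, caret_escaped_matches)
# True False True
```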


--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: psql/pg_dump vs. dollar signs in identifiers

From: Tom Lane
Date:

Gregory Stark <stark@enterprisedb.com> writes:
> Incidentally, are these really regexps? I always thought they were globs. 

They're regexps under the hood, but we treat . as a schema separator
and translate * to .*, which makes it look like mostly a glob scheme.
But you can make use of brackets, |, +, ...
        regards, tom lane
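
A rough sketch of the translation described in this message, written in Python for brevity. The helper `translate` is hypothetical and ignores quoting and case-folding; Python's `re` module stands in for the PostgreSQL regexp engine.

```python
import re
from typing import Optional, Tuple


def translate(pattern: str) -> Tuple[Optional[str], str]:
    """Return (schema_regex, name_regex) for an unquoted pattern.

    Hypothetical helper: quoting and case-folding are ignored for brevity.
    """
    # The last "." separates the schema part from the name part.
    schema, dot, name = pattern.rpartition(".")

    def to_regex(part: str) -> str:
        # * becomes .* and ? becomes . ; everything else passes through,
        # so brackets, | and + keep their regexp meaning.
        body = part.replace("*", ".*").replace("?", ".")
        return "^(" + body + ")$"

    return (to_regex(schema) if dot else None), to_regex(name)


print(translate("foo*"))   # (None, '^(foo.*)$')
print(translate("foo.*"))  # ('^(foo)$', '^(.*)$')

_, name_regex = translate("foo|ba[rz]+")
print([s for s in ["foo", "barz", "food"] if re.search(name_regex, s)])
# ['foo', 'barz']
```

Under these assumptions, `foo.*` reads as "every relation in schema foo" rather than "names starting with foo", which would explain why `\d foo.*` found nothing in the experiment quoted above.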


Re: psql/pg_dump vs. dollar signs in identifiers

From: "Jim C. Nasby"
Date:

On Mon, Jul 09, 2007 at 07:04:27PM +0100, Gregory Stark wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
> > Now, because we surround the pattern with ^...$ anyway, I can't offhand
> > see a use-case for putting $ with its regexp meaning into the pattern.
>
> It's possible to still usefully use $ in the regexp, but its existence at the
> end means there should always be a way to write the regexp without needing
> another one inside.

Unless you're doing multi-line regex, what's the point of a $ anywhere
but the end of the expression? Am I missing something? Likewise with ^.

I'm inclined to escape $ as Tom suggested.
--
Jim Nasby                                      decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: psql/pg_dump vs. dollar signs in identifiers

From: Gregory Stark
Date:

"Jim C. Nasby" <decibel@decibel.org> writes:

> Unless you're doing multi-line regex, what's the point of a $ anywhere
> but the end of the expression? Am I missing something? Likewise with ^.

Leaving out the backslashes, you can do things like (foo$|baz|qux)(baz|qux|)
to say that all 9 combinations of those two tokens are valid except that foo
must be followed by the empty second half.

But it can always be refactored into something more normal like
(foo|((baz|qux)(baz|qux)?))
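
The equivalence of the two forms can be checked mechanically. This is a small sketch, assuming Python's `re` module as a stand-in for the PostgreSQL regexp engine and assuming both patterns are anchored with `^(...)$`:

```python
import re
from itertools import product

tokens = ["foo", "baz", "qux"]

# Both patterns anchored the way the name patterns are anchored.
original = re.compile(r"^((foo$|baz|qux)(baz|qux|))$")
refactored = re.compile(r"^(foo|((baz|qux)(baz|qux)?))$")

# Candidate strings: every first token followed by a token or by nothing.
candidates = ["".join(p) for p in product(tokens, tokens + [""])]

accepted_original = sorted(s for s in candidates if original.search(s))
accepted_refactored = sorted(s for s in candidates if refactored.search(s))

print(accepted_original == accepted_refactored)  # True
print(accepted_original)
# ['baz', 'bazbaz', 'bazqux', 'foo', 'qux', 'quxbaz', 'quxqux']
```

Of the nine combinations, the two where foo is followed by a non-empty second half are rejected by both patterns, leaving seven accepted strings.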

> I'm inclined to escape $ as Tom suggested.

Yeah, I have a tendency to look for the most obscure counter-example if only
to be sure I really understand precisely how obscure it is. I do agree that
it's not a realistic concern. Especially since I never even realized we
handled regexps here at all :)

IIRC some regexp engines don't actually treat $ specially except at the end of
the regexp at all. Tom's just suggesting doing the same thing here where
complicated regexps are even *less* likely and dollars as literals more.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com