Thread: BUG #15474: Special character escape sequences need betterdocumentation, or more easily found documentation

The following bug has been logged on the website:

Bug reference:      15474
Logged by:          Micheal Taylor
Email address:      bubthegreat@gmail.com
PostgreSQL version: 11.0
Operating system:   Any
Description:

When looking for postgresql documentation on characters that need to be
escaped, and how to escape them, I consistently get to this page:

https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-SPECIAL-CHARS

That page goes over high levels of escaping numerous things, but in the
special characters and operators, it doesn't clearly show how to escape the
operators or special characters within a more complicated query structure.
For example, if I have the following table:

            CREATE TABLE IF NOT EXISTS {tablename}
            (
                time        TIMESTAMP,
                case_number VARCHAR(25),
                jira        VARCHAR(25),
                status      VARCHAR(25),
                fqdn        VARCHAR(255),
                subject     TEXT,
                description TEXT,
                comment     TEXT
            )

And the following insertion information:

    INSERT INTO orphans(time, case_number, fqdn, status, subject,
description)
    SELECT '{timestamp}', '{case_number}', '{fqdn}', 'new', '{subject}',
'{description}'
    WHERE
        NOT EXISTS (
            SELECT case_number, status FROM orphans
            WHERE case_number = '{case_number}'
        )

Where all columns inputs are strings, if any of those inputs have special
characters like % or ', it is not clear by quick inspection of the
documentation how to escape those characters.  A simple table elaborating on
the escapes for each special character would be incredibly helpful at
determining how to translate those escapes for cleaning strings prior to
insertion so those of us using postgresql can quickly write cleaning
functions for data.


On Wednesday, October 31, 2018, PG Bug reporting form <noreply@postgresql.org> wrote:
Where all columns inputs are strings, if any of those inputs have special
characters like % or '

The only character needing escaping in a normal literal is the single quote since it identifies the end of the literal otherwise.  For an escaped string the escaping character (backslash) also needs to be protected.  These are the only two and are documented in the sections covering those types.  A table containing one or maybe two rows doesn’t seem like an improvement.

In both cases doubling up the special character protects its literal meaning.

Specifically, % is not a special character for literals - though it can be for a particular operator or function.  Said operator will then document how it is used.

David J.

On Wednesday, October 31, 2018, PG Bug reporting form <noreply@postgresql.org> wrote:
A simple table elaborating on
the escapes for each special character would be incredibly helpful at
determining how to translate those escapes for cleaning strings prior to
insertion so those of us using postgresql can quickly write cleaning
functions for data.

Translating escapes and writing cleaning functions should be a rare need unless you are writing driver-level code.  Applications should avoid caring about either by avoiding dynamic sql altogether or using quote_* functions or the format function.

David J.
>>>>> "PG" == PG Bug reporting form <noreply@postgresql.org> writes:

 PG> A simple table elaborating on the escapes for each special
 PG> character would be incredibly helpful at determining how to
 PG> translate those escapes for cleaning strings prior to insertion so
 PG> those of us using postgresql can quickly write cleaning functions
 PG> for data.

If you're "writing cleaning functions" you're already making a serious
mistake, because you should be passing data values as parameters (which
do not require escapes) rather than interpolating into the query string.

If you actually do need to interpolate into the query string for some
reason (like doing COPY or other utility statement that doesn't support
parameters), then you should be using the quote/escape functions
provided by the driver for your client language (e.g. in libpq there is
PQescapeStringConn).

-- 
Andrew (irc:RhodiumToad)


tl;dr: If this is just an RTFM moment, then I'm happy to chalk it up to that and move on with my life. :)

I'm coming from a background of devops (mostly python/C++), not database administration, so this may be something that's fairly straightforward for a different audience - but this wasn't clear to me as someone who's coming into it from the outside.  An obscure stackoverflow answer is where I landed how to fix it rather than being able to find it in the documentation, so I figured I'd submit it as an improvement request.  If I'm super off base on it not being clear, then feel free to close - but I felt like the point of documentation is that it should answer those kinds of questions rather than googling other places.  Even just a link to the documentation for those would be very helpful as a "Looking for this?  It's <here>" for the special character documentation that David mentioned would have been extremely helpful IMHO.










On Wed, Oct 31, 2018 at 10:51 AM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "PG" == PG Bug reporting form <noreply@postgresql.org> writes:

 PG> A simple table elaborating on the escapes for each special
 PG> character would be incredibly helpful at determining how to
 PG> translate those escapes for cleaning strings prior to insertion so
 PG> those of us using postgresql can quickly write cleaning functions
 PG> for data.

If you're "writing cleaning functions" you're already making a serious
mistake, because you should be passing data values as parameters (which
do not require escapes) rather than interpolating into the query string.

If you actually do need to interpolate into the query string for some
reason (like doing COPY or other utility statement that doesn't support
parameters), then you should be using the quote/escape functions
provided by the driver for your client language (e.g. in libpq there is
PQescapeStringConn).

--
Andrew (irc:RhodiumToad)


--
Mike Taylor
bubthegreat@gmail.com
(801)913-9767
btw - You guys are super responsive. Very nice to experience

On Wed, Oct 31, 2018 at 11:23 AM Mike Taylor <bubthegreat@gmail.com> wrote:
tl;dr: If this is just an RTFM moment, then I'm happy to chalk it up to that and move on with my life. :)

I'm coming from a background of devops (mostly python/C++), not database administration, so this may be something that's fairly straightforward for a different audience - but this wasn't clear to me as someone who's coming into it from the outside.  An obscure stackoverflow answer is where I landed how to fix it rather than being able to find it in the documentation, so I figured I'd submit it as an improvement request.  If I'm super off base on it not being clear, then feel free to close - but I felt like the point of documentation is that it should answer those kinds of questions rather than googling other places.  Even just a link to the documentation for those would be very helpful as a "Looking for this?  It's <here>" for the special character documentation that David mentioned would have been extremely helpful IMHO.










On Wed, Oct 31, 2018 at 10:51 AM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "PG" == PG Bug reporting form <noreply@postgresql.org> writes:

 PG> A simple table elaborating on the escapes for each special
 PG> character would be incredibly helpful at determining how to
 PG> translate those escapes for cleaning strings prior to insertion so
 PG> those of us using postgresql can quickly write cleaning functions
 PG> for data.

If you're "writing cleaning functions" you're already making a serious
mistake, because you should be passing data values as parameters (which
do not require escapes) rather than interpolating into the query string.

If you actually do need to interpolate into the query string for some
reason (like doing COPY or other utility statement that doesn't support
parameters), then you should be using the quote/escape functions
provided by the driver for your client language (e.g. in libpq there is
PQescapeStringConn).

--
Andrew (irc:RhodiumToad)


--
Mike Taylor
bubthegreat@gmail.com
(801)913-9767


--
Mike Taylor
bubthegreat@gmail.com
(801)913-9767
On Wed, Oct 31, 2018 at 10:34 AM Mike Taylor <bubthegreat@gmail.com> wrote:
Even just a link to the documentation for those would be very helpful as a "Looking for this?  It's <here>" for the special character documentation that David mentioned would have been extremely helpful IMHO.
 
Not sure where you are expecting this "link to the documentation" to exist...

"SQL Syntax" is a top-level item in the Table of Contents of the documentation; Under that "Constants" seems like the right thing to call these hand-written literals so that seems adequate as well.  Given your background in programming languages I would expect those two headings to be sufficient to point you to the correct location in the documentation to learn how to write SQL literals.  Everyone has unique experiences though and that's why the lists exist.  The -general list would be a more appropriate place to make sure inquiries though as this isn't a bug.

David J.

On 31/10/2018 18:51, Andrew Gierth wrote:
>>>>>> "PG" == PG Bug reporting form <noreply@postgresql.org> writes:
> 
>   PG> A simple table elaborating on the escapes for each special
>   PG> character would be incredibly helpful at determining how to
>   PG> translate those escapes for cleaning strings prior to insertion so
>   PG> those of us using postgresql can quickly write cleaning functions
>   PG> for data.
> 
> If you're "writing cleaning functions" you're already making a serious
> mistake, because you should be passing data values as parameters (which
> do not require escapes) rather than interpolating into the query string.
> 
> If you actually do need to interpolate into the query string for some
> reason (like doing COPY or other utility statement that doesn't support
> parameters), then you should be using the quote/escape functions
> provided by the driver for your client language (e.g. in libpq there is
> PQescapeStringConn).

Agreed. As pointed out in this report, though, the documentation doesn't 
say that. This section is part of the "Lexical structure" chapter, so 
it's perhaps more aimed at people writing drivers or SQL code 
generators, than general application authors. But when someone like the 
OP lands on that page, how can he tell?

It might be a good idea to add a note somewhere in there along the lines of:

"NOTE: All popular client libraries have functions for correctly quoting 
and escaping user input, for use in string literals or SQL identifiers. 
Most applications should use those, or use out-of-band query parameters, 
instead of trying to follow the rules explained here directly. Please 
refer to the documentation of your programming language or driver on how 
to do that. The libpq quoting/escaping functions are explained in 
https://www.postgresql.org/docs/current/static/libpq-exec.html#LIBPQ-EXEC-ESCAPE-STRING".

- Heikki