Thread: LIKE with pattern containing backslash

LIKE with pattern containing backslash

From
Jack Orenstein
Date:
Suppose I have this table:

     create table test(id int, x varchar)

And I want to find rows whose x contains at least one backslash. The backslash
has to be escaped according to section 9.7.1 of the postgres docs.

     select *
     from test
     where x like E'%\\%'

I'll skip the results of my psql experimentation to avoid having to think about
escaping backslashes from the command-line, inside psql.

My test data set has three rows:

    0    a\b
    1    a\\b
    2    a\\\b

I wrote a JDBC test program, with two variants.

1) Searching with literals, escaping the backslash, e.g.

     statement.executeQuery("select id, x from test where x like E'%\\%'")

This turns up all three rows. But this:

     statement.executeQuery("select id, x from test where x like E'%\\\\%'")

doesn't retrieve any rows. From the docs, I would expect the second query to
retrieve rows with ids 1 and 2.

2) Avoiding literals completely, I created a PreparedStatement, and bound
variables containing the patterns, e.g.

     PreparedStatement find = connection.prepareStatement("select id, x from
test where x like ?");
     String oneBackslash = new String(new byte[]{'%', '\\', '%'});
     find.setString(1, oneBackslash);
     ResultSet resultSet = find.executeQuery();

Now, searching for %\% turns up nothing, while searching for %\\% turns up all
three rows.

BOTH behaviors seem wrong to me. In the first test (pattern specified as a
literal), it looks like %\\% is not matching strings that do contain two
backslashes. In the second test (pattern specified as a bound variable), it
looks like the first slash in each pattern is interpreted as an escape. Which I
didn't expect for a bound variable. Section 9.7.1 says "Note that the backslash
already has a special meaning in string literals ..." This seems to apply to
literals only, not to bound variables. And the need to have escapes in a bound
variable escapes me (so to speak).

Can someone help in my understanding of what is correct behavior (in the literal
and bound variable cases)? Is there a bug in the driver? in postgres? in the
docs? Or in my understanding?

Jack Orenstein

P.S. If you want to play with this, I can send you my test programs for the
cases described above.

Re: LIKE with pattern containing backslash

From
Tom Lane
Date:
Jack Orenstein <jack.orenstein@hds.com> writes:
> I wrote a JDBC test program, with two variants.

My Java is pretty weak, but doesn't it think that backslash is special
in string literals?  If I assume that each pair of \'s in your examples
went to the database as one \, then the results are what I'd expect.

            regards, tom lane

Re: LIKE with pattern containing backslash

From
"Daniel Verite"
Date:
    Jack Orenstein wrote:

> Can someone help in my understanding of what is correct
> behavior (in the literal and bound variable cases)?
> Is there a bug in the driver? in postgres? in the
> docs? Or in my understanding?

LIKE E'%\\%' will match a string that ends with a percent sign, not a
string that contains a backslash.
That's because the backslash acts additionally as the default escape
character for LIKE patterns.
You can add ESCAPE '' after the LIKE statement to avoid that.
Otherwise you get really two levels of different backslash
interpretation here, one for the string parser and one for the LIKE
operator, and that doesn't take into account any additional level
needed if you embed the query into a source code string.
Embedding "like backslash" in a C string is a kind of worst case, you
really need to write:
"LIKE E'\\\\\\\\'" (8 backslashes) which looks ridiculous but is the
correct form.
The C compiler will reduce 8 to 4. The SQL parser will reduce 4 to 2.
The LIKE operator will reduce 2 to 1.

 Best regards,
--
 Daniel
 PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org

Re: LIKE with pattern containing backslash

From
Jack Orenstein
Date:
Tom Lane wrote:

> My Java is pretty weak, but doesn't it think that backslash is special
> in string literals?  If I assume that each pair of \'s in your examples
> went to the database as one \, then the results are what I'd expect.
>
>             regards, tom lane

Daniel Verite wrote:

 > LIKE E'%\\%' will match a string that ends with a percent sign, not a string
 > that contains a backslash. That's because the backslash acts additionally as
 > the default escape character for LIKE patterns. You can add ESCAPE '' after
 > the LIKE > statement to avoid that.
 > Otherwise you get really two levels of different backslash interpretation
 > here, one for the string parser and one for the LIKE operator, and that
 > doesn't take into account any additional level needed if you embed the query
 > into a source code string.

OK, let's skip the first variant of the test, which relies on escapes in java
strings. In the second variant, there are no literal strings involved at all,
because I create a String from a char[] in which the characters are specified
individually:

     PreparedStatement find = connection.prepareStatement
         ("select id, x from test where x like ?");
     String oneBackslash = new String(new char[]{'%', '\\', '%'});
     find.setString(1, oneBackslash);
     ResultSet resultSet = find.executeQuery());

In this code, The string passed to JDBC is %\% (percent, one backslash,
percent), and no rows are returned. It appears as if the one backslash is being
treated as an escape for the %. And if I add a row (3, a\%) then that row DOES
get returned.

So back to the documented behavior of LIKE: "Note that the backslash already has
a special meaning in string literals, so to write a pattern constant that
contains a backslash you must write two backslashes in an SQL statement". Is
this statement meant to apply to the *value* of the RHS operand of LIKE? Or to
strings expressed as literals in general?

By referring to "string literals" and "pattern constant", the doc sounds like it
is describing how escapes are handled in strings. My example above does not rely
on a string literal for the RHS of LIKE. If escape processing is supposed to
occur for the RHS of LIKE, regardless of how the pattern is expressed, then I
believe the doc is confusing. (It confused me, anyway.) Another possibility is
that the doc is correct, and that the driver is getting things wrong. For now,
I'm trying to understand what the correct behavior is.

Jack

Here is my entire test program. It runs standalone - just provide the database
name, username, password on the command line.

import java.sql.*;

public class LikeVsBackslash_varchar
{
     public static void main(String[] args) throws Exception
     {
         String database = args[0];
         String username = args[1];
         String password = args[2];
         Class.forName("org.postgresql.Driver").newInstance();
         Connection connection =
DriverManager.getConnection(String.format("jdbc:postgresql:%s", database),
username, password);
         Statement statement = connection.createStatement();
         statement.execute("drop table if exists test");
         statement.execute("create table test(id int, x varchar)");
         PreparedStatement insert = connection.prepareStatement("insert into
test values(?, ?)");
         PreparedStatement find = connection.prepareStatement("select id, x from
test where x like ?");
         // Insert "a\b"
         insert.setInt(1, 0);
         insert.setString(2, new String(new char[]{'a', '\\', 'b'}));
         insert.executeUpdate();
         // Insert "a\\b"
         insert.setInt(1, 1);
         insert.setString(2, new String(new char[]{'a', '\\', '\\', 'b'}));
         insert.executeUpdate();
         // Insert "a\\\b"
         insert.setInt(1, 2);
         insert.setString(2, new String(new char[]{'a', '\\', '\\', '\\', 'b'}));
         insert.executeUpdate();
         // Insert "a\%"
         insert.setInt(1, 3);
         insert.setString(2, new String(new char[]{'a', '\\', '%'}));
         insert.executeUpdate();
         // Find rows with at least one backslash
         String oneBackslash = new String(new char[]{'%', '\\', '%'});
         find.setString(1, oneBackslash);
         report(String.format("Rows matching %s", oneBackslash),
find.executeQuery());
         // Find rows with at least two backslashes
         String twoBackslashes = new String(new char[]{'%', '\\', '\\', '%'});
         find.setString(1, twoBackslashes);
         report(String.format("Rows matching %s", twoBackslashes),
find.executeQuery());
     }

     private static void report(String label, ResultSet resultSet) throws
SQLException
     {
         System.out.println(String.format("%s: {", label));
         while (resultSet.next()) {
             int id = resultSet.getInt(1);
             String x = resultSet.getString(2);
             System.out.println(String.format("%s: %s (%s backslashes)", id, x,
backslashes(x)));
         }
         System.out.println("}");
     }

     private static String toString(byte[] x)
     {
         StringBuffer buffer = new StringBuffer();
         for (byte b : x) {
             buffer.append((char)b);
         }
         return buffer.toString();
     }

     private static int backslashes(String x)
     {
         int count = 0;
         for (int i = 0; i < x.length(); i++) {
             if (x.charAt(i) == '\\') {
                 count++;
             }
         }
         return count;
     }
}

Re: LIKE with pattern containing backslash

From
"Daniel Verite"
Date:
    Jack Orenstein wrote:

> In this code, The string passed to JDBC is %\% (percent, one
> backslash, percent), and no rows are returned.
> It appears as if the one backslash is being treated as an
> escape for the %.

That's right. So \% as a pattern matches a percent, and \\ as a pattern
matches a backslash. This has nothing to do with literals and all to do
with the special role of backslash in LIKE.

From your program:

>       // Find rows with at least one backslash
>       String oneBackslash = new String(new char[]{'%', '\\',
'%'});

Actually, this will find rows ending with a percent.

>       // Find rows with at least two backslashes
>       String twoBackslashes = new String(new char[]{'%', '\\',
'\\', '%'});

Actually, this will find rows having at least one backslash.

The point is the lack of an ESCAPE clause to LIKE. Just add it to
disable the special role of backslash and the patterns will match as
you expect.

 Best regards,
--
 Daniel
 PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org