Thread: Re: [JDBC] Selecting Varchar range (through JDBC).

Re: [JDBC] Selecting Varchar range (through JDBC).

From
Daryl Beattie
Date:
Dear PostgreSQL people,

    A little while ago I posted a message about selecting a varchar
range through JDBC. Thomas was nice enough to give me some pointers, and I
went back to the database and tested some things out. The clincher came when
I realized that one of my JUnit tests was actually passing using the varchar
range; some of the regular expressions worked, while others did not.
    What I found is that using the collating elements in regex
comparisons does not work if the collating arguments are more than one
character in length. According to
http://www.postgresql.org/idocs/index.php?functions-matching.html#FUNCTIONS-
REGEXP, regular expressions should work with multi-character collating
arguments. I did a bunch of tests at the PostgreSQL command-line to verify
this finding. Here are some highlights:

testdb=# select 'test' ~* '^[[.a.]-[.z.]]';
 ?column?
----------
 t
(1 row)

testdb=# select 'test' ~* '^[[.aa.]-[.za.]]';
ERROR:  Invalid regular expression: invalid collating element
testdb=# select 'test' ~* '^[[.a.]-[.za.]]';
ERROR:  Invalid regular expression: invalid collating element
testdb=# select 'test' ~* '^[[.aa.]-[.z.]]';
ERROR:  Invalid regular expression: invalid collating element
testdb=# select 'chchcc' ~ '[[.ch.]]*c';
ERROR:  Invalid regular expression: invalid collating element


    That last test was taken straight from the PostgreSQL interactive
documentation at the link that I provided above. The documentation states
that it should match.
    Is this a bug? How do I get around it? I'd appreciate any advice.

Sincerely,

    Daryl.

P.S. I'm using PostgreSQL 7.2.1.
P.P.S. I changed my regex's slightly in the code found far below (as per
Thomas' suggestions) so that they read '^[[.testArtist1.]-[.testArtist2.]]'
instead of '[[.testArtist1.]-[.testArtist2.]].*'. I believe both should
work, but the one Thomas suggested should be faster.


> -----Original Message-----
> From: Daryl Beattie [mailto:dbeattie@insystems.com]
> Sent: Friday, August 30, 2002 9:18 AM
> To: pgsql-general@postgresql.org; pgsql-jdbc@postgresql.org
> Subject: Re: [JDBC] [GENERAL] Selecting Varchar range (through JDBC).
>
>
> Dear PostgreSQL people,
>
>     Thanks for your help from before, Vince. I didn't get around to
> attempting to implement that varchar range function until
> last night, and
> when I did I ran into some problems. Whenever I tried to execute the
> statement:
>
> SELECT ArtistId FROM Artist WHERE SortName ~*
> '[[.testArtist1.]-[.testArtist2.]].*'
>
>     I get the following JDBC error:
>
> javax.ejb.FinderException: SQLException while finding Artists:
> java.sql.SQLException: ERROR:  Invalid regular expression:
> invalid collating
> element
>
>     The SQLException is coming from the database, however,
> it MAY be a
> JDBC driver issue with it not escaping the statement properly.
>     My suspicion, however, is that I'm simply not using the regular
> expressions correctly. Can anybody kindly explain to me how to do this
> correctly? (I am trying to search a varchar range, for
> example, any Artists
> whose names start with the letters A to C.)
>     I have attached my Java code to the bottom of this e-mail for
> reference.
>     For your convenience, here's the link to the idocs that
> describes
> regex functionality in PostgreSQL:
> http://www.postgresql.org/idocs/index.php?functions-matching.h
> tml#FUNCTIONS-
> REGEXP
>
> Sincerely,
>
>     Daryl.
>
> P.S. Some notes:
>     - I'm using PostgreSQL 7.2.1 on RH Linux 7.2, and the
> latest binary
> PostgreSQL JDBC driver.
>     - It doesn't matter what I put in the [. .], it never
> seems to work
> any better. For example, a good test would be WHERE SortName ~*
> '[[.A.]-[.C.]].*'
>     - I tried using a straight Statement-from-a-query-string that I
> constructed instead of a PreparedStatement (to bypass any
> automatic escaping
> done by the PreparedStatement), and got the same results.
>
>
> > -----Original Message-----
> > From: Vince Vielhaber [mailto:vev@michvhf.com]
> > Sent: Monday, August 12, 2002 8:49 AM
> > To: Daryl Beattie
> > Cc: pgsql-general@postgresql.org
> > Subject: Re: [GENERAL] Selecting Varchar range.
> >
> >
> > On Mon, 12 Aug 2002, Daryl Beattie wrote:
> >
> > > Dear PostgreSQL people,
> > >
> > >     I didn't find this in the archives or the idocs, so...
> > >
> > >     How does one select a varchar range? The meaning of my
> > question is
> > > best illustrated by the following completely hypothetical
> > and totally
> > > imaginary situation:
> > >     Let's say I have a table of musical artists:
> > >
> > > CREATE TABLE Artists (
> > >     Id SERIAL PRIMARY KEY,
> > >     Name VARCHAR(100)
> > > );
> > >
> > >     And I want to eventually display only the artists from
> > 'Aa' to 'Am'.
> > > Naturally doing a select where Name like 'A%' will not
> > return me what I
> > > want.
> > >     Any ides on how to do this? [Will "SELECT Name FROM
> > Artists WHERE
> > > Name > 'Aa%' AND Name < 'Am%';" work?]
> > >
> > >     Bonus question: How do I select artists whose names start with a
> > > number?
> >
> > Think REGEX.    A[a-m]%   [0-9]%
> >
> > If any of your artists are indies, send them to
> www.unknown-artists.com
> >
> > Vince.
>
>
>
>
>
> /**
>  * Finds all of the Artists whose sort-names fall within a
> particular range,
>  * and returns them in a Collection. An example of this would
> be to ask for
>  * all of the artists that are within "Aa" and "Cz", and it
> would return all
>  * the Artists that start with A, B, and C. Case is ignored
> for the search,
>  * but <i>be careful</i> because some bands could start with numbers!
>  *
>  * @param start The start String of the Artists we want to find.
>  * @param end The end String of the Artists we want to find.
>  * @return A Collection containing all the Artists whose
> sort-names fall
>  * within a particular range.
>  * @throws FinderException If there was a problem finding
> such Artists.
>  */
> public Collection
> ejbFindArtistsWhoseSortNamesAreWithin(String start, String
> end) throws FinderException {
>     ArrayList artists = new ArrayList(); // All of the Artists.
>     Connection dbConnection = null;      // The database connection.
>     PreparedStatement ps = null;         // The
> statement(s) to execute.
>     ResultSet resultSet = null;          // The results of
> the query.
>
>     if ((start.indexOf("-") > 0) || (start.indexOf("[") > 0) ||
> (start.indexOf("]") > 0) || (end.indexOf("-") > 0) ||
> (end.indexOf("[") > 0)
> || (end.indexOf("]") > 0)) throw new FinderException("The
> start and end
> strings for findArtistsWhoseSortNamesAreWithin() cannot have
> '-', '[' or ']'
> in them.");
>     if ((start == null) || ("".equals(start)) || (end == null) ||
> ("".equals(end))) return artists;
>
>     // STEP 1: Get the database connection.
>     try { dbConnection = getConnection(); } catch
> (Exception e) { throw
> new EJBException(e.getMessage()); }
>
>     // STEP 2: Find the Artists.
>     try {
>         String queryStr = "SELECT " +
> DatabaseNames.ARTIST_ID + "
> FROM " +
>                 DatabaseNames.ARTIST_TABLE + " WHERE " +
>                 DatabaseNames.ARTIST_SORTNAME + " ~* ?";
>         ps = dbConnection.prepareStatement(queryStr);
>         ps.setString(1, "[[." + start + ".]-[." + end +
> ".]].*");
>         resultSet = ps.executeQuery();
>         ArtistPK artistPK = null;
>         while (resultSet.next()) {
>             artistPK = new ArtistPK();
>
> artistPK.setId(resultSet.getInt(DatabaseNames.ARTIST_ID));
>             artists.add(artistPK);
>         }
>     }
>     catch(SQLException e) {
>         context.setRollbackOnly();
>         throw new FinderException("SQLException while finding
> Artists:\n" + e);
>     }
>     finally {
>         try { closeResultSet(resultSet); } catch (Exception e) {
> log.warning("Failed to close ResultSet: " + e.getMessage()); }
>         try { closeStatement(ps); } catch (Exception e) {
> log.warning("Failed to close Statement: " + e.getMessage()); }
>         try { closeConnection(dbConnection); } catch
> (Exception e) {
> log.warning("Failed to close Connection: " + e.getMessage()); }
>     }
>
>     // STEP 3: Return that Collection of ArtistPKs.
>     return artists;
> }
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
>

Re: [JDBC] Selecting Varchar range (through JDBC).

From
Stephan Szabo
Date:
Stephan Szabo sszabo@bigpanda.com

On Tue, 10 Sep 2002, Daryl Beattie wrote:

> Dear PostgreSQL people,
>
>     A little while ago I posted a message about selecting a varchar
> range through JDBC. Thomas was nice enough to give me some pointers, and I
> went back to the database and tested some things out. The clincher came when
> I realized that one of my JUnit tests was actually passing using the varchar
> range; some of the regular expressions worked, while others did not.
>     What I found is that using the collating elements in regex
> comparisons does not work if the collating arguments are more than one
> character in length. According to
> http://www.postgresql.org/idocs/index.php?functions-matching.html#FUNCTIONS-
> REGEXP, regular expressions should work with multi-character collating
> arguments. I did a bunch of tests at the PostgreSQL command-line to verify
> this finding. Here are some highlights:
>
> testdb=# select 'test' ~* '^[[.a.]-[.z.]]';
>  ?column?
> ----------
>  t
> (1 row)
>
> testdb=# select 'test' ~* '^[[.aa.]-[.za.]]';
> ERROR:  Invalid regular expression: invalid collating element
> testdb=# select 'test' ~* '^[[.a.]-[.za.]]';
> ERROR:  Invalid regular expression: invalid collating element
> testdb=# select 'test' ~* '^[[.aa.]-[.z.]]';

Are you really in a locale where aa or za are collating elements?

> ERROR:  Invalid regular expression: invalid collating element
> testdb=# select 'chchcc' ~ '[[.ch.]]*c';
> ERROR:  Invalid regular expression: invalid collating element
>
>
>     That last test was taken straight from the PostgreSQL interactive
> documentation at the link that I provided above. The documentation states
> that it should match.

Note the caveat in the paragraph above.  If the collating sequence
includes a ch collating element...
It means if you're in a locale where ch collates as a single element you
can use .ch. to refer to that element.