Thread: text search and "filenames"

From: Alvaro Herrera
Hi,

I noticed that the default parser does not recognize Windows-style
filenames:

alvherre=# SELECT alias, description, token FROM ts_debug(e'c:\\archivos');
   alias   |   description   |  token
-----------+-----------------+----------
 asciiword | Word, all ASCII | c
 blank     | Space symbols   | :\
 asciiword | Word, all ASCII | archivos
(3 rows)

I played with it a bit (see attached patch -- basically I added \ in all
places where a / was being parsed, in the file-path states) and managed
to have it parse some naive versions, like

alvherre=# SELECT alias, description, token FROM ts_debug(e'c:\\archivos\\foo');
 alias |    description    |      token
-------+-------------------+-----------------
 file  | File or path name | c:\archivos\foo
(1 row)

However it fails as soon as you have a space, which is quite common on
Windows, for example

alvherre=# SELECT alias, description, token FROM ts_debug(e'c:\\Program Files\\');
   alias   |    description    |   token
-----------+-------------------+------------
 file      | File or path name | c:\Program
 blank     | Space symbols     |
 asciiword | Word, all ASCII   | Files
 blank     | Space symbols     | \
(4 rows)
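
To make the two behaviours above concrete, here is a minimal sketch in Python. It is a hypothetical toy scanner, not the PostgreSQL parser and not the attached patch: the function name toy_tokens, its separators parameter, and the simplified "file"/"word"/"blank" aliases are all assumptions made for illustration. It only shows the general idea that adding \ to the set of path separators lets a drive-letter path come out as one token, while whitespace still ends the token.

```python
import re

# Hypothetical toy scanner for illustration only; it does NOT reproduce the
# state machine of the real default text search parser.
def toy_tokens(text, separators="/"):
    """Split text into (alias, token) pairs, loosely imitating ts_debug."""
    sep_class = re.escape(separators)
    # A "file" token: optional drive prefix such as "c:", then a run of
    # word characters, dots and dashes containing at least one separator.
    # Whitespace is not in any character class, so it always ends the token.
    path_re = re.compile(
        r"(?:[A-Za-z]:)?[\w.\-]*[%s][\w.\-%s]*" % (sep_class, sep_class)
    )
    word_re = re.compile(r"\w+")

    tokens, pos = [], 0
    while pos < len(text):
        m = path_re.match(text, pos) 
        alias = "file"
        if m is None:
            m = word_re.match(text, pos)
            alias = "word"
        if m is None:
            # Anything else (spaces, stray punctuation) is emitted one
            # character at a time as "blank".
            tokens.append(("blank", text[pos]))
            pos += 1
            continue
        tokens.append((alias, m.group()))
        pos = m.end()
    return tokens


# Only "/" is a separator: the Windows path falls apart into words and blanks.
print(toy_tokens("c:\\archivos\\foo"))
# Both "/" and "\" are separators: the same input becomes a single file token.
print(toy_tokens("c:\\archivos\\foo", separators="/\\"))
# A space still cuts the path short, whatever the separator set is.
print(toy_tokens("c:\\Program Files\\", separators="/\\"))
```

In this sketch the last call yields c:\Program as its first token and then continues scanning after the space, which is the same kind of split as in the ts_debug output above. Accepting spaces inside the token would require some rule for deciding where the path ends, and the toy scanner has none.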

It also fails to recognize "network" file names, like

alvherre=# SELECT alias, description, token FROM ts_debug(e'\\\\server\\archivos\\foo');
   alias   |   description   |  token
-----------+-----------------+----------
 blank     | Space symbols   | \\
 asciiword | Word, all ASCII | server
 blank     | Space symbols   | \
 asciiword | Word, all ASCII | archivos
 blank     | Space symbols   | \
 asciiword | Word, all ASCII | foo
(6 rows)

Is this something worth worrying about?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: text search and "filenames"

From: Tom Lane
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I noticed that the default parser does not recognize Windows-style
> filenames:
> Is this something worth worrying about?

I'm not too excited about it.  The fact that there's a filename category
at all seems a bit of a wart to me, particularly since simple examples
like 'example.txt' don't get parsed that way.  I definitely don't see
any good way to allow spaces in Windows filenames...
        regards, tom lane