Thread: tsearch filenames unlikes special symbols and numbers
Hello

I have found a small bug:

postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell, DictFile= 'cs_czutf');
ERROR:  invalid text search configuration file name "cs_czutf"
postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell, DictFile= 'csczutf8');
ERROR:  invalid text search configuration file name "csczutf8"
postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell, DictFile= "csczutf8");
ERROR:  invalid text search configuration file name "csczutf8"
postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell, DictFile= "cs_czutf");
ERROR:  invalid text search configuration file name "cs_czutf"
postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell, DictFile= "csczutf");
ERROR:  could not open dictionary file "/usr/local/pgsql/share/tsearch_data/csczutf.dict": No such file or directory

regards
Pavel Stehule
I just tried on CVS HEAD and it seems something is broken:

postgres=# CREATE TEXT SEARCH DICTIONARY ru_ispell (
    TEMPLATE = ispell,
    DictFile = russian-utf8.dict,
    AffFile = russian-utf8.aff,
    StopWords = russian
);
ERROR:  syntax error at or near "-"
LINE 3: DictFile = russian-utf8.dict,

postgres=# CREATE TEXT SEARCH DICTIONARY ru_ispell (
    TEMPLATE = ispell,
    DictFile = 'russian-utf8.dict',
    AffFile = 'russian-utf8.aff',
    StopWords = russian
);
ERROR:  invalid text search configuration file name "russian-utf8.dict"

Honestly speaking, I have no time to follow the constantly changing syntax, but the documentation
http://momjian.us/main/writings/pgsql/sgml/sql-createtsdictionary.html
doesn't make clear what's wrong.

Also, I'm wondering whether we really need to show all schemas that have no text search configurations defined? It looks rather strange.

postgres=# \dF
                 List of text search configurations
       Schema       |    Name    |              Description
--------------------+------------+---------------------------------------
 information_schema |            |
 pg_catalog         | danish     | Configuration for danish language
 pg_catalog         | dutch      | Configuration for dutch language
 pg_catalog         | english    | Configuration for english language
 pg_catalog         | finnish    | Configuration for finnish language
 pg_catalog         | french     | Configuration for french language
 pg_catalog         | german     | Configuration for german language
 pg_catalog         | hungarian  | Configuration for hungarian language
 pg_catalog         | italian    | Configuration for italian language
 pg_catalog         | norwegian  | Configuration for norwegian language
 pg_catalog         | portuguese | Configuration for portuguese language
 pg_catalog         | romanian   | Configuration for romanian language
 pg_catalog         | russian    | Configuration for russian language
 pg_catalog         | simple     | simple configuration
 pg_catalog         | spanish    | Configuration for spanish language
 pg_catalog         | swedish    | Configuration for swedish language
 pg_catalog         | turkish    | Configuration for turkish language
 pg_temp_1          |            |
 pg_toast           |            |
 pg_toast_temp_1    |            |
 public             |            |
(21 rows)

Another problem I see is the broken examples of a dictionary and a parser in the documentation:
http://momjian.us/main/writings/pgsql/sgml/textsearch-rule-dictionary-example.html
http://momjian.us/main/writings/pgsql/sgml/textsearch-parser-example.html

The include files in the dictionary example are now in the tsearch directory:

#include "tsearch/ts_locale.h"
#include "tsearch/ts_public.h"
#include "tsearch/ts_utils.h"

I didn't test the parser example.

Oleg

PS. Sorry, I missed the last syntax changes, but I really don't understand the usage of parentheses and commas in SQL. It's so strange. I remember Peter raised objections at the very beginning.

On Sun, 2 Sep 2007, Pavel Stehule wrote:

> Hello
>
> I am found small bug
>
> postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell, DictFile= 'cs_czutf');
> ERROR:  invalid text search configuration file name "cs_czutf"
> postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell, DictFile= 'csczutf8');
> ERROR:  invalid text search configuration file name "csczutf8"
> postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell, DictFile= "csczutf8");
> ERROR:  invalid text search configuration file name "csczutf8"
> postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell, DictFile= "cs_czutf");
> ERROR:  invalid text search configuration file name "cs_czutf"
> postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell, DictFile= "csczutf");
> ERROR:  could not open dictionary file "/usr/local/pgsql/share/tsearch_data/csczutf.dict": No such file or directory
>
> regards
> Pavel Stehule

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes:
> postgres=# CREATE TEXT SEARCH DICTIONARY ru_ispell (
>     TEMPLATE = ispell,
>     DictFile = 'russian-utf8.dict',
>     AffFile = 'russian-utf8.aff',
>     StopWords = russian
> );
> ERROR:  invalid text search configuration file name "russian-utf8.dict"

I made it reject all but latin letters, which is the same restriction that's in place for timezone set filenames. That might be overly strong, but we definitely have to forbid "." and "/" (and "\" on Windows). Do we want to restrict it to letters, digits, underscore? Or does it need to be weaker than that?

> Also, I'm wondering do we really need to show all schemas without
> text search configurations defined ? Looks rather stranger.

Um ... I don't see that; I get

regression=# \dF
             List of text search configurations
   Schema   |    Name    |              Description
------------+------------+---------------------------------------
 pg_catalog | danish     | Configuration for danish language
 pg_catalog | dutch      | Configuration for dutch language
 pg_catalog | english    | Configuration for english language
 pg_catalog | finnish    | Configuration for finnish language
 pg_catalog | french     | Configuration for french language
 pg_catalog | german     | Configuration for german language
 pg_catalog | hungarian  | Configuration for hungarian language
 pg_catalog | italian    | Configuration for italian language
 pg_catalog | norwegian  | Configuration for norwegian language
 pg_catalog | portuguese | Configuration for portuguese language
 pg_catalog | romanian   | Configuration for romanian language
 pg_catalog | russian    | Configuration for russian language
 pg_catalog | simple     | simple configuration
 pg_catalog | spanish    | Configuration for spanish language
 pg_catalog | swedish    | Configuration for swedish language
 pg_catalog | turkish    | Configuration for turkish language
(16 rows)

Are you sure you're using CVS-head psql?

> Another problem I see are broken examples of dictionary and parser in
> documentation:
> http://momjian.us/main/writings/pgsql/sgml/textsearch-rule-dictionary-example.html
> http://momjian.us/main/writings/pgsql/sgml/textsearch-parser-example.html

Yeah, I wanted to discuss that with you. Code examples in sgml docs are a bad idea: they're impossible to use as actual templates, because of all the weird markup changes, and there's no easy way to notice if they're broken. It would be better to remove these from the docs and set them up as contrib modules.

regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Oleg Bartunov <oleg@sai.msu.su> writes: >> postgres=# CREATE TEXT SEARCH DICTIONARY ru_ispell ( >> TEMPLATE = ispell, >> DictFile = 'russian-utf8.dict', >> AffFile = 'russian-utf8.aff', >> StopWords = russian >> ); >> ERROR: invalid text search configuration file name "russian-utf8.dict" > > I made it reject all but latin letters, which is the same restriction > that's in place for timezone set filenames. That might be overly > strong, but we definitely have to forbid "." and "/" (and "\" on > Windows). Do we want to restrict it to letters, digits, underscore? > Or does it need to be weaker than that? What's the problem with "."? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: >> I made it reject all but latin letters, which is the same restriction >> that's in place for timezone set filenames. That might be overly >> strong, but we definitely have to forbid "." and "/" (and "\" on >> Windows). Do we want to restrict it to letters, digits, underscore? >> Or does it need to be weaker than that? > What's the problem with "."? ../../../../etc/passwd Possibly we could allow '.' as long as we forbade /, but the other trouble with allowing . is that it encourages people to try to specify the filetype suffix (as indeed Oleg was doing). I'd prefer to keep the suffixes out of the SQL object definitions, with an eye to possibly someday migrating all the configuration data inside the database. There's a reasonable argument for restricting the names used for these things in the SQL definitions to be valid SQL identifiers, so that that will work nicely... regards, tom lane
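The `../../../../etc/passwd` concern can be made concrete with a small Python sketch. This is not PostgreSQL code: `TSEARCH_DIR`, `resolve_unchecked` and `escapes_base` are hypothetical names chosen for illustration, and the base directory is simply the one that appears in the error message at the top of the thread. The sketch assumes the server joins the user-supplied name onto that directory and appends an extension without any validation.

```python
import posixpath

# Hypothetical base directory, copied from the error message earlier in the thread.
TSEARCH_DIR = "/usr/local/pgsql/share/tsearch_data"


def resolve_unchecked(name: str, ext: str = ".dict") -> str:
    """Join a user-supplied name onto the base directory with no validation."""
    return posixpath.normpath(posixpath.join(TSEARCH_DIR, name + ext))


def escapes_base(path: str) -> bool:
    """True if the normalized path no longer lives under TSEARCH_DIR."""
    return not path.startswith(TSEARCH_DIR + "/")


if __name__ == "__main__":
    for candidate in ("russian", "../../../../etc/passwd"):
        resolved = resolve_unchecked(candidate)
        print(candidate, "->", resolved, "(escapes)" if escapes_base(resolved) else "(ok)")
```

Even with an extension appended, the second name resolves outside the dictionary directory, which is why "/" has to be forbidden regardless of what is decided about ".".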
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: >> "Tom Lane" <tgl@sss.pgh.pa.us> writes: >>> I made it reject all but latin letters, which is the same restriction >>> that's in place for timezone set filenames. That might be overly >>> strong, but we definitely have to forbid "." and "/" (and "\" on >>> Windows). Do we want to restrict it to letters, digits, underscore? >>> Or does it need to be weaker than that? > >> What's the problem with "."? > > ../../../../etc/passwd > > Possibly we could allow '.' as long as we forbade /, Right, traditionally the only characters forbidden in filenames in Unix are / and nul. If we want the files to play nice in Gnome etc then we should restrict them to ascii since we don't know what encoding the gui expects. Actually I think in Windows \ : and . are problems (not allowed more than one dot in dos). > There's a reasonable argument for restricting the names used for these > things in the SQL definitions to be valid SQL identifiers, so that that > will work nicely... Ah -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On 9/2/07, Gregory Stark <stark@enterprisedb.com> wrote: > Right, traditionally the only characters forbidden in filenames in Unix are / > and nul. If we want the files to play nice in Gnome etc then we should > restrict them to ascii since we don't know what encoding the gui expects. > > Actually I think in Windows \ : and . are problems (not allowed more than one > dot in dos). Reserved characters in Windows filenames are < > : " / \ | ? * DOS limitations aren't relevant on the OS versions Postgres supports. ...but I thought this was about opening existing files, not creating them, in which case the only relevant limitation is path separators. Any other reserved characters are going to result in no open file, rather than a security hole.
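The distinction drawn in this message, between characters that are merely reserved on Windows and characters that can redirect the open to another directory, might be sketched like this in Python. The function name `classify` and its three labels are made up for the example; the reserved set is the list quoted in the message, and how each platform actually treats a reserved character is debated later in the thread.

```python
# Reserved characters in Windows filenames, as listed in the message above.
WINDOWS_RESERVED = set('<>:"/\\|?*')

# Path separators: the only characters that can make the name point
# into a different directory.
PATH_SEPARATORS = set("/\\")


def classify(name: str) -> str:
    """Classify a user-supplied file name (hypothetical helper, not PostgreSQL code)."""
    chars = set(name)
    if chars & PATH_SEPARATORS:
        return "reject"    # could reach a file outside the intended directory
    if chars & WINDOWS_RESERVED:
        return "reserved"  # not a traversal risk, but reserved on Windows
    return "ok"


if __name__ == "__main__":
    for candidate in ("russian-utf8", "foo:bar", "../etc/passwd", "..\\secret"):
        print(candidate, "->", classify(candidate))
```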
On Mon, Sep 03, 2007 at 07:47:14AM +0100, Gregory Stark wrote: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: > > > Gregory Stark <stark@enterprisedb.com> writes: > >> "Tom Lane" <tgl@sss.pgh.pa.us> writes: > >>> I made it reject all but latin letters, which is the same restriction > >>> that's in place for timezone set filenames. That might be overly > >>> strong, but we definitely have to forbid "." and "/" (and "\" on > >>> Windows). Do we want to restrict it to letters, digits, underscore? > >>> Or does it need to be weaker than that? > > > >> What's the problem with "."? > > > > ../../../../etc/passwd > > > > Possibly we could allow '.' as long as we forbade /, > > Right, traditionally the only characters forbidden in filenames in Unix are / > and nul. If we want the files to play nice in Gnome etc then we should > restrict them to ascii since we don't know what encoding the gui expects. > > Actually I think in Windows \ : and . are problems (not allowed more than one > dot in dos). \ and : are problems. . is not a problem. We don't support 16-bit windows anyway, and multiple dots works fine on any system we support. //Magnus
Magnus Hagander <magnus@hagander.net> writes: > On Mon, Sep 03, 2007 at 07:47:14AM +0100, Gregory Stark wrote: >> Actually I think in Windows \ : and . are problems (not allowed more >> than one dot in dos). > \ and : are problems. Is : really a problem, given that the name in question will be appended to a known directory's path? > . is not a problem. We don't support 16-bit windows anyway, and multiple > dots works fine on any system we support. I'm not convinced that . is issue-free. On most if not all versions of Unix, you are allowed to open a directory as a file and read the filenames it contains. While I don't say it'd be easy to manage that through tsearch, there's at least a potential for discovering the filenames present in . and .. --- how much do we care about that? regards, tom lane
On Mon, Sep 03, 2007 at 09:27:19AM -0400, Tom Lane wrote: > Magnus Hagander <magnus@hagander.net> writes: > > On Mon, Sep 03, 2007 at 07:47:14AM +0100, Gregory Stark wrote: > >> Actually I think in Windows \ : and . are problems (not allowed more > >> than one dot in dos). > > > \ and : are problems. > > Is : really a problem, given that the name in question will be appended > to a known directory's path? Yes. It won't work - the API calls will reject it. > > . is not a problem. We don't support 16-bit windows anyway, and multiple > > dots works fine on any system we support. > > I'm not convinced that . is issue-free. On most if not all versions of Unix, > you are allowed to open a directory as a file and read the filenames it > contains. While I don't say it'd be easy to manage that through > tsearch, there's at least a potential for discovering the filenames > present in . and .. --- how much do we care about that? I just meant that it's not a problem on Win32 to have a file with multiple dots in the name. There can certainly be *other* reasons for it. I don't really see the need to have an extra dot in the filename in this particular case, so I'd certainly be fine with restricting this one a lot more. //Magnus
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > I'm not convinced that . is issue-free. On most if not all versions of Unix, > you are allowed to open a directory as a file and read the filenames it > contains. While I don't say it'd be easy to manage that through > tsearch, there's at least a potential for discovering the filenames > present in . and .. --- how much do we care about that? Actually I don't think that's true any more, most file systems on most Unixen do not allow it. However it appears it's still the case for Solaris so it's still a good point. I'm sure it's not true for modern versions of Linux and I thought it was false for other modern OSes -- I'm surprised it's not for Solaris even. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: >> I'm not convinced that . is issue-free. On most if not all versions of Unix, >> you are allowed to open a directory as a file and read the filenames it >> contains. While I don't say it'd be easy to manage that through >> tsearch, there's at least a potential for discovering the filenames >> present in . and .. --- how much do we care about that? > Actually I don't think that's true any more, most file systems on most Unixen > do not allow it. However it appears it's still the case for Solaris so it's > still a good point. Actually, now that I've woken up a bit more, it is not a problem as long as the tsearch code always appends some kind of file extension to what the user gives, such as ".dict". It'll be impossible to name "." or ".." with that addition. Also, Magnus says that Windows throws an error for ":" in the filename, which means we needn't. So the bottom line seems to be that rejecting directory separators is sufficient to prevent any unwanted file accesses. It might still be a good idea to restrict the names to be SQL identifiers (ie, alphanumerics and underscores) for future-proofing, but it wasn't clear whether anyone but me thought that was a good argument. I'm willing to make it just be no-dir-separators. regards, tom lane
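The bottom line of this message, reject directory separators and always append an extension, can be sketched as follows. This is an illustration in Python, not the tsearch implementation; `dict_path` and its parameters are hypothetical.

```python
def dict_path(base_dir: str, name: str, ext: str = ".dict") -> str:
    """Build the path for a dictionary file from a user-supplied name.

    Hypothetical helper: rejects empty names and directory separators,
    then always appends the extension.
    """
    if not name or "/" in name or "\\" in name:
        raise ValueError(f"invalid text search configuration file name: {name!r}")
    return base_dir + "/" + name + ext


if __name__ == "__main__":
    # With the extension appended, "." and ".." can no longer name a directory.
    print(dict_path("/usr/local/pgsql/share/tsearch_data", "."))
    print(dict_path("/usr/local/pgsql/share/tsearch_data", ".."))
```

Because the extension is always appended, the names "." and ".." turn into ordinary (and most likely nonexistent) file names rather than directory references.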
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > It might still be a good idea to restrict the names to be SQL > identifiers (ie, alphanumerics and underscores) for future-proofing, > but it wasn't clear whether anyone but me thought that was a good > argument. I'm willing to make it just be no-dir-separators. I thought that was a good argument actually. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Mon, Sep 03, 2007 at 07:47:14AM +0100, Gregory Stark wrote:
>>> Actually I think in Windows \ : and . are problems (not allowed more
>>> than one dot in dos).
>> \ and : are problems.
> Is : really a problem, given that the name in question will be appended
> to a known directory's path?

The file name shouldn't have a ':' in it. Accessing a path with multiple ':' in it to open a file for reading should just fail normally. So yes, there should be no problem.

>> . is not a problem. We don't support 16-bit windows anyway, and multiple
>> dots works fine on any system we support.
> I'm not convinced that . is issue-free. On most if not all versions of Unix,
> you are allowed to open a directory as a file and read the filenames it
> contains. While I don't say it'd be easy to manage that through
> tsearch, there's at least a potential for discovering the filenames
> present in . and .. --- how much do we care about that?

No more than discovering the file names in any other directory without using '.' or '..'? If it matters, check to ensure it is a regular file before opening it?

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>
Tom Lane wrote: > Also, ____ says that Windows throws an error for ":" in the filename, > which means we needn't. > > Windows doesn't fail - but it can do odd things. For example, try: C:\> echo hi >foo:bar If one then checks the directory, one finds a "foo". Depending on *which* API one uses, the rules may change around a bit - but whatever the situation, as long as you prefix it with a valid path, the ":" is not going to cause you problems. > It might still be a good idea to restrict the names to be SQL > identifiers (ie, alphanumerics and underscores) for future-proofing, > but it wasn't clear whether anyone but me thought that was a good > argument. I'm willing to make it just be no-dir-separators. > I think it is a good argument. Cheers, mark -- Mark Mielke <mark@mielke.cc>
On 9/3/07, Mark Mielke <mark@mark.mielke.cc> wrote: > Tom Lane wrote: > > Also, ____ says that Windows throws an error for ":" in the filename, > > which means we needn't. > Windows doesn't fail - but it can do odd things. For example, try: > > C:\> echo hi >foo:bar > > If one then checks the directory, one finds a "foo". : is used for naming streams and attribute types in NTFS filenames. It's not very well-known functionality and tends to confuse people, but I'm not aware of any situation where it'd be a problem for read access. (Creation is not a security risk in the technical sense, but as most administrators aren't aware of alternate data streams and the shell does not expose them, it's effectively hidden data.) If any of you are familiar with MacOS HFS resource forks, NTFS basically supports an arbitrary number of named forks. A file is collection of one or more data streams, the single unnamed stream being default.
Moving to -docs On Sun, Sep 02, 2007 at 06:46:11PM -0400, Tom Lane wrote: > > Another problem I see are broken examples of dictionary and parser in > > documentation: > > http://momjian.us/main/writings/pgsql/sgml/textsearch-rule-dictionary-example.html > > http://momjian.us/main/writings/pgsql/sgml/textsearch-parser-example.html > > Yeah, I wanted to discuss that with you. Code examples in sgml docs are > a bad idea: they're impossible to use as actual templates, because of > all the weird markup changes, and there's no easy way to notice if > they're broken. It would be better to remove these from the docs and > set them up as contrib modules. Couldn't we come up with some method of specifying code examples in the docs and then having the doc build process actually run those examples and put that into the doc build? I wrote some code that does this back when I was thinking about writing a book, if anyone wants to see it. -- Decibel!, aka Jim Nasby decibel@decibel.org EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Trevor Talbot wrote:
> On 9/3/07, Mark Mielke <mark@mark.mielke.cc> wrote:
>> Tom Lane wrote:
>>> Also, ____ says that Windows throws an error for ":" in the filename,
>>> which means we needn't.
>
>> Windows doesn't fail - but it can do odd things. For example, try:
>>
>> C:\> echo hi >foo:bar
>>
>> If one then checks the directory, one finds a "foo".
>
> : is used for naming streams and attribute types in NTFS filenames.
> It's not very well-known functionality and tends to confuse people,
> but I'm not aware of any situation where it'd be a problem for read
> access. (Creation is not a security risk in the technical sense, but
> as most administrators aren't aware of alternate data streams and the
> shell does not expose them, it's effectively hidden data.)
>
> If any of you are familiar with MacOS HFS resource forks, NTFS
> basically supports an arbitrary number of named forks. A file is
> collection of one or more data streams, the single unnamed stream
> being default.

On MacOS prior to OSX, : was used as a directory separator (paths looked like "My Harddisk:My Folder:Somefile"). In OSX, "/" is used, but for backwards compatibility the Finder translates "/" in filenames to ":". So, if you do for example "touch 'my:test'" in the shell, you see "my/test" in the Finder.

That's another argument for staying away from : in filenames.

greetings, Florian Pflug
Tom Lane wrote:
> Possibly we could allow '.' as long as we forbade /, but the other
> trouble with allowing . is that it encourages people to try to specify
> the filetype suffix (as indeed Oleg was doing). I'd prefer to keep the
> suffixes out of the SQL object definitions, with an eye to possibly
> someday migrating all the configuration data inside the database.
> There's a reasonable argument for restricting the names used for these
> things in the SQL definitions to be valid SQL identifiers, so that that
> will work nicely...

Well, if we were to use SQL identifiers, we couldn't forbid very much, seeing as almost anything can be used as an identifier, so long as it is properly quoted. But it seems to me that we could just pick a convenient subset which doesn't make any OS too angry (say, reject / \ . and :), and when we get to using actual SQL identifiers, we can enlarge the supported character set without creating any backwards-compatibility problem.

On the other hand, this means the name has to be quoted if it would be quoted as an SQL identifier, right?

--
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
"I will never trust a traitor. Not even if I created the traitor myself"
(Baron Vladimir Harkonnen)
Alvaro Herrera <alvherre@commandprompt.com> writes: > On the other hand, this means the name has to be quoted if it would be > quoted as an SQL identifier, right? Something like that. I wasn't planning on rejecting uppercase letters, though, which would be necessary if you wanted to be strict about matching unquoted identifiers. There seems fairly clear use-case for allowing A-Z a-z 0-9 and underscore (while CVS head rejects 0-9 and underscore). There also seem to be good arguments for disallowing / \ : on various platforms, which leaves us with some other punctuation in question, as well as the whole matter of non-ASCII characters. I'm not sure whether we want to touch the idea of non-ASCII; comments? regards, tom lane
Tom Lane wrote: > I'm not sure whether we want to touch > the idea of non-ASCII; comments? Non-ASCII filenames sounds like recipe for problems to me. We don't know what encoding the filenames are in on disk. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 9/3/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > On the other hand, this means the name has to be quoted if it would be
> > quoted as an SQL identifier, right?
>
> Something like that. I wasn't planning on rejecting uppercase letters,
> though, which would be necessary if you wanted to be strict about
> matching unquoted identifiers.
>
> There seems fairly clear use-case for allowing A-Z a-z 0-9 and
> underscore (while CVS head rejects 0-9 and underscore). There also seem
> to be good arguments for disallowing / \ : on various platforms, which
> leaves us with some other punctuation in question, as well as the whole
> matter of non-ASCII characters. I'm not sure whether we want to touch
> the idea of non-ASCII; comments?

The problem with allowing uppercase letters is that on some filesystems foo and Foo are the same file, and on others they are not. This may lead to obscure portability problems where code worked fine on Unix and then fails when the database is running on Windows.

The approach that I'd suggest is to allow a very restricted subset as an immediate solution (say a-z and 0-9), and plan to later allow arbitrary data to be passed in, then be encoded in some way before hitting disk. (And later need not be much later - such encodings are not that hard to write.)

Cheers,
Ben
On 9/3/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Gregory Stark <stark@enterprisedb.com> writes: > > "Tom Lane" <tgl@sss.pgh.pa.us> writes: > >> I'm not convinced that . is issue-free. On most if not all versions of Unix, > >> you are allowed to open a directory as a file and read the filenames it > >> contains. While I don't say it'd be easy to manage that through > >> tsearch, there's at least a potential for discovering the filenames > >> present in . and .. --- how much do we care about that? > > > Actually I don't think that's true any more, most file systems on most Unixen > > do not allow it. However it appears it's still the case for Solaris so it's > > still a good point. > > Actually, now that I've woken up a bit more, it is not a problem as > long as the tsearch code always appends some kind of file extension > to what the user gives, such as ".dict". It'll be impossible to name > "." or ".." with that addition. I don't know what you're discussing well enough to know if this is relevant, but what you just said is not always true. If there is any way to pass arbitrary binary data into your function call, then someone can pass in a string with nul in it. When that hits the OS API, your appended .dict won't be seen as part of the filename. (This is a common security oversight when calling C APIs from higher-level languages such as Perl. See http://artofhacking.com/files/phrack/phrack55/P55-07.TXT for more.) [...] Cheers, Ben
"Ben Tilly" <btilly@gmail.com> writes: > I don't know what you're discussing well enough to know if this is > relevant, but what you just said is not always true. If there is any > way to pass arbitrary binary data into your function call, then > someone can pass in a string with nul in it. Not a problem here, because the passed-in data is considered nul-terminated already. (Sometimes, not being 8-bit-clean is an advantage...) regards, tom lane
"Ben Tilly" <btilly@gmail.com> writes: > On 9/3/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> There seems fairly clear use-case for allowing A-Z a-z 0-9 and >> underscore (while CVS head rejects 0-9 and underscore). > The problem with allowing uppercase letters is that on some > filesystems foo and Foo are the same file, and on others they are not. > This may lead to obscure portability problems where code worked fine > on Unix and then fails when the database is running on Windows. Yeah, good point. So far it seems that a-z 0-9 and underscore cover the real use-cases, so what say we just allow those for now? It's a lot easier to loosen up later than tighten up ... regards, tom lane
2007/9/4, Tom Lane <tgl@sss.pgh.pa.us>:
> "Ben Tilly" <btilly@gmail.com> writes:
> > On 9/3/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> There seems fairly clear use-case for allowing A-Z a-z 0-9 and
> >> underscore (while CVS head rejects 0-9 and underscore).
>
> > The problem with allowing uppercase letters is that on some
> > filesystems foo and Foo are the same file, and on others they are not.
> > This may lead to obscure portability problems where code worked fine
> > on Unix and then fails when the database is running on Windows.
>
> Yeah, good point. So far it seems that a-z 0-9 and underscore cover the
> real use-cases, so what say we just allow those for now? It's a lot
> easier to loosen up later than tighten up ...
>
> regards, tom lane
>

It's system specific. I prefer a-z and A-Z. The classic names for dictionaries combine lower-case and upper-case characters, e.g. cs_CZ_UTF8 for Czech.

dictfile = cs_CZ_UTF8 ... automatically converted to cs_cz_utf8.dict
dictfile = 'cs_CZ_UTF8' .. checked and used as cs_CZ_UTF8

Regards
Pavel Stehule

p.s. it's important on UNIX platforms and has no effect on Windows.
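The rule proposed in the quoted text (a-z, 0-9 and underscore only) can be sketched in a few lines of Python. This is only an illustration of the character set under discussion, not the check that was committed; `is_valid_config_name` is a hypothetical name.

```python
import re

# Lower-case ASCII letters, digits and underscore; at least one character.
_VALID_NAME = re.compile(r"[a-z0-9_]+")


def is_valid_config_name(name: str) -> bool:
    """True if `name` uses only the restricted character set (hypothetical helper)."""
    return _VALID_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("cs_czutf", "csczutf8", "cs_CZ_UTF8", "russian-utf8.dict"):
        print(candidate, "->", is_valid_config_name(candidate))
```

Under this rule the names from the original bug report would be accepted, while a mixed-case name such as cs_CZ_UTF8 and anything carrying a suffix or hyphen would be rejected.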
"Pavel Stehule" <pavel.stehule@gmail.com> writes: > 2007/9/4, Tom Lane <tgl@sss.pgh.pa.us>: >> Yeah, good point. So far it seems that a-z 0-9 and underscore cover the >> real use-cases, so what say we just allow those for now? It's a lot >> easier to loosen up later than tighten up ... > It's system specific. I prefere a-z and A-Z. Clasic name for > dictionaries combine lower and upper characters .. for czech > cs_CZ_UTF8 etc. You're going to need to alter that habit anyway, because it's not appropriate to mention any specific encoding in the dictionary name. But on further thought it strikes me that insisting on all lower case doesn't eliminate case-sensitivity portability problems. For instance, suppose the given parameter is 'foo' and the actual file name is Foo.dict. This will work fine on Windows and will stop working when moved to Unix. So I'm not sure we really buy much by rejecting upper-case letters in the parameter --- all we do is constrain which side of the fence you have to fix any mismatches on. And we picked the side that only a DBA, rather than a plain SQL user, can fix. regards, tom lane
2007/9/4, Tom Lane <tgl@sss.pgh.pa.us>: > "Pavel Stehule" <pavel.stehule@gmail.com> writes: > > 2007/9/4, Tom Lane <tgl@sss.pgh.pa.us>: > >> Yeah, good point. So far it seems that a-z 0-9 and underscore cover the > >> real use-cases, so what say we just allow those for now? It's a lot > >> easier to loosen up later than tighten up ... > > > It's system specific. I prefere a-z and A-Z. Clasic name for > > dictionaries combine lower and upper characters .. for czech > > cs_CZ_UTF8 etc. > > You're going to need to alter that habit anyway, because it's not > appropriate to mention any specific encoding in the dictionary name. > > But on further thought it strikes me that insisting on all lower case > doesn't eliminate case-sensitivity portability problems. For instance, > suppose the given parameter is 'foo' and the actual file name is > Foo.dict. This will work fine on Windows and will stop working when > moved to Unix. So I'm not sure we really buy much by rejecting > upper-case letters in the parameter --- all we do is constrain which > side of the fence you have to fix any mismatches on. And we picked the > side that only a DBA, rather than a plain SQL user, can fix. > ok. I can understand it. But I don't see sense of quoting of params Regards Pavel Stehule
On 9/4/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
[...]
> But on further thought it strikes me that insisting on all lower case
> doesn't eliminate case-sensitivity portability problems. For instance,
> suppose the given parameter is 'foo' and the actual file name is
> Foo.dict. This will work fine on Windows and will stop working when
> moved to Unix. So I'm not sure we really buy much by rejecting
> upper-case letters in the parameter --- all we do is constrain which
> side of the fence you have to fix any mismatches on. And we picked the
> side that only a DBA, rather than a plain SQL user, can fix.

True, only a DBA can fix it. But only a DBA can screw it up. That seems
reasonable to me. Furthermore, fixing this mistake at the plain SQL user
level in reality means auditing a code base for the construct, which is
never fun.

However, if you wish to be paranoid, I believe that all filesystems of
interest to PostgreSQL are at least case preserving. In that case, on
case-insensitive filesystems you could check that the case of the stored
filename matches what you want it to be. Now the problem of the filename
having the wrong case can be detected on both Windows and Unix. Of course
that check is a complication and slows things down.

If all dictionary files have to be in a fixed directory, then you can
easily add a cron job that scans that directory and fixes the case of any
dictionary files that have upper-case letters in their names.

(Beware, there was once a bug in Windows where renaming Foo to foo
accidentally deleted the file. It is therefore safer to rename Foo to bar
and then bar to foo. However, this is a moot point, since I doubt that
anyone would actually run a brand new PostgreSQL database on an early
version of NT 4.0...)

Cheers,
Ben
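The exact-case check suggested above can be sketched as follows. This is a minimal Python illustration, not PostgreSQL code: the helper names `find_case_mismatches` and `check_dictionary_file` are hypothetical, and the sketch assumes that a directory listing reports names exactly as they are stored (that is, the filesystem is case preserving).

```python
import os


def find_case_mismatches(stored_names, wanted):
    """Return stored names that equal `wanted` only when case is ignored."""
    folded = wanted.lower()
    return sorted(
        name for name in stored_names
        if name != wanted and name.lower() == folded
    )


def check_dictionary_file(directory, wanted):
    """Raise FileNotFoundError unless `wanted` exists with exactly that spelling."""
    stored = os.listdir(directory)  # names as stored on disk
    if wanted in stored:
        return
    near = find_case_mismatches(stored, wanted)
    if near:
        # Present, but spelled with different case: would open on a
        # case-insensitive filesystem and fail on a case-sensitive one.
        raise FileNotFoundError(
            f"{wanted!r} not found; a file named {near[0]!r} differs only in case"
        )
    raise FileNotFoundError(f"{wanted!r} not found in {directory!r}")
```

With a check of this shape, a parameter of 'foo' against a stored Foo.dict is reported as a mismatch on every platform, rather than working until the files are moved to a case-sensitive filesystem. The price is one directory scan per lookup, which is the slowdown mentioned above.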
On Sun, 2 Sep 2007, Tom Lane wrote:

> Gregory Stark <stark@enterprisedb.com> writes:
>> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>>> I made it reject all but latin letters, which is the same restriction
>>> that's in place for timezone set filenames. That might be overly
>>> strong, but we definitely have to forbid "." and "/" (and "\" on
>>> Windows). Do we want to restrict it to letters, digits, underscore?
>>> Or does it need to be weaker than that?
>
>> What's the problem with "."?
>
> ../../../../etc/passwd
>
> Possibly we could allow '.' as long as we forbade /, but the other
> trouble with allowing . is that it encourages people to try to specify
> the filetype suffix (as indeed Oleg was doing). I'd prefer to keep the
> suffixes out of the SQL object definitions, with an eye to possibly
> someday migrating all the configuration data inside the database.
> There's a reasonable argument for restricting the names used for these
> things in the SQL definitions to be valid SQL identifiers, so that that
> will work nicely...

So, what's the current policy? Still a-z, A-Z? I think we should allow
'.' and prevent '/'. Look how ugly our current ispell setup is: it
depends on 3 files, the stop word list, .dict and .aff. Right now, I can
use something like

CREATE TEXT SEARCH DICTIONARY en_ispell (
    TEMPLATE = ispell,
    DictFile = englishDict,
    AffFile = englishAff,
    StopWords = english
);

I'd rather use english.dict, english.aff, english.stop, which is what any
user is used to, without dictating anything to the user here. We have
already imposed a lot of restrictions. I hope we won't require special
extensions like .dict, .aff, since it's unknown in advance what files
other dictionaries will use. If we allow '.' without '/', then we'd be
happy. I'd remove the requirement for an extension on the stop word list,
which looks rather artificial to me.

Oh, my god, I see we dictate extensions!

STATEMENT: CREATE TEXT SEARCH DICTIONARY en_ispell (
    TEMPLATE = ispell,
    DictFile = englishDict,
    AffFile = englishAff,
    StopWords = englishStop
);
ERROR: could not open dictionary file "/usr/local/pgsql-dev/share/tsearch_data/englishdict.dict": No such file or directory

Folks, this is too much! Now we dictate the extensions '.dict, .affix,
.stop', what else? Is this defined by the ispell template only, or is it
a global requirement?

Regards, Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
On Sun, 9 Sep 2007, Oleg Bartunov wrote:

> Oh, my god, I see we dictate extensions!
>
> STATEMENT: CREATE TEXT SEARCH DICTIONARY en_ispell (
> TEMPLATE = ispell,
> DictFile = englishDict,
> AffFile = englishAff,
> StopWords = englishStop
> );
> ERROR: could not open dictionary file
> "/usr/local/pgsql-dev/share/tsearch_data/englishdict.dict": No such file or
> directory
>
> Folks, this is too much! Now we dictate the extensions '.dict, .affix, .stop',
> what else?

I notice that the documentation doesn't mention this:
http://momjian.us/main/writings/pgsql/sgml/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY

Regards, Oleg
Oleg Bartunov <oleg@sai.msu.su> writes:
> Oh, my god, I see we dictate extensions!
> Folks, this is too much! Now we dictate the extensions '.dict, .affix, .stop',
> what else?
> Is this defined by the ispell template only, or is it a global requirement?

It's the callers of get_tsearch_config_filename() that specify the
extension, so AFAICS each dictionary can do what it wants.

I don't see the problem with enforcing an extension: it keeps the
namespaces for different kinds of files separate, and it gets us out of
the potential security risk of allowing access to "." or "..".

I remain of the opinion that we don't really want the SQL-command
definitions of dictionaries to expose the fact that these are files
at all. We should be thinking of the command parameters as identifiers.

regards, tom lane
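The division of labour described above, where the calling dictionary supplies the extension and the SQL parameter is treated as an identifier-like base name, might look roughly like this. It is a Python stand-in for illustration only: the real get_tsearch_config_filename() is a C function whose signature is not shown in this thread, and the names `is_valid_basename`, `config_filename` and the `share_dir` default are assumptions. The character set is the a-z, 0-9, underscore whitelist floated earlier in the thread, not necessarily what was finally adopted.

```python
import re

# Assumed whitelist: lower-case letters, digits and underscore only.
_BASENAME_RE = re.compile(r"[a-z0-9_]+")


def is_valid_basename(basename: str) -> bool:
    """True if `basename` is non-empty and uses only whitelisted characters."""
    return _BASENAME_RE.fullmatch(basename) is not None


def config_filename(basename: str, extension: str,
                    share_dir: str = "/usr/local/pgsql/share") -> str:
    """Build the path of a text search data file from a base name and extension."""
    if not is_valid_basename(basename):
        # ".", "/", "\" and "-" are all rejected here, so a name such as
        # "../../../../etc/passwd" never reaches the filesystem.
        raise ValueError(
            f'invalid text search configuration file name "{basename}"'
        )
    # The extension is chosen by the calling dictionary template, not by SQL.
    return f"{share_dir}/tsearch_data/{basename}.{extension}"
```

Under these assumptions, an unquoted DictFile = englishDict, which the SQL parser folds to englishdict, resolves to a path ending in tsearch_data/englishdict.dict, the same shape as the file named in the error message quoted above. Because the extension never appears in the SQL command, the parameter stays a plain identifier.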