Thread: Simplifying Text Search
Something Tom Dunstan just mentioned has made me ask the question "Why does our full text search feature look so strange?". It's the operator-laden syntax that causes the problem. By any stretch, this query is difficult for most people to understand: SELECT * FROM text_table WHERE to_tsvector('fat cats ate fat rats') @@ to_tsquery('fat & rat'); Wouldn't it be much simpler to just have a function, so we can write this query like this? SELECT * FROM text_table WHERE text_search('haystack needle haystack', 'needle'); We then explain to people that while the above is nice, it will presume that both the function inputs are Text, which isn't any good for complex searches, indexing and dictionaries etc.., so then we move to: SELECT * FROM text_table WHERE text_search('haystack needle haystack'::tsvector, 'needle'::tsquery); or perhaps SELECT * FROM text_table WHERE full_text_search('haystack needle haystack', 'needle & hay'); which would automatically do the conversions to tsvector and tsquery for us. No more tedious casting, easy to read. [*text_search() functions would return bool] So we end up with a normal sounding function that is overloaded to provide all of the various goodies. We can include the text_search(text, text) version of the function in the normal chapter on functions, with a pointer to the more complex stuff elsewhere. Sound good? We can then explain everything without having to use @@ operators. They can then be introduced as an option. The side benefit of this is that we can then allow our wonderful new functionality to be more easily usable by things like Hibernate. We just tell them we have this new function and thats all they need to know. I know that under the covers the @@ operator is necessary because we hang various pieces of optimizer information from it. Each function signature gets an operator with matching signature, so there's a 1:1 correspondence in most use cases. So to make this all hang together, there'd need to be a some smarts that says: if there is only one operator on a function then use the operator's optimizer information when you see just the function. That information can be assessed at DDL time, so we can keep accurate track of operator counts in pgproc. An alternative approach might be to make the first operator created on a function the "primary" operator. All other operators would then be secondary operators, so that adding operators would not change the inference mechanism. I've not got sufficient knowledge to say how hard the function-to-operator inference is, but it would be dang useful in making text search and many other programs readable and easy to interface to. In the end that is going to mean wider usage of that functionality, with more people feeling like they can dip their toes into the water. I must confess I have insufficient time to do this myself right now, not least me discovering exactly how. I'm spending time on this now because I'm the one that has to explain this stuff to people and things like this can make a huge difference in their understanding and eventual uptake. Thoughts? -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
On Mon, Nov 12, 2007 at 03:00:36PM +0000, Simon Riggs wrote: > Something Tom Dunstan just mentioned has made me ask the question "Why > does our full text search feature look so strange?". It's the > operator-laden syntax that causes the problem. > > By any stretch, this query is difficult for most people to understand: > > SELECT * FROM text_table > WHERE to_tsvector('fat cats ate fat rats') @@ to_tsquery('fat & rat'); > > Wouldn't it be much simpler to just have a function, so we can write > this query like this? > > SELECT * FROM text_table > WHERE text_search('haystack needle haystack', 'needle'); Can't you do this with an SQL function that gets expanded inline? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Those who make peaceful revolution impossible will make violent revolution inevitable. > -- John F Kennedy
On Mon, 2007-11-12 at 16:28 +0100, Martijn van Oosterhout wrote: > On Mon, Nov 12, 2007 at 03:00:36PM +0000, Simon Riggs wrote: > > Something Tom Dunstan just mentioned has made me ask the question "Why > > does our full text search feature look so strange?". It's the > > operator-laden syntax that causes the problem. > > > > By any stretch, this query is difficult for most people to understand: > > > > SELECT * FROM text_table > > WHERE to_tsvector('fat cats ate fat rats') @@ to_tsquery('fat & rat'); > > > > Wouldn't it be much simpler to just have a function, so we can write > > this query like this? > > > > SELECT * FROM text_table > > WHERE text_search('haystack needle haystack', 'needle'); > > Can't you do this with an SQL function that gets expanded inline? Yep, we can. Good thinking. So the change is fairly trivial. What do you think of the proposal to make text search work this way and to document this more easily readable form? -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
On Mon, Nov 12, 2007 at 03:48:20PM +0000, Simon Riggs wrote: > On Mon, 2007-11-12 at 16:28 +0100, Martijn van Oosterhout wrote: > > On Mon, Nov 12, 2007 at 03:00:36PM +0000, Simon Riggs wrote: > > > Something Tom Dunstan just mentioned has made me ask the question "Why > > > does our full text search feature look so strange?". It's the > > > operator-laden syntax that causes the problem. > > > > > > By any stretch, this query is difficult for most people to understand: > > > > > > SELECT * FROM text_table > > > WHERE to_tsvector('fat cats ate fat rats') @@ to_tsquery('fat & rat'); > > > > > > Wouldn't it be much simpler to just have a function, so we can write > > > this query like this? > > > > > > SELECT * FROM text_table > > > WHERE text_search('haystack needle haystack', 'needle'); > > > > Can't you do this with an SQL function that gets expanded inline? > > Yep, we can. Good thinking. So the change is fairly trivial. > > What do you think of the proposal to make text search work this way > and to document this more easily readable form? +1 for adding this. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Simon Riggs wrote: > Something Tom Dunstan just mentioned has made me ask the question "Why > does our full text search feature look so strange?". It's the > operator-laden syntax that causes the problem. > > By any stretch, this query is difficult for most people to understand: > > SELECT * FROM text_table > WHERE to_tsvector('fat cats ate fat rats') @@ to_tsquery('fat & rat'); > > Wouldn't it be much simpler to just have a function, so we can write > this query like this? > > SELECT * FROM text_table > WHERE text_search('haystack needle haystack', 'needle'); > > We then explain to people that while the above is nice, it will presume > that both the function inputs are Text, which isn't any good for complex > searches, indexing and dictionaries etc.., so then we move to: > > SELECT * FROM text_table > WHERE text_search('haystack needle haystack'::tsvector, > 'needle'::tsquery); > > or perhaps > > SELECT * FROM text_table > WHERE full_text_search('haystack needle haystack', 'needle & hay'); > > which would automatically do the conversions to tsvector and tsquery for > us. No more tedious casting, easy to read. There's a text @@ text operator, so you can write just: SELECT * FROM tstable where data @@ 'needle'; No need to cast. Unfortunately, that form can't use a GIN index, I think. But that's another issue, which I don't think your proposal would fix... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Simon Riggs <simon@2ndquadrant.com> writes: > So we end up with a normal sounding function that is overloaded to > provide all of the various goodies. As best I can tell, @@ does exactly this already. This is just a different spelling of the same capability, and I don't actually find it better. Why is "text_search(x,y)" better than "x @@ y"? We don't recommend that people write "texteq(x,y)" instead of "x = y". > Sound good? It's not an improvement, it's not compatible with what existing tsearch2 users are accustomed to, and it's several months too late... regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Simon Riggs <simon@2ndquadrant.com> writes: >> So we end up with a normal sounding function that is overloaded to >> provide all of the various goodies. > > As best I can tell, @@ does exactly this already. This is just a > different spelling of the same capability, and I don't actually > find it better. Why is "text_search(x,y)" better than "x @@ y"? > We don't recommend that people write "texteq(x,y)" instead of > "x = y". I agree, I find it odd to suggest that a function would be more natural than an operator. The main reason the non-core version of tsearch felt so much like an add-on was precisely that it had to use functions to interface with objects. That Postgres supports creating new operators is a strength which allows a lot more extensibility. And yet I agree that there's something awkward about the tsearch syntax. I'm not sure where the core of it comes from though, but I don't think it comes from the use of operators. Part of it is that "@@" isn't a familiar operator. I'm not even sure what to read it as. "Matches"? "Satisfies"? Perhaps we should think (at some point in the future) about some way of allowing alphabetic characters in operator names. Then you could write something like: col ~satisfies~ '1 & 2' (That exact syntax wouldn't work without removing ~ from the characters in normal operators so something with more finesse would be needed.) The other part of tsearch that seems somewhat awkward is just the very concept and syntax of tsqueries. But that seems pretty integral to the functionality and I don't see any way to avoid it. It's not entirely unlike the idea of regexps which I'm sure would seem unnatural if we were just meeting them with no background. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
On Mon, 2007-11-12 at 11:56 -0500, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > So we end up with a normal sounding function that is overloaded to > > provide all of the various goodies. > > As best I can tell, @@ does exactly this already. This is just a > different spelling of the same capability, and I don't actually > find it better. Why is "text_search(x,y)" better than "x @@ y"? > We don't recommend that people write "texteq(x,y)" instead of > "x = y". Most people don't understand those differences. x = y means "make sure they are the same" to most people. They don't see what you (and I) see: function and operator interchangeability. So text_search() is better than @@ and = is better than texteq(). Life ain't neat... Right now, Full Text Search SQL looks like complete gibberish and it dissuades many people from using what is an awesome set of features. I just want to add a little sugar to help people get started. > > Sound good? > > It's not an improvement That is the very point of debate > it's not compatible with what existing tsearch2 > users are accustomed to @@ would still exist, so no problems. These additions are for new users, not old ones. > it's several months too late... True. I wish I'd thought of it before. I've waded through the syntax without thinking how to make it more easily readable and explainable. Damn. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs wrote: > On Mon, 2007-11-12 at 11:56 -0500, Tom Lane wrote: >> Simon Riggs <simon@2ndquadrant.com> writes: >>> So we end up with a normal sounding function that is overloaded to >>> provide all of the various goodies. >> As best I can tell, @@ does exactly this already. This is just a >> different spelling of the same capability, and I don't actually >> find it better. Why is "text_search(x,y)" better than "x @@ y"? >> We don't recommend that people write "texteq(x,y)" instead of >> "x = y". > > Most people don't understand those differences. x = y means "make sure > they are the same" to most people. They don't see what you (and I) see: > function and operator interchangeability. So text_search() is better > than @@ and = is better than texteq(). Life ain't neat... > > Right now, Full Text Search SQL looks like complete gibberish and it > dissuades many people from using what is an awesome set of features. I > just want to add a little sugar to help people get started. Granted, @@ is a bit awkward until you get used to it. "x LIKE y" would read out better, but unfortunately that's already taken ;-). In any case, it's way too late. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Mon, Nov 12, 2007 at 08:09:48PM +0000, Simon Riggs wrote: > > @@ would still exist, so no problems. These additions are for new users, > not old ones. Given that this is all sugar on top of tsearch anyway, why not put it in pgfoundry as the tsearch_sugar project? Then packagers could include a standard set of such sugar if they wanted. A -- Andrew Sullivan Old sigs will return after re-constitution of blue smoke
Heikki Linnakangas wrote: > Granted, @@ is a bit awkward until you get used to it. "x LIKE y" would > read out better, but unfortunately that's already taken ;-). Actually LIKE does not make much sense when you have 'hay & needle'. Probably MATCHES would be a better term ... but then, MySQL defines a strange thing called MATCH/AGAINST; so apparently you use "MATCH (column list) AGAINST (pattern spec)" None of this is standard though ... -- Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4 "Everybody understands Mickey Mouse. Few understand Hermann Hesse. Hardly anybody understands Einstein. And nobody understands Emperor Norton."
* Heikki Linnakangas <heikki@enterprisedb.com> [071112 15:18]: > Simon Riggs wrote: > >Right now, Full Text Search SQL looks like complete gibberish and it > >dissuades many people from using what is an awesome set of features. I > >just want to add a little sugar to help people get started. > Granted, @@ is a bit awkward until you get used to it. "x LIKE y" would > read out better, but unfortunately that's already taken ;-). Can LIKE be easily overloaded in the parser? So:text LIKE text works in it's current form, andtsvector LIKE tsquery also works like the @@? Or have I gotten all the ts* types all mixed up again... But it doesn't buy anything except avoiding the "@@" that people seem to not grok easily, and it might actually cause more grief, because of people confusing the 2 forms of LIKE. a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
On Mon, 2007-11-12 at 20:17 +0000, Heikki Linnakangas wrote: > Granted, @@ is a bit awkward until you get used to it. "x LIKE y" would > read out better, but unfortunately that's already taken ;-). Remember, I'm not suggesting we get rid of @@ > In any case, it's way too late. I'm suggesting we add a couple of simple SQL functions that will help text search docs be more easily understood. It's beta and its valid to respond to usability issues just as we would respond to code bugs. Otherwise, why have beta? Late, but not too late. SQLServer, Oracle and MySQL all use functions, not operators. My observation would be that we have the hardest and most difficult to understand full text search capability. The Contains() function seems like a better name than I gave earlier also. I love what we've done; I just want more people be able to use it. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Hello look to standard, please. SQL/MM has part - full text. SELECT docno FROM information WHERE document.CONTAINS('STEMMED FORM OF "standard"IN SAME PARAGRAPH ASSOUNDS LIKE "sequel"') = 1 it's little bit baroque, It's sample of method. So,it can be: SELECT .. FROM x.contains(y); It's well readable and elegant too. Regards Pavel Stehule
Simon Riggs wrote: > On Mon, 2007-11-12 at 20:17 +0000, Heikki Linnakangas wrote: > >> Granted, @@ is a bit awkward until you get used to it. "x LIKE y" would >> read out better, but unfortunately that's already taken ;-). > > Remember, I'm not suggesting we get rid of @@ > >> In any case, it's way too late. > > I'm suggesting we add a couple of simple SQL functions that will help > text search docs be more easily understood. > > It's beta and its valid to respond to usability issues just as we would > respond to code bugs. Otherwise, why have beta? Late, but not too late. > > SQLServer, Oracle and MySQL all use functions, not operators. My > observation would be that we have the hardest and most difficult to > understand full text search capability. The Contains() function seems > like a better name than I gave earlier also. > > I love what we've done; I just want more people be able to use it. Hmmm, my choices are: SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector @@ 'cat & rat'::tsquery; Or: SELECT ts_match('a fat cat sat on a mat and ate a fat rat','cat & rat'); This seems a little too much like the "duh" department to ignore. A set of SQL functions would certainly be appropriate here. Sincerely, Joshua D. Drake
On Mon, Nov 12, 2007 at 03:44:18PM -0500, Aidan Van Dyk wrote: > Can LIKE be easily overloaded in the parser? So: > text LIKE text > works in it's current form, and > tsvector LIKE tsquery > also works like the @@? Or have I gotten all the ts* types all mixed up > again... AIUI LIKE is mashed into an operator at parse time, so yes, if you create the operator with the right name it will just work. Or not (I havn't tested it). Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Those who make peaceful revolution impossible will make violent revolution inevitable. > -- John F Kennedy
On Mon, 2007-11-12 at 21:59 +0100, Pavel Stehule wrote: > SELECT docno > FROM information > WHERE document.CONTAINS > ('STEMMED FORM OF "standard" > IN SAME PARAGRAPH AS > SOUNDS LIKE "sequel"') = 1 > > it's little bit baroque, It's sample of method. Seems thats the way Oracle does it too. The SQLServer syntax is WHERE contains(text_column, search_query) which seems marginally better. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Pavel Stehule escribió: > Hello > > look to standard, please. SQL/MM has part - full text. Huh, what version of the standard is this? My copy (the typical 2003 draft) doesn't have SQL/MM AFAICS. -- Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4 "La espina, desde que nace, ya pincha" (Proverbio africano)
On 12/11/2007, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Pavel Stehule escribió: > > Hello > > > > look to standard, please. SQL/MM has part - full text. > > Huh, what version of the standard is this? My copy (the typical 2003 > draft) doesn't have SQL/MM AFAICS. > > I found http://jtc1sc32.org/doc/N0751-0800/32N0771T.pdf http://www.sigmod.org/record/issues/0112/standards.pdf http://dbs.uni-leipzig.de/file/kap5.pdf Pavel > -- > Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4 > "La espina, desde que nace, ya pincha" (Proverbio africano) >
Simon Riggs wrote: > On Mon, 2007-11-12 at 11:56 -0500, Tom Lane wrote: > > Simon Riggs <simon@2ndquadrant.com> writes: > > > So we end up with a normal sounding function that is overloaded to > > > provide all of the various goodies. > > > > As best I can tell, @@ does exactly this already. This is just a > > different spelling of the same capability, and I don't actually > > find it better. Why is "text_search(x,y)" better than "x @@ y"? > > We don't recommend that people write "texteq(x,y)" instead of > > "x = y". > > Most people don't understand those differences. x = y means "make sure > they are the same" to most people. They don't see what you (and I) see: > function and operator interchangeability. So text_search() is better > than @@ and = is better than texteq(). Life ain't neat... > > Right now, Full Text Search SQL looks like complete gibberish and it > dissuades many people from using what is an awesome set of features. I > just want to add a little sugar to help people get started. I realized this when editing the documentation but not clearly. I noticed that: http://momjian.us/main/writings/pgsql/sgml/textsearch-intro.html#TEXTSEARCH-MATCHING tsvector @@ tsquerytsquery @@ tsvectortext @@ tsquerytext @@ text The first two of these we saw already. The form text @@ tsquery isequivalent to to_tsvector(x) @@ y. The form text @@ text is equivalentto to_tsvector(x) @@ plainto_tsquery(y). was quite odd, especially the "text @@ text" case, and in fact it makes casting almost required unless you can remember which one is a query and which is a vector (hint, the vector is first). What really adds to the confusion is that the operator is two _identical_ characters, meaning the operator is symetric, and it behave symetric if you cast one side, but as vector @@ query if you don't. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Mon, 2007-11-12 at 23:03 -0500, Bruce Momjian wrote: > Simon Riggs wrote: > > On Mon, 2007-11-12 at 11:56 -0500, Tom Lane wrote: > > > Simon Riggs <simon@2ndquadrant.com> writes: > > > > So we end up with a normal sounding function that is overloaded to > > > > provide all of the various goodies. > > > > > > As best I can tell, @@ does exactly this already. This is just a > > > different spelling of the same capability, and I don't actually > > > find it better. Why is "text_search(x,y)" better than "x @@ y"? > > > We don't recommend that people write "texteq(x,y)" instead of > > > "x = y". > > > > Most people don't understand those differences. x = y means "make sure > > they are the same" to most people. They don't see what you (and I) see: > > function and operator interchangeability. So text_search() is better > > than @@ and = is better than texteq(). Life ain't neat... > > > > Right now, Full Text Search SQL looks like complete gibberish and it > > dissuades many people from using what is an awesome set of features. I > > just want to add a little sugar to help people get started. > > I realized this when editing the documentation but not clearly. I > noticed that: > > http://momjian.us/main/writings/pgsql/sgml/textsearch-intro.html#TEXTSEARCH-MATCHING > > tsvector @@ tsquery > tsquery @@ tsvector > text @@ tsquery > text @@ text > > The first two of these we saw already. The form text @@ tsquery is > equivalent to to_tsvector(x) @@ y. The form text @@ text is equivalent > to to_tsvector(x) @@ plainto_tsquery(y). > > was quite odd, especially the "text @@ text" case, and in fact it makes > casting almost required unless you can remember which one is a query and > which is a vector (hint, the vector is first). What really adds to the > confusion is that the operator is two _identical_ characters, meaning > the operator is symetric, and it behave symetric if you cast one side, > but as vector @@ query if you don't. I'm thinking we can have an inlinable function contains(text, text) returns int Return values limited to just 0 or 1 or NULL, as with SQL/MM. It's close to SQL/MM, but not exact. contains(sourceText, searchText) is a macro for case to_tsvector(default_text_search_config, sourceText) @@ to_tsquery(default_text_search_config, searchText) when true then 1 when false then 0 else null end that allows us to write indexable queries like this WHERE contains(sourceText, searchText) > 0 where we must still have built the index on a constant config. Not checked that still works yet, maybe not, in which case something slightly more complex to make sure its still indexable. This is the difficult part. So changes are: - add SQL function - simplify first 2 pages of docs using this function -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
On 13/11/2007, Simon Riggs <simon@2ndquadrant.com> wrote: > On Mon, 2007-11-12 at 23:03 -0500, Bruce Momjian wrote: > > Simon Riggs wrote: > > > On Mon, 2007-11-12 at 11:56 -0500, Tom Lane wrote: > > > > Simon Riggs <simon@2ndquadrant.com> writes: > > > > > So we end up with a normal sounding function that is overloaded to > > > > > provide all of the various goodies. > > > > > > > > As best I can tell, @@ does exactly this already. This is just a > > > > different spelling of the same capability, and I don't actually > > > > find it better. Why is "text_search(x,y)" better than "x @@ y"? > > > > We don't recommend that people write "texteq(x,y)" instead of > > > > "x = y". > > > > > > Most people don't understand those differences. x = y means "make sure > > > they are the same" to most people. They don't see what you (and I) see: > > > function and operator interchangeability. So text_search() is better > > > than @@ and = is better than texteq(). Life ain't neat... > > > > > > Right now, Full Text Search SQL looks like complete gibberish and it > > > dissuades many people from using what is an awesome set of features. I > > > just want to add a little sugar to help people get started. > > > > I realized this when editing the documentation but not clearly. I > > noticed that: > > > > http://momjian.us/main/writings/pgsql/sgml/textsearch-intro.html#TEXTSEARCH-MATCHING > > > > tsvector @@ tsquery > > tsquery @@ tsvector > > text @@ tsquery > > text @@ text > > > > The first two of these we saw already. The form text @@ tsquery is > > equivalent to to_tsvector(x) @@ y. The form text @@ text is equivalent > > to to_tsvector(x) @@ plainto_tsquery(y). > > > > was quite odd, especially the "text @@ text" case, and in fact it makes > > casting almost required unless you can remember which one is a query and > > which is a vector (hint, the vector is first). What really adds to the > > confusion is that the operator is two _identical_ characters, meaning > > the operator is symetric, and it behave symetric if you cast one side, > > but as vector @@ query if you don't. > > I'm thinking we can have an inlinable function > > contains(text, text) returns int > > Return values limited to just 0 or 1 or NULL, as with SQL/MM. > It's close to SQL/MM, but not exact. > > contains(sourceText, searchText) is a macro for > > case to_tsvector(default_text_search_config, sourceText) @@ > to_tsquery(default_text_search_config, searchText) > when true then 1 > when false then 0 > else null > end > it's look well. Pavel
On Tue, 2007-11-13 at 08:58 +0100, Pavel Stehule wrote: > On 13/11/2007, Simon Riggs <simon@2ndquadrant.com> wrote: > > I'm thinking we can have an inlinable function > > > > contains(text, text) returns int > > > > Return values limited to just 0 or 1 or NULL, as with SQL/MM. > > It's close to SQL/MM, but not exact. > > > > contains(sourceText, searchText) is a macro for > > > > case to_tsvector(default_text_search_config, sourceText) @@ > > to_tsquery(default_text_search_config, searchText) > > when true then 1 > > when false then 0 > > else null > > end > > > > it's look well. I think it needs lot more thought yet. Travelling now, so not able to add further. Will pick up again in next few days. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Am Montag, 12. November 2007 schrieb Martijn van Oosterhout: > On Mon, Nov 12, 2007 at 03:44:18PM -0500, Aidan Van Dyk wrote: > > Can LIKE be easily overloaded in the parser? So: > > text LIKE text > > works in it's current form, and > > tsvector LIKE tsquery > > also works like the @@? Or have I gotten all the ts* types all mixed up > > again... > > AIUI LIKE is mashed into an operator at parse time, so yes, if you > create the operator with the right name it will just work. What we'd need is a way to convert a LIKE pattern into a tsquery ('%foo%bar%' => 'foo & bar'). Then you might even be able to sneak index-optimized text search into existing applications. Might be worth a try. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Am Montag, 12. November 2007 schrieb Pavel Stehule: > > Huh, what version of the standard is this? My copy (the typical 2003 > > draft) doesn't have SQL/MM AFAICS. > > I found > > http://jtc1sc32.org/doc/N0751-0800/32N0771T.pdf > http://www.sigmod.org/record/issues/0112/standards.pdf > http://dbs.uni-leipzig.de/file/kap5.pdf I think this standard never actually became one. -- Peter Eisentraut http://developer.postgresql.org/~petere/
"Peter Eisentraut" <peter_e@gmx.net> writes: > What we'd need is a way to convert a LIKE pattern into a tsquery ('%foo%bar%' > => 'foo & bar'). Then you might even be able to sneak index-optimized text > search into existing applications. Might be worth a try. I don't think that's the right direction to go. Notably "%foo%bar%" isn't the same thing as "foo & bar". Also most tsearch queries can't be expressed as LIKE patterns anyways. What I do like is the idea of using LIKE as a model to follow for tsearch. Make some keyword which maps to an operator. So you could do something like WHERE col SATISFIES 'foo & bar' which maps to @@ just as LIKE maps to ~~. This has the interesting property of working for ltree and other data types which support the @@ operator. Now I'm starting to wonder though. How evil would it be to just rename all the @@ operators to ~~ ? That redefines LIKE to be a data-type-specific operation. Effectively declaring that tsquery, lquery, et al are all analogous concepts for LIKE patterns. It's probably safer to come up with a new keyword for @@ and if we ever decide to pursue a regexp data type we could define @@ for that as well which would make that keyword an equivalent to LIKE for regexps. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
Am Dienstag, 13. November 2007 schrieb Gregory Stark: > "Peter Eisentraut" <peter_e@gmx.net> writes: > > What we'd need is a way to convert a LIKE pattern into a tsquery > > ('%foo%bar%' => 'foo & bar'). Then you might even be able to sneak > > index-optimized text search into existing applications. Might be worth a > > try. > > I don't think that's the right direction to go. Notably "%foo%bar%" isn't > the same thing as "foo & bar". Also most tsearch queries can't be expressed > as LIKE patterns anyways. The requirement is to express LIKE patterns as tsearch queries, not the other way around. -- Peter Eisentraut http://developer.postgresql.org/~petere/
"Peter Eisentraut" <peter_e@gmx.net> writes: > Am Dienstag, 13. November 2007 schrieb Gregory Stark: >> "Peter Eisentraut" <peter_e@gmx.net> writes: >> > What we'd need is a way to convert a LIKE pattern into a tsquery >> > ('%foo%bar%' => 'foo & bar'). Then you might even be able to sneak >> > index-optimized text search into existing applications. Might be worth a >> > try. >> >> I don't think that's the right direction to go. Notably "%foo%bar%" isn't >> the same thing as "foo & bar". Also most tsearch queries can't be expressed >> as LIKE patterns anyways. > > The requirement is to express LIKE patterns as tsearch queries, not the other > way around. As an optimization to use GIST indexes this is a cool idea. Much like our existing functionality which converts "col LIKE 'bar%'" to range queries so it can use btree indexes. But it doesn't help the people who find "col @@ 'foo & bar'" confusing. They're just being thrown off because "@@" is an unfamiliar operator. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
"Bruce Momjian" <bruce@momjian.us> writes: > I realized this when editing the documentation but not clearly. I > noticed that: > > http://momjian.us/main/writings/pgsql/sgml/textsearch-intro.html#TEXTSEARCH-MATCHING > > tsvector @@ tsquery > tsquery @@ tsvector > text @@ tsquery > text @@ text > > The first two of these we saw already. The form text @@ tsquery is > equivalent to to_tsvector(x) @@ y. The form text @@ text is equivalent > to to_tsvector(x) @@ plainto_tsquery(y). > > was quite odd, especially the "text @@ text" case, and in fact it makes > casting almost required unless you can remember which one is a query and > which is a vector (hint, the vector is first). What really adds to the > confusion is that the operator is two _identical_ characters, meaning > the operator is symetric, and it behave symetric if you cast one side, > but as vector @@ query if you don't. I find this odd as well. Effectively what we're doing is rather than defining the casting behaviour in a global way we're defining operators specifically for text which do the casts internally. That seems like a bad idea, especially given that other data types implement @@ operators as well. I feel like the right idea is to throw out all but tsvector @@ tsquery and define casts as necessary to get that to work in every (non-inverted) case above. Actually the only case which wouldn't work with just that is a bare 'foo' @@ 'bar' And even that would work fine until you load _int.sql or ltree.sql which define conflicting operators. Separately I feel like we should name this operator something like ~= or =? or something like that. @@ doesn't look like any kind of equality or matching operator and it looks symmetric. We also already have @@ operators which are right-handed unary operators for geometric data types which this is very different from. I would suggest something like =? PS: I thought of a whacky idea which would look neat but be mainly silly. I thought I would mention it anyways though. If we define a unary postfix operator "text ?" which just casted text to tsquery then define a "text = tsquery" operator which does what @@ does. Then you could write queries like: WHERE col = 'foo & bar' ? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
On Tue, 2007-11-13 at 13:32 +0100, Peter Eisentraut wrote: > Am Dienstag, 13. November 2007 schrieb Gregory Stark: > > "Peter Eisentraut" <peter_e@gmx.net> writes: > > > What we'd need is a way to convert a LIKE pattern into a tsquery > > > ('%foo%bar%' => 'foo & bar'). Then you might even be able to sneak > > > index-optimized text search into existing applications. Might be worth a > > > try. > > > > I don't think that's the right direction to go. Notably "%foo%bar%" isn't > > the same thing as "foo & bar". Also most tsearch queries can't be expressed > > as LIKE patterns anyways. > > The requirement is to express LIKE patterns as tsearch queries, not the other > way around. +1 -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
On 11/13/07, Peter Eisentraut <peter_e@gmx.net> wrote: > Am Dienstag, 13. November 2007 schrieb Gregory Stark: > > "Peter Eisentraut" <peter_e@gmx.net> writes: > > > What we'd need is a way to convert a LIKE pattern into a tsquery > > > ('%foo%bar%' => 'foo & bar'). Then you might even be able to sneak > > > index-optimized text search into existing applications. Might be worth a > > > try. > > I don't think that's the right direction to go. Notably "%foo%bar%" isn't > > the same thing as "foo & bar". Also most tsearch queries can't be expressed > > as LIKE patterns anyways. > The requirement is to express LIKE patterns as tsearch queries, not the other > way around. How? LIKE queries are incapable of expressing word boundaries, do not support substitution, and are implicitly ordered. tsearch queries operate entirely on word boundaries, may substitute words, and are unordered. I don't see the two as even working in the same space, let alone be convertable for optimization purposes. If the idea was just to use a tsearch index as an initial filter, then running LIKE on the results, dictionary-based substitution makes that unreliable.
Trevor Talbot wrote: > On 11/13/07, Peter Eisentraut <peter_e@gmx.net> wrote: > >> Am Dienstag, 13. November 2007 schrieb Gregory Stark: >> >>> "Peter Eisentraut" <peter_e@gmx.net> writes: >>> > > >>>> What we'd need is a way to convert a LIKE pattern into a tsquery >>>> ('%foo%bar%' => 'foo & bar'). Then you might even be able to sneak >>>> index-optimized text search into existing applications. Might be worth a >>>> try. >>>> > > >>> I don't think that's the right direction to go. Notably "%foo%bar%" isn't >>> the same thing as "foo & bar". Also most tsearch queries can't be expressed >>> as LIKE patterns anyways. >>> > > >> The requirement is to express LIKE patterns as tsearch queries, not the other >> way around. >> > > How? LIKE queries are incapable of expressing word boundaries, do not > support substitution, and are implicitly ordered. tsearch queries > operate entirely on word boundaries, may substitute words, and are > unordered. > > I don't see the two as even working in the same space, let alone be > convertable for optimization purposes. If the idea was just to use a > tsearch index as an initial filter, then running LIKE on the results, > dictionary-based substitution makes that unreliable. > > > The fact that we are having this discussion at all demonstrates to me that we should leave well alone - any use of LIKE in this context is just about guaranteed to cause massive confusion. (Not to mention that it's far too late in the dev cycle to be making such changes, if we're thinking of them for 8.3). cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > The fact that we are having this discussion at all demonstrates to me > that we should leave well alone - any use of LIKE in this context is > just about guaranteed to cause massive confusion. Also, the semantics of LIKE are perfectly clearly specified by the SQL standard; unless you're willing to ignore the spec, it's simply not that useful for full-text search. Therefore, we have to tell people to use some other API anyway. The existing tsearch2 API at least has the virtue of having been proven in the field over several years. regards, tom lane
> ... Therefore, we have to tell people > to use some other API anyway. The existing tsearch2 API at least has > the virtue of having been proven in the field over several years. I can only speak as a moderately sophisticated end user, but ... I think the tsearch2 API has been "proven" to alienate a lot of potential users, myself included. If the simple things were simple, there might be a large user base that would rebel against an API extension, but I don't think this is the case. And I think the need for a simpler, refactored interface to tsearch is desperate. Granted, one can learn tsearch2 as is, but it is somewhat painful. It isn't the sort of thing one figures out for fun and potential future use, but probably only if one is forced to. If we (well, "you", really) could make tsearch2 less like C++ (or OCAML or FORTH ) and more like Python, we would get a whole lot of new users of tsearch in the process, and probably a whole of good will toward PostgreSQL. -W
On Tue, 2007-11-13 at 08:58 +0100, Pavel Stehule wrote: > On 13/11/2007, Simon Riggs <simon@2ndquadrant.com> wrote: > > I'm thinking we can have an inlinable function > > > > contains(text, text) returns int > > > > Return values limited to just 0 or 1 or NULL, as with SQL/MM. > > It's close to SQL/MM, but not exact. > > > > contains(sourceText, searchText) is a macro for > > > > case to_tsvector(default_text_search_config, sourceText) @@ > > to_tsquery(default_text_search_config, searchText) > > when true then 1 > > when false then 0 > > else null > > end Better idea: in-linable function called create function contains(sourceText text, searchText text, config text) returns boolean as $$ to_tsvector(config, sourceText) @@ to_tsquery(config, searchText); $$ language sql; so that SELECT title FROM pgweb WHERE contains(body, 'a & b', 'english') is an indexable, easily readable way of using full text search. allowing SELECT to_tsvector('fat cats ate fat rats') @@ to_tsquery('fat & rat');?column? ----------t to become SELECT contains('fat cats ate fat rats', 'fat & rat', 'english');?column? ----------t Proposed changes: 1. Add function contains() 2. Alter docs to show use of contains() All other @@ features still the same -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
"Simon Riggs" <simon@2ndquadrant.com> writes: > Proposed changes: > 1. Add function contains() > 2. Alter docs to show use of contains() > > All other @@ features still the same Have you yet given any advantages of contains over @@ ? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB'sPostgreSQL training!
On Wed, Nov 14, 2007 at 07:46:58AM +0000, Gregory Stark wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > > > Proposed changes: > > 1. Add function contains() > > 2. Alter docs to show use of contains() > > > > All other @@ features still the same > > Have you yet given any advantages of contains over @@ ? Familiarity for users of SQL Server that are migrating? ;-) (http://msdn2.microsoft.com/en-us/library/ms187787.aspx) //Magnus
Simon Riggs wrote: > Better idea: > > in-linable function called > > create function > contains(sourceText text, searchText text, config text) returns boolean > as $$ > to_tsvector(config, sourceText) @@ to_tsquery(config, searchText); > $$ language sql; > > so that > > SELECT title > FROM pgweb > WHERE contains(body, 'a & b', 'english') > > is an indexable, easily readable way of using full text search. > > allowing > > SELECT to_tsvector('fat cats ate fat rats') @@ to_tsquery('fat & rat'); > ?column? > ---------- > t > > to become > > SELECT contains('fat cats ate fat rats', 'fat & rat', 'english'); > ?column? > ---------- > t > > Proposed changes: > 1. Add function contains() > 2. Alter docs to show use of contains() > > All other @@ features still the same One advantage to this syntax is that it allows the specification of the configuration name, which needs a function call to be done with the @@ operator. However, to access a tsvector column we would need to add another function that takes a tsvector value. However, the config specification would apply only to the text column, not the pre-computed tsvector, so that might be confusing. I think this method could use a function index but it wouldn't be very clear from the syntax. The problem I always kept having in clarifying the syntax is that I could never find anything that covered all common use cases. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
I wrote: > What we'd need is a way to convert a LIKE pattern into a tsquery > ('%foo%bar%' => 'foo & bar'). Then you might even be able to sneak > index-optimized text search into existing applications. Might be worth a > try. Here is how this could work: CREATE FUNCTION likepattern_to_tsquery(text) RETURNS tsquery RETURNS NULL ON NULL INPUT IMMUTABLE LANGUAGE SQL AS $$SELECT trim(replace($1, '%', ' & '), '& ')::tsquery; $$; UPDATE pg_operator SET oprname = '#~~#' WHERE oprcode = 'textlike'::regproc; CREATE FUNCTION textlike_ts(text, text) RETURNS boolean RETURNS NULL ON NULL INPUT IMMUTABLE LANGUAGE SQL AS $$ SELECT$1 @@ likepattern_to_tsquery($2) AND $1 #~~# $2; $$; CREATE OPERATOR ~~ ( PROCEDURE = textlike_ts, LEFTARG = text, RIGHTARG = text ); Maybe something like this could be useful for people who cannot readily change their application code. (Of course it is not meant to solve the issue of how to make the text-search functionality itself easier to access.) -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes: > [ replace LIKE with this: ] > CREATE FUNCTION textlike_ts(text, text) RETURNS boolean > RETURNS NULL ON NULL INPUT IMMUTABLE > LANGUAGE SQL > AS $$ SELECT $1 @@ likepattern_to_tsquery($2) AND $1 #~~# $2; $$; Cute trick, but as-is this will not make anything go any faster, because it doesn't expose any opportunities for indexing the @@ operation. I think what you'd really need is something like $$ SELECT to_tsvector('english', $1) @@ likepattern_to_tsquery($2) AND $1 #~~# $2; $$; which will win if there is an expression index on to_tsvector('english', <textcolumn>). (You can substitute your preferred configuration of course, but you don't get to rely on default_text_search_config, because that would render the expression non-immutable and thus non-indexable.) This points up the same old notational problem that there is no good place in the operator notation to mention which text search configuration you want to use. Simon's suggestion of a three-parameter function at least addresses that issue. regards, tom lane
Simon Riggs <simon@2ndquadrant.com> writes: > Better idea: > create function > contains(sourceText text, searchText text, config text) returns boolean > as $$ > to_tsvector(config, sourceText) @@ to_tsquery(config, searchText); > $$ language sql; I think you have confused "simple query syntax" with "easy to use". The above does make the query notation look nicer, but the query will not actually go fast unless the DBA has made an expression index on to_tsvector('desired config', textcolumn). Thus, in terms of getting people "over the hump" of doing their first decently-performing text search, you haven't reduced the number of concepts to be understood at all; rather, you've added one more. People will still have to understand the complexity that contains() is supposedly hiding. Worse, they'll have to make the connection between two completely different-looking syntaxes every time they look at their schemas. regards, tom lane
Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > Better idea: > > > create function > > contains(sourceText text, searchText text, config text) returns boolean > > as $$ > > to_tsvector(config, sourceText) @@ to_tsquery(config, searchText); > > $$ language sql; > > I think you have confused "simple query syntax" with "easy to use". > > The above does make the query notation look nicer, but the query will > not actually go fast unless the DBA has made an expression index on > to_tsvector('desired config', textcolumn). Thus, in terms of getting > people "over the hump" of doing their first decently-performing text > search, you haven't reduced the number of concepts to be understood > at all; rather, you've added one more. People will still have to > understand the complexity that contains() is supposedly hiding. > Worse, they'll have to make the connection between two completely > different-looking syntaxes every time they look at their schemas. Yes, this the same problem we had months ago trying to improve the syntax, that there was no easy syntax that covered all common use cases. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Yes, this the same problem we had months ago trying to improve the > syntax, that there was no easy syntax that covered all common use cases. The thing that we keep coming up against is that we'd like queries to be able to depend on default_text_search_config. Quite aside from anyone's personal opinions about whether operator or function syntax is preferable, it would clearly be desirable to be able to say ... WHERE textcolumn @@ 'pattern-constant' and get a fast full-text search that's governed by the current setting of default_text_search_config (of course, it can only be fast if there is an index using that same configuration, but that's a setup detail). It strikes me that now that we have an invalidatable plan cache, we could fix this by having the planner rewrite "textcolumn @@ something" into "to_tsvector(regconfig_constant, textcolumn) @@ something", so long as it marked the resulting plan as needing to be invalidated by any change in the value of default_text_search_config. Once you have that form, it can be matched against an index, and away you go. Too late for 8.3, but seems doable enough for 8.4. regards, tom lane
On 11/14/07, Peter Eisentraut <peter_e@gmx.net> wrote: > I wrote: > > What we'd need is a way to convert a LIKE pattern into a tsquery > > ('%foo%bar%' => 'foo & bar'). Then you might even be able to sneak > > index-optimized text search into existing applications. Might be worth a > > try. > > Here is how this could work: > > CREATE FUNCTION likepattern_to_tsquery(text) RETURNS tsquery [...] But that coversion itself is fundamentally flawed, is the problem. 'foo bar' 'fooandbar' 'barfoo and foobar' '%foo%bar%' matches all 3. 'foo & bar' matches only the first. If the application currently using LIKE actually wants a word-based search, it should probably just convert to using tsearch wholesale, since it doesn't work as intended now. If it actually wants wildcard matching behavior, it can't use tsearch at all.
On Wed, 14 Nov 2007, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: >> Yes, this the same problem we had months ago trying to improve the >> syntax, that there was no easy syntax that covered all common use cases. > > The thing that we keep coming up against is that we'd like queries to > be able to depend on default_text_search_config. Quite aside from > anyone's personal opinions about whether operator or function syntax > is preferable, it would clearly be desirable to be able to say > > ... WHERE textcolumn @@ 'pattern-constant' > > and get a fast full-text search that's governed by the current setting > of default_text_search_config (of course, it can only be fast if there > is an index using that same configuration, but that's a setup detail). > > It strikes me that now that we have an invalidatable plan cache, > we could fix this by having the planner rewrite "textcolumn @@ something" > into "to_tsvector(regconfig_constant, textcolumn) @@ something", > so long as it marked the resulting plan as needing to be invalidated > by any change in the value of default_text_search_config. Once you > have that form, it can be matched against an index, and away you go. > > Too late for 8.3, but seems doable enough for 8.4. cute, pity it's didn't come up in our early discussion > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Magnus Hagander <magnus@hagander.net> writes: > On Wed, Nov 14, 2007 at 07:46:58AM +0000, Gregory Stark wrote: >> Have you yet given any advantages of contains over @@ ? > Familiarity for users of SQL Server that are migrating? ;-) > (http://msdn2.microsoft.com/en-us/library/ms187787.aspx) That argument would only hold water if we were going to adopt *all* of their syntax for the feature ... and the other choices they've made seem pretty ugly to me. regards, tom lane
Am Mittwoch, 14. November 2007 schrieb Trevor Talbot: > But that coversion itself is fundamentally flawed, is the problem. I know it's incorrect, but with a different parser and/or dictionary you could make it work. In practice, the search pattern will mostly be provided dynamically from some user input, so you could conceivably be able to modify the search patterns more readily than the entire queries in your application. Anyway, it's just an idea for those who need it. -- Peter Eisentraut http://developer.postgresql.org/~petere/
On 11/15/07, Peter Eisentraut <peter_e@gmx.net> wrote: > In practice, the search pattern will mostly be provided dynamically from some > user input, so you could conceivably be able to modify the search patterns > more readily than the entire queries in your application. Anyway, it's just > an idea for those who need it. Ah, I see what you mean, like for a simple web forum that only knows LIKE searches now. It may be easier to adjust the DB to do the intended thing instead of trying to change the entire forum. I wasn't thinking of those narrow cases.
Peter Eisentraut <peter_e@gmx.net> writes: > Am Mittwoch, 14. November 2007 schrieb Trevor Talbot: >> But that coversion itself is fundamentally flawed, is the problem. > I know it's incorrect, but with a different parser and/or dictionary you could > make it work. No, I don't think so. Trevor's killer point is that the token boundaries chosen by the parser --- no matter *what* they are --- might not line up with the substrings needed by a given LIKE pattern. There isn't any mechanism in tsearch that will find the stored word "foobar" if the search is for "foo" & "bar"; nor vice versa. It might be possible to use a pg_trgm index in this way, since AFAICT from the documents pg_trgm just chops up the substrings blindly rather than trying to have smarts about word boundaries. (And on third thought, I suppose you could emulate pg_tgrm with a suitable parser and a lobotomized dictionary ... but it would be pretty wasteful to use the tsearch mechanisms for that.) regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Magnus Hagander <magnus@hagander.net> writes: >> On Wed, Nov 14, 2007 at 07:46:58AM +0000, Gregory Stark wrote: >>> Have you yet given any advantages of contains over @@ ? > >> Familiarity for users of SQL Server that are migrating? ;-) >> (http://msdn2.microsoft.com/en-us/library/ms187787.aspx) > > That argument would only hold water if we were going to adopt *all* of > their syntax for the feature ... and the other choices they've made seem > pretty ugly to me. Well, yes and no. A degree of partial compatibility could still be interesting without the bells and whistles as long as we're compatible as far as we do go. We could always add bells and whistles later if we want. What would be of concern here is that we could be cutting off adding a more fully compatible function. Their syntax for the search pattern is quite different (and more human-readable). If we implement a contains() function which just takes a text and casts it from our text representation to a tsquery then we're blocking ever having a function which does a full conversion from an MSSQL style query string to a tsquery. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
This has been saved for the 8.4 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --------------------------------------------------------------------------- Simon Riggs wrote: > Something Tom Dunstan just mentioned has made me ask the question "Why > does our full text search feature look so strange?". It's the > operator-laden syntax that causes the problem. > > By any stretch, this query is difficult for most people to understand: > > SELECT * FROM text_table > WHERE to_tsvector('fat cats ate fat rats') @@ to_tsquery('fat & rat'); > > Wouldn't it be much simpler to just have a function, so we can write > this query like this? > > SELECT * FROM text_table > WHERE text_search('haystack needle haystack', 'needle'); > > We then explain to people that while the above is nice, it will presume > that both the function inputs are Text, which isn't any good for complex > searches, indexing and dictionaries etc.., so then we move to: > > SELECT * FROM text_table > WHERE text_search('haystack needle haystack'::tsvector, > 'needle'::tsquery); > > or perhaps > > SELECT * FROM text_table > WHERE full_text_search('haystack needle haystack', 'needle & hay'); > > which would automatically do the conversions to tsvector and tsquery for > us. No more tedious casting, easy to read. > > [*text_search() functions would return bool] > > So we end up with a normal sounding function that is overloaded to > provide all of the various goodies. We can include the text_search(text, > text) version of the function in the normal chapter on functions, with a > pointer to the more complex stuff elsewhere. > > Sound good? > > We can then explain everything without having to use @@ operators. They > can then be introduced as an option. > > The side benefit of this is that we can then allow our wonderful new > functionality to be more easily usable by things like Hibernate. We just > tell them we have this new function and thats all they need to know. > > I know that under the covers the @@ operator is necessary because we > hang various pieces of optimizer information from it. Each function > signature gets an operator with matching signature, so there's a 1:1 > correspondence in most use cases. So to make this all hang together, > there'd need to be a some smarts that says: if there is only one > operator on a function then use the operator's optimizer information > when you see just the function. That information can be assessed at DDL > time, so we can keep accurate track of operator counts in pgproc. > > An alternative approach might be to make the first operator created on a > function the "primary" operator. All other operators would then be > secondary operators, so that adding operators would not change the > inference mechanism. > > I've not got sufficient knowledge to say how hard the > function-to-operator inference is, but it would be dang useful in making > text search and many other programs readable and easy to interface to. > In the end that is going to mean wider usage of that functionality, with > more people feeling like they can dip their toes into the water. > > I must confess I have insufficient time to do this myself right now, not > least me discovering exactly how. I'm spending time on this now because > I'm the one that has to explain this stuff to people and things like > this can make a huge difference in their understanding and eventual > uptake. > > Thoughts? > > -- > Simon Riggs > 2ndQuadrant http://www.2ndQuadrant.com > > > ---------------------------(end of broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Tue, 2007-11-20 at 14:25 -0500, Bruce Momjian wrote: > This has been saved for the 8.4 release: > > http://momjian.postgresql.org/cgi-bin/pgpatches_hold > It isn't a patch, so isn't being held for later review, nor have you added it to the TODO list, so I'm not sure what this means. Would you mind me asking what happens to things on this list? -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs wrote: > On Tue, 2007-11-20 at 14:25 -0500, Bruce Momjian wrote: > > This has been saved for the 8.4 release: > > > > http://momjian.postgresql.org/cgi-bin/pgpatches_hold > > > > It isn't a patch, so isn't being held for later review, nor have you > added it to the TODO list, so I'm not sure what this means. > > Would you mind me asking what happens to things on this list? It says "saved" meaning we will review the issue for 8.4. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Nov 15, 2007, at 4:41 AM, Gregory Stark wrote: > Ask me about EnterpriseDB's Slony Replication support! Hah, wtf is that all about? :) BTW, looked at londiste? -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 4 Dec 2007 16:41:52 -0600 Decibel! <decibel@decibel.org> wrote: > On Nov 15, 2007, at 4:41 AM, Gregory Stark wrote: > > Ask me about EnterpriseDB's Slony Replication support! > > > Hah, wtf is that all about? :) Stealth marketing :)... Joshua D. Drake - -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 PostgreSQL solutions since 1997 http://www.commandprompt.com/ UNIQUE NOT NULL Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFHVdlPATb/zqfZUUQRAsnFAJ9W8KkxkoijdAUOLsB71kZGmWaxfgCgqat+ GswHcfzZNDkw4i37s9ecy7M= =7x8o -----END PGP SIGNATURE-----
On Dec 4, 2007, at 4:48 PM, Joshua D. Drake wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Tue, 4 Dec 2007 16:41:52 -0600 > Decibel! <decibel@decibel.org> wrote: > >> On Nov 15, 2007, at 4:41 AM, Gregory Stark wrote: >>> Ask me about EnterpriseDB's Slony Replication support! >> >> >> Hah, wtf is that all about? :) > > Stealth marketing :)... Dammit, I need to de-program reply-all from my fingers... But yeah, I thought it funny. :) -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
Added to TODO: * Consider a function-based API for '@@' full text searches http://archives.postgresql.org/pgsql-hackers/2007-11/msg00511.php --------------------------------------------------------------------------- Simon Riggs wrote: > Something Tom Dunstan just mentioned has made me ask the question "Why > does our full text search feature look so strange?". It's the > operator-laden syntax that causes the problem. > > By any stretch, this query is difficult for most people to understand: > > SELECT * FROM text_table > WHERE to_tsvector('fat cats ate fat rats') @@ to_tsquery('fat & rat'); > > Wouldn't it be much simpler to just have a function, so we can write > this query like this? > > SELECT * FROM text_table > WHERE text_search('haystack needle haystack', 'needle'); > > We then explain to people that while the above is nice, it will presume > that both the function inputs are Text, which isn't any good for complex > searches, indexing and dictionaries etc.., so then we move to: > > SELECT * FROM text_table > WHERE text_search('haystack needle haystack'::tsvector, > 'needle'::tsquery); > > or perhaps > > SELECT * FROM text_table > WHERE full_text_search('haystack needle haystack', 'needle & hay'); > > which would automatically do the conversions to tsvector and tsquery for > us. No more tedious casting, easy to read. > > [*text_search() functions would return bool] > > So we end up with a normal sounding function that is overloaded to > provide all of the various goodies. We can include the text_search(text, > text) version of the function in the normal chapter on functions, with a > pointer to the more complex stuff elsewhere. > > Sound good? > > We can then explain everything without having to use @@ operators. They > can then be introduced as an option. > > The side benefit of this is that we can then allow our wonderful new > functionality to be more easily usable by things like Hibernate. We just > tell them we have this new function and thats all they need to know. > > I know that under the covers the @@ operator is necessary because we > hang various pieces of optimizer information from it. Each function > signature gets an operator with matching signature, so there's a 1:1 > correspondence in most use cases. So to make this all hang together, > there'd need to be a some smarts that says: if there is only one > operator on a function then use the operator's optimizer information > when you see just the function. That information can be assessed at DDL > time, so we can keep accurate track of operator counts in pgproc. > > An alternative approach might be to make the first operator created on a > function the "primary" operator. All other operators would then be > secondary operators, so that adding operators would not change the > inference mechanism. > > I've not got sufficient knowledge to say how hard the > function-to-operator inference is, but it would be dang useful in making > text search and many other programs readable and easy to interface to. > In the end that is going to mean wider usage of that functionality, with > more people feeling like they can dip their toes into the water. > > I must confess I have insufficient time to do this myself right now, not > least me discovering exactly how. I'm spending time on this now because > I'm the one that has to explain this stuff to people and things like > this can make a huge difference in their understanding and eventual > uptake. > > Thoughts? > > -- > Simon Riggs > 2ndQuadrant http://www.2ndQuadrant.com > > > ---------------------------(end of broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +