Thread: Why is there a tsquery data type?
Why does text search need a tsquery data type? I realize it needs tsvector so it can create indexes and updated trigger columns, but it seems tsquery could instead just be a simple text string. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Why does text search need a tsquery data type? I realize it needs > tsvector so it can create indexes and updated trigger columns, but it > seems tsquery could instead just be a simple text string. By that logic, we don't need any data types other than text. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Why does text search need a tsquery data type? I realize it needs > > tsvector so it can create indexes and updated trigger columns, but it > > seems tsquery could instead just be a simple text string. > > By that logic, we don't need any data types other than text. What is tsquery giving us that text would not? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Tom Lane wrote: >> By that logic, we don't need any data types other than text. > What is tsquery giving us that text would not? A preprocessed representation that can be compared to tsvector efficiently. Now, if you'd asked whether we need *both* tsvector and tsquery, that'd be a fair question. I'm not 100% clear on what the differences are, but they seem pretty subtle. Do you think that having only one datatype would be clearer? regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Tom Lane wrote: > >> By that logic, we don't need any data types other than text. > > > What is tsquery giving us that text would not? > > A preprocessed representation that can be compared to tsvector > efficiently. > > Now, if you'd asked whether we need *both* tsvector and tsquery, > that'd be a fair question. I'm not 100% clear on what the differences > are, but they seem pretty subtle. Do you think that having only > one datatype would be clearer? There is no question things would be clearer with only one text search data type. The only value I can see to having a tsquery data type is that you can store a tsquery value in a column, but why would that be much better than just storing it in a TEXT field? Internally I assume you would have to generate a tsquery structure from a TEXT string, so the idea of a query representation wouldn't go away; it would just be internal. The one thing we would lose is the ability to process the query string with a named configuration. If we always cast to TEXT, I assume we would always be using "default_text_search_config", and I am a little worried about queries in triggers that have to wire-down the configuration name. As I understand it the tsquery goes through the configuration just like the tsvector. Right now you can already do: 'query' @@ 'boy girl'::tsvector and the system casts your text string to tsquery automatically. Perhaps we just need to minimize tsquery in the documentation and mention its special purpose. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
"Bruce Momjian" <bruce@momjian.us> writes: > There is no question things would be clearer with only one text search > data type. The only value I can see to having a tsquery data type is > that you can store a tsquery value in a column, but why would that be > much better than just storing it in a TEXT field? When you try storing a tsquery in a column does it alert you if you have an invalid syntax at that point? Storing it as text would mean not finding out until you try to use the query. Is converting a text query into the internal format faster or less memory intensive than converting text into the internal representation? When you run a query like "WHERE '...' @@ col" if there wasn't a tsquery data type then '...' would have to be parsed over and over again for each row. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote: > "Bruce Momjian" <bruce@momjian.us> writes: > > > There is no question things would be clearer with only one text search > > data type. The only value I can see to having a tsquery data type is > > that you can store a tsquery value in a column, but why would that be > > much better than just storing it in a TEXT field? > > When you try storing a tsquery in a column does it alert you if you have an > invalid syntax at that point? Storing it as text would mean not finding out > until you try to use the query. Yes it does check syntax: test=> select 'lkjadsf kjfdsa'::tsquery;ERROR: syntax error in tsearch query: "lkjadsf kjfdsa" A larger question is how many people store queries in the database to make it worth the complexity. > Is converting a text query into the internal format faster or less memory > intensive than converting text into the internal representation? When you run > a query like "WHERE '...' @@ col" if there wasn't a tsquery data type then > '...' would have to be parsed over and over again for each row. No, internally the TEXT string would be converted to something the system could deal with for that query, which is probably what 99% of all queries are going to do anyway by calling to_tsquery(). -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
"Bruce Momjian" <bruce@momjian.us> writes: > Gregory Stark wrote: >> "Bruce Momjian" <bruce@momjian.us> writes: >> >> > There is no question things would be clearer with only one text search >> > data type. The only value I can see to having a tsquery data type is >> > that you can store a tsquery value in a column, but why would that be >> > much better than just storing it in a TEXT field? >> >> When you try storing a tsquery in a column does it alert you if you have an >> invalid syntax at that point? Storing it as text would mean not finding out >> until you try to use the query. > > Yes it does check syntax: > > test=> select 'lkjadsf kjfdsa'::tsquery; > ERROR: syntax error in tsearch query: "lkjadsf kjfdsa" > > A larger question is how many people store queries in the database to > make it worth the complexity. So would this still happen if you didn't have a tsquery type? Or would it throw the error only when the actual matching operator executed and tried to parse the text? >> Is converting a text query into the internal format faster or less memory >> intensive than converting text into the internal representation? When you run >> a query like "WHERE '...' @@ col" if there wasn't a tsquery data type then >> '...' would have to be parsed over and over again for each row. > > No, internally the TEXT string would be converted to something the > system could deal with for that query, which is probably what 99% of all > queries are going to do anyway by calling to_tsquery(). How would that happen if there wasn't a tsquery type? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote: > "Bruce Momjian" <bruce@momjian.us> writes: > > > Gregory Stark wrote: > >> "Bruce Momjian" <bruce@momjian.us> writes: > >> > >> > There is no question things would be clearer with only one text search > >> > data type. The only value I can see to having a tsquery data type is > >> > that you can store a tsquery value in a column, but why would that be > >> > much better than just storing it in a TEXT field? > >> > >> When you try storing a tsquery in a column does it alert you if you have an > >> invalid syntax at that point? Storing it as text would mean not finding out > >> until you try to use the query. > > > > Yes it does check syntax: > > > > test=> select 'lkjadsf kjfdsa'::tsquery; > > ERROR: syntax error in tsearch query: "lkjadsf kjfdsa" > > > > A larger question is how many people store queries in the database to > > make it worth the complexity. > > So would this still happen if you didn't have a tsquery type? Or would it > throw the error only when the actual matching operator executed and tried to > parse the text? Well, if you didn't have a tsquery data type then you would get the error when the TEXT was cast to tsquery for the search itself. > >> Is converting a text query into the internal format faster or less memory > >> intensive than converting text into the internal representation? When you run > >> a query like "WHERE '...' @@ col" if there wasn't a tsquery data type then > >> '...' would have to be parsed over and over again for each row. > > > > No, internally the TEXT string would be converted to something the > > system could deal with for that query, which is probably what 99% of all > > queries are going to do anyway by calling to_tsquery(). > > How would that happen if there wasn't a tsquery type? There is an internal C structure which holds the tsquery information. My guess is that we would internally have something like tsquery but it wouldn't be user-visible perhaps. Right now I am a little confused about how to do this and keep the data-type-independent nature of the backend. You are right that we might have to end up re-parsing the TEXT field every time it hits the @@ operator, which is a pain. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Gregory Stark <stark@enterprisedb.com> writes: > How would that happen if there wasn't a tsquery type? I don't think Bruce is suggesting that the search operand can really be plain text (or if he is, he's nuts). The question here is whether there's really a need for a distinction between tsvector and tsquery datatypes; could we have tsvector serve both purposes instead? I can see that there are differences: tsquery can tell the difference between "x AND y" and "x OR y", whereas tsvector just knows "x, y". A superset datatype that can do both is certainly possible, but whether it's practical, or would be easier to use than the current design, I dunno. Perhaps a suitable analogy is regexp pattern matching. Traditionally regexps are conceived of as strings, but if they'd originated in more strongly typed languages than they did, they'd certainly be thought of as a distinct data type. Had we implemented ~ as taking a right operand of type 'regexp', we could win on a number of levels: entry-time syntax checking for regexps, and a precompiled internal representation, for instance. For regexps it seems clear to me that the target text string and the pattern really are different datatypes, and fuzzing that distinction is not an improvement. regards, tom lane
Tom Lane wrote: > Gregory Stark <stark@enterprisedb.com> writes: > > How would that happen if there wasn't a tsquery type? > > I don't think Bruce is suggesting that the search operand can really > be plain text (or if he is, he's nuts). The question here is whether Yes, that is what I was suggesting because as I mentioned TEXT already functions fine as tsquery. > there's really a need for a distinction between tsvector and tsquery > datatypes; could we have tsvector serve both purposes instead? > > I can see that there are differences: tsquery can tell the difference > between "x AND y" and "x OR y", whereas tsvector just knows "x, y". > A superset datatype that can do both is certainly possible, but whether > it's practical, or would be easier to use than the current design, > I dunno. Because of the special behavior of & and |, I assume tsquery and tsvector cannot be the same data type. > Perhaps a suitable analogy is regexp pattern matching. Traditionally > regexps are conceived of as strings, but if they'd originated in more > strongly typed languages than they did, they'd certainly be thought > of as a distinct data type. Had we implemented ~ as taking a right > operand of type 'regexp', we could win on a number of levels: entry-time > syntax checking for regexps, and a precompiled internal representation, > for instance. For regexps it seems clear to me that the target text > string and the pattern really are different datatypes, and fuzzing that > distinction is not an improvement. Yes, this is a good analogy. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Thu, 30 Aug 2007, Bruce Momjian wrote: > Gregory Stark wrote: >> "Bruce Momjian" <bruce@momjian.us> writes: >> >>> There is no question things would be clearer with only one text search >>> data type. The only value I can see to having a tsquery data type is >>> that you can store a tsquery value in a column, but why would that be >>> much better than just storing it in a TEXT field? >> >> When you try storing a tsquery in a column does it alert you if you have an >> invalid syntax at that point? Storing it as text would mean not finding out >> until you try to use the query. > > Yes it does check syntax: > > test=> select 'lkjadsf kjfdsa'::tsquery; > ERROR: syntax error in tsearch query: "lkjadsf kjfdsa" > > A larger question is how many people store queries in the database to > make it worth the complexity. you forget about very powerfull query rewriting, which is table driven > >> Is converting a text query into the internal format faster or less memory >> intensive than converting text into the internal representation? When you run >> a query like "WHERE '...' @@ col" if there wasn't a tsquery data type then >> '...' would have to be parsed over and over again for each row. > > No, internally the TEXT string would be converted to something the > system could deal with for that query, which is probably what 99% of all > queries are going to do anyway by calling to_tsquery(). > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: >> Why does text search need a tsquery data type? I realize it needs >> tsvector so it can create indexes and updated trigger columns, but it >> seems tsquery could instead just be a simple text string. > > By that logic, we don't need any data types other than text. > Could similar logic argue that we'd want special types for regular expressions too? That seems quite parallel to the tsquery type to me.
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: > Could similar logic argue that we'd want special types for regular > expressions too? That seems quite parallel to the tsquery type to me. Yeah, it certainly seems like something we might want to consider in future --- we could get rid of that klugy cache for compiled regexps, for one thing. An implicit cast from text to the regexp type would cover backwards compatibility issues, I think. regards, tom lane
Added to TODO: * Consider a special data type for regular expressions http://archives.postgresql.org/pgsql-hackers/2007-08/msg01067.php --------------------------------------------------------------------------- Tom Lane wrote: > Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: > > Could similar logic argue that we'd want special types for regular > > expressions too? That seems quite parallel to the tsquery type to me. > > Yeah, it certainly seems like something we might want to consider in > future --- we could get rid of that klugy cache for compiled regexps, > for one thing. An implicit cast from text to the regexp type would > cover backwards compatibility issues, I think. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +