Thread: tsearch comments
Hi, I was browsing the archives for anything related to tsearch and stumbled across a post from Christopher Kings-Lynne in Nov 2002. He suggested to have txt2txtidx as an index function, but he had some problems implementing his idea. But his idea works very fine. Just apply the attached patch (against 7.3.1), install tsearch as normal, and then CREATE INDEX my_test_idx ON mytable using gist(txt2txtidx(mycolumn)); assuming "mycolumn" is a text-type column. Then you can search "mycolumn" with SELECT * FROM mytable WHERE txt2txtidx(mycolumn) ## 'patch&gist'; This method has several advantages: - you don't have to extend your table with another column - pg_dump's output won't be bloated - you don't have to use a special trigger, the index keeps itself up to date! Besides, we tested tsearch also with arabian and chinese input, and it works perfectly with unicode databases as well (although it must be the exact same phrase, since the builtin stemmer cannot stem chinese :). Oleg: Can you search your memory why txt2txtidx possibly should not be marked as "iscachable"? The rest: Is there perhaps a problem with this approach that I am not aware of? Regards, Bjoern
Attachment
> Oleg: Can you search your memory why txt2txtidx possibly should not be > marked as "iscachable"? Well I now see where the problem is, the function does not always have to return the same if the arguments are the same (the data might have changed and there may be more or less matching entries). Any hints how to overcome this? Regards, Bjoern
On Wed, 22 Jan 2003, [iso-8859-1] BjЖrn Metzdorf wrote: > > Oleg: Can you search your memory why txt2txtidx possibly should not be > > marked as "iscachable"? > > Well I now see where the problem is, the function does not always have to > return the same if the arguments are the same (the data might have changed > and there may be more or less matching entries). Any hints how to overcome > this? No way, Bjoern. Think about stemming, for example. > > Regards, > Bjoern > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
> > Well I now see where the problem is, the function does not always have to > > return the same if the arguments are the same (the data might have changed > > and there may be more or less matching entries). Any hints how to overcome > > this? > > No way, Bjoern. Think about stemming, for example. Do index functions always have to be marked "iscachable" ? Btw, the error message was confusing. It said that the function has to be marked "isImmutable", but there is no such attribute, instead "iscachable" had to be used. This seems to be a bug. Regards, Bjoern
> Do index functions always have to be marked "iscachable" ? > > Btw, the error message was confusing. It said that the function has to be > marked "isImmutable", but there is no such attribute, instead "iscachable" > had to be used. This seems to be a bug. The manual says: "All functions and operators used in an index definition must be immutable, that is, their results must depend only on their input arguments and never on any outside influence (such as the contents of another table or the current time). This restriction ensures that the behavior of the index is well-defined. To use a user-defined function in an index, remember to mark the function immutable when you create it." Well, in the tsearch case the results don't depend on any influence outside of the function. The stemmer is integrated in the function and we don't use the current time nor another table. So it should be safe to mark that function "iscachable" or when it is fixed "isimmutable", am I right? Regards, Bjoern
Bjorn, you'll get problem with your approach not to create additional columns if index will not used ! In our approach sequential scan will use txtidx column. I don't understand what error you're comment. Probably I miss something. We are working on new version of tsearch and alpha version will be available next week. Are you sure 'iscacheability' should be configureable somehow ? Teodor, what do you think ? Oleg On Thu, 23 Jan 2003, Bjorn Metzdorf wrote: > > Do index functions always have to be marked "iscachable" ? > > > > Btw, the error message was confusing. It said that the function has to be > > marked "isImmutable", but there is no such attribute, instead "iscachable" > > had to be used. This seems to be a bug. > > The manual says: > > "All functions and operators used in an index definition must be immutable, > that is, their results must depend only on their input arguments and never > on any outside influence (such as the contents of another table or the > current time). This restriction ensures that the behavior of the index is > well-defined. To use a user-defined function in an index, remember to mark > the function immutable when you create it." > > Well, in the tsearch case the results don't depend on any influence outside > of the function. The stemmer is integrated in the function and we don't use > the current time nor another table. So it should be safe to mark that > function "iscachable" or when it is fixed "isimmutable", am I right? > > Regards, > Bjoern > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
> txtidx column. I don't understand what error you're comment. I think "iscachable" and "isimmutable" are mixed up. "CREATE FUNTION" does not accept "isimmutable". > Probably I miss something. We are working on new version of tsearch > and alpha version will be available next week. Are you sure Great news! Any comments on the new version? Snowball support? txt2txtidx without morphology? Regards, Bjoern
> > Yes, I don't see that problem either. The locale and the configuration of > > dictionaries do not change while running the postmaster, do they? > I forgot parser too. > > Parser and dictionaries will be changed after gmake install (contrib/tsearch) > :), without restarting postmaster. > > In common case, we can't set iscachable option to txt2txtidx, but if you know > what you do you can set it. May be > some comments must be included in README. Ok, but then it should be sufficient to recreate the txt2txtidx function and/or the index after a change of parser and dictionaries. So generally spoken, the index function approach to tsearch works, if you take care of that. I gladly recreate the function and/or index from time to time, if I can do without that additional column, bloated dump and slow trigger. This is very good news, as this is a very easy approach to having an easy to use fulltextsearch in postgresql. > > Great news! Any comments on the new version? Snowball support? txt2txtidx > > without morphology? > Snowball, ISpell.... > SQL-level for configure parsers and dictionaries. > SQL-level for configure map (which dictionary for each type of lexem). > Just wait :) Wow, sounds great! Regards, Bjoern
As someone who is just getting started with PostygreSQL from years working with MySQL, it appears that the timestamp data type does not behave in the way it did with MySQL. I got used to just defining a column as a timestamp and letting the database throw the latest time stamp in there whenever a row was updated. Is there anything simular in PosgreSQL? How can I accomplish something simular inside the database, or am I stuck populating the field in some manner as in the following example update blah blah blah timestamp = NOW() Thanks, Alan
> As someone who is just getting started with PostygreSQL from years working > with MySQL, it appears that the timestamp data type does not behave in the > way it did with MySQL. I got used to just defining a column as a timestamp > and letting the database throw the latest time stamp in there whenever a row > was updated. Is there anything simular in PosgreSQL? How can I accomplish > something simular inside the database, or am I stuck populating the field in > some manner as in the following example There is no such datatype in postgresql. If you just need the current time inserted on INSERT and not on UPDATE, then you can declare that column with "default now()". Else you will need to install a trigger, then you can have exactly the same behaviour as with mysql. Regards, Bjoern
> Ok, but then it should be sufficient to recreate the txt2txtidx function > and/or the index after a change of parser and dictionaries. So generally > spoken, the index function approach to tsearch works, if you take care of > that. I gladly recreate the function and/or index from time to time, if I > can do without that additional column, bloated dump and slow trigger. > > This is very good news, as this is a very easy approach to having an easy to > use fulltextsearch in postgresql. I talked a bit more with Oleg and Teodor about this index function approach and we came to the conclusion that it is safe to use (if you take care of the above), but it might be a bit slower than the original column based approach. That it because the used operators are defined with RECHECK, and with the index function approach the RECHECK is against an (expensive) function instead of "raw" data in a column. Btw. the "iscachable" has another advantage, it seems to really speed up the search. I have done tests again 250000 entries, mostly nicknames based on fantasy (so the english stemmer has not much to do), and the first search for a name takes about 0.5 - 2.5 seconds whereas all subsequent searches for the same name (even with fresh inserted data) take less than 0.1 seconds. Or does this have nothing to do with "iscachable"? Regards, Bjoern
"Bjorn Metzdorf" <bm@turtle-entertainment.de> writes: > Btw, the error message was confusing. It said that the function has to be > marked "isImmutable", but there is no such attribute, instead "iscachable" > had to be used. This seems to be a bug. Actually the preferred syntax in 7.3 is CREATE FUNCTION ... LANGUAGE foo IMMUTABLE For awhile during 7.3 development you had to write WITH (isImmutable) but we changed the syntax to be more SQL-spec-compatible. This error message seems not to have gotten fixed --- thanks for pointing it out. regards, tom lane
> As someone who is just getting started with PostygreSQL from years working > with MySQL, it appears that the timestamp data type does not behave in the > way it did with MySQL. Much as I like MySQL, it can sometimes be a little *too* helpful. > I got used to just defining a column as a timestamp > and letting the database throw the latest time stamp in there whenever a > row > was updated. Is there anything simular in PosgreSQL? When you create the table do something like: CREATE TABLE foo ( bar timestamp DEFAULT now(), ... ); You can of course do this with any column-type and value. See the SQL-reference for details. - Richard Huxton
On Thu, Jan 23, 2003 at 04:26:31AM -0800, Alan T. Miller wrote: > As someone who is just getting started with PostygreSQL from > years working with MySQL, it appears that the timestamp data > type does not behave in the way it did with MySQL. I got used > to just defining a column as a timestamp and letting the > database throw the latest time stamp in there whenever a row > was updated. Is there anything simular in PosgreSQL? How can I > accomplish something simular inside the database, or am I stuck > populating the field in some manner as in the following example > > update blah blah blah timestamp = NOW() triggers or rules can do that with any timestamp (or timestamp(0) -- no partial-seconds) field. here's a "rules" approach: create table _something ( id serial, dat text, freshened timestamp(0), primary key ( id ) ); create view something as select id, dat, freshened from _something; create rule something_add as on insert to something do instead ( insert into _something ( --id, dat, freshened ) values ( --let id take care of itself, NEW.dat, current_timestamp ); ); create rule something_edit as on update to something do instead ( update _something set --id = leave it alone, dat = NEW.dat, freshened = current_timestamp where id = NEW.id ; ); then you can just insert into something (dat) values ('yada yada'); update something set dat = 'here we go' where id = 23978; and "freshened" takes care of itself. -- There are 10 kinds of people: ones that get binary, and ones that don't. will@serensoft.com http://sourceforge.net/projects/newbiedoc -- we need your brain! http://www.dontUthink.com/ -- your brain needs us! Looking for a firewall? Do you think smoothwall sucks? You're probably right... Try the folks at http://clarkconnect.org/ !
Of course the proposed solution solves the INSERTs....you can also set a trigger that on UPDATE set the value to 'now()' dev@archonet.com wrote: >>As someone who is just getting started with PostygreSQL from years working >>with MySQL, it appears that the timestamp data type does not behave in the >>way it did with MySQL. >> >> > >Much as I like MySQL, it can sometimes be a little *too* helpful. > > > >>I got used to just defining a column as a timestamp >>and letting the database throw the latest time stamp in there whenever a >>row >>was updated. Is there anything simular in PosgreSQL? >> >> > >When you create the table do something like: > >CREATE TABLE foo ( > bar timestamp DEFAULT now(), > ... >); > >You can of course do this with any column-type and value. See the >SQL-reference for details. > >- Richard Huxton > >---------------------------(end of broadcast)--------------------------- >TIP 4: Don't 'kill -9' the postmaster > >
> Yes, I don't see that problem either. The locale and the configuration of > dictionaries do not change while running the postmaster, do they? I forgot parser too. Parser and dictionaries will be changed after gmake install (contrib/tsearch) :), without restarting postmaster. In common case, we can't set iscachable option to txt2txtidx, but if you know what you do you can set it. May be some comments must be included in README. Björn Metzdorf wrote: >>txtidx column. I don't understand what error you're comment. > > > I think "iscachable" and "isimmutable" are mixed up. "CREATE FUNTION" does > not accept "isimmutable". > > >>Probably I miss something. We are working on new version of tsearch >>and alpha version will be available next week. Are you sure > > > Great news! Any comments on the new version? Snowball support? txt2txtidx > without morphology? Snowball, ISpell.... SQL-level for configure parsers and dictionaries. SQL-level for configure map (which dictionary for each type of lexem). Just wait :) -- Teodor Sigaev teodor@stack.net
One word of caution, I *think* you want quotes around the 'now()' statement in your table definition. Otherwise your default value will be the instant the *table* was created, not the instant your insert happened. I too had an adjustment period when switching from MySQL to postgres. However, I think you'll find that if you use a RULE to implement this feature you will soon become addicted to PostgreSQL's advanced feature set. You can learn more about rules by going to http://www.ca.postgresql.org/docs/aw_pgsql_book/node124.html where you can find a brief description and example, or for more detail, you can go to http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/rules-insert .html I attained all of my database knowledge through hands on experience; much of it on MySQL. Switching to PostgreSQL was somewhat traumatic because it has a lot more power and a lot more features. Once I got a handle on things like Views, Triggers and Rules, I have become somewhat dependant on them and I haven't been able to use MySQL for anything more than the most basic of applications. I guess that's just a warning... There may be no turning back. -- Matthew Nuzum www.bearfruit.org cobalt@bearfruit.org > -----Original Message----- > From: Björn Metzdorf [mailto:bm@turtle-entertainment.de] > Sent: Thursday, January 23, 2003 7:32 AM > To: Alan T. Miller; pgsql-general@postgresql.org > Subject: Re: I was spoiled by the MySQL timestamp field > > > As someone who is just getting started with PostygreSQL from years > working > > with MySQL, it appears that the timestamp data type does not behave in > the > > way it did with MySQL. I got used to just defining a column as a > timestamp > > and letting the database throw the latest time stamp in there whenever a > row > > was updated. Is there anything simular in PosgreSQL? How can I > accomplish > > something simular inside the database, or am I stuck populating the > field > in > > some manner as in the following example > > There is no such datatype in postgresql. If you just need the current time > inserted on INSERT and not on UPDATE, then you can declare that column > with > "default now()". Else you will need to install a trigger, then you can > have > exactly the same behaviour as with mysql. > > Regards, > Bjoern
postgreSQL actually has a better implementation of timestamps.... In mySQL -- You can have only 1 timestamp field... In postgreSQL -- you can have as many "create_dt" timestamp default 'now()' fields as you want and on an insert -- they ALL get stamped whereas in mySQL --ONLY 1 COLUMN gets updated.... ""Alan T. Miller"" <amiller@hollywood101.com> wrote in message news:003a01c2c2da$a90d10a0$6e01a8c0@webdev... > As someone who is just getting started with PostygreSQL from years working > with MySQL, it appears that the timestamp data type does not behave in the > way it did with MySQL. I got used to just defining a column as a timestamp > and letting the database throw the latest time stamp in there whenever a row > was updated. Is there anything simular in PosgreSQL? How can I accomplish > something simular inside the database, or am I stuck populating the field in > some manner as in the following example > > update blah blah blah timestamp = NOW() > > Thanks, > > Alan > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Why not just use a DEFAULT NOW()? ie: CREATE TABLE blah ( time TIMESTAMP NOT NULL DEFAULT NOW() ); I know it works for DATETIME types, don't know about TIMESTAMP but I assume it would be the same. Luke. ----- Original Message ----- From: "will trillich" <will@serensoft.com> To: <pgsql-general@postgresql.org> Sent: Friday, January 24, 2003 7:13 AM Subject: Re: [GENERAL] I was spoiled by the MySQL timestamp field > On Thu, Jan 23, 2003 at 04:26:31AM -0800, Alan T. Miller wrote: > > As someone who is just getting started with PostygreSQL from > > years working with MySQL, it appears that the timestamp data > > type does not behave in the way it did with MySQL. I got used > > to just defining a column as a timestamp and letting the > > database throw the latest time stamp in there whenever a row > > was updated. Is there anything simular in PosgreSQL? How can I > > accomplish something simular inside the database, or am I stuck > > populating the field in some manner as in the following example > > > > update blah blah blah timestamp = NOW() > > triggers or rules can do that with any timestamp (or timestamp(0) > -- no partial-seconds) field. here's a "rules" approach: > > create table _something ( > id serial, > dat text, > freshened timestamp(0), > primary key ( id ) > ); > > create view something as > select > id, > dat, > freshened > from > _something; > > create rule something_add as > on insert to something > do instead ( > insert into _something ( > --id, > dat, > freshened > ) values ( > --let id take care of itself, > NEW.dat, > current_timestamp > ); > ); > > create rule something_edit as > on update to something > do instead ( > update _something set > --id = leave it alone, > dat = NEW.dat, > freshened = current_timestamp > where > id = NEW.id > ; > ); > > then you can just > > insert into something (dat) values ('yada yada'); > update something set dat = 'here we go' where id = 23978; > > and "freshened" takes care of itself. > > -- > There are 10 kinds of people: > ones that get binary, and ones that don't. > > will@serensoft.com > http://sourceforge.net/projects/newbiedoc -- we need your brain! > http://www.dontUthink.com/ -- your brain needs us! > > Looking for a firewall? Do you think smoothwall sucks? You're > probably right... Try the folks at http://clarkconnect.org/ ! > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html > >
See http://www.brasileiro.net:8080/postgres/cookbook/view-one-recipe.adp?recipe_id=10 Matthew Nuzum wrote: > One word of caution, I *think* you want quotes around the 'now()' > statement in your table definition. Otherwise your default value will be > the instant the *table* was created, not the instant your insert > happened. > > I too had an adjustment period when switching from MySQL to postgres. > However, I think you'll find that if you use a RULE to implement this > feature you will soon become addicted to PostgreSQL's advanced feature > set. > > You can learn more about rules by going to > http://www.ca.postgresql.org/docs/aw_pgsql_book/node124.html where you > can find a brief description and example, or for more detail, you can go > to > http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/rules-insert > .html > > I attained all of my database knowledge through hands on experience; > much of it on MySQL. Switching to PostgreSQL was somewhat traumatic > because it has a lot more power and a lot more features. Once I got a > handle on things like Views, Triggers and Rules, I have become somewhat > dependant on them and I haven't been able to use MySQL for anything more > than the most basic of applications. I guess that's just a warning... > There may be no turning back. > > -- > Matthew Nuzum > www.bearfruit.org > cobalt@bearfruit.org > =20 > > >>-----Original Message----- >>From: Bj=F6rn Metzdorf [mailto:bm@turtle-entertainment.de] >>Sent: Thursday, January 23, 2003 7:32 AM >>To: Alan T. Miller; pgsql-general@postgresql.org >>Subject: Re: I was spoiled by the MySQL timestamp field >>=20 >> >>>As someone who is just getting started with PostygreSQL from years >> >>working >> >>>with MySQL, it appears that the timestamp data type does not behave > > in > >>the >> >>>way it did with MySQL. I got used to just defining a column as a >> >>timestamp >> >>>and letting the database throw the latest time stamp in there > > whenever a > >>row >> >>>was updated. Is there anything simular in PosgreSQL? How can I >> >>accomplish >> >>>something simular inside the database, or am I stuck populating the >> >>field >>in >> >>>some manner as in the following example >> >>=20 >>There is no such datatype in postgresql. If you just need the current > > time > >>inserted on INSERT and not on UPDATE, then you can declare that column >>with >>"default now()". Else you will need to install a trigger, then you can >>have >>exactly the same behaviour as with mysql. >>=20 >>Regards, >>Bjoern > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org
On Fri, Jan 24, 2003 at 09:57:04AM +1300, Luke Pascoe wrote: > Why not just use a DEFAULT NOW()? > ie: > CREATE TABLE blah ( > time TIMESTAMP NOT NULL DEFAULT NOW() > ); > > I know it works for DATETIME types, don't know about TIMESTAMP > but I assume it would be the same. great for INSERT, but rather inert on UPDATE. create table _gribnif ( created timestamp default current_timestamp, modified timestamp, other text, ); create view gribnif ... create rule gribnif_edit as on update to gribnif do instead ( update _gribnif set MODIFIED = CURRENT_TIMESTAMP, other = NEW.other where id = NEW.id ; ); -- There are 10 kinds of people: ones that get binary, and ones that don't. will@serensoft.com http://sourceforge.net/projects/newbiedoc -- we need your brain! http://www.dontUthink.com/ -- your brain needs us! Looking for a firewall? Do you think smoothwall sucks? You're probably right... Try the folks at http://clarkconnect.org/ !
At 03:02 PM 1/23/03 -0500, Matthew Nuzum wrote: >One word of caution, I *think* you want quotes around the 'now()' >statement in your table definition. Otherwise your default value will be >the instant the *table* was created, not the instant your insert >happened. Would plain: default 'now' work? If it does (it seems to), what is the difference? Link.
What's the difference between 'now()' and 'current_timestamp' ? will trillich wrote: > > On Fri, Jan 24, 2003 at 09:57:04AM +1300, Luke Pascoe wrote: > > Why not just use a DEFAULT NOW()? > > ie: > > CREATE TABLE blah ( > > time TIMESTAMP NOT NULL DEFAULT NOW() > > ); > > > > I know it works for DATETIME types, don't know about TIMESTAMP > > but I assume it would be the same. > > great for INSERT, but rather inert on UPDATE. > > create table _gribnif ( > created timestamp default current_timestamp, > modified timestamp, > other text, > ); > > create view gribnif ... > > create rule gribnif_edit as > on update to gribnif > do instead ( > update _gribnif set > MODIFIED = CURRENT_TIMESTAMP, > other = NEW.other > where > id = NEW.id > ; > ); > > -- > There are 10 kinds of people: > ones that get binary, and ones that don't. > > will@serensoft.com > http://sourceforge.net/projects/newbiedoc -- we need your brain! > http://www.dontUthink.com/ -- your brain needs us! > > Looking for a firewall? Do you think smoothwall sucks? You're > probably right... Try the folks at http://clarkconnect.org/ ! > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Carpe Dancem ;-) ----------------------------------------------------------------- Remember your friends while they are alive ----------------------------------------------------------------- Sincerely, Dennis Gearon
Dennis Gearon <gearond@cvc.net> writes: > What's the difference between 'now()' and 'current_timestamp' ? None AFAIK. now() is historical Postgres usage, CURRENT_TIMESTAMP is the SQL-spec-mandated spelling of the same functionality. CURRENT_TIMESTAMP with an argument does something a bit different, though. http://www.ca.postgresql.org/users-lounge/docs/7.3/postgres/functions-datetime.html#FUNCTIONS-DATETIME-CURRENT regards, tom lane
On Fri, Jan 24, 2003 at 10:55:26AM -0500, Tom Lane wrote: > Dennis Gearon <gearond@cvc.net> writes: > > What's the difference between 'now()' and 'current_timestamp' ? > > None AFAIK. now() is historical Postgres usage, CURRENT_TIMESTAMP is > the SQL-spec-mandated spelling of the same functionality. > > CURRENT_TIMESTAMP with an argument does something a bit different, > though. > > http://www.ca.postgresql.org/users-lounge/docs/7.3/postgres/functions-datetime.html#FUNCTIONS-DATETIME-CURRENT the one caveat is outlined at the bottom of that very page: All the date/time data types also accept the special literal value now to specify the current date and time. Thus, the following three all return the same result: SELECT CURRENT_TIMESTAMP; SELECT now(); SELECT TIMESTAMP 'now'; Note: You do not want to use the third form when specifying a DEFAULT clause while creating a table. The system will convert now to a timestamp as soon as the constant is parsed, so that when the default value is needed, the ##TIME OF THE TABLE CREATION## would be used! The first two forms will not be evaluated until the default value is used, because they are function calls. Thus they will give the desired behavior of defaulting to the time of row insertion. [emphasis mine] so careful using "now" when you mean to use "now()". or better yet, "current_timestamp" for consistency. -- There are 10 kinds of people: ones that get binary, and ones that don't. will@serensoft.com http://sourceforge.net/projects/newbiedoc -- we need your brain! http://www.dontUthink.com/ -- your brain needs us! Looking for a firewall? Do you think smoothwall sucks? You're probably right... Try the folks at http://clarkconnect.org/ !
The difference is when now() gets interpreted into a date. Someone please correct me if I'm wrong... If you create a table at 15:00 Jan 26, 2003 and you use an unquoted now() as the default value for a field, then each record will have it's default value as 15:00 Jan 26, 2003. If you quote the now(), then the default value for each newly created record will be now() which is interpreted at the moment the record is created. I've used different rdbms so please forgive me if I'm documenting the characteristics of a different system. -- Matthew Nuzum www.bearfruit.org cobalt@bearfruit.org > -----Original Message----- > From: Lincoln Yeoh [mailto:lyeoh@pop.jaring.my] > Sent: Friday, January 24, 2003 4:47 AM > To: Matthew Nuzum; 'Björn Metzdorf'; 'Alan T. Miller'; pgsql- > general@postgresql.org > Subject: Re: [GENERAL] I was spoiled by the MySQL timestamp field > > At 03:02 PM 1/23/03 -0500, Matthew Nuzum wrote: > > >One word of caution, I *think* you want quotes around the 'now()' > >statement in your table definition. Otherwise your default value will be > >the instant the *table* was created, not the instant your insert > >happened. > > Would plain: default 'now' work? If it does (it seems to), what is the > difference? > > Link.
"Matthew Nuzum" <cobalt@bearfruit.org> writes: > The difference is when now() gets interpreted into a date. Someone > please correct me if I'm wrong... > If you create a table at 15:00 Jan 26, 2003 and you use an unquoted > now() as the default value for a field, then each record will have it's > default value as 15:00 Jan 26, 2003. You're wrong... now() is a function call and will not be folded into a constant. AFAIR, the only case that does get folded to a constant is ... mycol timestamp default timestamp 'now' since "timestamp 'now'" is Postgres' notation for a literal constant of a specific datatype (timestamp in this case). The string 'now' is immediately fed to the timestamp datatype's input converter, and behold it produces the current time, which is then stored as a timestamp constant. The notation 'now()' that various people have suggested is in fact invalid, being not a function call but a literal --- but it seems that the timestamp input converter is sloppy about detecting trailing garbage in its input string. You should get a "Bad timestamp external representation" error from it, but at the moment you don't. You can easily check the behavior for yourself rather than relying on other people's assertions. For example: regression=# create table t1 (f1 timestamp default now(), regression(# f2 timestamp default 'now', regression(# f3 timestamp default timestamp 'now'); CREATE TABLE regression=# \d t1 Table "public.t1" Column | Type | Modifiers --------+-----------------------------+------------------------------------------------------------------- f1 | timestamp without time zone | default now() f2 | timestamp without time zone | default 'now' f3 | timestamp without time zone | default '2003-01-26 16:51:11.246954'::timestamp without time zone and even more directly: regression=# insert into t1 default values; INSERT 1003028 1 regression=# insert into t1 default values; INSERT 1003029 1 regression=# select * from t1; f1 | f2 | f3 ----------------------------+----------------------------+---------------------------- 2003-01-26 16:58:13.173561 | 2003-01-26 16:58:13.173561 | 2003-01-26 16:51:11.246954 2003-01-26 16:58:14.323162 | 2003-01-26 16:58:14.323162 | 2003-01-26 16:51:11.246954 (2 rows) (BTW, the reason 'now' without "timestamp" in front works is that this is not a timestamp literal but a text literal, which will be coerced to timestamp at runtime.) regards, tom lane
At 05:00 PM 1/26/03 -0500, Tom Lane wrote: >(BTW, the reason 'now' without "timestamp" in front works is that this >is not a timestamp literal but a text literal, which will be coerced >to timestamp at runtime.) Is it known at the moment which of those methods the Postgresql team are aiming to continue supporting for the near/medium future? e.g. current_timestamp is guaranteed. now() for the forseeable future. 'now' for the next few versions. Thanks, Link.
Lincoln Yeoh <lyeoh@pop.jaring.my> writes: > At 05:00 PM 1/26/03 -0500, Tom Lane wrote: >> (BTW, the reason 'now' without "timestamp" in front works is that this >> is not a timestamp literal but a text literal, which will be coerced >> to timestamp at runtime.) > Is it known at the moment which of those methods the Postgresql team are > aiming to continue supporting for the near/medium future? AFAIK there are no plans to break any of them; though certainly CURRENT_TIMESTAMP is the most future-proof, being spec-mandated. regards, tom lane
On Sun, Jan 26, 2003 at 05:00:36PM -0500, Tom Lane wrote: > now() is a function call and will not be folded into a constant. > You can easily check the behavior for yourself rather than relying on > other people's assertions. note -- this is ALWAYS a good idea. especially when it's *me* giving the assertion :) reminds me of a high-school prof i had. he'd stop in the middle of a lesson and ask if anybody knew the meaning of a word he'd just used ("mote", for example); if not, he'd supply a definition (small fly) and go on. later we'd have a vocabulary test; any answers that parroted what he'd said were always marked wrong, as he'd just made them up off-the-cuff. (mote is a particle, as in gritty dust.) double-check for yourself. don't take anybody's word for it. KNOW, don't trust. the best way to find out, is to FIND OUT. -- There are 10 kinds of people: ones that get binary, and ones that don't. will@serensoft.com http://sourceforge.net/projects/newbiedoc -- we need your brain! http://www.dontUthink.com/ -- your brain needs us! Looking for a firewall? Do you think smoothwall sucks? You're probably right... Try the folks at http://clarkconnect.org/ !
At 11:59 PM 1/26/03 -0600, will trillich wrote: >On Sun, Jan 26, 2003 at 05:00:36PM -0500, Tom Lane wrote: > > now() is a function call and will not be folded into a constant. > > > You can easily check the behavior for yourself rather than relying on > > other people's assertions. > >note -- this is ALWAYS a good idea. especially when it's *me* >giving the assertion :) It's still good hear from the developers what they think the behaviour should be, and would be. If the devs, docs and software agree then it'll be fine to use the feature. Otherwise if possible, it might be a better idea to use a different feature to achieve what I want. Don't want to use a behaviour will soon be deprecated/changed. Of course, it's not good to pester you guys for everything either. But current_timestamp vs now() vs 'now' would probably be a popular feature enough to clarify. Hope this doesn't waste too much bandwidth ;). Link.
FYI, I usually prefer triggers in cases where you want to modify/check the row as it is being processed, and rules for modifying other rows/tables as part of row processing, but both do work. --------------------------------------------------------------------------- will trillich wrote: > On Thu, Jan 23, 2003 at 04:26:31AM -0800, Alan T. Miller wrote: > > As someone who is just getting started with PostygreSQL from > > years working with MySQL, it appears that the timestamp data > > type does not behave in the way it did with MySQL. I got used > > to just defining a column as a timestamp and letting the > > database throw the latest time stamp in there whenever a row > > was updated. Is there anything simular in PosgreSQL? How can I > > accomplish something simular inside the database, or am I stuck > > populating the field in some manner as in the following example > > > > update blah blah blah timestamp = NOW() > > triggers or rules can do that with any timestamp (or timestamp(0) > -- no partial-seconds) field. here's a "rules" approach: > > create table _something ( > id serial, > dat text, > freshened timestamp(0), > primary key ( id ) > ); > > create view something as > select > id, > dat, > freshened > from > _something; > > create rule something_add as > on insert to something > do instead ( > insert into _something ( > --id, > dat, > freshened > ) values ( > --let id take care of itself, > NEW.dat, > current_timestamp > ); > ); > > create rule something_edit as > on update to something > do instead ( > update _something set > --id = leave it alone, > dat = NEW.dat, > freshened = current_timestamp > where > id = NEW.id > ; > ); > > then you can just > > insert into something (dat) values ('yada yada'); > update something set dat = 'here we go' where id = 23978; > > and "freshened" takes care of itself. > > -- > There are 10 kinds of people: > ones that get binary, and ones that don't. > > will@serensoft.com > http://sourceforge.net/projects/newbiedoc -- we need your brain! > http://www.dontUthink.com/ -- your brain needs us! > > Looking for a firewall? Do you think smoothwall sucks? You're > probably right... Try the folks at http://clarkconnect.org/ ! > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Sun, Jan 26, 2003 at 05:00:36PM -0500, Tom Lane wrote: > "Matthew Nuzum" <cobalt@bearfruit.org> writes: > > The difference is when now() gets interpreted into a date. Someone > > please correct me if I'm wrong... > now() is a function call and will not be folded into a constant. I know it's not really related, but I think it might be worth noting here, also, that now() is (was? I don't have anything later than 7.2 handy for testing at the moment) resolves to the beginning of the transaction. If you need time to move forward during your transaction, you need timeofday(). A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
HI will we see sort by relevance at tsearch alpha version? :) -- WBR, sector119
Attachment
On Tue, 28 Jan 2003 sector119@mail.ru wrote: > HI > > will we see sort by relevance at tsearch alpha version? :) > not sure. We concentrate our efforts, well, Teodor is working on better configurability of tsearch like OpenFTS does. It's not difficult to add rather naive relevance based on position of lexem in document, for example. The question is do you like such kind of relevancy ? Real ranking support (as in OpenFTS) require separate tables to maintain coordinate information. We want to keep tsearch as simple as it's and now we just add better and friendly configurability. Do we need complicate tsearch ? We already have OpenFTS which has most features people requested. Regards, Oleg > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Hi! I think that this would be nice. OpenFTS is great, but i would be great if this would be implement in real pg functions. I think that indexim would be great if pg make it by itself. Also it could be great if we could define order of weight of columns. bye Uros I On 28.01.2003 at 11:53:26, Oleg Bartunov <oleg@sai.msu.su> wrote: > On Tue, 28 Jan 2003 sector119@mail.ru wrote: > > > HI > > > > will we see sort by relevance at tsearch alpha version? :) > > > > not sure. We concentrate our efforts, well, Teodor is working > on > better configurability of tsearch like OpenFTS does. > > It\\\'s not difficult to add rather naive relevance based on > position > of lexem in document, for example. The question is do you like > such > kind of relevancy ? Real ranking support (as in OpenFTS) > require > separate tables to maintain coordinate information. > We want to keep tsearch as simple as it\\\'s and now we just add > better and friendly configurability. Do we need complicate > tsearch ? > We already have OpenFTS which has most features people > requested. >
At 13:47 28.1.2003 +0300, Oleg Bartunov wrote the following message: >We want to keep tsearch as simple as it's and now we just add >better and friendly configurability. Do we need complicate tsearch ? Sometimes you need that because some other app is putting data into database. >We already have OpenFTS which has most features people requested. But integration into other stuff could use some work.
On Tue, 28 Jan 2003, Uros Gruber wrote: > Hi! > > I think that this would be nice. OpenFTS is great, but i would > be great if this would be implement in real pg functions. > > I think that indexim would be great if pg make it by itself. > > Also it could be great if we could define order of weight of > columns. Could you elaborate this ? > > bye Uros > > I > On 28.01.2003 at 11:53:26, Oleg Bartunov <oleg@sai.msu.su> > wrote: > > > On Tue, 28 Jan 2003 sector119@mail.ru wrote: > > > > > HI > > > > > > will we see sort by relevance at tsearch alpha version? :) > > > > > > > not sure. We concentrate our efforts, well, Teodor is working > > on > > better configurability of tsearch like OpenFTS does. > > > > It\\\'s not difficult to add rather naive relevance based on > > position > > of lexem in document, for example. The question is do you > like > > such > > kind of relevancy ? Real ranking support (as in OpenFTS) > > require > > separate tables to maintain coordinate information. > > We want to keep tsearch as simple as it\\\'s and now we just > add > > better and friendly configurability. Do we need complicate > > tsearch ? > > We already have OpenFTS which has most features people > > requested. > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
On Tue, 28 Jan 2003, Tomaz Borstnar wrote: > At 13:47 28.1.2003 +0300, Oleg Bartunov wrote the following message: > >We want to keep tsearch as simple as it's and now we just add > >better and friendly configurability. Do we need complicate tsearch ? > > Sometimes you need that because some other app is putting data into database. > So, you'll end up with something like OpenFTS, which was designed as *engine* to be integrated into other apps. The real problem is that OpenFTS is written in perl and porting to other languages is difficult task. new tsearch already has some features of OpenFTS and we're slowly moving to idea we should rewrite OpenFTS in 'C', so writing interfaces would be much simpler. There is major problem with moving ALL features of OpenFTS to tsearch we don't know how to resolve - generation of headlines, text fragments with hilighted query terms. Once we resolve that we could concentrate on tsearch with ranking support. > >We already have OpenFTS which has most features people requested. > > But integration into other stuff could use some work. > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Hi! OpenFTS is great so far. But for example. We are working on directory engine and we would like to use some ranking on data we get from tsearch. The we have data like Page caption, description, keywords, url, page content .... and then we have another project we we search on complitely different kind of data. Using full text search in this scenario is very easy to use, because everything is in db and this is done on db level. Developer do not need to worry about that how to index something. It great because you can say this column is fulltext indexed. Second stage is ordering data you get from tsearch and thats where openFTS comes. But you have to make some middle ware which is great, but we need to focus on other problems not on middle ware. Moving this to C would be great but not solution to all of us we want to meka our searches good. I think relkov and relor is good for start and should be going that way. I think that everybody can very simple acomplish hilightning and generation of headlines once they get result ordered. As i say in my mail before and Oleg ask me "Could you elaborate this ?". I try to make some changes openFTS special in relkov and relkor. But i'm not god in advanced C programing so i spend a lot of time to find out what exactly code does. And here is my idea what would be great if this is possible to make, because i don't realy know how pg internaly works. Let say we create some table where we want to use full text search. CREATE table ..... .. mycolumn varchar, another_column varchar, .... fulltext(mycolumn,another_column) } the system then make all necessary index tables where those positions would be saved when some data is inserted. I don't know if this is possible to make somwehere in backgound so user don actualy se those tables, but this is not a problem. Parsing search words can anybody easily make in their own language. Or he could use OpenFTS functionality. I made it for PHP. So when you have those search words we passed it to sql query. something like this. SELECT mycolumn,another_column FROM mytable WHERE mycolumn @ 'search string' AND another_column @ 'search string'; This is done by tsearch and we get data searched but not orderd by relevance. For that we add something in that way SELECT mycolumn,another_column,rank() AS sumofrank FROM mytable WHERE....... ORDER my sumofrank I'll write this rank here for better understanding rank({mycolumn=>0.01},{another_column=>0.001},'search string') AS sumofrank This would read that mycolumn have base weight 0.01 and another column 0.001, so if search string is found in beginig of another column it would be ranked lower than same string found in mycolumn in the middle of it. Those weight could be summed. With this could be possible to make order what column is more important not only generaly but for every query we make. Sintax is just for easier understanding what i'm trying to solve. So far we orderd aout data and then we could make hilighning and stuff in any language we want. I hope everybody undestands what is my idea and i would like to help i just have to learn more from the code and what internaly is done with that data. I make some ranking in PHP but it was not fast becase there were a lot of data etc and php is not as fast as C is. But i get pretty results and also the concept how to rank something. I could also be made some rule engine how to rank something, but i think that first of all we have to start on something trivial and simple. And when this works we move to advanced. Let say we check if text is bold or is in CAPS... -- bye, Uros Tuesday, January 28, 2003, 8:11:36 PM, you wrote: OB> On Tue, 28 Jan 2003, Tomaz Borstnar wrote: >> At 13:47 28.1.2003 +0300, Oleg Bartunov wrote the following message: >> >We want to keep tsearch as simple as it's and now we just add >> >better and friendly configurability. Do we need complicate tsearch ? >> >> Sometimes you need that because some other app is putting data into database. >> OB> So, you'll end up with something like OpenFTS, which was designed as OB> *engine* to be integrated into other apps. The real problem is that OB> OpenFTS is written in perl and porting to other languages is OB> difficult task. new tsearch already has some features of OpenFTS and OB> we're slowly moving to idea we should rewrite OpenFTS in 'C', OB> so writing interfaces would be much simpler. OB> There is major problem with moving ALL features of OpenFTS to tsearch OB> we don't know how to resolve - generation of headlines, text fragments OB> with hilighted query terms. Once we resolve that we could concentrate OB> on tsearch with ranking support.
Hi! I think that this would be nice. OpenFTS is great, but i would be great if this would be implement in real pg functions. I think that indexim would be great if pg make it by itself. Also it could be great if we could define order of weight of columns. bye Uros I On 28.01.2003 at 11:53:26, Oleg Bartunov <oleg@sai.msu.su> wrote: > On Tue, 28 Jan 2003 sector119@mail.ru wrote: > > > HI > > > > will we see sort by relevance at tsearch alpha version? :) > > > > not sure. We concentrate our efforts, well, Teodor is working > on > better configurability of tsearch like OpenFTS does. > > It\\\'s not difficult to add rather naive relevance based on > position > of lexem in document, for example. The question is do you like > such > kind of relevancy ? Real ranking support (as in OpenFTS) > require > separate tables to maintain coordinate information. > We want to keep tsearch as simple as it\\\'s and now we just add > better and friendly configurability. Do we need complicate > tsearch ? > We already have OpenFTS which has most features people > requested. > > > Regards, > Oleg > > > > > > Regards, > Oleg > _____________________________________________________________ > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > Sternberg Astronomical Institute, Moscow University (Russia) > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(095)939-16-83, +007(095)939-23-83 > > > ---------------------------(end of > broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to > majordomo@postgresql.org > -- Any excuse will serve a tyrant. -- Aesop
Hi, I guess what we're looking for is something on the order (as much as I hate using it as a reference) of MySQL's full text search which does offer some ranking. Just putting ranking alone in tsearch would be a huge benefit. Users can then decide in their own language how to display results, especially since those results may not necessarily require titles or description fragments. For example, we have several huge tables that have the following columns: > id > tbltype > title > description Basically, our customer will lookup words that are contained in title and description, so we make an additional table like: > id > tblid (id of the source table) > tblsource (which table) > content (txtidx) Then we can use tsearch to search the second table (we do now), and once we retrieve the id's that we want, we can display results from one or more source tables. Just putting in ranking in tsearch would solve all these problems. - Ericson Smith http://www.did-it.com http://www.weightlossfriends.com On Tue, 2003-01-28 at 14:00, Oleg Bartunov wrote: > On Tue, 28 Jan 2003, Uros Gruber wrote: > > > Hi! > > > > I think that this would be nice. OpenFTS is great, but i would > > be great if this would be implement in real pg functions. > > > > I think that indexim would be great if pg make it by itself. > > > > Also it could be great if we could define order of weight of > > columns. > > Could you elaborate this ? > > > > > bye Uros > > > > I > > On 28.01.2003 at 11:53:26, Oleg Bartunov <oleg@sai.msu.su> > > wrote: > > > > > On Tue, 28 Jan 2003 sector119@mail.ru wrote: > > > > > > > HI > > > > > > > > will we see sort by relevance at tsearch alpha version? :) > > > > > > > > > > not sure. We concentrate our efforts, well, Teodor is working > > > on > > > better configurability of tsearch like OpenFTS does. > > > > > > It\\\'s not difficult to add rather naive relevance based on > > > position > > > of lexem in document, for example. The question is do you > > like > > > such > > > kind of relevancy ? Real ranking support (as in OpenFTS) > > > require > > > separate tables to maintain coordinate information. > > > We want to keep tsearch as simple as it\\\'s and now we just > > add > > > better and friendly configurability. Do we need complicate > > > tsearch ? > > > We already have OpenFTS which has most features people > > > requested. > > > > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > > Regards, > Oleg > _____________________________________________________________ > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > Sternberg Astronomical Institute, Moscow University (Russia) > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(095)939-16-83, +007(095)939-23-83 > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly
Good ideas, I second them entirely. We're using tsearch as well, and would like to see ranking in this module. It would be great to have it on the DB level, instead of middleware (perl, etc). Apparently there is built in proximity ranking? I guess one of the problems with using the OpenFTS front end is integration into existing systems that use different languages and methodologies. Most of the time, you tend to want those results delivered within your application, instead of outside of it in a general purpose search engine. Also, in many typical applications, the data is not necessarily title/description based, for instance, we need to lookup data from just a title, or sometimes a content column, and maybe even from our keywords table. - Ericson Smith http://www.weightlossfriends.com On Tue, 2003-01-28 at 17:48, Uros Gruber wrote: > Hi! > > OpenFTS is great so far. But for example. We are working on > directory engine and we would like to use some ranking on > data we get from tsearch. The we have data like Page caption, > description, keywords, url, page content .... and then we > have another project we we search on complitely different > kind of data. > > Using full text search in this scenario is very easy to use, > because everything is in db and this is done on db level. > Developer do not need to worry about that how to index > something. It great because you can say this column is > fulltext indexed. > > Second stage is ordering data you get from tsearch and thats > where openFTS comes. But you have to make some middle ware > which is great, but we need to focus on other problems not on > middle ware. > > Moving this to C would be great but not solution to all of us > we want to meka our searches good. > > I think relkov and relor is good for start and should be > going that way. I think that everybody can very simple > acomplish hilightning and generation of headlines once they > get result ordered. > > As i say in my mail before and Oleg ask me "Could you > elaborate this ?". I try to make some changes openFTS special > in relkov and relkor. But i'm not god in advanced C > programing so i spend a lot of time to find out what exactly > code does. > > And here is my idea what would be great if this is possible > to make, because i don't realy know how pg internaly works. > > Let say we create some table where we want to use full text > search. > > CREATE table ..... > .. > mycolumn varchar, > another_column varchar, > .... > fulltext(mycolumn,another_column) > > } > the system then make all necessary index tables where those > positions would be saved when some data is inserted. I don't > know if this is possible to make somwehere in backgound so > user don actualy se those tables, but this is not a problem. > > Parsing search words can anybody easily make in their own > language. Or he could use OpenFTS functionality. I made it > for PHP. So when you have those search words we passed it to > sql query. > > something like this. > > SELECT mycolumn,another_column FROM mytable WHERE mycolumn @ > 'search string' AND another_column @ 'search string'; > > This is done by tsearch and we get data searched but not > orderd by relevance. > > For that we add something in that way > > SELECT mycolumn,another_column,rank() AS sumofrank FROM mytable WHERE....... ORDER my > sumofrank > > I'll write this rank here for better understanding > > rank({mycolumn=>0.01},{another_column=>0.001},'search string') AS sumofrank > > This would read that mycolumn have base weight 0.01 and > another column 0.001, so if search string is found in beginig > of another column it would be ranked lower than same string > found in mycolumn in the middle of it. Those weight could be > summed. With this could be possible to make order what column > is more important not only generaly but for every query we > make. > > Sintax is just for easier understanding what i'm trying to > solve. > > So far we orderd aout data and then we could make hilighning > and stuff in any language we want. > > I hope everybody undestands what is my idea and i would like > to help i just have to learn more from the code and what > internaly is done with that data. > > I make some ranking in PHP but it was not fast becase there > were a lot of data etc and php is not as fast as C is. But i > get pretty results and also the concept how to rank > something. > > I could also be made some rule engine how to rank something, > but i think that first of all we have to start on something > trivial and simple. And when this works we move to advanced. > Let say we check if text is bold or is in CAPS... > > -- > bye, > Uros > > Tuesday, January 28, 2003, 8:11:36 PM, you wrote: > > OB> On Tue, 28 Jan 2003, Tomaz Borstnar wrote: > > >> At 13:47 28.1.2003 +0300, Oleg Bartunov wrote the following message: > >> >We want to keep tsearch as simple as it's and now we just add > >> >better and friendly configurability. Do we need complicate tsearch ? > >> > >> Sometimes you need that because some other app is putting data into database. > >> > > OB> So, you'll end up with something like OpenFTS, which was designed as > OB> *engine* to be integrated into other apps. The real problem is that > OB> OpenFTS is written in perl and porting to other languages is > OB> difficult task. new tsearch already has some features of OpenFTS and > OB> we're slowly moving to idea we should rewrite OpenFTS in 'C', > OB> so writing interfaces would be much simpler. > OB> There is major problem with moving ALL features of OpenFTS to tsearch > OB> we don't know how to resolve - generation of headlines, text fragments > OB> with hilighted query terms. Once we resolve that we could concentrate > OB> on tsearch with ranking support. > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
On 28 Jan 2003, eric@did-it.com wrote: > Hi, > > I guess what we're looking for is something on the order (as much as I > hate using it as a reference) of MySQL's full text search which does > offer some ranking. > > Just putting ranking alone in tsearch would be a huge benefit. Users can > then decide in their own language how to display results, especially > since those results may not necessarily require titles or description > fragments. > > For example, we have several huge tables that have the following > columns: > > > id > > tbltype > > title > > description > > Basically, our customer will lookup words that are contained in title > and description, so we make an additional table like: > > > id > > tblid (id of the source table) > > tblsource (which table) > > content (txtidx) > > Then we can use tsearch to search the second table (we do now), and once > we retrieve the id's that we want, we can display results from one or > more source tables. Just putting in ranking in tsearch would solve all > these problems. Hmm, people used to concatenation to get the same result. Do you really need that table ? Your problem doesn't relate to ranking of results. We could add some ranking support based on local (per-document) statistics. Keeping global statistics, for example, TFxIDF, would complicate tsearch and maintaining of indices. Proximity ranking as in OpenFTS require more options in tsearch configuration. Let us think about ranking later after we implement friendly interface. > > - Ericson Smith > http://www.did-it.com > http://www.weightlossfriends.com > > > On Tue, 2003-01-28 at 14:00, Oleg Bartunov wrote: > > On Tue, 28 Jan 2003, Uros Gruber wrote: > > > > > Hi! > > > > > > I think that this would be nice. OpenFTS is great, but i would > > > be great if this would be implement in real pg functions. > > > > > > I think that indexim would be great if pg make it by itself. > > > > > > Also it could be great if we could define order of weight of > > > columns. > > > > Could you elaborate this ? > > > > > > > > bye Uros > > > > > > I > > > On 28.01.2003 at 11:53:26, Oleg Bartunov <oleg@sai.msu.su> > > > wrote: > > > > > > > On Tue, 28 Jan 2003 sector119@mail.ru wrote: > > > > > > > > > HI > > > > > > > > > > will we see sort by relevance at tsearch alpha version? :) > > > > > > > > > > > > > not sure. We concentrate our efforts, well, Teodor is working > > > > on > > > > better configurability of tsearch like OpenFTS does. > > > > > > > > It\\\'s not difficult to add rather naive relevance based on > > > > position > > > > of lexem in document, for example. The question is do you > > > like > > > > such > > > > kind of relevancy ? Real ranking support (as in OpenFTS) > > > > require > > > > separate tables to maintain coordinate information. > > > > We want to keep tsearch as simple as it\\\'s and now we just > > > add > > > > better and friendly configurability. Do we need complicate > > > > tsearch ? > > > > We already have OpenFTS which has most features people > > > > requested. > > > > > > > > > > > > > ---------------------------(end of broadcast)--------------------------- > > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > > > > > Regards, > > Oleg > > _____________________________________________________________ > > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > > Sternberg Astronomical Institute, Moscow University (Russia) > > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > > phone: +007(095)939-16-83, +007(095)939-23-83 > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 3: if posting/reading through Usenet, please send an appropriate > > subscribe-nomail command to majordomo@postgresql.org so that your > > message can get through to the mailing list cleanly > > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Oleg, We actually have several somewhat similar tables (A, B, C, D, E...) that have some textual/varchar content. Thus we make a search table Z that concatenates the textual info from the first tables. Sure, we could probably use unions and such the like, but performance reasons prohibit that scenario :-) Its much better to search the search table, then show the relevant data from the source tables based on ranked results. - Ericson Smith On Wed, 2003-01-29 at 03:37, Oleg Bartunov wrote: > On 28 Jan 2003, eric@did-it.com wrote: > > > Hi, > > > > I guess what we're looking for is something on the order (as much as I > > hate using it as a reference) of MySQL's full text search which does > > offer some ranking. > > > > Just putting ranking alone in tsearch would be a huge benefit. Users can > > then decide in their own language how to display results, especially > > since those results may not necessarily require titles or description > > fragments. > > > > For example, we have several huge tables that have the following > > columns: > > > > > id > > > tbltype > > > title > > > description > > > > Basically, our customer will lookup words that are contained in title > > and description, so we make an additional table like: > > > > > id > > > tblid (id of the source table) > > > tblsource (which table) > > > content (txtidx) > > > > Then we can use tsearch to search the second table (we do now), and once > > we retrieve the id's that we want, we can display results from one or > > more source tables. Just putting in ranking in tsearch would solve all > > these problems. > > Hmm, people used to concatenation to get the same result. Do you really > need that table ? Your problem doesn't relate to ranking of results. > > We could add some ranking support based on local (per-document) statistics. > Keeping global statistics, for example, TFxIDF, would complicate tsearch > and maintaining of indices. Proximity ranking as in OpenFTS require > more options in tsearch configuration. Let us think about ranking later > after we implement friendly interface. > > > > > - Ericson Smith > > http://www.did-it.com > > http://www.weightlossfriends.com > > > > > > On Tue, 2003-01-28 at 14:00, Oleg Bartunov wrote: > > > On Tue, 28 Jan 2003, Uros Gruber wrote: > > > > > > > Hi! > > > > > > > > I think that this would be nice. OpenFTS is great, but i would > > > > be great if this would be implement in real pg functions. > > > > > > > > I think that indexim would be great if pg make it by itself. > > > > > > > > Also it could be great if we could define order of weight of > > > > columns. > > > > > > Could you elaborate this ? > > > > > > > > > > > bye Uros > > > > > > > > I > > > > On 28.01.2003 at 11:53:26, Oleg Bartunov <oleg@sai.msu.su> > > > > wrote: > > > > > > > > > On Tue, 28 Jan 2003 sector119@mail.ru wrote: > > > > > > > > > > > HI > > > > > > > > > > > > will we see sort by relevance at tsearch alpha version? :) > > > > > > > > > > > > > > > > not sure. We concentrate our efforts, well, Teodor is working > > > > > on > > > > > better configurability of tsearch like OpenFTS does. > > > > > > > > > > It\\\'s not difficult to add rather naive relevance based on > > > > > position > > > > > of lexem in document, for example. The question is do you > > > > like > > > > > such > > > > > kind of relevancy ? Real ranking support (as in OpenFTS) > > > > > require > > > > > separate tables to maintain coordinate information. > > > > > We want to keep tsearch as simple as it\\\'s and now we just > > > > add > > > > > better and friendly configurability. Do we need complicate > > > > > tsearch ? > > > > > We already have OpenFTS which has most features people > > > > > requested. > > > > > > > > > > > > > > > > > ---------------------------(end of broadcast)--------------------------- > > > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > > > > > > > > Regards, > > > Oleg > > > _____________________________________________________________ > > > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > > > Sternberg Astronomical Institute, Moscow University (Russia) > > > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > > > phone: +007(095)939-16-83, +007(095)939-23-83 > > > > > > > > > ---------------------------(end of broadcast)--------------------------- > > > TIP 3: if posting/reading through Usenet, please send an appropriate > > > subscribe-nomail command to majordomo@postgresql.org so that your > > > message can get through to the mailing list cleanly > > > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > > Regards, > Oleg > _____________________________________________________________ > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > Sternberg Astronomical Institute, Moscow University (Russia) > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(095)939-16-83, +007(095)939-23-83 >