Thread: Tsearch2 / Create rule on select
Dear Friends, I am very impressed by TSearch2 and would like to thank Oleg and the team for their hard work. I would like to migrate a phpBB forum with more that 200.000 messages to TSeach. Full text searches have become a bottleneck. When there are more than 200 users, the server slow dramatically. I need TSearch! So ... I installed TSearch2, a French dictionnary, everything is okay. My database is PostgreSQL 8.2.3, UTF-8. My questions now: 1) Should I migrate phpBB to TSeach2 in PHP code, rewriting the queries, which is quite obvious or easy. OR 2) Is there a smart way to catch queries on the fly in PostgreSQL, for example using "CREATE RULE "_RETURN" AS ON SELECT TO t1 DO INSTEAD " or the like. The idea would be that when I search on a text field, PostgreSQL rule system would rewrite the query to seach using Tsearch2. Is there any tool in TSeach2 doing this query rewriting rule? Obviously, it does not seem to exist, but I would like to be sure. 3) Gin indexes Are Gin indexes recommended for large databases? How to calculate the size in memory of an index? Kind regards, Jean-Michel
On Thu, Mar 08, 2007 at 09:56:23PM +0100, Jean-Michel Pouré wrote: > Dear Friends, > > I am very impressed by TSearch2 and would like to thank Oleg and the > team for their hard work. > > I would like to migrate a phpBB forum with more that 200.000 messages to > TSeach. Full text searches have become a bottleneck. When there are more > than 200 users, the server slow dramatically. I need TSearch! > > So ... I installed TSearch2, a French dictionnary, everything is okay. > My database is PostgreSQL 8.2.3, UTF-8. > > My questions now: > 1) Should I migrate phpBB to TSeach2 in PHP code, rewriting the queries, > which is quite obvious or easy. Yes. > OR > > 2) Is there a smart way to catch queries on the fly in PostgreSQL, for > example using > > "CREATE RULE "_RETURN" AS > ON SELECT TO t1 > DO INSTEAD " > > or the like. I don't think you can do that on-the-fly. But even if you can, if it's reasonably easy to rewrite the query in the PHP, you should do that instead IMHO. > 3) Gin indexes > Are Gin indexes recommended for large databases? Yes. At least if you have lots of search activity, they can be a *lot* faster than GiST (see for example http://people.planetpostgresql.org/mha/index.php?/archives/112-GIN-performance-postgresql.org-websearch-part-2.html) > How to calculate the size in memory of an index? No idea. Assuming you want to do it beforehand. otherwise, just create the index and see how large it got? //Magnus
Le vendredi 09 mars 2007 à 10:58 +0100, Magnus Hagander a écrit : > No idea. Assuming you want to do it beforehand. otherwise, just create > the index and see how large it got? Thank you for your comments. I will add TSeach2 support to phpBB 3.x soon. I had incredible response time on simple queries on more than plain-text 200.000 row. About 1 millisecond! Incredible but true. explain analyse verbose SELECT * FROM phpbb_posts_text WHERE idxfti @@ 'jmp'::tsquery limit 100; Limit (cost=0.00..444.47 rows=100 width=934) (actual time=0.046..0.824 rows=100 loops=1) -> Index Scan using idxfti_idx on phpbb_posts_text (cost=0.00..1053.38 rows=237 width=934) (actual time=0.040..0.418 rows=100 loops=1) Index Cond: (idxfti @@ '''jmp'''::tsquery) [color=red]Total runtime: 1.068 ms TSearch2 will be used by wikimedia shortly (it is supported in beta version). IMHO, it would be better if PostgreSQL parser was able to find Tseach indexes alone, rewritting the query automatically. Maybe it is on the radar list of hackers. Kind regards, Jean-Michel
On Fri, 9 Mar 2007, Jean-Michel Pour? wrote: > Le vendredi 09 mars 2007 ЪЪ 10:58 +0100, Magnus Hagander a ЪЪcrit : >> No idea. Assuming you want to do it beforehand. otherwise, just create >> the index and see how large it got? > > Thank you for your comments. I will add TSeach2 support to phpBB 3.x > soon. > > I had incredible response time on simple queries on more than plain-text > 200.000 row. About 1 millisecond! Incredible but true. > > explain analyse verbose SELECT * FROM phpbb_posts_text WHERE idxfti @@ > 'jmp'::tsquery limit 100; > > Limit (cost=0.00..444.47 rows=100 width=934) (actual time=0.046..0.824 > rows=100 loops=1) > -> Index Scan using idxfti_idx on phpbb_posts_text > (cost=0.00..1053.38 rows=237 width=934) (actual time=0.040..0.418 > rows=100 loops=1) > Index Cond: (idxfti @@ '''jmp'''::tsquery) > [color=red]Total runtime: 1.068 ms > > TSearch2 will be used by wikimedia shortly (it is supported in beta > version). I've already seen tsearch2 in 1.9.X release. > > IMHO, it would be better if PostgreSQL parser was able to find Tseach > indexes alone, rewritting the query automatically. Maybe it is on the > radar list of hackers. btw, we're working on new FTS feature of PostgreSQL, it's basically tsearch integrated into the pg core. But it has some new features and new SQL commands for configuring of FTS, so if the matter is not pressing I'd recommend to check http://mira.sai.msu.su/~megera/pgsql/ftsdoc > > Kind regards, > Jean-Michel > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Le samedi 10 mars 2007 06:32, Oleg Bartunov a écrit : > On Fri, 9 Mar 2007, Jean-Michel Pour? wrote: > > Le vendredi 09 mars 2007 ЪЪ 10:58 +0100, Magnus Hagander a ЪЪcrit : > >> No idea. Assuming you want to do it beforehand. otherwise, just create > >> the index and see how large it got? > > > > Thank you for your comments. I will add TSeach2 support to phpBB 3.x > > soon. > > > > I had incredible response time on simple queries on more than plain-text > > 200.000 row. About 1 millisecond! Incredible but true. > > > > explain analyse verbose SELECT * FROM phpbb_posts_text WHERE idxfti @@ > > 'jmp'::tsquery limit 100; > > > > Limit (cost=0.00..444.47 rows=100 width=934) (actual time=0.046..0.824 > > rows=100 loops=1) > > -> Index Scan using idxfti_idx on phpbb_posts_text > > (cost=0.00..1053.38 rows=237 width=934) (actual time=0.040..0.418 > > rows=100 loops=1) > > Index Cond: (idxfti @@ '''jmp'''::tsquery) > > [color=red]Total runtime: 1.068 ms > > > > TSearch2 will be used by wikimedia shortly (it is supported in beta > > version). > > I've already seen tsearch2 in 1.9.X release unfortunely, rev 20094 wich start to correctly handle tsearch2 operator in query parser is not in the 1.9.3 mediawiki svn tag but in the branch and trunk. > > > IMHO, it would be better if PostgreSQL parser was able to find Tseach > > indexes alone, rewritting the query automatically. Maybe it is on the > > radar list of hackers. > > btw, we're working on new FTS feature of PostgreSQL, it's basically > tsearch integrated into the pg core. But it has some new features and > new SQL commands for configuring of FTS, so if the matter is not pressing > I'd recommend to check http://mira.sai.msu.su/~megera/pgsql/ftsdoc > > > Kind regards, > > Jean-Michel > > > > > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 2: Don't 'kill -9' the postmaster > > Regards, > Oleg > _____________________________________________________________ > Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), > Sternberg Astronomical Institute, Moscow University, Russia > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(495)939-16-83, +007(495)939-23-83 > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq
Le samedi 10 mars 2007 à 08:32 +0300, Oleg Bartunov a écrit : > btw, we're working on new FTS feature of PostgreSQL, it's basically > tsearch integrated into the pg core. But it has some new features and > new SQL commands for configuring of FTS, so if the matter is not > pressing I'd recommend > to check http://mira.sai.msu.su/~megera/pgsql/ftsdoc Great. Thank you very much Oleg. As suggested, I will play with tsearch_core-0.37.gz patch. After testing first, I might be interested in using tseach_core on a production database. It is a community forum with 200.000 messages. tseach-core would only be used for searching messages. There is no money at stake, the database is backed-up twice a day using cron jobs. Whenever a bug happens, are there chances that PostgreSQL database gets perverted? On only the parser? If there is no chance to destroy the database, I may be interested in using Tsearch-core in production, quickly. When will tsearch-core become part of CVS-head? Kind regards, Jean-Michel
Jean-Michel Pouré wrote: > IMHO, it would be better if PostgreSQL parser was able to find Tseach > indexes alone, rewritting the query automatically. Maybe it is on the > radar list of hackers. How would you "rewrite a query automatically"? PostgreSQL will find it automatically when you attempt to do a match using the fulltext search operator, which is the only one supported by tsearch. I'm not sure what else it could do? //Magnus