Thread: TSearch2 vs. Apache Lucene
Greetings all,

I'm going to do a performance comparison with DocMgr and PG 8.1/TSearch2 on one end, and Apache Lucene on the other. To do this, I'm going to create a derivative of the docmgr-autoimport script so that I can specify one file to import at a time. I'll then create a Perl script that logs all details (timing, etc.) as the test progresses.

As test data, I have approximately 9,000 text files from Project Gutenberg, ranging in size from a few hundred bytes to 4.5 MB.

I plan to test the import speed of each file. Then I plan to write a web robot in Perl that will test search speed and the number of results returned.

Can anyone suggest a way to validate this test, or how I should configure PG to maximise import and search speed? Can I maximise both search speed and import speed, or are those mutually exclusive? (Note that this will be run on limited hardware: a 900 MHz Athlon with 512 MB of RAM.)

Has anyone ever compared TSearch2 to Lucene, as far as performance is concerned?

Thanks,
-Josh
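A minimal sketch of the per-file timing harness described above, written in Python rather than the Perl that Josh mentions. The names `time_imports` and `import_one` are hypothetical: `import_one` stands in for whatever actually imports a single file (the modified docmgr-autoimport script on one side, a Lucene indexer on the other), and the CSV column layout is an assumption, not something specified in the thread.

```python
import csv
import time
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def time_imports(
    paths: Iterable[Path],
    import_one: Callable[[Path], None],
    log_path: Path,
) -> List[Dict[str, object]]:
    """Import files one at a time and log the size and wall-clock time of each.

    `import_one` is a placeholder for the real import step; this function
    only measures how long each call takes.
    """
    rows: List[Dict[str, object]] = []
    for path in paths:
        size = path.stat().st_size
        start = time.perf_counter()
        import_one(path)  # the operation being benchmarked
        elapsed = time.perf_counter() - start
        rows.append({"file": str(path), "bytes": size, "seconds": elapsed})

    # Write one CSV row per file so both engines can be compared afterwards.
    with log_path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["file", "bytes", "seconds"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Running the same file list through this function once per engine, with a different `import_one` each time, would give two logs that can be compared by file size as well as by total time. Recording the size matters here because the test files range from a few hundred bytes to 4.5 MB.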
> Has anyone ever compared TSearch2 to Lucene, as far as performance is
> concerned?

I'll stay away from TSearch2 until it is fully integrated in the postgres core (like "create index foo_text on foo (texta, textb) USING TSearch2"). Because a full integration is unlikely to happen in the near future (as far as I know), I'll stick to Lucene.

Mike
Folks,

tsearch2 and Lucene are very different search engines, so it'd be an unfair comparison. If you need full access to metadata and instant indexing, you'll probably find tsearch2 more suitable than Lucene. But if you can live without those features and need to search read-only archives, you need Lucene.

Tsearch2 integration into pgsql would be cool, but I see no problem with using tsearch2 as an official extension module. After completing our todo, which we hope will happen for the 8.2 release, you could forget about Lucene and other engines :) We'll be available for development in the spring and we estimate about three months for our todo, so it's really doable.

Oleg

On Tue, 6 Dec 2005, Michael Riess wrote:

>> Has anyone ever compared TSearch2 to Lucene, as far as performance is
>> concerned?
>
> I'll stay away from TSearch2 until it is fully integrated in the postgres
> core (like "create index foo_text on foo (texta, textb) USING TSearch2").
> Because a full integration is unlikely to happen in the near future (as far
> as I know), I'll stick to Lucene.
>
> Mike

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov wrote:
> Folks,
>
> tsearch2 and Lucene are very different search engines, so it'd be an unfair
> comparison. If you need full access to metadata and instant indexing,
> you'll probably find tsearch2 more suitable than Lucene. But if
> you can live without those features and need to search read-only
> archives, you need Lucene.
>
> Tsearch2 integration into pgsql would be cool, but I see no problem with
> using tsearch2 as an official extension module. After completing our
> todo, which we hope will happen for the 8.2 release, you could
> forget about Lucene and other engines :) We'll be available for development
> in the spring and we estimate about three months for our todo, so it's
> really doable.

Agreed. There isn't anything magical about a plug-in vs. something integrated, at least in PostgreSQL. In other databases, plug-ins can't function as fully as integrated features, but in PostgreSQL everything is really a plug-in because it is all abstracted.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Oleg Bartunov wrote:
>> Tsearch2 integration into pgsql would be cool, but I see no problem with
>> using tsearch2 as an official extension module.

> Agreed. There isn't anything magical about a plug-in vs. something
> integrated, at least in PostgreSQL.

The quality gap between contrib and the main system is a lot smaller than it used to be, at least for those contrib modules that have regression tests. Main and contrib get equal levels of testing from the buildfarm, so they're about on par as far as portability goes. We could never say that before 8.1 ...

(Having said that, I think that tsearch2 will eventually become part of core, but probably not for a while yet.)

regards, tom lane
Bruce Momjian wrote:
> Agreed. There isn't anything magical about a plug-in vs. something
> integrated, at least in PostgreSQL. In other databases, plug-ins can't
> function as fully as integrated features, but in PostgreSQL everything
> is really a plug-in because it is all abstracted.

I only remember evaluating TSearch2 about a year ago, and when I read statements like "Vacuum and/or database dump/restore work differently when using TSearch2, sql scripts need to be executed etc." I knew that I would not want to go there.

But I don't doubt that it works, and that it is a sane concept.
Michael Riess wrote:
> I only remember evaluating TSearch2 about a year ago, and when I read
> statements like "Vacuum and/or database dump/restore work differently
> when using TSearch2, sql scripts need to be executed etc." I knew that I
> would not want to go there.
>
> But I don't doubt that it works, and that it is a sane concept.

Good point. I think we had some problems at that point because the API was improved between versions. Even if it had been integrated, we might have had the same problem.

--
  Bruce Momjian
On 6 Dec 2005, at 16:47, Joshua Kramer wrote:

> Has anyone ever compared TSearch2 to Lucene, as far as performance
> is concerned?

In our experience (small, often-updated documents) Lucene leaves tsearch2 in the dust. This probably has a lot to do with our usage pattern, though. For our usage it's very beneficial to have the index on a separate machine from the data; in many cases, however, this won't make sense. Lucene is also a lot easier to "cluster" than Postgres (it's simply a matter of NFS-mounting the index).

Russ Garrett
russ@last.fm
...

So you'll avoid a non-core product and instead only use another non-core product...?

Chris

Michael Riess wrote:
>> Has anyone ever compared TSearch2 to Lucene, as far as performance is
>> concerned?
>
> I'll stay away from TSearch2 until it is fully integrated in the
> postgres core (like "create index foo_text on foo (texta, textb) USING
> TSearch2"). Because a full integration is unlikely to happen in the near
> future (as far as I know), I'll stick to Lucene.
>
> Mike
No, my problem is that using TSearch2 interferes with other core components of postgres like (auto)vacuum or dump/restore.

> ...
>
> So you'll avoid a non-core product and instead only use another non-core
> product...?
>
> Chris
> No, my problem is that using TSearch2 interferes with other core
> components of postgres like (auto)vacuum or dump/restore.

That's nonsense...seriously.

The only trick with dump/restore is that you have to install the tsearch2 shared library before restoring. That's the same as all contribs though.

Chris
Christopher Kings-Lynne wrote:
>> No, my problem is that using TSearch2 interferes with other core
>> components of postgres like (auto)vacuum or dump/restore.
>
> That's nonsense...seriously.
>
> The only trick with dump/restore is that you have to install the
> tsearch2 shared library before restoring. That's the same as all
> contribs though.

Well, then it changed since I last read the documentation. That was about a year ago, and since then we have been using Lucene ... and as it works quite nicely, I see no reason to switch to TSearch2. Including it in the pgsql core would make it much more attractive to me, as features seem to be more stable once they are included in the core. Call me paranoid, if you must ... ;-)

> Chris