Thread: Comparison of Oracle and PostgreSQL full text search
Thanks to some very helpful input here in earlier threads, I was finally able to pull together a working prototype Full Text Search 'engine' on PostgreSQL and compare it directly to the way the production Oracle Text works. The good news is that PostgreSQL is bloody fast! The slightly iffy news is that the boss is now moaning about possible training costs! For what it's worth, I wrote up the performance comparison here: http://diznix.com/dizwell/archives/153 Maybe it will be of use to anyone else wondering if it's possible to do full text search and save a couple hundred thousand dollars whilst you're at it! Regards HJR
On Tue, Jul 27, 2010 at 7:58 PM, Howard Rogers <hjr@diznix.com> wrote: > Thanks to some very helpful input here in earlier threads, I was > finally able to pull together a working prototype Full Text Search > 'engine' on PostgreSQL and compare it directly to the way the > production Oracle Text works. The good news is that PostgreSQL is > bloody fast! The slightly iffy news is that the boss is now moaning > about possible training costs! Someone running Oracle is complaining about training costs? That seems a bit like complaining about needing to give the bellboy a $1 tip at a $1k a night hotel. > For what it's worth, I wrote up the performance comparison here: > http://diznix.com/dizwell/archives/153 Cool, I'll read up. Thanks.
Howard Rogers, 28.07.2010 03:58: > Thanks to some very helpful input here in earlier threads, I was > finally able to pull together a working prototype Full Text Search > 'engine' on PostgreSQL and compare it directly to the way the > production Oracle Text works. The good news is that PostgreSQL is > bloody fast! The slightly iffy news is that the boss is now moaning > about possible training costs! Why is it that managers always see short term savings but fail to see longterm expenses? > For what it's worth, I wrote up the performance comparison here: > http://diznix.com/dizwell/archives/153 > > Maybe it will be of use to anyone else wondering if it's possible to > do full text search and save a couple hundred thousand dollars whilst > you're at it! Very interesting reading. Would you mind sharing the tables, index structures and search queries that you used (both for Oracle and Postgres)? Regards Thomas
2010/7/28 Thomas Kellerer <spam_eater@gmx.net>: > Why is it that managers always see short term savings but fail to see > longterm expenses? It's all about CAPEX vs OPEX, baby! Besides jokes, it's actually myopia. Because they ALREADY spent money for training they don't see the need for extra training (and costs), as if people would remain there forever and knowledge is a definitive thing! THe point would be to put costs in a time perspective, that is, how much would it cost in, say, 5 years, with PG and the same for Oracle. -- Vincenzo Romano NotOrAnd Information Technologies NON QVIETIS MARIBVS NAVTA PERITVS
Howard, that was a great read! I especially like your sentence """ Considering that any search containing more than a half-dozen search terms is more like an essay than a realistic search; and considering that returning half a million matches is more a data dump than a sensible search facility,""" which really pulls some benchmark-perspectives back into real live. Thank you, Harald -- GHUM Harald Massa persuadere et programmare Harald Armin Massa Spielberger Straße 49 70435 Stuttgart 0173/9409607 no fx, no carrier pigeon - Using PostgreSQL is mostly about sleeping well at night.
On Tue, Jul 27, 2010 at 9:58 PM, Howard Rogers <hjr@diznix.com> wrote: > Thanks to some very helpful input here in earlier threads, I was > finally able to pull together a working prototype Full Text Search > 'engine' on PostgreSQL and compare it directly to the way the > production Oracle Text works. The good news is that PostgreSQL is > bloody fast! The slightly iffy news is that the boss is now moaning > about possible training costs! > > For what it's worth, I wrote up the performance comparison here: > http://diznix.com/dizwell/archives/153 I always thought there is a clause in their user agreement preventing the users from publishing benchmarks like that. I must be mistaken.
zhong ming wu wrote: > > On Tue, Jul 27, 2010 at 9:58 PM, Howard Rogers <hjr@diznix.com> wrote: > > > For what it's worth, I wrote up the performance comparison here: > > http://diznix.com/dizwell/archives/153 > > I always thought there is a clause in their user agreement preventing > the users from publishing benchmarks like that. I must be mistaken. Perhaps not as I remember such issues a few years when the company I worked at profiled postgres against Oracle. Oracle doesn't want poorly-tuned systems being used as benchmarks. Or so they claim. Our tests -- very much oriented at postGIS found Oracle to be between 5 and 15% _faster_ depending on the specifics of the task. We decided to go with postgres given the price difference (several hundred thousand dollars for Oracle in the configuration we needed vs. zip for postgres -- we already had trained postgres DBAs). YMMV. Greg Williamson -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
zhong ming wu wrote: > I always thought there is a clause in their user agreement preventing > the users from publishing benchmarks like that. I must be mistaken. No you're correct. Currently, to download the current Oracle 11.2g, one must agree to: http://www.oracle.com/technetwork/licenses/standard-license-152015.html which contains: <quote> [...] You may not: [...] - disclose results of any program benchmark tests without our prior consent. [...] </quote> Not having such frustrating license terms is also what makes PostgreSQL a nicer alternative! Best regards, -- Daniel PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
On Tue, 27 Jul 2010 23:24:12 -0600, Scott Marlowe <scott.marlowe@gmail.com> wrote: > > Someone running Oracle is complaining about training costs? That > seems a bit like complaining about needing to give the bellboy a $1 > tip at a $1k a night hotel. Depending on how they are running their licensing, (user/processor/standard/enterprise) Oracle can actually be reasonable in the sense of a commercial database. That said, PostgreSQL training is cheap. If you have enough people in your org to justify a on-site training, a 2 day Administration + Performance + Maintenance class is only 5k (remember on-site). Over time though, there is no question that hands down PostgreSQL will save you money. You can get an Enterprise class support contract for 500.00 a month per server. Sincerely, Joshua D. Drake -- PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
Thomas Kellerer <spam_eater@gmx.net> writes: > Howard Rogers, 28.07.2010 03:58: >> For what it's worth, I wrote up the performance comparison here: >> http://diznix.com/dizwell/archives/153 > Very interesting reading. Indeed. > Would you mind sharing the tables, index structures and search queries that you used (both for Oracle and Postgres)? What I'd be interested in is EXPLAIN ANALYZE results. In particular, I wonder if the planner was switching from indexscan to seqscan plans for the cases where many rows would be returned, and if so whether it got the cutover point right. regards, tom lane
On Wed, Jul 28, 2010 at 8:38 PM, Daniel Verite <daniel@manitou-mail.org> wrote: > zhong ming wu wrote: > >> I always thought there is a clause in their user agreement preventing >> the users from publishing benchmarks like that. I must be mistaken. > > No you're correct. Currently, to download the current Oracle 11.2g, one must > agree to: > http://www.oracle.com/technetwork/licenses/standard-license-152015.html > > which contains: > <quote> > [...] > You may not: > [...] > - disclose results of any program benchmark tests without our prior consent. > [...] > </quote> > > Not having such frustrating license terms is also what makes PostgreSQL a > nicer alternative! > > Best regards, > -- > Daniel > PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > Hi Daniel: Nice catch -the thing is, you've linked to the *technet* license. The one you sign up to when you download the product for free, for development, prototyping and self-learning purposes. That's not the same license as the one you sign up to when you pay them stacks of cash for the 'proper' product for a production deployment (which I haven't read lately, so I can't say the same silly term isn't in there, but I'm just saying: the license you linked to is not the one that applies). Also, I would argue that what I did was not a 'benchmark test'. We capture the results and timings of queries as part of our production application, for management and review purposes. Those are real results, experienced by real users... not what I'd call a benchmark "test". (The PostgreSQL results are, certainly, an artificial benchmark, but then the Oracle license doesn't cover those, happily!) Regards HJR
On 28/07/10 02:58, Howard Rogers wrote: > For what it's worth, I wrote up the performance comparison here: > http://diznix.com/dizwell/archives/153 Thanks very much Howard. It might be my schoolboy-physics ability to fit a curve to two data points, but does anyone else think that the second and third graphs look like a sinusoidal variation overlaid on a steadily increasing baseline? -- Richard Huxton Archonet Ltd
Greg Williamson wrote: > Our tests -- very much oriented at postGIS found Oracle to be between 5 > and 15% _faster_ depending on the specifics of the task. We decided to go > with postgres given the price difference (several hundred thousand dollars for > Oracle in the configuration we needed vs. zip for postgres -- we already had > trained postgres DBAs). Interesting. Do you have information about the versions of PostGIS/GEOS that you were using? Of course we'd be very interested to see examples of test cases with bad performance on the postgis-users list so that we can improve them. ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs
On 28 July 2010 02:58, Howard Rogers <hjr@diznix.com> wrote: > For what it's worth, I wrote up the performance comparison here: > http://diznix.com/dizwell/archives/153 > Thanks, very interesting results. I wonder, are the results being sorted by the database? The performance degradation for large numbers of results might be explained by it switching over from an internal to an external sort, in which case tweaking work_mem might make a difference. Of course this is pure speculation without the EXPLAIN ANALYSE output. Regards, Dean
Greg Williamson wrote: > Our tests -- very much oriented at postGIS found Oracle to be between 5 > and 15% _faster_ depending on the specifics of the task. We decided to go > with postgres given the price difference (several hundred thousand dollars for > Oracle in the configuration we needed vs. zip for postgres -- we already had > trained postgres DBAs). > Can always throw the licensing savings toward larger hardware too; $100K buys a pretty big server nowadays. At the FAA's talk about their internal deployment of PostgreSQL: https://www.postgresqlconference.org/2010/east/talks/faa_airports_gis_and_postgresql They were reporting that some of their difficult queries were dramatically faster on PostgreSQL; I vaguely recall one of them was 100X the speed it ran under Oracle Spatial. It was crazy. As always this sort of thing is very workload dependent. There are certainly queries (such as some of the ones from the TPC-H that big DB vendors optimize for) that can be 100X faster on Oracle too. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
On Thu, Jul 29, 2010 at 5:42 PM, Greg Smith <greg@2ndquadrant.com> wrote: > Greg Williamson wrote: >> >> Our tests -- very much oriented at postGIS found Oracle to be between 5 >> and 15% _faster_ depending on the specifics of the task. We decided to go >> with postgres given the price difference (several hundred thousand dollars >> for >> Oracle in the configuration we needed vs. zip for postgres -- we already >> had >> trained postgres DBAs). >> > > Can always throw the licensing savings toward larger hardware too; $100K > buys a pretty big server nowadays. Hear hear! You can get a quad x 12 core (48 cores total) server with 128G ram and 32 15k6 hard drives for well under $25k nowadays. For $50k or so you can throw 100 hard drives at the problem.
On Thu, Jul 29, 2010 at 10:33 PM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On 28 July 2010 02:58, Howard Rogers <hjr@diznix.com> wrote: >> For what it's worth, I wrote up the performance comparison here: >> http://diznix.com/dizwell/archives/153 >> > > Thanks, very interesting results. I wonder, are the results being > sorted by the database? The performance degradation for large numbers > of results might be explained by it switching over from an internal to > an external sort, in which case tweaking work_mem might make a > difference. > > Of course this is pure speculation without the EXPLAIN ANALYSE output. > > Regards, > Dean Yes, the results were being sorted. I did various tests, changing work_mem, shared_buffers and much else, one by one, until I arrived at the combination of settings that gave me the best 'total search time' results. Personally, I couldn't see any difference in the explain plans, but I was in a bit of a hurry and I may have missed it. For the search term 'woman', which matches 1,590,275 documents, here's the explain plan: "Sort (cost=185372.88..185372.93 rows=20 width=312) (actual time=10537.152..10537.154 rows=20 loops=1)" " Sort Key: a.rf, a.sort_id" " Sort Method: quicksort Memory: 48kB" " -> Result (cost=109119.55..185372.45 rows=20 width=312) (actual time=4309.020..10537.116 rows=20 loops=1)" " -> Append (cost=109119.55..185372.45 rows=20 width=312) (actual time=4309.018..10537.108 rows=20 loops=1)" " -> Subquery Scan a (cost=109119.55..109119.68 rows=10 width=312) (actual time=4309.018..4309.026 rows=10 loops=1)" " -> Limit (cost=109119.55..109119.58 rows=10 width=641) (actual time=4309.016..4309.019 rows=10 loops=1)" " -> Sort (cost=109119.55..109121.94 rows=957 width=641) (actual time=4309.014..4309.015 rows=10 loops=1)" " Sort Key: search_rm.sort_id" " Sort Method: top-N heapsort Memory: 35kB" " -> Bitmap Heap Scan on search_rm (cost=6651.07..109098.87 rows=957 width=641) (actual time=272.851..4021.458 rows=583275 loops=1)" " Recheck Cond: (to_tsvector('english'::regconfig, (textsearch)::text) @@ '''woman'''::tsquery)" " Filter: ((bitand(sales_method_code, 1) > 0) AND (bitand(subsiter, 1) > 0) AND (bitand(filetype, 1) > 0))" " -> Bitmap Index Scan on rmsearch_idx (cost=0.00..6650.83 rows=25826 width=0) (actual time=165.711..165.711 rows=586235 loops=1)" " Index Cond: (to_tsvector('english'::regconfig, (textsearch)::text) @@ '''woman'''::tsquery)" " -> Subquery Scan b (cost=76252.65..76252.77 rows=10 width=312) (actual time=6228.073..6228.080 rows=10 loops=1)" " -> Limit (cost=76252.65..76252.67 rows=10 width=727) (actual time=6228.072..6228.075 rows=10 loops=1)" " -> Sort (cost=76252.65..76254.29 rows=655 width=727) (actual time=6228.071..6228.072 rows=10 loops=1)" " Sort Key: search_rf.sort_id" " Sort Method: top-N heapsort Memory: 38kB" " -> Bitmap Heap Scan on search_rf (cost=5175.18..76238.49 rows=655 width=727) (actual time=363.684..5748.279 rows=1007000 loops=1)" " Recheck Cond: (to_tsvector('english'::regconfig, (textsearch)::text) @@ '''woman'''::tsquery)" " Filter: ((bitand(sales_method_code, 1) > 0) AND (bitand(subsiter, 1) > 0) AND (bitand(filetype, 1) > 0))" " -> Bitmap Index Scan on rfsearch_idx (cost=0.00..5175.02 rows=17694 width=0) (actual time=242.859..242.859 rows=1030282 loops=1)" " Index Cond: (to_tsvector('english'::regconfig, (textsearch)::text) @@ '''woman'''::tsquery)" "Total runtime: 10538.832 ms" And here's the plan for the search term "clover", which matches only 2,808 records in total: " -> Result (cost=109119.55..185372.45 rows=20 width=312) (actual time=16.807..23.990 rows=20 loops=1)" " -> Append (cost=109119.55..185372.45 rows=20 width=312) (actual time=16.806..23.985 rows=20 loops=1)" " -> Subquery Scan a (cost=109119.55..109119.68 rows=10 width=312) (actual time=16.806..16.812 rows=10 loops=1)" " -> Limit (cost=109119.55..109119.58 rows=10 width=641) (actual time=16.805..16.807 rows=10 loops=1)" " -> Sort (cost=109119.55..109121.94 rows=957 width=641) (actual time=16.804..16.805 rows=10 loops=1)" " Sort Key: search_rm.sort_id" " Sort Method: top-N heapsort Memory: 35kB" " -> Bitmap Heap Scan on search_rm (cost=6651.07..109098.87 rows=957 width=641) (actual time=1.054..15.577 rows=1807 loops=1)" " Recheck Cond: (to_tsvector('english'::regconfig, (textsearch)::text) @@ '''clover'''::tsquery)" " Filter: ((bitand(sales_method_code, 1) > 0) AND (bitand(subsiter, 1) > 0) AND (bitand(filetype, 1) > 0))" " -> Bitmap Index Scan on rmsearch_idx (cost=0.00..6650.83 rows=25826 width=0) (actual time=0.615..0.615 rows=1807 loops=1)" " Index Cond: (to_tsvector('english'::regconfig, (textsearch)::text) @@ '''clover'''::tsquery)" " -> Subquery Scan b (cost=76252.65..76252.77 rows=10 width=312) (actual time=7.161..7.166 rows=10 loops=1)" " -> Limit (cost=76252.65..76252.67 rows=10 width=727) (actual time=7.161..7.163 rows=10 loops=1)" " -> Sort (cost=76252.65..76254.29 rows=655 width=727) (actual time=7.160..7.161 rows=10 loops=1)" " Sort Key: search_rf.sort_id" " Sort Method: top-N heapsort Memory: 35kB" " -> Bitmap Heap Scan on search_rf (cost=5175.18..76238.49 rows=655 width=727) (actual time=0.433..6.642 rows=1001 loops=1)" " Recheck Cond: (to_tsvector('english'::regconfig, (textsearch)::text) @@ '''clover'''::tsquery)" " Filter: ((bitand(sales_method_code, 1) > 0) AND (bitand(subsiter, 1) > 0) AND (bitand(filetype, 1) > 0))" " -> Bitmap Index Scan on rfsearch_idx (cost=0.00..5175.02 rows=17694 width=0) (actual time=0.250..0.250 rows=1045 loops=1)" " Index Cond: (to_tsvector('english'::regconfig, (textsearch)::text) @@ '''clover'''::tsquery)" "Total runtime: 24.143 ms" I can't see any change to the sorting behaviour there. Work_mem was set to 4096MB, shared buffers to 12228MB, temp_buffers to 1024MB, effective_cache_size to 18442MB. Sadly, I won't be able to provide much further analysis or information, because the box concerned is being wiped. The MD decided that, as a matter of corporate governance, he couldn't punt the company on PostgreSQL, so my experimenting days are over. Back to Oracle: slower, but with a support contract he can sue on, I guess! Regards HJR
On 30 July 2010 00:38, Howard Rogers <hjr@diznix.com> wrote: > I can't see any change to the sorting behaviour there. Work_mem was > set to 4096MB, shared buffers to 12228MB, temp_buffers to 1024MB, > effective_cache_size to 18442MB. > Ah yes. The sorting idea was a complete red herring. The top-N heapsort to pick the 10 best results will never use much memory. It looks like it all boils down to the sheer number of matches against the search term that have to be considered in the first case. Others on this list might have better ideas as to whether this can be improved upon. > Sadly, I won't be able to provide much further analysis or > information, because the box concerned is being wiped. The MD decided > that, as a matter of corporate governance, he couldn't punt the > company on PostgreSQL, so my experimenting days are over. Back to > Oracle: slower, but with a support contract he can sue on, I guess! > Yeah, I've been there too. Thanks and good luck, Dean > Regards > HJR >
Howard Rogers <hjr@diznix.com> writes: > Sadly, I won't be able to provide much further analysis or > information, because the box concerned is being wiped. The MD decided > that, as a matter of corporate governance, he couldn't punt the > company on PostgreSQL, so my experimenting days are over. Back to > Oracle: slower, but with a support contract he can sue on, I guess! Too bad. I'm sure EnterpriseDB or one of the other PG support companies would be happy to sell you a support contract, if having somebody to sue is an essential part of happiness. regards, tom lane
On Thu, Jul 29, 2010 at 8:04 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > $50k or so you can throw 100 hard drives at the problem. Or even one of these: http://www.ramsan.com/products/ramsan-620.asp :-)
Tom Lane wrote: > I'm sure EnterpriseDB or one of the other PG support companies > would be happy to sell you a support contract, if having somebody to sue > is an essential part of happiness. > And on a good day, access to someone with the source code who will actually be motivated to fix your problem, so you don't even have to sue them! Such a novel concept. I find the idea that anyone is going to sue Oracle for breach of contract and actually get anything out of it beyond a refund kind of hilarious myself. Sure, your lawyers have done a better job of protecting you than Larry Ellison's...good luck with that. (Hint: Oracle is currently fending a lawsuit from the US Department of Justice) -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
Greg Smith wrote: > Greg Williamson wrote: > > Our tests -- very much oriented at postGIS found Oracle to be between 5 > > and 15% _faster_ depending on the specifics of the task. We decided to go > > with postgres given the price difference (several hundred thousand dollars for > > Oracle in the configuration we needed vs. zip for postgres -- we already had > > trained postgres DBAs). > > > > Can always throw the licensing savings toward larger hardware too; $100K > buys a pretty big server nowadays. At the FAA's talk about their > internal deployment of PostgreSQL: > https://www.postgresqlconference.org/2010/east/talks/faa_airports_gis_and_postgresql > > They were reporting that some of their difficult queries were > dramatically faster on PostgreSQL; I vaguely recall one of them was 100X > the speed it ran under Oracle Spatial. It was crazy. As always this > sort of thing is very workload dependent. There are certainly queries > (such as some of the ones from the TPC-H that big DB vendors optimize > for) that can be 100X faster on Oracle too. The FAA reported something like that at PG East about Oracle vs. Postgres performance with GIS data. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Greg Smith wrote: > Tom Lane wrote: > > I'm sure EnterpriseDB or one of the other PG support companies > > would be happy to sell you a support contract, if having somebody to sue > > is an essential part of happiness. > > > > And on a good day, access to someone with the source code who will > actually be motivated to fix your problem, so you don't even have to sue > them! Such a novel concept. I find the idea that anyone is going to > sue Oracle for breach of contract and actually get anything out of it > beyond a refund kind of hilarious myself. Sure, your lawyers have done > a better job of protecting you than Larry Ellison's...good luck with > that. (Hint: Oracle is currently fending a lawsuit from the US > Department of Justice) Companies that make unwise decisions like this eventually will be overtaken by smarter companies. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Did the FAA ever publish slides of those talks? Sure wish I could see them... :) P. On 2010-08-11, at 6:58 PM, Bruce Momjian <bruce@momjian.us> wrote: > Greg Smith wrote: >> Greg Williamson wrote: >>> Our tests -- very much oriented at postGIS found Oracle to be between 5 >>> and 15% _faster_ depending on the specifics of the task. We decided to go >>> with postgres given the price difference (several hundred thousand dollars for >>> Oracle in the configuration we needed vs. zip for postgres -- we already had >>> trained postgres DBAs). >>> >> >> Can always throw the licensing savings toward larger hardware too; $100K >> buys a pretty big server nowadays. At the FAA's talk about their >> internal deployment of PostgreSQL: >> https://www.postgresqlconference.org/2010/east/talks/faa_airports_gis_and_postgresql >> >> They were reporting that some of their difficult queries were >> dramatically faster on PostgreSQL; I vaguely recall one of them was 100X >> the speed it ran under Oracle Spatial. It was crazy. As always this >> sort of thing is very workload dependent. There are certainly queries >> (such as some of the ones from the TPC-H that big DB vendors optimize >> for) that can be 100X faster on Oracle too. > > The FAA reported something like that at PG East about Oracle vs. > Postgres performance with GIS data. > > -- > Bruce Momjian <bruce@momjian.us> http://momjian.us > EnterpriseDB http://enterprisedb.com > > + It's impossible for everything to be true. + > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general