Thread: Full text search with ORDER BY performance issue
Hello,

I'm having a bit of an issue with full text search (using tsvectors) on
PostgreSQL 8.4. I have a rather large table (around 12M rows) and want to
run full text searches against one of its columns. Plain text search is
reasonably fast (I have a GIN index on the column in question,
"comment_tsv"), but it becomes unbearably slow when I order the results by
another field ("timestamp"). Here's an example query:

SELECT * FROM a
WHERE comment_tsv @@ plainto_tsquery('love')
ORDER BY timestamp DESC
LIMIT 24 OFFSET 0;

I asked in #postgresql and was told there are two possible plans for this
query: the first scans the BTREE index on "timestamp", reads rows in
order, and filters them with the text search condition; the second finds
all rows matching the text search using the GIN index and then sorts them
by "timestamp". This much I already knew -- in fact, I had to drop the
BTREE index on "timestamp" to keep the planner from choosing the first
plan, which is completely useless to me given how huge the table is
(suggestions on how to keep the planner from picking the "wrong" plan are
also appreciated).

Obviously, the second plan gets really slow when I search for common words
and the text search returns a lot of rows to be sorted.

I tried a GIN index on ("timestamp", "comment_tsv") (using btree_gin from
contrib), but it doesn't really help -- I was told on IRC this is because
GIN doesn't support ORDER BY; only BTREE can do that.

Here's a couple of queries:

archive=> explain analyze select * from a where comment_tsv @@
plainto_tsquery('love') order by timestamp desc limit 24 offset 0;

                                   QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=453248.73..453248.79 rows=24 width=281) (actual time=188441.047..188441.148 rows=24 loops=1)
   ->  Sort  (cost=453248.73..453882.82 rows=253635 width=281) (actual time=188441.043..188441.079 rows=24 loops=1)
         Sort Key: "timestamp"
         Sort Method:  top-N heapsort  Memory: 42kB
         ->  Bitmap Heap Scan on a  (cost=17782.16..446166.02 rows=253635 width=281) (actual time=2198.930..187948.050 rows=256378 loops=1)
               Recheck Cond: (comment_tsv @@ plainto_tsquery('love'::text))
               ->  Bitmap Index Scan on timestamp_comment_gin  (cost=0.00..17718.75 rows=253635 width=0) (actual time=2113.664..2113.664 rows=259828 loops=1)
                     Index Cond: (comment_tsv @@ plainto_tsquery('love'::text))
 Total runtime: 188442.617 ms
(9 rows)

archive=> explain analyze select * from a where comment_tsv @@
plainto_tsquery('love') limit 24 offset 0;

                                   QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=0.00..66.34 rows=24 width=281) (actual time=14.632..53.647 rows=24 loops=1)
   ->  Seq Scan on a  (cost=0.00..701071.49 rows=253635 width=281) (actual time=14.629..53.588 rows=24 loops=1)
         Filter: (comment_tsv @@ plainto_tsquery('love'::text))
 Total runtime: 53.731 ms
(4 rows)

The first one runs painfully slow.

Is there really no way to get efficient full text search results ordered
by a separate field? I'm open to all possibilities at this point.

Thanks.
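P.S. In case it matters, the combined index was created more or less like
this (a sketch from memory; on 8.4 btree_gin is loaded by running the
contrib SQL script, whose exact path depends on the installation, and I
believe multicolumn GIN indexes are themselves new in 8.4):

-- load the btree_gin operator classes from contrib first, e.g.:
--   psql archive -f .../contrib/btree_gin.sql
CREATE INDEX timestamp_comment_gin ON a USING gin ("timestamp", comment_tsv);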
Krade,

On Sat, 18 Jul 2009, Krade wrote:

> Here's a couple of queries:
>
> archive=> explain analyze select * from a where comment_tsv @@
> plainto_tsquery('love') order by timestamp desc limit 24 offset 0;
>
>                                    QUERY PLAN
> ------------------------------------------------------------------------------
>  Limit  (cost=453248.73..453248.79 rows=24 width=281) (actual time=188441.047..188441.148 rows=24 loops=1)
>    ->  Sort  (cost=453248.73..453882.82 rows=253635 width=281) (actual time=188441.043..188441.079 rows=24 loops=1)
>          Sort Key: "timestamp"
>          Sort Method:  top-N heapsort  Memory: 42kB
>          ->  Bitmap Heap Scan on a  (cost=17782.16..446166.02 rows=253635 width=281) (actual time=2198.930..187948.050 rows=256378 loops=1)
>                Recheck Cond: (comment_tsv @@ plainto_tsquery('love'::text))
>                ->  Bitmap Index Scan on timestamp_comment_gin  (cost=0.00..17718.75 rows=253635 width=0) (actual time=2113.664..2113.664 rows=259828 loops=1)
>                      Index Cond: (comment_tsv @@ plainto_tsquery('love'::text))
>  Total runtime: 188442.617 ms
> (9 rows)
>
> archive=> explain analyze select * from a where comment_tsv @@
> plainto_tsquery('love') limit 24 offset 0;
>
>                                    QUERY PLAN
> ------------------------------------------------------------------------------
>  Limit  (cost=0.00..66.34 rows=24 width=281) (actual time=14.632..53.647 rows=24 loops=1)
>    ->  Seq Scan on a  (cost=0.00..701071.49 rows=253635 width=281) (actual time=14.629..53.588 rows=24 loops=1)
>          Filter: (comment_tsv @@ plainto_tsquery('love'::text))
>  Total runtime: 53.731 ms
> (4 rows)
>
> The first one runs painfully slow.

Hmm, everything is already written in the explain :) In the first query,
253635 rows have to be read from disk and sorted, while in the second
query only 24 (random) rows are read from disk, so there is a difference
of four orders of magnitude, and in the worst case you should expect the
first query to take about 53*10^4 ms.

> Is there really no way to get efficient full text search results ordered
> by a separate field? I'm open to all possibilities at this point.
>
> Thanks.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
On Sun, Jul 19, 2009 at 12:07 AM, Krade <krade@krade.com> wrote:
> archive=> explain analyze select * from a where comment_tsv @@
> plainto_tsquery('love') order by timestamp desc limit 24 offset 0;

What happens if you make it:

select * from (
    select * from a where comment_tsv @@ plainto_tsquery('love')
) xx
order by xx.timestamp desc
limit 24 offset 0;

?
Hello, thanks for your replies.

On 7/20/2009 13:12, Oleg Bartunov wrote:
> Hmm, everything is already written in the explain :) In the first query,
> 253635 rows have to be read from disk and sorted, while in the second
> query only 24 (random) rows are read from disk, so there is a difference
> of four orders of magnitude, and in the worst case you should expect the
> first query to take about 53*10^4 ms.

Yes, I do realize the first query retrieves all the rows that match the
full text search and sorts them -- that's exactly what I wanted to avoid,
since I only want 24 results at a time. :) I was wondering if there was
some index combination I'm not aware of, some way of clustering the table,
or a different query that would get me the same results faster.

Well, to take advantage of the GIN index on ("timestamp", "comment_tsv"),
I suppose I could do something like this:

archive=> explain analyze select * from a where comment_tsv @@
plainto_tsquery('love') and timestamp > cast(floor(extract(epoch from
CURRENT_TIMESTAMP) - 864000) as integer) order by timestamp limit 24
offset 0;

                                   QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=17326.69..17326.75 rows=24 width=281) (actual time=3249.192..3249.287 rows=24 loops=1)
   ->  Sort  (cost=17326.69..17337.93 rows=4499 width=281) (actual time=3249.188..3249.221 rows=24 loops=1)
         Sort Key: "timestamp"
         Sort Method:  top-N heapsort  Memory: 39kB
         ->  Bitmap Heap Scan on a  (cost=408.80..17201.05 rows=4499 width=281) (actual time=3223.890..3240.484 rows=5525 loops=1)
               Recheck Cond: (("timestamp" > (floor((date_part('epoch'::text, now()) - 864000::double precision)))::integer) AND (comment_tsv @@ plainto_tsquery('love'::text)))
               ->  Bitmap Index Scan on timestamp_comment_gin  (cost=0.00..407.67 rows=4499 width=0) (actual time=3222.769..3222.769 rows=11242 loops=1)
                     Index Cond: (("timestamp" > (floor((date_part('epoch'::text, now()) - 864000::double precision)))::integer) AND (comment_tsv @@ plainto_tsquery('love'::text)))
 Total runtime: 3249.957 ms
(9 rows)

This only looks at the last 10 days (864000 seconds) and is considerably
faster. Not perfect, but a lot better. It requires a lot of application
logic, though: if the first query didn't return 24 results, I'd have to
reissue it with a larger time interval, and that degrades pretty fast. It
strikes me as a really dumb thing to do.

I'm hitting a brick wall here: I can't seem to provide reasonably
efficient full text search ordered by date rather than returning matches
in effectively random order.

On 7/20/2009 13:22, Marcin Stępnicki wrote:
> What happens if you make it:
>
> select * from (
>     select * from a where comment_tsv @@ plainto_tsquery('love')
> ) xx
> order by xx.timestamp desc
> limit 24 offset 0;
>
> ?

Same query plan, I'm afraid.
Krade <krade@krade.com> wrote:

> SELECT * FROM a WHERE comment_tsv @@ plainto_tsquery('love')
> ORDER BY timestamp DESC LIMIT 24 OFFSET 0;

Have you considered keeping rows "narrow" until you've identified your 24
rows? Something like:

SELECT * FROM a
WHERE id in
  (
    SELECT id FROM a
    WHERE comment_tsv @@ plainto_tsquery('love')
    ORDER BY timestamp DESC
    LIMIT 24 OFFSET 0
  )
ORDER BY timestamp DESC;

-Kevin
Hello,

On 7/20/2009 22:42, Kevin Grittner wrote:
> Have you considered keeping rows "narrow" until you've identified your
> 24 rows? Something like:
>
> SELECT * FROM a
> WHERE id in
>   (
>     SELECT id FROM a
>     WHERE comment_tsv @@ plainto_tsquery('love')
>     ORDER BY timestamp DESC
>     LIMIT 24 OFFSET 0
>   )
> ORDER BY timestamp DESC;

Good idea, but it doesn't really seem to do much -- the query times are
roughly the same (presumably because the subquery still has to fetch and
sort every match).
Krade wrote:
> SELECT * FROM a WHERE comment_tsv @@ plainto_tsquery('love') ORDER BY
> timestamp DESC LIMIT 24 OFFSET 0;

Have you tried making the full-text condition a subselect with "offset 0"
to stop the planner from reordering the plan? E.g.:

select *
from (
    select * from a where comment_tsv @@ plainto_tsquery('love')
    offset 0
) xx
order by timestamp DESC
limit 24
offset 0;

See http://blog.endpoint.com/2009/04/offset-0-ftw.html

--
-Devin
On 7/21/2009 2:13, Devin Ben-Hur wrote:
> Have you tried making the full-text condition a subselect with
> "offset 0" to stop the planner from reordering the plan? E.g.:
>
> select *
> from (
>     select * from a where comment_tsv @@ plainto_tsquery('love')
>     offset 0
> ) xx
> order by timestamp DESC
> limit 24
> offset 0;
>
> See http://blog.endpoint.com/2009/04/offset-0-ftw.html

Yes, that does force the planner to always pick the full text index first
rather than the timestamp index. I managed to force the same thing by
doing something a lot more drastic: I just dropped my timestamp index
altogether, since I never used it for anything else (I mentioned this in
my original post).

That comment did make me try re-adding it, though, and I was pretty
surprised: the planner now only does backward scans on the timestamp index
for very common words (turning multi-minute queries into very fast ones),
as opposed to trying to use the timestamp index for all queries. I wonder
whether this is due to planner tweaks in 8.4 or whether my statistics just
got balanced out.

I'm not entirely happy, because I still easily get minute-long queries on
common words where the planner chooses not to use the timestamp index. The
planner can't guess right all the time. But I think I might just do:

select * from a where comment_tsv @@ plainto_tsquery('query') and
timestamp > cast(floor(extract(epoch from CURRENT_TIMESTAMP) - 864000) as
integer) order by timestamp desc limit 24 offset 0;

And if I get fewer than 24 rows, issue the regular query:

select * from a where comment_tsv @@ plainto_tsquery('query') order by
timestamp desc limit 24 offset 0;

I pay the price of doing two queries when one might have done, and it
makes almost all queries about 200 ms slower, but in exchange it turns the
few very slow queries into quick ones.

Thanks for all the help.
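P.S. Here's roughly what that would look like wrapped in a PL/pgSQL
function, so the application only makes one call (an untested sketch: the
function name is mine, the 10-day window is hard-coded, and it probes with
a cheap count first so the fallback doesn't return duplicate rows):

CREATE OR REPLACE FUNCTION search_comments(q text)
RETURNS SETOF a AS $$
DECLARE
    cutoff integer := cast(floor(extract(epoch from CURRENT_TIMESTAMP)
                                 - 864000) as integer);
    n integer;
BEGIN
    -- cheap probe: are there at least 24 matches in the last 10 days?
    SELECT count(*) INTO n FROM (
        SELECT 1 FROM a
        WHERE comment_tsv @@ plainto_tsquery(q)
          AND timestamp > cutoff
        LIMIT 24
    ) s;

    IF n = 24 THEN
        -- common word: the windowed query is enough and stays fast
        RETURN QUERY
            SELECT * FROM a
            WHERE comment_tsv @@ plainto_tsquery(q)
              AND timestamp > cutoff
            ORDER BY timestamp DESC
            LIMIT 24;
    ELSE
        -- rare word: fall back to the unrestricted query
        RETURN QUERY
            SELECT * FROM a
            WHERE comment_tsv @@ plainto_tsquery(q)
            ORDER BY timestamp DESC
            LIMIT 24;
    END IF;
END;
$$ LANGUAGE plpgsql;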
On Mon, Jul 20, 2009 at 9:35 PM, Krade <krade@krade.com> wrote:
> But I think I might just do:
>
> select * from a where comment_tsv @@ plainto_tsquery('query') and
> timestamp > cast(floor(extract(epoch from CURRENT_TIMESTAMP) - 864000)
> as integer) order by timestamp desc limit 24 offset 0;
>
> And if I get fewer than 24 rows, issue the regular query:
>
> select * from a where comment_tsv @@ plainto_tsquery('query') order by
> timestamp desc limit 24 offset 0;

Couldn't you do the second query as a WITH query, then run another query
to limit that result to everything greater than now() - x days?
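Something like this is what I had in mind (untested; note that in 8.4 a
WITH query acts as an optimization fence, so the full text part runs to
completion before the time filter is applied):

WITH matches AS (
    SELECT * FROM a
    WHERE comment_tsv @@ plainto_tsquery('query')
)
SELECT * FROM matches
WHERE timestamp > cast(floor(extract(epoch from CURRENT_TIMESTAMP)
                             - 864000) as integer)
ORDER BY timestamp DESC
LIMIT 24 OFFSET 0;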
On Jul 21, 6:06 am, scott.marl...@gmail.com (Scott Marlowe) wrote:
> On Mon, Jul 20, 2009 at 9:35 PM, Krade <kr...@krade.com> wrote:
> > But I think I might just do:
> >
> > select * from a where comment_tsv @@ plainto_tsquery('query') and
> > timestamp > cast(floor(extract(epoch from CURRENT_TIMESTAMP) - 864000)
> > as integer) order by timestamp desc limit 24 offset 0;
> >
> > And if I get fewer than 24 rows, issue the regular query:
> >
> > select * from a where comment_tsv @@ plainto_tsquery('query') order by
> > timestamp desc limit 24 offset 0;
>
> Couldn't you do the second query as a WITH query, then run another query
> to limit that result to everything greater than now() - x days?

Hi,

There is a problem with GIN and GiST indexes: they cannot be used for
ORDER BY. Maybe it would be a nice idea to ask Oleg to make it possible to
use the b-tree columns in GiST or GIN to drive the sort, but I have no
idea how difficult that would be to implement in the current GIN or GiST
structures. I think Oleg, or even Tom, would be the right people to ask :)
But even if it is possible, it will not be implemented until 8.5 at the
earliest, and that is a year away, so until then...

It is possible to split your table into several smaller ones, put them on
different machines, and spread your query across them with DBLINK. This
distributes the burden of sorting to several machines, each of which only
has to sort its own, smaller part. After you have your 24 ids from each
machine, you can merge them, sort again, and apply the limit. Large
offsets will still be problematic, but faster anyway in most reasonable
offset ranges. (Load balancing tools like pgpool can automate this task,
but I have no practical experience using them for that purpose.)

Yet another very interesting technology is Sphinx search
(http://www.sphinxsearch.com/). It can distribute data over several
machines automatically, but it will probably be too expensive to adopt (if
this is not your main task :)), since there are no standard automation
scripts, it does not support live updates (so you will always have a delay
of some minutes), and it is a standalone service that has to be
maintained, configured, and synchronized with your main database
separately (though you can use pl/python to access it from Postgres).

Good luck with your task :)

-- Valentine Gogichashvili
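P.S. A sketch of the DBLINK merge I described (the connection strings are
placeholders, it assumes an integer id column and a two-way split, and
dblink itself is in contrib):

SELECT id, ts
FROM (
    SELECT * FROM dblink('host=shard1 dbname=archive',
        'SELECT id, timestamp FROM a
         WHERE comment_tsv @@ plainto_tsquery(''love'')
         ORDER BY timestamp DESC LIMIT 24')
      AS t1(id integer, ts integer)
    UNION ALL
    SELECT * FROM dblink('host=shard2 dbname=archive',
        'SELECT id, timestamp FROM a
         WHERE comment_tsv @@ plainto_tsquery(''love'')
         ORDER BY timestamp DESC LIMIT 24')
      AS t2(id integer, ts integer)
) merged
ORDER BY ts DESC
LIMIT 24;

Then fetch the full rows for those 24 ids from the shards they live on.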
On Tue, 21 Jul 2009, valgog wrote:
> There is a problem with GIN and GiST indexes: they cannot be used for
> ORDER BY. Maybe it would be a nice idea to ask Oleg to make it possible
> to use the b-tree columns in GiST or GIN to drive the sort, but I have
> no idea how difficult that would be to implement in the current GIN or
> GiST structures. I think Oleg, or even Tom, would be the right people to
> ask :)

I can answer that one for GiST, having had a good look at it recently.
There is simply no useful ordering information in a GiST index that an
ORDER BY could use. The index structure is just too general, because it
has to cope with object types that have no well-defined order, or whose
"order" is not useful for indexing.

Matthew

--
A good programmer is one who looks both ways before crossing a one-way
street. Considering the quality and quantity of one-way streets in
Cambridge, it should be no surprise that there are so many good
programmers there.
On 7/21/2009 11:32, valgog wrote:
> Hi,
>
> There is a problem with GIN and GiST indexes: they cannot be used for
> ORDER BY. Maybe it would be a nice idea to ask Oleg to make it possible
> to use the b-tree columns in GiST or GIN to drive the sort, but I have
> no idea how difficult that would be to implement in the current GIN or
> GiST structures. I think Oleg, or even Tom, would be the right people to
> ask :) But even if it is possible, it will not be implemented until 8.5
> at the earliest, and that is a year away, so until then...

Unfortunately, it's not even just the lack of ORDER BY support: btree_gin
indexes seem to be broken under some circumstances, so I can't even use my
idea of limiting searches to the last 10 days.

See this:
http://pgsql.privatepaste.com/5219TutUMk

The first query gives bogus results; it's not using the index correctly.

timestamp_comment_gin is a GIN index on (timestamp, comment_tsv), and the
timestamp column is an integer. The queries work correctly if I drop the
index. Is this a bug in btree_gin?

> It is possible to split your table into several smaller ones, put them
> on different machines, and spread your query across them with DBLINK.
> [...]
>
> Yet another very interesting technology is Sphinx search
> (http://www.sphinxsearch.com/). It can distribute data over several
> machines automatically, but it will probably be too expensive to adopt
> (if this is not your main task :))
> [...]
>
> Good luck with your task :)

Yeah, I don't really have that sort of resources. This is a small hobby
project (i.e., no budget) that is growing a bit too large. I might just
have to do text searches without time ordering.

On 7/21/2009 5:06, Scott Marlowe wrote:
> Couldn't you do the second query as a WITH query, then run another query
> to limit that result to everything greater than now() - x days?

I suppose I could, but because of the issue described above I have no way
to run a fast query that does both a full text match and a < or > in the
same WHERE, so my original plan won't work. A separate BTREE timestamp
index obviously does nothing here.

And again, thank you for all the help.
On Tue, 21 Jul 2009, Krade wrote:

> Unfortunately, it's not even just the lack of ORDER BY support:
> btree_gin indexes seem to be broken under some circumstances, so I can't
> even use my idea of limiting searches to the last 10 days.
>
> See this:
> http://pgsql.privatepaste.com/5219TutUMk
>
> The first query gives bogus results; it's not using the index correctly.
>
> timestamp_comment_gin is a GIN index on (timestamp, comment_tsv), and
> the timestamp column is an integer. The queries work correctly if I drop
> the index. Is this a bug in btree_gin?

It'd be nice if you provided us with the data, so we can reproduce your
problem.

> [...]
>
> And again, thank you for all the help.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
On Mon, Jul 20, 2009 at 8:12 AM, Oleg Bartunov <oleg@sai.msu.su> wrote:
>> Here's a couple of queries:
>>
>> archive=> explain analyze select * from a where comment_tsv @@
>> plainto_tsquery('love') order by timestamp desc limit 24 offset 0;
>>
>>                                    QUERY PLAN
>> ------------------------------------------------------------------------------
>>  Limit  (cost=453248.73..453248.79 rows=24 width=281) (actual time=188441.047..188441.148 rows=24 loops=1)
>>    ->  Sort  (cost=453248.73..453882.82 rows=253635 width=281) (actual time=188441.043..188441.079 rows=24 loops=1)
>>          Sort Key: "timestamp"
>>          Sort Method:  top-N heapsort  Memory: 42kB
>>          ->  Bitmap Heap Scan on a  (cost=17782.16..446166.02 rows=253635 width=281) (actual time=2198.930..187948.050 rows=256378 loops=1)
>>                Recheck Cond: (comment_tsv @@ plainto_tsquery('love'::text))
>>                ->  Bitmap Index Scan on timestamp_comment_gin  (cost=0.00..17718.75 rows=253635 width=0) (actual time=2113.664..2113.664 rows=259828 loops=1)
>>                      Index Cond: (comment_tsv @@ plainto_tsquery('love'::text))
>>  Total runtime: 188442.617 ms
>> (9 rows)
>>
>> archive=> explain analyze select * from a where comment_tsv @@
>> plainto_tsquery('love') limit 24 offset 0;
>>
>>                                    QUERY PLAN
>> ------------------------------------------------------------------------------
>>  Limit  (cost=0.00..66.34 rows=24 width=281) (actual time=14.632..53.647 rows=24 loops=1)
>>    ->  Seq Scan on a  (cost=0.00..701071.49 rows=253635 width=281) (actual time=14.629..53.588 rows=24 loops=1)
>>          Filter: (comment_tsv @@ plainto_tsquery('love'::text))
>>  Total runtime: 53.731 ms
>> (4 rows)
>>
>> The first one runs painfully slow.
>
> Hmm, everything is already written in the explain :) In the first query,
> 253635 rows have to be read from disk and sorted, while in the second
> query only 24 (random) rows are read from disk, so there is a difference
> of four orders of magnitude, and in the worst case you should expect the
> first query to take about 53*10^4 ms.

If "love" is an uncommon word, there's no help for queries of this type
being slow unless the GIN index can return the results in order. But if
"love" is a common word, then it would be faster to do an index scan by
timestamp on the base relation and then treat comment_tsv @@
plainto_tsquery('love') as a filter condition. Is this a selectivity
estimation bug?

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> If "love" is an uncommon word, there's no help for queries of this type
> being slow unless the GIN index can return the results in order. But if
> "love" is a common word, then it would be faster to do an index scan by
> timestamp on the base relation and then treat comment_tsv @@
> plainto_tsquery('love') as a filter condition. Is this a selectivity
> estimation bug?

Doesn't look like it: the estimated number of matches is 253635, the
actual is 259828, which is really astonishingly close considering what we
have to work with. It's not clear, though, what fraction of the total that
represents.

			regards, tom lane
> If "love" is an uncommon word, there's no help for queries of this type
> being slow unless the GIN index can return the results in order. But if
> "love" is a common word, then it would be faster to do an index scan by
> timestamp on the base relation and then treat comment_tsv @@
> plainto_tsquery('love') as a filter condition. Is this a selectivity
> estimation bug?

If you have really lots of documents to index (and that seems to be the
case), perhaps you should consider Xapian. It is very easy to use
(although, of course, tsearch integrated in Postgres is easier still,
since you have nothing to install), and it is *incredibly* fast.

In my tests (2 years ago), with many gigabytes of stuff to search through,
the differences became obvious once the data set was much bigger than RAM:

- Postgres' fulltext was 10-100x faster than MySQL fulltext on searches
  (lol) (and a lot faster still on INSERTs...)
- Xapian was 10-100x faster than Postgres' fulltext.

(On a small table that fits in RAM, the differences are small.)

Of course, Xapian is not Postgres when you talk about update
concurrency.......... (single writer => a background job updating the
fulltext index is needed; a simple Python script does the job)
On Wed, Jul 29, 2009 at 10:22 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> If "love" is an uncommon word, there's no help for queries of this type
>> being slow unless the GIN index can return the results in order. But if
>> "love" is a common word, then it would be faster to do an index scan by
>> timestamp on the base relation and then treat comment_tsv @@
>> plainto_tsquery('love') as a filter condition. Is this a selectivity
>> estimation bug?
>
> Doesn't look like it: the estimated number of matches is 253635, the
> actual is 259828, which is really astonishingly close considering what
> we have to work with. It's not clear, though, what fraction of the total
> that represents.

Hmm, good point. It seems like it would be useful to force the planner
into the other plan and get EXPLAIN ANALYZE output for it for comparison
purposes, but off the top of my head I don't know how to do that.

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> Hmm, good point. It seems like it would be useful to force the planner
> into the other plan and get EXPLAIN ANALYZE output for it for comparison
> purposes, but off the top of my head I don't know how to do that.

The standard way is:

begin;
drop index index_you_dont_want_used;
explain problem-query;
rollback;

Ain't transactional DDL wonderful?

(If this is a production system, you do have to worry about the DROP
transiently locking the table; but if you put the above in a script rather
than doing it by hand, it should be fast enough not to be a big problem.)

			regards, tom lane
On Wed, Jul 29, 2009 at 11:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ain't transactional DDL wonderful?

Yes. :-)

...Robert
I hate to be "that guy" but, Is this is still an issue 5 years later?? I can't seem to get Gin/btree to use my ORDER BY column with a LIMIT no matter what I try. My best idea was to cluster the database by the ORDER BY column and then just hope the index returns them in the order in the table... -- View this message in context: http://postgresql.1045698.n5.nabble.com/Full-text-search-with-ORDER-BY-performance-issue-tp2074171p5813083.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.