Re: Full text search with ORDER BY performance issue - Mailing list pgsql-performance
From | Oleg Bartunov |
---|---|
Subject | Re: Full text search with ORDER BY performance issue |
Date | |
Msg-id | Pine.LNX.4.64.0907212326060.8065@sn.sai.msu.ru Whole thread Raw |
In response to | Re: Full text search with ORDER BY performance issue (Krade <krade@krade.com>) |
List | pgsql-performance |
On Tue, 21 Jul 2009, Krade wrote: > On 7/21/2009 11:32, valgog wrote: >> Hi, >> >> There is a problem with GIN and GIST indexes, that they cannot be used >> by the ORDER BY. Maybe it will be a nice idea to ask Oleg to make it >> possible to use the b-tree columns in GIST or GIN to make the sort >> easier, but I have no idea how difficult it will be to implement it in >> current GIN or GIST structures. I think Oleg or even Tom will be the >> right people to ask it :) But even if it is possible it will not be >> implemented at least until 8.5 that will need a year to come, so until >> then... >> > Unfortunately, it's not even just the lack of ORDER BY support, btree_gin > indexes seem to be broken under some circumstances. So I can't even use my > idea to limit searches to the last 10 days. > > See this: > http://pgsql.privatepaste.com/5219TutUMk > > The first query gives bogus results. It's not using the index correctly. > > timestamp_comment_gin is a GIN index on timestamp, comment_tsv. The timestamp > column is an integer. The queries work right if I drop the index. Is this a > bug in btree_gin? it'd be nice if you provide us data,so we can reproduce your problem >> It is possible to strip your table in several smaller ones putting >> them on different machines and then splitting your query with DBLINK. >> This will distribute the burden of sorting to several machines that >> will have to sort smaller parts as well. After you have your 25 ids >> from each of the machines, you can merge them, sort again and limit as >> you wish. Doing large offsets will be still problematic but faster >> anyway in most reasonable offset ranges. (Load balancing tools like >> pg_pool can automate this task, but I do not have practical experience >> using them for that purposes) >> >> Yet another very interesting technology -- sphinx search (http:// >> www.sphinxsearch.com/). It can distribute data on several machines >> automatically, but it will be probably too expensive to start using >> (if your task is not your main one :)) as they do not have standard >> automation scripts, it does not support live updates (so you will >> always have some minutes delay), and this is a standalone service, >> that needs to be maintained and configured and synchronized with our >> main database separately (though you can use pg/python to access it >> from postgres). >> >> Good luck with your task :) > Yeah, I don't really have that sort of resources. This is a small hobby > project (ie: no budget) that is growing a bit too large. I might just have to > do text searches without time ordering. > > On 7/21/2009 5:06, Scott Marlowe wrote: >> Couldn't you do tge second query as a with query then run another >> query to limit that result to everything greater than now()-xdays ? >> > I suppose I could, but I have no way to do a fast query that does both a full > text match and a < or > in the same WHERE due to the issue I described above, > so my original plan won't work. A separate BTREE timestamp index obviously > does nothing. > > And again, thank you for all the help. > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
pgsql-performance by date: