Re: Qual evaluation cost estimates for GIN indexes - Mailing list pgsql-hackers
From: Jesper Krogh
Subject: Re: Qual evaluation cost estimates for GIN indexes
Date:
Msg-id: 4F3DEF2B.6040407@krogh.cc
In response to: Qual evaluation cost estimates for GIN indexes (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Hi. First, thanks for looking at this. Apart from GIN indexes and full-text search being really good in our applications, this also points to the exact places where they can be improved.

On 2012-02-17 00:15, Tom Lane wrote:
> I looked into the complaint here of poor estimation for GIN indexscans:
> http://archives.postgresql.org/pgsql-performance/2012-02/msg00028.php

I think this is the exact same issue:
http://archives.postgresql.org/pgsql-hackers/2011-11/msg01754.php

> At first glance it sounds like a mistake in selectivity estimation,
> but it isn't: the rowcount estimates are pretty nearly dead on.
> The problem is in the planner's estimate of the cost of executing the
> @@ operator. We have pg_proc.procost set to 1 for ts_match_vq, but
> actually it's a good deal more expensive than that. Some
> experimentation suggests that @@ might be about 500 times as expensive
> as a simple integer comparison. I don't propose pushing its procost
> up that much, but surely at least 10 would be appropriate, maybe even
> 100.
>
> However ... if you just alter pg_proc.procost in Marc's example, the
> planner *still* picks a seqscan, even though its estimate of the seqscan
> cost surely does go up. The reason is that its estimate of the GIN
> indexscan cost goes up just as much, since we charge one qual eval cost
> per returned tuple in gincostestimate. It is easy to tell from the
> actual runtimes that that is not what's happening in a GIN indexscan;
> we are not re-executing the @@ operator for every tuple. But the
> planner's cost model doesn't know that.

There is something about lossy vs. non-lossy here: if the index result is lossy, then it would "need" to execute the @@ operator on each tuple, de-toast the toasted values, and go all the way. If it isn't lossy, then at least a count() over a GIN index should be able to utilize an index-only scan now?
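Tom's point about the cost model can be sketched with toy numbers (all of them assumed for illustration; these are not the planner's real formulas). Because gincostestimate charges one qual eval per *returned* tuple, bumping procost inflates both the seqscan and the GIN estimate nearly in lockstep whenever most rows match, so the plan choice never flips:

```python
# Toy numbers, purely illustrative -- not the real planner internals.
N_ROWS = 1_000_000          # table size
MATCHED = 990_000           # rows the @@ qual actually matches
CPU_TUPLE_COST = 0.01       # per-tuple processing cost
CPU_OPERATOR_COST = 0.0025  # multiplied by procost to price one qual eval
GIN_OVERHEAD = 50_000.0     # index search + bitmap heap fetch (made up)

def qual_cost(procost):
    return procost * CPU_OPERATOR_COST

def seqscan(procost):
    # a seqscan evaluates the qual on every heap tuple
    return N_ROWS * (CPU_TUPLE_COST + qual_cost(procost))

def ginscan_modelled(procost):
    # the model charges one qual eval per returned tuple, even though a
    # non-lossy GIN scan never actually re-executes the qual
    return GIN_OVERHEAD + MATCHED * (CPU_TUPLE_COST + qual_cost(procost))

for procost in (1, 100, 1000):
    s, g = seqscan(procost), ginscan_modelled(procost)
    winner = "seqscan" if s < g else "ginscan"
    print(f"procost={procost:5}: seqscan={s:12.0f} ginscan={g:12.0f} -> {winner}")
```

With these numbers the seqscan estimate stays below the GIN estimate at every procost, which matches the behaviour Tom describes: raising procost alone cannot rescue the index plan as long as the per-returned-tuple charge is there.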
I've had a significant amount of struggle over the years in this corner, and the patch that went in for gincostestimate brought a huge set of problems to the ground, but not all. Other related threads:

http://archives.postgresql.org/pgsql-performance/2010-05/msg00031.php (ts_match_vq cost in discussion)
http://archives.postgresql.org/pgsql-performance/2010-05/msg00266.php

I don't think I have ever seen the actual run time of any @@ query be faster going through the seqscan than going through the index, not even when it is pulling nearly all the tuples out (a test case that tries to hit that corner):

http://archives.postgresql.org/pgsql-performance/2009-10/msg00393.php

And I think that is due to a couple of "real-world" things:

1) The tsvector column is typically toasted.
2) The selected columns are typically in the main table.
3) The GIN index search plus pulling the main table is in fact measurably cheaper than pulling main + toast, uncompressing the toast, and applying ts_match_vq, even in the case most favourable to the seqscan.

Another real-world effect is that since the tsvector column lives in toast and isn't read when performing a bitmap heap scan, then on top of the decompression cost it is almost never hot in memory either, making its actual run time even worse.

The same problems hit an index scan on another key when filtering on an @@ operator, but I think I got around most of them by bumping both the cost of @@ and the LIMIT in the query to 10K instead of the 200 actually wanted.

I think I have dug sufficiently in this corner and can fairly easily craft test examples that demonstrate the challenges (a few are attached in the above links).

Thanks for digging into this corner. Let me know if I can help, although my actual coding skills are sparse (at best).

-- Jesper
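The arithmetic behind points 1-3 can be sketched the same way, again with made-up numbers: charge the seqscan a per-row detoast penalty that the bitmap heap scan never pays, and drop the per-tuple @@ re-check from the index side (the non-lossy case), and the index wins at run time even when nearly every row matches:

```python
# Toy illustration (all constants assumed) of why the seqscan loses at
# run time: it must detoast the tsvector for every row and evaluate @@,
# while the GIN bitmap heap scan reads neither the toast table nor
# re-evaluates the qual on a non-lossy match.
N_ROWS = 1_000_000
MATCHED = 990_000
FETCH = 0.01          # fetch one heap tuple
DETOAST = 0.5         # detoast + decompress the tsvector (often cold, too)
MATCH_VQ = 0.25       # one @@ evaluation
GIN_SEARCH = 50_000.0 # walk the GIN index and build the bitmap

actual_seqscan = N_ROWS * (FETCH + DETOAST + MATCH_VQ)
actual_ginscan = GIN_SEARCH + MATCHED * FETCH  # no detoast, no @@ re-check

print(f"seqscan ~ {actual_seqscan:.0f}, ginscan ~ {actual_ginscan:.0f}")
```

The per-row detoast term dominates the seqscan side, which is exactly why even a query returning almost all tuples still runs faster through the index.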