Re: Qual evaluation cost estimates for GIN indexes - Mailing list pgsql-hackers
From: Jesper Krogh
Subject: Re: Qual evaluation cost estimates for GIN indexes
Date:
Msg-id: 4F3DEF2B.6040407@krogh.cc
In response to: Qual evaluation cost estimates for GIN indexes (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Hi. First, thanks for looking at this. Apart from GIN indexes and full-text search being really good in our applications, this also points to the exact places where they can be improved.

On 2012-02-17 00:15, Tom Lane wrote:
> I looked into the complaint here of poor estimation for GIN indexscans:
> http://archives.postgresql.org/pgsql-performance/2012-02/msg00028.php

I think this is the exact same issue:
http://archives.postgresql.org/pgsql-hackers/2011-11/msg01754.php

> At first glance it sounds like a mistake in selectivity estimation,
> but it isn't: the rowcount estimates are pretty nearly dead on.
> The problem is in the planner's estimate of the cost of executing the
> @@ operator. We have pg_proc.procost set to 1 for ts_match_vq, but
> actually it's a good deal more expensive than that. Some
> experimentation suggests that @@ might be about 500 times as expensive
> as a simple integer comparison. I don't propose pushing its procost
> up that much, but surely at least 10 would be appropriate, maybe even
> 100.
>
> However ... if you just alter pg_proc.procost in Marc's example, the
> planner *still* picks a seqscan, even though its estimate of the seqscan
> cost surely does go up. The reason is that its estimate of the GIN
> indexscan cost goes up just as much, since we charge one qual eval cost
> per returned tuple in gincostestimate. It is easy to tell from the
> actual runtimes that that is not what's happening in a GIN indexscan;
> we are not re-executing the @@ operator for every tuple. But the
> planner's cost model doesn't know that.

There is something about lossy vs. non-lossy here: if the index result is lossy, then it would "need" to execute the @@ operator on each tuple, de-toast the toasted values, and go all the way. If it isn't lossy, then at least a count() over a GIN index should be able to utilize an index-only scan now?
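Tom's point about the cost model can be sketched with toy numbers (all of them assumed for illustration; these are not the planner's real formulas). Because gincostestimate charges one qual eval per *returned* tuple, bumping procost inflates both the seqscan and the GIN estimate nearly in lockstep whenever most rows match, so the plan choice never flips:

```python
# Toy numbers, purely illustrative -- not the real planner internals.
N_ROWS = 1_000_000          # table size
MATCHED = 990_000           # rows the @@ qual actually matches
CPU_TUPLE_COST = 0.01       # per-tuple processing cost
CPU_OPERATOR_COST = 0.0025  # multiplied by procost to price one qual eval
GIN_OVERHEAD = 50_000.0     # index search + bitmap heap fetch (made up)

def qual_cost(procost):
    return procost * CPU_OPERATOR_COST

def seqscan(procost):
    # a seqscan evaluates the qual on every heap tuple
    return N_ROWS * (CPU_TUPLE_COST + qual_cost(procost))

def ginscan_modelled(procost):
    # the model charges one qual eval per returned tuple, even though a
    # non-lossy GIN scan never actually re-executes the qual
    return GIN_OVERHEAD + MATCHED * (CPU_TUPLE_COST + qual_cost(procost))

for procost in (1, 100, 1000):
    s, g = seqscan(procost), ginscan_modelled(procost)
    winner = "seqscan" if s < g else "ginscan"
    print(f"procost={procost:5}: seqscan={s:12.0f} ginscan={g:12.0f} -> {winner}")
```

With these numbers the seqscan estimate stays below the GIN estimate at every procost, which matches the behaviour Tom describes: raising procost alone cannot rescue the index plan as long as the per-returned-tuple charge is there.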
I've had a significant amount of struggle over the years in this corner, and the patch that went in for gincostestimate brought a huge set of problems to the ground, but not all. Other related threads:

http://archives.postgresql.org/pgsql-performance/2010-05/msg00031.php (ts_match_vq cost in discussion)
http://archives.postgresql.org/pgsql-performance/2010-05/msg00266.php

I don't think I have ever seen the actual run time of any @@ query be faster going through the seqscan than going through the index, not even when it is pulling nearly all the tuples out (a test case that tries to hit that corner):

http://archives.postgresql.org/pgsql-performance/2009-10/msg00393.php

And I think that is due to a couple of "real-world" things:

1) The tsvector column is typically toasted.
2) The selected columns are typically in the main table.
3) The GIN index search plus pulling the main table is in fact measurably cheaper than pulling main + toast, uncompressing the toast, and applying ts_match_vq, even in the case most favourable to the seqscan.

Another real-world effect is that since the tsvector column lives in toast and isn't read when performing a bitmap heap scan, then on top of the decompression cost it is almost never hot in memory either, making its actual run time even worse.

The same problems hit an index scan on another key when filtering on an @@ operator, but I think I got around most of them by bumping both the cost of @@ and the LIMIT in the query to 10K instead of the 200 actually wanted.

I think I have dug sufficiently in this corner and can fairly easily craft test examples that demonstrate the challenges (a few are attached in the above links).

Thanks for digging into this corner. Let me know if I can help, although my actual coding skills are sparse (at best).

-- Jesper
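The arithmetic behind points 1-3 can be sketched the same way, again with made-up numbers: charge the seqscan a per-row detoast penalty that the bitmap heap scan never pays, and drop the per-tuple @@ re-check from the index side (the non-lossy case), and the index wins at run time even when nearly every row matches:

```python
# Toy illustration (all constants assumed) of why the seqscan loses at
# run time: it must detoast the tsvector for every row and evaluate @@,
# while the GIN bitmap heap scan reads neither the toast table nor
# re-evaluates the qual on a non-lossy match.
N_ROWS = 1_000_000
MATCHED = 990_000
FETCH = 0.01          # fetch one heap tuple
DETOAST = 0.5         # detoast + decompress the tsvector (often cold, too)
MATCH_VQ = 0.25       # one @@ evaluation
GIN_SEARCH = 50_000.0 # walk the GIN index and build the bitmap

actual_seqscan = N_ROWS * (FETCH + DETOAST + MATCH_VQ)
actual_ginscan = GIN_SEARCH + MATCHED * FETCH  # no detoast, no @@ re-check

print(f"seqscan ~ {actual_seqscan:.0f}, ginscan ~ {actual_ginscan:.0f}")
```

The per-row detoast term dominates the seqscan side, which is exactly why even a query returning almost all tuples still runs faster through the index.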