On Mon, Feb 20, 2012 at 10:18:31AM +0100, Marc Mamin wrote:
> > I looked into the complaint here of poor estimation for GIN indexscans:
> > http://archives.postgresql.org/pgsql-performance/2012-02/msg00028.php
> > At first glance it sounds like a mistake in selectivity estimation,
> > but it isn't: the rowcount estimates are pretty nearly dead on.
> > The problem is in the planner's estimate of the cost of executing the
> > @@ operator. We have pg_proc.procost set to 1 for ts_match_vq, but
> > actually it's a good deal more expensive than that. Some
> > experimentation suggests that @@ might be about 500 times as expensive
> > as a simple integer comparison. I don't propose pushing its procost
> > up that much, but surely at least 10 would be appropriate, maybe even
> > 100.
> >
> > However ... if you just alter pg_proc.procost in Marc's example, the
> > planner *still* picks a seqscan, even though its estimate of the
> > seqscan cost surely does go up. The reason is that its estimate of the GIN
> > indexscan cost goes up just as much, since we charge one qual eval cost
> > per returned tuple in gincostestimate. It is easy to tell from the
> > actual runtimes that that is not what's happening in a GIN indexscan;
> > we are not re-executing the @@ operator for every tuple. But the
> > planner's cost model doesn't know that.
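
As a side note, for anyone wanting to try that experiment, here is a minimal
SQL sketch (my own, not from the thread; the value 100 is just the upper
figure mentioned above, and altering a built-in function's cost requires
superuser rights):

    -- Raise the planner's cost estimate for the function behind the
    -- tsvector @@ tsquery operator; this updates pg_proc.procost.
    ALTER FUNCTION ts_match_vq(tsvector, tsquery) COST 100;

    -- Confirm the new setting.
    SELECT proname, procost FROM pg_proc WHERE proname = 'ts_match_vq';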
>
> Hello,
>
> many thanks for your feedback.
>
> I've repeated my test with a table using plain storage, which halved the
> query time.
> This confirms that detoasting is the major issue for cost estimation,
> but even with plain storage the table scan remains about 30% slower
> compared to the index scan.
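
For reference, a rough sketch of that kind of comparison (the table and
column names here are made up, not Marc's actual schema; it assumes a GIN
index on the tsvector column, and note that SET STORAGE only affects newly
written rows, so the data has to be reloaded or rewritten afterwards):

    -- Switch the tsvector column to plain (untoasted, uncompressed) storage.
    ALTER TABLE docs ALTER COLUMN tsv SET STORAGE PLAIN;

    -- Time the GIN index path (the GIN index is typically used via a bitmap scan).
    SET enable_seqscan = off;
    EXPLAIN ANALYZE
      SELECT count(*) FROM docs WHERE tsv @@ to_tsquery('english', 'example');

    -- Time the sequential scan for comparison.
    RESET enable_seqscan;
    SET enable_bitmapscan = off;
    SET enable_indexscan = off;
    EXPLAIN ANALYZE
      SELECT count(*) FROM docs WHERE tsv @@ to_tsquery('english', 'example');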
>
Hi Marc,
Do you happen to know in which function the extra time for the TOAST
storage is spent -- in the compression/decompression code? I saw the LZ4
compression algorithm, which is BSD licensed, mentioned as a possible
Google Summer of Code project:
http://code.google.com/p/lz4/
It reportedly compresses almost 7X faster than zlib (-1) and decompresses
about 6X faster.
Regards,
Ken