Re: Qual evaluation cost estimates for GIN indexes - Mailing list pgsql-hackers

From ktm@rice.edu
Subject Re: Qual evaluation cost estimates for GIN indexes
Date
Msg-id 20120220142411.GA9564@aart.rice.edu
In response to Re: Qual evaluation cost estimates for GIN indexes  ("Marc Mamin" <M.Mamin@intershop.de>)
Responses Re: Qual evaluation cost estimates for GIN indexes
List pgsql-hackers
On Mon, Feb 20, 2012 at 10:18:31AM +0100, Marc Mamin wrote:
> > I looked into the complaint here of poor estimation for GIN indexscans:
> > http://archives.postgresql.org/pgsql-performance/2012-02/msg00028.php
> > At first glance it sounds like a mistake in selectivity estimation,
> > but it isn't: the rowcount estimates are pretty nearly dead on.
> > The problem is in the planner's estimate of the cost of executing the
> > @@ operator.  We have pg_proc.procost set to 1 for ts_match_vq, but
> > actually it's a good deal more expensive than that.  Some
> > experimentation suggests that @@ might be about 500 times as expensive
> > as a simple integer comparison.  I don't propose pushing its procost
> > up that much, but surely at least 10 would be appropriate, maybe even
> > 100.
> > 
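
For anyone who wants to repeat that experiment, I assume the estimate can
be raised without poking at pg_proc directly, along these lines (the COST
value is just an arbitrary pick from the range mentioned above):

   -- raise the planner's per-call cost estimate for the function behind @@
   ALTER FUNCTION ts_match_vq(tsvector, tsquery) COST 100;
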
> > However ... if you just alter pg_proc.procost in Marc's example, the
> > planner *still* picks a seqscan, even though its estimate of the seqscan
> > cost surely does go up.  The reason is that its estimate of the GIN
> > indexscan cost goes up just as much, since we charge one qual eval
> > cost per returned tuple in gincostestimate.  It is easy to tell from the
> > actual runtimes that that is not what's happening in a GIN indexscan;
> > we are not re-executing the @@ operator for every tuple.  But the
> > planner's cost model doesn't know that.
> 
> Hello,
> 
> many thanks for your feedback.
> 
> I've repeated my test with a table using plain storage, which halved the
> query time.
> This confirms that detoasting is the major issue for cost estimation, 
> but even with plain storage the table scan remains about 30% slower
> compared to the index scan.
> 

Hi Marc,

Do you happen to know in which function the extra time for the TOAST
storage is spent -- zlib compression? I saw a mention of the LZ4 compression
algorithm, which is BSD licensed, as a possible Google Summer of Code project:

http://code.google.com/p/lz4/

It compresses almost 7X faster than zlib (-1) and decompresses about 6X faster.
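
One way to separate decompression cost from the cost of fetching
out-of-line values might be to compare the column under different storage
settings (table and column names below are only placeholders):

   -- keep the value out of line but skip compression entirely
   ALTER TABLE docs ALTER COLUMN tsv SET STORAGE EXTERNAL;
   -- or store it inline and uncompressed, as in Marc's plain-storage test
   ALTER TABLE docs ALTER COLUMN tsv SET STORAGE PLAIN;
   -- note: this only affects newly stored values; existing rows have to be
   -- rewritten (e.g. reloaded) before a timing comparison means anything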

Regards,
Ken

