Re: Index AM change proposals, redux - Mailing list pgsql-hackers

From Oleg Bartunov
Subject Re: Index AM change proposals, redux
Date
Msg-id Pine.LNX.4.64.0804112228380.21547@sn.sai.msu.ru
In response to Re: Index AM change proposals, redux  (Teodor Sigaev <teodor@sigaev.ru>)
Responses Re: Index AM change proposals, redux  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Slightly off-topic: how can we get this benefit at the tuple level? For example,
we mark the GiST tsearch index as lossy, while for not very big documents it's
actually exact, and we could save a lot by not rechecking them.

Oleg

On Fri, 11 Apr 2008, Teodor Sigaev wrote:

>> Teodor, do you have any thoughts about exactly how you'd fix @@@ ?
>> I suppose that the recheck-need is not really a property of specific
>> tuples, but of a particular query, for that case.  Where would you
>> want to detect that?
>
> A tsquery may restrict search terms by weight: 'sea & port:A'. The GIN
> index doesn't store information about weights, so the only difference
> between @@ and @@@ is that @@@ is marked with the RECHECK flag. I think the
> better way is to set the recheck-required flag by looking at the value from
> the index, not at the tsquery. That gives us more flexibility.
>
> So, I planned to add a pointer to bool to the consistent method, so its
> signature will be
> bool consistent(bool check[], StrategyNumber n, Datum query, bool *needRecheck)
>
> The value returned in needRecheck should be ignored for operators not marked
> with the RECHECK flag in the opclass. needRecheck should be initialized to
> true before calling the consistent method, to keep compatibility with old
> opclasses.
>
> To determine whether a recheck is needed, the better way is to check only
> the values that are actually needed. For example, let the tsquery be
> 'foo | bar | qq:A' and the tsvector be 'foo:1,2,3 asdasdasd:4'. Obviously a
> recheck is not needed. So the patch is close to trivial:
>
> *** tsginidx.c.orig     2008-04-11 17:08:37.000000000 +0400
> --- tsginidx.c  2008-04-11 17:18:45.000000000 +0400
> ***************
> *** 109,114 ****
> --- 109,115 ----
>  {
>        QueryItem  *frst;
>        bool       *mapped_check;
> +       bool       *needRecheck;
>  } GinChkVal;
>
>  static bool
> ***************
> *** 116,121 ****
> --- 117,125 ----
>  {
>        GinChkVal  *gcv = (GinChkVal *) checkval;
>
> +       if ( val->weight )
> +               *(gcv->needRecheck) = true;
> +
>        return gcv->mapped_check[((QueryItem *) val) - gcv->frst];
>  }
>
> ***************
> *** 144,149 ****
> --- 148,155 ----
>
>                gcv.frst = item = GETQUERY(query);
>                gcv.mapped_check = (bool *) palloc(sizeof(bool) * query->size);
> +               gcv.needRecheck = PG_GETARG_POINTER(3);
> +               *(gcv.needRecheck) = false;
>
>                for (i = 0; i < query->size; i++)
>                        if (item[i].type == QI_VAL)
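
For readers following the quoted proposal, here is a minimal standalone sketch of the idea in plain C. It is not PostgreSQL code: Operand, ChkVal, check_operand, consistent_or and call_consistent are hypothetical, simplified stand-ins for QueryOperand, GinChkVal, the check callback, the consistent method and its caller. It assumes a query that is a flat OR of terms evaluated left to right with short-circuiting, so a weighted term raises the recheck flag only if it is actually consulted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tsquery operand such as 'foo' or 'qq:A'. */
typedef struct
{
    const char *word;
    bool        has_weight;     /* true for weight-restricted terms like 'qq:A' */
} Operand;

/* Hypothetical stand-in for GinChkVal, including the proposed new field. */
typedef struct
{
    const Operand *first;
    const bool    *mapped_check;    /* mapped_check[i]: operand i found in index */
    bool          *need_recheck;
} ChkVal;

/* Callback: the index stores no weights, so a weighted operand forces a recheck. */
static bool
check_operand(ChkVal *gcv, const Operand *val)
{
    if (val->has_weight)
        *gcv->need_recheck = true;
    return gcv->mapped_check[val - gcv->first];
}

/* Simplified "consistent" method for a query that is an OR of n operands. */
static bool
consistent_or(const Operand *ops, size_t n, const bool *check, bool *need_recheck)
{
    ChkVal      gcv = {ops, check, need_recheck};

    assert(n > 0);
    *need_recheck = false;      /* raised only if a weighted operand is consulted */
    for (size_t i = 0; i < n; i++)
        if (check_operand(&gcv, &ops[i]))
            return true;        /* short circuit: later operands are never looked at */
    return false;
}

/* Caller side: default the flag to true, ignore it for non-RECHECK operators. */
static bool
call_consistent(bool op_marked_recheck, const Operand *ops, size_t n,
                const bool *check, bool *recheck_out)
{
    bool        need = true;    /* an old opclass would leave this untouched */
    bool        match = consistent_or(ops, n, check, &need);

    *recheck_out = op_marked_recheck && need;
    return match;
}
```

With the query 'foo | bar | qq:A' and an index entry that contains only 'foo', the loop returns on the first operand and the flag stays false. If only 'qq' is present, the weighted operand is consulted and the flag is raised.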
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

