Re: tsearch refactorings - Mailing list pgsql-patches

From Teodor Sigaev
Subject Re: tsearch refactorings
Msg-id 46DEE3C9.8060805@sigaev.ru
In response to Re: tsearch refactorings  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
Responses Re: tsearch refactorings  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
List pgsql-patches

Heikki Linnakangas wrote:
> Teodor Sigaev wrote:
>>> Ok. Probably easiest to do that by changing the palloc to palloc0 in
>>> parse_tsquery.
>> and change sizeof to sizeof(QueryItem)
>
> Do you mean the sizeofs in the memcpys in parse_tsquery? You can't

Oops, I meant the pallocs in the push* functions. Using palloc0 in parse_tsquery
is another way.

>
> BTW, can you explain what the CRC-32 of a value is used for? It looks
> like it's used to speed up some operations, by comparing the CRCs before
> comparing the values, but I didn't quite figure out how it works. How
It's mostly used in GiST indexes - recalculating crc32 every time for each index
tuple to be checked is rather expensive.
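
A minimal sketch of that idea, not the tsearch code itself: the CRC is computed
once when the operand is built and then serves as a cheap pre-check before the
full byte comparison. The Operand struct, make_operand and operand_equal are
hypothetical names for illustration, and the standard reflected CRC-32 is
assumed here as a stand-in for whichever crc32 variant tsearch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard reflected CRC-32 (polynomial 0xEDB88320), bitwise for brevity. */
static uint32_t
crc32_bytes(const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *) data;
    uint32_t    crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical operand: the value plus a CRC computed once at creation. */
typedef struct
{
    uint32_t    crc;
    size_t      len;
    const char *val;
} Operand;

static Operand
make_operand(const char *val)
{
    Operand     op;

    assert(val != NULL);
    op.val = val;
    op.len = strlen(val);
    op.crc = crc32_bytes(val, op.len);  /* paid once, reused for every check */
    return op;
}

/* Compare the cached CRCs first; touch the bytes only when they match. */
static int
operand_equal(const Operand *a, const Operand *b)
{
    if (a->crc != b->crc)
        return 0;               /* different CRC: values cannot be equal */
    if (a->len != b->len)
        return 0;
    return memcmp(a->val, b->val, a->len) == 0; /* rules out collisions */
}
```

The final memcmp is still needed because equal CRCs do not prove equal values;
the cached CRC only makes the common "not equal" case cheap.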

> much of a performance difference does it make? Would hash_any do a
> better/cheaper job?
crc32 was chosen after testing a lot of hash functions. Perl's hash was the
fastest, but crc32 produces far fewer collisions. Interestingly, for ASCII
many functions produce a rather small number of collisions, but for the upper
part of the table (0x7f-0xff) crc32 was the best. CRC32 has its collisions
evenly distributed over characters; the others do not.


> In any case, I think we need to calculate the CRC/hash in tsqueryrecv,
> instead of trusting the client.
Agreed.
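
A rough sketch of what "not trusting the client" could look like, under stated
assumptions: the receiving side discards whatever checksum arrived with the
value and recomputes it from the bytes it actually received. StoredOperand,
store_received and checksum32 are hypothetical names, the fixed 64-byte buffer
is only for brevity, and the standard CRC-32 is again assumed as a stand-in;
this is not the actual tsqueryrecv code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard reflected CRC-32 (polynomial 0xEDB88320), bitwise for brevity. */
static uint32_t
checksum32(const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *) data;
    uint32_t    crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical server-side representation of one received operand. */
typedef struct
{
    uint32_t    crc;
    size_t      len;
    char        val[64];
} StoredOperand;

/*
 * Store a value received from a client. client_crc is whatever the client
 * claimed; it is deliberately ignored and the CRC is recomputed locally.
 * Returns 0 on success, -1 if the value does not fit the buffer.
 */
static int
store_received(StoredOperand *dst, uint32_t client_crc,
               const char *val, size_t len)
{
    (void) client_crc;          /* never trusted */

    if (len >= sizeof(dst->val))
        return -1;
    memcpy(dst->val, val, len);
    dst->val[len] = '\0';
    dst->len = len;
    dst->crc = checksum32(val, len);
    return 0;
}
```

Recomputing costs one pass over the value at receive time, but it guarantees
that a stored CRC always matches the stored bytes, so later comparisons that
rely on the CRC cannot be misled by a wrong or malicious client value.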
--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/
