Thread: Gist consistent and compression

Gist consistent and compression

From
Yeb Havinga
Date:
Hello list,

An idea for possible improvement of query speed on gist indexes with a 
costly compression function. We have a gist index that uses a compressed 
internal datatype. The compression is with some cost (it involves a 
syscache lookup). The functions called by the consistent functions are 
binary functions on the internal type. Since the query parameter passed 
to consistent is the 'external' datatype, it must be compressed by the 
consistent function. When performing a query, the same data value (the 
query) is compressed repeatedly while traversing the nodes in the index 
tree. The query speed increased substantially after building in an ad 
hoc caching (a hash) in the compression function.

We believe that every gist index that uses compression could benefit 
from a cache, even a small one. So it would make sense to add the 
caching to the gist support in the backend, instead of having to build 
one in the user defined functions gist functions for every datatype.

Any thoughts?

regards,
Yeb Havinga



Re: Gist consistent and compression

From
Dimitri Fontaine
Date:
Hi,

Le 1 mai 09 à 23:04, Yeb Havinga a écrit :
> An idea for possible improvement of query speed on gist indexes with
> a costly compression function. We have a gist index that uses a
> compressed internal datatype. The compression is with some cost (it
> involves a syscache lookup).

[...]

> small one. So it would make sense to add the caching to the gist
> support in the backend, instead of having to build one in the user
> defined functions gist functions for every datatype.
>
> Any thoughts?

It seems to me that what you're asking for is addressed indirectly in
the possibility to make your internal data type a full SQL visible
datatype. Then you store this new datatype directly in the table and
index that. Instead of converting from external to internal type at
consistent() time in a query, provide an implicit CAST for external to
internal for queries to "just work" without editing. The CAST will get
called once per literal in the query.

See prefix and the prefix_range datatype as an example of this:  http://blog.tapoueh.org/prefix.html

I'm not sure if the caching facility should be added to -core
PostgreSQL or not, I'm just proposing another way to solve your
problem without requiring it...

Regards,
--
dim

Re: Gist consistent and compression

From
Yeb Havinga
Date:
Hi Dimitri, list
> It seems to me that what you're asking for is addressed indirectly in 
> the possibility to make your internal data type a full SQL visible 
> datatype. Then you store this new datatype directly in the table and 
> index that. Instead of converting from external to internal type at 
> consistent() time in a query, provide an implicit CAST for external to 
> internal for queries to "just work" without editing. The CAST will get 
> called once per literal in the query.
Thanks for your reply. If there would be a way that the user never sees 
the 'internal' type in query results (with some rules perhaps), it might 
work. But somehow it feels 'not right'; the user datatype is the 
external one, and how the gist index works internally should not be put 
as burden on the user. The compression is also lossy, so recheck is 
necessary and I think that will not work if only the internal type is 
stored in the relation.
> See prefix and the prefix_range datatype as an example of this:
>   http://blog.tapoueh.org/prefix.html
I wish I'd seen the slides on that link half a year ago, since I think 
they are helpful to anyone that wants to start writing a gist index. I 
remember using google translate on a russion page about gist in the 
search for information. Besides that also reading Guttmans r-tree paper 
was good, to have any idea what picksplit, penalty and union are 
supposed to do.

regards,
Yeb Havinga