Home > mailing lists

Re: GiST support for inet datatypes - Mailing list pgsql-hackers

From	Andreas Karlsson
Subject	Re: GiST support for inet datatypes
Date	February 24, 2014 03:44:25
Msg-id	530A95E3.2080702@proxel.se Whole thread Raw
In response to	Re: GiST support for inet datatypes (Emre Hasegeli <emre@hasegeli.com>)
Responses	Re: GiST support for inet datatypes (Emre Hasegeli <emre@hasegeli.com>)
List	pgsql-hackers

Tree view

On 02/06/2014 06:14 PM, Emre Hasegeli wrote:
> Third versions of the patches attached. They are rebased to the HEAD. In
> this versions, the bitncommon function is changed. <sys/socket.h> included
> to network_gist.c to be able to compile it on FreeBSD. Geometric mean
> calculation for partial bucket match on the function
> inet_hist_inclusion_selectivity
> reverted back. It was something I changed without enough testing on
> the second revision of the patch. This version uses the maximum divider
> calculated from the boundaries of the bucket, like the first version. It is
> simpler and more reliable.

Thanks for the updated patch.

About the discussions about upgrading PostgreSQL, extensions and 
defaults I do not have any strong opinion. I think that this patch is 
useful even if it does not end up the default, but it would be a pity 
since the BTree GiST index is broken.

Note: The patches do not apply anymore due to changes to 
src/backend/utils/adt/Makefile.

>> I am not convinced of your approach to calculating the selectivity from the
>> histogram. The thing I have the problem with is the clever trickery involved
>> with how you handle different operator types. I prefer the clearer code of
>> the range types with how calc_hist_selectivity_scalar is used. Is there any
>> reason for why that approach would not work here or result in worse code?
>
> Currently we do not have histogram of the lower and upper bounds as
> the range types. Current histogram can be used nicely as the lower bound,
> but not the upper bound because the comparison is first on the common bits
> of the network part, then on the length of the network part. For example,
> 10.0/16 is defined as greater than 10/8.
>
> Using the histogram as the lower bounds of the networks is not enough to
> calculate selectivity for any of these operators. Using it also as the upper
> bounds is still not enough for the inclusion operators. The lengths of
> the network parts should taken into consideration in a way and it is
> what this patch does. Using separate histograms for the lower bounds,
> the upper bounds and the lengths of the network parts can solve all of these
> problems, but it is a lot of work.

I see, thanks for the explanation. But I am still not very fond of how 
that code is written since I find it hard to verify the correctness of 
it, but have no better suggestions.

>> I see from the tests that you still are missing selectivity functions for
>> operators, what is your plan for this?
>
> This was because the join selectivity estimation functions. I set
> the geo_selfuncs for the missing ones. All tests pass with them. I want
> to develop the join selectivity function too, but not for this commit fest.

All tests pass now. Excellent!

Do you think the new index is useful even if you use the basic 
geo_selfuncs? Or should we wait with committing the patches until all 
selfuncs are implemented?

-- 
Andreas Karlsson

pgsql-hackers by date:

From: Marko Kreen
Date: 24 February 2014, 00:31:41
Subject: Re: SSL: better default ciphersuite

From: Josh Berkus
Date: 24 February 2014, 06:23:48
Subject: Re: jsonb and nested hstore

Re: GiST support for inet datatypes - Mailing list pgsql-hackers

Previous

Next