Home > mailing lists

Re: Double sorting split patch - Mailing list pgsql-hackers

From	Alexander Korotkov
Subject	Re: Double sorting split patch
Date	October 4, 2011 08:52:16
Msg-id	CAPpHfds333E_V28nRkDY2CBSDC+TDHykZ1HBdPpYkGAjb9w6dg@mail.gmail.com Whole thread Raw
In response to	Re: Double sorting split patch (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses	Re: Double sorting split patch
List	pgsql-hackers

Tree view

On Tue, Oct 4, 2011 at 12:12 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

Can you elaborate the consider-split algorithm? The criteria to select the new split over the previously selected one is this:
! /*
! * If ratio is acceptable, we should compare current split with
! * previously selected one. If no split was selected then we select
! * current anyway. Between splits of one dimension we search for
! * minimal overlap (allowing negative values) and minimal ration
! * (between same overlaps. We switch dimension if find less overlap
! * (non-negative) or less range with same overlap.
! */
! range = diminfo->upper - diminfo->lower;
! overlap = ((leftUpper) - (rightLower)) / range;
! if (context->first ||
! (context->dim == dimNum &&
! (overlap < context->overlap ||
! (overlap == context->overlap && ratio > context->ratio))) ||
! (context->dim != dimNum &&
! ((range > context->range &&
! non_negative(overlap) <= non_negative(context->overlap)) ||
! non_negative(overlap) < non_negative(context->overlap)))
! )
! {

Why are negative overlaps handled differently across dimensions and within the same dimension? Your considerSplit algorithm in the SYRCoSE 2011 paper doesn't seem to make that distinction.

Yes, I've changed this behaviour after experiments on real-life datasets. On the datasets where data don't overlap themselfs (points are always such datasets), non-overlapping splits are frequently possible. In this case splits tends to be along one dimension, because most distant non-overlapping splits (i.e. having lowest negative overlap) appears to be in the same dimension as previous split. Therefore MBRs appear to be very prolonged along another dimension, that's bad for search. In order to prevent this behavour I've made transition to another dimension by non-nagative part of overlap and range. Using range as the split criteria makes MBRs more quadratic, and using non-negative part of overlap as priority criteria give to range chance to matter.

------
With best regards,
Alexander Korotkov.

pgsql-hackers by date:

From: Alex Hunsaker
Date: 04 October 2011, 08:34:41
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

From: Amit Khandekar
Date: 04 October 2011, 09:10:23
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Re: Double sorting split patch - Mailing list pgsql-hackers

Previous

Next