On fös, 2006-02-17 at 08:01 -0500, Ron wrote:
> At 04:24 AM 2/17/2006, Ragnar wrote:
> >On fös, 2006-02-17 at 01:20 -0500, Ron wrote:
> > >
> > > OK, so here's _a_ way (there are others) to obtain a mapping such that
> > > if a < b then f(a) < f (b) and
> > > if a == b then f(a) == f(b)
> >
> > > By scanning the table once, we can map say 0000001h (Hex used to ease
> > > typing) to the row with the minimum value and 1111111h to the row
> > > with the maximum value as well as mapping everything in between to
> > > their appropriate keys. That same scan can be used to assign a
> > > pointer to each record's location.
> >
> >This step is just as expensive as the original
> >sort you want to replace/improve.
>
> Why do you think that? External sorts involve
> the equivalent of multiple scans of the table to
> be sorted, sometimes more than lgN (where N is
> the number of items in the table to be
> sorted). Since this is physical IO we are
> talking about, each scan is very expensive, and
> therefore 1 scan is going to take considerably
> less time than >= lgN scans will be.
Call me dim, but please explain exactly how you are going
to build this mapping in one scan. Are you assuming
the map will fit in memory?
>
>
> >If you want to keep this mapping saved as a sort
> >of an index, or as part ot each row data, this
> >will make the cost of inserts and updates enormous.
>
> Not sure you've got this right either. Looks to
> me like we are adding a <= 32b quantity to each
> row. Once we know the mapping, incrementally
> updating it upon insert or update would seem to
> be simple matter of a fast search for the correct
> ranking [Interpolation search, which we have all
> the needed data for, is O(lglgN). Hash based
> search is O(1)]; plus an increment/decrement of
> the key values greater/less than the key value of
> the row being inserted / updated. Given than we
> are updating all the keys in a specific range
> within a tree structure, that update can be done
> in O(lgm) (where m is the number of records affected).
Say again ?
Let us say you have 1 billion rows, where the
column in question contains strings like
baaaaaaaaaaaaaaa....aaa
baaaaaaaaaaaaaaa....aab
baaaaaaaaaaaaaaa....aac
...
not necessarily in this order on disc of course
The minimum value would be keyed as 00000001h,
the next one as 00000002h and so on.
Now insert new value 'aaaaa'
Not only will you have to update 1 billion records,
but also all the values in your map.
please explain
gnari