Re: qsort again (was Re: [PERFORM] Strange Create Index - Mailing list pgsql-hackers

From Craig A. James
Subject Re: qsort again (was Re: [PERFORM] Strange Create Index
Date
Msg-id 43F4A7D8.4050907@modgraph-usa.com
Whole thread Raw
In response to Re: qsort again (was Re: [PERFORM] Strange Create Index  (Markus Schaber <schabi@logix-tt.com>)
Responses Re: qsort again (was Re: [PERFORM] Strange Create Index  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Markus Schaber wrote:
> Ron wrote:
>>...and of course if you know enough about the data to be sorted so as to
>>constrain it appropriately, one should use a non comparison based O(N)
>>sorting algorithm rather than any of the general comparison based
>>O(NlgN) methods.
>
> Sounds interesting, could you give us some pointers (names, URLs,
> papers) to such algorithms?

Most of these techniques boil down to good ol' "bucket sort".  A simple example: suppose you have a column of integer
percentages,range zero to 100.  You know there are only 101 distinct values.  So create 101 "buckets" (e.g. linked
lists),make a single pass through your data and drop each row's ID into the right bucket, then make a second pass
throughthe buckets, and write the row ID's out in bucket order.  This is an O(N) sort technique. 

Any time you have a restricted data range, you can do this.  Say you have 100 million rows of scientific results known
tobe good to only three digits -- it can have at most 1,000 distinct values (regardless of the magnitude of the
values),so you can do this with 1,000 buckets and just two passes through the data. 

You can also use this trick when the optimizer is asked for "fastest first result."  Say you have a cursor on a column
ofnumbers with good distribution.  If you do a bucket sort on the first two or three digits only, you know the first
"page"of results will be in the first bucket.  So you only need to apply qsort to that first bucket (which is very
fast,since it's small), and you can deliver the first page of data to the application.  This can be particularly
effectivein interactive situations, where the user typically looks at a few pages of data and then abandons the search.
 

I doubt this is very relevant to Postgres.  A relational database has to be general purpose, and it's hard to give it
"hints"that would tell it when to use this particular optimization. 

Craig

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: qsort again (was Re: [PERFORM] Strange Create
Next
From: Ron
Date:
Subject: Re: qsort again (was Re: [PERFORM] Strange Create