Thread: sorting problem

sorting problem

From

Jamie Deppeler

Date:

17 December 2004, 00:26:57

Problem i am having at the moment i cant get a true alpha sort to work
as Order By is sorting A..Z then a..z where i need aA..zZ sort
independant of case.

SQL Query

SELECT
*
FROM
 person
WHERE
  (salutation LIKE '%To%')
ORDER BY
 person.lastname


Results

Ahsteit
Bloggs
Cap
Carrey
Diver
Duckula
Goldsworthy
Gruff
Harmony
Hassleberry-flop-flop
Heyheagle
Jahoosal
Straindove
Yorrick
of Finchery

Re: sorting problem

From

Peter Eisentraut

Date:

17 December 2004, 00:45:59

Jamie Deppeler wrote:
> Problem i am having at the moment i cant get a true alpha sort to
> work as Order By is sorting A..Z then a..z where i need aA..zZ sort
> independant of case.

Initialize the database cluster with a locale setting other than "C".

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: sorting problem

From

Michael Fuhr

Date:

17 December 2004, 00:50:26

On Fri, Dec 17, 2004 at 11:28:36AM +1100, Jamie Deppeler wrote:

> Problem i am having at the moment i cant get a true alpha sort to work
> as Order By is sorting A..Z then a..z where i need aA..zZ sort
> independant of case.

ORDER BY LOWER(person.lastname)

or

ORDER BY UPPER(person.lastname)

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: sorting problem

From

Michael Fuhr

Date:

17 December 2004, 00:59:59

On Fri, Dec 17, 2004 at 01:45:36AM +0100, Peter Eisentraut wrote:
> Jamie Deppeler wrote:
> > Problem i am having at the moment i cant get a true alpha sort to
> > work as Order By is sorting A..Z then a..z where i need aA..zZ sort
> > independant of case.
>
> Initialize the database cluster with a locale setting other than "C".

Hmmm...did I misunderstand something when I recommended using
ORDER BY LOWER(person.lastname)?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: sorting problem

From

Chris Smith

Date:

17 December 2004, 01:24:47

Would doing it this way require an index:

create index lower_lastname on table x lower(lastname);

?

Regards,

Chris Smith

Suite 30, 45-51 Huntley St, Alexandria, NSW 2015 Australia

Ph: +61 2 9517 2505
Fx: +61 2 9517 1915

email: info@interspire.com
web: www.interspire.com


Michael Fuhr wrote:
> On Fri, Dec 17, 2004 at 11:28:36AM +1100, Jamie Deppeler wrote:
>
>
>>Problem i am having at the moment i cant get a true alpha sort to work
>>as Order By is sorting A..Z then a..z where i need aA..zZ sort
>>independant of case.
>
>
> ORDER BY LOWER(person.lastname)
>
> or
>
> ORDER BY UPPER(person.lastname)
>


--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.296 / Virus Database: 265.5.4 - Release Date: 12/15/2004

Re: sorting problem

From

Greg Stark

Date:

17 December 2004, 04:33:55

Chris Smith <chris@interspire.com> writes:

> Would doing it this way require an index:
>
> create index lower_lastname on table x lower(lastname);

Well it doesn't *require* but it may be a good idea. It depends on your
queries. It will NOT be useful for a query like:

select * from x order by lower(lastname)

where postgres won't bother with the index since it will be slower than just
resorting the entire table. The way this index is useful is if you have
queries of the form:

select * from x where lower(lastname) between ? and ? order by lower(lastname)

or

select * from x order by lower(lastname) offset ? limit ?

Though this will eventually switch to sorting when the offset is large.
Better is to use something like:

select * from x where lower(lastname) > ? order by lower(lastname) limit ?

or perhaps something like this if a merge join with fast start is useful:

select * from x join y on (x.lower(lastname)=y.lower(lastname))

But

--
greg

Re: sorting problem

From

Peter Eisentraut

Date:

17 December 2004, 09:29:46

Am Freitag, 17. Dezember 2004 01:59 schrieb Michael Fuhr:
> > Initialize the database cluster with a locale setting other than "C".
>
> Hmmm...did I misunderstand something when I recommended using
> ORDER BY LOWER(person.lastname)?

Those are two different ways to achieve the same effect in this particular
case.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: sorting problem

From

Bruno Wolff III

Date:

17 December 2004, 14:13:38

On Thu, Dec 16, 2004 at 23:33:00 -0500,
  Greg Stark <gsstark@mit.edu> wrote:
>
> Chris Smith <chris@interspire.com> writes:
>
> > Would doing it this way require an index:
> >
> > create index lower_lastname on table x lower(lastname);
>
> Well it doesn't *require* but it may be a good idea. It depends on your
> queries. It will NOT be useful for a query like:
>
> select * from x order by lower(lastname)
>
> where postgres won't bother with the index since it will be slower than just
> resorting the entire table. The way this index is useful is if you have
> queries of the form:

Using an index to do an order by is an order N operation. Doing a sort
is an order N log N operation. For large values of N, a index will be
faster. The index will slow down write operations, so it may still be
a bad idea in some cases.

Re: sorting problem

From

Greg Stark

Date:

17 December 2004, 17:11:47

Bruno Wolff III <bruno@wolff.to> writes:

> Using an index to do an order by is an order N operation.

No, using an index to do an order by is actually still n*log(n). You have to
traverse all the parent pages in the binary tree of the index as well.

This only goes to show how small the log(n) component of the order is. It's
easily dwarfed by large constants such as the difference in i/o efficiency
from non-contiguous reads.

--
greg

Re: sorting problem

From

Tom Lane

Date:

17 December 2004, 17:14:40

Bruno Wolff III <bruno@wolff.to> writes:
> Greg Stark <gsstark@mit.edu> wrote:
>> where postgres won't bother with the index since it will be slower than just
>> resorting the entire table.

> Using an index to do an order by is an order N operation. Doing a sort
> is an order N log N operation. For large values of N, a index will be
> faster.

Unfortunately not ... the constant factors are such that the index
solution isn't very competitive at large N, unless the table is already
well ordered (ie clustered).  The sort code is a lot better at avoiding
random-seeks-all-over-the-disk syndrome.

I suppose your argument is good as N goes to infinity, but for
real-world cases we don't seem to reach the asymptotic regime.

            regards, tom lane

Re: sorting problem

From

Tom Lane

Date:

17 December 2004, 18:29:41

Greg Stark <gsstark@mit.edu> writes:
> Bruno Wolff III <bruno@wolff.to> writes:
>> Using an index to do an order by is an order N operation.

> No, using an index to do an order by is actually still n*log(n). You have to
> traverse all the parent pages in the binary tree of the index as well.

Only if you searched afresh from the root for each key, which an
indexscan is not going to do.

            regards, tom lane

Re: sorting problem

From

Lincoln Yeoh

Date:

17 December 2004, 20:12:12

--=======6C297FEF=======
Content-Type: text/plain; x-avg-checked=avg-ok-43C64EFF; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 8bit

At 12:14 PM 12/17/2004 -0500, Tom Lane wrote:

>Bruno Wolff III <bruno@wolff.to> writes:
> > Greg Stark <gsstark@mit.edu> wrote:
> >> where postgres won't bother with the index since it will be slower
> than just
> >> resorting the entire table.
>
> > Using an index to do an order by is an order N operation. Doing a sort
> > is an order N log N operation. For large values of N, a index will be
> > faster.
>
>Unfortunately not ... the constant factors are such that the index
>solution isn't very competitive at large N, unless the table is already
>well ordered (ie clustered).  The sort code is a lot better at avoiding
>random-seeks-all-over-the-disk syndrome.

Which would involve reading less data?

In some cases I'd like to use the method that is more likely to fit within RAM.

Also if there are multiple parallel queries, one could end up with
something similar to random-seeks, so reading/writing less data could still
be better.

Regards,
Link.

--=======6C297FEF=======--

Re: sorting problem

From

Greg Stark

Date:

17 December 2004, 20:37:29

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > Bruno Wolff III <bruno@wolff.to> writes:
> >> Using an index to do an order by is an order N operation.
>
> > No, using an index to do an order by is actually still n*log(n). You have to
> > traverse all the parent pages in the binary tree of the index as well.
>
> Only if you searched afresh from the root for each key, which an
> indexscan is not going to do.

Isn't that still nlog(n)? In the end you're going to have read in every page
of the index including all those non-leaf pages. Aren't there nlog(n) pages?

--
greg

Re: sorting problem

From

Bruno Wolff III

Date:

17 December 2004, 20:59:39

On Fri, Dec 17, 2004 at 15:36:58 -0500,
  Greg Stark <gsstark@mit.edu> wrote:
>
> Isn't that still nlog(n)? In the end you're going to have read in every page
> of the index including all those non-leaf pages. Aren't there nlog(n) pages?

The depth of the tree is log N, but there are only N nodes. I think you
can treat the amount of information at each node as constant.

Re: sorting problem

From

Bruno Wolff III

Date:

17 December 2004, 21:07:11

On Fri, Dec 17, 2004 at 15:12:18 -0600,
  Bruno Wolff III <bruno@wolff.to> wrote:
> On Fri, Dec 17, 2004 at 15:36:58 -0500,
>   Greg Stark <gsstark@mit.edu> wrote:
> >
> > Isn't that still nlog(n)? In the end you're going to have read in every page
> > of the index including all those non-leaf pages. Aren't there nlog(n) pages?
>
> The depth of the tree is log N, but there are only N nodes. I think you
> can treat the amount of information at each node as constant.

That should have been 2N-1 nodes.

Re: sorting problem

From

Tom Lane

Date:

17 December 2004, 21:16:21

Bruno Wolff III <bruno@wolff.to> writes:
> Greg Stark <gsstark@mit.edu> wrote:
>> Isn't that still nlog(n)? In the end you're going to have read in every page
>> of the index including all those non-leaf pages. Aren't there nlog(n) pages?

> The depth of the tree is log N, but there are only N nodes. I think you
> can treat the amount of information at each node as constant.

Besides, we don't read any non-leaf pages except the ones down the left
edge of the btree.  An indexscan just visits the leaf pages using their
sibling links to get from one to the next.  (If it did otherwise, we'd
not produce the output in sorted order, which would make this discussion
moot...)

            regards, tom lane