Re: Sort performance on large tables - Mailing list pgsql-performance

From Luke Lonergan
Subject Re: Sort performance on large tables
Date
Msg-id 3E37B936B592014B978C4415F90D662D0193B9F0@MI8NYCMAIL06.Mi8.com
Whole thread Raw
In response to Sort performance on large tables  (Charlie Savage <cfis@interserv.com>)
Responses Re: Sort performance on large tables
List pgsql-performance
Charlie,

> Should I expect results like this?  I realize that the
> computer is quite low-end and is very IO bound for this
> query, but I'm still surprised that the sort operation takes so long.

It's the sort performance of Postgres that's your problem.

> Out of curiosity, I setup an Oracle database on the same
> machine with the same data and ran the same query.  Oracle
> was over an order of magnitude faster. Looking at its query
> plan, it avoided the sort by using "HASH GROUP BY."  Does
> such a construct exist in PostgreSQL (I see only hash joins)?

Yes, hashaggregate does a similar thing.  You can force the planner to
do it, don't remember off the top of my head but someone else on-list
will.

> Also as an experiment I forced oracle to do a sort by running
> this query:
>
> SELECT tlid, min(ogc_fid)
> FROM completechain
> GROUP BY tlid
> ORDER BY tlid;
>
> Even with this, it was more than a magnitude faster than Postgresql.
> Which makes me think I have somehow misconfigured postgresql
> (see the relevant parts of postgresql.conf below).

Just as we find with a similar comparison (with a "popular commercial,
proprietary database" :-) Though some might suggest you increase
work_mem or other tuning suggestions to speed sorting, none work.  In
fact, we find that increasing work_mem actually slows sorting slightly.

We are commissioning an improved sorting routine for bizgres
(www.bizgres.org) which will be contributed to the postgres main, but
won't come out at least until 8.2 comes out, possibly 12 mos.  In the
meantime, you will be able to use the new routine in the bizgres version
of postgres, possibly in the next couple of months.

Also - we (Greenplum) are about to announce the public beta of the
bizgres MPP database, which will use all of your CPUs, and those of
other nodes in a cluster, for sorting.  We see a linear scaling of sort
performance, so you could add CPUs and/or hosts and scale out of the
problem.

Cheers,

- Luke


pgsql-performance by date:

Previous
From: Mario Weilguni
Date:
Subject: Re: 8.1 iss
Next
From: "Marc Morin"
Date:
Subject: Re: Sort performance on large tables