Re: [PERFORM] A Better External Sort? - Mailing list pgsql-hackers

From Ron Peacetree
Subject Re: [PERFORM] A Better External Sort?
Date
Msg-id 138872.1128556455518.JavaMail.root@elwamui-milano.atl.sa.earthlink.net
Whole thread Raw
In response to [PERFORM] A Better External Sort?  (Ron Peacetree <rjpeace@earthlink.net>)
Responses Re: [PERFORM] A Better External Sort?  (Hannu Krosing <hannu@skype.net>)
Re: [PERFORM] A Better External Sort?  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
I'm putting in as much time as I can afford thinking about pg related
performance issues.  I'm doing it because of a sincere desire to help
understand and solve them, not to annoy people.

If I didn't believe in pg, I would't be posting thoughts about how to
make it better.

It's probably worth some review (suggestions marked with a "+":

+I came to the table with a possibly better way to deal with external
sorts (that now has branched into 2 efforts: short term improvements
to the existing code, and the original from-the-ground-up idea).  That
suggestion was based on a great deal of prior thought and research,
despite what some others might think.

Then we were told that our IO limit was lower than I thought.

+I suggested that as a "Quick Fix" we try making sure we do IO
transfers in large enough chunks based in the average access time
of the physical device in question so as to achieve the device's
ASTR (ie at least 600KB per access for a 50MBps ASTR device with
a 12ms average access time.) whenever circumstances allowed us.
As far as I know, this experiment hasn't been tried yet.

I asked some questions about physical layout and format translation
overhead being possibly suboptimal that seemed to be agreed to, but
specifics as to where we are taking the hit don't seem to have been
made explicit yet.

+I made the "from left field" suggestion that perhaps a pg native fs
format would be worth consideration.  This is a major project, so
the suggestion was to at least some extent tongue-in-cheek.

+I then made some suggestions about better code instrumentation
so that we can more accurately characterize were the bottlenecks are.

We were also told that evidently we are CPU bound far before one
would naively expect to be based on the performance specifications
of the components involved.

Double checking among the pg developer community led to some
differing opinions as to what the actual figures were and under what
circumstances they were achieved.  Further discussion seems to have
converged on both accurate values and a better understanding as to
the HW and SW  needed; _and_ we've gotten some RW confirmation
as to what current reasonable expectations are within this problem
domain from outside the pg community.

+Others have made some good suggestions in this thread as well.
Since I seem to need to defend my tone here, I'm not detailing them
here.  That should not be construed as a lack of appreciation of them.

Now I've asked for the quickest path to detailed understanding of the
pg IO subsystem.  The goal being to get more up to speed on its
coding details.  Certainly not to annoy you or anyone else.

At least from my perspective, this for the most part seems to have
been an useful and reasonable engineering discussion that has
exposed a number of important things.

Regards,
Ron

pgsql-hackers by date:

Previous
From: Richard Huxton
Date:
Subject: Resultset duplicates (was Re: prefix btree implementation)
Next
From: "Luke Lonergan"
Date:
Subject: Re: [PERFORM] A Better External Sort?