Re: The REAL cost of joins - Mailing list pgsql-general

From Merlin Moncure
Subject Re: The REAL cost of joins
Date
Msg-id b42b73151003040700r2fb5e08ew92e51a7699ec158e@mail.gmail.com
Whole thread Raw
In response to The REAL cost of joins  (Marcin Krol <mrkafk@gmail.com>)
List pgsql-general
On Wed, Mar 3, 2010 at 4:59 PM, Marcin Krol <mrkafk@gmail.com> wrote:
> Hello everyone,
>
> I have inadvertently set off a furball on an unrelated ng on what is the
> actual cost of SQL joins. But there's no reliable conclusion. I would like
> to actually know that, that is, are JOINs truly expensive?

There's a lot of variables that go in to join performance.  Are the
necessary columns indexed, what is the comparison operator, size of
the index, etc?  Complicating the calculation, postgres has access to
different algorithms it chooses depending on circumstances.  In
general, joins cost less than a lot of novice programmers think
because they are simply unaware of the number of highly efficient ways
to attack the problem of doing set operations on lists of sorted
items, or perhaps have had bad experiences on databases that had a
lousy selection of algorithms or poor (if any) statistics.

The three biggest factors on join performance are 1: having index in
appropriate places,  2: if/when there is a full cache fault and the
database has to get information from disk when executing the join.
And 3: there being enough working memory to do things like 'in query'
sorts. Point 2 is where denormalization can reap real benefits because
it can reduce the number of disk seeks to get data (compared to the
cpu and memory, disk seeks take eons).

Databases have a characteristic that reminds me of some of the
challenges that aircraft engineers face.  Jets that go supersonic have
an entirely different set of operating principles because above the
speed of sound all the rules changes in terms of thermals, vibration,
stresses, etc.  Similarly in the database world performance tend to
lurch in a very unpleasant direction when you active working set of
data is not able to be properly served in RAM.   The 'sonic boom' you
hear is your disk drives grinding as the clock ticks away...for many
DBAs hearing this sound the first time is a life changing experience.
This is when those little things you never bothered to think about
like having the records that tend to be pulled up together by grouped
together on disk become suddenly very important...you discover the
CLUSTER command and become best friends :-).

merlin

pgsql-general by date:

Previous
From: Justin Graf
Date:
Subject: Re: Scratching my head why results are different between machines.
Next
From: Greg Stark
Date:
Subject: Re: Scratching my head why results are different between machines.