Home > mailing lists

Re: The REAL cost of joins - Mailing list pgsql-general

From	Peter Hunsberger
Subject	Re: The REAL cost of joins
Date	March 3, 2010 21:34:04
Msg-id	cc159a4a1003031433r27c8de5erea85d0b841117b9@mail.gmail.com Whole thread Raw
In response to	The REAL cost of joins (Marcin Krol <mrkafk@gmail.com>)
List	pgsql-general

Tree view

On Wed, Mar 3, 2010 at 3:59 PM, Marcin Krol <mrkafk@gmail.com> wrote:
>
> Hello everyone,
>
> I have inadvertently set off a furball on an unrelated ng on what is the actual cost of SQL joins. But there's no
reliableconclusion. I would like to actually know that, that is, are JOINs truly expensive? 
>
> As they say, one measurement is worth a thousand opinions, so I've done measurement on my PG app:
>
> $ time echo "\c hrs;
> SELECT hosts.ip, reservation.start_date, architecture.architecture, os_kind.os_kind, os_rel.os_rel,
os_version.os_version,project.project, email.email FROM hosts 
>  INNER JOIN project ON project.id = hosts.project_id
>  INNER JOIN architecture ON hosts.architecture_id = architecture.id
>  INNER JOIN os_kind ON os_kind.id = hosts.os_kind_id
>  INNER JOIN os_rel ON hosts.os_rel_id = os_rel.id
>  INNER JOIN os_version ON hosts.os_version_id = os_version.id
>  INNER JOIN reservation_hosts ON hosts.id = reservation_hosts.host_id
>  INNER JOIN reservation on reservation.id = reservation_hosts.reservation_id
>  INNER JOIN email ON reservation.email_id = email.id
>
> ;" | psql > /dev/null
>
> real    0m0.099s
> user    0m0.015s
> sys     0m0.005s
>
>
>
> $ time echo "\c hrs;
> > SELECT hosts.ip FROM hosts;
> > SELECT reservation.start_date FROM reservation;
> > SELECT architecture.architecture FROM architecture;
> > SELECT os_rel.os_rel FROM os_rel;
> > SELECT os_version.os_version FROM os_version;
> > SELECT project.project FROM project;
> > SELECT email.email FROM email;
> > " | psql > /dev/null
>
> real    0m0.046s
> user    0m0.008s
> sys     0m0.004s
>
> Note: I've created indexes on those tables, both on data columns like hosts.ip and on .id columns.
>
> What do you think of this? And in general: when (if?) should one denormalize data?

I don't get what these two examples are supposed to show?  They do
not, in general, create equivalent output data. If you had to write
procedural logic to do the same correlation between the individual
result sets as the joins are performing then you'd be back to at least
the same performance and probably worse, so what exactly is being
compared here?

As to your second question; well, outside of not doing premature
optimization I'll observe that for operational databases,
denormalization rarely gives me enough savings to justify the
resultant extra data management required to maintain it.  If you're
talking about a data warehouse (and given the way you've worded the
question I supsect you are not?) then that's a different question, in
that case I prefer to generate the data from the operational side of
the house so you're no longer paying an operational cost to maintain
the denormalized data (you've got a whole different set of costs
instead).

--
Peter Hunsberger

pgsql-general by date:

From: Guillaume Yziquel
Date: 03 March 2010, 21:27:23
Subject: Re: PQntuples returns an int.

From: Justin Graf
Date: 03 March 2010, 21:34:17
Subject: Re: Scratching my head why results are different between machines.

Re: The REAL cost of joins - Mailing list pgsql-general

Previous

Next