Re: [HACKERS] Fwd: Joins and links - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: [HACKERS] Fwd: Joins and links
Date
Msg-id 3785A8ED.C906F585@trust.ee
Whole thread Raw
In response to Re: [HACKERS] Fwd: Joins and links  (Bob Devine <devine@cs.utah.edu>)
Responses Re: [HACKERS] Fwd: Joins and links
List pgsql-hackers
Leon wrote:
> 
> Bob Devine wrote:
> 
> > It is my many years of watching databases in use that suggest
> > that links are not worth the overhead.  My gut feeling is that
> > links would speed up a simple join by only 10% and there are
> > many other ways to speed up joins.
> 
> Let's count. We have two tables, joined by link. What is the
> cost of lookup? First there is an index scan, which is between
> 2 and 5 iterations, and link lookup which is 1 iteration. Average
> is 4 iterations.

This is true for the case wher you want to look up only one row.

The difference will quickly degrade as more rows are fetched in one 
query and cache misses and disk head movement start rattling your 
disks. The analogy being a man who needs 10 different items from a 
supermarket and takes 10 full round trips from home to buy them.

> And if we don't have link, there is 6 iterations.
> More than 10% already! We still didn't consider joining multiple
> tables and big tables.

I think that the two-tables-one-row lookup will gain the most, 
probably even more than 10%

> So the gain will be big anyway.
> 
> That is not to consider the optimizer (do I sound like a broken
> record? :)  To be sincere, current Postgres optimizer sucks heavily
> and in most cases can't figure out the fastest way.

Adding links does nothing to improve the optimizer, its still free 
to choose sucky plans. It is possible that links are faster if used 
in the right way, as they cut out the index lookup, but I suspect that 
hard-coding link-is-always-faster into the optimiser would also produce 
a lot of very bad plans. 

The link-is-always-faster is probably true only for all-memory
databases, 
and even there not allways - for example if it happened to produce a
worse 
initial ordering for sort/group by than some other strategy, a complex 
query can still run slower (the difference will be small either way)

> Implementing links is a quick and cheap way to get a performance 
> gain on a wide range of tasks.

Fixing the optimizer would get a performance gain on a far wider 
range of tasks, and is still needed for links.

> I am obliged to repeat this again and again,
> because every day there appears a new developer who didn't hear
> that yet :)

Unfortunaltely there are far less _developers_ than letter-writers, and
it
is sometimes quite hard to make them even commit good and useful patches 
that are ready.

So I quess thet if you want links in foreseeable future, your best bet 
would be to start coding, and to coordinate with whoever starts to
fix/rewrite
the optimizer (probably Vadim)

(BTW, in PostgreSQL, I still consider myself a letter-writer and not 
developer, as I have committed no code for the backend)

-------------
Hannu


pgsql-hackers by date:

Previous
From: Tom Ivar Helbekkmo
Date:
Subject: Re: [HACKERS] Fwd: Joins and links
Next
From: Hannu Krosing
Date:
Subject: Re: [HACKERS] Updated TODO list