Re: How best to index foreign key for inner join - Mailing list pgsql-novice

From Thom Brown
Subject Re: How best to index foreign key for inner join
Date
Msg-id bddc86150911270716lc2993efuac35a30d93d53821@mail.gmail.com
Whole thread Raw
In response to Re: How best to index foreign key for inner join  (Nathaniel Trellice <naptrel@yahoo.co.uk>)
List pgsql-novice
2009/11/27 Nathaniel Trellice <naptrel@yahoo.co.uk>
On 27 Nov 2009, at 11:12, Thom Brown <thombrown@gmail.com> wrote:

Since for each row of table1 you'll be looking for multiple records on table2, I imagine you'd want to index table2.table1_id, and table2.t since that's what you're filtering on.  If table1 is particularly big, you might benefit from also indexing table1.name.

I personally prefer to use INNER JOIN syntax as it's clearer, although the query planner will probably be identical as it's clever that way:

SELECT table2.x, table2.y, table2.t
FROM table1
INNER JOIN table2 ON table1.id = table2.table1_id
AND table1.name = 'some_name'
WHERE table2.t BETWEEN some_t AND some_other_t;

Thanks for the help Thom.

Do you think it's possible to phrase this query in such a way that the planner would be able to exploit any benefits from a mutli-column index on table2 on either (table1_id, t) or (t, table1_t)?

If you imagine setting up a view that encapsulates the inner join, creating a 'virtual' table with columns:

name text
x real
y real
t timestampz

Then the manual (11.3) suggests that ANDed queries on 'name' and 't' will be improved by using a multi-column index (name, t).

But the join confuses me. When the rule system breaks down both the query and the view's join, will the benefits of the multi-column index still be realised?


Nathaniel


Again, it really depends on the size of your tables.  If, for example, table2 contains 5 million rows and 99% of them have the same t value, and index probably won't be used by the planner since it's value distribution in the statistics data for that table (gathered by ANALYZE) would suggest a sequential scan is just as efficient.  If you have a lot of varying combinations between table2.table1_id and table2.t, a multi-column index will probably benefit you.

A good way to tell how much work is involved in running a query is to put EXPLAIN ANALYZE before it to see where the more expensive parts of the query are.

There are many instances where indexes won't be used by the planner purely because it's statistics tell it that's it's more or equally efficient to a sequential scan.  You need to know your data well and take the size of the table into account.

I'm hoping someone with better query-planner knowledge can chime in here with more solid recommendations though.

Thom

pgsql-novice by date:

Previous
From: "Jean-Yves F. Barbier"
Date:
Subject: Re: index speed-up and automatic tables/procedures creation
Next
From: "Jean-Yves F. Barbier"
Date:
Subject: views