Home > mailing lists

Re: How best to index foreign key for inner join - Mailing list pgsql-novice

From	Thom Brown
Subject	Re: How best to index foreign key for inner join
Date	November 27, 2009 11:16:57
Msg-id	bddc86150911270716lc2993efuac35a30d93d53821@mail.gmail.com Whole thread Raw
In response to	Re: How best to index foreign key for inner join (Nathaniel Trellice <naptrel@yahoo.co.uk>)
List	pgsql-novice

Tree view

2009/11/27 Nathaniel Trellice <naptrel@yahoo.co.uk>

On 27 Nov 2009, at 11:12, Thom Brown <thombrown@gmail.com> wrote:

Since for each row of table1 you'll be looking for multiple records on table2, I imagine you'd want to index table2.table1_id, and table2.t since that's what you're filtering on. If table1 is particularly big, you might benefit from also indexing table1.name.

I personally prefer to use INNER JOIN syntax as it's clearer, although the query planner will probably be identical as it's clever that way:

SELECT table2.x, table2.y, table2.t
FROM table1
INNER JOIN table2 ON table1.id = table2.table1_id
AND table1.name = 'some_name'
WHERE table2.t BETWEEN some_t AND some_other_t;

Thanks for the help Thom.

Do you think it's possible to phrase this query in such a way that the planner would be able to exploit any benefits from a mutli-column index on table2 on either (table1_id, t) or (t, table1_t)?

If you imagine setting up a view that encapsulates the inner join, creating a 'virtual' table with columns:

name text
x real
y real
t timestampz

Then the manual (11.3) suggests that ANDed queries on 'name' and 't' will be improved by using a multi-column index (name, t).

But the join confuses me. When the rule system breaks down both the query and the view's join, will the benefits of the multi-column index still be realised?

Nathaniel

Again, it really depends on the size of your tables. If, for example, table2 contains 5 million rows and 99% of them have the same t value, and index probably won't be used by the planner since it's value distribution in the statistics data for that table (gathered by ANALYZE) would suggest a sequential scan is just as efficient. If you have a lot of varying combinations between table2.table1_id and table2.t, a multi-column index will probably benefit you.

A good way to tell how much work is involved in running a query is to put EXPLAIN ANALYZE before it to see where the more expensive parts of the query are.

There are many instances where indexes won't be used by the planner purely because it's statistics tell it that's it's more or equally efficient to a sequential scan. You need to know your data well and take the size of the table into account.

I'm hoping someone with better query-planner knowledge can chime in here with more solid recommendations though.

Thom

pgsql-novice by date:

From: "Jean-Yves F. Barbier"
Date: 27 November 2009, 11:05:01
Subject: Re: index speed-up and automatic tables/procedures creation

From: "Jean-Yves F. Barbier"
Date: 27 November 2009, 11:19:11
Subject: views

Re: How best to index foreign key for inner join - Mailing list pgsql-novice

Previous

Next