Thread: optimisation for a table with frequently used query

optimisation for a table with frequently used query

From
danmcb
Date:
Hi,

I have a table that looks like this:

CREATE TABLE my_table {
  id SERIAL PRIMARY KEY,
  id_1 INTEGER REFERENCES tab1(id),
  id_2 INTEGER REFERENCES tab2(id),
  .
  .
  .
};

I will often be running queries that look like

SELECT * from my_table where id_1 = x and id_2 = y;

Neither id_1 or id_2 or the combination of them is unique. I expect
this table to become large over time.

What can I do to help postgres run this query efficiently (if
anything)?

Apologies if this is a "RTFM" question - I looked but did not find.
Could be I didn't look hard enough ...

thanks in advance

Daniel


Re: optimisation for a table with frequently used query

From
PFC
Date:

> SELECT * from my_table where id_1 = x and id_2 = y;
>
> Neither id_1 or id_2 or the combination of them is unique. I expect
> this table to become large over time.

Create an index on (id_1, id_2), or (id_2,id_1).

Re: optimisation for a table with frequently used query

From
danmcb
Date:
Thanks!


Re: optimisation for a table with frequently used query

From
Lew
Date:
danmcb wrote:
>> SELECT * from my_table where id_1 = x and id_2 = y;
>> Neither id_1 or id_2 or the combination of them is unique. I expect
>> this table to become large over time.

PFC wrote:
> Create an index on (id_1, id_2), or (id_2,id_1).

What are the strengths and weaknesses compared to creating two indexes, one on
each column?

I am guessing that changes to the table are slower with two indexes.  How
could it affect queries?

What if the typical query pattern was more balanced among constraints on one
column, on the other, and on both?

--
Lew

Re: optimisation for a table with frequently used query

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Lew
> Sent: Tuesday, May 29, 2007 6:38 AM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] optimisation for a table with frequently used
query
>
> danmcb wrote:
> >> SELECT * from my_table where id_1 = x and id_2 = y;
> >> Neither id_1 or id_2 or the combination of them is unique. I expect
> >> this table to become large over time.
>
> PFC wrote:
> > Create an index on (id_1, id_2), or (id_2,id_1).
>
> What are the strengths and weaknesses compared to creating two
indexes,
> one on
> each column?

Creating one index on each column will not do nearly so much filtering
as creating one index that contains both columns (unless by chance the
data always comes in pairs).  The only time that there might be an
advantage is if your query contains only one of the two columns for
filtering.  If the missing column is the most significant from a two
column index, then the two column index won't be used.

> I am guessing that changes to the table are slower with two indexes.
How
> could it affect queries?

It will slow down update queries.  The more indexes you add, the slower
update queries become.

> What if the typical query pattern was more balanced among constraints
on
> one
> column, on the other, and on both?

Collect some statistics to reduce guesswork.

Re: optimisation for a table with frequently used query

From
danmcb
Date:
I just did some checks on two seperate indexes c.f. one combined one.
I saw almost no difference between making select statements.

Haven't tried what happens with many updates - makes sense that more
indexes will slow that down though.

again thanks - bit of a noob question I know, but it's good to
learn :-)



Re: optimisation for a table with frequently used query

From
PFC
Date:
> again thanks - bit of a noob question I know, but it's good to
> learn :-)

    Well not really since the answer is quite subtle...

    You kave two columns A and B.
    Say you have index on A, and index on B.
    These queries will make direct use of the index :
    A=... or any range on A (BETWEEN <, >, <=, >= etc )
    B=... or any range on B (BETWEEN <, >, <=, >= etc )

    Now if you ask for (A=... AND B=...) or ask for (A=... OR B=...) one
index cannot be used, so postgres uses a bitmap scan to combine the
indexes (read the docs). It is slightly slower than a direct index scan,
but still much faster than not using indexes at all.
    If you had an index on A,B it would have been used directly.

    If one of the two indexes has very poor selectivity (like just a few
different values), bitmap scan will not be optimal. If your indexes have
lots of different values, it will be about as fast as a real index.

    An index on A,B can also do WHERE A=... ORDER BY A,B without actually
doing the sort (it will pick the rows in index order), which is nice for
many things, like getting blog comments in order.