Thread: optimisation for a table with frequently used query
Hi, I have a table that looks like this: CREATE TABLE my_table { id SERIAL PRIMARY KEY, id_1 INTEGER REFERENCES tab1(id), id_2 INTEGER REFERENCES tab2(id), . . . }; I will often be running queries that look like SELECT * from my_table where id_1 = x and id_2 = y; Neither id_1 or id_2 or the combination of them is unique. I expect this table to become large over time. What can I do to help postgres run this query efficiently (if anything)? Apologies if this is a "RTFM" question - I looked but did not find. Could be I didn't look hard enough ... thanks in advance Daniel
> SELECT * from my_table where id_1 = x and id_2 = y; > > Neither id_1 or id_2 or the combination of them is unique. I expect > this table to become large over time. Create an index on (id_1, id_2), or (id_2,id_1).
Thanks!
danmcb wrote: >> SELECT * from my_table where id_1 = x and id_2 = y; >> Neither id_1 or id_2 or the combination of them is unique. I expect >> this table to become large over time. PFC wrote: > Create an index on (id_1, id_2), or (id_2,id_1). What are the strengths and weaknesses compared to creating two indexes, one on each column? I am guessing that changes to the table are slower with two indexes. How could it affect queries? What if the typical query pattern was more balanced among constraints on one column, on the other, and on both? -- Lew
> -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > owner@postgresql.org] On Behalf Of Lew > Sent: Tuesday, May 29, 2007 6:38 AM > To: pgsql-general@postgresql.org > Subject: Re: [GENERAL] optimisation for a table with frequently used query > > danmcb wrote: > >> SELECT * from my_table where id_1 = x and id_2 = y; > >> Neither id_1 or id_2 or the combination of them is unique. I expect > >> this table to become large over time. > > PFC wrote: > > Create an index on (id_1, id_2), or (id_2,id_1). > > What are the strengths and weaknesses compared to creating two indexes, > one on > each column? Creating one index on each column will not do nearly so much filtering as creating one index that contains both columns (unless by chance the data always comes in pairs). The only time that there might be an advantage is if your query contains only one of the two columns for filtering. If the missing column is the most significant from a two column index, then the two column index won't be used. > I am guessing that changes to the table are slower with two indexes. How > could it affect queries? It will slow down update queries. The more indexes you add, the slower update queries become. > What if the typical query pattern was more balanced among constraints on > one > column, on the other, and on both? Collect some statistics to reduce guesswork.
I just did some checks on two seperate indexes c.f. one combined one. I saw almost no difference between making select statements. Haven't tried what happens with many updates - makes sense that more indexes will slow that down though. again thanks - bit of a noob question I know, but it's good to learn :-)
> again thanks - bit of a noob question I know, but it's good to > learn :-) Well not really since the answer is quite subtle... You kave two columns A and B. Say you have index on A, and index on B. These queries will make direct use of the index : A=... or any range on A (BETWEEN <, >, <=, >= etc ) B=... or any range on B (BETWEEN <, >, <=, >= etc ) Now if you ask for (A=... AND B=...) or ask for (A=... OR B=...) one index cannot be used, so postgres uses a bitmap scan to combine the indexes (read the docs). It is slightly slower than a direct index scan, but still much faster than not using indexes at all. If you had an index on A,B it would have been used directly. If one of the two indexes has very poor selectivity (like just a few different values), bitmap scan will not be optimal. If your indexes have lots of different values, it will be about as fast as a real index. An index on A,B can also do WHERE A=... ORDER BY A,B without actually doing the sort (it will pick the rows in index order), which is nice for many things, like getting blog comments in order.