Thread: seqscan instead of index scan
Dear all: Im having a weird problem here. I have a table w/ ~180.000 rows and I want to select those where c > 0 or d > 0 (there only a few of those on the table) I indexed columns c and d (separately) but this query used the slow seqscan instead of the index scan: select * from t where c<>0 or d<>0; After playing some time, I noticed that if I change the "or" for an "and", pg used the fast index scan (but the query w/ 'and' was not what I as looking for). Then, I thought I could do the following: Creating an index with the expression (c+d) and selecting the rows where c+d > 0: select * from t where c + d <> 0; Again, this used a seqscan. Asking in #postgresql in freenode, somebody told me to try to disable seqscan (set enable_seqscan false) and suprisingly, Pg started using the index scan and it was -fast-. Now: I've no idea why it chooses to use a seq scan instead of the index scan (yes, I've just vacuum analyzed the table before running the query). Some more info: c and d are both bigint. I've tried the queries casting the constant (0) to bigint but nothing changed. Im using debian's pg 7.4.1-2. Thanks in advance
Attachment
On Mon, Aug 30, 2004 at 14:46:37 -0300, Martin Sarsale <martin@emepe3.net> wrote: > Dear all: > > Im having a weird problem here. I have a table w/ ~180.000 rows and I > want to select those where c > 0 or d > 0 (there only a few of those on > the table) > I indexed columns c and d (separately) but this query used the slow > seqscan instead of the index scan: Postgres doesn't 'or' bitmaps derived from two indexes. You might have more luck using a combined index. > > select * from t where c<>0 or d<>0; > > After playing some time, I noticed that if I change the "or" for an > "and", pg used the fast index scan (but the query w/ 'and' was not what > I as looking for). > > Then, I thought I could do the following: > Creating an index with the expression (c+d) and selecting the rows where > c+d > 0: > select * from t where c + d <> 0; > > Again, this used a seqscan. Asking in #postgresql in freenode, somebody > told me to try to disable seqscan (set enable_seqscan false) and > suprisingly, Pg started using the index scan and it was -fast-. > > Now: I've no idea why it chooses to use a seq scan instead of the index > scan (yes, I've just vacuum analyzed the table before running the > query). > > Some more info: > c and d are both bigint. I've tried the queries casting the constant (0) > to bigint but nothing changed. > > Im using debian's pg 7.4.1-2. > > > Thanks in advance >
> Im having a weird problem here. I have a table w/ ~180.000 rows and I > want to select those where c > 0 or d > 0 (there only a few of those on > the table) > I indexed columns c and d (separately) but this query used the slow > seqscan instead of the index scan: create function is_somethingable (ctype, dtype) returns boolean as ' return case when $1 > 0 and $2 > 0 then true else false end; ' language sql immutable; create index t_idx on t(is_somethingable(c,d)); analyze t; select * from t where is_somethingable(t.c, t.d) = true; Merlin
On Mon, 2004-08-30 at 15:02, Bruno Wolff III wrote: > On Mon, Aug 30, 2004 at 14:46:37 -0300, > > Im having a weird problem here. I have a table w/ ~180.000 rows and I > > want to select those where c > 0 or d > 0 (there only a few of those on > > the table) > > I indexed columns c and d (separately) but this query used the slow > > seqscan instead of the index scan: > > Postgres doesn't 'or' bitmaps derived from two indexes. You might have more > luck using a combined index. With combined index, you mean a multiple column index? From http://www.postgresql.org/docs/7.4/interactive/indexes-multicolumn.html "Multicolumn indexes can only be used if the clauses involving the indexed columns are joined with AND. For instance, SELECT name FROM test2 WHERE major = constant OR minor = constant; cannot make use of the index test2_mm_idx defined above to look up both columns. (It can be used to look up only the major column, however.) " But I need something like: select * from t where c<>0 or d<>0;
Attachment
On Mon, 2004-08-30 at 15:06, Merlin Moncure wrote: > create function is_somethingable (ctype, dtype) returns boolean as Thanks, but I would prefer a simpler solution. I would like to know why this uses a seqscan instead of an index scan: create index t_idx on t((c+d)); select * from t where c+d > 0;
Attachment
> On Mon, 2004-08-30 at 15:06, Merlin Moncure wrote: > > create function is_somethingable (ctype, dtype) returns boolean as > > Thanks, but I would prefer a simpler solution. > > I would like to know why this uses a seqscan instead of an index scan: > > create index t_idx on t((c+d)); > select * from t where c+d > 0; > hmmm, please define simple. Using a functional index you can define an index around the way you access the data. There is no faster or better way to do it...this is a mathematical truth, not a problem with the planner. Why not use the right tool for the job? A boolean index is super-efficient both in disk space and cache utilization. Multiple column indexes are useless for 'or' combinations! (however they are a huge win for 'and' combinations because you don't have to merge). With an 'or' expression, the planner must use one index or the other, or use both and merge the results. When and what the planner uses is an educated guess based on statistics. Also, your function can be changed...why fill all your queries with Boolean cruft when you can abstract it into the database and reap the speed savings at the same time? I think it's time to rethink the concept of 'simple'. Constructive criticism all, Merlin
On Mon, Aug 30, 2004 at 15:07:15 -0300, Martin Sarsale <martin@emepe3.net> wrote: > > With combined index, you mean a multiple column index? > From > http://www.postgresql.org/docs/7.4/interactive/indexes-multicolumn.html You are right, a multicolumn index doesn't help for 'or'.
>> create index t_idx on t((c+d)); >> select * from t where c+d > 0; Why not : select ((select * from t where c<>0::bigint) UNION (select * from t where d<>0::bigint)) group by whatever; or someting ?
On Mon, 30 Aug 2004, Martin Sarsale wrote: > On Mon, 2004-08-30 at 15:06, Merlin Moncure wrote: > > create function is_somethingable (ctype, dtype) returns boolean as > > Thanks, but I would prefer a simpler solution. > > I would like to know why this uses a seqscan instead of an index scan: > > create index t_idx on t((c+d)); > select * from t where c+d > 0; As a geuss, since 7.4 and earlier have no statistics on the distribution of c+d it has to guess about how likely that is to be true and is probably overestimating. 8.0beta might handle this better.
Another option here is to use a partial index. You can index on some other column -- perhaps the column you want the results ordered by where the where clause is true. Something like: create index t_idx on t (name) where c>0 and d>0; then any select with a matching where clause can use the index: select * from t where c>0 and d>0 order by name Could scan the index and not even have to sort on name. -- greg
Martin Sarsale <martin@emepe3.net> writes: > I indexed columns c and d (separately) but this query used the slow > seqscan instead of the index scan: > select * from t where c<>0 or d<>0; > After playing some time, I noticed that if I change the "or" for an > "and", pg used the fast index scan (but the query w/ 'and' was not what > I as looking for). I don't think so. <> is not an indexable operator --- it appears nowhere in the index operator classes. It would help if you showed us *exactly* what you did instead of a not-very-accurate filtered version. I'm going to assume that you meant > ... > Now: I've no idea why it chooses to use a seq scan instead of the index > scan (yes, I've just vacuum analyzed the table before running the > query). Because 7.4 doesn't have statistics about expression indexes, so it has no idea that there are only a few rows with c+d > 0. What I'd suggest is select * from t where c>0 union select * from t where d>0 with separate indexes on c and d. Another possibility is a partial index on exactly the condition you want: create index nonzero on t(c) where c>0 or d>0; although I'm not certain if 7.4 has enough stats to recognize this as a win. regards, tom lane
> Using a functional index you can define an index around the way you > access the data. There is no faster or better way to do it...this is a > mathematical truth, not a problem with the planner. Why not use the > right tool for the job? A boolean index is super-efficient both in disk > space and cache utilization. Thanks for your constructive criticism, you're absolutely right. I had to modify your "return" for a "select": create function rankeable (bigint, bigint) returns boolean as ' select case when $1 > 0 or $2 > 0 then true else false end;' language sql immutable; and it works great.
Attachment
On Mon, 30 Aug 2004, Martin Sarsale wrote: > "Multicolumn indexes can only be used if the clauses involving the > indexed columns are joined with AND. For instance, > > SELECT name FROM test2 WHERE major = constant OR minor = constant; You can use DeMorgan's Theorem to transform an OR clause to an AND clause. In general: A OR B <=> NOT ((NOT A) AND (NOT B)) So: > But I need something like: > > select * from t where c<>0 or d<>0; select * from t where not (c=0 and d=0); I haven't actually tried to see if postgresql would do anything interesting after such a transformation.
> On Mon, 30 Aug 2004, Martin Sarsale wrote: > > "Multicolumn indexes can only be used if the clauses involving the > > indexed columns are joined with AND. For instance, > > > > SELECT name FROM test2 WHERE major = constant OR minor = constant; > > You can use DeMorgan's Theorem to transform an OR clause to an AND clause. > > In general: > A OR B <=> NOT ((NOT A) AND (NOT B)) > > So: > > > But I need something like: > > > > select * from t where c<>0 or d<>0; > > select * from t where not (c=0 and d=0); > > I haven't actually tried to see if postgresql would do anything > interesting after such a transformation. That made me really curious. I ran a quick test and it turns out the server used dm's theorem to convert the expression back to 'or' case. Explain output (see below to set up the test case for this stmnt): esp=# explain analyze select * from millions where not (value1 <> 500000 and value2 <> 200000); QUERY PLAN ------------------------------------------------------------------------ ---------------------------- -------------------------------------- Index Scan using millions_1_idx, millions_2_idx on millions (cost=0.00..12.01 rows=2 width=8) (act ual time=0.000..0.000 rows=2 loops=1) Index Cond: ((value1 = 500000) OR (value2 = 200000)) Total runtime: 0.000 ms (3 rows) drop table tens; drop table millions; create table tens(value int); create table millions(value1 int, value2 int); insert into tens values (0); insert into tens values (1); insert into tens values (2); insert into tens values (3); insert into tens values (4); insert into tens values (5); insert into tens values (6); insert into tens values (7); insert into tens values (8); insert into tens values (9); insert into millions select ones.value + (tens.value * 10) + (hundreds.value * 100) + (thousands.value * 1000) + (tenthousands.value * 10000) + (hundredthousands.value * 100000) from tens ones, tens tens, tens hundreds, tens thousands, tens tenthousands, tens hundredthousands; update millions set value2 = value1; create index millions_idx1 on millions(value1); create index millions_idx2 on millions(value2); create index millions_idx12 on millions(value1, value2); vacuum analyze millions;