Thread: Index usage for BYTEA column in OR/IN clause
Is it true that the planner currently doesn't utilize index for BYTEA column in OR or IN clause? -- b is an indexed BYTEA column explain select * from t where b='foo'; -- index scan explain select * from t where b like 'f%'; -- index explain select * from t where b='foo' or b='bar'; -- seq scan explain select * from t where b='foo' or b like 'b%'; -- seq explain select * from t where b like 'f%' or b like 'b%'; -- seq explain select * from t where b in ('foo','bar'); -- seq Currently I'm setting enable_seqscan to off for these... -- dave
On Sat, Mar 27, 2004 at 21:52:45 +0700, David Garamond <lists@zara.6.isreserved.com> wrote: > Is it true that the planner currently doesn't utilize index for BYTEA > column in OR or IN clause? Without seeing the explain analyse output for these queries it is going to be hard to say why sequential scans were used in some cases. If the planner estimates it will be visiting a substantial fraction of rows in a table (something like 5 or 10%) then it will use a sequential scan because this will be faster. Postgres doesn't use bit mapping to speed up searches on or'd conditions, so that sequential scans are going to look even better when compared to doing multiple index scans. However, I would have expected the queries below to use index scans on real tables where the b column was unique or nearly so. My guess is that you tried this using toy tables and that for them a sequential scan could easily be faster. > > -- b is an indexed BYTEA column > > explain select * from t where b='foo'; -- index scan > explain select * from t where b like 'f%'; -- index > > explain select * from t where b='foo' or b='bar'; -- seq scan > explain select * from t where b='foo' or b like 'b%'; -- seq > explain select * from t where b like 'f%' or b like 'b%'; -- seq > explain select * from t where b in ('foo','bar'); -- seq > > Currently I'm setting enable_seqscan to off for these... > > -- > dave > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
David Garamond <lists@zara.6.isreserved.com> writes: > Is it true that the planner currently doesn't utilize index for BYTEA > column in OR or IN clause? No. BYTEA is the same as any other datatype. regards, tom lane
The table contain +- 1 mil records, all of the actual version of the queries below return < 10 rows, so an index should be used. Using an index scan, the query ran < 100ms. Using seq scan, 2-3 secs. So there is no good reason why a seq scan should be used, especially in a case of b='foo' or b='bar'. Bruno Wolff III wrote: > Without seeing the explain analyse output for these queries it is going > to be hard to say why sequential scans were used in some cases. > > If the planner estimates it will be visiting a substantial fraction of > rows in a table (something like 5 or 10%) then it will use a sequential > scan because this will be faster. > > Postgres doesn't use bit mapping to speed up searches on or'd conditions, > so that sequential scans are going to look even better when compared to > doing multiple index scans. > > However, I would have expected the queries below to use index scans > on real tables where the b column was unique or nearly so. My guess > is that you tried this using toy tables and that for them a sequential > scan could easily be faster. > > >> -- b is an indexed BYTEA column >> >> explain select * from t where b='foo'; -- index scan >> explain select * from t where b like 'f%'; -- index >> >> explain select * from t where b='foo' or b='bar'; -- seq scan >> explain select * from t where b='foo' or b like 'b%'; -- seq >> explain select * from t where b like 'f%' or b like 'b%'; -- seq >> explain select * from t where b in ('foo','bar'); -- seq >> >>Currently I'm setting enable_seqscan to off for these... -- dave
David Garamond <lists@zara.6.isreserved.com> writes: > The table contain +- 1 mil records, all of the actual version of the > queries below return < 10 rows, so an index should be used. Using an > index scan, the query ran < 100ms. Using seq scan, 2-3 secs. So there is > no good reason why a seq scan should be used, especially in a case of > b='foo' or b='bar'. [shrug...] We can't possibly diagnose a bad-plan-choice problem with the amount of information you've provided. See http://techdocs.postgresql.org/guides/SlowQueryPostingGuidelines regards, tom lane
Tom Lane wrote: > David Garamond <lists@zara.6.isreserved.com> writes: > >>The table contain +- 1 mil records, all of the actual version of the >>queries below return < 10 rows, so an index should be used. Using an >>index scan, the query ran < 100ms. Using seq scan, 2-3 secs. So there is >>no good reason why a seq scan should be used, especially in a case of >>b='foo' or b='bar'. > > [shrug...] We can't possibly diagnose a bad-plan-choice problem with > the amount of information you've provided. See > http://techdocs.postgresql.org/guides/SlowQueryPostingGuidelines Okay, my bad. How about this: 1. script to create the test table (a 1mil-record table; each record contains 1-40 random bytes): #!/usr/bin/ruby -rpostgres ROWS = 1_000_000 conn = PGconn.connect("/tmp", 5432, "", "", "...", "...", "...") conn.exec("CREATE TABLE t (b BYTEA NOT NULL)") conn.exec("CREATE INDEX i_t_b ON t(b)") (1..ROWS).each { |i| b = (1..rand(40)+1).collect{"\\\\"+rand(256).to_s(8).rjust(3,"0")} if i % 1000 == 1; conn.exec("BEGIN"); end conn.exec("INSERT INTO t VALUES ('#{b}')") if i % 1000 == 0; conn.exec("COMMIT"); puts "#{i}/1000000..."; end } 2. output of explain analyze: => explain analyze select * from t where b='aa'; QUERY PLAN ----------------------------------------------------------------------------------------------------------- Index Scan using i_t_v on t (cost=0.00..17.07 rows=6 width=32) (actual time=0.062..0.062 rows=0 loops=1) Index Cond: (b = 'aa'::bytea) Total runtime: 0.166 ms (3 rows) Time: 19.372 ms => explain analyze select * from t where b='aa' or b='ab'; QUERY PLAN ---------------------------------------------------------------------------------------------------- Seq Scan on t (cost=0.00..25.00 rows=10 width=32) (actual time=6857.151..6857.151 rows=0 loops=1) Filter: ((b = 'aa'::bytea) OR (b = 'ab'::bytea)) Total runtime: 6857.345 ms (3 rows) Time: 6864.526 ms => explain analyze select * from t where b like 'aa%'; QUERY PLAN ------------------------------------------------------------------------------------------------------------ Index Scan using i_t_v on t (cost=0.00..17.07 rows=6 width=32) (actual time=0.682..15.763 rows=2 loops=1) Index Cond: ((b >= 'aa'::bytea) AND (b < 'ab'::bytea)) Filter: (b ~~ 'aa%'::bytea) Total runtime: 15.935 ms (4 rows) Time: 29.432 ms => explain analyze select * from t where b like 'aa%' or b like 'ab%'; QUERY PLAN --------------------------------------------------------------------------------------------------- Seq Scan on t (cost=0.00..25.00 rows=10 width=32) (actual time=636.738..7239.460 rows=3 loops=1) Filter: ((b ~~ 'aa%'::bytea) OR (b ~~ 'ab%'::bytea)) Total runtime: 7239.758 ms (3 rows) Time: 7251.326 ms => explain analyze select * from t where b in ('aa','ab'); QUERY PLAN ---------------------------------------------------------------------------------------------------- Seq Scan on t (cost=0.00..25.00 rows=10 width=32) (actual time=7055.390..7055.390 rows=0 loops=1) Filter: ((b = 'aa'::bytea) OR (b = 'ab'::bytea)) Total runtime: 7055.574 ms (3 rows) Time: 7063.942 ms -- dave
David Garamond <lists@zara.6.isreserved.com> writes: > 1. script to create the test table (a 1mil-record table; each record > contains 1-40 random bytes): This script is lacking a VACUUM or ANALYZE command, so the planner doesn't know how large the table is. Note the ridiculously small cost estimates in EXPLAIN ... regards, tom lane
Tom Lane wrote: > This script is lacking a VACUUM or ANALYZE command, so the planner > doesn't know how large the table is. Note the ridiculously small > cost estimates in EXPLAIN ... I see, I never knew about having to VACUUM/ANALYZE first. Thanks. -- dave