Thread: Index usage for BYTEA column in OR/IN clause

Index usage for BYTEA column in OR/IN clause

From
David Garamond
Date:
Is it true that the planner currently doesn't utilize index for BYTEA
column in OR or IN clause?

  -- b is an indexed BYTEA column

  explain select * from t where b='foo';                    -- index scan
  explain select * from t where b like 'f%';                -- index

  explain select * from t where b='foo' or b='bar';         -- seq scan
  explain select * from t where b='foo' or b like 'b%';     -- seq
  explain select * from t where b like 'f%' or b like 'b%'; -- seq
  explain select * from t where b in ('foo','bar');         -- seq

Currently I'm setting enable_seqscan to off for these...

--
dave

Re: Index usage for BYTEA column in OR/IN clause

From
Bruno Wolff III
Date:
On Sat, Mar 27, 2004 at 21:52:45 +0700,
  David Garamond <lists@zara.6.isreserved.com> wrote:
> Is it true that the planner currently doesn't utilize index for BYTEA
> column in OR or IN clause?

Without seeing the explain analyse output for these queries it is going
to be hard to say why sequential scans were used in some cases.

If the planner estimates it will be visiting a substantial fraction of
rows in a table (something like 5 or 10%) then it will use a sequential
scan because this will be faster.

Postgres doesn't use bit mapping to speed up searches on or'd conditions,
so that sequential scans are going to look even better when compared to
doing multiple index scans.

However, I would have expected the queries below to use index scans
on real tables where the b column was unique or nearly so. My guess
is that you tried this using toy tables and that for them a sequential
scan could easily be faster.

>
>  -- b is an indexed BYTEA column
>
>  explain select * from t where b='foo';                    -- index scan
>  explain select * from t where b like 'f%';                -- index
>
>  explain select * from t where b='foo' or b='bar';         -- seq scan
>  explain select * from t where b='foo' or b like 'b%';     -- seq
>  explain select * from t where b like 'f%' or b like 'b%'; -- seq
>  explain select * from t where b in ('foo','bar');         -- seq
>
> Currently I'm setting enable_seqscan to off for these...
>
> --
> dave
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Re: Index usage for BYTEA column in OR/IN clause

From
Tom Lane
Date:
David Garamond <lists@zara.6.isreserved.com> writes:
> Is it true that the planner currently doesn't utilize index for BYTEA
> column in OR or IN clause?

No.  BYTEA is the same as any other datatype.

            regards, tom lane

Re: Index usage for BYTEA column in OR/IN clause

From
David Garamond
Date:
The table contain +- 1 mil records, all of the actual version of the
queries below return < 10 rows, so an index should be used. Using an
index scan, the query ran < 100ms. Using seq scan, 2-3 secs. So there is
no good reason why a seq scan should be used, especially in a case of
b='foo' or b='bar'.



Bruno Wolff III wrote:
> Without seeing the explain analyse output for these queries it is going
> to be hard to say why sequential scans were used in some cases.
>
> If the planner estimates it will be visiting a substantial fraction of
> rows in a table (something like 5 or 10%) then it will use a sequential
> scan because this will be faster.
>
> Postgres doesn't use bit mapping to speed up searches on or'd conditions,
> so that sequential scans are going to look even better when compared to
> doing multiple index scans.
>
> However, I would have expected the queries below to use index scans
> on real tables where the b column was unique or nearly so. My guess
> is that you tried this using toy tables and that for them a sequential
> scan could easily be faster.
>
>
>> -- b is an indexed BYTEA column
>>
>> explain select * from t where b='foo';                    -- index scan
>> explain select * from t where b like 'f%';                -- index
>>
>> explain select * from t where b='foo' or b='bar';         -- seq scan
>> explain select * from t where b='foo' or b like 'b%';     -- seq
>> explain select * from t where b like 'f%' or b like 'b%'; -- seq
>> explain select * from t where b in ('foo','bar');         -- seq
>>
>>Currently I'm setting enable_seqscan to off for these...

--
dave


Re: Index usage for BYTEA column in OR/IN clause

From
Tom Lane
Date:
David Garamond <lists@zara.6.isreserved.com> writes:
> The table contain +- 1 mil records, all of the actual version of the
> queries below return < 10 rows, so an index should be used. Using an
> index scan, the query ran < 100ms. Using seq scan, 2-3 secs. So there is
> no good reason why a seq scan should be used, especially in a case of
> b='foo' or b='bar'.

[shrug...]  We can't possibly diagnose a bad-plan-choice problem with
the amount of information you've provided.  See
http://techdocs.postgresql.org/guides/SlowQueryPostingGuidelines

            regards, tom lane

Re: Index usage for BYTEA column in OR/IN clause

From
David Garamond
Date:
Tom Lane wrote:
> David Garamond <lists@zara.6.isreserved.com> writes:
>
>>The table contain +- 1 mil records, all of the actual version of the
>>queries below return < 10 rows, so an index should be used. Using an
>>index scan, the query ran < 100ms. Using seq scan, 2-3 secs. So there is
>>no good reason why a seq scan should be used, especially in a case of
>>b='foo' or b='bar'.
>
> [shrug...]  We can't possibly diagnose a bad-plan-choice problem with
> the amount of information you've provided.  See
> http://techdocs.postgresql.org/guides/SlowQueryPostingGuidelines

Okay, my bad. How about this:


1. script to create the test table (a 1mil-record table; each record
contains 1-40 random bytes):

#!/usr/bin/ruby -rpostgres
ROWS = 1_000_000
conn = PGconn.connect("/tmp", 5432, "", "", "...", "...", "...")
conn.exec("CREATE TABLE t (b BYTEA NOT NULL)")
conn.exec("CREATE INDEX i_t_b ON t(b)")
(1..ROWS).each { |i|
   b = (1..rand(40)+1).collect{"\\\\"+rand(256).to_s(8).rjust(3,"0")}
   if i % 1000 == 1; conn.exec("BEGIN"); end
   conn.exec("INSERT INTO t VALUES ('#{b}')")
   if i % 1000 == 0; conn.exec("COMMIT"); puts "#{i}/1000000..."; end
}


2. output of explain analyze:

=> explain analyze select * from t where b='aa';
                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------
  Index Scan using i_t_v on t  (cost=0.00..17.07 rows=6 width=32)
(actual time=0.062..0.062 rows=0 loops=1)
    Index Cond: (b = 'aa'::bytea)
  Total runtime: 0.166 ms
(3 rows)

Time: 19.372 ms

=> explain analyze select * from t where b='aa' or b='ab';
                                              QUERY PLAN
----------------------------------------------------------------------------------------------------
  Seq Scan on t  (cost=0.00..25.00 rows=10 width=32) (actual
time=6857.151..6857.151 rows=0 loops=1)
    Filter: ((b = 'aa'::bytea) OR (b = 'ab'::bytea))
  Total runtime: 6857.345 ms
(3 rows)

Time: 6864.526 ms

=> explain analyze select * from t where b like 'aa%';
                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------
  Index Scan using i_t_v on t  (cost=0.00..17.07 rows=6 width=32)
(actual time=0.682..15.763 rows=2 loops=1)
    Index Cond: ((b >= 'aa'::bytea) AND (b < 'ab'::bytea))
    Filter: (b ~~ 'aa%'::bytea)
  Total runtime: 15.935 ms
(4 rows)

Time: 29.432 ms

=> explain analyze select * from t where b like 'aa%' or b like 'ab%';
                                             QUERY PLAN
---------------------------------------------------------------------------------------------------
  Seq Scan on t  (cost=0.00..25.00 rows=10 width=32) (actual
time=636.738..7239.460 rows=3 loops=1)
    Filter: ((b ~~ 'aa%'::bytea) OR (b ~~ 'ab%'::bytea))
  Total runtime: 7239.758 ms
(3 rows)

Time: 7251.326 ms

=> explain analyze select * from t where b in ('aa','ab');
                                              QUERY PLAN
----------------------------------------------------------------------------------------------------
  Seq Scan on t  (cost=0.00..25.00 rows=10 width=32) (actual
time=7055.390..7055.390 rows=0 loops=1)
    Filter: ((b = 'aa'::bytea) OR (b = 'ab'::bytea))
  Total runtime: 7055.574 ms
(3 rows)

Time: 7063.942 ms

--
dave


Re: Index usage for BYTEA column in OR/IN clause

From
Tom Lane
Date:
David Garamond <lists@zara.6.isreserved.com> writes:
> 1. script to create the test table (a 1mil-record table; each record
> contains 1-40 random bytes):

This script is lacking a VACUUM or ANALYZE command, so the planner
doesn't know how large the table is.  Note the ridiculously small
cost estimates in EXPLAIN ...

            regards, tom lane

Re: Index usage for BYTEA column in OR/IN clause

From
David Garamond
Date:
Tom Lane wrote:
> This script is lacking a VACUUM or ANALYZE command, so the planner
> doesn't know how large the table is.  Note the ridiculously small
> cost estimates in EXPLAIN ...

I see, I never knew about having to VACUUM/ANALYZE first. Thanks.

--
dave