Re: Index usage for BYTEA column in OR/IN clause - Mailing list pgsql-general

From David Garamond
Subject Re: Index usage for BYTEA column in OR/IN clause
Date
Msg-id 40671D3E.3040806@zara.6.isreserved.com
Whole thread Raw
In response to Re: Index usage for BYTEA column in OR/IN clause  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Index usage for BYTEA column in OR/IN clause
List pgsql-general
Tom Lane wrote:
> David Garamond <lists@zara.6.isreserved.com> writes:
>
>>The table contain +- 1 mil records, all of the actual version of the
>>queries below return < 10 rows, so an index should be used. Using an
>>index scan, the query ran < 100ms. Using seq scan, 2-3 secs. So there is
>>no good reason why a seq scan should be used, especially in a case of
>>b='foo' or b='bar'.
>
> [shrug...]  We can't possibly diagnose a bad-plan-choice problem with
> the amount of information you've provided.  See
> http://techdocs.postgresql.org/guides/SlowQueryPostingGuidelines

Okay, my bad. How about this:


1. script to create the test table (a 1mil-record table; each record
contains 1-40 random bytes):

#!/usr/bin/ruby -rpostgres
ROWS = 1_000_000
conn = PGconn.connect("/tmp", 5432, "", "", "...", "...", "...")
conn.exec("CREATE TABLE t (b BYTEA NOT NULL)")
conn.exec("CREATE INDEX i_t_b ON t(b)")
(1..ROWS).each { |i|
   b = (1..rand(40)+1).collect{"\\\\"+rand(256).to_s(8).rjust(3,"0")}
   if i % 1000 == 1; conn.exec("BEGIN"); end
   conn.exec("INSERT INTO t VALUES ('#{b}')")
   if i % 1000 == 0; conn.exec("COMMIT"); puts "#{i}/1000000..."; end
}


2. output of explain analyze:

=> explain analyze select * from t where b='aa';
                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------
  Index Scan using i_t_v on t  (cost=0.00..17.07 rows=6 width=32)
(actual time=0.062..0.062 rows=0 loops=1)
    Index Cond: (b = 'aa'::bytea)
  Total runtime: 0.166 ms
(3 rows)

Time: 19.372 ms

=> explain analyze select * from t where b='aa' or b='ab';
                                              QUERY PLAN
----------------------------------------------------------------------------------------------------
  Seq Scan on t  (cost=0.00..25.00 rows=10 width=32) (actual
time=6857.151..6857.151 rows=0 loops=1)
    Filter: ((b = 'aa'::bytea) OR (b = 'ab'::bytea))
  Total runtime: 6857.345 ms
(3 rows)

Time: 6864.526 ms

=> explain analyze select * from t where b like 'aa%';
                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------
  Index Scan using i_t_v on t  (cost=0.00..17.07 rows=6 width=32)
(actual time=0.682..15.763 rows=2 loops=1)
    Index Cond: ((b >= 'aa'::bytea) AND (b < 'ab'::bytea))
    Filter: (b ~~ 'aa%'::bytea)
  Total runtime: 15.935 ms
(4 rows)

Time: 29.432 ms

=> explain analyze select * from t where b like 'aa%' or b like 'ab%';
                                             QUERY PLAN
---------------------------------------------------------------------------------------------------
  Seq Scan on t  (cost=0.00..25.00 rows=10 width=32) (actual
time=636.738..7239.460 rows=3 loops=1)
    Filter: ((b ~~ 'aa%'::bytea) OR (b ~~ 'ab%'::bytea))
  Total runtime: 7239.758 ms
(3 rows)

Time: 7251.326 ms

=> explain analyze select * from t where b in ('aa','ab');
                                              QUERY PLAN
----------------------------------------------------------------------------------------------------
  Seq Scan on t  (cost=0.00..25.00 rows=10 width=32) (actual
time=7055.390..7055.390 rows=0 loops=1)
    Filter: ((b = 'aa'::bytea) OR (b = 'ab'::bytea))
  Total runtime: 7055.574 ms
(3 rows)

Time: 7063.942 ms

--
dave


pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Resolution for "ERROR: cannot handle whole-row reference" ?
Next
From: Sean Chittenden
Date:
Subject: Re: Resolution for "ERROR: cannot handle whole-row reference" ?