varchar index joins not working? - Mailing list pgsql-performance

From Adam Gundy
Subject varchar index joins not working?
Date
Msg-id 6f55f1270804092013n73ee1f81t2c6d72cee49655a3@mail.gmail.com
Whole thread Raw
Responses Re: varchar index joins not working?
List pgsql-performance
I'm hitting an unexpected problem with postgres 8.3 - I have some
tables which use varchar(32) for their unique IDs which I'm attempting
to join using some simple SQL:

select *
from group_access, groups
where group_access.groupid = groups.groupid and
         group_access.uid = '7275359408f44591d0717e16890ce335';

there's a unique index on group_access.groupid, and a non-unique index
on groups.groupid. both are non-null.

the problem is: if groupid (in both tables) is varchar, I cannot force
postgres (no matter how hard I try) to do an index scan. it ends up
reading the entire groups table (pretty large!):

                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=8.89..41329.88 rows=119940 width=287) (actual
time=0.202..935.136 rows=981 loops=1)
   Hash Cond: ((groups.groupid)::text = (group_access.groupid)::text)
   ->  Seq Scan on groups  (cost=0.00..31696.48 rows=1123348
width=177) (actual time=0.011..446.091 rows=1125239 loops=1)
   ->  Hash  (cost=8.51..8.51 rows=30 width=110) (actual
time=0.148..0.148 rows=30 loops=1)
         ->  Seq Scan on group_access  (cost=0.00..8.51 rows=30
width=110) (actual time=0.014..0.126 rows=30 loops=1)
               Filter: ((uid)::text = '7275359408f44591d0717e16890ce335'::text)
 Total runtime: 935.443 ms
(7 rows)

if I disable seq_scan, I get this:

                                                                   QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1.47..106189.61 rows=120004 width=287) (actual
time=0.100..1532.353 rows=981 loops=1)
   Merge Cond: ((group_access.groupid)::text = (groups.groupid)::text)
   ->  Index Scan using group_access_pkey on group_access
(cost=0.00..43.91 rows=30 width=110) (actual time=0.044..0.148 rows=30
loops=1)
         Index Cond: ((uid)::text = '7275359408f44591d0717e16890ce335'::text)
   ->  Index Scan using groups_1_idx on groups  (cost=0.00..102135.71
rows=1123952 width=177) (actual time=0.031..856.555 rows=1125827
loops=1)
 Total runtime: 1532.880 ms
(6 rows)

it's running an index scan across the entire table (no condition applied) :-(

so, just for the hell of it, I tried making groupid a char(32),
despite repeated assertions in this group that there's no performance
difference between the two:

                                                             QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.48..253.85 rows=304 width=291) (actual
time=0.715..22.906 rows=984 loops=1)
   ->  Bitmap Heap Scan on group_access  (cost=4.48..9.86 rows=30
width=111) (actual time=0.372..0.570 rows=30 loops=1)
         Recheck Cond: (uid = '7275359408f44591d0717e16890ce335'::bpchar)
         ->  Bitmap Index Scan on group_access_uid_key
(cost=0.00..4.48 rows=30 width=0) (actual time=0.331..0.331 rows=30
loops=1)
               Index Cond: (uid = '7275359408f44591d0717e16890ce335'::bpchar)
   ->  Index Scan using groups_1_idx on groups  (cost=0.00..7.96
rows=14 width=180) (actual time=0.176..0.396 rows=33 loops=30)
         Index Cond: (groups.groupid = group_access.groupid)
 Total runtime: 26.837 ms
(8 rows)

(this last plan is actually against a smaller test DB, but I get the
same behavior with it, seq scan for varchar or index scan for char,
and the results returned are identical for this query)

the databases are UTF-8, if that makes a difference...

pgsql-performance by date:

Previous
From: "samantha mahindrakar"
Date:
Subject: Re: Performance with temporary table
Next
From: Arjen van der Meijden
Date:
Subject: Re: large tables and simple "= constant" queries using indexes