query very slow when enable_seqscan=on - Mailing list pgsql-bugs

From Tomasz Ostrowski
Subject query very slow when enable_seqscan=on
Date
Msg-id 20060703203105.GA2801@batory.org.pl
Whole thread Raw
Responses Re: query very slow when enable_seqscan=on  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: query very slow when enable_seqscan=on  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-bugs
I have a very slow query when enable_seqscan=on and very fast when
enable_seqscan=off. My schema looks like this (relevant columns
only):

create table organizations (
    organization_id serial primary key,
    organization varchar(200) not null,
    organization_location varchar(55) not null
    -- and several irrelevant columns
); -- about 9000 records
create table persons (
    person_id serial primary key,
    surname varchar(50) not null,
    forename varchar(35) not null,
    organization_id int references organizations,
        -- and several irrelevant columns
); -- about 6500 records
create index persons_surname_forename_person_id on persons (
    surname,
    forename,
    lpad(person_id,10,'0')
); -- I was hoping this would speed up array comparisions


The query looking for a position of a person of given person_id in a
list sorted by surname, forename and person_id and filtered by some
criteria. In this example person_id=1, forename~*'to' (about 400
people) and organization_location~*'warszawa' (about 2000
organizations):

select count(*) as position from (select
        person_id, surname, forename
        from persons
        natural left join organizations
        where forename~*'to' and organization_location~*'warszawa'
    ) as person_filter
        where array[surname, forename, lpad(person_id,10,'0')]
        <
        (select array[surname, forename, lpad(person_id,10,'0')]
            from persons where person_id=1);

This query take about 30 seconds when enable_seqscan=on and 65
milliseconds when off.

When enable_seqscan=on:
 Aggregate  (cost=785.72..785.73 rows=1 width=0) (actual time=27948.955..27948.956 rows=1 loops=1)
   InitPlan
     ->  Index Scan using persons_pkey on persons  (cost=0.00..3.11 rows=1 width=26) (actual time=0.019..0.019 rows=0
loops=1)
           Index Cond: (person_id = 1)
   ->  Nested Loop  (cost=0.00..782.60 rows=1 width=0) (actual time=27948.939..27948.939 rows=0 loops=1)
         Join Filter: ("inner".organization_id = "outer".organization_id)
         ->  Seq Scan on organization  (cost=0.00..480.95 rows=1 width=4) (actual time=0.071..69.702 rows=1892 loops=1)
               Filter: ((organization_location)::text ~* 'warszawa'::text)
         ->  Seq Scan on persons  (cost=0.00..296.10 rows=444 width=4) (actual time=14.720..14.720 rows=0 loops=1892)
               Filter: (((forename)::text ~* 'to'::text) AND (ARRAY[surname, forename, (lpad((person_id)::text, 10,
'0'::text))::charactervarying] < $0)) 
 Total runtime: 27949.106 ms

When enable_seqscan=off:
 Aggregate  (cost=100001710.26..100001710.27 rows=1 width=0) (actual time=66.788..66.789 rows=1 loops=1)
   InitPlan
     ->  Index Scan using persons_pkey on persons  (cost=0.00..3.11 rows=1 width=26) (actual time=0.019..0.019 rows=0
loops=1)
           Index Cond: (person_id = 1)
   ->  Hash Join  (cost=100001408.81..100001707.14 rows=1 width=0) (actual time=66.756..66.756 rows=0 loops=1)
         Hash Cond: ("outer".organization_id = "inner".organization_id)
         ->  Seq Scan on persons  (cost=100000000.00..100000296.10 rows=444 width=4) (actual time=14.972..14.972 rows=0
loops=1)
               Filter: (((forename)::text ~* 'to'::text) AND (ARRAY[surname, forename, (lpad((person_id)::text, 10,
'0'::text))::charactervarying] < $0)) 
         ->  Hash  (cost=1408.81..1408.81 rows=1 width=4) (actual time=51.763..51.763 rows=1892 loops=1)
               ->  Index Scan using organizations_pkey on organizations  (cost=0.00..1408.81 rows=1 width=4) (actual
time=0.049..48.233rows=1892 loops=1) 
                     Filter: ((organization_location)::text ~* 'warszawa'::text)
 Total runtime: 66.933 ms


Database is properly analyzed. postgresql-8.1.4 on Fedora Core 4.

Regards
Tometzky

PS. Actual table and column names are different (they're in Polish)
but I've translated them for better readability for english-speaking.

PS. I wonder if it makes sense to "enable_seqscan=off" for every client
if a database is small enough to fit in OS cache.
--
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
                                                      Winnie the Pooh

pgsql-bugs by date:

Previous
From: Simon Riggs
Date:
Subject: Re: BUG #2511: violation of primary key on update with 2
Next
From: Tom Lane
Date:
Subject: Re: BUG #2510: ERROR: out of memory DETAIL: Failed on request of size 825242672.