Digesting explain analyze - Mailing list pgsql-performance

From Jesper Krogh
Subject Digesting explain analyze
Date
Msg-id 4B44E02A.6080109@krogh.cc
Whole thread Raw
Responses Re: Digesting explain analyze
Re: Digesting explain analyze
List pgsql-performance
Hi.

I have a table that consists of somewhere in the magnitude of 100.000.000
rows and all rows are of this tuples

(id1,id2,evalue);

Then I'd like to speed up a query like this:

explain analyze select id from table where id1 = 2067 or id2 = 2067 order
by evalue asc limit 100;
                                                                  QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1423.28..1423.28 rows=100 width=12) (actual
time=2.565..2.567 rows=100 loops=1)
   ->  Sort  (cost=1423.28..1424.54 rows=505 width=12) (actual
time=2.560..2.560 rows=100 loops=1)
         Sort Key: evalue
         Sort Method:  top-N heapsort  Memory: 25kB
         ->  Bitmap Heap Scan on table  (cost=16.58..1420.75 rows=505
width=12) (actual time=0.709..1.752 rows=450 loops=1)
               Recheck Cond: ((id1 = 2067) OR (id2 = 2067))
               ->  BitmapOr  (cost=16.58..16.58 rows=506 width=0) (actual
time=0.676..0.676 rows=0 loops=1)
                     ->  Bitmap Index Scan on id1_evalue_idx
(cost=0.00..11.44 rows=423 width=0) (actual
time=0.599..0.599 rows=450 loops=1)
                           Index Cond: (id1_id = 2067)
                     ->  Bitmap Index Scan on id2_evalue_idx
(cost=0.00..4.89 rows=83 width=0) (actual
time=0.070..0.070 rows=1 loops=1)
                           Index Cond: (id2_id = 2067)
 Total runtime: 2.642 ms
(12 rows)


What I had expected was to see the "Bitmap Index Scan on id1_evalue_idx"
to chop it off at a "limit 1". The inner sets are on average 3.000 for
both id1 and id2 and a typical limit would be 100, so if I could convince
postgresql to not fetch all of them then I would reduce the set retrieved
by around 60. The dataset is quite large so the random query is not very
likely to be hitting the same part of the dataset again, so there is going
to be a fair amount of going to disk.,

I would also mean that using it in a for loop in a stored-procedure in
plpgsql it would not get any benefit from the CURSOR effect?

I actually tried to stuff id1,id2 into an array and do a GIST index on the
array,evalue hoping that it directly would satisfy this query.. it used
the GIST index fetch the rows the post-sorting and limit on the set.

What it boils down to is more or less:

Does a "bitmap index scan" support ordering and limit ?
Does a "multicolummn gist index" support ordering and limit ?

Have I missed anything that can hugely speed up fetching these (typically
100 rows) from the database.

--
Jesper



pgsql-performance by date:

Previous
From: Craig James
Date:
Subject: Re: pg_connect takes 3.0 seconds
Next
From: Ron Mayer
Date:
Subject: Re: Digesting explain analyze