Re: Digesting explain analyze - Mailing list pgsql-performance

From Oleg Bartunov
Subject Re: Digesting explain analyze
Date
Msg-id Pine.LNX.4.64.1001071122100.23520@sn.sai.msu.ru
Whole thread Raw
In response to Re: Digesting explain analyze  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-performance
Jesper,

the whole idea of bitmap index scan is to optimize heap access, so it ruins
any ordering, returned by index. That's why our new KNNGist, which returned
ordered index tuples doesn't supports bitmap index scan (note, this is only
for knn search).

Oleg

On Wed, 6 Jan 2010, Robert Haas wrote:

> On Wed, Jan 6, 2010 at 2:10 PM, Jesper Krogh <jesper@krogh.cc> wrote:
> > Hi.
> >
> > I have a table that consists of somewhere in the magnitude of 100.000.000
> > rows and all rows are of this tuples
> >
> > (id1,id2,evalue);
> >
> > Then I'd like to speed up a query like this:
> >
> > explain analyze select id from table where id1 =3D 2067 or id2 =3D 2067 o=
> rder
> > by evalue asc limit 100;
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0QUERY PLAN
> > -------------------------------------------------------------------------=
> ----------------------------------------------------------------------
> > =A0Limit =A0(cost=3D1423.28..1423.28 rows=3D100 width=3D12) (actual
> > time=3D2.565..2.567 rows=3D100 loops=3D1)
> > =A0 -> =A0Sort =A0(cost=3D1423.28..1424.54 rows=3D505 width=3D12) (actual
> > time=3D2.560..2.560 rows=3D100 loops=3D1)
> > =A0 =A0 =A0 =A0 Sort Key: evalue
> > =A0 =A0 =A0 =A0 Sort Method: =A0top-N heapsort =A0Memory: 25kB
> > =A0 =A0 =A0 =A0 -> =A0Bitmap Heap Scan on table =A0(cost=3D16.58..1420.75=
>  rows=3D505
> > width=3D12) (actual time=3D0.709..1.752 rows=3D450 loops=3D1)
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 Recheck Cond: ((id1 =3D 2067) OR (id2 =3D 206=
> 7))
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 -> =A0BitmapOr =A0(cost=3D16.58..16.58 rows=
> =3D506 width=3D0) (actual
> > time=3D0.676..0.676 rows=3D0 loops=3D1)
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 -> =A0Bitmap Index Scan on id1_ev=
> alue_idx
> > (cost=3D0.00..11.44 rows=3D423 width=3D0) (actual
> > time=3D0.599..0.599 rows=3D450 loops=3D1)
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 Index Cond: (id1_id =
> =3D 2067)
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 -> =A0Bitmap Index Scan on id2_ev=
> alue_idx
> > (cost=3D0.00..4.89 rows=3D83 width=3D0) (actual
> > time=3D0.070..0.070 rows=3D1 loops=3D1)
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 Index Cond: (id2_id =
> =3D 2067)
> > =A0Total runtime: 2.642 ms
> > (12 rows)
> >
> >
> > What I had expected was to see the "Bitmap Index Scan on id1_evalue_idx"
> > to chop it off at a "limit 1". The inner sets are on average 3.000 for
> > both id1 and id2 and a typical limit would be 100, so if I could convince
> > postgresql to not fetch all of them then I would reduce the set retrieved
> > by around 60. The dataset is quite large so the random query is not very
> > likely to be hitting the same part of the dataset again, so there is going
> > to be a fair amount of going to disk.,
> >
> > I would also mean that using it in a for loop in a stored-procedure in
> > plpgsql it would not get any benefit from the CURSOR effect?
> >
> > I actually tried to stuff id1,id2 into an array and do a GIST index on the
> > array,evalue hoping that it directly would satisfy this query.. it used
> > the GIST index fetch the rows the post-sorting and limit on the set.
> >
> > What it boils down to is more or less:
> >
> > Does a "bitmap index scan" support ordering and limit ?
> > Does a "multicolummn gist index" support ordering and limit ?
> >
> > Have I missed anything that can hugely speed up fetching these (typically
> > 100 rows) from the database.
>
> Bitmap index scans always return all the matching rows.  It would be
> nice if they could fetch them in chunks for queries like this, but
> they can't.  I am not sure whether there's any way to make GIST do
> what you want.
>
> You might try something like this (untested):
>
> SELECT * FROM (
>    select id from table where id1 =3D 2067 order by evalue asc limit 100
>    union all
>    select id from table where id2 =3D 2067 order by evalue asc limit 100
> ) x ORDER BY evalue LIMIT 100
>
> If you have an index by (id1, evalue) and by (id2, evalue) then I
> would think this would be pretty quick, as it should do two index
> scans (not bitmap index scans) to fetch 100 rows for each, then append
> the results, sort them, and then limit again.
>
> ...Robert
>
> --=20
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

pgsql-performance by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: Massive table (500M rows) update nightmare
Next
From: Matthew Wakeling
Date:
Subject: Re: Digesting explain analyze