Thread: many similar index-based selects are extremely slow
psql8: I use a fairly large psql table to store information and keep an id value for each row in memory of my application for faster access. My application can calculate a list of needed ids in very short time and then wants to retrieve all rows corresponding to these ids.

So in fact I perform a lot of operations like:

select field1,field2,field3 from mytable where id=XX;

There is an index on the id field and the id field is of type OID, so everything should be quite fast. Unfortunately it is not. On a 2GHz machine with 1GB RAM it takes approx. 10 seconds to fetch 10000 rows. In this test scenario I only fetch the OID and no other column.

I don't understand this. Am I expecting far too much? Is 10 seconds for the retrieval of 10000 OIDs a reasonable value? I want it to be less than one second, and in my experience with postgres this operation is extremely slow compared to the impressive speed of most other operations.

I also tried the IN operator, which was even slower. Is there any other way to speed things up? I can order the list of ids to retrieve in my application if there is a way to tell psql not to search the whole index every time but somehow "do better".

In case it is of any interest, here is the table. The id is stored as id_artikel.
                            Table "public.artikelindex"
        Column         |            Type             |              Modifiers
-----------------------+-----------------------------+----------------------------------------------------
 autor_artikel         | text                        |
 titel_artikel         | text                        |
 jahrgang_zeitschrift  | integer                     |
 jahr_zeitschrift      | character varying(20)       |
 heftnummer            | character varying(30)       |
 seitenzahl_artikel    | character varying(30)       |
 bemerkungen_artikel   | text                        |
 deskriptoren_alt      | text                        |
 deskriptoren_neu      | text                        |
 personennamen_artikel | text                        |
 orte_artikel          | text                        |
 id_artikel            | oid                         |
 id_titel              | oid                         |
 cdate                 | timestamp without time zone | default ('now'::text)::timestamp(6) with time zone
 udate                 | timestamp without time zone | default ('now'::text)::timestamp(6) with time zone
 uid                   | oid                         |
 gid                   | oid                         |
 mod                   | boolean                     |
Indexes:
    "id_artikel_idx" btree (id_artikel)
    "id_titel_idx" btree (id_titel)
    "idx_artikelindeax_autor" btree (autor_artikel)
    "idx_artikelindex_fingerprint" btree (id_artikel)
    "idx_artikelindex_jahr" btree (jahrgang_zeitschrift)
    "idx_artikelindex_jahrgang" btree (jahr_zeitschrift)
    "idx_artikelindex_zeitschrift" btree (id_titel)
Rules:
    delete_val AS
    ON DELETE TO artikelindex DO UPDATE counter SET val = counter.val + 1
    WHERE counter.tab::text = 'artikelindex'::character varying::text
    insert_val AS
    ON INSERT TO artikelindex DO UPDATE counter SET val = counter.val + 1
    WHERE counter.tab::text = 'artikelindex'::character varying::text
    update_val AS
    ON UPDATE TO artikelindex DO UPDATE counter SET val = counter.val + 1
    WHERE counter.tab::text = 'artikelindex'::character varying::text

And more: here is my retrieval program. I use perl with the DBI module and the following code snippet:

--------------------------
my $sth=$dbh->prepare(
    'select OID from artikelindex where id_artikel=?');

foreach (@id) {
    my $ret=$sth->execute($_);
    my $x=$sth->fetchrow_arrayref;
}
-------------------------

thanks a lot for any ideas,
peter

--
mag. peter pilsl
goldfisch.at
IT-management
tel +43 699 1 3574035
fax +43 699 4 3574035
pilsl@goldfisch.at
Well, first it would be a good idea to see what postgres is actually doing. Send the output of:

=> EXPLAIN ANALYZE SELECT OID FROM artikelindex WHERE id_artikel=?;

(where ? is replaced by some id value)

Since it's just a select from one table, it will show either an index scan or a sequential scan. That will tell you whether the index is being used or not.

Next, if it is doing something that seems unreasonable, try:

=> VACUUM ANALYZE artikelindex;

And we should also look at the EXPLAIN output for the IN version of the query:

=> EXPLAIN ANALYZE SELECT OID FROM artikelindex WHERE id_artikel IN (?,?,...,?);

although I'm not sure what kind of performance to expect if you have 10k values in that list.

Also, which version are you using? How many rows are in the table? Is it regularly vacuumed? Are there a lot of updates/deletes going to that table?

I suspect that the fastest way postgres can get you those rows is the IN query on a recent version of postgresql. It may choose a sequential scan, which would likely be good, since one seq scan should take well under 10 seconds.

Regards,
Jeff Davis

On Sun, 2005-01-02 at 00:52 +0100, peter pilsl wrote:
> psql8: I use a bigger psql-table to store information and keep an
> id-value of each row in memory of my application for faster access.
> [...]
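[Editor's note: the single-IN-query form suggested above can be built dynamically from the id list in the client. The following is a minimal DBI sketch, not the poster's actual code; the DSN and credentials are placeholders, while the table and column names follow the original post.]

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Placeholder connection parameters -- adjust DSN/user/password for your setup.
my $dbh = DBI->connect('dbi:Pg:dbname=mydb', '', '', { RaiseError => 1 });

my @id = (1001, 1002, 1003);           # ids computed by the application

# One "?" placeholder per id, e.g. "?,?,?" for three ids.
my $placeholders = join ',', ('?') x @id;

my $sth = $dbh->prepare(
    "select OID from artikelindex where id_artikel in ($placeholders)");
$sth->execute(@id);                    # binds each id to one placeholder

while (my $row = $sth->fetchrow_arrayref) {
    # process $row->[0] (the OID) here
}
$dbh->disconnect;
```

Note that the rows come back in whatever order the plan produces, not in the order of @id, so the application may need to re-match results by id.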
For what it's worth, I put 100k rows into a table on 8.0beta5 and selected 10k of them at a time. Doing each SELECT separately through the index took about 2.5s for the 10k SELECTs. Using a single IN query containing all the ids I wanted took less than a second.

Jeff

On Sat, 2005-01-01 at 17:34 -0800, Jeff Davis wrote:
> Well, first it would be a good idea to see what postgres is actually
> doing. Send the output of:
>
> => EXPLAIN ANALYZE SELECT OID FROM artikelindex WHERE id_artikel=?;
> [...]
> I use a bigger psql-table to store information and keep an id-value of

How big?

> each row in memory of my application for faster access.

Related to the previous question: are you sure there won't come a day when it no longer fits?

> My applications is able to calculate a list of needed id's in very short
> time and then wants to retrieve all rows corresponding to this id's.
> select field1,field2,field3 from mytable where id=XX;

Good for one id; see below.

> There is a index on the id-field and the id-field is of type OID, so
> everything should be quite fast. Unfortunately it is not.

Why not use integer (serial) instead of oid?

> On a 2GHz-machine with 1GB RAM it takes approx. 10seconds to fetch 10000
> rows. In this testscenario I only fetch the OID and no other col.

I think this is pretty fast. The machine runs 1K queries/s, each of which includes generating the query, passing it to postgres via a socket, parsing it, executing it, and returning one row via a socket. If you want faster results you'll have to get all your results in one query; only then can it get a lot faster (not to mention reducing your server load by a lot!).

> I dont understand this. Am I expecting far to much? Is 10seconds for the
> retrieval of 10000 OIDs a fine value? I want it to be less than one

For 10K queries, it's fast!

> I also tried to use the IN-operator, which is much more slower. Is there

That's what I'd advise you to use. You should find out why it's slow and make it fast. Why not post the EXPLAIN ANALYZE results for a SELECT * FROM thetable WHERE id_artikel IN (1000 values), for instance? What plan does it choose?

If all else fails, you can create a set-returning function which takes an array of ids as its parameter, loops over it, does a SELECT for each oid, and RETURN NEXTs each result; then you can process the whole result set in one query. But it will still be slower than a properly optimized IN query.
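[Editor's note: the set-returning-function fallback described above could be sketched as below. This is illustrative only: the function name fetch_by_ids is made up, and it assumes PostgreSQL 8.0's array support and the table from the original post.]

```sql
-- Illustrative sketch: loop over an array of ids, SELECT each one via the
-- index, and RETURN NEXT every matching row.
CREATE OR REPLACE FUNCTION fetch_by_ids(oid[]) RETURNS SETOF artikelindex AS $$
DECLARE
    rec artikelindex%ROWTYPE;
    i   integer;
BEGIN
    FOR i IN array_lower($1, 1) .. array_upper($1, 1) LOOP
        FOR rec IN SELECT * FROM artikelindex WHERE id_artikel = $1[i] LOOP
            RETURN NEXT rec;
        END LOOP;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

-- The whole result set then comes back from a single query:
-- SELECT * FROM fetch_by_ids(ARRAY[1001, 1002, 1003]::oid[]);
```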
> select field1,field2,field3 from mytable where id=XX;

For instance, on my machine, SELECT * FROM a 2M-row table WHERE id IN (a list of 500 values) takes 10 ms.