Thread: many similar index-based selects are extremely slow
psql8: I use a fairly large psql table to store information and keep an id value for each row in memory of my application for faster access. My application can calculate a list of needed ids in very short time and then wants to retrieve all rows corresponding to these ids.

So in fact I perform a lot of operations like:

select field1,field2,field3 from mytable where id=XX;

There is an index on the id field and the id field is of type OID, so everything should be quite fast. Unfortunately it is not. On a 2GHz machine with 1GB RAM it takes approx. 10 seconds to fetch 10000 rows. In this test scenario I only fetch the OID and no other column.

I don't understand this. Am I expecting far too much? Is 10 seconds for the retrieval of 10000 OIDs a reasonable value? I want it to be less than one second, and in my experience with postgres this operation is extremely slow compared to the impressive speed of most other operations.

I also tried the IN operator, which was even slower. Is there any other way to speed things up? I can order the list of ids to retrieve in my application if there is a way to tell psql not to search the whole index every time but somehow "do better".

In case it is of any interest, here is the table. The id is stored as id_artikel.
                            Table "public.artikelindex"
        Column         |            Type             |              Modifiers
-----------------------+-----------------------------+----------------------------------------------------
 autor_artikel         | text                        |
 titel_artikel         | text                        |
 jahrgang_zeitschrift  | integer                     |
 jahr_zeitschrift      | character varying(20)       |
 heftnummer            | character varying(30)       |
 seitenzahl_artikel    | character varying(30)       |
 bemerkungen_artikel   | text                        |
 deskriptoren_alt      | text                        |
 deskriptoren_neu      | text                        |
 personennamen_artikel | text                        |
 orte_artikel          | text                        |
 id_artikel            | oid                         |
 id_titel              | oid                         |
 cdate                 | timestamp without time zone | default ('now'::text)::timestamp(6) with time zone
 udate                 | timestamp without time zone | default ('now'::text)::timestamp(6) with time zone
 uid                   | oid                         |
 gid                   | oid                         |
 mod                   | boolean                     |
Indexes:
    "id_artikel_idx" btree (id_artikel)
    "id_titel_idx" btree (id_titel)
    "idx_artikelindeax_autor" btree (autor_artikel)
    "idx_artikelindex_fingerprint" btree (id_artikel)
    "idx_artikelindex_jahr" btree (jahrgang_zeitschrift)
    "idx_artikelindex_jahrgang" btree (jahr_zeitschrift)
    "idx_artikelindex_zeitschrift" btree (id_titel)
Rules:
    delete_val AS
    ON DELETE TO artikelindex DO UPDATE counter SET val = counter.val + 1
    WHERE counter.tab::text = 'artikelindex'::character varying::text
    insert_val AS
    ON INSERT TO artikelindex DO UPDATE counter SET val = counter.val + 1
    WHERE counter.tab::text = 'artikelindex'::character varying::text
    update_val AS
    ON UPDATE TO artikelindex DO UPDATE counter SET val = counter.val + 1
    WHERE counter.tab::text = 'artikelindex'::character varying::text

And more: here is my retrieval program. I use perl with the DBI module and the following code snippet:

--------------------------
my $sth=$dbh->prepare(
    'select OID from artikelindex where id_artikel=?');

foreach (@id) {
    my $ret=$sth->execute($_);
    my $x=$sth->fetchrow_arrayref;
}
-------------------------

thanks a lot for any ideas,
peter

--
mag. peter pilsl
goldfisch.at
IT-management
tel +43 699 1 3574035
fax +43 699 4 3574035
pilsl@goldfisch.at
Well, first it would be a good idea to see what postgres is actually doing. Send the output of:

=> EXPLAIN ANALYZE SELECT OID FROM artikelindex WHERE id_artikel=?;

(where ? is replaced by some id value)

Since it's just a select from one table, it will show either an index scan or a sequential scan. That will tell you whether the index is being used or not.

Next, if it is doing something that seems unreasonable, try:

=> VACUUM ANALYZE artikelindex;

And we should also look at the EXPLAIN output for the IN version of the query:

=> EXPLAIN ANALYZE SELECT OID FROM artikelindex WHERE id_artikel IN (?,?,...,?);

although I'm not sure what kind of performance to expect if you have 10k values in that list.

Also, which version are you using? How many rows are in the table? Is it regularly vacuumed? Are there a lot of updates/deletes going to that table?

I suspect that the fastest way postgres can get you those rows is the IN query on a recent version of postgresql. It may choose a sequential scan, which would likely be good, since one seq scan should take well under 10 seconds.

Regards,
Jeff Davis

On Sun, 2005-01-02 at 00:52 +0100, peter pilsl wrote:
> psql8: I use a bigger psql-table to store information and keep an
> id-value of each row in memory of my application for faster access.
> [...]
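[Editor's note: the single-IN-query form suggested above can be built dynamically from the id list in the client. The following is a minimal DBI sketch, not the poster's actual code; the DSN and credentials are placeholders, while the table and column names follow the original post.]

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Placeholder connection parameters -- adjust DSN/user/password for your setup.
my $dbh = DBI->connect('dbi:Pg:dbname=mydb', '', '', { RaiseError => 1 });

my @id = (1001, 1002, 1003);           # ids computed by the application

# One "?" placeholder per id, e.g. "?,?,?" for three ids.
my $placeholders = join ',', ('?') x @id;

my $sth = $dbh->prepare(
    "select OID from artikelindex where id_artikel in ($placeholders)");
$sth->execute(@id);                    # binds each id to one placeholder

while (my $row = $sth->fetchrow_arrayref) {
    # process $row->[0] (the OID) here
}
$dbh->disconnect;
```

Note that the rows come back in whatever order the plan produces, not in the order of @id, so the application may need to re-match results by id.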
For what it's worth, I put 100k rows into a table on 8.0beta5 and selected 10k of them at a time. Doing each SELECT separately through the index took about 2.5s for the 10k SELECTs. Using a single IN query containing all the ids I wanted took less than a second.

Jeff

On Sat, 2005-01-01 at 17:34 -0800, Jeff Davis wrote:
> Well, first it would be a good idea to see what postgres is actually
> doing. Send the output of:
>
> => EXPLAIN ANALYZE SELECT OID FROM artikelindex WHERE id_artikel=?;
> [...]
> I use a bigger psql-table to store information and keep an id-value of

How big?

> each row in memory of my application for faster access.

Related to the previous question: are you sure there won't come a day when it no longer fits?

> My applications is able to calculate a list of needed id's in very short
> time and then wants to retrieve all rows corresponding to this id's.
> select field1,field2,field3 from mytable where id=XX;

Good for one id; see below.

> There is a index on the id-field and the id-field is of type OID, so
> everything should be quite fast. Unfortunately it is not.

Why not use integer (serial) instead of oid?

> On a 2GHz-machine with 1GB RAM it takes approx. 10seconds to fetch 10000
> rows. In this testscenario I only fetch the OID and no other col.

I think this is pretty fast. The machine runs 1K queries/s, each of which includes generating the query, passing it to postgres via a socket, parsing it, executing it, and returning one row via a socket. If you want faster results you'll have to get all your results in one query; only then can it get a lot faster (not to mention reducing your server load by a lot!).

> I dont understand this. Am I expecting far to much? Is 10seconds for the
> retrieval of 10000 OIDs a fine value? I want it to be less than one

For 10K queries, it's fast!

> I also tried to use the IN-operator, which is much more slower. Is there

That's what I'd advise you to use. You should find out why it's slow and make it fast. Why not post the EXPLAIN ANALYZE results for a SELECT * FROM thetable WHERE id_artikel IN (1000 values), for instance? What plan does it choose?

If all else fails, you can create a set-returning function which takes an array of ids as its parameter, loops over it, does a SELECT for each oid, and RETURN NEXTs each result; then you can process the whole result set in one query. But it will still be slower than a properly optimized IN query.
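[Editor's note: the set-returning-function fallback described above could be sketched as below. This is illustrative only: the function name fetch_by_ids is made up, and it assumes PostgreSQL 8.0's array support and the table from the original post.]

```sql
-- Illustrative sketch: loop over an array of ids, SELECT each one via the
-- index, and RETURN NEXT every matching row.
CREATE OR REPLACE FUNCTION fetch_by_ids(oid[]) RETURNS SETOF artikelindex AS $$
DECLARE
    rec artikelindex%ROWTYPE;
    i   integer;
BEGIN
    FOR i IN array_lower($1, 1) .. array_upper($1, 1) LOOP
        FOR rec IN SELECT * FROM artikelindex WHERE id_artikel = $1[i] LOOP
            RETURN NEXT rec;
        END LOOP;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

-- The whole result set then comes back from a single query:
-- SELECT * FROM fetch_by_ids(ARRAY[1001, 1002, 1003]::oid[]);
```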
> select field1,field2,field3 from mytable where id=XX;

For instance, on my machine, SELECT * FROM a 2M-row table WHERE id IN (a list of 500 values) takes 10 ms.