Thread: Database tuning
Hello all,

I have a PostgreSQL database that is storing The Bible. It has 31,103
records in it and I have a PHP page executing this query:

    SELECT books.book_name, bible.chapter, bible.verse, bible.versetext
    FROM asv_bible bible, book_bible books
    WHERE bible.book = books.id
    ORDER BY random() LIMIT 1

The database schema is:

    /* --------------------------------------------------------
       Sequences
       -------------------------------------------------------- */
    CREATE SEQUENCE "book_bible_seq" start 1 increment 1
        maxvalue 2147483647 minvalue 1 cache 1;

    /* --------------------------------------------------------
       Table structure for table "asv_bible"
       -------------------------------------------------------- */
    CREATE TABLE "asv_bible" (
        "id" int8 NOT NULL,
        "book" int8,
        "chapter" int8,
        "verse" int8,
        "versetext" text,
        CONSTRAINT "asv_bible_pkey" PRIMARY KEY ("id")
    );

    /* --------------------------------------------------------
       Table structure for table "book_bible"
       -------------------------------------------------------- */
    CREATE TABLE "book_bible" (
        "id" int4 DEFAULT nextval('book_bible_seq'::text) NOT NULL,
        "book_name" varchar(20),
        CONSTRAINT "book_bible_pkey" PRIMARY KEY ("id")
    );

Right now it takes 9 seconds to return the results. I don't think it has
anything to do with the language executing it, as I have run the same
query via Perl with the same result.

I was wondering if anyone could offer any help with lowering the time it
takes to run? Am I optimizing the database correctly? Or, if this is the
normal runtime for a database of this size, I'd just like confirmation.

My hardware is a Pentium II 400 with 128MB of RAM.

Thanks,
Kevin
Hello,

The problem is not the database, but your query. You are asking your poor
Pentium II to sort "your" entire Bible into a random order, and then you
only retrieve the first entry!

If you actually want a uniformly chosen random verse from The Bible, and
you actually CARE about the quality of your randomness, you can use your
id field. Am I right that each verse has a unique ID? If so, you can index
that table by id, and then:

    SELECT ... WHERE bible.book = books.id AND bible.id = XXX;

XXX is a random value ranging from 0 (or 1) to the maximum id value. You
generate it outside the database.

You can also try this (though I am not at all certain that it works):

    SELECT ... WHERE bible.book = books.id AND bible.id = (MAX * random());

But you still need to know MAX. Both approaches require the certainty that
every ID in that range has an associated verse. You can even store the
maximum in another table, if you do not want to track it in your
application, and then issue:

    SELECT ... WHERE bible.book = books.id
               AND bible.id = (SELECT maxval * random() FROM counttable);

I hope that helps.

Anyway, before trying any query on a big database, an EXPLAIN should be
helpful/meaningful. If you see a Seq Scan on a very long table, that is
certainly BAD.

Standard question: did you VACUUM ANALYZE? If not, do it.

Good luck!

Antonio

K Old wrote:
> I have a PostgreSQL database that is storing The Bible. It has 31,103
> records in it and I have a PHP page executing this query:
>
>     SELECT books.book_name, bible.chapter, bible.verse, bible.versetext
>     FROM asv_bible bible, book_bible books
>     WHERE bible.book = books.id
>     ORDER BY random() LIMIT 1
>
> Right now it takes 9 seconds to return the results.
> [schema snipped]
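A minimal sketch of the indexed lookup Antonio describes, assuming the ids
run 1..31103 (Kevin's reported row count) with no gaps. The primary key
already gives an index on asv_bible.id, so no extra index is needed. The
literal 12345 is a placeholder for the random integer the client generates.
Note that random() returns a float, so Antonio's all-in-SQL variant needs a
cast to have any chance of matching an integer id:

    -- Per request: the client picks a random integer in [1, 31103]
    -- and substitutes it for 12345 below; the lookup then goes
    -- through the asv_bible_pkey index.
    SELECT books.book_name, bible.chapter, bible.verse, bible.versetext
    FROM asv_bible bible, book_bible books
    WHERE bible.book = books.id
      AND bible.id = 12345;

    -- The counttable variant with an explicit cast; without ::int8 the
    -- float produced by random() would essentially never equal an id.
    -- (The cast rounds, so the result lands in 0..maxval.)
    SELECT books.book_name, bible.chapter, bible.verse, bible.versetext
    FROM asv_bible bible, book_bible books
    WHERE bible.book = books.id
      AND bible.id = (SELECT (maxval * random())::int8 FROM counttable);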
> SELECT books.book_name, bible.chapter, bible.verse, bible.versetext
> FROM asv_bible bible, book_bible books
> WHERE bible.book = books.id
> ORDER BY random() LIMIT 1
>
> I was wondering if anyone could offer any help with lowering the time
> it takes to run?

I didn't see anything in your schema about an index on asv_bible.book
(book_bible.id is a primary key, so an index is created for it
implicitly). Without an index on the book column, the join is going to
take some time. I would put an index on that column and do a VACUUM
ANALYZE. You should see a huge speedup from that alone.

Greg
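Concretely, Greg's suggestion amounts to two statements (the index name
here is arbitrary):

    -- Index the join column so the planner can use an index scan
    -- instead of a sequential scan over asv_bible.
    CREATE INDEX asv_bible_book_idx ON asv_bible (book);

    -- Refresh the planner's statistics so it knows the index is worth
    -- using, and reclaim any dead rows.
    VACUUM ANALYZE;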
Hello, Antonio Fiol Bonnin!

On Fri, Dec 21, 2001 at 06:41:41PM +0100, you wrote:
> XXX is a random value ranging from 0 (or 1) to the maximum id value.
> You generate it outside the database.

The maximum id is not a good solution, because some of the ids between 0
(or 1) and the maximum may be missing for some reason. You should:

    SELECT id FROM table;

Then choose a random one outside SQL, and then:

    SELECT * FROM table WHERE table.id = '$random_id';

This would be slower, but it will work for sure.

Another approach: pick a random id outside SQL, from 0 (or 1) to the
maximum id, and then keep retrying the SELECT with
WHERE table.id = '$random_id' until you get more than 0 rows back.

--
NEVE-RIPE
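A sketch of that fetch-then-pick approach against Kevin's schema, with the
client-side step described in comments; $random_id follows the thread's own
placeholder convention and stands for whatever value the client substitutes:

    -- Step 1: pull the full list of ids. This is a seq scan, but it
    -- returns only one narrow column.
    SELECT id FROM asv_bible;

    -- Step 2 (client side, outside SQL): pick one element of that
    -- result set at random.

    -- Step 3: fetch exactly that verse through the primary-key index.
    SELECT books.book_name, bible.chapter, bible.verse, bible.versetext
    FROM asv_bible bible, book_bible books
    WHERE bible.book = books.id
      AND bible.id = $random_id;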
>> XXX is a random value ranging from 0 (or 1) to the maximum id value.
>> You generate it outside the database.
>
> The maximum id is not a good solution, because some of the ids between
> 0 (or 1) and the maximum may be missing for some reason. You should:
>
>     SELECT id FROM table;

That is a Seq Scan. Avoid it for big tables.

> Then choose a random one outside SQL, and then:
>
>     SELECT * FROM table WHERE table.id = '$random_id';
>
> This would be slower, but it will work for sure.

Probably even slower than the original. Not sure, though.

> Another approach: pick a random id outside SQL, from 0 (or 1) to the
> maximum id, and then keep retrying the SELECT with
> WHERE table.id = '$random_id' until you get more than 0 rows back.

The only drawback I see to that approach is that it may not find a
result in finite time.

Better: "compress" your table so that it holds contiguous IDs with no
gaps. This may be painful and slow, but it will pay off in the end if
your table is not updated often, since the operation only needs to be
done once.

I guess The Bible is not updated very often. ;-)

Antonio
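One hedged way to do that compression, assuming nothing else references
asv_bible.id (true of the posted schema) and the table can be taken
offline briefly. The sequence name is illustrative, and the 1000000
offset is just any constant larger than the current maximum id; the
shift is needed because assigning 1..N directly could collide with a
not-yet-updated row and trip the primary key's unique index:

    CREATE SEQUENCE renumber_seq START 1;

    -- Move every id above the old range first, so the second UPDATE
    -- cannot produce a duplicate key mid-statement.
    UPDATE asv_bible SET id = id + 1000000;

    -- Now hand out gap-free ids 1..N.
    UPDATE asv_bible SET id = nextval('renumber_seq');

    DROP SEQUENCE renumber_seq;

    -- The updates leave dead tuples behind; clean up and refresh stats.
    VACUUM ANALYZE asv_bible;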
Hello, Antonio Fiol Bonnin!

On Fri, Dec 21, 2001 at 09:11:41PM +0100, you wrote:
>> Then choose a random one outside SQL, and then:
>>
>>     SELECT * FROM table WHERE table.id = '$random_id';
>>
>> This would be slower, but it will work for sure.
>
> Probably even slower than the original. Not sure, though.

No, it should be faster than 9 seconds even if the row count is very
big :) I just verified it on 100,000 records: about 1 second on my
PIII-800/512MB, using Perl as the frontend.

>> Another approach: pick a random id outside SQL, from 0 (or 1) to the
>> maximum id, and then keep retrying the SELECT until you get more
>> than 0 rows back.
>
> The only drawback I see to that approach is that it may not find a
> result in finite time.

If the IDs in the table are pretty "compressed" (c) you, as they are in
our case, it will work in VERY finite time :>

> Better: "compress" your table so that it holds contiguous IDs with no
> gaps. This may be painful and slow, but it will pay off in the end if
> your table is not updated often, since the operation only needs to be
> done once.

Yes, though I agree it will be _very_ painful if you have a lot of
crosslinks between tables.

> I guess The Bible is not updated very often. ;-)

We should never assume this. We don't know whether God is preparing a
new release or has just abandoned this book; maybe He has CVS
somewhere...

--
NEVE-RIPE
Antonio Fiol Bonnín <fiol@w3ping.com> writes:
> Better: "compress" your table so that it holds contiguous IDs.

It's not really necessary to do that. Assuming that you know the range
of IDs, you could do

    SELECT ... WHERE id >= (random() * MAX)
    ORDER BY id LIMIT 1;

(where you actually need to do the random() calculation on the client
side so you can send a constant in the query; or else cheat by using a
user-defined function that's marked "iscachable". Search for
"iscachable" in the mailing list archives to see more about that fine
point.)

This should produce an indexscan plan that starts scanning at
(random() * MAX) and stops as soon as it's got the first tuple.

If you have wide gaps in the ID sequence, then the items just after
each gap will be disproportionately likely to be chosen by this
approach, so it's not perfect. But minor irregularity in the sequence
of IDs can be tolerated this way.

			regards, tom lane
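A sketch of the "iscachable" cheat Tom mentions, written against the
7.x-era CREATE FUNCTION syntax. The function name and the hard-coded
maximum (31103, Kevin's row count, assuming ids run roughly 1..31103)
are illustrative, not anything from the thread:

    -- Marking the function "iscachable" tells the planner it may fold
    -- the call to a constant, so the comparison below can anchor an
    -- index scan. That is a deliberate lie (random() is not cachable),
    -- which is exactly the trick: the value is frozen once per query.
    CREATE FUNCTION random_verse_id() RETURNS int8 AS
        'SELECT (random() * 31103)::int8'
    LANGUAGE 'sql' WITH (iscachable);

    SELECT books.book_name, bible.chapter, bible.verse, bible.versetext
    FROM asv_bible bible, book_bible books
    WHERE bible.book = books.id
      AND bible.id >= random_verse_id()
    ORDER BY bible.id
    LIMIT 1;

As Tom notes, verses sitting just after an id gap are still slightly
over-represented with this scheme; for gap-free ids the draw is uniform.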