Thread: Database tuning
Hello all,

I have a PostgreSQL database that is storing The Bible. It has 31,103
records in it and I have a PHP page executing this query:

    SELECT books.book_name, bible.chapter, bible.verse, bible.versetext
    FROM asv_bible bible, book_bible books
    WHERE bible.book = books.id
    ORDER BY random() LIMIT 1

The database schema is:

    /* --------------------------------------------------------
       Sequences
       -------------------------------------------------------- */
    CREATE SEQUENCE "book_bible_seq" start 1 increment 1
        maxvalue 2147483647 minvalue 1 cache 1;

    /* --------------------------------------------------------
       Table structure for table "asv_bible"
       -------------------------------------------------------- */
    CREATE TABLE "asv_bible" (
        "id" int8 NOT NULL,
        "book" int8,
        "chapter" int8,
        "verse" int8,
        "versetext" text,
        CONSTRAINT "asv_bible_pkey" PRIMARY KEY ("id")
    );

    /* --------------------------------------------------------
       Table structure for table "book_bible"
       -------------------------------------------------------- */
    CREATE TABLE "book_bible" (
        "id" int4 DEFAULT nextval('book_bible_seq'::text) NOT NULL,
        "book_name" varchar(20),
        CONSTRAINT "book_bible_pkey" PRIMARY KEY ("id")
    );

Right now it takes 9 seconds to return the results. I don't think it has
anything to do with the language executing it, as I have run the same
query via Perl with the same result.

I was wondering if anyone could offer any help with lowering the time it
takes to run? Am I optimizing the database correctly? Or, if this is the
normal runtime for a database of this size, I'd just like confirmation.

My hardware is a Pentium II 400 with 128MB of RAM.

Thanks,
Kevin
Hello,

The problem is not the database, but your query. You are asking your poor
Pentium II to sort "your" entire Bible into a random order, and then you
only retrieve the first entry!

If you actually want a uniformly chosen random verse from The Bible, and
you actually CARE about the quality of your randomness, you can use your
id field. Am I right that each verse has a unique ID? If so, you can index
that table by id, and then:

    SELECT ... WHERE bible.book = books.id AND bible.id = XXX;

XXX is a random value ranging from 0 (or 1) to the maximum id value. You
generate it outside the database.

You can also try this (though I am not at all certain that it works):

    SELECT ... WHERE bible.book = books.id AND bible.id = (MAX * random());

But you still need to know MAX. Both approaches require the certainty that
every ID in that range has an associated verse. You can even store the
maximum in another table, if you do not want to track it in your
application, and then issue:

    SELECT ... WHERE bible.book = books.id
               AND bible.id = (SELECT maxval * random() FROM counttable);

I hope that helps.

Anyway, before trying any query on a big database, an EXPLAIN should be
helpful/meaningful. If you see a Seq Scan on a very long table, that is
certainly BAD.

Standard question: did you VACUUM ANALYZE? If not, do it.

Good luck!

Antonio

K Old wrote:
> I have a PostgreSQL database that is storing The Bible. It has 31,103
> records in it and I have a PHP page executing this query:
>
>     SELECT books.book_name, bible.chapter, bible.verse, bible.versetext
>     FROM asv_bible bible, book_bible books
>     WHERE bible.book = books.id
>     ORDER BY random() LIMIT 1
>
> Right now it takes 9 seconds to return the results.
> [schema snipped]
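A minimal sketch of the indexed lookup Antonio describes, assuming the ids
run 1..31103 (Kevin's reported row count) with no gaps. The primary key
already gives an index on asv_bible.id, so no extra index is needed. The
literal 12345 is a placeholder for the random integer the client generates.
Note that random() returns a float, so Antonio's all-in-SQL variant needs a
cast to have any chance of matching an integer id:

    -- Per request: the client picks a random integer in [1, 31103]
    -- and substitutes it for 12345 below; the lookup then goes
    -- through the asv_bible_pkey index.
    SELECT books.book_name, bible.chapter, bible.verse, bible.versetext
    FROM asv_bible bible, book_bible books
    WHERE bible.book = books.id
      AND bible.id = 12345;

    -- The counttable variant with an explicit cast; without ::int8 the
    -- float produced by random() would essentially never equal an id.
    -- (The cast rounds, so the result lands in 0..maxval.)
    SELECT books.book_name, bible.chapter, bible.verse, bible.versetext
    FROM asv_bible bible, book_bible books
    WHERE bible.book = books.id
      AND bible.id = (SELECT (maxval * random())::int8 FROM counttable);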
> SELECT books.book_name, bible.chapter, bible.verse, bible.versetext
> FROM asv_bible bible, book_bible books
> WHERE bible.book = books.id
> ORDER BY random() LIMIT 1
>
> I was wondering if anyone could offer any help with lowering the time
> it takes to run?

I didn't see anything in your schema about an index on asv_bible.book
(book_bible.id is a primary key, so an index is created for it
implicitly). Without an index on the book column, the join is going to
take some time. I would put an index on that column and do a VACUUM
ANALYZE. You should see a huge speedup from that alone.

Greg
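Concretely, Greg's suggestion amounts to two statements (the index name
here is arbitrary):

    -- Index the join column so the planner can use an index scan
    -- instead of a sequential scan over asv_bible.
    CREATE INDEX asv_bible_book_idx ON asv_bible (book);

    -- Refresh the planner's statistics so it knows the index is worth
    -- using, and reclaim any dead rows.
    VACUUM ANALYZE;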
Hello, Antonio Fiol Bonnin!

On Fri, Dec 21, 2001 at 06:41:41PM +0100, you wrote:
> XXX is a random value ranging from 0 (or 1) to the maximum id value.
> You generate it outside the database.

The maximum id is not a good solution, because some of the ids between 0
(or 1) and the maximum may be missing for some reason. You should:

    SELECT id FROM table;

Then choose a random one outside SQL, and then:

    SELECT * FROM table WHERE table.id = '$random_id';

This would be slower, but it will work for sure.

Another approach: pick a random id outside SQL, from 0 (or 1) to the
maximum id, and then keep retrying the SELECT with
WHERE table.id = '$random_id' until you get more than 0 rows back.

--
NEVE-RIPE
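A sketch of that fetch-then-pick approach against Kevin's schema, with the
client-side step described in comments; $random_id follows the thread's own
placeholder convention and stands for whatever value the client substitutes:

    -- Step 1: pull the full list of ids. This is a seq scan, but it
    -- returns only one narrow column.
    SELECT id FROM asv_bible;

    -- Step 2 (client side, outside SQL): pick one element of that
    -- result set at random.

    -- Step 3: fetch exactly that verse through the primary-key index.
    SELECT books.book_name, bible.chapter, bible.verse, bible.versetext
    FROM asv_bible bible, book_bible books
    WHERE bible.book = books.id
      AND bible.id = $random_id;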
>> XXX is a random value ranging from 0 (or 1) to the maximum id value.
>> You generate it outside the database.
>
> The maximum id is not a good solution, because some of the ids between
> 0 (or 1) and the maximum may be missing for some reason. You should:
>
>     SELECT id FROM table;

That is a Seq Scan. Avoid it for big tables.

> Then choose a random one outside SQL, and then:
>
>     SELECT * FROM table WHERE table.id = '$random_id';
>
> This would be slower, but it will work for sure.

Probably even slower than the original. Not sure, though.

> Another approach: pick a random id outside SQL, from 0 (or 1) to the
> maximum id, and then keep retrying the SELECT with
> WHERE table.id = '$random_id' until you get more than 0 rows back.

The only drawback I see to that approach is that it may not find a
result in finite time.

Better: "compress" your table so that it holds contiguous IDs with no
gaps. This may be painful and slow, but it will pay off in the end if
your table is not updated often, since the operation only needs to be
done once.

I guess The Bible is not updated very often. ;-)

Antonio
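One hedged way to do that compression, assuming nothing else references
asv_bible.id (true of the posted schema) and the table can be taken
offline briefly. The sequence name is illustrative, and the 1000000
offset is just any constant larger than the current maximum id; the
shift is needed because assigning 1..N directly could collide with a
not-yet-updated row and trip the primary key's unique index:

    CREATE SEQUENCE renumber_seq START 1;

    -- Move every id above the old range first, so the second UPDATE
    -- cannot produce a duplicate key mid-statement.
    UPDATE asv_bible SET id = id + 1000000;

    -- Now hand out gap-free ids 1..N.
    UPDATE asv_bible SET id = nextval('renumber_seq');

    DROP SEQUENCE renumber_seq;

    -- The updates leave dead tuples behind; clean up and refresh stats.
    VACUUM ANALYZE asv_bible;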
Hello, Antonio Fiol Bonnin!

On Fri, Dec 21, 2001 at 09:11:41PM +0100, you wrote:
>> Then choose a random one outside SQL, and then:
>>
>>     SELECT * FROM table WHERE table.id = '$random_id';
>>
>> This would be slower, but it will work for sure.
>
> Probably even slower than the original. Not sure, though.

No, it should be faster than 9 seconds even if the row count is very
big :) I just verified it on 100,000 records: about 1 second on my
PIII-800/512MB, using Perl as the frontend.

>> Another approach: pick a random id outside SQL, from 0 (or 1) to the
>> maximum id, and then keep retrying the SELECT until you get more
>> than 0 rows back.
>
> The only drawback I see to that approach is that it may not find a
> result in finite time.

If the IDs in the table are pretty "compressed" (c) you, as they are in
our case, it will work in VERY finite time :>

> Better: "compress" your table so that it holds contiguous IDs with no
> gaps. This may be painful and slow, but it will pay off in the end if
> your table is not updated often, since the operation only needs to be
> done once.

Yes, though I agree it will be _very_ painful if you have a lot of
crosslinks between tables.

> I guess The Bible is not updated very often. ;-)

We should never assume this. We don't know whether God is preparing a
new release or has just abandoned this book; maybe He has CVS
somewhere...

--
NEVE-RIPE
Antonio Fiol Bonnín <fiol@w3ping.com> writes:
> Better: "compress" your table so that it holds contiguous IDs.

It's not really necessary to do that. Assuming that you know the range
of IDs, you could do

    SELECT ... WHERE id >= (random() * MAX)
    ORDER BY id LIMIT 1;

(where you actually need to do the random() calculation on the client
side so you can send a constant in the query; or else cheat by using a
user-defined function that's marked "iscachable". Search for
"iscachable" in the mailing list archives to see more about that fine
point.)

This should produce an indexscan plan that starts scanning at
(random() * MAX) and stops as soon as it's got the first tuple.

If you have wide gaps in the ID sequence, then the items just after
each gap will be disproportionately likely to be chosen by this
approach, so it's not perfect. But minor irregularity in the sequence
of IDs can be tolerated this way.

			regards, tom lane
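A sketch of the "iscachable" cheat Tom mentions, written against the
7.x-era CREATE FUNCTION syntax. The function name and the hard-coded
maximum (31103, Kevin's row count, assuming ids run roughly 1..31103)
are illustrative, not anything from the thread:

    -- Marking the function "iscachable" tells the planner it may fold
    -- the call to a constant, so the comparison below can anchor an
    -- index scan. That is a deliberate lie (random() is not cachable),
    -- which is exactly the trick: the value is frozen once per query.
    CREATE FUNCTION random_verse_id() RETURNS int8 AS
        'SELECT (random() * 31103)::int8'
    LANGUAGE 'sql' WITH (iscachable);

    SELECT books.book_name, bible.chapter, bible.verse, bible.versetext
    FROM asv_bible bible, book_bible books
    WHERE bible.book = books.id
      AND bible.id >= random_verse_id()
    ORDER BY bible.id
    LIMIT 1;

As Tom notes, verses sitting just after an id gap are still slightly
over-represented with this scheme; for gap-free ids the draw is uniform.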