Thread: Performance hit on loading from HD

From: Vitaly Belman
I have a problem with the performance of certain queries. The trouble is
that while their execution plan is pretty good, and their execution is
usually great as well, their FIRST execution (that is, right after the
database is mounted) is abysmally slow.

I realize this happens because the data has to be loaded from the HD
into memory/swap, and it wouldn't be too bad if I could just make the
data stay in memory. Sadly, after a few minutes the data is back on the
HD, and running the query again results in the same bad performance.

Here's a query for example, though as I said, this problem occurs in
different queries.

---------------------------------------------------------------------------------------

  SELECT *
    FROM bv_bookgenres, bv_books
   WHERE bv_books.book_id = bv_bookgenres.book_id and genre_id = 987
ORDER BY vote_avg LIMIT 10

---------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------

QUERY PLAN
Limit  (cost=2601.16..2601.18 rows=10 width=193) (actual
time=4735.097..4735.107 rows=10 loops=1)
  ->  Sort  (cost=2601.16..2601.70 rows=219 width=193) (actual
time=4735.092..4735.095 rows=10 loops=1)
        Sort Key: bv_books.vote_avg
        ->  Nested Loop  (cost=0.00..2592.64 rows=219 width=193)
(actual time=74.615..4719.147 rows=1877 loops=1)
              ->  Index Scan using i_bookgenres_genre_id on
bv_bookgenres  (cost=0.00..1707.03 rows=218 width=4) (actual
time=74.540..2865.366 rows=1877 loops=1)
                    Index Cond: (genre_id = 987)
              ->  Index Scan using bv_books_pkey on bv_books
(cost=0.00..4.05 rows=1 width=193) (actual time=0.968..0.971 rows=1
loops=1877)
                    Index Cond: (bv_books.book_id = "outer".book_id)
Total runtime: 4735.726 ms

---------------------------------------------------------------------------------------

If I run the query again right after it has finished, I get the
following timing:

---------------------------------------------------------------------------------------

Limit  (cost=3937.82..3937.84 rows=10 width=204)
  ->  Sort  (cost=3937.82..3938.38 rows=223 width=204)
        Sort Key: bv_books.vote_avg
        ->  Nested Loop  (cost=0.00..3929.12 rows=223 width=204)
              ->  Index Scan using i_bookgenres_genre_id on
bv_bookgenres  (cost=0.00..1731.94 rows=222 width=8)
                    Index Cond: (genre_id = 987)
              ->  Index Scan using bv_books_pkey on bv_books
(cost=0.00..9.88 rows=1 width=196)
                    Index Cond: (bv_books.book_id = "outer".book_id)

---------------------------------------------------------------------------------------

Before going on, I should say that I am running PostgreSQL on CoLinux
under Windows 2000. From what I have read and tested, CoLinux
performance is comparable to that of VMware. Still, I'm wondering
whether this is a side effect of my development setup or whether some of
my settings are indeed wrong.

With that said, here is the information of the tables:

---------------------------------------------------------------------------------------

CREATE TABLE bv_books
(
  book_id serial NOT NULL,
  book_name varchar(255) NOT NULL,
  series_id int4,
  series_index int2,
  annotation_desc_id int4,
  description_desc_id int4,
  book_picture varchar(255) NOT NULL,
  reviews_error int4 NOT NULL,
  vote_avg float4 NOT NULL,
  vote_count int4 NOT NULL,
  book_genre int4[],
  book_name_fulltext tsearch2.tsvector,
  book_name_fulltext2 tsearch2.tsvector,
  CONSTRAINT bv_books_pkey PRIMARY KEY (book_id),
  CONSTRAINT fk_books_annotation_desc_id FOREIGN KEY
(annotation_desc_id) REFERENCES bv_descriptions (description_id) ON
UPDATE RESTRICT ON DELETE SET NULL,
  CONSTRAINT fk_books_description_desc_id FOREIGN KEY
(description_desc_id) REFERENCES bv_descriptions (description_id) ON
UPDATE RESTRICT ON DELETE SET NULL,
  CONSTRAINT fk_books_series_id FOREIGN KEY (series_id) REFERENCES
bv_series (series_id) ON UPDATE RESTRICT ON DELETE RESTRICT
)
WITH OIDS;

CREATE TABLE bv_bookgenres
(
  book_id int4 NOT NULL,
  genre_id int4 NOT NULL,
  CONSTRAINT bv_bookgenres_pkey PRIMARY KEY (book_id, genre_id),
  CONSTRAINT fk_bookgenres_book_id FOREIGN KEY (book_id) REFERENCES
bv_books (book_id) ON UPDATE RESTRICT ON DELETE CASCADE,
  CONSTRAINT fk_bookgenres_genre_id FOREIGN KEY (genre_id) REFERENCES
bv_genres (genre_id) ON UPDATE RESTRICT ON DELETE RESTRICT
)
WITH OIDS;

---------------------------------------------------------------------------------------

As far as the data is concerned, there are around 170,000 rows in
bv_books and 940,000 in bv_bookgenres. There are also btree indexes on
all the fields relevant to the query.
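For reference, the genre index the planner uses above would presumably have been created with something like the following. Only the name i_bookgenres_genre_id appears in the plan, so the exact statement is an assumption:

```sql
-- Assumed definition; only the index name is visible in the EXPLAIN output.
CREATE INDEX i_bookgenres_genre_id ON bv_bookgenres USING btree (genre_id);
```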

I can live with the fact that the data has to be loaded the first time
it is accessed, but is it possible to make it stay in memory longer? Is
it because CoLinux gets only 128MB of RAM? Or should one of my settings
be fixed?

Thanks

Re: Performance hit on loading from HD

From: Josh Berkus
Vitaly,

> I have a problem with the performance of certain queries. The trouble
> is that while their execution plan is pretty good, and their execution
> is usually great as well, their FIRST execution (that is, right after
> the database is mounted) is abysmally slow.

This is a well-known problem.  The general approach is to run a script
that does SELECT * queries against all the important tables at system
start-up.
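A minimal sketch of such a warm-up script, using the two table names from this thread (the table list and file name are assumptions for illustration):

```shell
# Hypothetical cache-warming script: touch every important table once at
# startup so its pages get pulled into the OS cache / shared buffers.
TABLES="bv_books bv_bookgenres"

# Generate one sequential-scan query per table and save them to a file.
for t in $TABLES; do
  echo "SELECT count(*) FROM $t;"
done | tee warmup.sql
```

In a real setup you would then feed the generated file to the server, e.g. `psql -d yourdb -f warmup.sql`, from an init script that runs after PostgreSQL starts.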

> I realize this happens because the data has to be loaded from the HD
> into memory/swap, and it wouldn't be too bad if I could just make the
> data stay in memory. Sadly, after a few minutes the data is back on
> the HD, and running the query again results in the same bad
> performance.

This could be for a variety of reasons.  On a standard platform (which yours
most definitely is not), this would be due to database vacuuming, commits of
large updates to your data, or another application using most of the system
memory.

> Before going on, I should say that I am running PostgreSQL on CoLinux
> under Windows 2000. From what I have read and tested, CoLinux
> performance is comparable to that of VMware. Still, I'm wondering
> whether this is a side effect of my development setup or whether some
> of my settings are indeed wrong.

You will probably continue to get worse-than-normal performance from
both.  You simply can't expect full performance from PostgreSQL running
in an emulation environment.  If you could, we wouldn't have bothered
with a Windows port.
Speaking of which, have you started testing the Windows port?  I'd be
interested in your comparison of it against running on CoLinux.

> I can live with the fact that the data has to be loaded the first
> time it is accessed, but is it possible to make it stay in memory
> longer? Is it because CoLinux gets only 128MB of RAM? Or should one
> of my settings be fixed?

Well, mostly it's that you should start testing 8.0 and the Windows
port.  Not only should running natively be better, but 8.0 (thanks to
the work of Jan Wieck) can now take advantage of a large chunk of
dedicated memory, which earlier versions could not.  Also, the "lazy
vacuum" and "background writer" features of 8.0, also Jan's work, should
keep PostgreSQL from completely flushing its own cache.  You should test
this, *particularly* on Windows, where we could use some more
performance testing.
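To illustrate the dedicated-memory point, taking advantage of it in 8.0 mostly comes down to raising the buffer settings in postgresql.conf. The values below are illustrative assumptions for a machine with a few hundred MB to spare, not recommendations from this thread:

```
# Illustrative postgresql.conf fragment (8.0-era, values in 8KB pages).
# These numbers are assumptions -- tune them to your actual RAM.
shared_buffers = 16384          # ~128MB of dedicated buffer cache
effective_cache_size = 32768    # planner hint: ~256MB of OS-level cache
```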

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco