Home > mailing lists

Re: PostgreSQL caching - Mailing list pgsql-performance

From	Vitaly Belman
Subject	Re: PostgreSQL caching
Date	May 25, 2004 19:53:13
Msg-id	1371146711781.20040525225305@012.net.il Whole thread Raw
In response to	Re: PostgreSQL caching (Jochem van Dieten <jochemd@oli.tudelft.nl>)
Responses	Re: PostgreSQL caching
List	pgsql-performance

Tree view

Hello Jochem and Marty,

I guess I should have posted the table structure before =(:

Table structure + Indexes
-------------------------

CREATE TABLE public.bv_books
(
  book_id serial NOT NULL,
  book_title varchar(255) NOT NULL,
  series_id int4,
  series_index int2,
  annotation_desc_id int4,
  description_desc_id int4,
  book_picture varchar(255) NOT NULL,
  vote_avg float4 NOT NULL,
  vote_count int4 NOT NULL,
  CONSTRAINT bv_books_pkey PRIMARY KEY (book_id)
) WITH OIDS;

CREATE INDEX i_books_vote_avg
  ON public.bv_books
  USING btree
  (vote_avg);

CREATE INDEX i_books_vote_count
  ON public.bv_books
  USING btree
  (vote_count);

-------------------------

CREATE TABLE public.bv_bookgenres
(
  book_id int4 NOT NULL,
  genre_id int4 NOT NULL,
  CONSTRAINT bv_bookgenres_pkey PRIMARY KEY (book_id, genre_id),
  CONSTRAINT fk_bookgenres_book_id FOREIGN KEY (book_id) REFERENCES public.bv_books (book_id) ON UPDATE RESTRICT ON
DELETERESTRICT 
) WITH OIDS;

CREATE INDEX i_bookgenres_book_id
  ON public.bv_bookgenres
  USING btree
  (book_id);

CREATE INDEX i_bookgenres_genre_id
  ON public.bv_bookgenres
  USING btree
  (genre_id);
-------------------------

MS> I didn't see the table structure, but I assume that the vote_avg and
MS> vote_count fields are in bv_bookgenres.  If no fields are actually
MS> needed from bv_bookgenres, then the query might be constructed in a way
MS> that only the index would be read, without loading any row data.

I didn't understand you. vote_avg is stored in bv_books.. So yes, the
only thing I need from bv_bookgenres is the id of the book, but I can't
store this info in bv_books because there is N to N relationship
between them - every book can belong to a number of genres... If
that's what you meant.

MS> I think that you mentioned this was for a web app.  Do you actually have
MS> a web page that displays 2000 rows of data?

Well.. It is all "paginated", you can access 2000 items of the data
(as there are actually 2000 books in the genre) but you only see 10
items at a time.. I mean, probably no one would go over the 2000
books, but I can't just hide them =\.

JvD> Presuming that vote_avg is a field in the table bv_bookgenres,
JvD> try a composite index on genre_id and vote_avg and then see if
JvD> you can use the limit clause to reduce the number of loop
JvD> iterations from 1993 to 10.

I'm afraid your idea is invalid in my case =\... Naturally I could
eventually do data coupling to gain perforemnce boost if this issue
will not be solved in other ways. I'll keep your idea in mind anyway,
thanks.

Once again thanks for you feedback.

Regards,
 Vitaly Belman

 ICQ: 1912453
 AIM: VitalyB1984
 MSN: tmdagent@hotmail.com
 Yahoo!: VitalyBe

Tuesday, May 25, 2004, 6:37:44 PM, you wrote:

JvD> Vitaly Belman wrote:
>>
>> If you'll be so kind though, I'd be glad if you could spot anything to
>> speed up in this query. Here's the query and its plan that happens
>> without any caching:
>>
>> -------------------------------------------------------------------------------------------------------------
>> QUERY
>> -----
>> SELECT     bv_books. * ,
>>            vote_avg,
>>            vote_count
>> FROM       bv_bookgenres,
>>            bv_books
>> WHERE      bv_books.book_id = bv_bookgenres.book_id AND
>>            bv_bookgenres.genre_id = 5830
>> ORDER BY   vote_avg DESC LIMIT 10 OFFSET 0;
>>
>> QUERY PLAN
>> ----------
>> Limit  (cost=2337.41..2337.43 rows=10 width=76) (actual
>> time=7875.000..7875.000 rows=10 loops=1)
>>   ->  Sort  (cost=2337.41..2337.94 rows=214 width=76) (actual
>> time=7875.000..7875.000 rows=10 loops=1)
>>         Sort Key: bv_books.vote_avg
>>         ->  Nested Loop  (cost=0.00..2329.13 rows=214 width=76)
>> (actual time=16.000..7844.000 rows=1993 loops=1)
>>               ->  Index Scan using i_bookgenres_genre_id on
>> bv_bookgenres  (cost=0.00..1681.54 rows=214 width=4) (actual
>> time=16.000..3585.000 rows=1993 loops=1)
>>                     Index Cond: (genre_id = 5830)
>>               ->  Index Scan using bv_books_pkey on bv_books
>> (cost=0.00..3.01 rows=1 width=76) (actual time=2.137..2.137 rows=1
>> loops=1993)
>>                     Index Cond: (bv_books.book_id = "outer".book_id)
>> Total runtime: 7875.000 ms

JvD> Presuming that vote_avg is a field in the table bv_bookgenres,
JvD> try a composite index on genre_id and vote_avg and then see if
JvD> you can use the limit clause to reduce the number of loop
JvD> iterations from 1993 to 10.

JvD> CREATE INDEX test_idx ON bv_bookgenres (genre_id, vote_avg);

JvD> The following query tries to force that execution lan and,
JvD> presuming there is a foreign key relation between
JvD> bv_books.book_id AND bv_bookgenres.book_id, I expect it will give
JvD> the same results, but be carefull with NULL's:

JvD> SELECT    bv_books. * ,
JvD>     vote_avg,
JvD>     vote_count
JvD> FROM     (
JvD>         SELECT    bg.*
JvD>         FROM     bv_bookgenres bg
JvD>         WHERE    bg.genre_id = 5830
JvD>         ORDER BY
JvD>             bg.vote_avg DESC
JvD>         LIMIT    10
JvD>     ) bv_bookgenres,
JvD>     bv_books
JvD> WHERE    bv_books.book_id = bv_bookgenres.book_id
JvD> ORDER BY
JvD>     vote_avg DESC
JvD> LIMIT    10;

JvD> Jochem

pgsql-performance by date:

From: Jochem van Dieten
Date: 25 May 2004, 15:38:14
Subject: Re: PostgreSQL caching

From: Robert Treat
Date: 25 May 2004, 21:08:17
Subject: Re: Interpreting vmstat

Re: PostgreSQL caching - Mailing list pgsql-performance

Previous

Next