Re: PostgreSQL caching - Mailing list pgsql-performance

From Vitaly Belman
Subject Re: PostgreSQL caching
Date
Msg-id 1673440625.20040526173356@012.net.il
Whole thread Raw
In response to Re: PostgreSQL caching  (Marty Scholes <marty@outputservices.com>)
Responses Re: PostgreSQL caching  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
Hello Marty, Nick and Robert,

NB> Depending on what version of PG you are running, IN might take a while
NB> to complete. If so try an EXISTS instead

RT> A question and two experiments... what version of postgresql is this?

I am using the newer 7.5dev native Windows port. For this reason I
don't think that IN will cause any trouble (I read that this issue was
resolved in 7.4).

MS> At any rate, a query with an IN clause should help quite a bit

MS> SELECT     bv_books. *
MS> FROM       bv_books
MS> WHERE      bv_books.book_id IN (
MS>                SELECT book_id
MS>                FROM bv_genres
MS>                WHERE bv_bookgenres.genre_id = 5830
MS>                )
MS> ORDER BY   vote_avg DESC LIMIT 10 OFFSET 0;

It looks like it helps a bit (though you meant "FROM bv_bookgenres",
right?). I can't tell you how MUCH it helped though, because of two
reasons:

1) As soon as I run a query, it is cached in the memory and I can't
really find a good way to flush it out of there to test again except a
full computer reset (shutting postmaster down doesn't help). If you
have a better idea on this, do tell me =\ (Reminding again, I am on
Windows).

2) I *think* I resolved this issue, at least for most of the genre_ids
(didn't go through them all, but tried a few with different book count
and the results looked quite good). The fault was partly mine, a few
weeks ago I increase the statistics for the genre_id column a bit too
much (from 10 to 70), I was unsure how exactly it works (and still am)
but it helped for a few genre_ids that had a high book count, yet it
also hurt the performence for the genres without as much ids. I now
halved the stastics (to 58) and almost everything looks good now.

Because of that I'll stop working on that query for a while (unless
you have some more performance tips on the subject). Big thanks to
everyone who helped.. And I might bring this issue later again, it it
still will cause too much troubles.

RT> Try reindexing i_bookgenres_genre_id and capture the explain
RT> analyze for that.

Is that's what you meant "REINDEX INDEX i_bookgenres_genre_id"? But it
returns no messages what-so-ever =\. I can EXPLAIN it either.

RT> If it doesn't help try doing set enable_indexscan = false and
RT> capture the explain analyze for that.

Here it is:


------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN
Limit  (cost=41099.93..41099.96 rows=10 width=76) (actual time=6734.000..6734.000 rows=10 loops=1)
  ->  Sort  (cost=41099.93..41100.45 rows=208 width=76) (actual time=6734.000..6734.000 rows=10 loops=1)
        Sort Key: bv_books.vote_count
        ->  Merge Join  (cost=40229.21..41091.92 rows=208 width=76) (actual time=6078.000..6593.000 rows=1993 loops=1)
              Merge Cond: ("outer".book_id = "inner".book_id)
              ->  Sort  (cost=16817.97..16818.49 rows=208 width=4) (actual time=1062.000..1062.000 rows=1993 loops=1)
                    Sort Key: bv_bookgenres.book_id
                    ->  Seq Scan on bv_bookgenres  (cost=0.00..16809.96 rows=208 width=4) (actual time=0.000..1047.000
rows=1993loops=1) 
                          Filter: (genre_id = 5830)
              ->  Sort  (cost=23411.24..23841.04 rows=171918 width=76) (actual time=5016.000..5189.000 rows=171801
loops=1)
                    Sort Key: bv_books.book_id
                    ->  Seq Scan on bv_books  (cost=0.00..4048.18 rows=171918 width=76) (actual time=0.000..359.000
rows=171918loops=1) 
Total runtime: 6734.000 ms

------------------------------------------------------------------------------------------------------------------------------------------

Regards,
 Vitaly Belman

 ICQ: 1912453
 AIM: VitalyB1984
 MSN: tmdagent@hotmail.com
 Yahoo!: VitalyBe

Wednesday, May 26, 2004, 1:24:18 AM, you wrote:

MS> Vitaly,

MS> This looks like there might be some room for performance improvement...

 >> MS> I didn't see the table structure, but I assume
 >> MS> that the vote_avg and
 >> MS> vote_count fields are in bv_bookgenres.
 >>
 >> I didn't understand you. vote_avg is stored in bv_books.

MS> Ok.  That helps.  The confusion (on my end) came from the SELECT clause
MS> of the query you provided:

 >> SELECT     bv_books. * ,
 >>            vote_avg,
 >>            vote_count

MS> All fields from bv_books were selected (bv_books.*) along with vote_agv
MS> and vote_count.  My assumption was that vote_avg and vote_count were
MS> therefore not in bv_books.

MS> At any rate, a query with an IN clause should help quite a bit:

MS> SELECT     bv_books. *
MS> FROM       bv_books
MS> WHERE      bv_books.book_id IN (
MS>                SELECT book_id
MS>                FROM bv_genres
MS>                WHERE bv_bookgenres.genre_id = 5830
MS>                )
MS> ORDER BY   vote_avg DESC LIMIT 10 OFFSET 0;

MS> Give it a whirl.

MS> Marty


MS> ---------------------------(end of
MS> broadcast)---------------------------
MS> TIP 6: Have you searched our list archives?

MS>                http://archives.postgresql.org


pgsql-performance by date:

Previous
From: Robert Treat
Date:
Subject: Re: where to find out when a table was last analyzed?
Next
From: Josh Berkus
Date:
Subject: Re: PostgreSQL caching