Performance problem with table containing a lot of text (blog) - Mailing list pgsql-performance

From Kari Lavikka
Subject Performance problem with table containing a lot of text (blog)
Date
Msg-id Pine.HPX.4.62.0708282016250.3324@purple.bdb.fi
Whole thread Raw
Responses Re: Performance problem with table containing a lot of text (blog)  (Dan Harris <fbsd@drivefaster.net>)
Re: Performance problem with table containing a lot of text (blog)  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
List pgsql-performance
Hello!

Some background info.. We have a blog table that contains about eight
million blog entries. Average length of an entry is 1200 letters. Because
each 8k page can accommodate only a few entries, every query that involves
several entries causes several random seeks to disk.  We are having
problems with queries like:

1) give me a list of months when I have written someting
2) give me id's of entries I have written on month X year X
3) give me the number of blog entries my friends have written since last
    time

Clustering would probably decrease random seeks but it is not an option.
It locks the table and operation would take "some" time. It should also be
done periodically to maintain clustering.

I guess that file system cache gets filled with text contents of blog
entries although they are totally useless for queries like these. Contents
of individual blog entries are cached to memcached on application level
anyway. There's rarely any need to fetch them from database.

It would be nice if I could flag a column to be toasted always, regardless
of it's length.

Because there isn't such option maybe I should create a separate table for
blog text content. Does anybody have better ideas for this? :)

Thanks!


P.S. Here's a plan for query #3. Users can have several bookmark groups
they are following. User can limit visibility of an entry to some of
his/her bookmark group. Those are not any kind of bottlenecks anyway...

  Sort  (cost=34112.60..34117.94 rows=2138 width=14)
    Sort Key: count(*), upper((u.nick)::text)
    ->  HashAggregate  (cost=33962.28..33994.35 rows=2138 width=14)
          ->  Nested Loop  (cost=8399.95..33946.24 rows=2138 width=14)
                ->  Nested Loop  (cost=8399.95..9133.16 rows=90 width=22)
                      ->  HashAggregate  (cost=8399.95..8402.32 rows=237 width=8)
                            ->  Nested Loop  (cost=0.00..8395.99 rows=792 width=8)
                                  ->  Index Scan using user_bookmark_uid on user_bookmark ub  (cost=0.00..541.39
rows=2368width=12) 
                                        Index Cond: (uid = 256979)
                                  ->  Index Scan using user_bookmark_group_pkey on user_bookmark_group bg
(cost=0.00..3.30rows=1 width=4) 
                                        Index Cond: ("outer".bookmark_group_id = bg.bookmark_group_id)
                                        Filter: (("type" >= 0) AND ("type" <= 1) AND (trace_blog = 'y'::bpchar))
                      ->  Index Scan using users_uid_accepted_only on users u  (cost=0.00..3.06 rows=1 width=14)
                            Index Cond: (u.uid = "outer".marked_uid)
                ->  Index Scan using blog_entry_uid_beid on blog_entry be  (cost=0.00..275.34 rows=24 width=8)
                      Index Cond: ((be.uid = "outer".marked_uid) AND (COALESCE("outer".last_seen_blog_entry_id, 0) <
be.blog_entry_id))
                      Filter: ((visibility = 'p'::bpchar) AND ((status = 'p'::bpchar) OR (status = 'l'::bpchar)) AND
((bookmark_group_idIS NULL) OR (subplan))) 
                      SubPlan
                        ->  Index Scan using user_bookmark_pkey on user_bookmark fub  (cost=0.00..3.42 rows=1 width=0)
                              Index Cond: ((bookmark_group_id = $0) AND (marked_uid = 256979))

P.S. That particular user has quite many unread entries though...

     |\__/|
     ( oo )    Kari Lavikka - tuner@bdb.fi - (050) 380 3808
__ooO(  )Ooo_______ _____ ___ _ _  _   _    _      _                  _
       ""

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Fwd: 8.2 Query 10 times slower than 8.1 (view-heavy)
Next
From: Dan Harris
Date:
Subject: Re: Performance problem with table containing a lot of text (blog)