Various performance questions - Mailing list pgsql-performance

From Dror Matalon
Subject Various performance questions
Date
Msg-id 20031026194449.GD2979@rlx11.zapatec.com
Whole thread Raw
Responses Re: Various performance questions  (Greg Stark <gsstark@mit.edu>)
Re: Various performance questions  (Tarhon-Onu Victor <mituc@iasi.rdsnet.ro>)
List pgsql-performance
Hi,

We're in the process of setting up a new database server. The
application is an online rss aggregator which you can see at
www.fastbuzz.com (still running with the old hardware).

The new machine is a dual Xeon with 2 Gigs of ram

The OS is freebsd 4.9.

shared_buffers = 10000
sort_mem = 32768
effective_cache_size = 25520  -- freebsd forumla: vfs.hibufspace / 8192

1. While it seems to work correctly, I'm unclear on why this number is
correct.  25520*8 = 204160 or 200 Megs. On a machine with 2 Gigs it
seems like the number should be more like 1 - 1.5 Gigs.

2.  The main performance challenges are with the items table which has around
five million rows and grows at the rate of more than 100,000 rows a day.

If I do a select count(*) from the items table it take 55 - 60 seconds
to execute. I find it interesting that it takes that long whether it's
doing it the first time and fetching the pages from disk or on
subsequent request where it fetches the pages from memory.
I know that it's not touching the disks because I'm running an iostat in
a different window. Here's the explain analyze:

explain analyze select count(*) from items;
                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=245377.53..245377.53 rows=1 width=0) (actual time=55246.035..55246.040 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..233100.62 rows=4910762 width=0)
(actual time=0.054..30220.641 rows=4910762 loops=1)
 Total runtime: 55246.129 ms
(3 rows)

and the number of pages:

select relpages from pg_class where relname = 'items';
 relpages
----------
   183993


So does it make sense that it would take close to a minute to count the 5 million rows
even if all pages are in memory?

3. Relpages is 183993 so file size should be  183993*8192 = 1507270656,
roughly 1.5 gig. The actual disk size is 1073741824 or roughly 1 gig.
Why the difference?



4. If I put a certain filter/condition on the query it tells me that it's doing
a sequential scan, and yet it takes less time than a full sequential
scan:

explain analyze select count(*) from items where channel < 5000;
                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=249141.54..249141.54 rows=1 width=0) (actual time=26224.603..26224.608 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..245377.52 rows=1505605 width=0) (actual time=7.599..17686.869 rows=1632057
loops=1)
         Filter: (channel < 5000)
 Total runtime: 26224.703 ms


How can it do a sequential scan and apply a filter to it in less time
than the full sequential scan? Is it actually using an index without
really telling me?


Here's the structure of the items table

    Column     |           Type           | Modifiers
---------------+--------------------------+-----------
 articlenumber | integer                  | not null
 channel       | integer                  | not null
 title         | character varying        |
 link          | character varying        |
 description   | character varying        |
 comments      | character varying(500)   |
 dtstamp       | timestamp with time zone |
 signature     | character varying(32)    |
 pubdate       | timestamp with time zone |
Indexes:
    "item_channel_link" btree (channel, link)
    "item_created" btree (dtstamp)
    "item_signature" btree (signature)
    "items_channel_article" btree (channel, articlenumber)
    "items_channel_tstamp" btree (channel, dtstamp)


5. Any other comments/suggestions on the above setup.

Thanks,

Dror

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

pgsql-performance by date:

Previous
From: Reece Hart
Date:
Subject: explicit casting required for index use
Next
From: Yonatan Goraly
Date:
Subject: Re: Slow performance with no apparent reason