Thread: Why so slow?
Hi folks,

Sorry to be bringing this up again, but I'm stumped by this problem and hope you can shed some light on it.

I'm running PostgreSQL 8.0 on a RHEL 4 server with 1.5 GB of RAM and a 2 GHz Xeon CPU. The OS is bog standard and I've not done any kernel tuning on it. The file system is also bog standard ext3 with no RAID of any kind. I know I could improve this aspect of the setup with more disks and RAID 0+1 etc., but the lack of performance I'm experiencing is not likely to be attributable to this sort of thing. More likely it's my bad understanding of PostgreSQL - I hope it's my bad understanding of PostgreSQL!!

My database is very simple and not by the book (normal forms etc. are not all as they should be). My biggest table, by a factor of 3000 or so, is one of 4 tables in my tiny database. It looks like this:

\d job_log
                                  Table "job_log"
     Column     |            Type             |                    Modifiers
----------------+-----------------------------+--------------------------------------------------
 job_log_id     | integer                     | not null default nextval('job_log_id_seq'::text)
 first_registry | timestamp without time zone |
 customer_name  | character(50)               |
 node_id        | integer                     |
 job_type       | character(50)               |
 job_name       | character(256)              |
 job_start      | timestamp without time zone |
 job_timeout    | interval                    |
 job_stop       | timestamp without time zone |
 nfiles_in_job  | integer                     |
 status         | integer                     |
 error_code     | smallint                    |
 file_details   | text                        |
Indexes:
    "job_log_id_pkey" PRIMARY KEY, btree (job_log_id)
    "idx_customer_name_filter" btree (customer_name)
    "idx_job_name_filter" btree (job_name)
    "idx_job_start_filter" btree (job_start)
    "idx_job_stop_filter" btree (job_stop)
Check constraints:
    "job_log_status_check" CHECK (status = 0 OR status = 1 OR status = 8 OR status = 9)
Foreign-key constraints:
    "legal_node" FOREIGN KEY (node_id) REFERENCES node(node_id)

The node table is tiny (2500 records). What I'm pulling my hair out over is that ANY query, even something as simple as select count(*) from job_log, takes on the order of tens of minutes to complete. Just now I'm trying to run an explain analyze on the above query, but so far it's taken 35 min! with no result, and there is a postgres process at the top of top.

What am I doing wrong??

Many thanks,

Bealach
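A quick way to gauge whether a table like this is bloated (a sketch in psql; only the table name is taken from the post above) is to compare the planner's on-disk estimates with the live row count. relpages and reltuples are refreshed by VACUUM/ANALYZE and may lag slightly; the count(*) is exact but scans the whole table.

-- how big does PostgreSQL think the table is on disk?
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'job_log';

-- how many rows are actually live?
SELECT count(*) FROM job_log;

If relpages * 8 kB implies a table many times larger than a few hundred thousand rows should need, the table is mostly dead row versions and the sequential scan behind count(*) is paying for all of them.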
Bealach-na Bo <bealach_na_bo@hotmail.com> schrieb:

> The node table is tiny (2500 records). What I'm pulling my hair out
> over is that ANY query, even something as simple as select count(*)
> from job_log, takes on the order of tens of minutes to complete. Just
> now I'm trying to run an explain analyze on the above query, but so
> far it's taken 35 min! with no result, and there is a postgres process at
> the top of top.
>
> What am I doing wrong??

EXPLAIN ANALYSE doesn't return the query's result, but it does return the
query plan and important details about how PG executes the query.

That's why you should paste the query and the EXPLAIN ANALYSE output here.
This is very important.

Anyway, do you vacuum your DB periodically? My guess: no, and that's why
you have many dead rows.

20:26 < akretschmer|home> ??vacuum
20:26 < rtfm_please> For information about vacuum
20:26 < rtfm_please> see http://developer.postgresql.org/~wieck/vacuum_cost/
20:26 < rtfm_please> or http://www.postgresql.org/docs/current/static/sql-vacuum.html
20:26 < rtfm_please> or http://www.varlena.com/varlena/GeneralBits/116.php

20:27 < akretschmer|home> ??explain
20:27 < rtfm_please> For information about explain
20:27 < rtfm_please> see http://techdocs.postgresql.org/oscon2005/robert.treat/OSCON_Explaining_Explain_Public.sxi
20:27 < rtfm_please> or http://www.gtsm.com/oscon2003/toc.html
20:27 < rtfm_please> or http://www.postgresql.org/docs/current/static/sql-explain.html

Read these links for more information about vacuum and explain.

HTH, Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                            (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."  (unknown)
Kaufbach, Saxony, Germany, Europe.            N 51.05082°, E 13.56889°
OK, here is a much more detailed output. I still don't quite understand why simple queries like counting the number of rows in a table should take minutes to complete. Surely, any performance enhancement to be had by vacuuming is closely related to indexes which, in turn, are closely related to sorting and searching. A simple count of 365590 does not involve indexes (or does it??) and should not take minutes. Should I be forcing the way postgresql plans my queries?

Here is my first attempt at vacuum that got nowhere and I had to cancel it.

----------psql session start----------
vacuum verbose analyze job_log;
INFO: vacuuming "job_log"
INFO: index "job_log_id_pkey" now contains 10496152 row versions in 59665 pages
DETAIL: 0 index row versions were removed.
28520 index pages have been deleted, 20000 are currently reusable.
CPU 1.44s/3.49u sec elapsed 33.71 sec.
INFO: index "idx_job_stop_filter" now contains 10496152 row versions in 71149 pages
DETAIL: 0 index row versions were removed.
24990 index pages have been deleted, 20000 are currently reusable.
CPU 2.11s/3.61u sec elapsed 115.69 sec.
INFO: index "idx_job_start_filter" now contains 10496152 row versions in 57891 pages
DETAIL: 0 index row versions were removed.
19769 index pages have been deleted, 19769 are currently reusable.
CPU 1.58s/3.44u sec elapsed 23.11 sec.
Cancel request sent
----------psql session finish----------

I thought that combining indexes would improve things and dropped the 3 separate ones above and created this one:

----------psql session start----------
create index idx_job_log_filter on job_log(job_name,job_start,job_stop);

select count(*) from job_log;
 count
--------
 365590
(1 row)

explain analyse select count(*) from job_log;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1382171.88..1382171.88 rows=1 width=0) (actual time=207011.882..207011.883 rows=1 loops=1)
   ->  Seq Scan on job_log  (cost=0.00..1381257.90 rows=365590 width=0) (actual time=199879.510..206708.523 rows=365590 loops=1)
 Total runtime: 207014.363 ms
(3 rows)
----------psql session finish----------

Then I tried another vacuum and decided to be very patient:

----------psql session start----------
vacuum verbose analyze job_log;
INFO: vacuuming "job_log"
INFO: index "job_log_id_pkey" now contains 10496152 row versions in 59665 pages
DETAIL: 0 index row versions were removed.
28520 index pages have been deleted, 20000 are currently reusable.
CPU 1.39s/3.39u sec elapsed 24.19 sec.
INFO: index "idx_job_log_filter" now contains 365590 row versions in 15396 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.59s/0.20u sec elapsed 10.28 sec.
INFO: "job_log": removed 2795915 row versions in 368091 pages
DETAIL: CPU 33.30s/30.11u sec elapsed 2736.54 sec.
INFO: index "job_log_id_pkey" now contains 7700230 row versions in 59665 pages
DETAIL: 2795922 index row versions were removed.
37786 index pages have been deleted, 20000 are currently reusable.
CPU 2.76s/6.45u sec elapsed 152.14 sec.
INFO: index "idx_job_log_filter" now contains 365590 row versions in 15396 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.52s/0.20u sec elapsed 7.75 sec.
INFO: "job_log": removed 2795922 row versions in 220706 pages
DETAIL: CPU 19.81s/17.92u sec elapsed 1615.95 sec.
INFO: index "job_log_id_pkey" now contains 4904317 row versions in 59665 pages
DETAIL: 2795913 index row versions were removed.
45807 index pages have been deleted, 20000 are currently reusable.
CPU 2.22s/5.30u sec elapsed 129.02 sec.
INFO: index "idx_job_log_filter" now contains 365590 row versions in 15396 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.50s/0.22u sec elapsed 7.61 sec.
INFO: "job_log": removed 2795913 row versions in 188139 pages
DETAIL: CPU 17.03s/15.37u sec elapsed 1369.45 sec.
INFO: index "job_log_id_pkey" now contains 2108405 row versions in 59665 pages
DETAIL: 2795912 index row versions were removed.
53672 index pages have been deleted, 20000 are currently reusable.
CPU 2.13s/4.57u sec elapsed 122.74 sec.
INFO: index "idx_job_log_filter" now contains 365590 row versions in 15396 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.53s/0.23u sec elapsed 8.24 sec.
INFO: "job_log": removed 2795912 row versions in 187724 pages
DETAIL: CPU 16.84s/15.22u sec elapsed 1367.50 sec.
INFO: index "job_log_id_pkey" now contains 365590 row versions in 59665 pages
DETAIL: 1742815 index row versions were removed.
57540 index pages have been deleted, 20000 are currently reusable.
CPU 1.38s/2.85u sec elapsed 76.52 sec.
INFO: index "idx_job_log_filter" now contains 365590 row versions in 15396 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.54s/0.31u sec elapsed 7.99 sec.
INFO: "job_log": removed 1742815 row versions in 143096 pages
DETAIL: CPU 12.77s/11.75u sec elapsed 1046.10 sec.
INFO: "job_log": found 12926477 removable, 365590 nonremovable row versions in 1377602 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 7894754 unused item pointers.
0 pages are entirely empty.
CPU 124.49s/117.57u sec elapsed 8888.80 sec.
INFO: vacuuming "pg_toast.pg_toast_17308"
INFO: index "pg_toast_17308_index" now contains 130 row versions in 12 pages
DETAIL: 2543 index row versions were removed.
9 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.11 sec.
INFO: "pg_toast_17308": removed 2543 row versions in 617 pages
DETAIL: CPU 0.04s/0.05u sec elapsed 4.85 sec.
INFO: "pg_toast_17308": found 2543 removable, 130 nonremovable row versions in 650 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.06s/0.06u sec elapsed 5.28 sec.
INFO: analyzing "rshuser.job_log"
INFO: "job_log": scanned 3000 of 1377602 pages, containing 695 live rows and 0 dead rows; 695 rows in sample, 319144 estimated total rows
VACUUM

explain analyse select count(*) from job_log;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1382171.88..1382171.88 rows=1 width=0) (actual time=207267.094..207267.095 rows=1 loops=1)
   ->  Seq Scan on job_log  (cost=0.00..1381257.90 rows=365590 width=0) (actual time=200156.539..206962.895 rows=365590 loops=1)
 Total runtime: 207267.153 ms
(3 rows)
----------psql session finish----------

I also took snapshots of top output while I ran the above:

----------top output start----------
Cpu(s): 0.7% us, 0.7% sy, 0.0% ni, 49.7% id, 48.5% wa, 0.5% hi, 0.0% si
Mem:  1554788k total, 1538268k used, 16520k free, 6220k buffers
Swap: 1020024k total, 176k used, 1019848k free, 1404280k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3368 postgres  18   0 37492  29m  11m D  2.7  1.9   3:00.54 postmaster

Cpu(s): 0.7% us, 0.8% sy, 0.0% ni, 49.7% id, 48.5% wa, 0.3% hi, 0.0% si
Mem:  1554788k total, 1538580k used, 16208k free, 2872k buffers
Swap: 1020024k total, 176k used, 1019848k free, 1414908k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3368 postgres  15   0 37492  29m  11m D  2.3  1.9   5:26.03 postmaster

Cpu(s): 0.5% us, 5.8% sy, 0.0% ni, 48.7% id, 44.4% wa, 0.5% hi, 0.0% si
Mem:  1554788k total, 1538196k used, 16592k free, 1804k buffers
Swap: 1020024k total, 176k used, 1019848k free, 1444576k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3368 postgres  15   0 20956  13m  11m D 11.0  0.9   6:25.10 postmaster
----------top output end----------

I know my database needs a major redesign. But I'm having a hard time explaining the poor performance nevertheless.

Regards,

Bealach
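If an exact figure isn't actually needed, a cheap alternative to scanning the table (a sketch; it reads the planner's statistics, which ANALYZE keeps only approximately current) is:

-- approximate row count from the statistics, no table scan
SELECT reltuples::bigint AS approximate_rows
FROM pg_class
WHERE relname = 'job_log';

This avoids the sequential scan entirely, but it is only an estimate and will drift between ANALYZE runs.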
On Fri, Apr 28, 2006 at 11:41:06AM +0000, Bealach-na Bo wrote:
> OK, here is a much more detailed output. I still don't quite
> understand why simple queries like counting the number of rows in a
> table should take minutes to complete. [...]
>
> Here is my first attempt at vacuum that got nowhere and I had to
> cancel it.
>
> ----------psql session start----------
> vacuum verbose analyze job_log;
> INFO: vacuuming "job_log"
> INFO: index "job_log_id_pkey" now contains 10496152 row versions in 59665 pages
> DETAIL: 0 index row versions were removed.
> 28520 index pages have been deleted, 20000 are currently reusable.
> CPU 1.44s/3.49u sec elapsed 33.71 sec.
> INFO: index "idx_job_stop_filter" now contains 10496152 row versions in 71149 pages
> DETAIL: 0 index row versions were removed.
> 24990 index pages have been deleted, 20000 are currently reusable.
> CPU 2.11s/3.61u sec elapsed 115.69 sec.
> INFO: index "idx_job_start_filter" now contains 10496152 row versions in 57891 pages
> DETAIL: 0 index row versions were removed.
> 19769 index pages have been deleted, 19769 are currently reusable.
> CPU 1.58s/3.44u sec elapsed 23.11 sec.
> Cancel request sent
> ----------psql session finish----------
>
> I thought that combining indexes would improve things and dropped the
> 3 separate ones above and created this one:
>
> ----------psql session start----------
> create index idx_job_log_filter on job_log(job_name,job_start,job_stop);
>
> select count(*) from job_log;
>  count
> --------
>  365590
> (1 row)

The above shows that the indexes contained 10M rows and 160M of dead
space each. That means you weren't vacuuming nearly enough.

> explain analyse select count(*) from job_log;
> [...]
>  Total runtime: 207014.363 ms
>
> Then I tried another vacuum and decided to be very patient
>
> ----------psql session start----------
> vacuum verbose analyze job_log;
> [...]
> INFO: "job_log": found 12926477 removable, 365590 nonremovable row versions in 1377602 pages

13M dead rows, and the table is 1.4M pages, or 11GB. No wonder it's
slow. You need to run a vacuum full, and then you need to vacuum far
more often. If you're running 8.1, turn on autovacuum and cut each
default scale factor in half, to 0.2 and 0.1.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461
On April 28, 2006 04:41 am, "Bealach-na Bo" <bealach_na_bo@hotmail.com> wrote:
> INFO: index "job_log_id_pkey" now contains 10496152 row versions in
> 59665 pages

See the 10496152 above? That means you have 10496152 rows of data in your
table. Of those, only 365000 are alive. That means you have basically
never vacuumed this table before, correct?

Every update or delete creates a new dead row. count(*) scans the whole
table, dead rows included. That's why it takes so long: the table acts as
though it has 10496152 rows when doing sequential scans.

Do a VACUUM FULL on it or CLUSTER it on an index, both of which will empty
out all the free space and make it behave as it should. Note: VACUUM FULL
will take quite a while and requires an exclusive lock on the table.
CLUSTER also requires an exclusive lock but should be a lot faster for this
table.

Oh, and get autovacuum set up and working, posthaste.

--
No long, complicated contracts. No actuarial tables to pore over. Social
Security operates on a very simple principle: the politicians take your
money from you and squander it.
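For reference, the one-off cleanup could look like this (a sketch; both commands take an exclusive lock on job_log, and the pre-8.3 CLUSTER syntax puts the index name first):

VACUUM FULL VERBOSE job_log;

-- or, often faster for a table this bloated, rewrite it in index order:
CLUSTER job_log_id_pkey ON job_log;
ANALYZE job_log;

CLUSTER copies only the live rows into a fresh file, so for 365k live rows out of 10M+ versions it usually finishes much sooner than VACUUM FULL shuffling pages in place.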
> > INFO: index "job_log_id_pkey" now contains 10496152 row versions in
> > 59665 pages
>
> See the 10496152 above? That means you have 10496152 rows of data in your
> table. Of those, only 365000 are alive. That means you have basically
> never vacuumed this table before, correct?

Almost correct :| I have vacuumed this table monthly (obviously not nearly
enough), but it is basically a log of events, of which there are a very
large number each day.

> Every update or delete creates a new dead row. count(*) scans the whole
> table, dead rows included. That's why it takes so long: the table acts as
> though it has 10496152 rows when doing sequential scans.

Oh! This explains my problems.

> Do a VACUUM FULL on it or CLUSTER it on an index, both of which will empty
> out all the free space and make it behave as it should. Note: VACUUM FULL
> will take quite a while and requires an exclusive lock on the table.
> CLUSTER also requires an exclusive lock but should be a lot faster for this
> table.
>
> Oh, and get autovacuum set up and working, posthaste.

The exclusive lock is going to cause problems for me since the table is
very active. Is there a way of getting around that, or do I need to
schedule downtime for the application that accesses this table?

I'm running version 8.0. Is there autovacuum for this version too?

Regards,

Bealach
> The above shows that the indexes contained 10M rows and 160M of dead
> space each. That means you weren't vacuuming nearly enough.

How is it that a row in the table can grow to a size far exceeding the sum
of the maximum sizes of the fields it consists of?

> 13M dead rows, and the table is 1.4M pages, or 11GB. No wonder it's
> slow.

I had a look at the disk files and they are HUGE indeed.

> You need to run a vacuum full, and then you need to vacuum far more
> often. If you're running 8.1, turn on autovacuum and cut each default
> scale factor in half, to 0.2 and 0.1.

I'm not running 8.1. Is there a way of doing this in 8.0, or do I need to
write a shell script + cron job?

Regards,

Bealach
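In 8.0 a cron job is the usual answer. A minimal sketch (the database name "rshdb" is just a placeholder, and the frequency has to be tuned to the actual write rate):

# crontab for the postgres user
# lazy VACUUM ANALYZE of the hot table every hour, the whole database nightly
0 * * * *   vacuumdb --analyze --table job_log rshdb
30 3 * * *  vacuumdb --analyze rshdb

vacuumdb issues a plain (lazy) VACUUM, so it does not take the exclusive lock that VACUUM FULL needs and can run while the application keeps writing.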
On April 28, 2006 10:31 am, "Bealach-na Bo" <bealach_na_bo@hotmail.com> wrote:
> The exclusive lock is going to cause problems for me since the table is
> very active. Is there a way of getting around that, or do I need to
> schedule downtime for the application that accesses this table?

If you don't need access to the old data constantly:

 - copy the live data to a new table
 - TRUNCATE the old table (which needs an exclusive lock but is very fast)
 - insert the data back in
 - for an event log I would imagine this could work (see the sketch below)

If you do need the old data while the app is running, then I'm not sure
what you can do.

> I'm running version 8.0. Is there autovacuum for this version too?

There is an autovacuum daemon in contrib (pg_autovacuum); it's more
annoying to set up and keep running than the one built into 8.1, but it
works OK.

--
Eat right. Exercise regularly. Die anyway.
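A sketch of that copy-out/truncate/copy-back idea for job_log. Assumptions: a quiet moment for the application (writers are blocked while this runs), and the temporary table name is arbitrary:

BEGIN;
CREATE TABLE job_log_live AS SELECT * FROM job_log;  -- only the ~365k live rows are copied
TRUNCATE job_log;                                    -- exclusive lock, but very fast
INSERT INTO job_log SELECT * FROM job_log_live;
COMMIT;

DROP TABLE job_log_live;
ANALYZE job_log;

The net effect is similar to CLUSTER: the table ends up holding only live rows in a compact file, without waiting for VACUUM FULL to move pages around.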
On Fri, Apr 28, 2006 at 17:37:30 +0000, Bealach-na Bo <bealach_na_bo@hotmail.com> wrote:
> > The above shows that the indexes contained 10M rows and 160M of dead
> > space each. That means you weren't vacuuming nearly enough.
>
> How is it that a row in the table can grow to a size far exceeding the sum
> of the maximum sizes of the fields it consists of?

Because unless you run vacuum, the old deleted rows are not reused. Those
rows cannot be deleted immediately, because the rows may be visible to
other transactions. Periodic vacuums are used to find deleted rows which
are no longer visible to any transactions.

You probably want to read the following:
http://developer.postgresql.org/docs/postgres/routine-vacuuming.html
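A tiny illustration of that behaviour (a sketch to run in a scratch database; the exact VACUUM VERBOSE wording varies a little between versions):

CREATE TABLE mvcc_demo (id int, v int);
INSERT INTO mvcc_demo VALUES (1, 0);
UPDATE mvcc_demo SET v = v + 1;  -- the old row version is now dead, but still occupies space
UPDATE mvcc_demo SET v = v + 1;  -- and now there are two dead versions
VACUUM VERBOSE mvcc_demo;        -- should report roughly "2 removable, 1 nonremovable row versions"
DROP TABLE mvcc_demo;

Scale that up to an event log that is updated or deleted millions of times between monthly vacuums and you get the 12.9M removable row versions seen earlier in the thread.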
At 03:00 06/04/29, Bruno Wolff III wrote:
> Because unless you run vacuum, the old deleted rows are not reused. Those
> rows cannot be deleted immediately, because the rows may be visible to
> other transactions. Periodic vacuums are used to find deleted rows which
> are no longer visible to any transactions.
>
> You probably want to read the following:
> http://developer.postgresql.org/docs/postgres/routine-vacuuming.html

Would recycling dead tuples on the fly (mentioned on p. 14 of the article
http://www.postgresql.org/files/developer/transactions.pdf ) significantly
reduce the need for periodic vacuums?

Without knowing the internals, I have this simplistic idea: if Postgres
maintains the current lowest transaction ID across all active transactions,
it probably could recycle dead tuples on the fly. The current lowest
transaction ID could be maintained in a doubly linked list with at most
<max_connections> entries. A backward link in the tuple header might be
needed too. Any comments?

Cheers,

KC.
K C Lau <kclau60@netvigator.com> writes:
> Without knowing the internals, I have this simplistic idea: if Postgres
> maintains the current lowest transaction ID across all active transactions,
> it probably could recycle dead tuples on the fly.

[ yawn... ]  Yes, we've heard that before.  The hard part is getting rid
of index entries.

			regards, tom lane
At 10:39 06/04/29, Tom Lane wrote:
> K C Lau <kclau60@netvigator.com> writes:
> > Without knowing the internals, I have this simplistic idea: if Postgres
> > maintains the current lowest transaction ID across all active transactions,
> > it probably could recycle dead tuples on the fly.
>
> [ yawn... ]  Yes, we've heard that before.  The hard part is getting rid
> of index entries.
>
> 			regards, tom lane

I apologize for simplistic ideas again. I presume that the equivalent tuple
header information is not maintained for index entries. What if it were,
probably only for the most commonly used index types, to allow recycling
where possible? The extra space required would be recycled too. It would
probably also help save a trip to the tuple data pages to determine the
validity of index entries during index scans.

Cheers,

KC.
On Sat, Apr 29, 2006 at 11:18:10AM +0800, K C Lau wrote:
> I apologize for simplistic ideas again. I presume that the equivalent tuple
> header information is not maintained for index entries. What if it were,
> probably only for the most commonly used index types, to allow recycling
> where possible?

Alternatively, you could just run vacuum...

Mike Stone
> If you don't need access to the old data constantly:
>
>  - copy the live data to a new table
>  - TRUNCATE the old table (which needs an exclusive lock but is very fast)
>  - insert the data back in
>  - for an event log I would imagine this could work

Obtaining exclusive locks on this table is very difficult, or rather,
will make life very difficult for others, so I'm averse to running
vacuum full or truncate (though I don't know how fast truncate is)
on a regular basis. I might just get away with running it once a month,
but no more.

(Lazy) vacuum, however, is a much more palatable option. But (lazy)
vacuum does not always reclaim space. Will this affect performance, and
does this mean that a full vacuum is unavoidable? Or can I get away
with daily (lazy) vacuums? Disk space is not an issue for me, but
performance is a BIG issue. Of course, I realize that I could improve
the latter with better schema design - I'm working on a new schema, but
can't kill this one yet :|.

Regards,

Bealach
"Bealach-na Bo" <bealach_na_bo@hotmail.com> wrote: > >If you don't need access to the old data constantly: > > > > - copy the live data to a new table > > - TRUNCATE the old table (which needs an exclusive lock but is very fast) > > - insert the data back in > > - for an event log I would imagine this could work > > Obtaining exclusive locks on this table is very difficult, or rather, > will make life very difficult for others, so I'm averse to running > vacuum full or truncate (though I don't know how fast truncate is) > on a regular basis. I might just get away with running it > once a month, but no more. > > (Lazy) vacuum, however is a much more palatable option. But (lazy) > vacuum does not always reclaim space. Will this affect performance and > does this mean that a full vacuum is unavoidable? Or can I get away > with daily (lazy) vacuums? Disk space is not an issue for me, but > performance is a BIG issue. Of course, I realize that I could improve > the latter with better schema design - I'm working on a new schema, > but can't kill this one yet :|. My understanding is basically that if you vacuum with the correct frequency, you'll never need to vacuum full. This is why the autovacuum system is so nice, it adjusts the frequency of vacuum according to how much use the DB is getting. The problem is that if you get behind, plain vacuum is unable to get things caught up again, and a vacuum full is required to recover disk space. At this point, it seems like you need to do 2 things: 1) Schedule lazy vacuum to run, or configure autovacuum. 2) Schedule some downtime to run "vacuum full" to recover some disk space. #2 only needs done once to get you back on track, assuming that #1 is done properly. A little bit of wasted space in the database is OK, and lazy vacuum done on a reasonable schedule will keep the level of wasted space to an acceptable level. -- Bill Moran Potential Technologies http://www.potentialtech.com
Hi, Bill,

Bill Moran wrote:

> My understanding is basically that if you vacuum with the correct
> frequency, you'll never need to vacuum full. This is why the autovacuum
> system is so nice: it adjusts the frequency of vacuum according to how
> much use the DB is getting.

Additionally, the free space map settings have to be high enough: the FSM
has to cover all pages that get dead rows between two vacuum runs.

HTH,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
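For 8.0 that means the following postgresql.conf entries (a sketch; the numbers are illustrative only, and both settings require a server restart to take effect):

# Free space map: must be able to track every page that gains reusable space
# between two vacuum runs. The 8.0 defaults (20000 pages, 1000 relations)
# are far too small for a table that turns over millions of rows.
max_fsm_pages = 500000
max_fsm_relations = 1000

If the FSM is too small, lazy vacuum finds the dead rows but has nowhere to record the freed space, so the table keeps growing even though it is being vacuumed.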
On Sat, Apr 29, 2006 at 11:18:10AM +0800, K C Lau wrote:
> At 10:39 06/04/29, Tom Lane wrote:
> > [ yawn... ]  Yes, we've heard that before.  The hard part is getting rid
> > of index entries.
>
> I apologize for simplistic ideas again. I presume that the equivalent tuple
> header information is not maintained for index entries. What if it were,
> probably only for the most commonly used index types, to allow recycling
> where possible? The extra space required would be recycled too. It would
> probably also help save a trip to the tuple data pages to determine the
> validity of index entries during index scans.

You should read through the -hackers archives; most of this stuff has been
gone over multiple times. Storing tuple header info in indexes would be a
huge drawback, as it would result in about 20 extra bytes per index entry.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461
On Sun, Apr 30, 2006 at 10:03:46AM -0400, Bill Moran wrote:
> At this point, it seems like you need to do 2 things:
> 1) Schedule lazy vacuum to run, or configure autovacuum.
> 2) Schedule some downtime to run "vacuum full" to recover some disk space.
>
> #2 only needs doing once to get you back on track, assuming that #1 is
> done properly.

You'll also want to reindex, since vacuum full won't clean the indexes up.
You might also want to read
http://www.pervasivepostgres.com/instantkb13/article.aspx?id=10087 and
http://www.pervasivepostgres.com/instantkb13/article.aspx?id=10116.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461
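If the indexes are to be rebuilt in the same maintenance window, a sketch (REINDEX also holds an exclusive lock on the table while each index is rebuilt):

-- after the one-off VACUUM FULL, rebuild the bloated indexes as well
REINDEX TABLE job_log;

This rewrites every index on job_log from scratch, which is how the tens of thousands of mostly-empty index pages seen in the VACUUM VERBOSE output get reclaimed.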
"Jim C. Nasby" <jnasby@pervasive.com> wrote: > On Sun, Apr 30, 2006 at 10:03:46AM -0400, Bill Moran wrote: > > At this point, it seems like you need to do 2 things: > > 1) Schedule lazy vacuum to run, or configure autovacuum. > > 2) Schedule some downtime to run "vacuum full" to recover some disk space. > > > > #2 only needs done once to get you back on track, assuming that #1 is > > done properly. > > You'll also want to reindex since vacuum full won't clean the indexes > up. You might also want to read > http://www.pervasivepostgres.com/instantkb13/article.aspx?id=10087 and > http://www.pervasivepostgres.com/instantkb13/article.aspx?id=10116. Reindexing is in a different class than vacuuming. Neglecting to vacuum creates a problem that gets worse and worse as time goes on. Neglecting to reindex does not create an infinately growing problem, since empty index pages are recycled automatically. It's also conceivable that some usage patterns don't need to reindex at all. http://www.postgresql.org/docs/8.1/interactive/routine-reindex.html -- Bill Moran Potential Technologies http://www.potentialtech.com
On Tue, May 02, 2006 at 07:28:34PM -0400, Bill Moran wrote:
> Reindexing is in a different class than vacuuming.

Kinda, but it is in the same class as vacuum full. If vacuum neglect (or a
dramatic change in usage) has gotten you to the point of 10G of overhead on
a 2G table, you can get a dramatic speedup from a vacuum full, by dumping a
lot of unused space. But in that case you might have a similar
amount of overhead in the indices, which isn't going to go away unless you
reindex. In either case the unused rows will be reused as needed, but if
you know you aren't going to need the space again anytime soon, you might
need to vacuum full / reindex.

Mike Stone
On Wed, May 03, 2006 at 07:22:21AM -0400, Michael Stone wrote:
> On Tue, May 02, 2006 at 07:28:34PM -0400, Bill Moran wrote:
> > Reindexing is in a different class than vacuuming.
>
> Kinda, but it is in the same class as vacuum full. If vacuum neglect (or a
> dramatic change in usage) has gotten you to the point of 10G of overhead on
> a 2G table, you can get a dramatic speedup from a vacuum full, by dumping a
> lot of unused space. But in that case you might have a similar

s/might/will/

> amount of overhead in the indices, which isn't going to go away unless you
> reindex. In either case the unused rows will be reused as needed, but if
> you know you aren't going to need the space again anytime soon, you might
> need to vacuum full / reindex.

You can also do a CLUSTER on the table, which rewrites both the table and
all the indexes from scratch. But there was some kind of issue with doing
that that was fixed in HEAD, and I don't think it's been back-ported. I
also don't remember exactly what the issue was... :/

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461