Thread: Why will vacuum not end?

Why will vacuum not end?

From
"Shea,Dan [CIS]"
Date:
This vacuum is running a marathon.  Why will it not end and show me the
free space map INFO?  We have deleted a lot of data and I would like to be
confident that the space from these deletions will be reused, rather than
more table files being created.



PWFPM_DEV=# select now();vacuum verbose  forecastelement;select now();
              now
-------------------------------
 2004-04-14 18:36:13.725285+00
(1 row)

INFO:  vacuuming "public.forecastelement"
INFO:  index "forecastelement_rwv_idx" now contains 473380072 row versions
in 4986653 pages
DETAIL:  5592106 index row versions were removed.
44688 index pages have been deleted, 1 are currently reusable.
CPU 4942.30s/336.27u sec elapsed 74710.07 sec.
INFO:  "forecastelement": removed 5592106 row versions in 126370 pages
DETAIL:  CPU 58.43s/16.99u sec elapsed 366.24 sec.
INFO:  index "forecastelement_rwv_idx" now contains 472296119 row versions
in 5027529 pages
DETAIL:  5592097 index row versions were removed.
89120 index pages have been deleted, 0 are currently reusable.
CPU 4260.08s/327.29u sec elapsed 59592.38 sec.
INFO:  "forecastelement": removed 5592097 row versions in 124726 pages
DETAIL:  CPU 33.38s/14.21u sec elapsed 210.36 sec.
INFO:  index "forecastelement_rwv_idx" now contains 467784889 row versions
in 5037914 pages
DETAIL:  5592089 index row versions were removed.
134286 index pages have been deleted, 0 are currently reusable.
CPU 4185.86s/318.19u sec elapsed 57048.65 sec.
INFO:  "forecastelement": removed 5592089 row versions in 121657 pages
DETAIL:  CPU 51.19s/14.19u sec elapsed 238.31 sec.
INFO:  index "forecastelement_rwv_idx" now contains 462960132 row versions
in 5039886 pages
DETAIL:  5592067 index row versions were removed.
179295 index pages have been deleted, 0 are currently reusable.
CPU 4002.76s/313.63u sec elapsed 54806.09 sec.
INFO:  "forecastelement": removed 5592067 row versions in 122510 pages
DETAIL:  CPU 25.32s/14.47u sec elapsed 187.73 sec.
INFO:  index "forecastelement_rwv_idx" now contains 457555142 row versions
in 5041631 pages
DETAIL:  5592085 index row versions were removed.
224480 index pages have been deleted, 0 are currently reusable.
CPU 4149.42s/310.94u sec elapsed 55880.65 sec.
INFO:  "forecastelement": removed 5592085 row versions in 122500 pages
DETAIL:  CPU 16.70s/14.47u sec elapsed 180.27 sec.
INFO:  index "forecastelement_rwv_idx" now contains 452191660 row versions
in 5044414 pages
DETAIL:  5592089 index row versions were removed.
269665 index pages have been deleted, 0 are currently reusable.
CPU 4213.10s/304.61u sec elapsed 55159.36 sec.
INFO:  "forecastelement": removed 5592089 row versions in 122663 pages
DETAIL:  CPU 37.28s/14.63u sec elapsed 206.96 sec.
INFO:  index "forecastelement_rwv_idx" now contains 446807778 row versions
in 5046541 pages
DETAIL:  5592077 index row versions were removed.
314747 index pages have been deleted, 0 are currently reusable.
CPU 4039.49s/297.15u sec elapsed 55086.56 sec.
INFO:  "forecastelement": removed 5592077 row versions in 122558 pages
DETAIL:  CPU 20.21s/14.74u sec elapsed 227.53 sec.

Re: Why will vacuum not end?

From
Christopher Kings-Lynne
Date:
> This vacuum is running a marathon.  Why will it not end and show me the
> free space map INFO?  We have deleted a lot of data and I would like to be
> confident that the space from these deletions will be reused, rather than
> more table files being created.

Does another postgres query running have a lock on that table?

Chris


Re: Why will vacuum not end?

From
"Shea,Dan [CIS]"
Date:
No, but data is constantly being inserted by userid scores.  It is postgres
running the vacuum.
Dan.

-----Original Message-----
From: Christopher Kings-Lynne [mailto:chriskl@familyhealth.com.au]
Sent: Tuesday, April 20, 2004 12:02 AM
To: Shea,Dan [CIS]
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Why will vacuum not end?


> This vacuum is running a marathon.  Why will it not end and show me the
> free space map INFO?  We have deleted a lot of data and I would like to be
> confident that the space from these deletions will be reused, rather than
> more table files being created.

Does another postgres query running have a lock on that table?

Chris

Re: Why will vacuum not end?

From
Bill Moran
Date:
Shea,Dan [CIS] wrote:
> No, but data is constantly being inserted by userid scores.  It is postgres
> running the vacuum.
> Dan.
>
> -----Original Message-----
> From: Christopher Kings-Lynne [mailto:chriskl@familyhealth.com.au]
> Sent: Tuesday, April 20, 2004 12:02 AM
> To: Shea,Dan [CIS]
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Why will vacuum not end?
>
>>This vacuum is running a marathon.  Why will it not end and show me the
>>free space map INFO?  We have deleted a lot of data and I would like to be
>>confident that the space from these deletions will be reused, rather than
>>more table files being created.
>
> Does another postgres query running have a lock on that table?

This may be a dumb question (but only because I don't know the answer):

Doesn't/shouldn't vacuum have some kind of timeout so if a table is locked
it will give up eventually (loudly complaining when it does so)?

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


Re: Why will vacuum not end?

From
Christopher Kings-Lynne
Date:
> No, but data is constantly being inserted by userid scores.  It is postgres
> running the vacuum.
> Dan.

Well, inserts create some locks - perhaps that's the problem...

Otherwise, check the pg_locks view to see if you can figure it out.

Chris


Re: Why will vacuum not end?

From
"Shea,Dan [CIS]"
Date:
PWFPM_DEV=# select * from pg_locks;
 relation | database | transaction |  pid  |           mode           |
granted
----------+----------+-------------+-------+--------------------------+-----
----
    17472 |    17347 |             |  2618 | ShareUpdateExclusiveLock | t
          |          |    10858533 | 28778 | ExclusiveLock            | t
    17472 |    17347 |             |  2618 | ShareUpdateExclusiveLock | t
          |          |    10803814 |  2618 | ExclusiveLock            | t
    16759 |    17347 |             | 28778 | AccessShareLock          | t
(5 rows)

PWFPM_DEV=#

17347 is the oid of the database PWFPM_DEV.  The pids are below:

[root@murphy root]# ps -ef |grep 28778|grep -v "grep"
postgres 28778   504  0 18:06 ?        00:00:00 postgres: scores PWFPM_DEV
[local] idle
[root@murphy root]# ps -ef |grep 2618|grep -v "grep"
postgres  2618   504  8 Apr22 ?        02:31:00 postgres: postgres PWFPM_DEV
[local] VACUUM
[root@murphy root]#
A vacuum is running now.  I restarted the database, set vacuum_mem =
'196608', and started a new vacuum.  I also stopped inserting into the
database.
I'm hoping I will get some results.

PWFPM_DEV=# select now();vacuum verbose analyze forecastelement;select
now();
              now
-------------------------------
 2004-04-22 13:38:02.083592+00
(1 row)

INFO:  vacuuming "public.forecastelement"
INFO:  index "forecastelement_rwv_idx" now contains 391385895 row versions
in 5051132 pages
DETAIL:  27962015 index row versions were removed.
771899 index pages have been deleted, 496872 are currently reusable.
CPU 4499.54s/385.76u sec elapsed 55780.91 sec.
INFO:  "forecastelement": removed 33554117 row versions in 737471 pages
DETAIL:  CPU 135.61s/83.99u sec elapsed 1101.26 sec.
-----Original Message-----
From: Christopher Kings-Lynne [mailto:chriskl@familyhealth.com.au]
Sent: Tuesday, April 20, 2004 9:26 PM
To: Shea,Dan [CIS]
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Why will vacuum not end?


> No, but data is constantly being inserted by userid scores.  It is
> postgres running the vacuum.
> Dan.

Well, inserts create some locks - perhaps that's the problem...

Otherwise, check the pg_locks view to see if you can figure it out.

Chris



Re: Why will vacuum not end?

From
Josh Berkus
Date:
Guys,

> Well, inserts create some locks - perhaps that's the problem...
>
> Otherwise, check the pg_locks view to see if you can figure it out.

FWIW, I've had this happen a couple of times, too.  Unfortunately, it's
happened in the middle of the day so that I had to cancel the processes and
get the system back to normal in too much of a hurry to consider documenting
it.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Why will vacuum not end?

From
"Shea,Dan [CIS]"
Date:
Josh, how long should a vacuum take on an 87 GB table with a 39 GB index?

I do not think that the verbose option of vacuum is verbose enough.
The vacuum keeps redoing the index, but there is no indication as to why it
is doing this.

I see a lot of activity with transaction logs being recycled (15 to 30 every
3 to 20 minutes).
Is the vacuum causing this?


-----Original Message-----
From: Josh Berkus [mailto:josh@agliodbs.com]
Sent: Friday, April 23, 2004 2:48 PM
To: Shea,Dan [CIS]; 'Christopher Kings-Lynne'
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Why will vacuum not end?


Guys,

> Well, inserts create some locks - perhaps that's the problem...
>
> Otherwise, check the pg_locks view to see if you can figure it out.

FWIW, I've had this happen a couple of times, too.  Unfortunately, it's
happened in the middle of the day so that I had to cancel the processes and
get the system back to normal in too much of a hurry to consider documenting
it.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Why will vacuum not end?

From
Josh Berkus
Date:
Dan,

> Josh, how long should a vacuum take on an 87 GB table with a 39 GB index?

Depends:
-- What's your disk support?
-- VACUUM, VACUUM ANALYZE, or VACUUM FULL?
-- What's your vacuum_mem setting?
-- What are checkpoint and wal settings?

> I see alot of activity with transaction logs being recycled (15 to 30 every
> 3 to 20 minutes).
> Is the vacuum causing this?

Probably, yes.   How many checkpoint_buffers do you allow?

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Why will vacuum not end?

From
Manfred Koizar
Date:
On Sat, 24 Apr 2004 10:45:40 -0400, "Shea,Dan [CIS]" <Dan.Shea@ec.gc.ca>
wrote:
>[...] 87 GB table with a 39 GB index?

>The vacuum keeps redoing the index, but there is no indication as to why it
>is doing this.

If VACUUM finds a dead tuple, it does not immediately remove index
entries pointing to that tuple.  It instead collects such tuple ids and
later does a bulk delete, i.e. scans the whole index and removes all
index items pointing to one of those tuples.  The number of tuple ids
that can be remembered is controlled by vacuum_mem: it is

    VacuumMem * 1024 / 6

Whenever this number of dead tuples has been found, VACUUM scans the
index (which takes ca. 60000 seconds, more than 16 hours), empties the
list and continues to scan the heap ...

From the number of dead tuples you can estimate how often your index
will be scanned.  If dead tuples are evenly distributed, expect there to
be 15 index scans with your current vacuum_mem setting of 196608.  So
your VACUUM will run for 11 days :-(

OTOH this would mean that there are 500 million dead tuples.  Do you
think this is possible?
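Manfred's estimate can be checked with a few lines of Python (a sketch: the
6-bytes-per-tuple-id figure and the 60000-second index scan time come from
this thread, not from measurements of your own system):

```python
# Check Manfred's arithmetic (figures taken from this thread):
# each remembered dead-tuple id costs 6 bytes, so vacuum_mem (in KB)
# buys vacuum_mem * 1024 / 6 tuple ids per bulk-delete batch.
vacuum_mem_kb = 196608          # the vacuum_mem setting in use
scan_seconds = 60_000           # observed time for one full index scan

batch_size = vacuum_mem_kb * 1024 // 6       # tuple ids per index scan
index_scans = 15                             # Manfred's even-distribution guess
implied_dead = index_scans * batch_size      # dead tuples this implies
days = index_scans * scan_seconds / 86_400   # time spent in index scans alone

print(f"{batch_size:,} tuple ids per batch")    # 33,554,432
print(f"{implied_dead:,} dead tuples implied")  # ~503 million
print(f"{days:.1f} days of index scanning")     # ~10.4 days
```

That is where both the "15 index scans" and the "500 million dead tuples"
figures above come from.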

Servus
 Manfred

Re: Why will vacuum not end?

From
"Shea,Dan [CIS]"
Date:
Manfred is indicating the reason it is taking so long is due to the number
of dead tuples in my index and the vacuum_mem setting.
The last delete that I did before starting a vacuum had 219,177,133
deletions.
Dan.
>Dan,

>> Josh, how long should a vacuum take on an 87 GB table with a 39 GB index?

>Depends:
>-- What's your disk support?

>-- VACUUM, VACUUM ANALYZE, or VACUUM FULL?
VACUUM ANALYZE
>-- What's your vacuum_mem setting?
set vacuum_mem = '196608'
#fsync = true                   # turns forced synchronization on or off
#wal_sync_method = fsync
>-- What are checkpoint and wal settings?
wal_buffers = 64
checkpoint_segments = 30
checkpoint_timeout = 300

>> I see a lot of activity with transaction logs being recycled (15 to 30
>> every 3 to 20 minutes).
>> Is the vacuum causing this?

>Probably, yes.   How many checkpoint_buffers do you allow?
I am not sure what the checkpoint_buffers are; we are running 7.4.0.
>--
>Josh Berkus
>Aglio Database Solutions
>San Francisco

Re: Why will vacuum not end?

From
"Shea,Dan [CIS]"
Date:
There were definitely 219,177,133 deletions.
The deletions are most likely from the beginning; they were based on the
reception_time of the data.
I would rather not use re-index, unless it is faster than using vacuum.
What do you think would be the best way to get around this?
Increase vacuum_mem to a higher amount, 1.5 to 2 GB, or try a re-index (rather
not re-index so that data can be queried without doing a seqscan).
Once the index is cleaned up, how does vacuum handle the table?
Does it take as long as the index or is it faster?



-----Original Message-----
From: Manfred Koizar [mailto:mkoi-pg@aon.at]
Sent: Saturday, April 24, 2004 1:57 PM
To: Shea,Dan [CIS]
Cc: 'Josh Berkus'; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Why will vacuum not end?


On Sat, 24 Apr 2004 10:45:40 -0400, "Shea,Dan [CIS]" <Dan.Shea@ec.gc.ca>
wrote:
>[...] 87 GB table with a 39 GB index?

>The vacuum keeps redoing the index, but there is no indication as to why it
>is doing this.

If VACUUM finds a dead tuple, it does not immediately remove index
entries pointing to that tuple.  It instead collects such tuple ids and
later does a bulk delete, i.e. scans the whole index and removes all
index items pointing to one of those tuples.  The number of tuple ids
that can be remembered is controlled by vacuum_mem: it is

    VacuumMem * 1024 / 6

Whenever this number of dead tuples has been found, VACUUM scans the
index (which takes ca. 60000 seconds, more than 16 hours), empties the
list and continues to scan the heap ...

From the number of dead tuples you can estimate how often your index
will be scanned.  If dead tuples are evenly distributed, expect there to
be 15 index scans with your current vacuum_mem setting of 196608.  So
your VACUUM will run for 11 days :-(

OTOH this would mean that there are 500 million dead tuples.  Do you
think this is possible?

Servus
 Manfred

Re: Why will vacuum not end?

From
Manfred Koizar
Date:
On Sat, 24 Apr 2004 15:48:19 -0400, "Shea,Dan [CIS]" <Dan.Shea@ec.gc.ca>
wrote:
>Manfred is indicating the reason it is taking so long is due to the number
>of dead tuples in my index and the vacuum_mem setting.

<nitpicking>
Not dead tuples in the index, but dead tuples in the table.
</nitpicking>

>The last delete that I did before starting a vacuum had 219,177,133
>deletions.

Ok, with vacuum_mem = 196608 the bulk delete batch size is ca. 33.5 M
tuple ids.  219 M dead tuples will cause 7 index scans.  The time for an
index scan is more or less constant, 60000 seconds in your case.  So
yes, a larger vacuum_mem will help, but only if you really have as much
*free* memory.  Forcing the machine into swapping would make things
worse.
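The 7-scan figure follows from the same batch-size arithmetic (a quick
check, reusing the 6-byte-per-id and 60000-second figures from this thread):

```python
import math

# With 219 M dead tuples, how many bulk-delete index scans does
# vacuum_mem = 196608 imply?  (6 bytes per remembered tuple id.)
dead_tuples = 219_177_133
batch_size = 196608 * 1024 // 6          # ~33.5 M tuple ids per batch

scans = math.ceil(dead_tuples / batch_size)
hours = scans * 60_000 / 3600            # at ~60000 s per index scan

print(scans)              # 7
print(f"{hours:.0f} h")   # ~117 hours spent in index scans alone
```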

BTW, VACUUM frees millions of index pages, is your FSM large enough?

Servus
 Manfred

Re: Why will vacuum not end?

From
Manfred Koizar
Date:
On Sat, 24 Apr 2004 15:58:08 -0400, "Shea,Dan [CIS]" <Dan.Shea@ec.gc.ca>
wrote:
>There were definitely 219,177,133 deletions.
>The deletions are most likely from the beginning, it was based on the
>reception_time of the data.
>I would rather not use re-index, unless it is faster than using vacuum.

I don't know whether it would be faster.  But if you decide to reindex,
make sure sort_mem is *huge*!

>What do you think would be the best way to get around this?
>Increase vacuum_mem to a higher amount 1.5 to 2 GB or try a re-index (rather
>not re-index so that data can be queried without doing a seqscan).

Just out of curiosity:  What kind of machine is this running on?  And
how long does a seq scan take?

>Once the index is cleaned up, how does vacuum handle the table?

If you are lucky VACUUM frees half the index pages.  And if we assume
that the most time spent scanning an index goes into random page
accesses, future VACUUMs will take "only" 30000 seconds per index scan.

Servus
 Manfred

Re: Why will vacuum not end?

From
Josh Berkus
Date:
Dan,

> There were definitely 219,177,133 deletions.
> The deletions are most likely from the beginning, it was based on the
> reception_time of the data.

You need to run VACUUM more often, I think.  Vacuuming out 219 million dead
tuples is going to take a long time no matter how you look at it.  If you
vacuum more often, presumably there will be less for VACUUM to do each time.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Why will vacuum not end?

From
"Shea,Dan [CIS]"
Date:
It is set at max_fsm_pages = 1500000.

We are running a
DELL PowerEdge 6650 with 4 CPU's
Mem:  3611320k av from top.
The database is on a shared device (SAN) raid5, 172 GB.
Qlogic Fibre optic cards(desc: "QLogic Corp.|QLA2312 Fibre Channel Adapter")
connected to the Dell version of an EMC SAN (FC4700 I believe).

I have set vacuum_mem = 917504 and started another vacuum verbose on the
table in question.
I first tried to set vacuum_mem to 1114112 and vacuum failed, then tried
917504 and vacuum started.

PWFPM_DEV=# set vacuum_mem = '1114112';
SET
PWFPM_DEV=# show vacuum_mem;
 vacuum_mem
------------
 1114112
(1 row)

PWFPM_DEV=# vacuum verbose  forecastelement;

INFO:  vacuuming "public.forecastelement"
ERROR:  invalid memory alloc request size 1140850686
PWFPM_DEV=# set vacuum_mem = 917504;
SET
PWFPM_DEV=# show vacuum_mem;
 vacuum_mem
------------
 917504
(1 row)

PWFPM_DEV=# select now();vacuum verbose  forecastelement;select now();
              now
-------------------------------
 2004-04-25 01:40:23.367123+00
(1 row)

INFO:  vacuuming "public.forecastelement"
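As an aside, the exact number in the failed-allocation error above can be
reproduced: palloc refuses any single request larger than MaxAllocSize (just
under 1 GB), and lazy VACUUM allocates its whole dead-tuple-id array up
front. A sketch follows; the 6-byte ItemPointerData size and the 0x3FFFFFFF
limit are constants from the PostgreSQL source, and this Python is an
illustration of the sizing logic, not the actual C code:

```python
# Reproduce "invalid memory alloc request size 1140850686":
# lazy VACUUM sizes its dead-tuple array as
#   (vacuum_mem KB * 1024 / sizeof(ItemPointerData)) entries of 6 bytes,
# and palloc rejects any single request larger than MaxAllocSize.
ITEM_POINTER_SIZE = 6            # bytes per dead-tuple id
MAX_ALLOC = 0x3FFFFFFF           # ~1 GB palloc ceiling

vacuum_mem_kb = 1114112
max_tuples = vacuum_mem_kb * 1024 // ITEM_POINTER_SIZE
request = max_tuples * ITEM_POINTER_SIZE

print(request)                   # 1140850686 -- the size in the error
print(request > MAX_ALLOC)       # True, so the allocation is rejected

# 917504 KB stays under the ceiling, which is why that setting worked:
ok_request = (917504 * 1024 // ITEM_POINTER_SIZE) * ITEM_POINTER_SIZE
print(ok_request <= MAX_ALLOC)   # True
```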

I performed a query that used a seqscan

PWFPM_DEV=# explain analyze select count(*) from forecastelement;
                                                                  QUERY PLAN
----------------------------------------------------------------------------
-------------------------------------------------------------------
 Aggregate  (cost=16635987.60..16635987.60 rows=1 width=0) (actual
time=13111152.844..13111152.847 rows=1 loops=1)
   ->  Seq Scan on forecastelement  (cost=0.00..15403082.88 rows=493161888
width=0) (actual time=243.562..12692714.422 rows=264422681 loops=1)
 Total runtime: 13111221.978 ms
(3 rows)

Dan.

-----Original Message-----
From: Manfred Koizar [mailto:mkoi-pg@aon.at]
Sent: Saturday, April 24, 2004 8:29 PM
To: Shea,Dan [CIS]
Cc: 'Josh Berkus'; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Why will vacuum not end?


On Sat, 24 Apr 2004 15:58:08 -0400, "Shea,Dan [CIS]" <Dan.Shea@ec.gc.ca>
wrote:
>There were definitely 219,177,133 deletions.
>The deletions are most likely from the beginning, it was based on the
>reception_time of the data.
>I would rather not use re-index, unless it is faster than using vacuum.

I don't know whether it would be faster.  But if you decide to reindex,
make sure sort_mem is *huge*!

>What do you think would be the best way to get around this?
>Increase vacuum_mem to a higher amount 1.5 to 2 GB or try a re-index (rather
>not re-index so that data can be queried without doing a seqscan).

Just out of curiosity:  What kind of machine is this running on?  And
how long does a seq scan take?

>Once the index is cleaned up, how does vacuum handle the table?

If you are lucky VACUUM frees half the index pages.  And if we assume
that the most time spent scanning an index goes into random page
accesses, future VACUUMs will take "only" 30000 seconds per index scan.

Servus
 Manfred

Re: Why will vacuum not end?

From
Manfred Koizar
Date:
On Sun, 25 Apr 2004 09:05:11 -0400, "Shea,Dan [CIS]" <Dan.Shea@ec.gc.ca>
wrote:
>It is set at max_fsm_pages = 1500000 .

This might be too low.  Your index has ca. 5 M pages, you are going to
delete half of its entries, and what you delete is a contiguous range of
values.  So up to 2.5 M index pages might be freed (minus inner nodes
and pages not completely empty).  And there will be lots of free heap
pages too ...
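The sizing concern is easy to make concrete (rough figures from this thread;
the halving is Manfred's even-distribution assumption, not a measurement):

```python
# Can max_fsm_pages keep track of everything this VACUUM frees?
index_pages = 5_000_000            # index size per the VACUUM VERBOSE output
freeable_index = index_pages // 2  # roughly half the key range is deleted
max_fsm_pages = 1_500_000          # the current setting

print(f"{freeable_index:,} potentially freeable index pages")  # 2,500,000
print(freeable_index > max_fsm_pages)  # True: the FSM cannot track them all,
                                       # before even counting freed heap pages
```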

I wrote:
>If you are lucky VACUUM frees half the index pages.  And if we assume
>that the most time spent scanning an index goes into random page
>accesses, future VACUUMs will take "only" 30000 seconds per index scan.

After a closer look at the code and after having slept over it I'm not
so sure any more that the number of tuple ids to be removed has only
minor influence on the time spent for a bulk delete run.  After the
current VACUUM has finished would you be so kind to run another VACUUM
VERBOSE with only a few dead tuples and post the results here?

Servus
 Manfred