Thread: unused tuples constantly increasing

unused tuples constantly increasing

From
Chantal Ackermann
Date:
hello all,

With PostgreSQL version 7.3, I have the following problem while
updating, and vacuuming a table containing over 2,000,000 rows:

What I am doing: I want to build a tsearch index for one of the columns
(called "sentence") in this table. For this I added a new column called
"sentenceidx" which now has to be populated with the parsed text from
"sentence", using txt2txtidx (comes with contrib/tsearch).

The script I use updates 2000 rows at once, then runs a VACUUM VERBOSE
on the table. The update of the 2000 rows takes always less then a
second (more likely about 0.6 sec) which I think is ok. The VACUUM,
though, takes always more than 110 secs. Thus, the script has been
running for quite some time, now.

What concerns me even more is that the value for "UnUsed" in the vacuum
output is constantly increasing, even though I increased max_fsm_pages
four times (to be 40000) (and restarted the postmaster).

I have searched the net, but I am obviously searching for the wrong
keywords as I have not found a solution I can understand and apply.
Probably all this is just the result of a poor optimization of the
database: while this script runs, postgres writes constantly to disc,
while "top" tells:

   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
19217 postgres  10   0  124M 124M  123M R     8.7 24.2   9:14 postmaster



I am pasting three consecutive outputs from "VACUUM VERBOSE sentences;",
part of postgresql.conf, and some system properties.

The 2 million rows are only a little part of the whole data that should
populate the db in the future, so I am very interested in advice on how
to optimize postgres well.

regards,

Chantal

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Here is the output from VACUUM:

UPDATE 2000
psql:1202: INFO:  --Relation public.sentences--
psql:1202: INFO:  Index sentences_pkey: Pages 10644; Tuples 2262126:
Deleted 2000.
         CPU 0.95s/1.18u sec elapsed 39.75 sec.
psql:1202: INFO:  Removed 2000 tuples in 51 pages.
         CPU 0.00s/0.00u sec elapsed 0.00 sec.
psql:1202: INFO:  Pages 80239: Changed 143, Empty 0; Tup 2262126: Vac
2000, Keep 0, UnUsed 518555.
         Total CPU 6.84s/1.99u sec elapsed 117.63 sec.
psql:1202: INFO:  --Relation pg_toast.pg_toast_462586--
psql:1202: INFO:  Pages 1: Changed 0, Empty 0; Tup 5: Vac 0, Keep 0,
UnUsed 0.
         Total CPU 0.00s/0.00u sec elapsed 0.06 sec.
VACUUM
UPDATE 2000
psql:1204: INFO:  --Relation public.sentences--
psql:1204: INFO:  Index sentences_pkey: Pages 10650; Tuples 2262126:
Deleted 2000.
         CPU 0.92s/1.16u sec elapsed 38.84 sec.
psql:1204: INFO:  Removed 2000 tuples in 53 pages.
         CPU 0.01s/0.01u sec elapsed 0.01 sec.
psql:1204: INFO:  Pages 80280: Changed 147, Empty 0; Tup 2262126: Vac
2000, Keep 0, UnUsed 519456.
         Total CPU 6.99s/1.93u sec elapsed 117.15 sec.
psql:1204: INFO:  --Relation pg_toast.pg_toast_462586--
psql:1204: INFO:  Pages 1: Changed 0, Empty 0; Tup 5: Vac 0, Keep 0,
UnUsed 0.
         Total CPU 0.00s/0.00u sec elapsed 0.01 sec.
VACUUM
UPDATE 2000
psql:1206: INFO:  --Relation public.sentences--
psql:1206: INFO:  Index sentences_pkey: Pages 10656; Tuples 2262126:
Deleted 2000.
         CPU 0.90s/1.05u sec elapsed 39.61 sec.
psql:1206: INFO:  Removed 2000 tuples in 50 pages.
         CPU 0.01s/0.00u sec elapsed 0.00 sec.
psql:1206: INFO:  Pages 80314: Changed 137, Empty 0; Tup 2262126: Vac
2000, Keep 0, UnUsed 520277.
         Total CPU 6.94s/1.73u sec elapsed 116.77 sec.
psql:1206: INFO:  --Relation pg_toast.pg_toast_462586--
psql:1206: INFO:  Pages 1: Changed 0, Empty 0; Tup 5: Vac 0, Keep 0,
UnUsed 0.
         Total CPU 0.00s/0.00u sec elapsed 0.01 sec.
VACUUM
UPDATE 2000

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

This is the part of postgresql.conf which might be relevant:

#
#       Shared Memory Size
#
shared_buffers = 15200          # 2*max_connections, min 16, typically
                # 8KB each, default 64
max_fsm_relations = 200         # min 10, fsm is free space map, ~40
                # bytes, default 100
max_fsm_pages = 40000           # min 1000, fsm is free space map, ~6
                # bytes, default 10000
#max_locks_per_transaction = 64 # min 10
#wal_buffers = 32               # min 4, typically 8KB each, default 32

#
#       Non-shared Memory Sizes
#
#sort_mem = 1024                # min 64, size in KB
vacuum_mem = 16384              # min 1024, size in KB, default 8192

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

System: Linux 2.2.18, i686
Postgres: 7.3
CPU: 933MHz
RAM: 524Mb

shmall: 16777216 (I did not manage to increase it)
shmmax: 134217728



Re: unused tuples constantly increasing

From
"Shridhar Daithankar"
Date:
On 10 Jan 2003 at 14:57, Chantal Ackermann wrote:

> I have been running VACUUM ANALYZE at first, then removed ANALYZE in
> hope it would improve the performance, and because I don't think that
> ANALYZE is really necessary. In fact, the update itself doesn't slow
> down. It keeps taking about 0.6 sec or even less time. And I don't think
> ANALYZE speeds VACUUM up on the long run, does it?

Well, actually there are several things that vacuum do. When you run analyze,
it does a subset of those things and thus could be bit faster.

Basically it needs to mark those pages     for reuse which update has earlier
used. So this frees up some amount of RAM for postgresql. Besides it updates
statistics too.

I would suggest interlacing smaller updates and vacuum rather than doing big
update and a big vacuum. That should take less time overall..

HTH
Bye
 Shridhar

--
QOTD:    "This is a one line proof... if we start sufficiently far to the    left."


Re: unused tuples constantly increasing

From
Tom Lane
Date:
Chantal Ackermann <chantal.ackermann@biomax.de> writes:
> What I am doing: I want to build a tsearch index for one of the columns
> (called "sentence") in this table. For this I added a new column called
> "sentenceidx" which now has to be populated with the parsed text from
> "sentence", using txt2txtidx (comes with contrib/tsearch).

> The script I use updates 2000 rows at once, then runs a VACUUM VERBOSE
> on the table.

"Update" here means storing a new value into the column that was
formerly NULL?  So the rows are getting wider as you update them?
How much wider (how big is sentenceidx compared to the rest of the row)?

I think what you're seeing is simply that the new, fatter rows don't
fit in the spaces that are freed up by vacuuming the old ones.  If you
were updating enough rows so that many got freed on the same page,
there would be a better shot at recycling the space --- but AFAICT only
one or two per page are getting freed in any one VACUUM cycle.  You are
getting some recycling: these messages show about 140 pages modified
in each cycle, but only 40 pages added, so it is recycling space.  But
the table is inevitably going to grow, just because you are putting in
more data than was there before.

My advice is to be a little less frantic about space management.
The rule of thumb I usually use is that one should vacuum when there's
been about 10% turnover in a table.  Vacuuming after only 0.1% turnover
is a waste of cycles.

            regards, tom lane

Re: unused tuples constantly increasing

From
Chantal Ackermann
Date:
hello Shridhar, hello Tom,

long story short:

Neither adding ANALYZE nor reducing the rows from 2000 at one time to
500 changes anything: the UnUsed value keeps growing, as does the
overall amount of pages. The VACUUM ANALYZE takes the same time or even
more.

Following your advice, Tom, I increased the rows to 200,000. The update
now takes about 140 secs, the VACUUM ANALYZE more than 200 secs. The
amount of pages increases a lot, as does UnUsed. Moreover, the
transaction log files get recycled a lot - which did not happen with
2000 rows at one time. But nevertheless - with 200000 rows at one time
there are only few cycles untill all 2 million rows are updated, so in
the end this should be the faster way.

thank you for your help!

Chantal