Hi
Running PostgreSQL 9.4, I have run into the following issue.
On a huge table, I have to clear the content (set it to '') of a column that
accounts for 99% of the table's size. Since production cannot be stopped, I
cannot run a VACUUM FULL or any other direct rewrite of the table, so I tried
the following trick to drop the content while reducing the table's size.
(I re-indented it for this email.)
ppr=500
for i in $(seq 0 60); do
    psql "$MYDB" -c "UPDATE $TABLE
        SET content = ''::bytea
        WHERE ctid = ANY(ARRAY(
            SELECT ('(' || p.i || ',' || s.i || ')')::tid
            FROM generate_series($((i * ppr)), $(((i + 1) * ppr))) AS p(i),
                 generate_series(0, 2048) AS s(i)
        ));"
    psql "$MYDB" -c "VACUUM $TABLE;"
done
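In case it helps to see the batching: each iteration i covers pages i*ppr
through (i+1)*ppr inclusive, so each boundary page falls into two consecutive
batches (wasteful but harmless here, since setting content to '' twice is
idempotent). A database-free sketch of the ranges (iterations 0, 1 and 60
shown as samples):

```shell
# Print the heap page range each batch touches, using the same
# arithmetic as the loop above (ppr=500, iterations 0..60).
# Note that page 500 appears in both batch 0 and batch 1, etc.
ppr=500
for i in 0 1 60; do
  echo "batch $i: pages $((i * ppr))..$(((i + 1) * ppr))"
done
```

So the 61 batches together cover pages 0 through 30500, which is enough for
the 29000-page table.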
This reduced my table from 1200MB down to 240MB (29000 pages), but I expected
the final result to be even lower. I made a copy of the table, and it needed
only 30MB (3800 pages), so something was still wrong.
I ran a simple (and slow) query to get the number of rows per page on both
tables. The new one had slightly more rows per page, but in no way enough to
justify such a gap. Then I saw that after page 2080 the original table had,
for the following 27000 pages, between 0 and 5 rows per page. So VACUUM could
not remove those pages: the free space was in gaps, not at the end.
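For what it's worth, my understanding (a toy model below, not actual
PostgreSQL code, with a made-up tuples-per-page array) is that a plain VACUUM
can only give space back to the OS by truncating completely empty pages at
the very end of the file; interior free space is merely recorded in the free
space map for reuse:

```shell
# Toy model of VACUUM truncation. pages holds hypothetical
# tuples-per-page counts; only the trailing run of empty pages
# can be cut off the file, interior gaps stay where they are.
pages=(20 18 0 0 5 0 0)
n=${#pages[@]}
while [ "$n" -gt 0 ] && [ "${pages[n-1]}" -eq 0 ]; do
  n=$((n - 1))
done
echo "file truncated from ${#pages[@]} to $n pages"
```

A single page with live rows near the end (the "5" above) pins every page
before it, which matches what I saw.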
I figured that, after a VACUUM, updating the rows at the end of the table
should place the new row versions back at the beginning, in the free space,
and thus after updating the rows of the last 27000 pages the table could be
reduced to the optimal size.
But when I ran the following, I was very surprised (NB: I know this query
does a full scan, I won't do it on a 100GB table, I promise):
VACUUM VERBOSE $TABLE;
SELECT count(*) FROM $TABLE WHERE ctid > '(29000,0)';
=> 29 rows
UPDATE $TABLE SET content = ''::bytea WHERE ctid > '(29000,0)';
=> 29 rows updated
SELECT count(*) FROM $TABLE WHERE ctid > '(29000,0)';
=> 29 rows again!
So instead of filling the huge gaps (there is, for instance, an 80-page gap
from page 2082 to page 2160), Pg put the new row versions back at the end of
the table.
I'm quite puzzled by this. I believed PostgreSQL would reuse the free space
for the new row versions, so my update loop would give me a clean, «packed»
table.
What behaviour am I missing here? How can I get PostgreSQL to use that free
space without falling back to a VACUUM FULL? (And, if possible, without tools
like pg_repack, because they seem unnecessary in this situation.)
Thanks
Pierre