[GENERAL] PostgreSQL not reusing free space in table ? - Mailing list pgsql-general

From Pierre Ducroquet
Subject [GENERAL] PostgreSQL not reusing free space in table ?
Date
Msg-id 11684202.13JvWVbjWa@laptop-pierred
Whole thread Raw
Responses Re: [GENERAL] PostgreSQL not reusing free space in table ?
List pgsql-general
Hi

Running PostgreSQL 9.4, I am running in the following issue.
On a huge table, I have to remove the content (set to '') of a column that
makes for 99% of the weight of the table. Since the production can not be
stopped, I will not be able to do a VACUUM FULL nor any direct rewrite of the
table, so I tried the following trick to drop the content while reducing the
table weight. (I re-indented it for this email)

ppr=500
for i in `seq 0 60` ; do
    psql $MYDB -c "UPDATE $TABLE
        SET content = ''::bytea
        WHERE ctid = ANY(ARRAY(
            SELECT ('(' || p.i || ',' || s.i || ')')::tid
            FROM generate_series(`$i * $ppr`, `($i + 1) * $ppr`) p(i),
                 generate_series(0, 2048) AS s(i)
        ));"
    psql $MYDB -c "VACUUM $TABLE;"
done

This reduced my table from 1200MB down to 240MB, 29000 pages, while I expected
the final result to be even lower. I did a copy of the table, and it needed
only 30MB, 3800 pages, so there was still something wrong.
I did a simple (and slow) query to get the number of rows per page on both
table. The new one had slightly more rows per page, but this was in no way
enough to justify such a gap. Then I saw that after page 2080 the table had,
for the following 27000 pages, between 0 and 5 rows per page. So vacuum could
not remove the pages because there were gaps.
I figured out that, after a vacuum, updating the rows at the end of the table
should be put back at the beginning of the free space and thus after an update
of the rows of the last 27000 pages the table could be reduced to the optimal
size.
But when I raw the following, I was very surprised (NB: I know this query does
a full scan, I won't do it on a 100GB table, I promise)

VACUUM VERBOSE $TABLE;

SELECT count(*) FROM $TABLE WHERE ctid > '(29000,0)';
=> 29 rows
UPDATE $TABLE SET content = ''::bytea WHERE ctid > '(29000,0)';
=> 29 rows updated
SELECT count(*) FROM $TABLE WHERE ctid > '(29000,0)';
==> 29 rows again !

So instead of filling huge gaps (I've got for instance an 80 pages gap, from
id 2082 to id 2160), Pg put the data back at the end of the table.


I'm quite puzzled by this situation. I believed PostgreSQL would use the free
space to put the new data, so my update loop would give me a clean, «packed»
table.

What behaviour did I miss here ? How can I get PostgreSQL to use that free
space without falling back to a vacuum full ? (And without using tools like
pg_repack if possible, because this seems like not needed in that situation)


Thanks

 Pierre


Attachment

pgsql-general by date:

Previous
From: Israel Brewster
Date:
Subject: Re: [GENERAL] Improve PostGIS performance with 62 million rows?
Next
From: Michael Paquier
Date:
Subject: Re: [GENERAL] requested timeline doesn't contain minimum recovery point