Thread: [GENERAL] PostgreSQL not reusing free space in table ?

From: Pierre Ducroquet
Hi

Running PostgreSQL 9.4, I have run into the following issue.
On a huge table, I have to remove the content (set it to '') of a column that
accounts for 99% of the weight of the table. Since production cannot be
stopped, I am not able to do a VACUUM FULL or any direct rewrite of the
table, so I tried the following trick to drop the content while reducing the
table's size. (I re-indented it for this email)

ppr=500
for i in $(seq 0 60); do
    psql "$MYDB" -c "UPDATE $TABLE
        SET content = ''::bytea
        WHERE ctid = ANY(ARRAY(
            SELECT ('(' || p.i || ',' || s.i || ')')::tid
            FROM generate_series($((i * ppr)), $(((i + 1) * ppr))) p(i),
                 generate_series(0, 2048) AS s(i)
        ));"
    psql "$MYDB" -c "VACUUM $TABLE;"
done

This reduced my table from 1200MB down to 240MB (29000 pages), while I expected
the final result to be even lower. I made a copy of the table, and it needed
only 30MB (3800 pages), so there was still something wrong.
I ran a simple (and slow) query to get the number of rows per page on both
tables. The new one had slightly more rows per page, but that was in no way
enough to justify such a gap. Then I saw that after page 2080 the table had,
for the following 27000 pages, between 0 and 5 rows per page, so VACUUM could
not truncate those pages because of the gaps.
I figured that, after a vacuum, updated versions of the rows at the end of the
table should be put back at the beginning of the free space, and thus after
updating the rows of the last 27000 pages the table could be reduced to its
optimal size.
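
For reference, the kind of rows-per-page query described above can be sketched
like this (a slow full scan, as noted; `mytable` is a placeholder for the real
table name):

```sql
-- Count live rows per heap page by decoding each row's ctid.
-- ctid::text looks like '(page,offset)', which casts cleanly to a point;
-- its first coordinate is the page number.
SELECT (ctid::text::point)[0]::bigint AS page,
       count(*)                       AS rows_on_page
FROM mytable
GROUP BY 1
ORDER BY 1;
```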
But when I ran the following, I was very surprised (NB: I know this query does
a full scan; I won't do it on a 100GB table, I promise):

VACUUM VERBOSE $TABLE;

SELECT count(*) FROM $TABLE WHERE ctid > '(29000,0)';
=> 29 rows
UPDATE $TABLE SET content = ''::bytea WHERE ctid > '(29000,0)';
=> 29 rows updated
SELECT count(*) FROM $TABLE WHERE ctid > '(29000,0)';
=> 29 rows again!

So instead of filling the huge gaps (I have, for instance, an 80-page gap, from
page 2082 to page 2160), Pg put the data back at the end of the table.
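
One way to check whether VACUUM actually recorded those gaps as reusable free
space is the pg_freespacemap contrib extension (a sketch, assuming the
extension can be installed; `mytable` is a placeholder):

```sql
-- Inspect the free space map: how many bytes of free space does
-- PostgreSQL believe each heap page has available for reuse?
CREATE EXTENSION IF NOT EXISTS pg_freespacemap;
SELECT blkno, avail
FROM pg_freespace('mytable')
WHERE avail > 0
ORDER BY blkno
LIMIT 20;
```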


I'm quite puzzled by this situation. I believed PostgreSQL would use the free
space for the new data, so my update loop would give me a clean, «packed»
table.

What behaviour did I miss here? How can I get PostgreSQL to use that free
space without falling back to a VACUUM FULL? (And without using tools like
pg_repack if possible, since that seems unnecessary in this situation.)


Thanks

 Pierre



Re: [GENERAL] PostgreSQL not reusing free space in table ?

From: Merlin Moncure
On Fri, Jan 6, 2017 at 4:09 AM, Pierre Ducroquet
<pierre.ducroquet@people-doc.com> wrote:
> [...]
>
> What behaviour did I miss here ? How can I get PostgreSQL to use that free
> space without falling back to a vacuum full ? (And without using tools like
> pg_repack if possible, because this seems like not needed in that situation)

hm.  Maybe HOT is involved?  Does your column have an index on it?
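
For reference, whether the updates were HOT can be read from the standard
statistics view (`mytable` is a placeholder for the real table name):

```sql
-- If n_tup_hot_upd tracks n_tup_upd closely, the updates were HOT:
-- the new row versions stayed on the same page as the old ones.
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'mytable';
```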

merlin


Re: [GENERAL] PostgreSQL not reusing free space in table ?

From: Pierre Ducroquet
On Friday, January 6, 2017 9:00:06 AM CET Merlin Moncure wrote:
> [...]
> hm.  Maybe HOT is involved?  Does your column have an index on it?
>

No, the column is a non-indexed bytea.
