Re: perf pb solved only after pg_dump and restore - Mailing list pgsql-performance

From Markus Schaber
Subject Re: perf pb solved only after pg_dump and restore
Msg-id 44F2C6AC.5060602@logix-tt.com
In response to Re: perf pb solved only after pg_dump and restore  (Guillaume Cottenceau <gc@mnc.ch>)
Responses Re: perf pb solved only after pg_dump and restore  (Guillaume Cottenceau <gc@mnc.ch>)
List pgsql-performance
Hi, Guillaume,

Guillaume Cottenceau wrote:

> We have a couple of logs files which get larger over time
> (millions of rows). As they are log files, they can be trimmed
> from older values.

Ah, ok, you DELETEd the old rows.

So I assume that you never UPDATE, but only INSERT new entries and
sometimes DELETE a big bunch of entries from the beginning.

This is a special usage pattern for which plain "VACUUM" is not well
suited.

DELETEing rows itself does not free any disk space. Only after the
deleting transaction has committed can a subsequent VACUUM FULL or
CLUSTER actually return the space to the operating system.

VACUUM and VACUUM ANALYZE only remove obsolete row versions from the
pages and mark them as free (by entering them into the free space map,
provided it is large enough). That means your table stays as large as
before, with roughly 90% free pages at the beginning and 10% used pages
at the very end. New INSERTs and UPDATEs will prefer pages from the free
space map over allocating new ones, but the table file itself will never
shrink this way.

Now, VACUUM FULL actively moves rows toward the beginning of the table,
allowing the empty tail of the file to be truncated, while CLUSTER
recreates the table from scratch, in index order. Both lead to compact
storage, with all live rows at the beginning and no free pages.
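Sketched with a hypothetical log table and index (note that on the
7.4/8.x releases discussed here, CLUSTER uses the
`CLUSTER indexname ON tablename` form):

```sql
-- Variant 1: compact in place, preserving the current physical order.
VACUUM FULL app_log;

-- Variant 2: rewrite the table from scratch in index order
-- (this also rebuilds the table's indexes as a side effect).
CLUSTER app_log_logged_at_idx ON app_log;
```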

So, I think, in your case VACUUM FULL and CLUSTER would both have solved
your problem.

>  max_fsm_pages is 20000
> Do they look low?
> Notice: table data is only 600M after trim (without indexes),
> while it was probably 3x to 10x this size before the trim.

10x the size means 6 GB, so about 5.4 GB of data were freed by the trim.
Each page is 8 kB in size, so the free space map needs to track about
675000 pages. So yes, for your usage, max_fsm_pages = 20000 is far too
low and gives very suboptimal results.
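A postgresql.conf sketch; the numbers are illustrative and should be
sized to your actual delete churn (changing max_fsm_pages requires a
server restart on these versions):

```
# postgresql.conf -- free space map sizing (illustrative values)
max_fsm_pages = 700000      # enough to track ~5.4 GB worth of 8 kB pages
max_fsm_relations = 1000
```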

>> have index bloat.
>
> Can you elaborate? I have created a couple of indexes (according
> to multiple models of use in our application) and they do take up
> quite some disk space (table dump is 600M but after restore it
> takes up 1.5G on disk) but I thought they could only do good or
> never be used, not impair performance..

Like tables, indices may suffer from bloat due to old, unused entries.
Especially the GiST-based indices in 7.4 (used by PostGIS and other
plugins) suffered from that problem[1], but recent PostgreSQL versions
have improved in this area.

Now, when the query planner decides to use a bloated index, the index
access can be extremely slow because of all the dead entries the index
scan has to skip.

However, from the additional information you gave above, I doubt it was
index bloat.

>> Maybe a VACUUM FULL followed by a REINDEX will have solved your problem.
>
> So these would have reordered the data for faster sequential
> access which is not the case of VACUUM ANALYZE?

A VACUUM FULL would have reordered the data, and a REINDEX would have
optimized the index.
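In other words, assuming the same hypothetical table as above:

```sql
VACUUM FULL app_log;     -- compacts the table itself
REINDEX TABLE app_log;   -- rebuilds all of its indexes from scratch
```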

>> It also might make sense to issue a CLUSTER instead (which combines the
>> effects of VACUUM FULL, REINDEX and physically reordering the data).
>
> I was reluctant in using CLUSTER because you have to choose an
> index and there are multiple indexes on the large tables..

Usually, CLUSTERing on one index does not slow down accesses via the
other indices, compared to the non-clustered (= random) table order before.

If you have some indices that are somehow correlated (e.g. a timestamp
and a serial number), CLUSTERing on one index automatically helps the
other one as well, especially since the query planner uses correlation
statistics.

Btw, if your queries often include the same 2 or 3 columns, a
multi-column index (and clustering on that index) might be the best option.
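For example, if many queries filter on both the timestamp and a serial
number (names hypothetical):

```sql
CREATE INDEX app_log_ts_serial_idx ON app_log (logged_at, serial_no);

-- Pre-8.3 CLUSTER syntax: index name first, then the table.
CLUSTER app_log_ts_serial_idx ON app_log;
```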

>> When the free_space_map is too low, VACUUM ANALYZE should have told you
>> via a warning (at least, if your logging is set appropriately).
>
> Unfortunately, we didn't keep the logs of VACUUM ANALYZE, so I
> can't be sure :/

AFAIK, the warning is also output on the psql command line.

HTH,
Markus

[1] We once had an index that was about 100 times larger before REINDEX.

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
