Thread: PostgreSQL slow after VACUUM

PostgreSQL slow after VACUUM

From

Nikola Milutinovic

Date:

26 November 2004, 06:33:20

Hi all.

I have the following:

- Mandrake Linux 9.1
- PostgreSQL 7.3.2 MDK5

There is one DB and one DB user. The DB is cleared and loaded with the
data of same volume each month (monthly report). The volume is not small
and it usually takes 3 hours to load. Loading is done with SQL files
which use transactions, 10,000 SQL statements per transaction.

A couple of days ago, disk became full, since we were not doing VACUUM
on the DB at all. So, I deleted all records from the 3 tables the DB has
and performed "VACUUM FULL ANALYZE". This reclaimed the space.

My problem is that the load is now taking (to my estimate) 20 times more!

Anything I could do to find out what's going on? There is nothing in the
logs that I can see.

Nix.

Re: PostgreSQL slow after VACUUM

From

"gnari"

Date:

26 November 2004, 08:22:38

From: "Nikola Milutinovic" <Nikola.Milutinovic@ev.co.yu>


> There is one DB and one DB user. The DB is cleared and loaded with the
> data of same volume each month (monthly report). The volume is not small
> and it usually takes 3 hours to load. Loading is done with SQL files
> which use transactions, 10,000 SQL statements per transaction.
>
> A couple of days ago, disk became full, since we were not doing VACUUM
> on the DB at all. So, I deleted all records from the 3 tables the DB has
> and performed "VACUUM FULL ANALYZE". This reclaimed the space.
>
> My problem is that the load is now taking (to my estimate) 20 times more!

it sounds from your description, that you are not analyzing the table
after the batch loads.
a analyze on an empty table effectively destroys all statistics.

gnari

Re: PostgreSQL slow after VACUUM

From

Arjen van der Meijden

Date:

26 November 2004, 08:46:05

Hi Nix,

The problem is, that while doing the vacuum full ANALYZE the table was
empty. It therefore gathered statistics of a situation which isn't there
anymore when you fill up the table. In an empty or small table, it is
normal to do sequential scans. Which you most of the time don't want in
a large table.
My suggestion is to VACUUM (FULL) the table after you've deleted the
data. Then fill up the table and do a ANALYZE when you're done filling it.

That way, the analysis of the data will be much more accurate. My guess
is, it'll use indexes much sooner and be much faster. Have a look at the
output of EXPLAIN ANALYZE yourstatement; before doing ANALYZE and after.

Best regards,

Arjen

On 26-11-2004 7:35, Nikola Milutinovic wrote:
> Hi all.
>
> I have the following:
>
> - Mandrake Linux 9.1
> - PostgreSQL 7.3.2 MDK5
>
> There is one DB and one DB user. The DB is cleared and loaded with the
> data of same volume each month (monthly report). The volume is not small
> and it usually takes 3 hours to load. Loading is done with SQL files
> which use transactions, 10,000 SQL statements per transaction.
>
> A couple of days ago, disk became full, since we were not doing VACUUM
> on the DB at all. So, I deleted all records from the 3 tables the DB has
> and performed "VACUUM FULL ANALYZE". This reclaimed the space.
>
> My problem is that the load is now taking (to my estimate) 20 times more!
>
> Anything I could do to find out what's going on? There is nothing in the
> logs that I can see.
>
> Nix.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>      joining column's datatypes do not match
>

Re: PostgreSQL slow after VACUUM

From

Greg Stark

Date:

26 November 2004, 10:47:45

Arjen van der Meijden <acmmailing@vulcanus.its.tudelft.nl> writes:

> My suggestion is to VACUUM (FULL) the table after you've deleted the data.
> Then fill up the table and do a ANALYZE when you're done filling it.

For now you might be able to improve matters by doing an ANALYZE even while
it's busy doing the import. Depending on how the import is coded it may or may
not have an immediate effect.

VACUUM and ANALYZE (and VACUUM ANALYZE) are fairly quick and don't block
access to the tables, though they can slow it down somewhat. You probably want
to be running them frequently so you don't have to resort to VACUUM FULL.

--
greg

Re: PostgreSQL slow after VACUUM

From

Pierre-Frédéric Caillaud

Date:

26 November 2004, 12:58:59

    It seems this issue has been mentionned several times lately...
    I'd propose something to do to avoid it :

    * When TRUNCAT'ing a table :
        - the row-count is reset to 0 (of course !)
        - however, the column stats are kept, on the basis that the data which
will be inserted later in the table will most likely have at least some
correlation to the data that was in it before. Of course, VACUUM ANALYZE
will update the statistics with real ones when it is run.
    This would make TRUNCATE behave more like DELETE FROM...

    * When INSERT'ing into a table :
    The OP's problem was that the query is planned as if the table was almost
empty (hence seq scan), well it is in fact empty, but many rows are
inserted. Of course, the query can't be replanned in the middle of its
execution (although that'd be a nice feature to ad on the withlist for
postgres version 2020)...
    When planning for inserts, the planner currently uses  the estimated
number of rows in the target table, taken from the statistics.
    However, at this time, the planner has an estimation of how many rows
it's going to insert :
        - INSERT ... SELECT, or CREATE TABLE ... AS SELECT :
            The SELECT from which the rows will come is planned, so there
            should be an estimated number of rows readily available.
        - COPY FROM...
            The number of rows could be estimated by peeking at the first N
            rows in the file (not on COPY FROM stdin though), or an additional
            parameter to COPY could be give,.

    Then, the INSERT can be planned as if the target table contained
(estimated number of rows in the target table) + (estimated number of rows
to be inserted), and it will choose a good plan for populating empty
tables...

    What do you think ?



>
> The problem is, that while doing the vacuum full ANALYZE the table was
> empty. It therefore gathered statistics of a situation which isn't there
> anymore when you fill up the table. In an empty or small table, it is
> normal to do sequential scans. Which you most of the time don't want in
> a large table.
> My suggestion is to VACUUM (FULL) the table after you've deleted the
> data. Then fill up the table and do a ANALYZE when you're done filling
> it.
>
> That way, the analysis of the data will be much more accurate. My guess
> is, it'll use indexes much sooner and be much faster. Have a look at the
> output of EXPLAIN ANALYZE yourstatement; before doing ANALYZE and after.
>
> Best regards,
>
> Arjen

Re: PostgreSQL slow after VACUUM

From

Martijn van Oosterhout

Date:

26 November 2004, 13:36:59

On Fri, Nov 26, 2004 at 02:00:48PM +0100, Pierre-Frédéric Caillaud wrote:
>
>     It seems this issue has been mentionned several times lately...
>     I'd propose something to do to avoid it :
>
>     * When TRUNCAT'ing a table :
>         - the row-count is reset to 0 (of course !)
>         - however, the column stats are kept, on the basis that the
>         data which  will be inserted later in the table will most likely have at

Currently, TRUNCATE doesn't affect the statistics, nor does it set the
row count to zero. Which means that when new data is inserted it will
plan as if the table were still full. So no change required here.

>     * When INSERT'ing into a table :
>     The OP's problem was that the query is planned as if the table was
>     almost  empty (hence seq scan), well it is in fact empty, but many rows are
> inserted. Of course, the query can't be replanned in the middle of its
> execution (although that'd be a nice feature to ad on the withlist for
> postgres version 2020)...

The size of the table you are inserting to is irrelevent to the
planner. All that matters is where the data is coming from. Think about
it, UPDATE, DELETE and INSERT are just fancy wrappers around SELECT to
do something special with the rows that are finally selected. Which you
use has very little effect on the plan finally used. I guess as a
special case, DELETE can optimise the fact that no data need be
returned, only a list of rows...

The original user's problem stemmed from the fact they were running
ANALYZE on an empty table, *that* was killing the statistics. Stop
doing that and the statistics will remain as if the table was full. The
VACUUM will set the rowcount back to zero, but that's it.

Seems PostgreSQL is already doing what you suggest anyway...
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

msg-32091-88463.dat

Re: PostgreSQL slow after VACUUM

From

Tom Lane

Date:

26 November 2004, 18:35:11

Nikola Milutinovic <Nikola.Milutinovic@ev.co.yu> writes:
> - PostgreSQL 7.3.2 MDK5
> ...
> A couple of days ago, disk became full, since we were not doing VACUUM
> on the DB at all. So, I deleted all records from the 3 tables the DB has
> and performed "VACUUM FULL ANALYZE". This reclaimed the space.

The subsequent discussion pointed out that you probably shouldn't have
ANALYZEd right at that point, but I didn't see anyone suggest that you
should have done TRUNCATEs rather than delete all/vacuum full.  The
TRUNCATE way is a good deal faster, and it will also eliminate index
bloat while vacuum full won't.

BTW, 7.3.2 has several known serious bugs; I'd recommend an update to
7.3.8, if not upgrading to 7.4.*.

            regards, tom lane