Re: PostgreSQL slow after VACUUM - Mailing list pgsql-general

From Pierre-Frédéric Caillaud
Subject Re: PostgreSQL slow after VACUUM
Date
Msg-id opsh2o3mtacq72hf@musicbox
In response to Re: PostgreSQL slow after VACUUM  (Arjen van der Meijden <acmmailing@vulcanus.its.tudelft.nl>)
Responses Re: PostgreSQL slow after VACUUM
List pgsql-general
    It seems this issue has been mentioned several times lately...
    I'd propose something to avoid it :

    * When TRUNCATE'ing a table :
        - the row-count is reset to 0 (of course !)
        - however, the column stats are kept, on the basis that the data which
will be inserted later in the table will most likely have at least some
correlation to the data that was in it before. Of course, VACUUM ANALYZE
will update the statistics with real ones when it is run.
    This would make TRUNCATE behave more like DELETE FROM...
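    To illustrate what this would buy, here is a hypothetical session (table "t" is made up for the example) showing today's behaviour, where the row-count estimate drops to zero after TRUNCATE until the next ANALYZE :

```sql
-- Hypothetical illustration ; table "t" is invented for the example.
CREATE TABLE t (id integer PRIMARY KEY, payload text);
-- ... populate and ANALYZE t, run queries, then later : ...
TRUNCATE t;                       -- row count reset to 0 (of course !)
INSERT INTO t SELECT g, 'x' FROM generate_series(1, 1000000) AS g;
-- Planned against near-zero statistics, so a Seq Scan is likely :
EXPLAIN SELECT * FROM t WHERE id = 42;
ANALYZE t;                        -- replace the stale stats with real ones
-- Now an index scan on the primary key becomes the likely choice :
EXPLAIN SELECT * FROM t WHERE id = 42;
```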

    * When INSERT'ing into a table :
    The OP's problem was that the query was planned as if the table were
almost empty (hence the seq scan) ; it was indeed empty at plan time, but
many rows were then inserted. Of course, the query can't be replanned in
the middle of its execution (although that'd be a nice feature to add to
the wishlist for postgres version 2020)...
    When planning an INSERT, the planner currently uses the estimated
number of rows in the target table, taken from the statistics.
    However, at this point the planner also has an estimate of how many
rows it's going to insert :
        - INSERT ... SELECT, or CREATE TABLE ... AS SELECT :
            The SELECT from which the rows will come is planned, so there
            should be an estimated number of rows readily available.
        - COPY FROM...
            The number of rows could be estimated by peeking at the first N
            rows in the file (not for COPY FROM stdin, though), or an
            additional parameter to COPY could be given.

    Then, the INSERT can be planned as if the target table contained
(estimated number of rows in the target table) + (estimated number of rows
to be inserted), and it will choose a good plan for populating empty
tables...
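    To make the two cases concrete (tables "archive" and "live" are made up, and the COPY syntax shown is the *proposed* extra parameter, not existing syntax) :

```sql
-- INSERT ... SELECT : the SELECT side is planned first, so its
-- "rows=" estimate is already available when planning the INSERT.
EXPLAIN INSERT INTO archive
    SELECT * FROM live WHERE created < now() - interval '30 days';

-- COPY FROM file : the row count could be guessed by peeking at the
-- first N lines of the file, or passed explicitly.  NB : "ROWS 500000"
-- is the additional parameter proposed above, NOT existing COPY syntax :
-- COPY archive FROM '/tmp/archive.csv' ROWS 500000;
```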

    What do you think ?



>
> The problem is that while doing the VACUUM FULL ANALYZE the table was
> empty. It therefore gathered statistics for a situation which no longer
> exists once you fill up the table. In an empty or small table it is
> normal to do sequential scans, which you most of the time don't want in
> a large table.
> My suggestion is to VACUUM (FULL) the table after you've deleted the
> data. Then fill up the table and do an ANALYZE when you're done filling
> it.
>
> That way, the analysis of the data will be much more accurate. My guess
> is, it'll use indexes much sooner and be much faster. Have a look at the
> output of EXPLAIN ANALYZE yourstatement; before doing ANALYZE and after.
>
> Best regards,
>
> Arjen
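
    For the archives, the sequence Arjen suggests would look something like
this (the table and column names are made up) :

```sql
DELETE FROM mytable;          -- or TRUNCATE mytable;
VACUUM FULL mytable;          -- compact while the table is empty
-- ... bulk-load the new data with INSERT or COPY ...
ANALYZE mytable;              -- statistics now match the full table
-- Compare the plans before and after the ANALYZE :
EXPLAIN ANALYZE SELECT count(*) FROM mytable WHERE id < 100;
```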
