Re: Degraded performance during table rewrite - Mailing list pgsql-general

From Tom Lane
Subject Re: Degraded performance during table rewrite
Date
Msg-id 1115603.1593811564@sss.pgh.pa.us
In response to Degraded performance during table rewrite  (Mohamed Wael Khobalatte <mkhobalatte@grubhub.com>)
List pgsql-general
Mohamed Wael Khobalatte <mkhobalatte@grubhub.com> writes:
> ... the migration itself runs as follows (each in a transaction, looping
> through records and sleeping for a bit)

> WITH del AS (
>   DELETE FROM #{old_table}
>   WHERE id IN (
>     SELECT id
>     FROM #{old_table}
>     WHERE id > #{max_deleted_id}
>     -- max_deleted_id is the max deleted from the previous batch; we grab it programmatically.
>     ORDER BY id ASC
>     LIMIT #{batch_size}
>   )
>   RETURNING *
> )
> INSERT INTO #{table}
> SELECT * FROM del
> RETURNING id

> This spends 150ms per batch, which climbs to 700ms per batch. A vacuum of
> the old table lowers it back to 150ms, but I don't understand why, because
> we structure the query to jump over all previously dead rows. There is an
> old thread in which Tom Lane mentions that the planner might itself be
> walking that primary index. Is this applicable here? And is there anything
> we can do besides more aggressive and continued vacuuming of the old table
> (or a change in autovacuum settings)? Ideally, we want to run this
> overnight without much supervision.

Yeah, given that the slowdown seems to be in the planner, and given your
further observation that v12 is better, I'd say that this is an issue
with get_actual_variable_range.  That's going to be invoked to try to
determine the selectivity of the "WHERE id > #{max_deleted_id}" clause,
if the constant is past the last value in the histogram for the id
column.
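
You can see whether that's your situation by looking at the last entry of
the id column's histogram in pg_stats and comparing it against the
max_deleted_id your batches have reached (the table name here stands in
for whatever #{old_table} expands to):

    SELECT histogram_bounds
    FROM pg_stats
    WHERE tablename = 'old_table'  -- placeholder for #{old_table}
      AND attname = 'id';

If max_deleted_id is past the last bound shown, the planner will go probe
the index for the actual endpoint, which is where the time goes.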

The improvement you see in v12 actually arrived in v11, and I think
I'll just quote the commit log:

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master Release: REL_11_BR [3ca930fc3] 2017-09-07 19:41:51 -0400

    Improve performance of get_actual_variable_range with recently-dead tuples.

    In commit fccebe421, we hacked get_actual_variable_range() to scan the
    index with SnapshotDirty, so that if there are many uncommitted tuples
    at the end of the index range, it wouldn't laboriously scan through all
    of them looking for a live value to return.  However, that didn't fix it
    for the case of many recently-dead tuples at the end of the index;
    SnapshotDirty recognizes those as committed dead and so we're back to
    the same problem.

    To improve the situation, invent a "SnapshotNonVacuumable" snapshot type
    and use that instead.  The reason this helps is that, if the snapshot
    rejects a given index entry, we know that the indexscan will mark that
    index entry as killed.  This means the next get_actual_variable_range()
    scan will proceed past that entry without visiting the heap, making the
    scan a lot faster.  We may end up accepting a recently-dead tuple as
    being the estimated extremal value, but that doesn't seem much worse than
    the compromise we made before to accept not-yet-committed extremal values.

    The cost of the scan is still proportional to the number of dead index
    entries at the end of the range, so in the interval after a mass delete
    but before VACUUM's cleaned up the mess, it's still possible for
    get_actual_variable_range() to take a noticeable amount of time, if you've
    got enough such dead entries.  But the constant factor is much much better
    than before, since all we need to do with each index entry is test its
    "killed" bit.

    We chose to back-patch commit fccebe421 at the time, but I'm hesitant to
    do so here, because this form of the problem seems to affect many fewer
    people.  Also, even when it happens, it's less bad than the case fixed
    by commit fccebe421 because we don't get the contention effects from
    expensive TransactionIdIsInProgress tests.

    Dmitriy Sarafannikov, reviewed by Andrey Borodin

    Discussion: https://postgr.es/m/05C72CF7-B5F6-4DB9-8A09-5AC897653113@yandex.ru


There are a number of possibilities for working around this in your
particular situation, short of an upgrade to v11+.  You could try doing a
manual VACUUM between deletion steps, but that could fail to fix it if
anything else is running concurrently (because the VACUUM might not think
it's safe to recycle the recently-dead tuples yet).  I think possibly
a better approach is to try to avoid the situation wherein estimating
"WHERE id > #{max_deleted_id}" requires determining the table's true
endpoint id value.  For that, the last id value seen in the pg_stats
histogram for the id column has to be greater than the max_deleted_id
value.  So you might find that increasing the deletion batch size
(thereby reducing max_deleted_id) does the trick; or you could increase
the statistics target for that column, making the histogram larger and
hence (probably) making its endpoint higher.
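
As a rough sketch of those last two workarounds (again, table and column
names are placeholders for the ones in your query, and 1000 is just an
illustrative statistics target):

    -- Clean up dead index entries between deletion batches; this only
    -- helps if nothing concurrent prevents recycling recently-dead tuples.
    VACUUM old_table;

    -- Enlarge the id column's histogram so its endpoint is more likely
    -- to stay above max_deleted_id, then re-gather stats.
    ALTER TABLE old_table ALTER COLUMN id SET STATISTICS 1000;
    ANALYZE old_table;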

            regards, tom lane


