Re: Sudden insert performance degradation - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Sudden insert performance degradation
Date
Msg-id CAMkU=1zZvVd5UuDYOou-mSTXbWunRne6N5ewRj2y3Re__L3+1Q@mail.gmail.com
Whole thread Raw
In response to Sudden insert performance degradation  (Henrique Montenegro <typoon@gmail.com>)
Responses Re: Sudden insert performance degradation  (Henrique Montenegro <typoon@gmail.com>)
List pgsql-performance
On Mon, Jul 13, 2020 at 10:23 AM Henrique Montenegro <typoon@gmail.com> wrote:

insert into users_no_dups (
    created_ts,
    user_id,
    name,
    url
) (
    select
        created_ts,
        user_id,
        name,
        url
    from
        users
) on conflict do nothing

Once the size of the only index exceeds shared_buffers by a bit (the amount of "a bit" depends on your RAM, kernel version, settings for dirty_background_ratio, dirty_expire_centisecs, and probably other things, and is not easy to predict) the performance falls off a cliff when inserting values in a random order.  Every insert dirties a random index leaf page, which quickly gets evicted from shared_buffers to make room for other random leaf pages to be read in, and then turns into flush calls when the kernel freaks out about the amount and age of dirty pages held in memory.

What happens if you add an "ORDER BY user_id" to your above select?
 
shared_buffers = 8GB
RAM: 256GB

Or, crank up shared_buffers by a lot.  Like, beyond the size of the growing index, or up to 240GB if the index ever becomes larger than that.  And make the time between checkpoints longer.  If the dirty buffers are retained in shared_buffers longer, chances of them getting dirtied repeatedly between writes is much higher than if you just toss them to the kernel and hope for the best.

Cheers,

Jeff

pgsql-performance by date:

Previous
From: Henrique Montenegro
Date:
Subject: Re: Sudden insert performance degradation
Next
From: Henrique Montenegro
Date:
Subject: Re: Sudden insert performance degradation