Re: Sudden insert performance degradation - Mailing list pgsql-performance

From Henrique Montenegro
Subject Re: Sudden insert performance degradation
Date
Msg-id CAH_aqbvBU3_efWM=mNQ6pAbs736aNerHd83pciMdwsq_NtHBeQ@mail.gmail.com
Whole thread Raw
In response to Re: Sudden insert performance degradation  (Henrique Montenegro <typoon@gmail.com>)
Responses Re: Sudden insert performance degradation  (Henrique Montenegro <typoon@gmail.com>)
List pgsql-performance
Alright, so it seems like partitioning and changing the shared_buffers as well as adding the order by helped to a certain extent, but the writes are still slow. Inserting a 1 million records file is taking almost 3 minutes (way better than the 20+ minutes, but still pretty slow compared to the 20 seconds it used to take).

The interesting thing for me right now is: If I try to insert the data from a file that has already been inserted (meaning all the data will end up being rejected due to the unique constraint), it only takes between 1 and 4 seconds for the insertion to finish executing. For regular files (which usually have 30% new unique records (meaning about 300k new records)), it is taking those 3 minutes.

**UPDATE**

I started writing this email and then it occurred to me something I should try. Leaving the information above for historical reasons.

Basically I went ahead and ran a `reindex` on all the partitions now to see if it would improve the performance and seems like that did it! I used the following script to reindex all of the partitions (the name of my partitions all start with ubp_):

```
DO $$DECLARE r record;
BEGIN
    FOR r IN select indexname from pg_indexes where tablename like 'ubp_%'
    LOOP
        EXECUTE 'reindex index ' || r.indexname;
    END LOOP;
END$$;
```

After doing this, processing of each file is taking anything between 8 and 20 seconds (most of them seem to be taking 8 seconds though). So, this is great!

In summary, what I ended up having to do was:

* Raise shared_buffers to 160GB
* Add an `order by` to the `select` subquery in the `insert` statement
* Partition the table
* Tune postgres configurations as shown below:

~~~
ssl = off
shared_buffers = 160GB
work_mem = 12GB
maintenance_work_mem = 12GB
max_stack_depth = 4MB
synchronous_commit = off
wal_writer_flush_after = 128MB
max_wal_size = 32GB
min_wal_size = 80MB
effective_cache_size = 96GB
~~~

I can't tell if the raising of the `shared_buffers` was the reason for the performance gains or the adding of the `order by` was the responsible. Doesn't hurt to do both anyways. I know for a fact that the `reindex` of each partition made a huge difference in the end as explained above (bringing insert time down from 3 minutes to 8 seconds).

I have about 1800 files in my backlog to be processed now (18 billion records). I have started processing them and will report back in case performance degrades once again.

Thanks everybody for the help so far! I really appreciate it.

Henrique

PS: I checked the `dirty` ratios for the OS:

$ sysctl vm.dirty_ratio
vm.dirty_ratio = 20

$ sysctl vm.dirty_background_ratio
vm.dirty_background_ratio = 10

$ sysctl vm.dirty_expire_centisecs
vm.dirty_expire_centisecs = 3000

These are default values; if what I understood from them is right, it seems to me that these values should be fine.

On Mon, Jul 13, 2020 at 9:02 PM Henrique Montenegro <typoon@gmail.com> wrote:


On Mon, Jul 13, 2020 at 8:05 PM Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Jul 13, 2020 at 10:23 AM Henrique Montenegro <typoon@gmail.com> wrote:

insert into users_no_dups (
    created_ts,
    user_id,
    name,
    url
) (
    select
        created_ts,
        user_id,
        name,
        url
    from
        users
) on conflict do nothing

Once the size of the only index exceeds shared_buffers by a bit (the amount of "a bit" depends on your RAM, kernel version, settings for dirty_background_ratio, dirty_expire_centisecs, and probably other things, and is not easy to predict) the performance falls off a cliff when inserting values in a random order.  Every insert dirties a random index leaf page, which quickly gets evicted from shared_buffers to make room for other random leaf pages to be read in, and then turns into flush calls when the kernel freaks out about the amount and age of dirty pages held in memory.
 
That is interesting to  know. I will do some research on those things.


What happens if you add an "ORDER BY user_id" to your above select?

I don't know. I will give it a try right now.
 
shared_buffers = 8GB
RAM: 256GB

Or, crank up shared_buffers by a lot.  Like, beyond the size of the growing index, or up to 240GB if the index ever becomes larger than that.  And make the time between checkpoints longer.  If the dirty buffers are retained in shared_buffers longer, chances of them getting dirtied repeatedly between writes is much higher than if you just toss them to the kernel and hope for the best.


I cranked it up to 160GB to see how it goes.

Cheers,

Jeff

I created the partitions as well as mentioned before. I was able to partition the table based on the user_id (found some logic to it). I was transferring the data from the original table (about 280 million records; 320GB) to the new partitioned table and things were going well with write speeds between 30MB/s and 50MB/s. After reading 270GB of the 320GB (in 4 and a half hours) and writing it to the new partitioned table, write speed went down to 7KB/s. It is so frustrating.

I will keep the partitions and try your suggestions to see how it goes.

I apologize for the long time between replies, it is just that testing this stuff takes 4+ hours each run.

If there are any other suggestions of things for me to look meanwhile as well, please keep them coming.

Thanks!

Henrique

pgsql-performance by date:

Previous
From: Henrique Montenegro
Date:
Subject: Re: Sudden insert performance degradation
Next
From: Henrique Montenegro
Date:
Subject: Re: Sudden insert performance degradation