Thread: RDS IO Read time
Hi all,
Hopefully, someone here could help us understand whats going on with our deployment..
We are running Postgres 17.4 on AWS RDS on an ec2 instance that has:
- 32vCPU
- 128GB RAM
- gp3 with 25K IOPS and 4000MiB/s throughput
- the instance supports up to 3125GB/s of throughput.
Whenever a query fetches data from the disk (index scan, bitmap scans, etc.) we’re reaching 23 – 30MB/s of IO Read time.
We changed our autovacuum settings to be more aggressive (0 scale factor and 10K threshold) so our index only scans will (hopefully) read data from the disk rarely. However, we can’t optimize all the queries to use index-only-scan since our users can dynamically select which columns to see and filtering.
Is this expected IO read rate? I can’t help but feel we’re missing something here..
Thanks a lot! 😊
On 3/31/25 06:54, Eden Aharoni wrote: > Hi all, > > Hopefully, someone here could help us understand whats going on with our > deployment.. > > We are running Postgres 17.4 on AWS RDS on an ec2 instance that has: > > * 32vCPU > * 128GB RAM > * gp3 with 25K IOPS and 4000MiB/s throughput > * the instance supports up to 3125GB/s of throughput. > > Whenever a query fetches data from the disk (index scan, bitmap scans, > etc.) we’re reaching 23 – 30MB/s of IO Read time. > > We changed our autovacuum settings to be more aggressive (0 scale factor > and 10K threshold) so our index only scans will (hopefully) read data > from the disk rarely. However, we can’t optimize all the queries to use > index-only-scan since our users can dynamically select which columns to > see and filtering. > > Is this expected IO read rate? I can’t help but feel we’re missing > something here.. RDS is a black box controlled by AWS, you are going to need to reach out to their tech support. > > Thanks a lot! 😊 > -- Adrian Klaver adrian.klaver@aklaver.com
> On Mar 31, 2025, at 06:54, Eden Aharoni <edena@legitsecurity.com> wrote: > Is this expected IO read rate? I can’t help but feel we’re missing something here.. Really, no particular I/O rate is "expected": if PostgreSQL needs that much data, it'll use that much I/O to get it. Fromyour description, it's likely that it's a case of the working set for the database just not fitting into the memory youhave, so PostgreSQL needs to go out to secondary storage a lot to fetch the data. The best first step is to use Performance Insights to see which queries are using I/O, and run sample ones with EXPLAIN (ANALYZE,BUFFERS) to see where the I/O is being used within the query. Given that you allow users to assemble arbitraryqueries, it's likely that PostgreSQL is having to use a wide variety of indexes (or sequential scans), so it can'tsuccessfully cache a particular set in memory.
Sent: Monday, March 31, 2025 6:15:39 PM
To: Eden Aharoni <edena@legitsecurity.com>
Cc: pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: [EXTERNAL] Re: RDS IO Read time
> On Mar 31, 2025, at 06:54, Eden Aharoni <edena@legitsecurity.com> wrote:
> Is this expected IO read rate? I can’t help but feel we’re missing something here..
Really, no particular I/O rate is "expected": if PostgreSQL needs that much data, it'll use that much I/O to get it. From your description, it's likely that it's a case of the working set for the database just not fitting into the memory you have, so PostgreSQL needs to go out to secondary storage a lot to fetch the data.
The best first step is to use Performance Insights to see which queries are using I/O, and run sample ones with EXPLAIN (ANALYZE, BUFFERS) to see where the I/O is being used within the query. Given that you allow users to assemble arbitrary queries, it's likely that PostgreSQL is having to use a wide variety of indexes (or sequential scans), so it can't successfully cache a particular set in memory.
> On Mar 31, 2025, at 10:32, Eden Aharoni <edena@legitsecurity.com> wrote: > > First, thanks for the reply :) > So, I do know which part is taking a lot of IO time and it's to be honest any node that reads from the disk.. of course,we're running EXPLAIN on our queries (to be more specific we use auto_explain) but we can't seem to find what couldcause an index scan that reads 34 MB to take more than a second (1.2 sec).. we do know that our dataset doesn't fitthe memory and we were ok with that but with IO being so slow we just don't know what to do anymore. Any other suggestionsother than contact AWS (which we did but it seems this path won't lead anywhere). The performance of EBS is definitely a question for AWS. I can say that, in our experience, EBS mounts for RDS almost neverapproach the stated maximum throughput, although io2 tends to be closer than gp2 or gp3.
Sent: Monday, March 31, 2025 8:43:19 PM
To: Eden Aharoni <edena@legitsecurity.com>
Cc: pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: [EXTERNAL] RDS IO Read time
> On Mar 31, 2025, at 10:32, Eden Aharoni <edena@legitsecurity.com> wrote:
>
> First, thanks for the reply :)
> So, I do know which part is taking a lot of IO time and it's to be honest any node that reads from the disk.. of course, we're running EXPLAIN on our queries (to be more specific we use auto_explain) but we can't seem to find what could cause an index scan that reads 34 MB to take more than a second (1.2 sec).. we do know that our dataset doesn't fit the memory and we were ok with that but with IO being so slow we just don't know what to do anymore. Any other suggestions other than contact AWS (which we did but it seems this path won't lead anywhere).
The performance of EBS is definitely a question for AWS. I can say that, in our experience, EBS mounts for RDS almost never approach the stated maximum throughput, although io2 tends to be closer than gp2 or gp3.
> On Mar 31, 2025, at 10:54, Eden Aharoni <edena@legitsecurity.com> wrote: > > So you believe it's strictly an EBS issue? Well, PostgreSQL certainly can read faster than 34MB/s off of disk. With the data you've given, I can't really say if it'spurely an EBS issue.
Sent: Monday, March 31, 2025 9:23:45 PM
To: Eden Aharoni <edena@legitsecurity.com>
Cc: pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: [EXTERNAL] RDS IO Read time
> On Mar 31, 2025, at 10:54, Eden Aharoni <edena@legitsecurity.com> wrote:
>
> So you believe it's strictly an EBS issue?
Well, PostgreSQL certainly can read faster than 34MB/s off of disk. With the data you've given, I can't really say if it's purely an EBS issue.
On 3/31/25 11:23 AM, Christophe Pettus wrote: > > >> On Mar 31, 2025, at 10:54, Eden Aharoni <edena@legitsecurity.com> wrote: >> >> So you believe it's strictly an EBS issue? You are asking about disk I/O, that is the purview of the OS and hardware. In your case both are created and managed by AWS, only they know what is going on behind the scenes. > > Well, PostgreSQL certainly can read faster than 34MB/s off of disk. With the data you've given, I can't really say ifit's purely an EBS issue. > -- Adrian Klaver adrian.klaver@aklaver.com
> On Mar 31, 2025, at 11:30, Eden Aharoni <edena@legitsecurity.com> wrote: > > Can you please tell me what other data might help? You can show your work on how you got the megabytes/second number. (Be aware that on a general open-source mailing list,there's only so much debugging that we can do of a specific problem that is not a PostgreSQL bug.)
Aggregate (cost=8581.22..8581.23 rows=1 width=4) (actual time=1209.498..1210.530 rows=1 loops=1)
Buffers: shared hit=3108 read=4314 dirtied=1
I/O Timings: shared read=1169.977
-> Gather (cost=8580.80..8581.21 rows=4 width=8) (actual time=1209.371..1210.526 rows=5 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=3108 read=4314 dirtied=1
I/O Timings: shared read=1169.977
-> Aggregate (cost=7580.80..7580.81 rows=1 width=8) (actual time=1206.518..1206.520 rows=1 loops=5)
Buffers: shared hit=3105 read=4314 dirtied=1
I/O Timings: shared read=1169.977
-> Hash Join (cost=2620.70..7578.45 rows=937 width=0) (actual time=148.965..1206.330 rows=1897 loops=5)
Buffers: shared hit=3105 read=4314 dirtied=1
I/O Timings: shared read=1169.977
-> Index Only Scan using IX_Collaborators_TenantId_IsDeleted_RunningIndex on Collaborators c (cost=0.55..4619.26 rows=11407 width=80) (actual time=2.815..1058.793 rows=7474 loops=5)
Index Cond: ((c."TenantId" = '0f8026c775a5499fdd3ccd193a15f94c3b4ae58a'::text) AND (c."IsDeleted" = false))
Heap Fetches: 3711
Buffers: shared hit=2332 read=3563
I/O Timings: shared read=1028.548
-> Hash (cost=2479.01..2479.01 rows=11292 width=79) (actual time=144.988..144.989 rows=1983 loops=5)
Buffers: shared hit=637 read=751 dirtied=1
I/O Timings: shared read=141.429
-> Index Only Scan using IX_IntegrationCollaboration_TenantId_IsIntegrationDeleted_IsDe~ on IntegrationCollaboration i (cost=0.55..2479.01 rows=11292 width=79) (actual time=1.804..144.122 rows=1983 loops=5)
Index Cond: (i."TenantId" = '0f8026c775a5499fdd3ccd193a15f94c3b4ae58a'::text)
Heap Fetches: 112
Buffers: shared hit=637 read=751 dirtied=1
I/O Timings: shared read=141.429
Planning:
Buffers: shared hit=440
Execution time: 1210.592 ms
So we can see here that the query read 4314 blocks in 1.2 sec.. (ive done 4314 * 8kb / 1.2sec which is about 28 MB/s)…
Hope you’ll have any clue 😊
From: Christophe Pettus <xof@thebuild.com>
Date: Monday, 31 March 2025 at 21:44
To: Eden Aharoni <edena@legitsecurity.com>
Cc: pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: [EXTERNAL] RDS IO Read time
> On Mar 31, 2025, at 11:30, Eden Aharoni <edena@legitsecurity.com> wrote:
>
> Can you please tell me what other data might help?
You can show your work on how you got the megabytes/second number. (Be aware that on a general open-source mailing list, there's only so much debugging that we can do of a specific problem that is not a PostgreSQL bug.)
> On Mar 31, 2025, at 12:36, Eden Aharoni <edena@legitsecurity.com> wrote: > Hope you’ll have any clue 😊 Based on that, I'd take it up with AWS. It does seem that the EBS mount is under-performing.