Thread: PostgreSQL Read IOPS limit per connection
PG Version:
PostgreSQL 9.6.10 on x86_64-pc-linux-gnu (Debian 9.6.10-1.pgdg80+1), compiled by gcc (Debian 4.9.2-10+deb8u1) 4.9.2, 64-bit
Installed via apt-get:
apt-get install -y postgresql-9.6=9.6.10-1.pgdg80+1 postgresql-client-9.6=9.6.10-1.pgdg80+1 postgresql-contrib-9.6=9.6.10-1.pgdg80+1
On a Debian 9.4 machine, 4.9 Kernel:
uname -a: Linux srv-7 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux
Running inside a Docker 17.05 container.
Hardware:
Server: Dell R430, 96 GB RAM, 2 Xeon processors with 10 cores / 20 threads each (40 threads total).
Connected to SAN: Dell Compellent SC2020, with 7 x Samsung PM1633 SSDs https://www.samsung.com/us/labs/pdfs/collateral/pm1633-prodoverview-2015.pdf, RAID10+RAID5 configuration, 8GB Cache, read-write battery backed cache enabled, connected via dedicated iSCSI switches and dedicated Ethernet ports, in link aggregation mode (2x1Gbps max bandwidth).
Data files and log files on the above SAN storage, on the same volume; a dedicated volume for temporary files.
Performance issue:
I’m trying to figure out if PostgreSQL (PG) has some inherent limit on IOPS per connection.
Running pgbench with multiple clients (-c 30) we are able to see 20K+ IOPS, which is what we expect. But with just one client we get 1,200 IOPS and an average disk queue size of about 1:
pgbench -U postgres -S -T 60 -c 1
iostat -x:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-10 0.00 0.00 1242.00 1.00 10796.00 20.00 17.40 0.96 0.78 0.78 0.00 0.68 84.00
We tried increasing effective_io_concurrency from 1 to 30, with no effect across multiple tests.
Running the fio disk benchmarking tool, we found the same number of IOPS (~1,200) on a random-read test when we set the I/O depth to 1.
If we increase the I/O depth to 30, we get about the same IOPS (20K) that we see with pgbench and multiple clients:
; fio config file
[job]
bs=8k
rw=randread
random_generator=lfsr
direct=1
ioengine=libaio
iodepth=30
time_based
runtime=60s
size=128M
filename=/var/lib/postgresql/data_9.6/file.fio
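For reference, the iodepth=1 baseline mentioned above was the same job file with only the queue depth changed; a sketch of that variant (identical parameters, only iodepth differs):

```ini
; same job as above, but with a queue depth of 1,
; mimicking a single synchronous PG backend
[job]
bs=8k
rw=randread
random_generator=lfsr
direct=1
ioengine=libaio
iodepth=1
time_based
runtime=60s
size=128M
filename=/var/lib/postgresql/data_9.6/file.fio
```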
iostat -x:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-10 0.00 0.00 19616.00 0.00 156928.00 0.00 16.00 29.53 1.51 1.51 0.00 0.05 100.00
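Both iostat rows are consistent with Little's Law (throughput ≈ average queue depth / average latency), which is why a queue depth of 1 caps a single backend at roughly 1.2K IOPS. A quick sanity check, using the avgqu-sz and await numbers from the iostat output above:

```python
# Little's Law: throughput = concurrency / latency.
# Sanity-check the two iostat rows above.

def iops_from_littles_law(avg_queue_depth, await_ms):
    """Estimate IOPS from average queue depth and per-I/O latency."""
    return avg_queue_depth / (await_ms / 1000.0)

# Single pgbench client: avgqu-sz 0.96, await 0.78 ms
single = iops_from_littles_law(0.96, 0.78)   # ~1230, close to the observed 1242 r/s

# fio with iodepth=30: avgqu-sz 29.53, await 1.51 ms
deep = iops_from_littles_law(29.53, 1.51)    # ~19.5K, close to the observed 19616 r/s

print(round(single), round(deep))
```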
This leads us to believe PG is limited to an I/O depth of 1 per connection (PG submits just one I/O request at a time per connection, not multiple ones), even though effective_io_concurrency could produce a deeper I/O queue and probably higher IOPS as well.
Is this some inherent limitation of PG or am I misunderstanding something?
One of the issues I’m trying to solve involves extracting data from a large table, which uses a full table scan. We see the same 1,200 IOPS limit as pgbench when we SELECT on this table using just one connection. If there is a per-connection limitation, I might set up the application to open several connections, issue SELECTs for different sections of the table, and join the data afterwards, but that looks cumbersome, especially if the DB can extract the data using more IOPS on its own.
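For what it's worth, the multi-connection split described above can be sketched roughly like this; a minimal sketch, assuming an integer key column to range-partition on, and with a stub fetch_range standing in for a real per-connection SELECT (all names here are hypothetical, not from the thread):

```python
# Sketch: split a large extract into key ranges and fetch them on
# separate worker threads (each would own its own DB connection),
# then merge the chunks. fetch_range is a stand-in for a real
# "SELECT ... WHERE id >= lo AND id < hi" on its own connection.
from concurrent.futures import ThreadPoolExecutor

def make_ranges(min_id, max_id, n_workers):
    """Split [min_id, max_id) into n_workers half-open ranges."""
    step = (max_id - min_id + n_workers - 1) // n_workers
    return [(lo, min(lo + step, max_id))
            for lo in range(min_id, max_id, step)]

def parallel_extract(fetch_range, min_id, max_id, n_workers=4):
    """Fetch all ranges concurrently and concatenate the results."""
    ranges = make_ranges(min_id, max_id, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        chunks = pool.map(lambda r: fetch_range(*r), ranges)
    return [row for chunk in chunks for row in chunk]

# Stub standing in for a per-connection SELECT over [lo, hi)
fake_table = list(range(100))
def fetch_range(lo, hi):
    return [row for row in fake_table if lo <= row < hi]

rows = parallel_extract(fetch_range, 0, 100, n_workers=4)
assert sorted(rows) == fake_table
```

Without an integer key one could range-partition on ctid instead, though that is more fiddly.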
Best regards,
On Thu, Dec 27, 2018 at 02:44:55PM -0200, Haroldo Kerry wrote:
> PostgreSQL 9.6.10 on x86_64-pc-linux-gnu (Debian 9.6.10-1.pgdg80+1),
> Connected to SAN: Dell Compellent SC2020, with 7 x Samsung PM1633 SSDs
> https://www.samsung.com/us/labs/pdfs/collateral/pm1633-prodoverview-2015.pdf,
> RAID10+RAID5 configuration, 8GB Cache, read-write battery backed cache
> enabled, connected via dedicated iSCSI switches and dedicated Ethernet
> ports, in link aggregation mode (2x1Gbps max bandwidth).
> I’m trying to figure out if PostgreSQL (PG) has some inherent limit on IOPS
> per connection.
postgres uses one server backend per client.
> We tried to increase effective_io_concurrency from 1 to 30, to no effect on
> multiple tests.
https://www.postgresql.org/docs/current/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR
=> "Currently, this setting only affects bitmap heap scans."
> Is this some inherent limitation of PG or am I misunderstanding something?
It is a historic limitation, but nowadays there's parallel query, which uses
secondary "parallel worker" processes.
It's supported in v9.6 but much more versatile in v10 and v11.
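On 9.6 you have to turn it on explicitly; a minimal sketch of the relevant settings (values illustrative, not recommendations):

```sql
-- 9.6 ships with parallel query disabled: 0 workers is the default.
SET max_parallel_workers_per_gather = 4;   -- per-Gather worker cap (default 0 in 9.6)
SET min_parallel_relation_size = '8MB';    -- 9.6 name; renamed/split in v10
```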
Justin
Haroldo Kerry CTO/COO Rua do Rócio, 220, 7° andar, conjunto 72 São Paulo – SP / CEP 04552-000 |
On Thu, Dec 27, 2018 at 08:20:23PM -0500, Jeff Janes wrote:
> Also, you would want to use the newest version of PostgreSQL, as 9.6
> doesn't have parallel query, which is much more generally applicable than
> effective_io_concurrency is.
It *does* have parallel query (early, somewhat limited support), but it is not
enabled by default.
https://www.postgresql.org/docs/9.6/parallel-query.html
There was some confusion due to it being disabled by default in 9.6 only:
https://www.postgresql.org/message-id/20180620151349.GB7500%40momjian.us
Cheers,
Justin
On Thu, Dec 27, 2018 at 7:29 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Thu, Dec 27, 2018 at 08:20:23PM -0500, Jeff Janes wrote:
> > Also, you would want to use the newest version of PostgreSQL, as 9.6
> > doesn't have parallel query, which is much more generally applicable than
> > effective_io_concurrency is.
effective_io_concurrency only applies to certain queries. When it
does apply it can work wonders. See:
https://www.postgresql.org/message-id/CAHyXU0yiVvfQAnR9cyH=HWh1WbLRsioe=mzRJTHwtr=2azsTdQ@mail.gmail.com
for an example of how it can benefit.
parallel query is not going to help single-threaded pgbench results.
you are going to be entirely latency bound (network from pgbench to
postgres, then postgres to storage). On my dell crapbox I was getting
2200 tps, so you have some point of slowness relative to me, probably
not the disk itself.
Getting faster performance is an age-old problem; you need to
aggregate specific requests into more general ones, move the
controlling logic into the database itself, or use various other
strategies. Lowering latency is a hardware problem and can force
trade-offs (like, don't use a SAN) and has specific boundaries that
are not easy to bust through.
merlin
@Justin @Merlin @Jeff,
Thanks so much for your time and insights; they improved our understanding of the underpinnings of PostgreSQL and allowed us to deal with the issues we were facing.
Using parallel query on our PG 9.6 improved query performance a lot. It turns out that many of our real-world queries can benefit from parallel query: we saw about 4x improvements after turning it on, and we now see much higher storage IOPS thanks to the multiple workers.
In our tests effective_io_concurrency did not show as large an effect as in the link you sent. I'll take another look at it; maybe we are doing something wrong, or the fact that the SSDs are on the SAN rather than local affects the results.
In the process we also learned that changing the default Linux I/O scheduler from CFQ to Deadline worked wonders for our Dell SC2020 SAN storage setup: we used to see latency peaks of 6,000 milliseconds during busy periods (yes, 6 seconds); we now see 80 milliseconds, an almost 100-fold improvement.
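For the record, the scheduler switch was done through sysfs; a sketch of the commands (the device name is an example, and your distro may have a preferred way to persist the setting):

```shell
# Show the current scheduler for a device (the bracketed one is active)
cat /sys/block/sda/queue/scheduler
# e.g. noop [cfq] deadline

# Switch to deadline at runtime (not persistent across reboots)
echo deadline | sudo tee /sys/block/sda/queue/scheduler

# To persist, add "elevator=deadline" to the kernel command line
# (e.g. GRUB_CMDLINE_LINUX in /etc/default/grub) and update grub.
```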
Regards,
On Wed, Jan 9, 2019 at 3:52 PM Haroldo Kerry <hkerry@callix.com.br> wrote:
> [quoted message trimmed]
The link I sent used a contrived query to force the type of scan that benefits from that setting; it's a very situational benefit. It would be interesting if you couldn't reproduce it using the same mechanic.
merlin