PostgreSQL Read IOPS limit per connection

From
Haroldo Kerry
Date: Thu, Dec 27, 2018

PG Version:

PostgreSQL 9.6.10 on x86_64-pc-linux-gnu (Debian 9.6.10-1.pgdg80+1), compiled by gcc (Debian 4.9.2-10+deb8u1) 4.9.2, 64-bit

Installed via apt-get:

apt-get install -y postgresql-9.6=9.6.10-1.pgdg80+1 postgresql-client-9.6=9.6.10-1.pgdg80+1 postgresql-contrib-9.6=9.6.10-1.pgdg80+1

On a Debian 9.4 machine, 4.9 Kernel:

uname -a: Linux srv-7 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux

Running inside a Docker 17.05 container.

Hardware:

Server: Dell R430, 96 GB RAM, 2 Xeon processors with 10 cores / 20 threads each, 40 threads total.

Connected to SAN: Dell Compellent SC2020, with 7 x Samsung PM1633 SSDs https://www.samsung.com/us/labs/pdfs/collateral/pm1633-prodoverview-2015.pdf, RAID10+RAID5 configuration, 8GB Cache, read-write battery backed cache enabled, connected via dedicated iSCSI switches and dedicated Ethernet ports, in link aggregation mode (2x1Gbps max bandwidth).

Data files and log files are on the SAN storage above, on the same volume; a dedicated volume holds temporary files.

 

Performance issue:

I’m trying to figure out if PostgreSQL (PG) has some inherent limit on IOPS per connection.

Running pgbench with multiple clients (-c 30) we are able to see 20K+ IOPS, which is what we expect. But, if we use just one client, we get 1200 IOPS, avg disk queue size around 1:

pgbench -U postgres -S -T 60 -c 1
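
For comparison, the multi-client figure above came from the same select-only flags with only the client count changed (we assume no other flags differed):

pgbench -U postgres -S -T 60 -c 30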

iostat:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util

dm-10             0.00     0.00 1242.00    1.00 10796.00    20.00    17.40     0.96    0.78    0.78    0.00   0.68  84.00

 

We tried to increase effective_io_concurrency from 1 to 30, with no effect across multiple tests.
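
For reference, the change can be made roughly like this (ALTER SYSTEM is one way to do it; effective_io_concurrency can also be set per session):

psql -U postgres -c "ALTER SYSTEM SET effective_io_concurrency = 30;"
psql -U postgres -c "SELECT pg_reload_conf();"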

Running the fio disk benchmarking tool, we found the same number of IOPS (1200) on a random read test when we set the I/O depth to 1.

If we increase the I/O depth to 30, we find about the same number of IOPS (20K) that we see with pgbench and multiple clients:

fio config file:

[job]
bs=8k
rw=randread
random_generator=lfsr
direct=1
ioengine=libaio
iodepth=30
time_based
runtime=60s
size=128M
filename=/var/lib/postgresql/data_9.6/file.fio
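
Saved as a job file (the job.fio name below is arbitrary) and run with:

fio job.fio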

iostat:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util

dm-10             0.00     0.00 19616.00    0.00 156928.00     0.00    16.00    29.53    1.51    1.51    0.00   0.05 100.00

 

This leads us to believe that PG is limited to an I/O depth of 1 per connection (it submits just one I/O request at a time per connection, not multiple ones), even though effective_io_concurrency should allow a deeper I/O queue and probably higher IOPS as well.

Is this some inherent limitation of PG or am I misunderstanding something?

One of the issues I’m trying to solve involves extracting data from a large table, which uses a full table scan. We see the same 1200 IOPS limit from pgbench when we SELECT from this table using just one connection. If there is a per-connection limitation, I could set up the application to open several connections, issue SELECTs for different sections of the table, and join the data afterwards, but that looks cumbersome, especially if the DB can already extract data using more IOPS.
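
For illustration, the several-connections workaround I have in mind would look something like the sketch below; big_table, its integer key id, and the range boundaries are invented for the example.

#!/bin/bash
# Extract four disjoint id ranges of big_table over four connections,
# then concatenate the pieces. Ranges must cover the whole key space.
for i in 0 1 2 3; do
  lo=$((i * 2500000))
  hi=$(( (i + 1) * 2500000 ))
  psql -U postgres -c "COPY (SELECT * FROM big_table WHERE id >= $lo AND id < $hi) TO STDOUT" > part_$i.tsv &
done
wait  # wait for all four backends to finish
cat part_0.tsv part_1.tsv part_2.tsv part_3.tsv > extract.tsv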


Best regards,

Haroldo Kerry
CTO/COO

Re: PostgreSQL Read IOPS limit per connection

From
Justin Pryzby
Date: Thu, Dec 27, 2018
On Thu, Dec 27, 2018 at 02:44:55PM -0200, Haroldo Kerry wrote:
> PostgreSQL 9.6.10 on x86_64-pc-linux-gnu (Debian 9.6.10-1.pgdg80+1),

> Connected to SAN: Dell Compellent SC2020, with 7 x Samsung PM1633 SSDs
> https://www.samsung.com/us/labs/pdfs/collateral/pm1633-prodoverview-2015.pdf,
> RAID10+RAID5 configuration, 8GB Cache, read-write battery backed cache
> enabled, connected via dedicated iSCSI switches and dedicated Ethernet
> ports, in link aggregation mode (2x1Gbps max bandwidth).

> I’m trying to figure out if PostgreSQL (PG) has some inherent limit on IOPS
> per connection.

postgres uses one server backend per client.

> We tried to increase effective_io_concurrency from 1 to 30, with no effect
> across multiple tests.

https://www.postgresql.org/docs/current/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR
=> "Currently, this setting only affects bitmap heap scans."

> Is this some inherent limitation of PG or am I misunderstanding something?

It is a historic limitation, but nowadays there's parallel query, which uses
secondary "backend worker" processes.

It's supported in v9.6 but much more versatile in v10 and v11.
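
For example, something along these lines should turn it on for 9.6, where the gather setting defaults to 0 (disabled); big_table is a placeholder, and whether a parallel plan actually appears also depends on table size:

psql -U postgres -c "ALTER SYSTEM SET max_parallel_workers_per_gather = 4;"
psql -U postgres -c "SELECT pg_reload_conf();"
# check that a Gather node shows up in the plan
psql -U postgres -c "EXPLAIN SELECT count(*) FROM big_table;"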

Justin


Re: PostgreSQL Read IOPS limit per connection

From
Haroldo Kerry
Date: Thu, Dec 27, 2018
Justin,
Thanks for the quick response, I'll check it out.

Happy holidays,
Haroldo Kerry



--

Haroldo Kerry

CTO/COO

Rua do Rócio, 220, 7° andar, conjunto 72

São Paulo – SP / CEP 04552-000

hkerry@callix.com.br

www.callix.com.br

Re: PostgreSQL Read IOPS limit per connection

From
Jeff Janes
Date: Thu, Dec 27, 2018

 

> Performance issue:
>
> I’m trying to figure out if PostgreSQL (PG) has some inherent limit on IOPS per connection.
>
> Running pgbench with multiple clients (-c 30) we are able to see 20K+ IOPS, which is what we expect. But, if we use just one client, we get 1200 IOPS, avg disk queue size around 1:


The default transaction done by pgbench simply has no opportunity to dispatch multiple I/O requests per connection. It is just a series of single-row lookups and single-row updates or inserts. You will have to use a different benchmark if you want to exercise this area, probably something analytics-heavy.
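
As a sketch, a custom script run through pgbench's -f option at least escapes the single-row pattern; the query below is only a stand-in for something analytics-heavy:

cat > analytics.sql <<'EOF'
SELECT count(*) FROM pgbench_accounts WHERE abalance <> 0;
EOF
pgbench -U postgres -f analytics.sql -T 60 -c 1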

Also, you would want to use the newest version of PostgreSQL, as 9.6 doesn't have parallel query, which is much more generally applicable than effective_io_concurrency is.

> One of the issues I’m trying to solve involves extracting data from a large table, which uses a full table scan. We see the same 1200 IOPS limit from pgbench when we SELECT from this table using just one connection. If there is a per-connection limitation, I could set up the application to open several connections, issue SELECTs for different sections of the table, and join the data afterwards, but that looks cumbersome, especially if the DB can already extract data using more IOPS.

The kernel should detect a sequential read in progress and invoke readahead.  That should be able to keep the CPU quite busy with data for any decent IO system.  Are you sure IO is even the bottleneck for your query?

Perhaps your kernel readahead settings need to be tuned.  Also, you may benefit from parallel query features implemented in newer versions of PostgreSQL.  In any event, the default transactions of pgbench are not going to be useful for benchmarking what you care about.
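
For instance, readahead on the device from the iostat output above can be inspected and raised like this (the 4096-sector value, i.e. 2 MB, is only an illustrative starting point):

# readahead is reported in 512-byte sectors
blockdev --getra /dev/dm-10
blockdev --setra 4096 /dev/dm-10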

Cheers,

Jeff

Re: PostgreSQL Read IOPS limit per connection

From
Justin Pryzby
Date: Thu, Dec 27, 2018
On Thu, Dec 27, 2018 at 08:20:23PM -0500, Jeff Janes wrote:
> Also, you would want to use the newest version of PostgreSQL, as 9.6
> doesn't have parallel query, which is much more generally applicable than
> effective_io_concurrency is.

It *does* have parallel query (early, somewhat limited support),
but not enabled by default.
https://www.postgresql.org/docs/9.6/parallel-query.html

There was some confusion due to it being disabled by default in 9.6 only:
https://www.postgresql.org/message-id/20180620151349.GB7500%40momjian.us

Cheers,
Justin


Re: PostgreSQL Read IOPS limit per connection

From
Merlin Moncure
Date: Wed, Jan 9, 2019
On Thu, Dec 27, 2018 at 7:29 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Thu, Dec 27, 2018 at 08:20:23PM -0500, Jeff Janes wrote:
> > Also, you would want to use the newest version of PostgreSQL, as 9.6
> > doesn't have parallel query, which is much more generally applicable than
> > effective_io_concurrency is.

effective_io_concurrency only applies to certain queries. When it does
apply it can work wonders. See:
https://www.postgresql.org/message-id/CAHyXU0yiVvfQAnR9cyH=HWh1WbLRsioe=mzRJTHwtr=2azsTdQ@mail.gmail.com
for an example of how it can benefit.

Parallel query is not going to help single-threaded pgbench results;
you are going to be entirely latency bound (network from pgbench to
postgres, then postgres to storage). On my Dell crapbox I was getting
2200 tps, so you have some point of slowness relative to me, probably
not the disk itself.
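
One quick way to see where the time goes is pgbench's -r flag, which reports average per-statement latencies for the run:

pgbench -U postgres -S -T 60 -c 1 -r

If the SELECT's latency is dominated by round trips rather than disk waits, that points at the network/SAN path.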

Getting faster performance is an age-old problem; you need to
aggregate specific requests into more general ones, move the
controlling logic into the database itself, or use various other
strategies. Lowering latency is a hardware problem that can force
trade-offs (like not using a SAN) and has hard boundaries that are
not easy to bust through.

merlin


Re: PostgreSQL Read IOPS limit per connection

From
Haroldo Kerry
Date: Wed, Jan 9, 2019
@Justin, @Merlin, @Jeff,
Thanks so much for your time and insights; they improved our understanding of the underpinnings of PostgreSQL and allowed us to deal with the issues we were facing.
Using parallel query on our PG 9.6 improved query performance a lot - it turns out that many of our real-world queries can benefit from parallel query. We saw about 4x improvements after turning it on, and we now see much higher storage IOPS thanks to the multiple workers.
In our tests, effective_io_concurrency did not show as large an effect as in the link you sent. I'll have another look at it; maybe we are doing something wrong, or the fact that the SSDs are on the SAN rather than local affects the results.
In the process we also learned that changing the default Linux I/O scheduler from CFQ to Deadline worked wonders for our Dell SC2020 SAN storage setup: we used to see latency peaks of 6,000 milliseconds during busy periods (yes, 6 seconds), and we now see 80 milliseconds, an almost 100-fold improvement.
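
For anyone finding this later, the change itself was along these lines; /dev/sdb is a placeholder for whichever block device backs the SAN LUN, and this form does not persist across reboots:

cat /sys/block/sdb/queue/scheduler    # the scheduler in [brackets] is active
echo deadline > /sys/block/sdb/queue/scheduler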


Best regards,
Haroldo Kerry




Re: PostgreSQL Read IOPS limit per connection

From
Merlin Moncure
Date: Wed, Jan 9, 2019
On Wed, Jan 9, 2019 at 3:52 PM Haroldo Kerry <hkerry@callix.com.br> wrote:
> In our tests, effective_io_concurrency did not show as large an effect as in
> the link you sent. I'll have another look at it; maybe we are doing something
> wrong, or the fact that the SSDs are on the SAN rather than local affects
> the results.

The link I sent used a contrived query to force a type of scan that benefits from that setting; it's a very situational benefit. It would be interesting if you couldn't reproduce it using the same mechanism.

merlin

Re: PostgreSQL Read IOPS limit per connection

From
Mark Hogg
Date:
Hello,

I am happy to hear that you have received all the help you needed.

Please feel free to contact us any time you may need professional assistance in the future.

Most Welcome!
 

Regards,


Mark Avinash Hogg

Director of Business Development

2ndQuadrant

+1(647) 770 9821 Cell

www.2ndquadrant.com

mark.hogg@2ndquadrant.com


