Postgresql TPS Bottleneck

From
wakandavision@outlook.com
Date:
Hi everyone,

I am a bachelor's student and writing my thesis about the scaling and
performance of an application. The application is using postgresql as a
database but we can't scale any further currently as it seems postgres
is hitting the limit.

With the application, as well as with pgbench, we don't get more than
(max) 70k TPS on postgres. But the servers' resources are not utilized
completely (more below).

I've tried many different configurations but none of them had any major
performance impact (except setting fsync and synchronous_commit to off).

This is the (custom) configuration I am using:

shared_buffers=65551953kB
effective_cache_size=147491895kB
huge_pages=on
min_wal_size=20GB
max_wal_size=200GB
wal_buffers=1GB
max_wal_senders=0
archive_mode=off
wal_level=minimal
work_mem=2GB
maintenance_work_mem=4GB
checkpoint_completion_target=0.9
checkpoint_timeout = 30min
random_page_cost=1.1
bgwriter_flush_after = 2MB
effective_io_concurrency = 200
# Disabled just for performance experiments
fsync = off
synchronous_commit = off
full_page_writes = on
max_worker_processes=64
max_parallel_workers=64
max_parallel_workers_per_gather=10
max_parallel_maintenance_workers=12
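
(For reference, a quick way to double-check the effective values at runtime is a
query like this against pg_settings; just a sketch:)

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'wal_buffers', 'synchronous_commit', 'fsync');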

The system is as follows:

* 64 Cores (Intel Xeon Gold 6130 (Skylake, 2.10GHz, 2 CPUs/node, 16
cores/CPU))
* 192 GiB RAM (12 * 16GiB DIMM DDR4 Synchronous Registered (Buffered)
2666 MHz (0.4 ns))
* 2 * SSD SATA Samsung MZ7KM240HMHQ0D3 (one is used for the WAL and the
other for the data)
* 10 Gbps network link
* OS: Debian 11
* Postgres 13 from apt

(I've also written a stackoverflow post about it -
https://stackoverflow.com/questions/71631348/postgresql-bottleneck-neither-cpu-network-nor-i-o )
Below is just an example of the pgbench I ran:

pgbench -i -s 50 -U postgres -h <DB_HOST> -d <DB_NAME>
pgbench -c 64 -j 32 -t 100000 -h <DB_HOST> -U postgres <DB_NAME>

starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 64
number of threads: 32
number of transactions per client: 100000
number of transactions actually processed: 6400000/6400000
latency average = 0.976 ms
tps = 65584.664360 (including connections establishing)
tps = 65594.330678 (excluding connections establishing)

As comparison (average of three runs with pgbench as above):

num clients     default config      custom config above

10              11336               16848
20              19528               30187
30              25769               39430
40              29792               50150
50              31096               60133
60              33900               64916
70              34986               64308
80              34170               63075
90              35108               59910
100             34864               58320
120             35124               55414
140             33403               53610

(with fsync=off alone I almost reach the TPS of the right-hand column already)

For `-S -M prepared` the TPS is ~700k and for `-S` ~500k, but as the
application is very write-heavy this is not really useful for me.

With the app the CPU is only at about 25% load and the disks are not a
problem either. For pgbench it's about 75% CPU, but still no disk
bottleneck (about 5% utilization).

There are also Grafana snapshots I created for the system (node-exporter)
and for postgres (prometheus-postgres-exporter) while running with our
application (same configuration as above). Neither shows any kind of
bottleneck (except a high number of context switches and pages in/out).


I have also looked at postgresql's lock tables while running the above
experiment, but nothing there seemed strange to me. There are about 300
locks, but all of them are granted (select * from pg_locks).
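
(For reference, a grouped variant of that query makes the distribution easier
to see; this is only a sketch against the standard pg_locks view, not the
exact statement I ran:)

-- count locks per lock mode and whether they were granted
SELECT mode, granted, count(*)
FROM pg_locks
GROUP BY mode, granted
ORDER BY count(*) DESC;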

Also, the following query:

select wait_event, count(*) from pg_stat_activity where state='idle in
transaction' group by wait_event;

did not show any contention; the output always looks similar to this
(80 clients):

        wait_event         | count
---------------------------+-------
 ClientRead                |     2
 SerializableFinishedList  |     1
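
(A broader variant over all client backends, grouped by wait event type, would
look like the following; this is only a sketch for completeness, not output I
collected:)

SELECT state, wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state, wait_event_type, wait_event
ORDER BY count(*) DESC;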

Thanks to the slack channel I got a link to an EDB benchmark post. They
used a more powerful server and also achieved about 70k TPS, but did not
set fsync=off. So maybe they were limited by disk IO (just guessing, as
unfortunately this is not pointed out in the post).


So, my question is whether anyone knows what the bottleneck could be, or
whether it is even possible to get more TPS with this write-heavy load.

(dmesg also does not contain any error messages that would point to a
kernel misconfiguration.)

Optimally I would like to fully use the CPU and get about 3-4 times
more TPS (if even possible).

Thanks already for everyone's time and help.

Re: Postgresql TPS Bottleneck

From
Guillaume Cottenceau
Date:
<wakandavision 'at' outlook.com> writes:

> Optimally I would like to fully use the CPU and get about 3-4 times
> more TPS (if even possible).

Disclaimer: I'm really not a pg performance expert.
I don't understand your hope to fully use the CPU; if your
scenario is disk-limited, which may very well be the case, then
of course you cannot fully use the CPU. With synchronous commits
and fsync, the system is probably spending time just waiting for
the disks to report the writes completion. Are iostat/vmstat
showing a lot of IO-wait?
Also, if you can live with a few lost transactions in case of
server crash, synchronous_commit=off is very ok and provides a
lot of performance gain.

-- 
Guillaume Cottenceau



Re: Postgresql TPS Bottleneck

From
MichaelDBA
Date:
While setting these 2 parameters to off will make things go faster 
(especially fsync), it is unrealistic to have these settings in a 
production environment, especially fsync=off.  You might get by with 
synchronous_commit=off, but with fsync=off you could end up with 
corruption in your database.  synchronous_commit=off may not make anything 
go faster; it just changes where the time is being spent.

Regards,
Michael Vitale


wakandavision@outlook.com wrote on 3/31/2022 7:50 AM:
> fsync = off
> synchronous_commit = off




Re: Postgresql TPS Bottleneck

From
Tomas Vondra
Date:

On 3/31/22 13:50, wakandavision@outlook.com wrote:
> Hi everyone,
> 
> I am a bachelor's student and writing my thesis about the scaling and
> performance of an application. The application is using postgresql as a
> database but we can't scale any further currently as it seems postgres
> is hitting the limit.
> 
> With the application, as well as with pgbench, we don't get more than
> (max) 70k TPS on postgres. But the servers' resources are not utilized
> completely (more below).
> 
> I've tried many different configurations but none of them had any major
> performance impact (unless fsync and synchronous_commit = off).
> 
> This is the (custom) configuration I am using:
> 
> shared_buffers=65551953kB
> effective_cache_size=147491895kB
> huge_pages=on
> min_wal_size=20GB
> max_wal_size=200GB
> wal_buffers=1GB
> max_wal_senders=0
> archive_mode=off
> wal_level=minimal
> work_mem=2GB
> maintenance_work_mem=4GB
> checkpoint_completion_target=0.9
> checkpoint_timeout = 30min
> random_page_cost=1.1
> bgwriter_flush_after = 2MB
> effective_io_concurrency = 200
> # Disabled just for performance experiments
> fsync = off
> synchronous_commit = off
> full_page_writes = on
> max_worker_processes=64
> max_parallel_workers=64
> max_parallel_workers_per_gather=10
> max_parallel_maintenance_workers=12
> 
> The system is as follows:
> 
> * 64 Cores (Intel Xeon Gold 6130 (Skylake, 2.10GHz, 2 CPUs/node, 16
> cores/CPU))
> * 192 GiB RAM (12 * 16GiB DIMM DDR4 Synchronous Registered (Buffered)
> 2666 MHz (0.4 ns))
> * 2 * SSD SATA Samsung MZ7KM240HMHQ0D3 (one is used for the WAL and the
> other for the data)
> * 10 Gbps network link
> * OS: Debian 11
> * Postgres 13 from apt
> 
> (I've also written a stackoverflow post about it -
> https://stackoverflow.com/questions/71631348/postgresql-bottleneck-neither-cpu-network-nor-i-o )
> 
> 
> 
> Below is just an example of the pgbench I ran:
> 
> pgbench -i -s 50 -U postgres -h <DB_HOST> -d <DB_NAME>
> pgbench -c 64 -j 32 -t 100000 -h <DB_HOST> -U postgres <DB_NAME>
> 

I'd bet you need to use "pgbench -N" because the regular transaction
updates the "branch" table, and you only have 50 branches, which probably
means a lot of conflicts and locking. The other thing you might try is
"-M prepared", which saves time on query planning.

FWIW I really doubt "fsync=off" will give you any meaningful results.

Maybe try assessing the hardware capability first, using tools like fio
to measure IOPS with different workloads.
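
For example, invocations along these lines could approximate the two access
patterns (this is only a sketch; the file paths, sizes and runtimes are
placeholders to adapt to the actual mount points):

# sequential 8 kB writes with an fdatasync after each one, roughly like WAL/commit traffic
fio --name=wal-like --filename=/path/on/wal/disk/testfile --size=4G \
    --rw=write --bs=8k --fdatasync=1 --runtime=60 --time_based

# random 8 kB read/write mix with some queue depth, roughly like data-file access
fio --name=data-like --filename=/path/on/data/disk/testfile --size=4G \
    --rw=randrw --bs=8k --iodepth=32 --ioengine=libaio --direct=1 \
    --runtime=60 --time_based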

Then try pgbench with a single client, and finally increase the number
of clients and see how it behaves and compare it to what you expect.

In any case, every system has a bottleneck. You're clearly hitting one,
otherwise the numbers would go faster. Usually, it's either CPU bound,
in which case "perf top" might tell us more, or it's IO bound, in which
case try e.g. "iostat -x -k 1" or something.
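
(For instance, something like this; profiling all postgres backends at once is
just one way to pick the processes:)

# CPU profile of the running postgres processes, with call graphs
perf top -g -p "$(pgrep -d, -x postgres)"

# per-device utilization, latency and queue depth, 1-second samples
iostat -x -k 1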

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Postgresql TPS Bottleneck

From
Mladen Gogala
Date:
On 3/31/22 07:50, wakandavision@outlook.com wrote:
> Hi everyone,
>
> I am a bachelor's student and writing my thesis about the scaling and
> performance of an application. The application is using postgresql as a
> database but we can't scale any further currently as it seems postgres
> is hitting the limit.
>
> With the application, as well as with pgbench, we don't get more than
> (max) 70k TPS on postgres. But the servers' resources are not utilized
> completely (more below).

I would try monitoring with "perf top" and "atop -d" to see what is going on on the system. Also, try using sar to figure out what's happening: are you paging, waiting for I/O, or hitting some other kind of bottleneck? Once you figure out where your system is spending its time, you can address the problem. In addition to that, analyze the log files with pgbadger to find out which queries are time consuming and try optimizing them.
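
(Typical invocations would look something like the following; the log file path
is just the Debian default for Postgres 13 and depends on how logging is set up:)

# CPU usage, paging activity and run-queue pressure, 1-second samples
sar -u 1 10
sar -B 1 10
sar -q 1 10

# summarize queries from the postgres log (needs log_min_duration_statement or similar logging enabled)
pgbadger /var/log/postgresql/postgresql-13-main.log -o pgbadger-report.html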

-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com

Re: Postgresql TPS Bottleneck

From
wakandavision@outlook.com
Date:
Hi, thanks for your answer.

We have a Grafana instance monitoring all those metrics, but no one I have asked so far could identify an obvious bottleneck.
However, I have done further experiments to see whether we are missing something.

While running the benchmark with our application, I ran tools on the DB
node to deliberately saturate its resources: cpuburn, iperf and fio.
cpuburn did result in a small drop of Postgres TPS, but nothing
unexpected. iperf and fio, however, had no impact at all (except iperf
when more than our 10 Gbps were sent, obviously). Even with the disks
utilized at 100%, Postgres stayed at about 65k TPS.

The next thing I did was to start two independent Postgres instances on
the same server and run independent client applications against each of
them. This resulted in our application getting almost double the TPS
compared to running a single instance (from 13k to 23k) - each Postgres
instance had about 45k TPS, which did not increase further (?).

I think it is also interesting that our DB server has its TPS peak at
about 80 clients (more clients make the TPS go down again), while most
benchmarks I find on the internet peak at about 400-600 clients.

Does anyone have an idea what might be the problem?
Maybe I am missing a kernel/Postgres configuration parameter?


Re: Postgresql TPS Bottleneck

From
Jeff Janes
Date:
On Wed, Apr 20, 2022 at 5:13 AM <wakandavision@outlook.com> wrote:

> The next thing I did was starting two independent Postgres instances on
> the same server and run independent client applications against each of
> them. This resulted in our application getting almost double of the TPS
> compared to running a single instance (from 13k to 23k) - Each Postgres
> instance had about 45k TPS which did not increase (?).

How could that be?  Isn't there a one-to-one correspondence between app progress and PostgreSQL transactions?  How could one almost double while the other did not increase?  Anyway, 2x45k does seem like an increase (a smallish one) over 65k.

Your bottleneck for pgbench may be IPC/context switches.  I noticed that -S did about 7 times more TPS than the default, and it makes only one round trip to the database per transaction while the default makes 7.

You could package up the different queries made by the default transaction into one function call, in order to do the same thing but with fewer round trips to the database. This would be an easy way to see if my theory is true.  If it is, I don't know what that would mean for your app though, as we know nothing about its structure.

I have a patch handy (attached) which implements this feature as the builtin transaction "-b tpcb-func".  If you don't want to recompile pgbench, you could dissect the patch to reimplement the same thing as a -f style transaction instead.
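
For illustration, the -f style version might look roughly like this (a sketch
only: the function name tpcb_like and the script are hypothetical, not the
contents of the attached patch):

-- one server-side function doing the whole TPC-B-like transaction in a single round trip
CREATE FUNCTION tpcb_like(p_aid int, p_bid int, p_tid int, p_delta int)
RETURNS int LANGUAGE plpgsql AS $$
DECLARE bal int;
BEGIN
  UPDATE pgbench_accounts SET abalance = abalance + p_delta WHERE aid = p_aid;
  SELECT abalance INTO bal FROM pgbench_accounts WHERE aid = p_aid;
  UPDATE pgbench_tellers  SET tbalance = tbalance + p_delta WHERE tid = p_tid;
  UPDATE pgbench_branches SET bbalance = bbalance + p_delta WHERE bid = p_bid;
  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
    VALUES (p_tid, p_bid, p_aid, p_delta, CURRENT_TIMESTAMP);
  RETURN bal;
END;
$$;

and a custom script (say tpcb_func.sql), run with something like
"pgbench -n -s 50 -f tpcb_func.sql -c 64 -j 32 -t 100000 ..." (pass -s
explicitly, since -f scripts don't auto-detect the scale factor):

\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
SELECT tpcb_like(:aid, :bid, :tid, :delta);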

Note that packaging it up this way does violate the spirit of the benchmark, as clearly someone is supposed to look at the results of the first select before deciding to proceed with the rest of the transaction.  But you don't seem very interested in the spirit of the tpc-b benchmark, just in using it as a tool to track down a bottleneck.

Cheers,

Jeff
Attachment

Re: Postgresql TPS Bottleneck

From
wakandavision@outlook.com
Date:
Clearly, I only supplied half of the information there. I'm really sorry about that. The TPS measurement of the application does not correspond directly to the TPS of Postgres.
They are measured completely differently, but the application's TPS is the measure we are actually interested in, as we want to assess the scalability of the application.

What I wanted to show is that the server we are hosting Postgres on is not bottlenecked (at least not in an obvious way), as running two instances in parallel on the same server gives us almost double
the performance in our application and double the resource usage on the DB server. What is actually strange(?) is that the total TPS of Postgres does not change much, i.e. it is just 'distributed' across the two instances.

It might seem like our application simply could not handle more throughput, but I did the same with three instances, where we again stayed at 'only' double the performance, with the Postgres TPS distributed across three instances
(each client application running on an independent node).

I'm getting really frustrated here, as neither I nor anyone I have asked so far has an explanation for this behavior.


Re: Postgresql TPS Bottleneck

From
Benedict Holland
Date:
You wouldn't get an increase running two instances on the same server. A distributed database server is a complex application, and tuning it will depend on storage and CPU capacity. It could be something as simple as the bus. Are you running this locally or in the cloud? Are you running this on a distributed file system or across a network? There are a dozen different reasons why a database would not be using 100% of capacity, from indexing to being disk, bus, or network bound.

Thanks,
Ben
