Thread: PostgreSQL performance on ARM i.MX6
Hi there,
I am investigating possible throughput with PostgreSQL 14.4 on an ARM i.MX6 Quad CPU (NXP sabre board).
Testing with a simple python script (running on the same CPU), I get ~1000 request/s.
import psycopg as pg

conn = pg.connect('dbname=test')
conn.autocommit = True
cur = conn.cursor()
while True:
    cur.execute("call dummy_call(%s,%s,%s, ARRAY[%s, %s, %s]::real[]);",
                (1, 2, 3, 4.0, 5.0, 6.0), binary=True)
where the called procedure is basically a no-op:
CREATE OR REPLACE PROCEDURE dummy_call(
    IN arg1 int,
    IN arg2 int,
    IN arg3 int,
    IN arg4 double precision[])
AS $$
BEGIN
END
$$ LANGUAGE plpgsql;
This seems like quite a low number of requests/s, given that the call makes no changes to the database.
I'm looking for suggestions as to what could cause this poor performance and where to start investigating.
Thanks,
Marc
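For reference, a minimal sketch of how the ~1000 request/s figure above can be reproduced as an actual measurement; the 10,000-iteration count is an assumption, the rest mirrors Marc's script:

import time
import psycopg as pg

conn = pg.connect('dbname=test')
conn.autocommit = True
cur = conn.cursor()

N = 10_000  # assumed iteration count
t0 = time.perf_counter()
for _ in range(N):
    cur.execute("call dummy_call(%s,%s,%s, ARRAY[%s, %s, %s]::real[]);",
                (1, 2, 3, 4.0, 5.0, 6.0), binary=True)
elapsed = time.perf_counter() - t0
print(f"{N / elapsed:.0f} requests/s")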
On Tue, 23 May 2023 at 13:43, Druckenmueller, Marc <marc.druckenmueller@philips.com> wrote:

> Testing with a simple python script (running on the same CPU), I get ~1000 request/s.

Is the time spent in the client or in the server? Are there noticeable
differences if you execute that statement in a loop in psql (with the
variables already bound)?

-- Daniele
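One way to answer Daniele's client-vs-server question is to compare the server-side execution time per call against the ~1 ms per request seen by the client. A sketch using pg_stat_statements (an assumption: the extension must be listed in shared_preload_libraries, and creating it needs sufficient privileges):

-- in psql, after preloading pg_stat_statements and restarting
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- run the client loop, then:
SELECT query,
       calls,
       total_exec_time / calls AS avg_exec_ms
FROM pg_stat_statements
WHERE query ILIKE '%dummy_call%';

If avg_exec_ms is tiny compared with the latency the client observes, the time is going into the client library and the round trips rather than the server.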
"Druckenmueller, Marc" <marc.druckenmueller@philips.com> writes: > I am investigating possible throughput with PostgreSQL 14.4 on an ARM i.MX6 Quad CPU (NXP sabre board). > Testing with a simple python script (running on the same CPU), I get ~1000 request/s. That does seem pretty awful for modern hardware, but it's hard to tease apart the various potential causes. How beefy is that CPU really? Maybe the overhead is all down to client/server network round trips? Maybe psycopg is doing something unnecessarily inefficient? For comparison, on my development workstation I get [ create the procedure manually in db test ] $ cat bench.sql call dummy_call(1,2,3,array[1,2,3]::float8[]); $ pgbench -f bench.sql -n -T 10 test pgbench (16beta1) transaction type: bench.sql scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 maximum number of tries: 1 duration: 10 s number of transactions actually processed: 353891 number of failed transactions: 0 (0.000%) latency average = 0.028 ms initial connection time = 7.686 ms tps = 35416.189844 (without initial connection time) and it'd be more if I weren't using an assertions-enabled debug build. It would be interesting to see what you get from exactly that test case on your ARM board. BTW, one thing I see that's definitely an avoidable inefficiency in your test is that you're forcing the array parameter to real[] (i.e. float4) when the procedure takes double precision[] (i.e. float8). That forces an extra run-time conversion. Swapping between float4 and float8 in my pgbench test doesn't move the needle a lot, but it's noticeable. Another thing to think about is that psycopg might be defaulting to a TCP rather than Unix-socket connection, and that might add overhead depending on what kernel you're using. Although, rather than try to micro-optimize that, you probably ought to be thinking of how to remove network round trips altogether. I can get upwards of 300K calls/second if I push the loop to the server side: test=# \timing Timing is on. test=# do $$ declare x int := 1; a float8[] := array[1,2,3]; begin for i in 1..1000000 loop call dummy_call (x,x,x,a); end loop; end $$; DO Time: 3256.023 ms (00:03.256) test=# select 1000000/3.256023; ?column? --------------------- 307123.137643683721 (1 row) Again, it would be interesting to compare exactly that test case on your ARM board. regards, tom lane
On 2023-05-23 12:42, Druckenmueller, Marc wrote:
> Hi there,
>
> I am investigating possible throughput with PostgreSQL 14.4 on an ARM
> i.MX6 Quad CPU (NXP sabre board).
>
> Testing with a simple python script (running on the same CPU), I get
> ~1000 request/s.

I tweaked your script slightly, but this is what I got on the Raspberry
Pi 4 that I have in the corner of the room. Almost twice the speed you
are seeing.

0: this = 0.58 tot = 0.58
1: this = 0.55 tot = 1.13
2: this = 0.59 tot = 1.72
3: this = 0.55 tot = 2.27
4: this = 0.56 tot = 2.83
5: this = 0.57 tot = 3.40
6: this = 0.56 tot = 3.96
7: this = 0.55 tot = 4.51
8: this = 0.59 tot = 5.11
9: this = 0.60 tot = 5.71

That's with governor=performance and a couple of background tasks
running as well as the python. PostgreSQL 15 in a container on a Debian
O.S. I've not done any tuning on PostgreSQL (but your call isn't doing
anything really) nor the Pi.

The minor tweaks to your script were as below:

import psycopg as pg
import time

conn = pg.connect('')
conn.autocommit = True
cur = conn.cursor()

start = time.time()
prev = start
end = start
for j in range(10):
    for i in range(1000):
        cur.execute("call dummy_call(%s,%s,%s, ARRAY[%s, %s, %s]::real[]);",
                    (1, 2, 3, 4.0, 5.0, 6.0), binary=True)
    end = time.time()
    print(f"{j}: this = {(end - prev):.2f} tot = {(end - start):.2f}")
    prev = end

--
Richard Huxton