Why is there a tenfold difference between Postgres's alleged query execution time and packet transmission time? - Mailing list pgsql-performance

From: Rollo Konig-Brock
Subject: Why is there a tenfold difference between Postgres's alleged query execution time and packet transmission time?
Msg-id: CAOkhyB7Cm=5GuRYr6LJQcxw_XKkHDeV_FVsCk4LDzN-Yig5wrA@mail.gmail.com
List: pgsql-performance

Hey all,

I've been pulling my hair out over this for days now, as I'm trying to build a low-latency application. Databases should be fast, but I cannot work out why so much latency is added between the actual database process and the application code. For simple queries that should take less than a millisecond, this mystery latency is by far the biggest performance hit.

For example, I have a very simple table on a local Postgres database, which takes 324us to query. This is absolutely fine.

postgres=# EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF) select count("ID") from example t;
-[ RECORD 1 ]------------------------------------------------------
QUERY PLAN | Aggregate (actual rows=1 loops=1)
-[ RECORD 2 ]------------------------------------------------------
QUERY PLAN |   ->  Seq Scan on example t (actual rows=4119 loops=1)
-[ RECORD 3 ]------------------------------------------------------
QUERY PLAN | Planning Time: 0.051 ms
-[ RECORD 4 ]------------------------------------------------------
QUERY PLAN | Execution Time: 0.324 ms

But then if I want to connect to the database from Python, getting the actual data takes over 2500us! The loopback interface is not the problem here; I sanity-checked that (it adds 100us at a stretch).

In [1]: %time cursor.execute("select count(\"ID\") from example;");r = cursor.fetchmany()
CPU times: user 316 µs, sys: 1.12 ms, total: 1.44 ms
Wall time: 2.73 ms
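For reference, here's the kind of sanity check I mean for the loopback claim above: a toy TCP echo server (entirely my own sketch, nothing to do with Postgres's wire protocol) that measures the median round-trip time of a small payload over 127.0.0.1.

```python
# Toy loopback round-trip benchmark: a TCP echo server in a background
# thread, and a client timing send+receive of a small payload. The port,
# payload size, and iteration count are arbitrary choices.
import socket
import threading
import time

def echo_server(listener):
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)  # echo back whatever arrived

server = socket.socket()
server.bind(("127.0.0.1", 0))  # let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=echo_server, args=(server,), daemon=True).start()

client = socket.create_connection(("127.0.0.1", port))
client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

payload = b"x" * 64
samples = []
for _ in range(1000):
    t0 = time.perf_counter()
    client.sendall(payload)
    client.recv(4096)
    samples.append(time.perf_counter() - t0)
samples.sort()
median_us = samples[len(samples) // 2] * 1e6
print(f"median loopback round trip: {median_us:.0f} us")
client.close()
```

On my machine this comes out far below the multi-millisecond figures above, which is why I don't believe the loopback interface itself is the bottleneck.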

Investigating further, I opened up WireShark to see how long the packets themselves take.

[Wireshark screenshot]

From this I can tell that Postgres took 2594us to send the data. This kind of overhead for a loopback IPC call is way beyond what I would expect. Why is this? And how can I optimise it away?

This is a problem for the software ecosystem, because people start thinking databases are too slow for simple lookups, so they cache in Redis the results of queries that take 300us to execute but 3ms to fetch back to the client. It adds up to an awful lot of latency.


As an aside, I've been really miffed by how bad database IPC performance has been in general. Oracle has a problem where each additional ~1500 bytes of response adds roughly 800us to receiving it (on a gigabit network).
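Since ~1500 bytes is the usual Ethernet MTU, that smells like a per-packet cost. The effect is easy to probe in isolation with the same kind of toy echo server (again my own sketch, not Oracle's or Postgres's stack; the sizes are arbitrary): time round trips as the payload crosses multiples of 1500 bytes and see whether latency scales with the packet count.

```python
# Toy experiment: does echo round-trip latency grow per ~1500-byte packet?
# A background echo server on loopback; the client times payloads of
# increasing size and reports the median round trip for each.
import socket
import threading
import time

def echo_server(listener):
    conn, _ = listener.accept()
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            conn.sendall(data)

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=echo_server, args=(server,), daemon=True).start()

client = socket.create_connection(("127.0.0.1", port))
client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

sizes = (100, 1500, 3000, 6000, 12000)
results = {}
for size in sizes:
    payload = b"x" * size
    samples = []
    for _ in range(200):
        t0 = time.perf_counter()
        client.sendall(payload)
        got = 0
        while got < size:  # the echo may come back in several chunks
            got += len(client.recv(65536))
        samples.append(time.perf_counter() - t0)
    samples.sort()
    results[size] = samples[len(samples) // 2] * 1e6

for size, us in results.items():
    print(f"{size:6d} bytes: median {us:.0f} us")
client.close()
```

On loopback (where the MTU is typically much larger than 1500) this stays nearly flat; if a real network shows a step for every 1500 bytes, that points at per-packet handling rather than bandwidth.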


I'd really love to know what's happening here.


Rollo

