Thread: Slow ext'd query via client native implementation vs. libpq & simple psql

Slow ext'd query via client native implementation vs. libpq & simple psql

From
Richard Michael
Date:
Hello,

On PG 12.2, I am analyzing a performance problem when using a client
(Elixir/postgrex) querying via the extended query protocol.  I am comparing with
psql and a C program.  Logs for all three follow this short explanation.

The query is trivial: `SELECT [cols] FROM t WHERE id = X` on a 65K row table.

The Elixir client executes this as an extended query in >500-700ms, very slow.
If relevant, the client does not use libpq, it is a native implementation.

A simple query via psql `PREPARE q AS ... ; EXECUTE q;` executes in ~130ms.
(Aside, please let me know if psql can execute extended queries.)

To compare another extended query protocol client, I wrote a tiny C program
using libpq PQprepare()/PQexecPrepared() executes in ~100ms.

I have tried to make the three tests as similar as possible; all are via
localhost and use named statements.

All use an identical query plan. There is a btree index on the WHERE col, but
the table is small enough it is not used.

The above durations are consistent across server restart, reboot, and repetition
(i.e.  still >500ms if run multiple times), so it seems independent of
filesystem caching, buffers, etc.

Obviously the client's query execution is somehow different, but I do not know
what/why.

I have enabled auto_explain min_duration (0), analyze, buffers, verbose and
settings.  What more can I log or do to postgres to understand why the client is
behaving poorly?  Would wireshark on client messages reveal anything postgres
can't log?

Other suggestions much appreciated as well.

(I happen to be on OSX, but I can test elsewhere if necessary.)

Regards,
richard


Elixir/postgrex extended query: (always >500ms)
-------------------------------
2020-03-11 13:46:20.090 EDT [3401] LOG:  connection received: host=127.0.0.1 port=50128
2020-03-11 13:46:20.096 EDT [3401] LOG:  connection authorized: user=testuser database=biodata
2020-03-11 13:46:20.141 EDT [3401] LOG:  duration: 1.138 ms  parse ecto_98: SELECT g0."id", g0."genome_id", g0."seqid", g0."source", g0."feature", g0."start", g0."end", g0."score", g0."strand", g0."phase", g0."attribute" FROM "genes" AS g0 WHERE (g0."genome_id" = 1)
2020-03-11 13:46:20.144 EDT [3401] LOG:  duration: 2.292 ms  bind ecto_98: SELECT g0."id", g0."genome_id", g0."seqid", g0."source", g0."feature", g0."start", g0."end", g0."score", g0."strand", g0."phase", g0."attribute" FROM "genes" AS g0 WHERE (g0."genome_id" = 1)
2020-03-11 13:46:20.658 EDT [3401] LOG:  duration: 513.791 ms  execute ecto_98: SELECT g0."id", g0."genome_id", g0."seqid", g0."source", g0."feature", g0."start", g0."end", g0."score", g0."strand", g0."phase", g0."attribute" FROM "genes" AS g0 WHERE (g0."genome_id" = 1)
2020-03-11 13:46:20.658 EDT [3401] LOG:  duration: 513.792 ms  plan:
Query Text: SELECT g0."id", g0."genome_id", g0."seqid", g0."source", g0."feature", g0."start", g0."end", g0."score", g0."strand", g0."phase", g0."attribute" FROM "genes" AS g0 WHERE (g0."genome_id" = 1)
Seq Scan on public.genes g0  (cost=0.00..6591.62 rows=60632 width=224) (actual time=0.523..33.229 rows=60623 loops=1)
 Output: id, genome_id, seqid, source, feature, start, "end", score, strand, phase, attribute
 Filter: (g0.genome_id = 1)
 Rows Removed by Filter: 3907
 Buffers: shared read=5785
2020-03-11 13:46:20.887 EDT [3401] LOG:  disconnection: session time: 0:00:00.796 user=testuser database=biodata host=127.0.0.1 port=50128



psql simple PREPARE/EXECUTE:
----------------------------
2020-03-11 13:46:40.021 EDT [3438] LOG:  connection received: host=::1 port=50129
2020-03-11 13:46:40.044 EDT [3438] LOG:  connection authorized: user=testuser database=biodata application_name=psql
2020-03-11 13:46:40.106 EDT [3438] LOG:  duration: 58.071 ms  plan:
Query Text: PREPARE q AS SELECT g0."id", g0."genome_id", g0."seqid", g0."source", g0."feature", g0."start", g0."end", g0."score", g0."strand", g0."phase", g0."attribute" FROM "genes" AS g0 WHERE (g0."genome_id" = 1); EXECUTE q;
Seq Scan on public.genes g0  (cost=0.00..6591.62 rows=60632 width=224) (actual time=0.060..25.542 rows=60623 loops=1)
 Output: id, genome_id, seqid, source, feature, start, "end", score, strand, phase, attribute
 Filter: (g0.genome_id = 1)
 Rows Removed by Filter: 3907
 Buffers: shared hit=42 read=5743
2020-03-11 13:46:40.182 EDT [3438] LOG:  duration: 136.173 ms  statement: PREPARE q AS SELECT g0."id", g0."genome_id", g0."seqid", g0."source", g0."feature", g0."start", g0."end", g0."score", g0."strand", g0."phase", g0."attribute" FROM "genes" AS g0 WHERE (g0."genome_id" = 1); EXECUTE q;
2020-03-11 13:46:40.182 EDT [3438] DETAIL:  prepare: PREPARE q AS SELECT g0."id", g0."genome_id", g0."seqid", g0."source", g0."feature", g0."start", g0."end", g0."score", g0."strand", g0."phase", g0."attribute" FROM "genes" AS g0 WHERE (g0."genome_id" = 1); EXECUTE q;
2020-03-11 13:46:41.140 EDT [3438] LOG:  disconnection: session time: 0:00:01.119 user=testuser database=biodata host=::1 port=50129



C libpq extended query:
-----------------------
2020-03-11 13:50:00.220 EDT [4299] LOG:  connection received: host=::1 port=50137
2020-03-11 13:50:00.232 EDT [4299] LOG:  connection authorized: user=testuser database=biodata
2020-03-11 13:50:00.234 EDT [4299] LOG:  duration: 0.437 ms  parse foo: SELECT g0."id", g0."genome_id", g0."seqid", g0."source", g0."feature", g0."start", g0."end", g0."score", g0."strand", g0."phase", g0."attribute" FROM genes AS g0 WHERE (g0."genome_id" = 1)
2020-03-11 13:50:00.235 EDT [4299] LOG:  duration: 0.489 ms  bind foo: SELECT g0."id", g0."genome_id", g0."seqid", g0."source", g0."feature", g0."start", g0."end", g0."score", g0."strand", g0."phase", g0."attribute" FROM genes AS g0 WHERE (g0."genome_id" = 1)
2020-03-11 13:50:00.342 EDT [4299] LOG:  duration: 106.874 ms  execute foo: SELECT g0."id", g0."genome_id", g0."seqid", g0."source", g0."feature", g0."start", g0."end", g0."score", g0."strand", g0."phase", g0."attribute" FROM genes AS g0 WHERE (g0."genome_id" = 1)
2020-03-11 13:50:00.342 EDT [4299] LOG:  duration: 106.861 ms  plan:
Query Text: SELECT g0."id", g0."genome_id", g0."seqid", g0."source", g0."feature", g0."start", g0."end", g0."score", g0."strand", g0."phase", g0."attribute" FROM genes AS g0 WHERE (g0."genome_id" = 1)
Seq Scan on public.genes g0  (cost=0.00..6591.62 rows=60632 width=224) (actual time=0.056..25.049 rows=60623 loops=1)
 Output: id, genome_id, seqid, source, feature, start, "end", score, strand, phase, attribute
 Filter: (g0.genome_id = 1)
 Rows Removed by Filter: 3907
 Buffers: shared hit=74 read=5711
2020-03-11 13:50:00.345 EDT [4299] LOG:  disconnection: session time: 0:00:00.125 user=testuser database=biodata host=::1 port=50137

Re: Slow ext'd query via client native implementation vs. libpq &simple psql

From
Justin Pryzby
Date:
On Wed, Mar 11, 2020 at 03:31:48PM -0400, Richard Michael wrote:
> The query is trivial: `SELECT [cols] FROM t WHERE id = X` on a 65K row
> table.

> The Elixir client executes this as an extended query in >500-700ms, very
> slow.
> If relevant, the client does not use libpq, it is a native implementation.

> A simple query via psql `PREPARE q AS ... ; EXECUTE q;` executes in ~130ms.
> (Aside, please let me know if psql can execute extended queries.)
> To compare another extended query protocol client, I wrote a tiny C program
> using libpq PQprepare()/PQexecPrepared() executes in ~100ms.

psql can't do it, but pygres can do it since last year.

> Would wireshark on client messages reveal anything postgres can't log?

I think maybe, like how many round trips there are.  
Does the difference between the clients scale linearly with the number of rows
returned?

> Elixir/postgrex extended query: (always >500ms)
> -------------------------------

> 2020-03-11 13:46:20.658 EDT [3401] LOG:  duration: 513.792 ms  plan:
...
> Seq Scan on public.genes g0  (cost=0.00..6591.62 rows=60632 width=224)
> (actual time=0.523..33.229 rows=60623 loops=1)

You can see the "actual time" is low (that's postgres running the query), so
the rest seems to comes from something like sending data back to the client or
the client parsing the results.

-- 
Justin



Richard Michael <rmichael@edgeofthenet.org> writes:
> On PG 12.2, I am analyzing a performance problem when using a client
> (Elixir/postgrex) querying via the extended query protocol.  I am comparing
> with
> psql and a C program.  Logs for all three follow this short explanation.

Hmm, your auto-explain log entries show all three queries having
server-side execution times of a couple dozen msec.  It seems like
the excess time must be involved in transmitting the data to the
client.  So I guess I'd be looking at whether the client is really
slow at absorbing data for some reason.

IIRC, auto-explain's 'actual time' for the top-level query node does not
count the time to format data or transmit it to the client.  Still, we
have an upper bound of ~80 msec for that to happen with the libpq client,
and there's no obvious reason why it'd be different for the other client.

[ thinks for a bit... ]  You might double check that those two clients
are using the same client_encoding setting.  400ms doing encoding
conversion seems excessive, but there aren't that many other possibilities
for the I/O time to be different.

            regards, tom lane



Re: Slow ext'd query via client native implementation vs. libpq &simple psql

From
Justin Pryzby
Date:
On Thu, Mar 12, 2020 at 04:27:54PM -0400, Richard Michael wrote:
> > psql can't do it, but pygres can do it since last year.
> 
> Thank you for mentioning it.  Do you mean this:
> https://github.com/rogamba/pygres ?

Ugh, no.  I'm referring to PyGreSQL, which was at one point a part of the
postgres source tree.
https://www.pygresql.org/

$ python -c "import pg; print(pg.DB('postgres').query('SELECT 1').getresult())"
[(1,)]

$ python -c "import pg; p=pg.DB('postgres'); p.prepare('q', 'SELECT 1'); print(p.query_prepared('q').getresult())"
[(1,)]

-- 
Justin

(I'm having trouble believe someone made a project called "pygres", which uses
python, postgres, and psycopg...  Will mail D'Arcy)