Re: Mysterious performance degradation in exceptional cases - Mailing list pgsql-general

From Matthias Apitz
Subject Re: Mysterious performance degradation in exceptional cases
Msg-id YyK5PTCoSFXUzOs5@c720-r368166
In response to Re: Mysterious performance degradation in exceptional cases  (Adrian Klaver <adrian.klaver@aklaver.com>)
Responses Re: Mysterious performance degradation in exceptional cases
List pgsql-general
On Wednesday, September 14, 2022 at 07:19:31 AM -0700, Adrian Klaver wrote:

> On 9/14/22 01:31, Matthias Apitz wrote:
> > 
> > We have an application server written in C which uses ESQL/C on top
> > of PostgreSQL 13.1 on Linux. The application in question always serves
> > the same search in a library database, given to the server
> > as commands over the network, logging in to the application and doing
> > a search:
> > 
> > SLNPServerInit
> > User:zfl
> > SLNPEndCommand
> > 
> > SLNPSearch
> > HitListName:Zfernleihe
> > Search:1000=472214284
> > SLNPEndCommand
> > 
> > To fulfill the search, the application server has to do some 100
> > ESQL/C calls, and all this should not take longer than 1-2 seconds;
> > normally it does not. But in about 10% of the cases it takes longer
> > than 180 seconds. The other 90% are below 2 seconds, i.e. the behavior
> > is binary: either under 2 seconds or more than 180 seconds, with no
> > values in between.
> > 
> > We can easily simulate the above with a small shell script that just sends
> > the above two commands with 'netcat' and throws away the result (the real search is
> > done by an inter library loan software which has a timeout of 180 seconds
> > to wait for the SLNPSearch search result -- that's why we got to know
> > about the problem at all, because all this is running automagically with
> > no user dialogs). The idea of the simulated search was to find out
> > from the ESQL/C log files which operation takes so long and why.
> 
> Does the test search run the inter library loan software?

The real picture is:

  ILL-software --(network, search command)---> app-server --(ESQL/C)--> PostgreSQL-server
  test search  --(localhost, search command)-> app-server --(ESQL/C)--> PostgreSQL-server

> > Well, some time ago, primarily to catch the situation, we started sending
> > this simulated search every 10 seconds, and since then the problem has gone away completely.
> 
> To be clear the problem went away for the real search?

Yes, since the 'test search' has been running every 10 seconds, the
'ILL-software' pictured above, doing the same search, no longer faces the problem.
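
For illustration, the test search can be sketched roughly as follows. This is a minimal Python sketch, not the actual shell/netcat script: the host and port are hypothetical, and the newline framing and the "server replies, then closes" behavior are assumptions about the protocol. Only the command text itself is taken from this thread.

```python
import socket
import time

# Hypothetical endpoint; the real host and port are not given in this thread.
APP_SERVER = ("localhost", 9999)
TIMEOUT_S = 180.0  # same limit the ILL software applies


def build_payload(user: str, hitlist: str, query: str) -> bytes:
    """Assemble the login and search commands quoted in this thread.

    Assumption: commands are newline-terminated and separated by a blank line.
    """
    lines = [
        "SLNPServerInit",
        f"User:{user}",
        "SLNPEndCommand",
        "",
        "SLNPSearch",
        f"HitListName:{hitlist}",
        f"Search:{query}",
        "SLNPEndCommand",
        "",
    ]
    return "\n".join(lines).encode("ascii")


def timed_search(addr=APP_SERVER, timeout=TIMEOUT_S) -> float:
    """Send one test search, discard the reply, return elapsed seconds."""
    payload = build_payload("zfl", "Zfernleihe", "1000=472214284")
    start = time.monotonic()
    with socket.create_connection(addr, timeout=timeout) as sock:
        sock.sendall(payload)
        sock.shutdown(socket.SHUT_WR)  # assumption: server answers, then closes
        while sock.recv(4096):
            pass  # throw the result away, as the netcat script does
    return time.monotonic() - start
```

Calling timed_search() in a loop with a 10 second sleep, and logging the wall-clock time whenever the result exceeds 2 seconds, would give timestamps to compare against the ESQL/C and PostgreSQL logs.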

> 
> Where is the inter library software, in your application or are you reaching
> out to another application?

The above 'app-server' fulfills the search requested by the
'ILL-software' (or the 'test search'), i.e. it looks up one single
library record (one row in the PostgreSQL database) and delivers
it to the 'ILL-software'. The load from the 'ILL-software' is not
heavy: roughly 50 requests per day.

> Is the search running across a remote network?

The real search comes over the network through a stunnel. But we
watched the incoming search and the response of the 'app-server'
locally with tcpdump. In the case of the timeout, the 'app-server' does not
answer within 180 seconds, i.e. it does not send anything into the stunnel,
and the remote 'ILL-software' terminates the connection with a FIN packet.

I will now:

- shut down the test search that runs every 10 seconds, to see if the
  problem reappears;
- set 'log_autovacuum_min_duration = 0' in postgresql.conf, to see if
  autovacuum runs coincide with the times of the problem.
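
Once autovacuum logging is on, the comparison could be done along these lines. This is only a sketch under assumptions: it presumes the default log_line_prefix ('%m [%p] '), so that each server log line starts with a millisecond timestamp, and it treats "an autovacuum entry was logged within 180 seconds after the slow search started" as a rough heuristic for overlap. The function names are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumption: default log_line_prefix '%m [%p] ', so each line starts with
# a timestamp like '2022-09-14 16:19:31.123 CEST'.
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def autovacuum_times(log_lines):
    """Return the timestamps of all 'automatic vacuum' log entries."""
    times = []
    for line in log_lines:
        m = STAMP.match(line)
        if m and "automatic vacuum of table" in line:
            times.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return times


def overlapping(slow_starts, vacuum_times, window=timedelta(seconds=180)):
    """Slow searches with an autovacuum entry logged within `window` after start."""
    return [s for s in slow_starts
            if any(s <= v <= s + window for v in vacuum_times)]
```

If most slow searches show up in the result of overlapping(), autovacuum is worth a closer look; if none do, it can probably be ruled out.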

Thanks for your feedback in any case.

    matthias

-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub


