Thread: memory leak in libpq, definitely lost: 200 bytes in 1 blocks, indirectly lost: 2,048 bytes in 1 blocks ...

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>
#include <limits.h>
#include <fcntl.h>
#include <termios.h>
#include <time.h>
#include <stdbool.h>

#include <libpq-fe.h>

// gcc -o db_test.o -c -I/usr/include/postgresql db_test.c
// gcc -o db_test db_test.o -lpq

//-------------------------------------------------------------------------------------------------------------------------------//
int main(void)
{
  PGconn *pg_conn = NULL;
  PGresult *res = NULL;
  ConnStatusType status = CONNECTION_BAD;

  const char *conninfo = "host = 192.168.0.20 port = 20000 dbname = dbtest user = postgres password = postgres application_name = app-test";

  if((pg_conn = PQconnectdb(conninfo)) != NULL)
  {
    fprintf(stdout, "\n -CP00- \n");
    if(PQstatus(pg_conn) != CONNECTION_OK)
    {
      fprintf(stdout, "\n -CP01-[%d]- \n", PQstatus(pg_conn));

      /* Keep resetting the connection every 2 seconds until the server accepts it. */
      do
      {
        sleep(2);
        PQreset(pg_conn);
        status = PQstatus(pg_conn);
        fprintf(stdout, "\n -CP02-[%d]- \n", status);
      }
      while(status != CONNECTION_OK);
    }

    fprintf(stdout, "\n -CP03-[%d]- \n", PQstatus(pg_conn));

    res = PQexec(pg_conn, "SELECT 1 AS a");
    PQclear(res); res = NULL;

    PQfinish(pg_conn); pg_conn = NULL;
  }

  return(0);
}
//-------------------------------------------------------------------------------------------------------------------------------//
 


 
 STEP_1_ON-THE-APP-SERVER - compile the application

  gcc -o db_test.o -c -I/usr/include/postgresql db_test.c
  gcc -o db_test db_test.o -lpq

{  
  STEP_2_ON-THE-DB-SERVER - stop PostgreSQL

  sudo service postgresql stop

  STEP_3_ON-THE-APP-SERVER - run the application

  valgrind --leak-check=full --show-reachable=yes --leak-resolution=high -v ./db_test > ./log.log 2>&1

  STEP_4_ON-THE-DB-SERVER - wait [N] seconds (N is a randomly chosen number between 10 and 3000) and start PostgreSQL

} Repeat several times to get the leak


Valgrind options:

--22352--    --leak-check=full
--22352--    --show-reachable=yes
--22352--    --leak-resolution=high
--22352--    -v
--22352-- Contents of /proc/version:
--22352--   Linux version 4.4.0-36-generic (buildd@lcy01-01) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.2) ) #55-Ubuntu SMP Thu Aug 11 18:01:55 UTC 2016
--22352-- 


==18777== 2,048 bytes in 1 blocks are indirectly lost in loss record 282 of 290
==18777==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18777==    by 0x4E4A1C9: ??? (in /usr/lib/x86_64-linux-gnu/libpq.so.5.8)
==18777==    by 0x4E4A87C: ??? (in /usr/lib/x86_64-linux-gnu/libpq.so.5.8)
==18777==    by 0x4E541A4: ??? (in /usr/lib/x86_64-linux-gnu/libpq.so.5.8)
==18777==    by 0x4E48396: PQconnectPoll (in /usr/lib/x86_64-linux-gnu/libpq.so.5.8)
==18777==    by 0x4E488ED: ??? (in /usr/lib/x86_64-linux-gnu/libpq.so.5.8)
==18777==    by 0x4E495C7: PQreset (in /usr/lib/x86_64-linux-gnu/libpq.so.5.8)
==18777==    by 0x4009FB: main (in /home/puffy/bkserver/pg_db_test_bug00/db_test)

==18777== 2,248 (200 direct, 2,048 indirect) bytes in 1 blocks are definitely lost in loss record 283 of 290
==18777==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18777==    by 0x4E4A5DC: PQmakeEmptyPGresult (in /usr/lib/x86_64-linux-gnu/libpq.so.5.8)
==18777==    by 0x4E54161: ??? (in /usr/lib/x86_64-linux-gnu/libpq.so.5.8)
==18777==    by 0x4E48396: PQconnectPoll (in /usr/lib/x86_64-linux-gnu/libpq.so.5.8)
==18777==    by 0x4E488ED: ??? (in /usr/lib/x86_64-linux-gnu/libpq.so.5.8)
==18777==    by 0x4E495C7: PQreset (in /usr/lib/x86_64-linux-gnu/libpq.so.5.8)
==18777==    by 0x4009FB: main (in /home/puffy/bkserver/pg_db_test_bug00/db_test)

==18777== LEAK SUMMARY:
==18777==    definitely lost: 200 bytes in 1 blocks
==18777==    indirectly lost: 2,048 bytes in 1 blocks
==18777==      possibly lost: 0 bytes in 0 blocks
==18777==    still reachable: 93,928 bytes in 2,899 blocks
==18777==         suppressed: 0 bytes in 0 blocks
==18777== 
==18777== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==18777== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
On 10/10/2016 12:38 PM, Simplex wrote:
>  STEP_1_ON-THE-APP-SERVER - compile the application
>
>   gcc -o db_test.o -c -I/usr/include/postgresql db_test.c
>   gcc -o db_test db_test.o -lpq
>
> {
>   STEP_2_ON-THE-DB-SERVER - stop PostgreSQL
>
>   sudo service postgresql stop
>
>   STEP_3_ON-THE-APP-SERVER - run the application
>
>   valgrind --leak-check=full --show-reachable=yes --leak-resolution=high -v
> ./db_test > ./log.log 2>&1
>
>   STEP_4_ON-THE-DB-SERVER - wait [N] seconds (N is a randomly chosen number
> between 10 and 3000) and start PostgreSQL
>
> } Repeat several times to get the leak

I was not able to reproduce this. Can you help to analyze this in more
detail, please? Which version of PostgreSQL are you using? Can you load
debug symbols or compile from source, to get a stack trace with symbol
names? How often do you see the error? Do you have a theory of how the
leak occurs?

- Heikki
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 10/10/2016 12:38 PM, Simplex wrote:
>> } Repeat several times to get the leak

> I was not able to reproduce this.

I was able to reproduce it, or at least something that looks similar,
after a good deal of fooling around.  I needed:

* SSL turned on
* server recovering from crash, so that it rejects at least one connection
  attempt with "the database system is starting up"

This causes the successful PQreset call to leave an async result behind
containing the earlier connection rejection message.  That might or might
not be a logic bug in PQreset, but it doesn't really seem all that wrong.
But then the PQexec calls PQsendQueryStart which cavalierly resets
conn->result to NULL, leaking the async result.  I fixed PQsendQueryStart
to use pqClearAsyncResult which is less cavalier, and then I couldn't
reproduce the leak anymore.
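
The difference between the two behaviors described above can be sketched outside libpq. This is not libpq source code: MockConn, MockResult, leave_stale_result, start_query_cavalier, start_query_careful, clear_async_result and the live_blocks counter are all hypothetical names invented for illustration, and the 2048-byte payload only mirrors the size in the Valgrind report. The sketch assumes nothing more than a connection struct holding a pointer to a pending result that owns a second allocation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pending result; NOT libpq's PGresult. */
typedef struct MockResult {
    char *errmsg;               /* separately allocated payload */
} MockResult;

/* Hypothetical stand-in for a connection; NOT libpq's PGconn. */
typedef struct MockConn {
    MockResult *result;         /* pending async result, if any */
} MockConn;

/* Number of tracked allocations that have not been freed yet. */
int live_blocks = 0;

void *tracked_alloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_blocks++;
    return p;
}

void tracked_free(void *p)
{
    if (p != NULL) {
        free(p);
        live_blocks--;
    }
}

/* Simulates a rejected connection attempt that leaves an error result
 * attached to the connection. Returns 0 on success, -1 on allocation failure. */
int leave_stale_result(MockConn *conn)
{
    MockResult *r = tracked_alloc(sizeof *r);
    if (r == NULL)
        return -1;
    r->errmsg = tracked_alloc(2048);
    if (r->errmsg == NULL) {
        tracked_free(r);
        return -1;
    }
    conn->result = r;
    return 0;
}

/* "Cavalier" start: drops the pointer, so both blocks become unreachable. */
void start_query_cavalier(MockConn *conn)
{
    conn->result = NULL;
}

/* Careful clear: frees the stale result and its payload, then resets. */
void clear_async_result(MockConn *conn)
{
    if (conn->result != NULL) {
        tracked_free(conn->result->errmsg);
        tracked_free(conn->result);
    }
    conn->result = NULL;
}

/* Careful start: releases any stale result before beginning a new query. */
void start_query_careful(MockConn *conn)
{
    clear_async_result(conn);
}
```

After leave_stale_result, calling start_query_cavalier leaves live_blocks at 2 (one direct block plus one block reachable only through it, matching the "definitely lost" and "indirectly lost" pair in the report), while start_query_careful brings it back to 0.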

Interestingly, I couldn't make it happen (i.e., get to PQsendQueryStart with
a non-null result) without valgrind active on the client, so there are
some timing considerations involved too, apparently.

            regards, tom lane