Re: [HACKERS] Another nasty cache problem - Mailing list pgsql-hackers

From Patrick Welche
Subject Re: [HACKERS] Another nasty cache problem
Date
Msg-id 20000203112434.B1509@quartz.newn.cam.ac.uk
In response to Re: [HACKERS] Another nasty cache problem  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Another nasty cache problem
Re: [HACKERS] Another nasty cache problem
Re: [HACKERS] Another nasty cache problem
List pgsql-hackers
On Mon, Jan 31, 2000 at 09:02:30PM -0500, Tom Lane wrote:
> Patrick Welche <prlw1@newn.cam.ac.uk> writes:
> > Tom Lane wrote:
> 
> There are cache-flush-related bugs still left to deal with, but they
> seem to be far lower in probability than the ones squashed so far.
> I'm finding that even with MAXNUMMESSAGES set to 8, the parallel tests
> usually pass; so it seems we need some other way of testing to nail down
> the remaining problems.
> 
> > I also tried that nonsensical join from the other day, and it failed in
> > the same way again:
> > newnham=# select * from crsids,"tblPerson" where
> > newnham-# crsids.crsid != "tblPerson"."CRSID";
> > Backend sent B message without prior T
> 
> Hmm.  Can you provide a self-contained test case (a script to build the
> failing tables, preferably)?

This seems to be a memory exhaustion problem: I have 128MB of real memory.
Attached below is the C program used to create some random data in
tables test and test2 of database test (which needs to exist). Executing
the nonsensical query
 select * from test,test2 where test.i!=test2.i;

should return 2600*600-600=1559400 (i.e. lots of) rows: every pairing except the 600 where the two i values are equal.
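
The row count can be checked by brute force. This is only a sketch, assuming the i columns hold 1..2600 in test and 1..600 in test2, as the loader program at the end of this message inserts them; count_mismatched_pairs is a name made up for illustration. Exactly 600 of the 2600*600 pairings have equal i values, so the count works out to 2600*600-600 = 1559400.

```c
/* Count pairs (a, b) with 1 <= a <= n_test, 1 <= b <= n_test2 and a != b.
   This mirrors the rows returned by a join on test.i != test2.i when
   test.i holds 1..n_test and test2.i holds 1..n_test2 (an assumption
   taken from the loader program, not measured against a database). */
long count_mismatched_pairs(long n_test, long n_test2)
{
  long a, b, rows = 0;

  for (a = 1; a <= n_test; ++a)
    for (b = 1; b <= n_test2; ++b)
      if (a != b)
        ++rows;

  return rows;
}
```

Calling count_mismatched_pairs(2600, 600) from a small main() and printing the result is enough to compare against what the backend actually sends.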
The psql process's size grows to 128MB during this select, and after
a moment or two:

Backend sent D message without prior T
Backend sent D message without prior T
...

Which isn't quite the same message as before, but is of the same type.
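
For what the message itself means: as far as I understand the frontend/backend protocol, a 'T' message describes the columns of a result and each 'D' (or 'B' for binary) message carries one row, so a row that arrives when the client holds no description gets reported this way. The sketch below only illustrates that ordering rule. It is not libpq's code, and first_row_without_description is a hypothetical helper that scans a string of message type bytes.

```c
#include <stddef.h>

/* Hypothetical helper, not libpq code: return the index of the first row
   message ('D' or 'B') seen before any 'T' row description, or -1 if the
   ordering is fine. A 'C' (command complete) is assumed to end the current
   result, so a later row needs a fresh 'T' first. */
int first_row_without_description(const char *types)
{
  int have_description = 0;
  size_t i;

  for (i = 0; types[i] != '\0'; ++i)
    {
      switch (types[i])
        {
          case 'T':
            have_description = 1;
            break;
          case 'C':
            have_description = 0;
            break;
          case 'D':
          case 'B':
            if (!have_description)
              return (int)i;
            break;
          default:
            break;
        }
    }

  return -1;
}
```

A well-ordered stream such as "TDDC" gives -1, while "TDCD" flags the row at index 3.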
59 processes:  2 running, 57 sleeping
CPU states:  2.3% user, 86.4% nice,  9.3% system,  0.0% interrupt,  1.9% idle
Memory: 74M Act, 37M Inact, 184K Wired, 364K Free, 95M Swap, 262M Swap free

  PID USERNAME PRI NICE   SIZE   RES STATE   TIME   WCPU    CPU COMMAND
 1547 prlw1     50    0   128M  516K run     1:28 59.28% 59.28% psql
 1552 postgres  50    0  1920K  632K run     1:37 24.32% 24.32% postgres

later, while the "Backend sent..." messages appear:

 1547 prlw1     -5    0   128M   68M sleep   1:41 23.00% 23.00% psql
 1552 postgres   2    0  1920K    4K sleep   1:41 141.00%  6.88% <postgres>

Note that there is still plenty of swap space. The 128MB figure seems to be
more than a coincidence (how to prove?)
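
One possible way to test the coincidence is to see whether 128MB matches a per-process resource limit rather than physical memory. That is only a guess, nothing here confirms it. The sketch below uses the POSIX getrlimit() call to format a soft limit such as RLIMIT_DATA; describe_limit is a hypothetical helper name.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: write the soft limit for `resource` into buf as
   "N MB" or "unlimited". Returns 0 on success, -1 if getrlimit() fails. */
int describe_limit(int resource, char *buf, size_t len)
{
  struct rlimit rl;

  if (getrlimit(resource, &rl) != 0)
    return -1;

  if (rl.rlim_cur == RLIM_INFINITY)
    snprintf(buf, len, "unlimited");
  else
    snprintf(buf, len, "%llu MB",
             (unsigned long long)rl.rlim_cur / (1024 * 1024));

  return 0;
}
```

Calling describe_limit(RLIMIT_DATA, buf, sizeof buf) in the environment that starts psql and printing buf would show whether the data segment limit happens to be 128 MB.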

So, is this only happening to me? How can lack of real memory affect timing
of interprocess communication?

Cheers,

Patrick

==========================================================================

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

#include "libpq-fe.h"

const char *progname;

/* Run a query, print its result status, and exit on a fatal status. */
PGresult *send_query(PGconn *db, const char *query)
{
  PGresult *res;

  res=PQexec(db,query);
  switch(PQresultStatus(res))
    {
      case PGRES_EMPTY_QUERY:
        printf("PGRES_EMPTY_QUERY: %s\n",query);
        break;
      case PGRES_COMMAND_OK:
        printf("PGRES_COMMAND_OK: %s\n",query);
        break;
      case PGRES_TUPLES_OK:
        printf("PGRES_TUPLES_OK: %s\n",query);
        break;
      case PGRES_COPY_OUT:
        printf("PGRES_COPY_OUT: %s\n",query);
        break;
      case PGRES_COPY_IN:
        printf("PGRES_COPY_IN: %s\n",query);
        break;
      case PGRES_BAD_RESPONSE:
        printf("PGRES_BAD_RESPONSE: %s\n",query);
        exit(1);
        break;
      case PGRES_NONFATAL_ERROR:
        printf("PGRES_NONFATAL_ERROR: %s\n",query);
        break;
      case PGRES_FATAL_ERROR:
        printf("PGRES_FATAL_ERROR: %s\n",query);
        exit(1);
        break;
      default:
        fprintf(stderr,"Error from %s: Unknown response from "
                "PQresultStatus()\n",progname);
        exit(1);
        break;
    }

  return res;
}

/* Return a random lower-case ASCII letter. */
char get_letter(void)
{
  int c;

  do c=(int)random()%128; while(!(isascii(c)&&isalpha(c)));
  return (char)tolower(c);
}

/* Return a random number in the range 0..99. */
unsigned int get_num(void)
{
  return random()%100;
}

int main(int argc, char* argv[])
{
  char id[8],query[2048]; /* id: 4 letters + 3 digits + NUL (7 was too small) */
  int i;
  PGconn *db;
  PGresult *res;

  progname=argv[0];
  srandom(42); /* same data each time hopefully */

  db=PQconnectdb("dbname=test");
  if(PQstatus(db)==CONNECTION_BAD)
    {
      fprintf(stderr,"Error from %s: Unable to connect to database \"test\".\n",
              progname);
      exit(1);
    }

  res=send_query(db,"create table test (txt text,var varchar(7),i integer)");
  PQclear(res);
  res=send_query(db,"create table test2 (txt text,var varchar(7),i integer)");
  PQclear(res);

  /* 2600 rows in test, with i running from 1 to 2600 */
  for(i=1;i<=2600;++i)
    {
      sprintf(id,"%c%c%c%c%03u",get_letter(),get_letter(),get_letter(),
              get_letter(),get_num());
      sprintf(query,"insert into test values ('%s','%s','%i')",id,id,i);
      res=send_query(db,query);
      PQclear(res);
    }

  /* 600 rows in test2, with i running from 1 to 600 */
  for(i=1;i<=600;++i)
    {
      sprintf(id,"%c%c%c%c%03u",get_letter(),get_letter(),get_letter(),
              get_letter(),get_num());
      sprintf(query,"insert into test2 values ('%s','%s','%i')",id,id,i);
      res=send_query(db,query);
      PQclear(res);
    }

  PQfinish(db);
  return 0;
}

