Re: Bgwriter strategies - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Bgwriter strategies
Msg-id 4694AA71.7040701@enterprisedb.com
In response to Re: Bgwriter strategies  (Greg Smith <gsmith@gregsmith.com>)
Responses Re: Bgwriter strategies  ("Pavan Deolasee" <pavan.deolasee@gmail.com>)
List pgsql-hackers
In the last couple of days, I've been running a lot of DBT-2 tests and
smaller microbenchmarks with different bgwriter settings and
experimental patches, but I have not been able to produce a repeatable
test case where any of the bgwriter configurations perform better than
not having bgwriter at all.

I encountered a strange phenomenon that I don't understand. I ran a
small test case with DELETEs in random order, using an index, on a
~300MB table, with shared_buffers smaller than that. I expected that to
be dominated by the speed at which postgres can swap pages in and out of
the shared buffer cache, but surprisingly the test starts to block on
write I/O, even though the table fits completely in the OS cache. I was
able to reproduce the phenomenon with a simple C program that writes 8k
blocks in random order to a fixed-size file. I've attached it along with
the output of running it on my test server. The output shows how the
writes start to block periodically after a while: after about 800000
writes, every second or third 40000-write interval takes 4.6 to 4.8
seconds instead of the usual ~240 ms. I was able to reproduce the
problem on my laptop as well. Can anyone explain what's going on?

Anyone out there have a repeatable test case where bgwriter helps?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
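#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <time.h>

int main(int argc, char **argv)
{
  int fd;
  off_t len;
  off_t size;
  off_t nblocks;
  char buf[8192];
  int i;
  struct timeval begin_t;

  if (argc != 3)
  {
    fprintf(stderr, "Usage: writetest <filename> <size in MB>\n");
    exit(1);
  }

  fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, S_IWUSR | S_IRUSR);
  if (fd == -1)
  {
    perror(argv[1]);
    exit(1);
  }
  size = (off_t) atoi(argv[2]) * 1024 * 1024;
  if (size <= 0)
  {
    fprintf(stderr, "size must be a positive number of MB\n");
    exit(1);
  }
  memset(buf, 'x', sizeof(buf));

  /* Fill the file sequentially to the requested size and flush it to disk. */
  for (len = 0; len < size; len += sizeof(buf))
  {
    if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
    {
      perror("write");
      exit(1);
    }
  }
  fsync(fd);
  nblocks = len / (off_t) sizeof(buf);

  /*
   * Overwrite randomly chosen 8k blocks of the file and print the elapsed
   * wall-clock time for every 40000 writes.
   */
  gettimeofday(&begin_t, NULL);
  for (i = 0; i < 10000000; i++)
  {
    lseek(fd, (random() % nblocks) * (off_t) sizeof(buf), SEEK_SET);
    if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
    {
      perror("write");
      exit(1);
    }
    if (i % 40000 == 0)
    {
      struct timeval t;
      long msecs;

      gettimeofday(&t, NULL);
      msecs = (t.tv_sec - begin_t.tv_sec) * 1000 + (t.tv_usec - begin_t.tv_usec) / 1000;
      printf("%d blocks written, time=%ld ms\n", i, msecs);
      begin_t = t;
    }
  }
  close(fd);
  return 0;
}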
./writetest /mnt/data/writetest-data 80
0 blocks written, time=0 ms
40000 blocks written, time=251 ms
80000 blocks written, time=241 ms
120000 blocks written, time=241 ms
160000 blocks written, time=241 ms
200000 blocks written, time=242 ms
240000 blocks written, time=242 ms
280000 blocks written, time=241 ms
320000 blocks written, time=241 ms
360000 blocks written, time=242 ms
400000 blocks written, time=241 ms
440000 blocks written, time=241 ms
480000 blocks written, time=241 ms
520000 blocks written, time=242 ms
560000 blocks written, time=241 ms
600000 blocks written, time=241 ms
640000 blocks written, time=242 ms
680000 blocks written, time=242 ms
720000 blocks written, time=242 ms
760000 blocks written, time=241 ms
800000 blocks written, time=242 ms
840000 blocks written, time=4579 ms
880000 blocks written, time=244 ms
920000 blocks written, time=242 ms
960000 blocks written, time=4752 ms
1000000 blocks written, time=241 ms
1040000 blocks written, time=4618 ms
1080000 blocks written, time=242 ms
1120000 blocks written, time=4614 ms
1160000 blocks written, time=246 ms
1200000 blocks written, time=243 ms
1240000 blocks written, time=4619 ms
1280000 blocks written, time=242 ms
1320000 blocks written, time=242 ms
1360000 blocks written, time=4605 ms
1400000 blocks written, time=242 ms

