Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: checkpointer continuous flushing
Date
Msg-id CAA4eK1KV_ts-CBbTtSeDnc5OPXgXM9C0AyLuXGZ+eRyw=LTevA@mail.gmail.com
In response to Re: checkpointer continuous flushing  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: checkpointer continuous flushing  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
On Mon, Jun 22, 2015 at 1:41 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>
> <sorry, resent stalled post, wrong from>
>
>> It'd be interesting to see numbers for tiny, without the overly small
>> checkpoint timeout value. 30s is below the OS's writeback time.
>
>
> Here are some tests with longer timeout:
>
> tiny2: scale=10 shared_buffers=1GB checkpoint_timeout=5min
>          max_wal_size=1GB warmup=600 time=4000
>
>   flsh |      full speed tps      | percent of late tx, 4 clients, for tps:
>   /srt |  1 client  |  4 clients  |  100 |  200 |  400 |  800 | 1200 | 1600
>   N/N  | 930 +- 124 | 2560 +- 394 | 0.70 | 1.03 | 1.27 | 1.56 | 2.02 | 2.38
>   N/Y  | 924 +- 122 | 2612 +- 326 | 0.63 | 0.79 | 0.94 | 1.15 | 1.45 | 1.67
>   Y/N  | 907 +- 112 | 2590 +- 315 | 0.58 | 0.83 | 0.68 | 0.71 | 0.81 | 1.26
>   Y/Y  | 915 +- 114 | 2590 +- 317 | 0.60 | 0.68 | 0.70 | 0.78 | 0.88 | 1.13
>
> There seems to be a small 1-2% performance benefit with 4 clients, which is reversed for 1 client. There are significantly and consistently fewer late
> transactions when the options are activated, and the performance is more stable (standard deviation reduced by 10-18%).
>
> The db is about 200 MB (~25000 pages); at 2500+ tps it is written 40 times over in 5 minutes, so the checkpoint basically writes everything in 220 seconds, i.e. about 0.9 MB/s. Given the preload phase, the buffers may be more or less in order in memory, so they may be written out in order anyway.
>
>
> medium2: scale=300 shared_buffers=5GB checkpoint_timeout=30min
>           max_wal_size=4GB warmup=1200 time=7500
>
>   flsh |      full speed tps       | percent of late tx, 4 clients
>   /srt |  1 client   |  4 clients  |   100 |   200 |   400 |
>    N/N | 173 +- 289* | 198 +- 531* | 27.61 | 43.92 | 61.16 |
>    N/Y | 458 +- 327* | 743 +- 920* |  7.05 | 14.24 | 24.07 |
>    Y/N | 169 +- 166* | 187 +- 302* |  4.01 | 39.84 | 65.70 |
>    Y/Y | 546 +- 143  | 681 +- 459  |  1.55 |  3.51 |  2.84 |
>
> The effect of sorting is very positive (+150% to +270% tps). On this run, flushing has a positive (+20% with 1 client) or negative (-8% with 4 clients) effect on throughput, and late transactions are reduced by 92-95% when both options are activated.
>

Why is there a dip in performance with multiple clients?  Could it be
because we have started doing more work while holding the buffer header
lock in the code below?

BufferSync()
{
..
for (buf_id = 0; buf_id < NBuffers; buf_id++)
{
  volatile BufferDesc *bufHdr = GetBufferDescriptor(buf_id);
@@ -1621,32 +1719,185 @@ BufferSync(int flags)

  if ((bufHdr->flags & mask) == mask)
  {
+   Oid spc;
+   TableSpaceCountEntry *entry;
+   bool found;
+
    bufHdr->flags |= BM_CHECKPOINT_NEEDED;
+   CheckpointBufferIds[num_to_write] = buf_id;
    num_to_write++;
+
+   /* keep track of per tablespace buffers */
+   spc = bufHdr->tag.rnode.spcNode;
+   entry = (TableSpaceCountEntry *)
+     hash_search(spcBuffers, (void *) &spc, HASH_ENTER, &found);
+
+   if (found)
+     entry->count++;
+   else
+     entry->count = 1;
  }
..
}
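
For what it's worth, here is a rough sketch of what I mean by doing less
under the lock (this is not the actual patch, just the excerpt above
rearranged; TableSpaceCountEntry and spcBuffers are taken from the patch
excerpt): remember the tablespace OID while the buffer header spinlock is
held and defer the hash_search() until after the lock is released, so that
only the flag update and the counter increment happen under the lock.

for (buf_id = 0; buf_id < NBuffers; buf_id++)
{
  volatile BufferDesc *bufHdr = GetBufferDescriptor(buf_id);
  Oid  spc = InvalidOid;
  bool marked = false;

  LockBufHdr(bufHdr);
  if ((bufHdr->flags & mask) == mask)
  {
    bufHdr->flags |= BM_CHECKPOINT_NEEDED;
    CheckpointBufferIds[num_to_write] = buf_id;
    num_to_write++;
    spc = bufHdr->tag.rnode.spcNode; /* just remember the key */
    marked = true;
  }
  UnlockBufHdr(bufHdr);

  /* hash_search() can allocate, so keep it outside the spinlock */
  if (marked)
  {
    bool found;
    TableSpaceCountEntry *entry = (TableSpaceCountEntry *)
      hash_search(spcBuffers, (void *) &spc, HASH_ENTER, &found);

    entry->count = found ? entry->count + 1 : 1;
  }
}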


-
BufferSync()
{
..
- buf_id = StrategySyncStart(NULL, NULL);
- num_to_scan = NBuffers;
+ active_spaces = nb_spaces;
+ space = 0;
  num_written = 0;
- while (num_to_scan-- > 0)
+
+ while (active_spaces != 0)
..
}

The changed code doesn't seem to give any consideration to the
clock-sweep point, which might not be helpful for cases where the
checkpoint could have flushed soon-to-be-recycled buffers first.  I think
flushing the buffers sorted per tablespace is a good idea, but giving no
preference to the clock-sweep point means, it seems to me, that we would
lose in some cases with this new change.
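
Just to illustrate the kind of preference I am thinking of (only a sketch,
not a concrete proposal; CheckpointSortCmp and sweep_window are made-up
names, and the write throttling done by the real BufferSync() is omitted):
take the sweep hand from StrategySyncStart() and flush, in a first sorted
pass, the buffers the clock sweep will reach soonest, then the rest in a
second sorted pass.

int  sweep_start = StrategySyncStart(NULL, NULL);
int  pass;
int  i;

/* sort by relation/fork/block as the patch does (comparator made up here) */
qsort(CheckpointBufferIds, num_to_write, sizeof(int), CheckpointSortCmp);

for (pass = 0; pass < 2; pass++)
{
  for (i = 0; i < num_to_write; i++)
  {
    int  buf_id = CheckpointBufferIds[i];
    int  dist = (buf_id - sweep_start + NBuffers) % NBuffers;
    bool urgent = (dist < sweep_window); /* close ahead of the hand */

    /* first pass: soon-to-be-recycled buffers; second pass: the others */
    if (urgent == (pass == 0))
      SyncOneBuffer(buf_id, false);
  }
}

The distance on buf_id ignores usage counts, so it is only an approximation
of what the sweep will actually recycle; the point is just to show where
such a preference could be plugged in while keeping the sorted writes.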



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
