Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: checkpointer continuous flushing
Date
Msg-id CAA4eK1KV_ts-CBbTtSeDnc5OPXgXM9C0AyLuXGZ+eRyw=LTevA@mail.gmail.com
In response to Re: checkpointer continuous flushing  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: checkpointer continuous flushing  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
On Mon, Jun 22, 2015 at 1:41 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>
> <sorry, resent stalled post, wrong from>
>
>> It'd be interesting to see numbers for tiny, without the overly small
>> checkpoint timeout value. 30s is below the OS's writeback time.
>
>
> Here are some tests with longer timeout:
>
> tiny2: scale=10 shared_buffers=1GB checkpoint_timeout=5min
>          max_wal_size=1GB warmup=600 time=4000
>
>   flsh |      full speed tps      | percent of late tx, 4 clients, for tps:
>   /srt |  1 client  |  4 clients  |  100 |  200 |  400 |  800 | 1200 | 1600
>   N/N  | 930 +- 124 | 2560 +- 394 | 0.70 | 1.03 | 1.27 | 1.56 | 2.02 | 2.38
>   N/Y  | 924 +- 122 | 2612 +- 326 | 0.63 | 0.79 | 0.94 | 1.15 | 1.45 | 1.67
>   Y/N  | 907 +- 112 | 2590 +- 315 | 0.58 | 0.83 | 0.68 | 0.71 | 0.81 | 1.26
>   Y/Y  | 915 +- 114 | 2590 +- 317 | 0.60 | 0.68 | 0.70 | 0.78 | 0.88 | 1.13
>
> There seems to be a small 1-2% performance benefit with 4 clients, which is reversed for 1 client. There are significantly and consistently fewer late
> transactions when the options are activated, and the performance is more stable (standard deviation reduced by 10-18%).
>
> The db is about 200 MB (~25000 pages); at 2500+ tps it is written 40 times over in 5 minutes, so the checkpoint basically writes everything in 220 seconds, i.e. about 0.9 MB/s. Given the preload phase, the buffers may be more or less in order in memory, so they may be written out in order anyway.
>
>
> medium2: scale=300 shared_buffers=5GB checkpoint_timeout=30min
>           max_wal_size=4GB warmup=1200 time=7500
>
>   flsh |      full speed tps       | percent of late tx, 4 clients
>   /srt |  1 client   |  4 clients  |   100 |   200 |   400 |
>    N/N | 173 +- 289* | 198 +- 531* | 27.61 | 43.92 | 61.16 |
>    N/Y | 458 +- 327* | 743 +- 920* |  7.05 | 14.24 | 24.07 |
>    Y/N | 169 +- 166* | 187 +- 302* |  4.01 | 39.84 | 65.70 |
>    Y/Y | 546 +- 143  | 681 +- 459  |  1.55 |  3.51 |  2.84 |
>
> The effect of sorting is very positive (+150% to +270% tps). On this run, flushing has a positive (+20% with 1 client) or negative (-8% with 4 clients) effect on throughput, and late transactions are reduced by 92-95% when both options are activated.
>

Why is there a dip in performance with multiple clients?  Could it be
because we have started doing more work while holding the buffer header
lock in the code below?

BufferSync()
{
..
for (buf_id = 0; buf_id < NBuffers; buf_id++)
{
  volatile BufferDesc *bufHdr = GetBufferDescriptor(buf_id);
@@ -1621,32 +1719,185 @@ BufferSync(int flags)

  if ((bufHdr->flags & mask) == mask)
  {
+   Oid spc;
+   TableSpaceCountEntry *entry;
+   bool found;
+
    bufHdr->flags |= BM_CHECKPOINT_NEEDED;
+   CheckpointBufferIds[num_to_write] = buf_id;
    num_to_write++;
+
+   /* keep track of per tablespace buffers */
+   spc = bufHdr->tag.rnode.spcNode;
+   entry = (TableSpaceCountEntry *)
+     hash_search(spcBuffers, (void *) &spc, HASH_ENTER, &found);
+
+   if (found)
+     entry->count++;
+   else
+     entry->count = 1;
  }
..
}
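
For what it's worth, here is a rough sketch of what I mean by doing less
under the lock (this is not the actual patch, just the excerpt above
rearranged; TableSpaceCountEntry and spcBuffers are taken from the patch
excerpt): remember the tablespace OID while the buffer header spinlock is
held and defer the hash_search() until after the lock is released, so that
only the flag update and the counter increment happen under the lock.

for (buf_id = 0; buf_id < NBuffers; buf_id++)
{
  volatile BufferDesc *bufHdr = GetBufferDescriptor(buf_id);
  Oid  spc = InvalidOid;
  bool marked = false;

  LockBufHdr(bufHdr);
  if ((bufHdr->flags & mask) == mask)
  {
    bufHdr->flags |= BM_CHECKPOINT_NEEDED;
    CheckpointBufferIds[num_to_write] = buf_id;
    num_to_write++;
    spc = bufHdr->tag.rnode.spcNode; /* just remember the key */
    marked = true;
  }
  UnlockBufHdr(bufHdr);

  /* hash_search() can allocate, so keep it outside the spinlock */
  if (marked)
  {
    bool found;
    TableSpaceCountEntry *entry = (TableSpaceCountEntry *)
      hash_search(spcBuffers, (void *) &spc, HASH_ENTER, &found);

    entry->count = found ? entry->count + 1 : 1;
  }
}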


-
BufferSync()
{
..
- buf_id = StrategySyncStart(NULL, NULL);
- num_to_scan = NBuffers;
+ active_spaces = nb_spaces;
+ space = 0;
  num_written = 0;
- while (num_to_scan-- > 0)
+
+ while (active_spaces != 0)
..
}

The changed code doesn't seem to give any consideration to the
clock-sweep point, which might not be helpful for cases where the
checkpoint could have flushed soon-to-be-recycled buffers first.  I think
flushing the buffers sorted per tablespace is a good idea, but giving no
preference to the clock-sweep point means, it seems to me, that we would
lose in some cases with this new change.
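
Just to illustrate the kind of preference I am thinking of (only a sketch,
not a concrete proposal; CheckpointSortCmp and sweep_window are made-up
names, and the write throttling done by the real BufferSync() is omitted):
take the sweep hand from StrategySyncStart() and flush, in a first sorted
pass, the buffers the clock sweep will reach soonest, then the rest in a
second sorted pass.

int  sweep_start = StrategySyncStart(NULL, NULL);
int  pass;
int  i;

/* sort by relation/fork/block as the patch does (comparator made up here) */
qsort(CheckpointBufferIds, num_to_write, sizeof(int), CheckpointSortCmp);

for (pass = 0; pass < 2; pass++)
{
  for (i = 0; i < num_to_write; i++)
  {
    int  buf_id = CheckpointBufferIds[i];
    int  dist = (buf_id - sweep_start + NBuffers) % NBuffers;
    bool urgent = (dist < sweep_window); /* close ahead of the hand */

    /* first pass: soon-to-be-recycled buffers; second pass: the others */
    if (urgent == (pass == 0))
      SyncOneBuffer(buf_id, false);
  }
}

The distance on buf_id ignores usage counts, so it is only an approximation
of what the sweep will actually recycle; the point is just to show where
such a preference could be plugged in while keeping the sorted writes.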



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
