Re: Remaining 'needs review' patchs in July commitfest - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: Remaining 'needs review' patchs in July commitfest
Date
Msg-id alpine.DEB.2.10.1507291537320.27976@sto
Whole thread Raw
In response to Remaining 'needs review' patchs in July commitfest  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
Hello Heikki,

About two patches I submitted:


>> pgbench - allow backslash-continuations in custom scripts
>
> Everyone wants the feature, using multi-line SELECTs in pgbench scripts, 
> but we don't seem to be reaching a consensus on how it should work. I 
> think we'll need to integrate the lexer, but it would be nice to still 
> support multi-statements as well, with some syntax.

I was willing to do a small job of that one, but it drifted to solving the 
meta problem of pgbench lexing for doing it "the right way".  Kyotaro 
HORIGUCHI has taken up the challenge and submitted a 3-part patch to 
modify psql lexer, then reuse it "as is" in pgbench.  I'm impressed:-)

I think that the approach is a little overkill for the simple targetted 
feature, and I'm not sure that it allows for multi-statements, but I have 
not checked.  Anyway I may try to review it, although not in the short 
term.  I'm not too optimistic because if passed, it means that the psql 
lexer would be used in 2 contexts, so changes in any of them would require 
ensuring that it does not break anything in the other one, and I'm not 
sure that it is such constraint is desirable.  Duplicating the lexer is 
not a realistic option either, I do not think that pgbench is worth it.

Anyway, in the mean time, the patch may be switched to "returned with 
feedback" or "rejected".



>> checkpoint continuous flushing
>
> This does a big memory allocation at checkpoint, which Tom vehemently 
> objects to. I don't much like it either, although I would be OK with a 
> more moderately-sized allocation.

AFAICS Tom has not expressed any view on the patch in its current form 
(there is no message from Tom on the thread).


ISTM that Tom complained about a year ago about OOM risks on a 2007 
version of the sorting part by Takahiro ITAGAKI which was dynamically 
allocating and freeing about 24 bytes per buffers on each checkpoint.

The current v5 version uses 4 bytes per buffer at the first run and reuse 
the memory, it is allocated once and thus there is no risk of returning it 
and failing to get it back, so no significant OOM risk. Maybe the 
allocation should be moved earlier when starting the checkpointer process, 
though.

A second objection a year ago from Tom was about proof of performance 
gains. I've spent quite some time to collect a lot of data to measure 
benefits under different loads, representing X00 hours of pgbench runs, as 
can be seen in the thread. ISTM that the performance (tps & latency) 
figures, for instance:

http://www.postgresql.org/message-id/raw/alpine.DEB.2.10.1506170803210.9794@sto

are overwhelmingly in favor of the patch.


If the memory requirement of 4 bytes per buffer is still too much, it is 
eventually possible to reduce it with a guc to specify the allowed amount 
of memory and some shuffling in the code to do things by chunk.

My opinion is that 4 bytes per buffer is reasonable enough given the 
measured benefits. Also, there is more benefits if the whole checkpoint is 
sorted instead of just part of it.

I'm really willing to improve the write stall issues which freezes 
postgresql when checkpointing on HDD, so if it has to be chunked because 
this is a blocker I'll make the effort, but I do not think that it is 
useful as such.


> It's not clear on what criteria this should be accepted or rejected.

Given the overall performance gains and reduction in latency, where a lot 
of write stalls are avoided or at least greatly reduced, I would be sad 
for pg to get it rejected. That does not mean that it cannot be improved.

> What workloads need to be tested?

If you tell me, and provide the matching dedicated host for testing, I can 
run tests...

-- 
Fabien.



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: dblink: add polymorphic functions.
Next
From: Robert Haas
Date:
Subject: Re: security labels on databases are bad for dump & restore