Re: Remaining 'needs review' patchs in July commitfest - Mailing list pgsql-hackers
From | Fabien COELHO |
---|---|
Subject | Re: Remaining 'needs review' patchs in July commitfest |
Date | |
Msg-id | alpine.DEB.2.10.1507291537320.27976@sto Whole thread Raw |
In response to | Remaining 'needs review' patchs in July commitfest (Heikki Linnakangas <hlinnaka@iki.fi>) |
List | pgsql-hackers |
Hello Heikki, About two patches I submitted: >> pgbench - allow backslash-continuations in custom scripts > > Everyone wants the feature, using multi-line SELECTs in pgbench scripts, > but we don't seem to be reaching a consensus on how it should work. I > think we'll need to integrate the lexer, but it would be nice to still > support multi-statements as well, with some syntax. I was willing to do a small job of that one, but it drifted to solving the meta problem of pgbench lexing for doing it "the right way". Kyotaro HORIGUCHI has taken up the challenge and submitted a 3-part patch to modify psql lexer, then reuse it "as is" in pgbench. I'm impressed:-) I think that the approach is a little overkill for the simple targetted feature, and I'm not sure that it allows for multi-statements, but I have not checked. Anyway I may try to review it, although not in the short term. I'm not too optimistic because if passed, it means that the psql lexer would be used in 2 contexts, so changes in any of them would require ensuring that it does not break anything in the other one, and I'm not sure that it is such constraint is desirable. Duplicating the lexer is not a realistic option either, I do not think that pgbench is worth it. Anyway, in the mean time, the patch may be switched to "returned with feedback" or "rejected". >> checkpoint continuous flushing > > This does a big memory allocation at checkpoint, which Tom vehemently > objects to. I don't much like it either, although I would be OK with a > more moderately-sized allocation. AFAICS Tom has not expressed any view on the patch in its current form (there is no message from Tom on the thread). ISTM that Tom complained about a year ago about OOM risks on a 2007 version of the sorting part by Takahiro ITAGAKI which was dynamically allocating and freeing about 24 bytes per buffers on each checkpoint. The current v5 version uses 4 bytes per buffer at the first run and reuse the memory, it is allocated once and thus there is no risk of returning it and failing to get it back, so no significant OOM risk. Maybe the allocation should be moved earlier when starting the checkpointer process, though. A second objection a year ago from Tom was about proof of performance gains. I've spent quite some time to collect a lot of data to measure benefits under different loads, representing X00 hours of pgbench runs, as can be seen in the thread. ISTM that the performance (tps & latency) figures, for instance: http://www.postgresql.org/message-id/raw/alpine.DEB.2.10.1506170803210.9794@sto are overwhelmingly in favor of the patch. If the memory requirement of 4 bytes per buffer is still too much, it is eventually possible to reduce it with a guc to specify the allowed amount of memory and some shuffling in the code to do things by chunk. My opinion is that 4 bytes per buffer is reasonable enough given the measured benefits. Also, there is more benefits if the whole checkpoint is sorted instead of just part of it. I'm really willing to improve the write stall issues which freezes postgresql when checkpointing on HDD, so if it has to be chunked because this is a blocker I'll make the effort, but I do not think that it is useful as such. > It's not clear on what criteria this should be accepted or rejected. Given the overall performance gains and reduction in latency, where a lot of write stalls are avoided or at least greatly reduced, I would be sad for pg to get it rejected. That does not mean that it cannot be improved. > What workloads need to be tested? If you tell me, and provide the matching dedicated host for testing, I can run tests... -- Fabien.
pgsql-hackers by date: