Re: patch review : Add ability to constrain backend temporary file space - Mailing list pgsql-hackers
From | Mark Kirkwood |
---|---|
Subject | Re: patch review : Add ability to constrain backend temporary file space |
Date | |
Msg-id | 4DE58782.2010001@catalyst.net.nz Whole thread Raw |
In response to | patch review : Add ability to constrain backend temporary file space (Cédric Villemain <cedric.villemain.debian@gmail.com>) |
Responses |
Re: patch review : Add ability to constrain backend temporary file
space
|
List | pgsql-hackers |
On 01/06/11 09:24, Cédric Villemain wrote: > Hello > > here is a partial review of your patch, better than keeping it > sleeping in the commitfest queue I hope. > > Submission review > ================ > > * The patch is not in context diff format. > * The patch apply, but contains some extra whitespace. > * Documentation is here but not explicit about 'temp tables', > maybe worth adding that this won't limit temporary table size ? > * There is no test provided. One can be expected to check that the > feature work. > > Code review > ========= > > * in fd.c, I think that "temporary_files_size -= > (double)vfdP->fileSize;" should be done later in the function once we > have successfully unlink the file, not before. > > * I am not sure it is better to add a fileSize like you did or use > relationgetnumberofblock() when file is about to be truncated or > unlinked, this way the seekPos should be enough to increase the global > counter. > > * temporary_files_size, I think it is better to have a number of pages > à la postgresql than a kilobyte size > > * max_temp_files_size, I'll prefer an approach like shared_buffers > GUC: you can use pages, or KB, MB, ... > > > Simple Feature test > ============== > > either explain buffers is wrong or the patch is wrong: > cedric=# explain (analyze,buffers) select * from foo order by 1 desc ; > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------- > Sort (cost=10260.02..10495.82 rows=94320 width=4) (actual > time=364.373..518.940 rows=100000 loops=1) > Sort Key: generate_series > Sort Method: external merge Disk: 1352kB > Buffers: local hit=393, temp read=249 written=249 > -> Seq Scan on foo (cost=0.00..1336.20 rows=94320 width=4) > (actual time=0.025..138.754 rows=100000 loops=1) > Buffers: local hit=393 > Total runtime: 642.874 ms > (7 rows) > > cedric=# set max_temp_files_size to 1900; > SET > cedric=# explain (analyze,buffers) select * from foo order by 1 desc ; > ERROR: aborting due to exceeding max temp files size > STATEMENT: explain (analyze,buffers) select * from foo order by 1 desc ; > ERROR: aborting due to exceeding max temp files size > > Do you have some testing method I can apply to track that without > explain (analyze, buffers) before going to low-level monitoring ? > > Architecture review > ============== > > max_temp_files_size is used for the global space used per backend. > Based on how work_mem work I expect something like "work_disk" to > limit per file and maybe a backend_work_disk (and yes maybe a > backend_work_mem ?!) per backend. > So I propose to rename the current GUC to something like backend_work_disk. > > Patch is not large and easy to read. > I like the idea and it sounds useful. > Hi Cédric, Thanks for reviewing! The diff format is odd - I'm sure I told git to do a context diff, however running 'file' on the v2 patch results it it saying 'HTML document'... hmm, I'll check more carefully for the next one I do. Yeah, I agree about explicitly mentioning how it does not constraint temp tables. Hmm, test - good idea - I'll look into it. Re the code comments - I agree with most of them. However with respect to the Guc units, I copied the setup for work_mem as that seemed the most related. Also I used bytes for the internal variable in fd.c to limit the number of arithmetic operations while comparing. For testing I set 'log_temp_files = 0' in postgresql.conf and the numbers seemed to agree exactly - I didn't think to use EXPLAIN + buffers... not sure why they would disagree. Have a go with log_temp_files set and see what you find! I like yhour suggesting for the name. Given that 'work_mem' is per backend, I'm leaning towards the idea of 'work_disk' being sufficiently descriptive. Thanks again, and I'll come up with a v3 patch. Mark
pgsql-hackers by date: