Re: [HACKERS] [WIP] Zipfian distribution in pgbench - Mailing list pgsql-hackers
From | Fabien COELHO |
---|---|
Subject | Re: [HACKERS] [WIP] Zipfian distribution in pgbench |
Date | |
Msg-id | alpine.DEB.2.20.1708240433560.24526@lancre Whole thread Raw |
In response to | [HACKERS] [WIP] Zipfian distribution in pgbench (Alik Khilazhev <a.khilazhev@postgrespro.ru>) |
Responses |
Re: [HACKERS] [WIP] Zipfian distribution in pgbench
|
List | pgsql-hackers |
Hello Alik, > I am attaching patch v7. Patch generates multiple warnings with "git apply", apparently because of end-of-line spaces, and fails: pgbench-zipf-07v.patch:52: trailing whitespace. { pgbench-zipf-07v.patch:53: trailing whitespace. "random_zipfian", 3, PGBENCH_RANDOM_ZIPFIAN pgbench-zipf-07v.patch:54: trailing whitespace. }, pgbench-zipf-07v.patch:66:trailing whitespace. #define ZIPF_CACHE_SIZE 15 /* cache cells number */ pgbench-zipf-07v.patch:67:trailing whitespace. error: patch failed: doc/src/sgml/ref/pgbench.sgml:1013 error: doc/src/sgml/ref/pgbench.sgml: patch does not apply error:patch failed: src/bin/pgbench/exprparse.y:191 error: src/bin/pgbench/exprparse.y: patch does not apply error: patchfailed: src/bin/pgbench/pgbench.c:93 error: src/bin/pgbench/pgbench.c: patch does not apply error: patch failed: src/bin/pgbench/pgbench.h:75 error: src/bin/pgbench/pgbench.h: patch does not apply It also complains with "patch", although it seems to work: (Stripping trailing CRs from patch; use --binary to disable.) patching file doc/src/sgml/ref/pgbench.sgml (Stripping trailingCRs from patch; use --binary to disable.) patching file src/bin/pgbench/exprparse.y (Stripping trailing CRs frompatch; use --binary to disable.) patching file src/bin/pgbench/pgbench.c (Stripping trailing CRs from patch; use --binaryto disable.) patching file src/bin/pgbench/pgbench.h patch unexpectedly ends in middle of line patch unexpectedlyends in middle of line Please do not put spaces at the end of lines, eg there is a lonely tab on the second line of "computeHarmonicZipfian". It seems that "CR" characters are used: sh> sha1sum ~/pgbench-zipf-07v.patch 439ad1de0a960b3b415ff6de9412b3cc4d6922e7 sh> file ~/pgbench-zipf-07v.patch pgbench-zipf-07v.patch: unified diff output, ASCII text, with CRLF line terminators Compilation issues one warning: pgbench.c: In function ‘computeHarmonicZipfian’: pgbench.c:894:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] double uniform = pg_erand48(thread->random_state); Please conform to pg standard for portability, even if it is a very old one:-) About the documentation: I would suggest to replace 0.5 in the first example by 1.5, as a zipfian distribution with a lower-than-one parameter is not that well defined. I'm fine with using references to papers or books for the algorithm. The documentation lines in the sgml file are too long. At least limit text paragraphs to maybe 80 columns as done elsewhere. typo: "<entry> Zipfian" (remove space) typo: "used(described" (missing space) typo: "numbers(algorithm" (again) The sentence "It's recommended to pick <replaceable>parameter</> in range (0, 1) for better accuracy." does not make sense with the updated generation algorithm. The text would benefit from a review by a native English speaker, who I am not. Anyway here is an attempt at improving/simplifying the text (I have removed sgml formatting). If some native English speaker could review it, that would be great. """ "random_zipfian" generates an approximated bounded zipfian distribution. For parameter in (0,1), an approximated algorithmis taken from "Quickly Generating Billion-Record Synthetic Databases", Jim Gray et al, SIGMOD 1994. For parameterin (1, 1000), a rejection method is used, based on "Non-Uniform Random Variate Generation", Luc Devroye, p. 550-551,Springer 1986. The distribution is not defined when the parameter's value is 1.0. The drawing performance is poorfor parameter values close and above 1.0 and on a small range. "parameter" defines how skewed the distribution is. The larger the "parameter", the more frequently values close to thebeginning of the interval are drawn. The closer to 0 "parameter" is, the flatter (more uniform) the access distribution. """ English in comments and code: "inited" is not English, I think you want "initialized". Maybe "nb_cells" would do. "it is not exist" (multiple instances) -> "it does not exist". I think that the references to the paper/book should appear as comments in the code, for instance in front of the functions which implement the algorithm. "current_time", "last_touch_time" are counter based, they are not "time". I suggest to update the name to "current" and "last_touch" or "last_used". "time when cell was used last time" -> "last used logical time" About the code: I would prefer that you use "1.0" instead of "1." for double constants. I think that getZipfianRand should be symmetric. Maybe a direct formula could do: return min - 1 + (s > 1) ? xxx() : yyy(); or at least use an explicit "else" for symmetry. Function "zipfn" asks for s and n as arguments, but ISTM that they are already include as fields in cell, so this seems redundant. Moreover I do not think that the zipfn function is that useful. I would suggest to inline it in its caller. "zipfGeneralizedHarmonic" is not really related to zipf, it is just used there. I suggest to call it "generalizedHarmonicNumber" to reflect what it does. I suggest to put all LRU management in the getCell* function, that is to move the last_use update from computeHarmonicZipfian to getCell. computeHarminicZipfian and computeIterativeZipfian should have the same signature, not one with a random state and the other with the thread. I suggest to do the same as other random functions, whatever it is. Otherwise, no action required: The patch probably conflicst with other patches in the CF, which means that there will be rebase needed. That is life. The queue is long and slow. Also, this function deserve some tap tests, that should be added if the "pgbench tap test" patch get through. Otherwise it is as well tested as everything else around, which is basically no test. -- Fabien.
pgsql-hackers by date: