Thread: Improved scanner performance
I've been poking at the scanner a bit using the large literal test case from the other day http://archives.postgresql.org/pgsql-hackers/2002-04/msg00811.php I've been able to reduce the wall-clock run time of that test from 3:37 min to 2:13 min and the base_yylex() per-call time from 137ms to 24ms. I've used 'flex -8 -CFa' and restructured the code to avoid looping over and copying the input string half a dozen times. For instance, instead of scanstr(), the escape sequences are resolved as the input is scanned, and instead of the myinput() routine I use the function yy_scan_buffer() provided by flex for scanning in-memory strings. (This would make the code flex-dependent, but in reality it already is anyway.) The "before" profile was: % cumulative self self totaltime seconds seconds calls ms/call ms/call name23.51 6.65 6.65 110 60.45 137.27 base_yylex23.19 13.21 6.56 11 596.36 1089.69 pq_getstring19.16 18.63 5.42 74882482 0.00 0.00 pq_getbyte14.99 22.87 4.24 11 385.45 385.46 scanstr 9.61 25.59 2.72 23 118.26 118.26 yy_get_previous_state 3.78 26.66 1.07 34 31.47 31.47 myinput 3.64 27.69 1.03 22 46.82 46.82 textin 1.48 28.11 0.42 34 12.35 43.82 yy_get_next_buffer The "after" profile is: % cumulative self self totaltime seconds seconds calls ms/call ms/call name40.30 5.65 5.65 11 513.64 943.64 pq_getstring33.74 10.38 4.73 74882482 0.00 0.00 pq_getbyte18.90 13.03 2.65 110 24.09 24.09 base_yylex 6.85 13.99 0.96 22 43.64 43.64 textin 0.07 14.00 0.01 86 0.12 0.12 heap_fetch -- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes: > I've used 'flex -8 -CFa' and restructured the code to avoid looping over > and copying the input string half a dozen times. For instance, instead of > scanstr(), the escape sequences are resolved as the input is scanned, and > instead of the myinput() routine I use the function yy_scan_buffer() > provided by flex for scanning in-memory strings. (This would make the > code flex-dependent, but in reality it already is anyway.) Yes, we've been requiring flex-only features for years, so that aspect of it doesn't bother me. Any downsides to the changes? (For instance, I had the idea that -CF would enlarge the lexer tables quite a bit --- what's the change in executable size?) > The "after" profile is: > % cumulative self self total > time seconds seconds calls ms/call ms/call name > 40.30 5.65 5.65 11 513.64 943.64 pq_getstring > 33.74 10.38 4.73 74882482 0.00 0.00 pq_getbyte > 18.90 13.03 2.65 110 24.09 24.09 base_yylex > 6.85 13.99 0.96 22 43.64 43.64 textin > 0.07 14.00 0.01 86 0.12 0.12 heap_fetch Looks like inlining pq_getbyte into pq_getstring would be worth doing too. regards, tom lane
Tom Lane writes: > I had the idea that -CF would enlarge the lexer tables quite a bit --- > what's the change in executable size?) +150 kB I've also looked at -CFe, which is supposedly the next slowest level, but it doesn't do nearly as well. -- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes: > Tom Lane writes: >> I had the idea that -CF would enlarge the lexer tables quite a bit --- >> what's the change in executable size?) > +150 kB > I've also looked at -CFe, which is supposedly the next slowest level, but > it doesn't do nearly as well. Ouch; that sounds like about a ten percent increase in the size of the backend executable. That's enough to reach my threshold of pain; is the long-literal issue worth that much? How much of your reported improvement is due to -CFa, and how much to the coding improvements you made? regards, tom lane
Tom Lane writes: > Peter Eisentraut <peter_e@gmx.net> writes: > > Tom Lane writes: > >> I had the idea that -CF would enlarge the lexer tables quite a bit --- > >> what's the change in executable size?) > > > +150 kB > > > I've also looked at -CFe, which is supposedly the next slowest level, but > > it doesn't do nearly as well. > > Ouch; that sounds like about a ten percent increase in the size of > the backend executable. That's enough to reach my threshold of pain; > is the long-literal issue worth that much? Here's a breakdown of the postmaster file sizes and the wall-clock run time of the long-literal test: no options 1749912 1m58.688s -CFe 1754315 1m49.223s -CF 1817621 1m43.780s -CFa 1890197 1m45.600s (These numbers are different than yesterday's because they don't have profiling and debugging overhead.) Seeing this, I think -CF should be OK space and time-wise. > How much of your reported improvement is due to -CFa, and how much to > the coding improvements you made? As I recall it, probably a third of the overall improvement came from using -CF[a]. -- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes: > Here's a breakdown of the postmaster file sizes and the wall-clock run > time of the long-literal test: > no options 1749912 1m58.688s > -CFe 1754315 1m49.223s > -CF 1817621 1m43.780s > -CFa 1890197 1m45.600s > Seeing this, I think -CF should be OK space and time-wise. Looks like a reasonable compromise to me too. regards, tom lane
BTW, here is what I get on an HP box for the same test you described (a dozen trivial SELECTs using string literals between 5MB and 10MB long), using latest sources: % cumulative self self total time seconds seconds calls ms/call ms/call name 47.51 8.19 8.19 chunks26.16 12.70 4.51 129 34.96 35.97 base_yylex12.30 14.82 2.12 1521 1.39 1.39 strlen 6.79 15.99 1.17 13 90.00 90.00 pq_getstring 4.18 16.71 0.72 chunk2 2.55 17.15 0.44 _recv_sys 0.29 17.20 0.05 _mcount "chunks" is the inner loop of memcpy() --- evidently all the time is being spent just copying those big literals around. We could probably avoid some of that copying if we had a cleaner approach to parsetree handling, ie, no scribbling on one's input. Then operations like eval_const_expressions wouldn't feel compelled to copy parsetree nodes that they weren't modifying. But I think we've gotten all the low-hanging fruit for now. At least on the backend side. Did you notice that psql was chewing up three times more CPU than the backend in this test?? regards, tom lane