I've been poking at the scanner a bit, using the large-literal test case
from the other day:

http://archives.postgresql.org/pgsql-hackers/2002-04/msg00811.php

I've been able to reduce the wall-clock run time of that test from 3:37
min to 2:13 min, and the base_yylex() per-call time from 137 ms to 24 ms.
I've used 'flex -8 -CFa' and restructured the code to avoid looping over
and copying the input string half a dozen times. For instance, instead of
scanstr(), the escape sequences are resolved as the input is scanned, and
instead of the myinput() routine I use the function yy_scan_buffer()
provided by flex for scanning in-memory strings. (This would make the
code flex-dependent, but in reality it already is anyway.)
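
To illustrate the two ideas above, here is a rough sketch, not the actual patch. Both function names are made up for this example, and the escape set handled is only an assumption about what the scanner needs. The first function resolves backslash escapes in the same pass that copies the literal, rather than in a separate scanstr()-style pass afterwards. The second prepares a buffer for yy_scan_buffer(), which scans in place and requires the last two bytes of the buffer to be NUL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the PostgreSQL sources): copy the body of
 * a quoted literal from src to dst, resolving backslash escapes as we go.
 * dst needs room for len + 1 bytes; the output is never longer than the
 * input. Returns the length of the result.
 */
size_t
copy_unescaped(char *dst, const char *src, size_t len)
{
    size_t      i = 0;
    size_t      out = 0;

    assert(dst != NULL && src != NULL);

    while (i < len)
    {
        char        c = src[i++];

        if (c == '\\' && i < len)
        {
            c = src[i++];
            switch (c)
            {
                case 'b': c = '\b'; break;
                case 'f': c = '\f'; break;
                case 'n': c = '\n'; break;
                case 'r': c = '\r'; break;
                case 't': c = '\t'; break;
                default:
                    if (c >= '0' && c <= '7')
                    {
                        /* up to three octal digits form one byte */
                        int         val = c - '0';
                        int         digits = 1;

                        while (digits < 3 && i < len &&
                               src[i] >= '0' && src[i] <= '7')
                        {
                            val = val * 8 + (src[i++] - '0');
                            digits++;
                        }
                        c = (char) val;
                    }
                    /* any other escaped character stands for itself */
                    break;
            }
        }
        dst[out++] = c;
    }
    dst[out] = '\0';
    return out;
}

/*
 * Hypothetical helper: copy a query string into a freshly allocated buffer
 * whose last two bytes are NUL, as yy_scan_buffer() requires. The caller
 * would pass the result and *bufsize to yy_scan_buffer() and free the
 * buffer after scanning. Returns NULL if out of memory.
 */
char *
make_scan_buffer(const char *str, size_t len, size_t *bufsize)
{
    char       *buf = malloc(len + 2);

    if (buf == NULL)
        return NULL;
    memcpy(buf, str, len);
    buf[len] = '\0';
    buf[len + 1] = '\0';
    *bufsize = len + 2;
    return buf;
}
```

In a real flex scanner the escape handling would live in the rule actions of the quoted-string start condition, appending to the literal buffer as each piece is matched; the standalone function just shows that one pass over the input is enough.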
The "before" profile was:

    %  cumulative     self                self     total
 time     seconds  seconds     calls   ms/call   ms/call  name
23.51        6.65     6.65       110     60.45    137.27  base_yylex
23.19       13.21     6.56        11    596.36   1089.69  pq_getstring
19.16       18.63     5.42  74882482      0.00      0.00  pq_getbyte
14.99       22.87     4.24        11    385.45    385.46  scanstr
 9.61       25.59     2.72        23    118.26    118.26  yy_get_previous_state
 3.78       26.66     1.07        34     31.47     31.47  myinput
 3.64       27.69     1.03        22     46.82     46.82  textin
 1.48       28.11     0.42        34     12.35     43.82  yy_get_next_buffer

The "after" profile is:

    %  cumulative     self                self     total
 time     seconds  seconds     calls   ms/call   ms/call  name
40.30        5.65     5.65        11    513.64    943.64  pq_getstring
33.74       10.38     4.73  74882482      0.00      0.00  pq_getbyte
18.90       13.03     2.65       110     24.09     24.09  base_yylex
 6.85       13.99     0.96        22     43.64     43.64  textin
 0.07       14.00     0.01        86      0.12      0.12  heap_fetch

--
Peter Eisentraut peter_e@gmx.net