Tom Lane writes:
> We do have some numbers suggesting that the per-character loop in the
> lexer is slow enough to be a problem with very long literals. That is
> the overhead that might be avoided with a special protocol.
Which loop is that? Doesn't the scanner use buffered input anyway?
> However, it should be noted that (AFAIK) no one has spent any effort at
> all on trying to make the lexer go faster. There is quite a bit of
> material in the flex documentation about performance considerations ---
> someone should take a look at it and see if we can get any wins by being
> smarter, without having to introduce protocol changes.
My profiles show that the work spent in the scanner is really minuscule
compared to everything else.
The data appears to support a suspicion I've had for many moons now: that
the binary search for the key words takes quite a bit of time:
                0.22    0.06   66748/66748     yylex [125]
[129]    0.4    0.22    0.06   66748        base_yylex [129]
                0.01    0.02    9191/9191      yy_get_next_buffer [495]
                0.02    0.00   32808/34053     ScanKeywordLookup [579]
                0.00    0.01   16130/77100     MemoryContextStrdup [370]
                0.00    0.00    4000/4000      scanstr [1057]
                0.00    0.00    4637/4637      yy_get_previous_state [2158]
                0.00    0.00    4554/4554      base_yyrestart [2162]
                0.00    0.00    4554/4554      yywrap [2163]
                0.00    0.00       1/1         base_yy_create_buffer [2852]
                0.00    0.00       1/13695     base_yy_load_buffer_state [2107]
A while ago I experimented with hash functions for the key word lookup
and got a speedup of a factor of 2.5, but again, this is really minor in
the overall scheme of things.
(The profile data is from a run of all the regression test files in order
in one session.)
--
Peter Eisentraut peter_e@gmx.net