Scanner performance (was Re: 7.3 schedule) - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Scanner performance (was Re: 7.3 schedule)
Date
Msg-id Pine.LNX.4.30.0204121850140.847-100000@peter.localdomain
In response to Re: 7.3 schedule  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Scanner performance (was Re: 7.3 schedule)
List pgsql-hackers
Tom Lane writes:

> We do have some numbers suggesting that the per-character loop in the
> lexer is slow enough to be a problem with very long literals.  That is
> the overhead that might be avoided with a special protocol.

Which loop is that?  Doesn't the scanner use buffered input anyway?

> However, it should be noted that (AFAIK) no one has spent any effort at
> all on trying to make the lexer go faster.  There is quite a bit of
> material in the flex documentation about performance considerations ---
> someone should take a look at it and see if we can get any wins by being
> smarter, without having to introduce protocol changes.

My profiles show that the work spent in the scanner is really minuscule
compared to everything else.

The data appears to support a suspicion I had many moons ago, namely that
the binary search for the key words takes quite a bit of time:
               0.22    0.06   66748/66748       yylex [125]
[129]    0.4    0.22    0.06   66748         base_yylex [129]
               0.01    0.02    9191/9191        yy_get_next_buffer [495]
               0.02    0.00   32808/34053       ScanKeywordLookup [579]
               0.00    0.01   16130/77100       MemoryContextStrdup [370]
               0.00    0.00    4000/4000        scanstr [1057]
               0.00    0.00    4637/4637        yy_get_previous_state [2158]
               0.00    0.00    4554/4554        base_yyrestart [2162]
               0.00    0.00    4554/4554        yywrap [2163]
               0.00    0.00       1/1           base_yy_create_buffer [2852]
               0.00    0.00       1/13695       base_yy_load_buffer_state [2107]

A while ago I experimented with hash functions for the key word lookup
and got a speedup by a factor of 2.5, but again, this is really minor in
the overall scheme of things.
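
For illustration only, here is a minimal sketch of the two lookup strategies
being compared. It is not the actual ScanKeywordLookup code, and it is not
the hash function from the experiment mentioned above (that one is not shown
in this message). The abbreviated keyword table, the names lookup_bsearch and
lookup_hash, the FNV-1a hash, and the table size are all assumptions made for
the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, abbreviated keyword list; must stay sorted for the
 * binary search below. */
static const char *const keywords[] = {
    "and", "by", "from", "group", "insert",
    "order", "select", "update", "where"
};
#define NUM_KEYWORDS ((int) (sizeof(keywords) / sizeof(keywords[0])))

/* Binary search: up to about log2(n) strcmp() calls per identifier.
 * Returns the keyword's index, or -1 if the word is not a keyword. */
static int lookup_bsearch(const char *word)
{
    int lo = 0, hi = NUM_KEYWORDS;

    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = strcmp(word, keywords[mid]);

        if (cmp == 0)
            return mid;
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}

/* Open-addressing hash table; the size is an assumed power of two
 * comfortably larger than the number of keywords. */
#define HASH_SIZE 32
static int hash_slots[HASH_SIZE];
static int hash_ready = 0;

/* FNV-1a, chosen only because it is short; any decent hash would do. */
static uint32_t hash_word(const char *s)
{
    uint32_t h = 2166136261u;

    while (*s) {
        h ^= (unsigned char) *s++;
        h *= 16777619u;
    }
    return h & (HASH_SIZE - 1);
}

/* Fill the table once, resolving collisions by linear probing. */
static void hash_init(void)
{
    for (int i = 0; i < HASH_SIZE; i++)
        hash_slots[i] = -1;
    for (int i = 0; i < NUM_KEYWORDS; i++) {
        uint32_t slot = hash_word(keywords[i]);

        while (hash_slots[slot] != -1)
            slot = (slot + 1) & (HASH_SIZE - 1);
        hash_slots[slot] = i;
    }
    hash_ready = 1;
}

/* Hash lookup: one hash computation plus, typically, one strcmp(). */
static int lookup_hash(const char *word)
{
    if (!hash_ready)
        hash_init();
    for (uint32_t slot = hash_word(word);
         hash_slots[slot] != -1;
         slot = (slot + 1) & (HASH_SIZE - 1)) {
        if (strcmp(word, keywords[hash_slots[slot]]) == 0)
            return hash_slots[slot];
    }
    return -1;
}

/* Sanity check: both strategies must agree on every keyword. */
static void check_lookups_agree(void)
{
    for (int i = 0; i < NUM_KEYWORDS; i++) {
        assert(lookup_bsearch(keywords[i]) == i);
        assert(lookup_hash(keywords[i]) == i);
    }
}
```

Both functions return the same index for a keyword and -1 for an ordinary
identifier; the difference is only in how many string comparisons are needed
per call. Whether that matters depends on how large the lookup's share of the
total run time is, which in the profile above is small.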

(The profile data is from a run of all the regression test files in order
in one session.)

-- 
Peter Eisentraut   peter_e@gmx.net


