Re: Scanner performance (was Re: 7.3 schedule) - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: Scanner performance (was Re: 7.3 schedule)
Date
Msg-id Pine.LNX.4.30.0204161125280.689-100000@peter.localdomain
In response to Re: Scanner performance (was Re: 7.3 schedule)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Scanner performance (was Re: 7.3 schedule)  (Bruce Momjian <pgman@candle.pha.pa.us>)
Re: Scanner performance (was Re: 7.3 schedule)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane writes:

> The regression tests contain no very-long literals.  The results I was
> referring to concerned cases with string (BLOB) literals in the
> hundreds-of-K range; it seems that the per-character loop in the flex
> lexer starts to look like a bottleneck when you have tokens that much
> larger than the rest of the query.
>
> Solutions seem to be either (a) make that loop quicker, or (b) find a
> way to avoid passing BLOBs through the lexer.  I was merely suggesting
> that (a) should be investigated before we invest the work implied
> by (b).

I've done the following test:  Ten statements of the form

SELECT 1 FROM tab1 WHERE val = '...';

where ... are literals of length 5 - 10 MB (some random base-64 encoded
MP3 files).  "tab1" was empty.  The test took 3:40 min of wall-clock time.

Top ten calls:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 36.95      9.87     9.87 74882482     0.00     0.00  pq_getbyte
 22.80     15.96     6.09       11   553.64  1450.93  pq_getstring
 13.55     19.58     3.62       11   329.09   329.10  scanstr
 12.09     22.81     3.23      110    29.36    86.00  base_yylex
  4.27     23.95     1.14       34    33.53    33.53  yy_get_previous_state
  3.86     24.98     1.03       22    46.82    46.83  textin
  3.67     25.96     0.98       34    28.82    28.82  myinput
  1.83     26.45     0.49       45    10.89    32.67  yy_get_next_buffer
  0.11     26.48     0.03     3027     0.01     0.01  AllocSetAlloc
  0.11     26.51     0.03      129     0.23     0.23  fmgr_isbuiltin
 

The string literals didn't contain any backslashes, so scanstr is
operating in the best-case scenario here.  But for arbitrary binary data we
need some escape mechanism, so I don't see much room for improvement
there.

It seems the real bottleneck is the excessive abstraction in the
communications layer.  I haven't looked closely at all, but it would seem
better if pq_getstring would not use pq_getbyte and instead read the
buffer directly.
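
To illustrate the idea, here is a rough sketch of the two approaches.  It
is not the actual backend code: RecvBuf, buf_getbyte, getstring_bytewise
and getstring_bulk are made-up names, and the sketch assumes the whole
message is already sitting in one buffer, whereas the real code would
have to refill the buffer from the socket.  The first reader makes one
function call per input byte; the second finds the terminator with
memchr and copies the string in one go.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the receive buffer; not the real backend layout. */
typedef struct
{
    const char *data;           /* bytes already received */
    size_t      len;            /* number of valid bytes in data */
    size_t      pos;            /* index of the next unread byte */
} RecvBuf;

/* Per-byte accessor: returns the next byte, or -1 when the buffer is empty. */
static int
buf_getbyte(RecvBuf *b)
{
    if (b->pos >= b->len)
        return -1;
    return (unsigned char) b->data[b->pos++];
}

/*
 * Read a NUL-terminated string with one buf_getbyte call per character.
 * Returns the string length, or -1 if no terminator is found or the
 * output buffer is too small.
 */
static long
getstring_bytewise(RecvBuf *b, char *out, size_t outsize)
{
    size_t      n = 0;
    int         c;

    assert(outsize > 0);
    while ((c = buf_getbyte(b)) != -1)
    {
        if (c == '\0')
        {
            out[n] = '\0';
            return (long) n;
        }
        if (n + 1 >= outsize)
            return -1;          /* no room left for the terminator */
        out[n++] = (char) c;
    }
    return -1;                  /* ran out of input before the terminator */
}

/*
 * Same contract, but scans the buffer directly: one memchr to find the
 * terminator and one memcpy to move the string, terminator included.
 */
static long
getstring_bulk(RecvBuf *b, char *out, size_t outsize)
{
    const char *start = b->data + b->pos;
    size_t      avail = b->len - b->pos;
    const char *nul = memchr(start, '\0', avail);
    size_t      n;

    assert(outsize > 0);
    if (nul == NULL)
    {
        b->pos = b->len;        /* consumed everything, like the loop above */
        return -1;
    }
    n = (size_t) (nul - start);
    if (n + 1 > outsize)
        return -1;
    memcpy(out, start, n + 1);
    b->pos += n + 1;            /* skip past the terminator */
    return (long) n;
}
```

Both functions return the same string and leave the read position just
past the terminator on success.  How much the second form would actually
save in the backend is something only a new profile could tell.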

-- 
Peter Eisentraut   peter_e@gmx.net


