I got interested enough in the psql-with-flex problem to go off and
solve it. Attached is a working patch, which I'm now debating whether
to apply. Comments solicited...

The patch removes about 200 lines of very spaghetti-ish code in
mainloop.c. However, it adds an 875-line flex source file, which
might be thought a bad tradeoff :-(. One bright spot is that about
half of that total is a direct copy of the main backend lexer, so
it's not really as much new, separately maintainable code as all that.
Also, Andrew Dunstan's patch for supporting dollar-quoting would add
about 100 lines to mainloop.c, versus only a dozen or so lines in the
flex implementation. Once that's taken into account, I don't think there
is a lot of difference in effective SLOC to maintain. I'm also of the
opinion that the new C code in psqlscan.l is much more straightforward
than the code removed from mainloop.c, though having just written it,
I'm no doubt pretty biased.
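
For readers who have not looked at this code: the core job shared by the old mainloop.c logic and psqlscan.l is to find a semicolon that is not inside a quoted literal, a quoted identifier, a comment, or (with Andrew's patch) a dollar-quoted string. The C below is a minimal hand-written sketch of that job. It is not code from mainloop.c or from the patch. `find_statement_end`, `dollar_tag_len` and `is_ident_char` are hypothetical names. Identifier characters are limited to ASCII, and backslash is assumed to escape the next byte inside '...' literals. A real scanner also tracks more state, such as parenthesis depth and psql backslash commands.

```c
#include <stddef.h>
#include <string.h>

/* Simplified identifier-character test: ASCII letters, digits, '_'. */
static int is_ident_char(char c)
{
    return c == '_' || (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
}

/*
 * If s[i] begins a dollar-quote delimiter ($$ or $tag$), return the
 * delimiter's length; otherwise return 0.  "$1" (a parameter) and a '$'
 * inside an identifier such as foo$bar are not delimiters.
 */
static size_t dollar_tag_len(const char *s, size_t i, size_t n)
{
    size_t j = i + 1;

    if (s[i] != '$' || (i > 0 && is_ident_char(s[i - 1])))
        return 0;
    if (j < n && s[j] >= '0' && s[j] <= '9')
        return 0;
    while (j < n && is_ident_char(s[j]))
        j++;
    return (j < n && s[j] == '$') ? j - i + 1 : 0;
}

/*
 * Return the offset just past the first ';' that is outside quotes and
 * comments, or 0 if s[0..n) does not yet hold a complete statement.
 */
size_t find_statement_end(const char *s, size_t n)
{
    size_t i = 0, k;

    while (i < n) {
        char c = s[i];

        if (c == '\'' || c == '"') {
            /* '...' literal or "..." identifier; a doubled quote is an escape */
            size_t j = i + 1;
            for (;;) {
                if (j >= n)
                    return 0;               /* unterminated: need more input */
                if (s[j] == c) {
                    if (j + 1 < n && s[j + 1] == c) { j += 2; continue; }
                    break;
                }
                /* assumption: backslash escapes the next byte in '...' */
                j += (c == '\'' && s[j] == '\\') ? 2 : 1;
            }
            i = j + 1;
        } else if (c == '-' && i + 1 < n && s[i + 1] == '-') {
            while (i < n && s[i] != '\n')   /* -- comment runs to end of line */
                i++;
        } else if (c == '/' && i + 1 < n && s[i + 1] == '*') {
            int depth = 1;                  /* block comments nest in PostgreSQL */
            for (i += 2; depth > 0; ) {
                if (i + 1 >= n)
                    return 0;               /* unterminated comment */
                if (s[i] == '/' && s[i + 1] == '*') { depth++; i += 2; }
                else if (s[i] == '*' && s[i + 1] == '/') { depth--; i += 2; }
                else i++;
            }
        } else if ((k = dollar_tag_len(s, i, n)) > 0) {
            /* dollar-quoted body ends at the next copy of the same tag */
            size_t j = i + k;
            while (j + k <= n && memcmp(s + j, s + i, k) != 0)
                j++;
            if (j + k > n)
                return 0;                   /* closing tag not seen yet */
            i = j + k;
        } else if (c == ';') {
            return i + 1;
        } else {
            i++;
        }
    }
    return 0;
}
```

For example, `find_statement_end("SELECT $f$ a;b $f$;", 19)` returns 19, because the first semicolon sits inside the dollar-quoted body. The dollar-quote case is just one more branch with its own terminator test; in a flex source the same idea would typically become an extra start condition plus a few rules.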

Bruce was asking about speed. On normal-size queries I cannot measure
any difference at all. For testing purposes I made up a file containing
a single 750K query (just a "SELECT big-honking-string-constant", with
the string literal broken into lines of 75 bytes). The client-side
(psql) CPU time to run this file looks about like this on my machine:

                       PGCLIENTENCODING
                       UNICODE    SJIS
CVS tip                1.57       1.82
flex implementation    0.93       2.33
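
Expressed as ratios of the figures in the table (plain arithmetic on the numbers as given; T is shorthand introduced here for the psql CPU time on the 750K test file):

```latex
\begin{aligned}
\frac{T_{\text{flex}}}{T_{\text{CVS}}}\Big|_{\text{UNICODE}} &= \frac{0.93}{1.57} \approx 0.59,
&\qquad \frac{T_{\text{flex}}}{T_{\text{CVS}}}\Big|_{\text{SJIS}} &= \frac{2.33}{1.82} \approx 1.28,\\
\frac{T_{\text{SJIS}}}{T_{\text{UNICODE}}}\Big|_{\text{CVS}} &= \frac{1.82}{1.57} \approx 1.16,
&\qquad \frac{T_{\text{SJIS}}}{T_{\text{UNICODE}}}\Big|_{\text{flex}} &= \frac{2.33}{0.93} \approx 2.51.
\end{aligned}
```

So on this file the flex version uses about 41% less CPU with UNICODE and about 28% more with SJIS. Switching to SJIS costs CVS tip about 16%, but it costs the flex version about 2.5x.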

The flex implementation is consistently faster than CVS tip when dealing
with backend-safe encodings (such as UTF-8). It's consistently
slower when it has to deal with a non-backend-safe encoding such as SJIS
or Big5. But for real-world cases the differential is down in the noise
either way.
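
The encoding sensitivity arises because in Shift_JIS the second byte of a two-byte character can be an ASCII code. For example, 表 is 0x95 0x5C, and 0x5C is backslash. A scanner working on raw bytes must therefore know where characters begin. One common way to let a byte-oriented lexer cope is to scan a copy in which trailing bytes are masked. The C below is a hypothetical sketch of that idea, not a description of what the patch does. `sjis_char_len` and `mask_sjis_trailing_bytes` are illustrative names, and the lead-byte ranges are simplified (vendor variants such as CP932 differ in details).

```c
#include <stddef.h>

/*
 * Simplified Shift_JIS rule: bytes 0x81-0x9F and 0xE0-0xFC lead a
 * two-byte character; ASCII and half-width katakana (0xA1-0xDF) are
 * single bytes.
 */
static size_t sjis_char_len(unsigned char b)
{
    return ((b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC)) ? 2 : 1;
}

/*
 * Copy n bytes from src to dst, replacing each non-first byte of a
 * two-byte character with 0xFF, so a byte-oriented lexer scanning dst
 * cannot mistake a trailing 0x5C for a backslash.  Offsets in dst match
 * offsets in src, so token boundaries found in dst map straight back
 * onto the original text.
 */
void mask_sjis_trailing_bytes(const unsigned char *src, unsigned char *dst,
                              size_t n)
{
    size_t i = 0;

    while (i < n) {
        size_t len = sjis_char_len(src[i]);

        dst[i] = src[i];
        for (size_t k = 1; k < len && i + k < n; k++)
            dst[i + k] = 0xFF;              /* hide trailing byte from lexer */
        i += len;
    }
}
```

For the literal '表' (bytes 27 95 5C 27), a raw scan with backslash escapes enabled would read 5C 27 as an escaped quote and run the literal past its end. The masked copy reads 27 95 FF 27. An extra pass like this is needed only for non-backend-safe encodings. It is offered as an illustration, not as an account of where the patch spends its SJIS time.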

I'm inclined to apply this, but I can see where a person not comfortable
with flex might feel differently. Opinions?

regards, tom lane