Doing psql's lexing with flex - Mailing list pgsql-patches

From	Tom Lane
Subject	Doing psql's lexing with flex
Date	February 17, 2004 21:44:02
Msg-id	18389.1077057820@sss.pgh.pa.us
List	pgsql-patches

I got interested enough in the psql-with-flex problem to go off and
solve it.  Attached is a working patch, which I'm now debating whether
to apply.  Comments solicited...

The patch removes about 200 lines of very spaghetti-ish code in
mainloop.c.  However, it adds an 875-line flex source file, which
might be thought a bad tradeoff :-(.  One bright spot is that about
half of that total is a direct copy of the main backend lexer, so
it's not really as much new, separately maintainable code as all that.
Also, Andrew Dunstan's patch for supporting dollar-quoting would add
about 100 lines to mainloop.c, versus only a dozen or so lines in the
flex implementation.  Once that's taken into account I don't think there
is a lot of difference in effective SLOC to maintain.  I'm also of the
opinion that the new C code in psqlscan.l is much more straightforward
than the code removed from mainloop.c, though having just written it,
I'm no doubt pretty biased.

Bruce was asking about speed.  On normal-size queries I cannot measure
any difference at all.  For testing purposes I made up a file containing
a single 750K query (just a "SELECT big-honking-string-constant", with
the string literal broken into lines of 75 bytes).  The client-side
(psql) CPU time to run this file looks about like this on my machine:

                         PGCLIENTENCODING
                         UNICODE     SJIS

CVS tip                  1.57        1.82
flex implementation      0.93        2.33

The flex implementation is consistently faster than CVS tip when dealing
with backend-compatible encodings (such as UTF-8).  It's consistently
slower when it has to deal with a non-backend-safe encoding such as SJIS
or Big5.  But for real-world cases the differential is down in the noise
either way.
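
For anyone who wants to try a similar measurement, a test file of the shape
described above could be generated roughly as follows.  This is only a
sketch, not the script actually used for the numbers above: the file name
bigquery.sql, the filler character, reading "750K" as 750,000 bytes, and the
choice to break the literal with embedded newlines are all assumptions.

```python
def build_big_query(total_bytes: int = 750_000, line_width: int = 75) -> str:
    """Build a single SELECT of one large string constant.

    Assumption: the literal is broken up by embedding a newline at the
    end of every line_width-byte chunk; the original test file may have
    been laid out differently.
    """
    # Each chunk is line_width bytes: filler characters plus one newline.
    line = "x" * (line_width - 1) + "\n"
    n_lines = total_bytes // line_width
    literal = line * n_lines
    # The filler contains no quotes or backslashes, so no escaping is needed.
    return "SELECT '" + literal + "';\n"


if __name__ == "__main__":
    # Hypothetical output file name; ASCII filler keeps the byte count
    # identical under any client encoding.
    with open("bigquery.sql", "w", encoding="ascii", newline="\n") as f:
        f.write(build_big_query())
```

The resulting file can be fed to psql with -f, with PGCLIENTENCODING set in
the environment to the encoding under test, and the client-side CPU time
compared between builds.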

I'm inclined to apply this but I can see where a person not comfortable
with flex might feel differently.  Opinions?

            regards, tom lane

Attachment

msg-22127-10482.bin
