Thread: Re: Parser Cruft in gram.y
Tom Lane wrote: > the parser tables are basically number-of-tokens wide by > number-of-states high. (In HEAD there are 433 tokens known to the > grammar, all but 30 of which are keywords, and 4367 states.) > > Splitting the grammar into multiple grammars is unlikely to do > much to improve this --- in fact, it could easily make matters > worse due to duplication. I agree that without knowing what percentage would be used by each parser in a split, it could go either way. Consider a hypothetical situation where one parser has 80% and the other 50% of the current combined parser -- 30% overlap on both tokens and grammar constructs. (Picking numbers out of the air, for purposes of demonstration.)

Combined = 433 * 4,367 = 1,890,911
80% = 346 * 3,493 = 1,208,578
50% = 216 * 2,183 = 471,528
Total for split = 1,680,106

Of course if they were both at 80% it would be a higher total than combined, but unless you have a handle on the percentages, it doesn't seem like a foregone conclusion. Do you have any feel for what the split would be? -Kevin
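To make the break-even arithmetic above easier to play with: if both the token count and the state count are assumed to shrink linearly with each sub-parser's share of the grammar (a simplification -- bison state counts don't really scale that way), the split only wins while the sum of the squared shares stays below 1. A small sketch using the same invented figures:

#include <stdio.h>

/* Rough check of the split-vs-combined parser-table sizes discussed above.
 * Assumes both table dimensions scale linearly with each sub-parser's share
 * of the grammar -- an illustration only, not how bison actually behaves. */
int main(void)
{
    const double tokens = 433.0, states = 4367.0;
    const double combined = tokens * states;
    const double shares[][2] = { {0.80, 0.50}, {0.80, 0.80} };

    printf("combined: %.0f cells\n", combined);
    for (int i = 0; i < 2; i++)
    {
        double a = shares[i][0], b = shares[i][1];
        double split = (a * a + b * b) * combined;

        printf("split %.0f%% + %.0f%%: %.0f cells (%s than combined)\n",
               a * 100, b * 100, split,
               split < combined ? "smaller" : "larger");
    }
    return 0;
}

With 80% and 50% this lands close to the 1.68M total above; with 80% and 80% the split comes out larger than the combined table, matching the caveat in the mail.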
"Kevin Grittner" <kgrittn@mail.com> writes: > Tom Lane wrote: >> the parser tables are basically number-of-tokens wide by >> number-of-states high. (In HEAD there are 433 tokens known to the >> grammar, all but 30 of which are keywords, and 4367 states.) >> >> Splitting the grammar into multiple grammars is unlikely to do >> much to improve this --- in fact, it could easily make matters >> worse due to duplication. > Of course if they were both at 80% it would be a higher total than > combined, but unless you have a handle on the percentages, it > doesn't seem like a foregone conclusion. Do you have any feel for > what the split would be? I don't really, but I will note that the scalar-expression subgrammar is a pretty sizable part of the whole, and it's difficult to see how you'd make a useful split that didn't duplicate it. I guess you could push CREATE TABLE, ALTER TABLE, CREATE DOMAIN, ALTER DOMAIN, COPY, and anything else that included expression arguments over into the "main" grammar. But that path leads to more and more stuff getting moved to the "main" grammar over time, making the whole thing more and more questionable. The whole concept seems ugly and unmaintainable in any case. regards, tom lane
On Sat, Dec 15, 2012 at 11:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <kgrittn@mail.com> writes: >> Tom Lane wrote: >>> the parser tables are basically number-of-tokens wide by >>> number-of-states high. (In HEAD there are 433 tokens known to the >>> grammar, all but 30 of which are keywords, and 4367 states.) >>> >>> Splitting the grammar into multiple grammars is unlikely to do >>> much to improve this --- in fact, it could easily make matters >>> worse due to duplication. > >> Of course if they were both at 80% it would be a higher total than >> combined, but unless you have a handle on the percentages, it >> doesn't seem like a foregone conclusion. Do you have any feel for >> what the split would be? > > I don't really, but I will note that the scalar-expression subgrammar is > a pretty sizable part of the whole, and it's difficult to see how you'd > make a useful split that didn't duplicate it. I guess you could push > CREATE TABLE, ALTER TABLE, CREATE DOMAIN, ALTER DOMAIN, COPY, and > anything else that included expression arguments over into the "main" > grammar. But that path leads to more and more stuff getting moved to > the "main" grammar over time, making the whole thing more and more > questionable. The whole concept seems ugly and unmaintainable in any > case. I have thought a little bit in the past about the sort of thing that Dimitri is proposing, and it seemed to me that one could put DML in one grammar and everything else in another grammar and then decide, based on the first word of the input, which grammar to use. But there are a couple of problems with this. First, the DML grammar has to include an awful lot of stuff, because the grammar for expressions is really complicated and involves a lot of things like special-case syntax for XML that are probably not really carrying their weight but which cannot easily be factored out. Second, the DDL grammar would have to duplicate a lot of stuff that also shows up in the DML grammar, because things like expressions can also show up in DEFAULT or USING clauses, which show up in things like CREATE TABLE and ALTER TABLE and CREATE SCHEMA .. CREATE TABLE. Now either one of these problems by itself might not be sufficient to kill the idea: if the DML grammar were a small subset of the full grammar, one might not mind duplicating some stuff, on the grounds that in most cases that full grammar would not be used, and using only the smaller tables most of the time would be easier on the L1 cache. And on the other hand, if you could get a clean split between the two grammars, then regardless of exactly what the split was, it might seem a win. But it seemed to me when I looked at this that you'd have to duplicate a lot of stuff and the small parser still wouldn't end up being very small, which I found hard to get excited about. I'm frankly kind of shocked at the revelation that the parser is already 14% of the backend. I knew it was big; I didn't realize it was THAT big. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I'm frankly kind of shocked at the revelation that the parser is > already 14% of the backend. I knew it was big; I didn't realize it > was THAT big. Yeah, likewise. It makes me wonder whether we aren't past the size of grammar that bison is a good choice for: considering that gram.y is only circa 1% of the source text, it's surprising to find that it compiles to >10% of the object code. I'm not sure what other tool might be better though. I looked through http://en.wikipedia.org/wiki/Comparison_of_parser_generators#Deterministic_context-free_languages but it seems our options are a bit limited if we want something that produces C. It's not clear to me that any of the likely options are as mature as bison, let alone likely to substantially outperform it. (For instance, Hyacc sounded pretty promising until I got to the part about it not yet supporting %union or %type.) Still, I didn't spend much time on this --- maybe somebody else would like to do a bit more research. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > And on the other hand, if you could get a clean split between the two > grammars, then regardless of exactly what the split was, it might seem > a win. But it seemed to me when I looked at this that you'd have to > duplicate a lot of stuff and the small parser still wouldn't end up > being very small, which I found hard to get excited about. I think the goal is not so much about getting a much smaller parser, but more about having a separate parser that you don't care about the "bloat" of, so that you can improve DDL without fearing main parser performance regressions. If we come out with no regression and no gain on the main query parser, I say it's still worth the separation effort. And anyway we only add things to the main parser (queries) when the standard saith we have to. Tom Lane <tgl@sss.pgh.pa.us> writes: > I'm not sure what other tool might be better though. I looked through > http://en.wikipedia.org/wiki/Comparison_of_parser_generators#Deterministic_context-free_languages > but it seems our options are a bit limited if we want something > that produces C. It's not clear to me that any of the likely options > are as mature as bison, let alone likely to substantially outperform it. > (For instance, Hyacc sounded pretty promising until I got to the part > about it not yet supporting %union or %type.) Still, I didn't spend > much time on this --- maybe somebody else would like to do a bit more > research. I did spend a little time on it too, with a different search angle, and did find an "experimental" thing that might be worth looking at, or maybe not. http://en.wikipedia.org/wiki/Parsing_expression_grammar http://piumarta.com/software/peg/ Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Tue, Dec 18, 2012 at 4:33 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> And on the other hand, if you could get a clean split between the two >> grammars, then regardless of exactly what the split was, it might seem >> a win. But it seemed to me when I looked at this that you'd have to >> duplicate a lot of stuff and the small parser still wouldn't end up >> being very small, which I found hard to get excited about. > > I think the goal is not so much about getting a much smaller parser, but > more about having a separate parser that you don't care about the "bloat" > of, so that you can improve DDL without fearing main parser > performance regressions. Well that would be nice, but the problem is that I see no way to implement it. If, with a unified parser, the parser is 14% of our source code, then splitting it in two will probably crank that number up well over 20%, because there will be duplication between the two. That seems double-plus un-good. I can't help but suspect that the way we handle keywords today is monumentally inefficient. The unreserved_keyword productions, et al, just seem somehow badly wrong-headed. We take the trouble to distinguish all of those cases so that we can turn around and not distinguish them. I feel like there ought to be some way to use lexer states to handle this - if we're in a context where an unreserved keyword will be treated as an IDENT, then have the lexer return IDENT when it sees an unreserved keyword. I might be wrong, but it seems like that would eliminate a whole lot of parser state transitions. However, even if I'm right, I have no idea how to implement it. It just seems very wasteful that we have so many parser states that have no purpose other than (effectively) to convert an unreserved_keyword into an IDENT when the lexer could do the same thing much more cheaply given a bit more context. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12/18/12 5:10 PM, Robert Haas wrote: > I can't help but suspect that the way we handle keywords today is > monumentally inefficient. The unreserved_keyword productions, et al, > just seem somehow badly wrong-headed. We take the trouble to > distinguish all of those cases so that we can turn around and not > distinguish them. I feel like there ought to be some way to use lexer > states to handle this - if we're in a context where an unreserved > keyword will be treated as an IDENT, then have the lexer return IDENT > when it sees an unreserved keyword. The problem would be the lookahead. You need to know the next token before you can decide what context the current one is in.
On Tue, Dec 18, 2012 at 5:24 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > On 12/18/12 5:10 PM, Robert Haas wrote: >> I can't help but suspect that the way we handle keywords today is >> monumentally inefficient. The unreserved_keyword productions, et al, >> just seem somehow badly wrong-headed. We take the trouble to >> distinguish all of those cases so that we can turn around and not >> distinguish them. I feel like there ought to be some way to use lexer >> states to handle this - if we're in a context where an unreserved >> keyword will be treated as an IDENT, then have the lexer return IDENT >> when it sees an unreserved keyword. > > The problem would be the lookahead. You need to know the next token > before you can decide what context the current one is in. Yeah, that's why I don't know how to make it work. It feels like this is partly an artifact of the tool, though. I mean, suppose we haven't read anything yet. Then, the next token can't be an IDENT, so if we see an unreserved keyword, we know we're not going to convert it to an IDENT. OTOH, if we've seen CREATE TABLE, the next token cannot be an unreserved keyword that is intended as a keyword; it has to be something that will reduce to ColId. I guess the problem situation is where we can shift the keyword and then use the following token to decide whether to reduce it to ColId/type_function_name/ColLabel or use some other rule instead; the CREATE INDEX CONCURRENTLY productions might be such a case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
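To make the idea being discussed concrete, here is a toy standalone sketch -- not PostgreSQL code; the token names, the keyword set, and the ident_expected flag are all invented for illustration -- of a filter between lexer and parser that downgrades an unreserved keyword to IDENT when the surrounding context can only accept a name. The part it hand-waves, maintaining that context flag without the lookahead problem raised above, is exactly the open question:

#include <stdbool.h>
#include <stdio.h>

/* Invented token codes and input; this is not the real lexer/parser
 * interface, just an illustration of the keyword-downgrading idea. */
enum token { T_IDENT, T_CREATE, T_TABLE, T_CONCURRENTLY, T_EOF };

struct lexeme { enum token tok; const char *text; };

/* Hypothetical flag meaning "only a name can appear next here", e.g.
 * right after CREATE TABLE.  Maintaining this correctly without parser
 * lookahead is the hard part debated above. */
static bool ident_expected = false;

static struct lexeme filtered_lex(struct lexeme raw)
{
    bool unreserved = (raw.tok == T_CONCURRENTLY);  /* pretend it's unreserved */

    if (unreserved && ident_expected)
        raw.tok = T_IDENT;          /* the parser never sees the keyword token */
    return raw;
}

int main(void)
{
    struct lexeme input[] = {
        { T_CREATE, "create" },
        { T_TABLE, "table" },
        { T_CONCURRENTLY, "concurrently" },     /* used here as a table name */
        { T_EOF, "" },
    };

    for (int i = 0; input[i].tok != T_EOF; i++)
    {
        struct lexeme l = filtered_lex(input[i]);

        printf("%-13s -> token %d%s\n", l.text, (int) l.tok,
               l.tok == T_IDENT ? " (downgraded to IDENT)" : "");
        ident_expected = (input[i].tok == T_TABLE); /* crude context tracking */
    }
    return 0;
}

In the toy input, CONCURRENTLY stands in for an unreserved keyword being used as a table name; the filter never lets the parser see the keyword token in that position.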
On Tue, Dec 18, 2012 at 10:33:12AM +0100, Dimitri Fontaine wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > And on the other hand, if you could get a clean split between the two > > grammars, then regardless of exactly what the split was, it might seem > > a win. But it seemed to me when I looked at this that you'd have to > > duplicate a lot of stuff and the small parser still wouldn't end up > > being very small, which I found hard to get excited about. > > I think the goal is not so much about getting a much smaller parser, but > more about having a separate parser that you don't care about the "bloat" > of, so that you can improve DDL without fearing main parser > performance regressions. In addition to this, there could well be uses for a more modular parser. For example, if it turns out to be possible to do our parser in a way that is exportable and (can be converted to a type that) looks forward, client code could do a lot of interesting (and provably correct) things. > If we come out with no regression and no gain on the main query parser, > I say it's still worth the separation effort. And anyway we only add > things to the main parser (queries) when the standard saith we have to. > > Tom Lane <tgl@sss.pgh.pa.us> writes: > > I'm not sure what other tool might be better though. I looked through > > http://en.wikipedia.org/wiki/Comparison_of_parser_generators#Deterministic_context-free_languages > > but it seems our options are a bit limited if we want something > > that produces C. It's not clear to me that any of the likely options > > are as mature as bison, let alone likely to substantially outperform it. > > (For instance, Hyacc sounded pretty promising until I got to the part > > about it not yet supporting %union or %type.) Still, I didn't spend > > much time on this --- maybe somebody else would like to do a bit more > > research. > > I did spend a little time on it too, with a different search angle, > and did find an "experimental" thing that might be worth looking at, > or maybe not. > > http://en.wikipedia.org/wiki/Parsing_expression_grammar > http://piumarta.com/software/peg/ More angles: http://wwwiti.cs.uni-magdeburg.de/~rosenmue/publications/SKS+08.pdf Might anyone here wish to investigate this, given some kind of resources for the initial study? Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, Dec 18, 2012 at 10:44 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Yeah, that's why I don't know how to make it work. It feels like this > is partly an artifact of the tool, though. I mean, suppose we haven't > read anything yet. Then, the next token can't be an IDENT, so if we > see an unreserved keyword, we know we're not going to convert it to an > IDENT. OTOH, if we've seen CREATE TABLE, the next token cannot be an > unreserved keyword that is intended as a keyword; it has to be > something that will reduce to ColId. > > I guess the problem situation is where we can shift the keyword and > then use the following token to decide whether to reduce it to > ColId/type_function_name/ColLabel or use some other rule instead; the > CREATE INDEX CONCURRENTLY productions might be such a case. It seems to me the avenue for simplifying the transition table would be keywords that can never be used in the same place. That is, if we replaced all the elements of such a set with a single token then the grammar would be unambiguous and we could insert a check that the right actual token was present in each place it's used. I'm thinking of the various "noise words" that the SQL standard introduces which are all going to be reduced to IDENT except for a few places each of which will only admit one such noise word anyways. I think doing this manually would be unmaintainable since every time we modified the grammar it would introduce random unpredictable conflicts which would be hard to debug. But I wonder if we could preparse the transition table, find any such large sets and rewrite either the transition table or regenerate the grammar and rerun bison on it. Alternately we could just replace the transition table with a representation that is less wasteful such as a list of perfect hash tables just large enough to hold the valid transitions. Or even a single very large perfect hash table where the key is <state,token>. But I'm not entirely convinced any of this is actually useful. Just because the transition table is large doesn't mean it's inefficient. Most of these bytes are being wasted on transitions which can never occur or which can never occur in syntactically valid SQL. The transitions which can occur will still be present in any condensed representation we come up with. The L2 cache might still be large enough to hold these hot transitions which might not be a very large subset at all. valgrind comes with a tool called cachegrind which can emulate the cache algorithm on some variants of various cpus and produce reports. Can it be made to produce a report for a specific block of memory? -- greg
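For what it's worth, a rough standalone sketch of the single-hash-table variant -- the state and token counts are the ones quoted upthread, but the number of valid transitions, the hash function, and the slot layout are pure guesses for illustration and have nothing to do with bison's real tables:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NSTATES  4367
#define NTOKENS  433
#define NVALID   60000          /* guess at how many transitions are valid */
#define TABSIZE  131072         /* power of two, leaves plenty of headroom */

/* One open-addressed slot per valid transition, keyed by (state, token).
 * Empty slots have next == 0; real entries store next + 1. */
struct slot { int32_t state, token, next; };
static struct slot tab[TABSIZE];

static uint32_t hashkey(int state, int token)
{
    return ((uint32_t) state * 2654435761u ^ (uint32_t) token * 40503u)
        & (TABSIZE - 1);
}

static void insert(int state, int token, int next)
{
    uint32_t i = hashkey(state, token);

    while (tab[i].next != 0)
        i = (i + 1) & (TABSIZE - 1);
    tab[i].state = state;
    tab[i].token = token;
    tab[i].next = next + 1;
}

static int lookup(int state, int token)
{
    uint32_t i = hashkey(state, token);

    while (tab[i].next != 0)
    {
        if (tab[i].state == state && tab[i].token == token)
            return tab[i].next - 1;
        i = (i + 1) & (TABSIZE - 1);
    }
    return -1;                  /* no such transition: syntax error */
}

int main(void)
{
    for (int i = 0; i < NVALID; i++)    /* synthetic transitions */
        insert(rand() % NSTATES, rand() % NTOKENS, rand() % NSTATES);

    printf("dense table:  %zu bytes\n",
           (size_t) NSTATES * NTOKENS * sizeof(int16_t));
    printf("hashed table: %zu bytes\n", sizeof(tab));
    printf("lookup(0, 0) = %d\n", lookup(0, 0));
    return 0;
}

Whether something like this would actually be faster is exactly the open question in the last paragraph: the dense table wastes most of its bytes on impossible transitions, but the transitions that do occur are hot either way.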
Greg Stark <stark@mit.edu> writes: > But I'm not entirely convinced any of this is actually useful. Just > because the transition table is large doesn't mean it's inefficient. That's a fair point. However, I've often noticed base_yyparse() showing up rather high on profiles --- higher than seemed plausible at the time, given that its state-machine implementation is pretty tight. Now I'm wondering whether that isn't coming from cache stalls from trying to touch all the requisite parts of the transition table. > valgrind comes with a tool called cachegrind which can emulate the > cache algorithm on some variants of various cpus and produce reports. > Can it be made to produce a report for a specific block of memory? I believe that oprofile can be persuaded to produce statistics about where in one's code are the most cache misses, not just the most wall-clock ticks; which would shed a lot of light on this question. However, my oprofile-fu doesn't quite extend to actually persuading it. regards, tom lane
On Wed, Dec 19, 2012 at 10:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> valgrind comes with a tool called cachegrind which can emulate the >> cache algorithm on some variants of various cpus and produce reports. >> Can it be made to produce a report for a specific block of memory? > > I believe that oprofile can be persuaded to produce statistics about > where in one's code are the most cache misses, not just the most > wall-clock ticks; which would shed a lot of light on this question. > However, my oprofile-fu doesn't quite extend to actually persuading it. perf can certainly do this.

$ perf record -a -e cache-misses pgbench -n -S -T 30
...output elided...
$ perf report -d postgres | grep -v '^#' | head
  8.88%  postgres  base_yyparse
  7.05%  swapper   0x807c
  4.67%  postgres  SearchCatCache
  3.77%  pgbench   0x172dd58
  3.47%  postgres  hash_search_with_hash_value
  3.23%  postgres  AllocSetAlloc
  2.58%  postgres  core_yylex
  1.87%  postgres  LWLockAcquire
  1.83%  postgres  fmgr_info_cxt_security
  1.75%  postgres  0x4d1054

For comparison:

$ perf record -a -e cycles -d postgres pgbench -n -S -T 30
$ perf report -d postgres | grep -v '^#' | head
  6.54%  postgres  AllocSetAlloc
  4.08%  swapper   0x4ce754
  3.60%  postgres  hash_search_with_hash_value
  2.74%  postgres  base_yyparse
  2.71%  postgres  MemoryContextAllocZeroAligned
  2.57%  postgres  MemoryContextAlloc
  2.36%  postgres  SearchCatCache
  2.10%  postgres  _bt_compare
  1.70%  postgres  LWLockAcquire
  1.54%  postgres  FunctionCall2Coll

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12/18/12 5:10 PM, Robert Haas wrote: > I can't help but suspect that the way we handle keywords today is > monumentally inefficient. The unreserved_keyword productions, et al, > just seem somehow badly wrong-headed. We take the trouble to > distinguish all of those cases so that we can turn around and not > distinguish them. I feel like there ought to be some way to use lexer > states to handle this - if we're in a context where an unreserved > keyword will be treated as an IDENT, then have the lexer return IDENT > when it sees an unreserved keyword. I might be wrong, but it seems > like that would eliminate a whole lot of parser state transitions. > However, even if I'm right, I have no idea how to implement it. It > just seems very wasteful that we have so many parser states that have > no purpose other than (effectively) to convert an unreserved_keyword > into an IDENT when the lexer could do the same thing much more cheaply > given a bit more context. Another way of attack along these lines might be to use the %glr-parser and then try to cut back on all those redundant rules that were put in to avoid conflicts. The number of keyword categories and such could perhaps also be reduced that way. Of course, this is mostly speculation.
On 18 December 2012 22:10, Robert Haas <robertmhaas@gmail.com> wrote: > Well that would be nice, but the problem is that I see no way to > implement it. If, with a unified parser, the parser is 14% of our > source code, then splitting it in two will probably crank that number > up well over 20%, because there will be duplication between the two. > That seems double-plus un-good. I don't think the size of the parser binary is that relevant. What is relevant is how much of that is regularly accessed. Increasing parser cache misses for DDL and increasing size of binary overall are acceptable costs if we are able to swap out the unneeded areas and significantly reduce the cache misses on the well travelled portions of the parser. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Dec 20, 2012 at 8:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 18 December 2012 22:10, Robert Haas <robertmhaas@gmail.com> wrote: >> Well that would be nice, but the problem is that I see no way to >> implement it. If, with a unified parser, the parser is 14% of our >> source code, then splitting it in two will probably crank that number >> up well over 20%, because there will be duplication between the two. >> That seems double-plus un-good. > > I don't think the size of the parser binary is that relevant. What is > relevant is how much of that is regularly accessed. > > Increasing parser cache misses for DDL and increasing size of binary > overall are acceptable costs if we are able to swap out the unneeded > areas and significantly reduce the cache misses on the well travelled > portions of the parser. I generally agree. We don't want to bloat the size of the parser with wild abandon, but yeah if we can reduce the cache misses on the well-travelled portions that seems like it ought to help. My previous hacky attempt to quantify the potential benefit in this area was: http://archives.postgresql.org/pgsql-hackers/2011-05/msg01008.php On my machine there seemed to be a small but consistent win; on a very old box Jeff Janes tried, it didn't seem like there was any benefit at all. Somehow, I have a feeling we're missing a trick here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2012-12-20 09:11:46 -0500, Robert Haas wrote: > On Thu, Dec 20, 2012 at 8:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On 18 December 2012 22:10, Robert Haas <robertmhaas@gmail.com> wrote: > >> Well that would be nice, but the problem is that I see no way to > >> implement it. If, with a unified parser, the parser is 14% of our > >> source code, then splitting it in two will probably crank that number > >> up well over 20%, because there will be duplication between the two. > >> That seems double-plus un-good. > > > > I don't think the size of the parser binary is that relevant. What is > > relevant is how much of that is regularly accessed. > > > > Increasing parser cache misses for DDL and increasing size of binary > > overall are acceptable costs if we are able to swap out the unneeded > > areas and significantly reduce the cache misses on the well travelled > > portions of the parser. > > I generally agree. We don't want to bloat the size of the parser with > wild abandon, but yeah if we can reduce the cache misses on the > well-travelled portions that seems like it ought to help. My previous > hacky attempt to quantify the potential benefit in this area was: > > http://archives.postgresql.org/pgsql-hackers/2011-05/msg01008.php > > On my machine there seemed to be a small but consistent win; on a very > old box Jeff Janes tried, it didn't seem like there was any benefit at > all. Somehow, I have a feeling we're missing a trick here. I don't think you will see too many cache misses on such a low number of extremely simple statements, so it's not too surprising not to see that big a difference with that. Are we sure it's really cache misses and not just the actions performed in the grammar that make bison code show up in profiles? I remember the latter being the case... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2012-12-20 15:45:47 +0100, Andres Freund wrote: > On 2012-12-20 09:11:46 -0500, Robert Haas wrote: > > On Thu, Dec 20, 2012 at 8:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > > On 18 December 2012 22:10, Robert Haas <robertmhaas@gmail.com> wrote: > > >> Well that would be nice, but the problem is that I see no way to > > >> implement it. If, with a unified parser, the parser is 14% of our > > >> source code, then splitting it in two will probably crank that number > > >> up well over 20%, because there will be duplication between the two. > > >> That seems double-plus un-good. > > > > > > I don't think the size of the parser binary is that relevant. What is > > > relevant is how much of that is regularly accessed. > > > > > > Increasing parser cache misses for DDL and increasing size of binary > > > overall are acceptable costs if we are able to swap out the unneeded > > > areas and significantly reduce the cache misses on the well travelled > > > portions of the parser. > > > > I generally agree. We don't want to bloat the size of the parser with > > wild abandon, but yeah if we can reduce the cache misses on the > > well-travelled portions that seems like it ought to help. My previous > > hacky attempt to quantify the potential benefit in this area was: > > > > http://archives.postgresql.org/pgsql-hackers/2011-05/msg01008.php > > > > On my machine there seemed to be a small but consistent win; on a very > > old box Jeff Janes tried, it didn't seem like there was any benefit at > > all. Somehow, I have a feeling we're missing a trick here. > > I don't think you will see too many cache misses on such a low number of > extremely simple statements, so it's not too surprising not to see that > big a difference with that. > > Are we sure it's really cache misses and not just the actions performed > in the grammar that make bison code show up in profiles? I remember the > latter being the case... Hm. A very, very quick perf stat -dvvv of pgbench -S -c 20 -j 20 -T 20 later:

     218350.885559 task-clock                #  10.095 CPUs utilized
         1,676,479 context-switches          #   0.008 M/sec
             2,396 cpu-migrations            #   0.011 K/sec
           796,038 page-faults               #   0.004 M/sec
   506,312,525,518 cycles                    #   2.319 GHz                     [20.00%]
   405,944,435,754 stalled-cycles-frontend   #  80.18% frontend cycles idle    [30.32%]
   236,712,872,641 stalled-cycles-backend    #  46.75% backend cycles idle     [40.51%]
   193,459,120,458 instructions              #   0.38  insns per cycle
                                             #   2.10  stalled cycles per insn [50.70%]
    36,433,144,472 branches                  # 166.856 M/sec                   [51.12%]
     3,623,778,087 branch-misses             #   9.95% of all branches         [50.87%]
    50,344,123,581 L1-dcache-loads           # 230.565 M/sec                   [50.33%]
     5,548,192,686 L1-dcache-load-misses     #  11.02% of all L1-dcache hits   [49.69%]
     2,666,461,361 LLC-loads                 #  12.212 M/sec                   [35.63%]
       112,407,198 LLC-load-misses           #   4.22% of all LL-cache hits    [ 9.67%]

      21.629396701 seconds time elapsed

So there is definitely a noticeable rate of cache misses... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2012-12-20 15:51:37 +0100, Andres Freund wrote: > On 2012-12-20 15:45:47 +0100, Andres Freund wrote: > > On 2012-12-20 09:11:46 -0500, Robert Haas wrote: > > > On Thu, Dec 20, 2012 at 8:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > > > On 18 December 2012 22:10, Robert Haas <robertmhaas@gmail.com> wrote: > > > >> Well that would be nice, but the problem is that I see no way to > > > >> implement it. If, with a unified parser, the parser is 14% of our > > > >> source code, then splitting it in two will probably crank that number > > > >> up well over 20%, because there will be duplication between the two. > > > >> That seems double-plus un-good. > > > > > > > > I don't think the size of the parser binary is that relevant. What is > > > > relevant is how much of that is regularly accessed. > > > > > > > > Increasing parser cache misses for DDL and increasing size of binary > > > > overall are acceptable costs if we are able to swap out the unneeded > > > > areas and significantly reduce the cache misses on the well travelled > > > > portions of the parser. > > > > > > I generally agree. We don't want to bloat the size of the parser with > > > wild abandon, but yeah if we can reduce the cache misses on the > > > well-travelled portions that seems like it ought to help. My previous > > > hacky attempt to quantify the potential benefit in this area was: > > > > > > http://archives.postgresql.org/pgsql-hackers/2011-05/msg01008.php > > > > > > On my machine there seemed to be a small but consistent win; on a very > > > old box Jeff Janes tried, it didn't seem like there was any benefit at > > > all. Somehow, I have a feeling we're missing a trick here. > > > > I don't think you will see too many cache misses on such a low number of > > extremely simple statements, so it's not too surprising not to see that > > big a difference with that. > > > > Are we sure it's really cache misses and not just the actions performed > > in the grammar that make bison code show up in profiles? I remember the > > latter being the case... > > Hm. A very, very quick perf stat -dvvv of pgbench -S -c 20 -j 20 -T 20 later: > > 218350.885559 task-clock # 10.095 CPUs utilized > 1,676,479 context-switches # 0.008 M/sec > 2,396 cpu-migrations # 0.011 K/sec > 796,038 page-faults # 0.004 M/sec > 506,312,525,518 cycles # 2.319 GHz [20.00%] > 405,944,435,754 stalled-cycles-frontend # 80.18% frontend cycles idle [30.32%] > 236,712,872,641 stalled-cycles-backend # 46.75% backend cycles idle [40.51%] > 193,459,120,458 instructions # 0.38 insns per cycle > # 2.10 stalled cycles per insn [50.70%] > 36,433,144,472 branches # 166.856 M/sec [51.12%] > 3,623,778,087 branch-misses # 9.95% of all branches [50.87%] > 50,344,123,581 L1-dcache-loads # 230.565 M/sec [50.33%] > 5,548,192,686 L1-dcache-load-misses # 11.02% of all L1-dcache hits [49.69%] > 2,666,461,361 LLC-loads # 12.212 M/sec [35.63%] > 112,407,198 LLC-load-misses # 4.22% of all LL-cache hits [ 9.67%] > > 21.629396701 seconds time elapsed > > So there is definitely a noticeable rate of cache misses...

L1 misses:

# Samples: 997K of event 'L1-dcache-load-misses'
# Overhead  Command   Shared Object       Symbol
# ........  ........  ..................  ..............................
     6.49%  postgres  postgres            [.] SearchCatCache
     3.65%  postgres  postgres            [.] base_yyparse
     3.48%  postgres  postgres            [.] hash_search_with_hash_value
     3.41%  postgres  postgres            [.] AllocSetAlloc
     1.84%  postgres  postgres            [.] LWLockAcquire
     1.40%  postgres  postgres            [.] fmgr_info_cxt_security
     1.36%  postgres  postgres            [.] nocachegetattr
     1.23%  postgres  libc-2.13.so        [.] _int_malloc
     1.20%  postgres  postgres            [.] core_yylex
     1.15%  postgres  postgres            [.] MemoryContextAllocZeroAligned
     0.94%  postgres  postgres            [.] PostgresMain
     0.94%  postgres  postgres            [.] MemoryContextAlloc
     0.91%  postgres  libc-2.13.so        [.] __memcpy_ssse3_back
     0.89%  postgres  postgres            [.] CatalogCacheComputeHashValue
     0.86%  postgres  postgres            [.] PinBuffer
     0.86%  postgres  [kernel.kallsyms]   [k] __audit_syscall_entry
     0.80%  postgres  libc-2.13.so        [.] __strcmp_sse42
     0.80%  postgres  postgres            [.] _bt_compare
     0.78%  postgres  postgres            [.] FunctionCall2Coll
     0.77%  postgres  libc-2.13.so        [.] malloc
     0.73%  postgres  libc-2.13.so        [.] __memset_sse2
     0.72%  postgres  postgres            [.] GetSnapshotData
     0.69%  postgres  [kernel.kallsyms]   [k] fget_light
     0.69%  postgres  postgres            [.] DirectFunctionCall1Coll
     0.67%  postgres  postgres            [.] hash_search
     0.67%  postgres  libc-2.13.so        [.] 0x000000000011a3a5
     0.66%  postgres  postgres            [.] pgstat_initstats
     0.66%  postgres  postgres            [.] AllocSetFree
     0.65%  postgres  libc-2.13.so        [.] __strlen_sse42
     0.60%  postgres  libc-2.13.so        [.] _int_free
     0.60%  postgres  [kernel.kallsyms]   [k] cpuacct_charge
     0.59%  postgres  postgres            [.] heap_getsysattr
     0.59%  postgres  postgres            [.] MemoryContextAllocZero
     0.58%  postgres  postgres            [.] PopActiveSnapshot
     0.53%  postgres  libc-2.13.so        [.] __memcmp_sse4_1
     0.51%  postgres  postgres            [.] ReadBuffer_common
     0.49%  postgres  postgres            [.] ScanKeywordLookup
     0.49%  postgres  postgres            [.] LockAcquireExtended
     0.47%  postgres  [kernel.kallsyms]   [k] update_cfs_shares
     0.45%  postgres  postgres            [.] SearchCatCacheList
     0.45%  postgres  postgres            [.] new_list
     0.44%  postgres  postgres            [.] get_relation_info

LLC misses:

# Samples: 1M of event 'LLC-load-misses'
# Event count (approx.): 143379713
# Overhead  Command   Shared Object       Symbol
# ........  ........  ..................  ..............................
    25.08%  postgres  postgres            [.] _bt_compare
    12.84%  postgres  postgres            [.] PinBuffer
     9.18%  postgres  postgres            [.] LWLockAcquire
     6.31%  postgres  postgres            [.] GetSnapshotData
     6.08%  postgres  postgres            [.] heap_hot_search_buffer
     5.13%  postgres  postgres            [.] hash_search_with_hash_value
     4.85%  postgres  postgres            [.] _bt_checkpage
     3.95%  postgres  postgres            [.] _bt_moveright
     3.09%  postgres  postgres            [.] heap_page_prune_opt
     2.12%  postgres  postgres            [.] slot_deform_tuple
     1.98%  postgres  postgres            [.] LWLockRelease
     1.82%  postgres  libc-2.13.so        [.] __memcmp_sse4_1
     1.16%  postgres  postgres            [.] ExecProject
     1.16%  postgres  postgres            [.] FunctionCall2Coll
     0.94%  postgres  [kernel.kallsyms]   [k] copy_user_generic_string
     0.94%  postgres  [kernel.kallsyms]   [k] tg_load_down
     0.78%  postgres  [kernel.kallsyms]   [k] find_get_page
     0.73%  postgres  postgres            [.] ProcArrayEndTransaction
     0.73%  postgres  postgres            [.] pfree
     0.71%  postgres  postgres            [.] pgstat_report_xact_timestamp
     0.69%  postgres  postgres            [.] index_fetch_heap
     0.66%  postgres  postgres            [.] LockAcquireExtended
     0.56%  postgres  postgres            [.] LockBuffer
     0.45%  postgres  postgres            [.] slot_getsomeattrs
     0.40%  postgres  postgres            [.] _bt_readpage

So it seems L1 misses are the interesting thing wrt parsing. When doing a source/assembly annotation in SearchCatCache about half of the misses are attributed to the memcpy() directly at the beginning. In base_yyparse the three biggest offenders (15%, 10.5%, 5.58%) really seem to be various kinds of table lookups. So it seems to confirm the various suspicions that the table size might be rather relevant.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2012-12-20 16:04:49 +0100, Andres Freund wrote: > On 2012-12-20 15:51:37 +0100, Andres Freund wrote: > When doing a source/assembly annotation in SearchCatCache about half of > the misses are attributed to the memcpy() directly at the beginning. Using a separate array for storing the arguments instead of copying & modifying cache->cc_skey yields a 2% speedup in pgbench -S for me... The attached patch is clearly not ready and I don't really have time & energy to pursue it right now, but it seems interesting enough to post. The approach seems solid and sensible although the implementation is not (too much c&p, no comments). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
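For readers skimming the thread, here is a toy, self-contained sketch of the kind of change being described -- the struct and field names are invented, and this is neither the actual catcache code nor the posted patch; it only illustrates why skipping a per-lookup memcpy() of a constant key template can matter:

#include <stdio.h>
#include <string.h>

/* Invented stand-ins; not PostgreSQL's actual catcache structures. */
typedef unsigned long Datum;

typedef struct KeySketch
{
    int   attno;                /* constant per cache */
    int   strategy;             /* constant per cache */
    Datum argument;             /* varies per lookup */
    char  pad[48];              /* makes the per-call copy visibly expensive */
} KeySketch;

typedef struct CacheSketch
{
    int       nkeys;
    KeySketch template_keys[4]; /* set up once, when the cache is created */
} CacheSketch;

/* Old style: copy the whole template into a local array, then overwrite
 * the argument fields -- one memcpy per lookup. */
static int lookup_with_copy(CacheSketch *cache, Datum a0, Datum a1)
{
    KeySketch keys[4];

    memcpy(keys, cache->template_keys, sizeof(keys));
    keys[0].argument = a0;
    keys[1].argument = a1;
    return keys[0].attno == 1 && keys[0].argument == 42;   /* pretend to scan */
}

/* New style: leave the template alone and hand the variable arguments
 * to the lookup in a small separate array. */
static int lookup_with_args(const CacheSketch *cache, const Datum args[4])
{
    return cache->template_keys[0].attno == 1 && args[0] == 42;  /* pretend to scan */
}

int main(void)
{
    CacheSketch cache = { .nkeys = 2,
                          .template_keys = { { .attno = 1 }, { .attno = 2 } } };
    Datum args[4] = { 42, 7, 0, 0 };

    printf("copying lookup: %d\n", lookup_with_copy(&cache, 42, 7));
    printf("separate args:  %d\n", lookup_with_args(&cache, args));
    return 0;
}

The constant key metadata then stays in one shared, cache-friendly place, and only the handful of per-call argument values move on each lookup.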
On Thu, Dec 20, 2012 at 3:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Greg Stark <stark@mit.edu> writes: >> But I'm not entirely convinced any of this is actually useful. Just >> because the transition table is large doesn't mean it's inefficient. > > That's a fair point. However, I've often noticed base_yyparse() showing > up rather high on profiles --- higher than seemed plausible at the time, > given that its state-machine implementation is pretty tight. Now I'm > wondering whether that isn't coming from cache stalls from trying to > touch all the requisite parts of the transition table. For what it's worth, the bloat isn't in the parser transition table at all:

516280  yy_transition
147208  yytable
147208  yycheck
146975  base_yyparse
 17468  yypact
  9571  core_yylex
  8734  yystos
  8734  yydefact

Unless I'm confused, yy_transition is in fact the *lexer* transition table. I'm not sure how to reconcile that with the profiling results showing the cache misses in base_yyparse() though. -- greg
On 2012-12-20 15:58:12 +0000, Greg Stark wrote: > On Thu, Dec 20, 2012 at 3:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Greg Stark <stark@mit.edu> writes: > >> But I'm not entirely convinced any of this is actually useful. Just > >> because the transition table is large doesn't mean it's inefficient. > > > > That's a fair point. However, I've often noticed base_yyparse() showing > > up rather high on profiles --- higher than seemed plausible at the time, > > given that its state-machine implementation is pretty tight. Now I'm > > wondering whether that isn't coming from cache stalls from trying to > > touch all the requisite parts of the transition table. > > For what it's worth, the bloat isn't in the parser transition table at all: > 516280 yy_transition > 147208 yytable > 147208 yycheck > 146975 base_yyparse > 17468 yypact > 9571 core_yylex > 8734 yystos > 8734 yydefact > > Unless I'm confused, yy_transition is in fact the *lexer* transition > table. I'm not sure how to reconcile that with the profiling results > showing the cache misses in base_yyparse() though. The scanner is compiled together with the parser, so it's not unlikely that the compiler bunches them together, making only base_yyparse visible in the profile. I quickly checked though, and the top three offenders for L1 caches are in bison, not in the lexer... It's not implausible that the state transitions in the lexer are better cached and/or predicted... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
> > Another way of attack along these lines might be to use the %glr-parser and then try to cut back on all those redundant rules that were put in to avoid conflicts. The number of keyword categories and such could perhaps also be reduced that way. > > Of course, this is mostly speculation. > > The GLR output from Bison is licensed under the GPL (unlike the LALR output). So using Bison's GLR mode isn't an option.
On 2012-12-20 23:12:55 +0000, McDevitt, Charles wrote: > > > > Another way of attack along these lines might be to use the %glr-parser > > and then try to cut back on all those redundant rules that were put in > > to avoid conflicts. The number of keyword categories and such could > > perhaps also be reduced that way. > > > > Of course, this is mostly speculation. > > > > > > The GLR output from Bison is licensed under the GPL (unlike the LALR output). > So using Bison's GLR mode isn't an option. That's not the case anymore: http://www.gnu.org/software/bison/manual/html_node/Conditions.html Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
> > > > The GLR output from Bison is licensed under the GPL (unlike the LALR output). > > So using Bison's GLR mode isn't an option. > > That's not the case anymore: > http://www.gnu.org/software/bison/manual/html_node/Conditions.html Sorry! My mistake... I didn't realize they changed the rules. I'll be more careful to check these things in the future.
Andres Freund <andres@2ndquadrant.com> writes: > On 2012-12-20 23:12:55 +0000, McDevitt, Charles wrote: >>> Another way of attack along these lines might be to use the %glr-parser >>> and then try to cut back on all those redundant rules that were put in >>> to avoid conflicts. The number of keyword categories and such could >>> perhaps also be reduced that way. >> The GLR output from Bison is licensed under the GPL (unlike the LALR output). >> So using Bison's GLR mode isn't an option. > That's not the case anymore: > http://www.gnu.org/software/bison/manual/html_node/Conditions.html This does mean that we'd have to specify a minimum bison version of 2.2 in order to be squeaky-clean license-wise, if we went over to using the GLR mode. However, that would likely be a good idea anyway from a technical standpoint --- the GLR mode may exist in ancient bison versions, but who knows how bug-free it is. Anyway, this is all merest speculation until somebody actually tries it and sees if a performance gain is possible. Having just re-read the description of GLR mode, I wouldn't be surprised if any savings in table size would be squandered by its handling of ambiguous cases (ie, the need to track and merge multiple parser states). regards, tom lane