Thread: eviscerating the parser

eviscerating the parser

From
Robert Haas
Date:
Just to see how much difference it would make, I tried ripping
everything out of the parser except for support for DML queries.  In
addition to removing the actual rules, I also yanked out most of the
unreserved keywords that are needed only by the removed rules.  Results
(pgbench -n -S -T 300):

With patch:
tps = 10212.893052 (including connections establishing)
tps = 10213.012916 (excluding connections establishing)
tps = 10216.606402 (including connections establishing)
tps = 10216.722802 (excluding connections establishing)

Without patch:
tps = 10119.528987 (including connections establishing)
tps = 10119.642155 (excluding connections establishing)
tps = 10167.798764 (including connections establishing)
tps = 10167.900407 (excluding connections establishing)

This means that, in a situation where we are using only DML, and are
running very simple queries without prepared statements, the parser
bloat resulting from supporting all the other kinds of queries, which
aren't being exercised by the tests, results in a slowdown of
approximately 0.7%.
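The 0.7% figure can be sanity-checked directly from the tps numbers above; a quick back-of-the-envelope sketch (averaging the two "including connections establishing" runs per build, not part of the benchmark itself):

```python
# Average the two "including connections establishing" runs for each build,
# then compute the relative difference attributable to the extra parser rules.
patched = (10212.893052 + 10216.606402) / 2
unpatched = (10119.528987 + 10167.798764) / 2

slowdown = (patched - unpatched) / unpatched
print(f"{slowdown:.2%}")  # ~0.70%
```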

Patch is attached, in case anyone wants to play with it.  The size of
the generated parser is reduced by about two-thirds with this applied.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: eviscerating the parser

From
Alvaro Herrera
Date:
Excerpts from Robert Haas's message of Fri May 20 18:41:37 -0400 2011:

> This means that, in a situation where we are using only DML, and are
> running very simple queries without prepared statements, the parser
> bloat resulting from supporting all the other kinds of queries, which
> aren't being exercised by the tests, results in a slowdown of
> approximately 0.7%.

So the point here is, we do not need to worry about adding new keywords,
because the performance impact is really minimal.  Right?

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: eviscerating the parser

From
Robert Haas
Date:
On Sat, May 21, 2011 at 12:13 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Excerpts from Robert Haas's message of Fri May 20 18:41:37 -0400 2011:
>> This means that, in a situation where we are using only DML, and are
>> running very simple queries without prepared statements, the parser
>> bloat resulting from supporting all the other kinds of queries, which
>> aren't being exercised by the tests, results in a slowdown of
>> approximately 0.7%.
>
> So the point here is, we do not need to worry about adding new keywords,
> because the performance impact is really minimal.  Right?

I think there are several possible points to be made here.  I agree
that it's somewhat reassuring, in that it means the likely impact of
any single keyword is minimal.  On the other hand, I wouldn't go so
far as to say that we can add infinite numbers of keywords with wild
abandon: that's certainly not true, and spending two or three minutes
trying to reuse the existing ones rather than adding new ones is
probably time well spent.  But on the flip side, there seems to be no
reason for alarm about adding ~10 keywords per release, which I think
is approximately what we've been doing.

Another point is that parsing overhead is quite obviously not the
reason for the massive performance gap between one core running simple
selects on PostgreSQL and one core running simple selects on MySQL.
Even if I had (further) eviscerated the parser to cover only the
syntax those queries actually use, it wasn't going to buy more than a
couple points.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: eviscerating the parser

From
Robert Haas
Date:
On Sat, May 21, 2011 at 7:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Another point is that parsing overhead is quite obviously not the
> reason for the massive performance gap between one core running simple
> selects on PostgreSQL and one core running simple selects on MySQL.
> Even if I had (further) eviscerated the parser to cover only the
> syntax those queries actually use, it wasn't going to buy more than a
> couple points.

Incidentally, prepared transactions help a lot.  On unpatched master,
with pgbench -T 300 -S -n:

tps = 10106.900801 (including connections establishing)
tps = 10107.015951 (excluding connections establishing)

vs.

tps = 18212.053457 (including connections establishing)
tps = 18212.246077 (excluding connections establishing)

The reasons for the magnitude of that difference are not entirely
apparent to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: eviscerating the parser

From
"Kevin Grittner"
Date:
Robert Haas  wrote:
> Incidentally, prepared transactions help a lot.
Prepared transactions or prepared statements?
-Kevin



Re: eviscerating the parser

From
Robert Haas
Date:
On Sat, May 21, 2011 at 8:36 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas  wrote:
>> Incidentally, prepared transactions help a lot.
>
> Prepared transactions or prepared statements?

Uh, statements.  -M prepared.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: eviscerating the parser

From
Jeff Janes
Date:
On Sat, May 21, 2011 at 5:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, May 21, 2011 at 7:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Another point is that parsing overhead is quite obviously not the
>> reason for the massive performance gap between one core running simple
>> selects on PostgreSQL and one core running simple selects on MySQL.
>> Even if I had (further) eviscerated the parser to cover only the
>> syntax those queries actually use, it wasn't going to buy more than a
>> couple points.
>
> Incidentally, prepared transactions help a lot.  On unpatched master,
> with pgbench -T 300 -S -n:
>
> tps = 10106.900801 (including connections establishing)
> tps = 10107.015951 (excluding connections establishing)

Are you sure that you actually ran that with -M prepared?  The numbers
look suspiciously similar to the ones reported in your original email.

For what it is worth, on my ancient hardware, the patched code is
slower than the unpatched just as often as it is faster, using -n -S
-T 300 and alternating between the two servers.

Cheers,

Jeff


Re: eviscerating the parser

From
Robert Haas
Date:
On Sat, May 21, 2011 at 8:41 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Sat, May 21, 2011 at 5:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sat, May 21, 2011 at 7:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Another point is that parsing overhead is quite obviously not the
>>> reason for the massive performance gap between one core running simple
>>> selects on PostgreSQL and one core running simple selects on MySQL.
>>> Even if I had (further) eviscerated the parser to cover only the
>>> syntax those queries actually use, it wasn't going to buy more than a
>>> couple points.
>>
>> Incidentally, prepared transactions help a lot.  On unpatched master,
>> with pgbench -T 300 -S -n:
>>
>> tps = 10106.900801 (including connections establishing)
>> tps = 10107.015951 (excluding connections establishing)
>
> Are you sure that you actually ran that with -M prepared?  The numbers
> look suspiciously similar to the ones reported in your original email.

That's without -M prepared; the subsequent number (~18K) is the one
with -M prepared.  So prepared statements increased throughput by
about 80% in this test.

> For what it is worth, on my ancient hardware, the patched code is
> slower than the unpatched just as often as it is faster, using -n -S
> -T 300 on alternations between servers.

Well, that's pretty interesting.  The effect *appeared* to be small
but consistent in my testing, but it could be I just got lucky; or the
choice of architecture and/or OS might matter.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: eviscerating the parser

From
Joshua Berkus
Date:
Robert,

> Another point is that parsing overhead is quite obviously not the
> reason for the massive performance gap between one core running simple
> selects on PostgreSQL and one core running simple selects on MySQL.
> Even if I had (further) eviscerated the parser to cover only the
> syntax those queries actually use, it wasn't going to buy more than a
> couple points.

I don't know if you saw Jignesh's presentation, but there seems to be
a lot of reason to believe that we are lock-bound on large numbers of
concurrent read-only queries.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco


Re: eviscerating the parser

From
Robert Haas
Date:
On Sun, May 22, 2011 at 1:38 PM, Joshua Berkus <josh@agliodbs.com> wrote:
>> Another point is that parsing overhead is quite obviously not the
>> reason for the massive performance gap between one core running simple
>> selects on PostgreSQL and one core running simple selects on MySQL.
>> Even if I had (further) eviscerated the parser to cover only the
>> syntax those queries actually use, it wasn't going to buy more than a
>> couple points.
>
> I don't know if you saw Jignesh's presentation, but there seems to be
> a lot of reason to believe that we are lock-bound on large numbers of
> concurrent read-only queries.

I didn't see Jignesh's presentation, but I'd come to the same
conclusion (with some help from Jeff Janes and others):

http://archives.postgresql.org/pgsql-hackers/2010-11/msg01643.php
http://archives.postgresql.org/pgsql-hackers/2010-11/msg01665.php

We did also recently discuss how we might improve the behavior in this case:

http://archives.postgresql.org/pgsql-hackers/2011-05/msg00787.php

...and ensuing discussion.

However, in this case, there was only one client, so that's not the
problem.  I don't really see how to get a big win here.  If we want to
be 4x faster, we'd need to cut time per query by 75%.  That might
require 75 different optimizations averaging 1% apiece, most likely
none of them trivial.  I do confess I'm a bit confused as to why
prepared statements help so much.  That is increasing the throughput
by 80%, which is equivalent to decreasing time per query by 45%.  That
is a surprisingly big number, and I'd like to better understand where
all that time is going.
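The arithmetic behind those equivalences is just algebra, since per-query time and throughput are reciprocals; a quick sketch:

```python
def time_reduction(throughput_gain):
    """Fractional cut in per-query time implied by a fractional throughput
    gain: time-per-query = 1 / throughput, so a gain g cuts time by
    1 - 1/(1 + g)."""
    return 1 - 1 / (1 + throughput_gain)

print(f"{time_reduction(0.80):.0%}")  # 80% more throughput -> ~44% less time
print(f"{time_reduction(3.00):.0%}")  # 4x throughput -> 75% less time
```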

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: eviscerating the parser

From
Bruce Momjian
Date:
Robert Haas wrote:
> On Sun, May 22, 2011 at 1:38 PM, Joshua Berkus <josh@agliodbs.com> wrote:
> >> Another point is that parsing overhead is quite obviously not the
> >> reason for the massive performance gap between one core running simple
> >> selects on PostgreSQL and one core running simple selects on MySQL.
> >> Even if I had (further) eviscerated the parser to cover only the
> >> syntax those queries actually use, it wasn't going to buy more than a
> >> couple points.
> >
> > I don't know if you saw Jignesh's presentation, but there seems to be
> > a lot of reason to believe that we are lock-bound on large numbers of
> > concurrent read-only queries.
> 
> I didn't see Jignesh's presentation, but I'd come to the same
> conclusion (with some help from Jeff Janes and others):
> 
> http://archives.postgresql.org/pgsql-hackers/2010-11/msg01643.php
> http://archives.postgresql.org/pgsql-hackers/2010-11/msg01665.php
> 
> We did also recently discuss how we might improve the behavior in this case:
> 
> http://archives.postgresql.org/pgsql-hackers/2011-05/msg00787.php
> 
> ...and ensuing discussion.
> 
> However, in this case, there was only one client, so that's not the
> problem.  I don't really see how to get a big win here.  If we want to
> be 4x faster, we'd need to cut time per query by 75%.  That might
> require 75 different optimizations averaging 1% apiece, most likely
> none of them trivial.  I do confess I'm a bit confused as to why
> prepared statements help so much.  That is increasing the throughput
> by 80%, which is equivalent to decreasing time per query by 45%.  That
> is a surprisingly big number, and I'd like to better understand where
> all that time is going.

Prepared statements are pre-parsed/rewritten/planned, but I can't see
how decreasing the parser size would affect those other stages, and
certainly not 45%.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: eviscerating the parser

From
Jeff Janes
Date:
On Sun, May 22, 2011 at 3:10 PM, Robert Haas <robertmhaas@gmail.com> wrote:
...
>
> However, in this case, there was only one client, so that's not the
> problem.  I don't really see how to get a big win here.  If we want to
> be 4x faster, we'd need to cut time per query by 75%.  That might
> require 75 different optimizations averaging 1% a piece, most likely
> none of them trivial.  I do confess I'm a bit confused as to why
> prepared statements help so much.  That is increasing the throughput
> by 80%, which is equivalent to decreasing time per query by 45%.  That
> is a surprisingly big number, and I'd like to better understand where
> all that time is going.

On my old 32-bit Linux box, the difference is even bigger: a 150%
increase in throughput (4000 vs 9836 tps) when using prepared
statements.
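As a quick check, the quoted tps figures do work out to roughly that percentage:

```python
# 4000 tps with the simple protocol vs 9836 tps with -M prepared.
gain = (9836 - 4000) / 4000
print(f"{gain:.0%}")  # ~146%, i.e. roughly the 150% quoted above
```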

By gprof, over half of that extra time is going to planning,
specifically standard_planner and its children.  Unfortunately once
you dig down beyond that level, the time is spread all over the place,
so there is no one hot spot to focus on.

I don't entirely trust gprof, so I've also poked at tcop/postgres.c a
bit to make it do silly things like parse the statement repeatedly,
throwing away all results but the last one (and similar things with
analysis/rewriting, and with planning), and see how much slower that
makes things.  Here too the planner is the slow part.  But
extrapolating backwards, parsing, analyzing, and planning all together
account for only about 1/3 of the extra time of not using -M prepared.
I don't know where the other 2/3 of the time is lost.  It could be,
for example, that parsing the command twice does not take twice as
long as parsing it once, due to L1 and instruction caching, in which
case extrapolating backwards is not very reliable.
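The repeat-and-extrapolate trick can be sketched in miniature; this is an illustrative toy (the `stage` function is pure busywork standing in for parse/analyze/plan, not PostgreSQL code), but it shows the idea: if running a stage twice per query adds T seconds over running it once, the stage's cost is roughly T, modulo exactly the caching caveat above.

```python
import time

def stage():
    # Stand-in for one pipeline stage (parse / analyze / plan); busywork only.
    sum(i * i for i in range(20000))

def run(reps):
    """Time 200 'queries', each executing the stage `reps` times."""
    t0 = time.perf_counter()
    for _ in range(200):
        for _ in range(reps):
            stage()
    return time.perf_counter() - t0

once, twice = run(1), run(2)
stage_cost = twice - once  # extrapolated cost of one run of the stage
print(f"stage is ~{stage_cost / once:.0%} of the single-run total")
```

In this toy the stage is all there is, so the extrapolated cost comes out near 100% of the single-run time; in the real server the same subtraction attributes only part of the per-query time to each stage.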

But by both methods, the majority of the extra time that can be
accounted for is going to the planner.

Cheers,

Jeff


Re: eviscerating the parser

From
Robert Haas
Date:
On Sat, May 28, 2011 at 5:51 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> But by both methods, the majority of the extra time that can be
> accounted for is going to the planner.

Sounds like an argument for a plan cache.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company