Thread: Useless sort by

Useless sort by

From
Gaetano Mendola
Date:
Hi all,
I have a view v_table defined as following:

select a,b,c,d,e,f
from  t_table
sort by  a,b,c;

the usage pattern of this view is the following:

select distinct(a) from v_table;
select distinct(b) from v_table where a = "XXX";
select distinct(c) from v_table where a = "XXX" and b = "YYYY";

because of that sort in the view definition the first query above
takes not less than 3 seconds. I have solved this performance issue
removing the sort from the view definition and putting it in the
select reducing the time from > 3secons to < 150ms.

Can not the optimizer take rid of that useless sort on those
kind of queries ?


Regards
Gaetano Mendola




Re: Useless sort by

From
Tom Lane
Date:
Gaetano Mendola <mendola@gmail.com> writes:
> because of that sort in the view definition the first query above
> takes not less than 3 seconds. I have solved this performance issue
> removing the sort from the view definition and putting it in the
> select reducing the time from > 3secons to < 150ms.

> Can not the optimizer take rid of that useless sort on those
> kind of queries ?

It is not the optimizer's job to second-guess the user on whether a sort
is really needed there.  If we did make it throw away non-top-level
sorts, we'd have hundreds of users screaming loudly.

            regards, tom lane

Re: Useless sort by

From
Gaetano Mendola
Date:
On 09/13/2010 04:44 PM, Tom Lane wrote:
> Gaetano Mendola <mendola@gmail.com> writes:
>> because of that sort in the view definition the first query above
>> takes not less than 3 seconds. I have solved this performance issue
>> removing the sort from the view definition and putting it in the
>> select reducing the time from > 3secons to < 150ms.
>
>> Can not the optimizer take rid of that useless sort on those
>> kind of queries ?
>
> It is not the optimizer's job to second-guess the user on whether a sort
> is really needed there.  If we did make it throw away non-top-level
> sorts, we'd have hundreds of users screaming loudly.

Of course I'm not suggesting to take away the "sort by" and give the user
an unsorted result, I'm asking why the the optimizer in cases like:

   select unique(a) from v_table_with_order_by;

doesn't takes away the "order by" inside the view and puts it back "rewriting the
query like this:

   select unique(a) from v_table_without_order_by
   order by a;

then the user will not know about it. The result is the same but 30 times
faster (in my case).

Regards
Gaetano Mendola



Re: Useless sort by

From
Tom Lane
Date:
Gaetano Mendola <mendola@gmail.com> writes:
> Of course I'm not suggesting to take away the "sort by" and give the user
> an unsorted result, I'm asking why the the optimizer in cases like:

>    select unique(a) from v_table_with_order_by;

> doesn't takes away the "order by" inside the view and puts it back "rewriting the
> query like this:

>    select unique(a) from v_table_without_order_by
>    order by a;

That changes the order in which the rows are fed to unique(a).  The
principal real-world use for a non-top-level ORDER BY is exactly to
determine the order in which rows are fed to a function, so we will
have a revolt on our hands if we break that.

            regards, tom lane

Re: Useless sort by

From
Gaetano Mendola
Date:
On Mon, Sep 13, 2010 at 6:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Gaetano Mendola <mendola@gmail.com> writes:
>> Of course I'm not suggesting to take away the "sort by" and give the user
>> an unsorted result, I'm asking why the the optimizer in cases like:
>
>>    select unique(a) from v_table_with_order_by;
>
>> doesn't takes away the "order by" inside the view and puts it back "rewriting the
>> query like this:
>
>>    select unique(a) from v_table_without_order_by
>>    order by a;
>
> That changes the order in which the rows are fed to unique(a).  The
> principal real-world use for a non-top-level ORDER BY is exactly to
> determine the order in which rows are fed to a function, so we will
> have a revolt on our hands if we break that.

I see your point, but some functions like:  unique, count are not affected
by the order of values fed, and I don't think either that unique has to
give out the unique values in the same fed order.


Regards
Gaetano Mendola

--
cpp-today.blogspot.com

Re: Useless sort by

From
Maciek Sakrejda
Date:
> I see your point, but some functions like:  unique, count are not affected
> by the order of values fed, and I don't think either that unique has to
> give out the unique values in the same fed order.

Sure. You'd need additional metadata about which aggregates care about
sort order and which don't. Our system is more sensitive to this sort
of thing and so we've actually implemented this, but in the absence of
this "order-sensitive" flag, you have to assume sorts matter (or
you're leaving a *lot* of room for shooting yourself in the foot).

Even with this, it seems a little dodgy to mess up sort order in a
top-level query. Relational databases are ostensibly relational, but I
imagine in practice, it may be a toss-up in the trade-off between the
performance benefits of what you are suggesting and the breaking of
implicit non-relational behaviors that users have been taking for
granted.

---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

Re: Useless sort by

From
Scott Marlowe
Date:
On Mon, Sep 13, 2010 at 11:09 AM, Gaetano Mendola <mendola@gmail.com> wrote:
> On Mon, Sep 13, 2010 at 6:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Gaetano Mendola <mendola@gmail.com> writes:
>>> Of course I'm not suggesting to take away the "sort by" and give the user
>>> an unsorted result, I'm asking why the the optimizer in cases like:
>>
>>>    select unique(a) from v_table_with_order_by;
>>
>>> doesn't takes away the "order by" inside the view and puts it back "rewriting the
>>> query like this:
>>
>>>    select unique(a) from v_table_without_order_by
>>>    order by a;
>>
>> That changes the order in which the rows are fed to unique(a).  The
>> principal real-world use for a non-top-level ORDER BY is exactly to
>> determine the order in which rows are fed to a function, so we will
>> have a revolt on our hands if we break that.
>
> I see your point, but some functions like:  unique, count are not affected
> by the order of values fed, and I don't think either that unique has to
> give out the unique values in the same fed order.

First off, having a top level order by in a view is considered poor
practice.  It adds an overhead you may or may not need each time the
view is accessed, and there's no simple way to avoid it once it's in
there.

On top of that you'd be adding complexity to the planner that would
make it slower and more likely to make mistakes, all to fix a problem
that I and most others don't have.

--
To understand recursion, one must first understand recursion.

Re: Useless sort by

From
Heikki Linnakangas
Date:
On 13/09/10 19:48, Tom Lane wrote:
> Gaetano Mendola<mendola@gmail.com>  writes:
>> Of course I'm not suggesting to take away the "sort by" and give the user
>> an unsorted result, I'm asking why the the optimizer in cases like:
>
>>     select unique(a) from v_table_with_order_by;
>
>> doesn't takes away the "order by" inside the view and puts it back "rewriting the
>> query like this:
>
>>     select unique(a) from v_table_without_order_by
>>     order by a;
>
> That changes the order in which the rows are fed to unique(a).  The
> principal real-world use for a non-top-level ORDER BY is exactly to
> determine the order in which rows are fed to a function, so we will
> have a revolt on our hands if we break that.

You could check for volatile functions. I think this could be done
safely. However, it doesn't seem worthwhile, it would be a fair amount
of code, and it's not usually a good idea to put an ORDER BY in a view
or subquery anyway unless you also have volatile functions in there, or
you want to coerce the optimizer to choose a certain plan.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Useless sort by

From
Maciek Sakrejda
Date:
> You could check for volatile functions. I think this could be done safely.

I don't think that's enough. A UDA like last() could have an immutable
sfunc, but still be sensitive to the sort order. I think you'd need
something like a special order-sensitive aggregate definition flag.

---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

Re: Useless sort by

From
Dave Crooke
Date:
I presume there is more usage of this view than just those 3 queries (otherwise, for a start there would be no need for d, e, f in the view definition)

Why not just rewrite these 3 queries to go directly off the main table? Or, create a different view without the sort_by in its definition?

Or, if these are used very frequently and performance is critical, consider (i) caching these results in the application layer, with logic to understand when they need to be updated, or (b) maintaining extra tables that just contain (a) (a,b) and (a,b,c)

Objectively, it's always better to optimize the SQL and application level for the specific needs of the situation before concluding that the underlying database engine should do these optimizations automatically, and it seems like there are a number of options you could explore here.

Cheers
Dave

On Mon, Sep 13, 2010 at 4:47 AM, Gaetano Mendola <mendola@gmail.com> wrote:
Hi all,
I have a view v_table defined as following:

select a,b,c,d,e,f
from  t_table
sort by  a,b,c;

the usage pattern of this view is the following:

select distinct(a) from v_table;
select distinct(b) from v_table where a = "XXX";
select distinct(c) from v_table where a = "XXX" and b = "YYYY";

because of that sort in the view definition the first query above
takes not less than 3 seconds. I have solved this performance issue
removing the sort from the view definition and putting it in the
select reducing the time from > 3secons to < 150ms.

Can not the optimizer take rid of that useless sort on those
kind of queries ?


Regards
Gaetano Mendola




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Useless sort by

From
Gaetano Mendola
Date:
On Tue, Sep 14, 2010 at 6:15 PM, Dave Crooke <dcrooke@gmail.com> wrote:
> I presume there is more usage of this view than just those 3 queries
> (otherwise, for a start there would be no need for d, e, f in the view
> definition)
>
> Why not just rewrite these 3 queries to go directly off the main table? Or,
> create a different view without the sort_by in its definition?
>
> Or, if these are used very frequently and performance is critical, consider
> (i) caching these results in the application layer, with logic to understand
> when they need to be updated, or (b) maintaining extra tables that just
> contain (a) (a,b) and (a,b,c)
>
> Objectively, it's always better to optimize the SQL and application level
> for the specific needs of the situation before concluding that the
> underlying database engine should do these optimizations automatically, and
> it seems like there are a number of options you could explore here.

Question here is not how to do it right, but how to make the optimizer smarter
than it is now, taking rid of work not needed.

Regards
Gaetano Mendola

--
cpp-today.blogspot.com

Re: Useless sort by

From
Robert Haas
Date:
On Mon, Sep 13, 2010 at 1:09 PM, Gaetano Mendola <mendola@gmail.com> wrote:
> I see your point, but some functions like:  unique, count are not affected
> by the order of values fed, and I don't think either that unique has to
> give out the unique values in the same fed order.

Gee, I'd sure expect it to.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: Useless sort by

From
Date:

---- Original message ----
>Date: Wed, 22 Sep 2010 20:54:22 -0400
>From: pgsql-performance-owner@postgresql.org (on behalf of Robert Haas <robertmhaas@gmail.com>)
>Subject: Re: [PERFORM] Useless sort by
>To: Gaetano Mendola <mendola@gmail.com>
>Cc: Tom Lane <tgl@sss.pgh.pa.us>,pgsql-performance@postgresql.org
>
>On Mon, Sep 13, 2010 at 1:09 PM, Gaetano Mendola <mendola@gmail.com> wrote:
>> I see your point, but some functions like:  unique, count are not affected
>> by the order of values fed, and I don't think either that unique has to
>> give out the unique values in the same fed order.
>
>Gee, I'd sure expect it to.

Spoken like a dyed in the wool COBOL coder.  The RM has no need for order; it's set based.  I've dabbled in PG for some
time,and my sense is increasingly that PG developers are truly code oriented, not database (set) oriented.   

robert
>
>--
>Robert Haas
>EnterpriseDB: http://www.enterprisedb.com
>The Enterprise Postgres Company
>
>--
>Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-performance

Re: Useless sort by

From
Greg Smith
Date:
gnuoytr@rcn.com wrote:
> Spoken like a dyed in the wool COBOL coder.  The RM has no need for order; it's set based.  I've dabbled in PG for
sometime, and my sense is increasingly that PG developers are truly code oriented, not database (set) oriented.   
>

I can't tell if you meant for this to be insulting or my reading it that
way is wrong, but it certainly wasn't put in a helpful tone.  Let me
summarize for you.  You've been told that putting ORDER BY into a view
is a generally poor idea anyway, that it's better to find ways avoid
this class of concern altogether.  There are significant non-obvious
technical challenges behind actually implementing the behavior you'd
like to see; the concerns raised by Tom and Maciek make your idea
impractical even if it were desired.  And for every person like yourself
who'd see the benefit you're looking for, there are far more that would
find a change in this area a major problem.  The concerns around
breakage due to assumed but not required aspects of the relational model
are the ones the users of the software will be confused by, not the
developers of it.  You have the classification wrong; the feedback
you've gotten here is from the developers being user oriented, not
theory oriented or code oriented.

--
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance"    Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


Re: Useless sort by

From
Scott Marlowe
Date:
On Wed, Sep 22, 2010 at 10:01 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> gnuoytr@rcn.com wrote:
>>
>> Spoken like a dyed in the wool COBOL coder.  The RM has no need for order;
>> it's set based.  I've dabbled in PG for some time, and my sense is
>> increasingly that PG developers are truly code oriented, not database (set)
>> oriented.
>
> I can't tell if you meant for this to be insulting or my reading it that way
> is wrong, but it certainly wasn't put in a helpful tone.  Let me summarize
> for you.  You've been told that putting ORDER BY into a view is a generally
> poor idea anyway, that it's better to find ways avoid this class of concern
> altogether.

It's been a few years since I've read the SQL spec, but doesn't it
actually forbid order by in views but pgsql allows it anyway?

Like you said, order by in a view is a bad practice to get into, and
it's definitely not what a "set oriented" person would do.  it's what
a code oriented person would do.

Re: Useless sort by

From
Robert Haas
Date:
On Wed, Sep 22, 2010 at 11:05 PM,  <gnuoytr@rcn.com> wrote:
> Spoken like a dyed in the wool COBOL coder.  The RM has no need for order; it's set based.  I've dabbled in PG for
sometime, and my sense is increasingly that PG developers are truly code oriented, not database (set) oriented. 

I'm struggling to think of an adequate response to this.  I think I'm
going to go with: huh?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: Useless sort by

From
Date:
I can't tell if you meant for this to be insulting or my reading it that way is wrong, but it certainly wasn't put in a
helpfultone.  Let me summarize for you.  You've been told that putting ORDER BY into a view is a generally poor idea
anyway,that it's better to find ways avoid this class of concern altogether.  There are significant non-obvious
technicalchallenges behind actually implementing the behavior you'd like to see; the concerns raised by Tom and Maciek
makeyour idea impractical even if it were desired.  And for every person like yourself who'd see the benefit you're
lookingfor, there are far more that would find a change in this area a major problem.  The concerns around breakage due
toassumed but not required aspects of the relational model are the ones the users of the software will be confused by,
notthe developers of it.  You have the classification wrong; the feedback you've gotten here is from the developers
beinguser oriented, not theory oriented or  
 c!
ode oriented.
--
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance"    Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


Not insulting, just amused bemusement.  PG portrays itself as the best OS database, which it may well be.  But it does
soby stressing the row-by-agonizing-row approach to data.  In other words, as just a record paradigm filestore for
COBOL/java/Ccoders.  I was expecting more Relational oomph.  As Dr. Codd says:  "A Relational Model of Data for Large
SharedData Banks".  Less code, more data. 

robert

Re: Useless sort by

From
Scott Marlowe
Date:
On Thu, Sep 23, 2010 at 7:51 AM,  <gnuoytr@rcn.com> wrote:
> Not insulting, just amused bemusement.  PG portrays itself as the best OS database, which it may well be.  But it
doesso by stressing the row-by-agonizing-row approach to data.  In other words, as just a record paradigm filestore for
COBOL/java/Ccoders.  I was expecting more Relational oomph.  As Dr. Codd says:  "A Relational Model of Data for Large
SharedData Banks".  Less code, more data. 

So what, exactly, would give pgsql more relationally "oomph"?

Your assertion feels pretty hand wavy right now.

--
To understand recursion, one must first understand recursion.

Re: Useless sort by

From
Gaetano Mendola
Date:
On 09/23/2010 05:05 AM, gnuoytr@rcn.com wrote:
> Spoken like a dyed in the wool COBOL coder.  The RM has no need for order; it's set based.  I've dabbled in PG for
sometime, and my sense is increasingly that PG developers are truly code oriented, not database (set) oriented.   

That's a bit harsh. Your sense if fooling you.

Regards
Gaetano Mendola