Thread: Talking about optimizer, my long dream

Talking about optimizer, my long dream

From
Віталій Тимчишин
Date:
Hi, all.

All this optimizer vs hint thread reminded me about crazy idea that got to my head some time ago.
I currently has two problems with postgresql optimizer
1) Dictionary tables. Very usual thing is something like "select * from big_table where distionary_id = (select id from dictionary where name=value)". This works awful if dictionary_id distribution is not uniform. The thing that helps is to retrieve subselect value and then simply do "select * from big_table where dictionary_id=id_value".
2) Complex queries. If there are over 3 levels of subselects, optmizer counts often become less and less correct as we go up on levels. On ~3rd level this often lead to wrong choises. The thing that helps is to create temporary tables from subselects, analyze them and then do main select using this temporary tables.
While first one can be fixed by introducing some correlation statistics, I don't think there is any simple way to fix second one.

But what if optimizer could in some cases tell "fetch this and this and then I'll plan other part of the query based on statistics of what you've fetched"?

--
Best regards,
 Vitalii Tymchyshyn

Re: Talking about optimizer, my long dream

From
Mladen Gogala
Date:
Віталій Тимчишин wrote:
> Hi, all.
>
> All this optimizer vs hint thread
There is no "optimizer vs. hint". Hints are a necessary part of the
optimizer in all other databases. Without hints Postgres will not get
used in the company that I work for, period. I was willing to wait but
the fatwa against hints seems unyielding, so that's it. I am even
inclined to believe that deep down under the hood, this fatwa has an
ulterior motive, which disgusts me deeply. With hints, there would be
far fewer consulting gigs.

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


Re: Talking about optimizer, my long dream

From
Shaun Thomas
Date:
On 02/04/2011 07:56 AM, Mladen Gogala wrote:

> Hints are a necessary part of the
> optimizer in all other databases. Without hints Postgres will not get
> used in the company that I work for, period.

I've said repeatedly that EnterpriseDB, a fork of PostgreSQL, has the
hints you seek, yet you seem to enjoy berating the PostgreSQL community
as if it owes you something.

Also, we don't care if you don't use PostgreSQL. If I put something up
for free, some random guy not taking it won't exactly hurt my feelings.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com

______________________________________________

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

Re: Talking about optimizer, my long dream

From
Greg Smith
Date:
Mladen Gogala wrote:
> I am even inclined to believe that deep down under the hood, this
> fatwa has an ulterior motive, which disgusts me deeply. With hints,
> there would be far fewer consulting gigs.

Now you're just being rude.  Given that you have direct access to the
developers of the software, for free, on these mailing lists, the main
reason there is consulting work anyway is because some companies can't
publish their queries or data publicly.  All of us doing PostgreSQL
consulting regularly take those confidental reports and turn them into
feedback to improve the core software.  That is what our clients want,
too:  a better PostgreSQL capable of handling their problem, not just a
hacked up application that works today, but will break later once data
volume or distribution changes.

You really just don't get how open-source development works at all if
you think money is involved in why people have their respective
technical opinions on controversial subjects.  Try and hire the
sometimes leader of this particular "fatwa", Tom Lane, for a consulting
gig if you think that's where his motivation lies.  I would love to have
a recording of *that* phone call.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: Talking about optimizer, my long dream

From
Mladen Gogala
Date:
Shaun Thomas wrote:
> On 02/04/2011 07:56 AM, Mladen Gogala wrote:
>
>
>> Hints are a necessary part of the
>> optimizer in all other databases. Without hints Postgres will not get
>> used in the company that I work for, period.
>>
>
> I've said repeatedly that EnterpriseDB, a fork of PostgreSQL, has the
> hints you seek, yet you seem to enjoy berating the PostgreSQL community
> as if it owes you something.
>
> Also, we don't care if you don't use PostgreSQL. If I put something up
> for free, some random guy not taking it won't exactly hurt my feelings.
>
>
Shaun, I don't need to convince you or the Postgres community. I needed
an argument to convince my boss.
My argument was that the sanctimonious and narrow minded Postgres
community is unwilling to even consider creating the tools I need for
large porting projects, tools provided by other major databases. This
discussion served my purpose wonderfully.  Project is killed, here we
part ways. No more problems for either of us. Good luck with the
"perfect optimizer" and good riddance. My only regret is about the time
I have wasted.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


Re: Talking about optimizer, my long dream

From
Frank Heikens
Date:


On 04 Feb, 2011,at 02:56 PM, Mladen Gogala <mladen.gogala@vmsinfo.com> wrote:

Віталій Тимчишин wrote:
> Hi, all.
>
> All this optimizer vs hint thread
There is no "optimizer vs. hint". Hints are a necessary part of the
optimizer in all other databases.
 
That has nothing to do with PostgreSQL: PostgreSQL = PostgreSQL. And it doesn't have hints and everybody knows it.

Without hints Postgres will not get
used in the company that I work for, period.
 
That's up to you, that's fine. But why did you start with PostgreSQL in the first place? You knew PostgreSQL doesn't have hints and the wiki told you hints are not wanted as well. When hints are an essential requirement for your company, you should pick another product, EnterpriseDB Postgres Plus for example.

I was willing to wait but
the fatwa against hints seems unyielding,
 
There is no fatwa. The PostgreSQL project prefers to spend resources on a better optimizer to solve the real problems, not on hints for working around the problems. That has nothing to do with any fatwa or religion.

so that's it. I am even
inclined to believe that deep down under the hood, this fatwa has an
ulterior motive, which disgusts me deeply. With hints, there would be
far fewer consulting gigs.
 
The consulting guys are the ones who love hints: They know they have to come back the other month because the old hint does more harm than good when data changes. And data will change over time.

You said it's so simple to implement hints in PostgreSQL, so please, show us. Or ask/pay somebody to write this simple code for you to support hints, nobody will ever stop you from doing that. When you have a use case that proves the usage of hints will improve the performance of PostgreSQL and you have some code that can be maintained by the PostgreSQL project, it might be implemented in the contrib or even core. It's up to you, not somebody else.



Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Talking about optimizer, my long dream

From
Cédric Villemain
Date:
2011/2/4 Frank Heikens <frankheikens@mac.com>:
>
>
> On 04 Feb, 2011,at 02:56 PM, Mladen Gogala <mladen.gogala@vmsinfo.com>
> wrote:
>
> Віталій Тимчишин wrote:
>> Hi, all.
>>
>> All this optimizer vs hint thread
> There is no "optimizer vs. hint". Hints are a necessary part of the
> optimizer in all other databases.
>
>
> That has nothing to do with PostgreSQL: PostgreSQL = PostgreSQL. And it
> doesn't have hints and everybody knows it.
>
> Without hints Postgres will not get
> used in the company that I work for, period.
>
>
> That's up to you, that's fine. But why did you start with PostgreSQL in the
> first place? You knew PostgreSQL doesn't have hints and the wiki told you
> hints are not wanted as well. When hints are an essential requirement for
> your company, you should pick another product, EnterpriseDB Postgres Plus
> for example.
>
> I was willing to wait but
> the fatwa against hints seems unyielding,
>
>
> There is no fatwa. The PostgreSQL project prefers to spend resources on a
> better optimizer to solve the real problems, not on hints for working around
> the problems. That has nothing to do with any fatwa or religion.
>
> so that's it. I am even
> inclined to believe that deep down under the hood, this fatwa has an
> ulterior motive, which disgusts me deeply. With hints, there would be
> far fewer consulting gigs.
>
>
> The consulting guys are the ones who love hints: They know they have to come
> back the other month because the old hint does more harm than good when data
> changes. And data will change over time.
>
> You said it's so simple to implement hints in PostgreSQL, so please, show
> us. Or ask/pay somebody to write this simple code for you to support hints,
> nobody will ever stop you from doing that. When you have a use case that
> proves the usage of hints will improve the performance of PostgreSQL and you
> have some code that can be maintained by the PostgreSQL project, it might be
> implemented in the contrib or even core. It's up to you, not somebody else.

Just in case you miss it:
http://www.sai.msu.su/~megera/wiki/plantuner

Btw feel free to do how you want, it is open source, and BSD, you can
take PostgreSQL, add hints, go and sell that to your boss.


>
>
>
> Mladen Gogala
> Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> www.vmsinfo.com
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>



--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

Re: Talking about optimizer, my long dream

From
Mark Kirkwood
Date:
On 05/02/11 03:36, Mladen Gogala wrote:
> Shaun, I don't need to convince you or the Postgres community. I
> needed an argument to convince my boss.
> My argument was that the sanctimonious and narrow minded Postgres
> community is unwilling to even consider creating the tools I need for
> large porting projects, tools provided by other major databases. This
> discussion served my purpose wonderfully.  Project is killed, here we
> part ways. No more problems for either of us. Good luck with the
> "perfect optimizer" and good riddance. My only regret is about the
> time I have wasted.
>

I think it is unlikely that your boss is going to dismiss Postgres on
the basis of some minor technical point (no optimizer hints). Bosses
usually (and should) care about stuff like reference sites, product
pedigree and product usage in similar sized companies to theirs.
Postgres will come out rather well if such an assessment is actually
performed I would think.

The real question you should be asking is this:

Given that there are no hints, what do I do to solve the problem of a
slow query suddenly popping up in production? If and when this situation
occurs, see how quickly the community steps in to help you solve it (and
it'd bet it will solved be very quickly indeed).

Best wishes

Mark




Re: Talking about optimizer, my long dream

From
Scott Marlowe
Date:
2011/2/4 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>:
> Given that there are no hints, what do I do to solve the problem of a slow
> query suddenly popping up in production? If and when this situation occurs,
> see how quickly the community steps in to help you solve it (and it'd bet it
> will solved be very quickly indeed).

That is EXACTLY what happened to me.  I had a query killing my
production box because it was running VERY long by picking the wrong
plan.  Turned out it was ignoring the number of NULLs and this led to
it thinking one access method that was wrong was the right one.  I had
a patch within 24 hours of identifying the problem, and it took me < 1
hour to have it applied and running in production.

If Oracle can patch their query planner for you in 24 hours, and you
can apply patch with confidence against your test then production
servers in an hour or so, great.  Til then I'll stick to a database
that has the absolutely, without a doubt, best coder support of any
project I've ever used.

My point in the other thread is that if you can identify a point where
a hint would help, like my situation above, you're often better off
presenting a test case here and getting a patch to make it smarter.

However, there are places where the planner just kind of guesses.  And
those are the places to attack when you find a pathological behaviour.
 Or to rewrite your query or use a functional index.

Re: Talking about optimizer, my long dream

From
Mark Kirkwood
Date:
O
> If Oracle can patch their query planner for you in 24 hours, and you
> can apply patch with confidence against your test then production
> servers in an hour or so, great.  Til then I'll stick to a database
> that has the absolutely, without a doubt, best coder support of any
> project I've ever used.
>
> My point in the other thread is that if you can identify a point where
> a hint would help, like my situation above, you're often better off
> presenting a test case here and getting a patch to make it smarter.
>

By way of contrast - I had a similar situation with DB2 (a few years
ago) with a bad plan being chosen for BETWEEN predicates in some cases.
I found myself having to spend about a hour or two a week chasing the
support organization for - wait for it - 6 months to get a planner patch!

Re: Talking about optimizer, my long dream

From
Mladen Gogala
Date:
Please, don't include me on your emails. I unsubscribed from the list.


Cédric Villemain wrote:
> 2011/2/4 Frank Heikens <frankheikens@mac.com>:
>
>> On 04 Feb, 2011,at 02:56 PM, Mladen Gogala <mladen.gogala@vmsinfo.com>
>> wrote:
>>
>> Віталій Тимчишин wrote:
>>
>>> Hi, all.
>>>
>>> All this optimizer vs hint thread
>>>
>> There is no "optimizer vs. hint". Hints are a necessary part of the
>> optimizer in all other databases.
>>
>>
>> That has nothing to do with PostgreSQL: PostgreSQL = PostgreSQL. And it
>> doesn't have hints and everybody knows it.
>>
>> Without hints Postgres will not get
>> used in the company that I work for, period.
>>
>>
>> That's up to you, that's fine. But why did you start with PostgreSQL in the
>> first place? You knew PostgreSQL doesn't have hints and the wiki told you
>> hints are not wanted as well. When hints are an essential requirement for
>> your company, you should pick another product, EnterpriseDB Postgres Plus
>> for example.
>>
>> I was willing to wait but
>> the fatwa against hints seems unyielding,
>>
>>
>> There is no fatwa. The PostgreSQL project prefers to spend resources on a
>> better optimizer to solve the real problems, not on hints for working around
>> the problems. That has nothing to do with any fatwa or religion.
>>
>> so that's it. I am even
>> inclined to believe that deep down under the hood, this fatwa has an
>> ulterior motive, which disgusts me deeply. With hints, there would be
>> far fewer consulting gigs.
>>
>>
>> The consulting guys are the ones who love hints: They know they have to come
>> back the other month because the old hint does more harm than good when data
>> changes. And data will change over time.
>>
>> You said it's so simple to implement hints in PostgreSQL, so please, show
>> us. Or ask/pay somebody to write this simple code for you to support hints,
>> nobody will ever stop you from doing that. When you have a use case that
>> proves the usage of hints will improve the performance of PostgreSQL and you
>> have some code that can be maintained by the PostgreSQL project, it might be
>> implemented in the contrib or even core. It's up to you, not somebody else.
>>
>
> Just in case you miss it:
> http://www.sai.msu.su/~megera/wiki/plantuner
>
> Btw feel free to do how you want, it is open source, and BSD, you can
> take PostgreSQL, add hints, go and sell that to your boss.
>
>
>
>>
>> Mladen Gogala
>> Sr. Oracle DBA
>> 1500 Broadway
>> New York, NY 10036
>> (212) 329-5251
>> www.vmsinfo.com
>>
>>
>> --
>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance
>>
>>
>
>
>
>


--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




Re: Talking about optimizer, my long dream

From
Achilleas Mantzios
Date:
Greg, 1st off, thanx for your great book, and i really hope i find the time to read it
thoroughly. (since i am still stuck somewhere in the middle of "Administration Cookbook" lol!)
Well, people, speaking from the point of the occasional poster and frequent lurker
i can see that smth is going a little bit out of hand in the lists.
I remember my own thread 1-2 weeks ago about NOT IN working much better
in 8.3 than 9.0, how much i was trying to convince people that it was not
FreeBSD related, nor other setting related, trying to convince mladen
that i was not cheating with the explain analyze i posted, trying to answer politely to Tom
who was asking me to post smth that i had already posted 4-5 times till then,
and i can feel the agony of certain members here.
Before i moved the thread from -admin over to -performance i had certain issues at my home's FreeBSD mail server.
Thank God, some reply i wrote one late night on -admin didn't make it to the list.
Anyways, that's open source. Great products, access to source and knowledge at a negligible cost come with a price.
So here is my advice to people in similar situations (me mainly!) : take a deep breath, dont hit the send button unless
youare  
100% certain you have smth new/positive to say, take some time to do more homework from your part, even try
to read/hack the source, etc...
If this is not possible, then its better to seek for alternatives rather than turning angry.

just my 3 euros!

Στις Friday 04 February 2011 16:27:55 ο/η Greg Smith έγραψε:
> Mladen Gogala wrote:
> > I am even inclined to believe that deep down under the hood, this
> > fatwa has an ulterior motive, which disgusts me deeply. With hints,
> > there would be far fewer consulting gigs.
>
> Now you're just being rude.  Given that you have direct access to the
> developers of the software, for free, on these mailing lists, the main
> reason there is consulting work anyway is because some companies can't
> publish their queries or data publicly.  All of us doing PostgreSQL
> consulting regularly take those confidental reports and turn them into
> feedback to improve the core software.  That is what our clients want,
> too:  a better PostgreSQL capable of handling their problem, not just a
> hacked up application that works today, but will break later once data
> volume or distribution changes.
>
> You really just don't get how open-source development works at all if
> you think money is involved in why people have their respective
> technical opinions on controversial subjects.  Try and hire the
> sometimes leader of this particular "fatwa", Tom Lane, for a consulting
> gig if you think that's where his motivation lies.  I would love to have
> a recording of *that* phone call.
>
> --
> Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
> PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
> "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
>
>



--
Achilleas Mantzios

Re: Talking about optimizer, my long dream

From
Robert Haas
Date:
2011/2/4 Віталій Тимчишин <tivv00@gmail.com>:
> Hi, all.
> All this optimizer vs hint thread reminded me about crazy idea that got to
> my head some time ago.
> I currently has two problems with postgresql optimizer
> 1) Dictionary tables. Very usual thing is something like "select * from
> big_table where distionary_id = (select id from dictionary where
> name=value)". This works awful if dictionary_id distribution is not uniform.

Does it work better if you write it as a join?

SELECT b.* FROM big_table b, dictionary d WHERE b.dictionary_id = d.id
AND d.name = 'value'

I would like to see a concrete example of this not working well,
because I've been writing queries like this (with MANY tables) for
years and it's usually worked very well for me.

> The thing that helps is to retrieve subselect value and then simply do
> "select * from big_table where dictionary_id=id_value".
> 2) Complex queries. If there are over 3 levels of subselects, optmizer
> counts often become less and less correct as we go up on levels. On ~3rd
> level this often lead to wrong choises. The thing that helps is to create
> temporary tables from subselects, analyze them and then do main select using
> this temporary tables.
> While first one can be fixed by introducing some correlation statistics, I
> don't think there is any simple way to fix second one.
> But what if optimizer could in some cases tell "fetch this and this and then
> I'll plan other part of the query based on statistics of what you've
> fetched"?

I've had that thought, too.  It's pretty hard to see how to make ti
work, but I think there are cases where it could be beneficial.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Talking about optimizer, my long dream

From
Віталій Тимчишин
Date:


27 лютого 2011 р. 19:59 Robert Haas <robertmhaas@gmail.com> написав:
2011/2/4 Віталій Тимчишин <tivv00@gmail.com>:
> Hi, all.
> All this optimizer vs hint thread reminded me about crazy idea that got to
> my head some time ago.
> I currently has two problems with postgresql optimizer
> 1) Dictionary tables. Very usual thing is something like "select * from
> big_table where distionary_id = (select id from dictionary where
> name=value)". This works awful if dictionary_id distribution is not uniform.

Does it work better if you write it as a join?

SELECT b.* FROM big_table b, dictionary d WHERE b.dictionary_id = d.id
AND d.name = 'value'

I would like to see a concrete example of this not working well,
because I've been writing queries like this (with MANY tables) for
years and it's usually worked very well for me.

Here you are:
 PostgreSQL 8.4.7 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
create table a(dict int4, val int4);
create table b(dict int4, name text);
create index c on a(dict);
insert into b values (1, 'small'), (2, 'large');
insert into a values (1,1);
insert into a select 2,generate_series(1,10000);
analyze a;
analyze b;
test=# explain analyze select * from a where dict=1;
                                             QUERY PLAN                                              
-----------------------------------------------------------------------------------------------------
 Index Scan using c on a  (cost=0.00..8.27 rows=1 width=8) (actual time=0.014..0.016 rows=1 loops=1)
   Index Cond: (dict = 1)
 Total runtime: 0.041 ms
(3 rows)
test=# explain analyze select * from a where dict=2;
                                             QUERY PLAN                                              
-----------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..170.01 rows=10000 width=8) (actual time=0.014..6.876 rows=10000 loops=1)
   Filter: (dict = 2)
 Total runtime: 13.419 ms
(3 rows)
test=# explain analyze select * from a,b where a.dict=b.dict and b.name='small'; 
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.04..233.55 rows=5000 width=18) (actual time=0.047..13.159 rows=1 loops=1)
   Hash Cond: (a.dict = b.dict)
   ->  Seq Scan on a  (cost=0.00..145.01 rows=10001 width=8) (actual time=0.009..6.633 rows=10001 loops=1)
   ->  Hash  (cost=1.02..1.02 rows=1 width=10) (actual time=0.011..0.011 rows=1 loops=1)
         ->  Seq Scan on b  (cost=0.00..1.02 rows=1 width=10) (actual time=0.006..0.008 rows=1 loops=1)
               Filter: (name = 'small'::text)
 Total runtime: 13.197 ms
(7 rows)
test=# explain analyze select * from a,b where a.dict=b.dict and b.name='large';
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.04..233.55 rows=5000 width=18) (actual time=0.074..21.476 rows=10000 loops=1)
   Hash Cond: (a.dict = b.dict)
   ->  Seq Scan on a  (cost=0.00..145.01 rows=10001 width=8) (actual time=0.012..7.085 rows=10001 loops=1)
   ->  Hash  (cost=1.02..1.02 rows=1 width=10) (actual time=0.021..0.021 rows=1 loops=1)
         ->  Seq Scan on b  (cost=0.00..1.02 rows=1 width=10) (actual time=0.015..0.016 rows=1 loops=1)
               Filter: (name = 'large'::text)
 Total runtime: 28.293 ms
(7 rows)

It simply don't know that small=1 and large=2, so it never uses nested loop + iindex scan:
test=# set enable_hashjoin=false;
SET
test=# explain analyze select * from a,b where a.dict=b.dict and b.name='small';
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..253.28 rows=5000 width=18) (actual time=0.041..0.047 rows=1 loops=1)
   ->  Seq Scan on b  (cost=0.00..1.02 rows=1 width=10) (actual time=0.010..0.012 rows=1 loops=1)
         Filter: (name = 'small'::text)
   ->  Index Scan using c on a  (cost=0.00..189.75 rows=5000 width=8) (actual time=0.021..0.023 rows=1 loops=1)
         Index Cond: (a.dict = b.dict)
 Total runtime: 0.089 ms
(6 rows)

--
Best regards,
 Vitalii Tymchyshyn

Re: Talking about optimizer, my long dream

From
Robert Haas
Date:
2011/2/27 Віталій Тимчишин <tivv00@gmail.com>:
>
>
> 27 лютого 2011 р. 19:59 Robert Haas <robertmhaas@gmail.com> написав:
>>
>> 2011/2/4 Віталій Тимчишин <tivv00@gmail.com>:
>> > Hi, all.
>> > All this optimizer vs hint thread reminded me about crazy idea that got
>> > to
>> > my head some time ago.
>> > I currently has two problems with postgresql optimizer
>> > 1) Dictionary tables. Very usual thing is something like "select * from
>> > big_table where distionary_id = (select id from dictionary where
>> > name=value)". This works awful if dictionary_id distribution is not
>> > uniform.
>>
>> Does it work better if you write it as a join?
>>
>> SELECT b.* FROM big_table b, dictionary d WHERE b.dictionary_id = d.id
>> AND d.name = 'value'
>>
>> I would like to see a concrete example of this not working well,
>> because I've been writing queries like this (with MANY tables) for
>> years and it's usually worked very well for me.
>>
> Here you are:
>  PostgreSQL 8.4.7 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real
> (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
> create table a(dict int4, val int4);
> create table b(dict int4, name text);
> create index c on a(dict);
> insert into b values (1, 'small'), (2, 'large');
> insert into a values (1,1);
> insert into a select 2,generate_series(1,10000);
> analyze a;
> analyze b;
> test=# explain analyze select * from a where dict=1;
>                                              QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------
>  Index Scan using c on a  (cost=0.00..8.27 rows=1 width=8) (actual
> time=0.014..0.016 rows=1 loops=1)
>    Index Cond: (dict = 1)
>  Total runtime: 0.041 ms
> (3 rows)
> test=# explain analyze select * from a where dict=2;
>                                              QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------
>  Seq Scan on a  (cost=0.00..170.01 rows=10000 width=8) (actual
> time=0.014..6.876 rows=10000 loops=1)
>    Filter: (dict = 2)
>  Total runtime: 13.419 ms
> (3 rows)
> test=# explain analyze select * from a,b where a.dict=b.dict and
> b.name='small';
>                                                 QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=1.04..233.55 rows=5000 width=18) (actual
> time=0.047..13.159 rows=1 loops=1)
>    Hash Cond: (a.dict = b.dict)
>    ->  Seq Scan on a  (cost=0.00..145.01 rows=10001 width=8) (actual
> time=0.009..6.633 rows=10001 loops=1)
>    ->  Hash  (cost=1.02..1.02 rows=1 width=10) (actual time=0.011..0.011
> rows=1 loops=1)
>          ->  Seq Scan on b  (cost=0.00..1.02 rows=1 width=10) (actual
> time=0.006..0.008 rows=1 loops=1)
>                Filter: (name = 'small'::text)
>  Total runtime: 13.197 ms
> (7 rows)
> test=# explain analyze select * from a,b where a.dict=b.dict and
> b.name='large';
>                                                 QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=1.04..233.55 rows=5000 width=18) (actual
> time=0.074..21.476 rows=10000 loops=1)
>    Hash Cond: (a.dict = b.dict)
>    ->  Seq Scan on a  (cost=0.00..145.01 rows=10001 width=8) (actual
> time=0.012..7.085 rows=10001 loops=1)
>    ->  Hash  (cost=1.02..1.02 rows=1 width=10) (actual time=0.021..0.021
> rows=1 loops=1)
>          ->  Seq Scan on b  (cost=0.00..1.02 rows=1 width=10) (actual
> time=0.015..0.016 rows=1 loops=1)
>                Filter: (name = 'large'::text)
>  Total runtime: 28.293 ms
> (7 rows)
> It simply don't know that small=1 and large=2, so it never uses nested loop
> + iindex scan:
> test=# set enable_hashjoin=false;
> SET
> test=# explain analyze select * from a,b where a.dict=b.dict and
> b.name='small';
>                                                    QUERY PLAN
>
> ----------------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..253.28 rows=5000 width=18) (actual
> time=0.041..0.047 rows=1 loops=1)
>    ->  Seq Scan on b  (cost=0.00..1.02 rows=1 width=10) (actual
> time=0.010..0.012 rows=1 loops=1)
>          Filter: (name = 'small'::text)
>    ->  Index Scan using c on a  (cost=0.00..189.75 rows=5000 width=8)
> (actual time=0.021..0.023 rows=1 loops=1)
>          Index Cond: (a.dict = b.dict)
>  Total runtime: 0.089 ms
> (6 rows)

Oh, I see.  Interesting example.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Talking about optimizer, my long dream

From
Merlin Moncure
Date:
2011/2/4 Mladen Gogala <mladen.gogala@vmsinfo.com>:
> Віталій Тимчишин wrote:
>>
>> Hi, all.
>>
>> All this optimizer vs hint thread
>
> There is no "optimizer vs. hint". Hints are a necessary part of the
> optimizer in all other databases. Without hints Postgres will not get used
> in the company that I work for, period. I was willing to wait but the fatwa
> against hints seems unyielding, so that's it. I am even inclined to believe
> that deep down under the hood, this fatwa has an ulterior motive, which
> disgusts me deeply. With hints, there would be far fewer consulting gigs.
>
> Mladen Gogala Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> www.vmsinfo.com

Ah, that's too bad...we really will miss you here.  With luck and a
good helping of Oracle expertise you should be finally be able to get
that 14 record table under control!

merlin