Thread: AW: [HACKERS] Really slow query on 6.4.2

AW: [HACKERS] Really slow query on 6.4.2

From
Zeugswetter Andreas IZ5
Date:
> Unfortunately, if you haven't done a vacuum, the system effectively
> assumes that all your tables are tiny.  I think this is a brain-dead
> default, but haven't had much luck convincing anyone else that the
> default should be changed.

I totally agree with Tom Lane here. Let me try to give some arguments.

1. If you have a user that does vacuum analyze regularly, we can
convince him to do vacuum analyze right after table creation, if he
knows the table will be tiny.

2. We have an application where the size of 20 tables changes from
0 to ~200000 rows in 3 hours. To have accurate statistics during the
day we would need to analyze at least every 20 min. This was not
acceptable during those 3 hours. So we took the approach of tuning the
SQL to work properly without ever doing statistics. This works
perfectly on our Informix installation, since Informix has a tuning
parameter that tells it that an index has to be used if possible, even
if the cost is higher, and the default for table size is 100.

3. There are two popular types of optimizers, rule based and cost
based. A good approach is to behave rule based when lacking statistics
and cost based when statistics are available. An easy way to achieve
this is to choose reasonable defaults for the statistics before
accurate statistics are made.

4. Those doing statistics will most likely not leave out a few tables,
which would create an undefined state where the optimizer behaves both
rule and cost based.

5. Actually postgresql has behaved in this manner because of certain
"bugs" in the optimizer. Recently a lot of those "bugs" have been
identified and "fixed", thus destroying the de facto rule based
behavior.

If the defaults are not changed, the behavior of the overall system
will actually change for the case where statistics are lacking, once
the optimizer is improved to actually behave cost based under all
circumstances.
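
As a minimal sketch of point 1 (the table name and rows are invented
for illustration): analyze right after creation, so the optimizer sees
accurate small-table statistics instead of the unvacuumed default.

    -- Vacuum analyze immediately after creating a table known to
    -- stay tiny.
    CREATE TABLE lookup_codes (code int4, label text);
    INSERT INTO lookup_codes VALUES (1, 'open');
    INSERT INTO lookup_codes VALUES (2, 'closed');
    VACUUM ANALYZE lookup_codes;

    -- The row-count estimate the optimizer works from lives here:
    SELECT relname, reltuples FROM pg_class
    WHERE relname = 'lookup_codes';
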
face="Fixedsys"size="2">Andreas</font> 

Re: [HACKERS] Really slow query on 6.4.2

From
Tom Lane
Date:
Zeugswetter Andreas IZ5 <Andreas.Zeugswetter@telecom.at> writes:
> 5. Actually postgresql has behaved in this manner because of certain
> "bugs" in the optimizer. Recently a lot of those "bugs" have been
> identified and "fixed", thus destroying the defacto rule based
> behavior.

That's a real good point --- I think we've already heard a couple of
complaints about the new optimizer doing "silly" things that it didn't
use to do.

I repeat my proposal: CREATE TABLE should insert a default size (say
about 1000 tuples) into pg_class.reltuples, rather than inserting 0.
That way, the optimizer will only choose small-table-oriented plans
if the table has actually been verified to be small by vacuum.
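
For concreteness, a hedged sketch of what this would amount to at the
catalog level (the table is invented, and the direct UPDATE on
pg_class merely hand-simulates the proposed CREATE TABLE behavior; it
is not a recommended practice):

    -- Hand-simulate the proposed default row estimate (illustration
    -- only; editing system catalogs directly requires superuser
    -- rights and is risky).
    CREATE TABLE new_events (id int4, payload text);
    UPDATE pg_class SET reltuples = 1000
    WHERE relname = 'new_events';
    -- Until a real VACUUM stores a measured count, the optimizer
    -- would assume ~1000 rows and avoid small-table-oriented plans.
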
        regards, tom lane


Re: AW: [HACKERS] Really slow query on 6.4.2

From
The Hermit Hacker
Date:
On Thu, 25 Mar 1999, Zeugswetter Andreas IZ5 wrote:

> 
> > Unfortunately, if you haven't done a vacuum, the system effectively
> > assumes that all your tables are tiny.  I think this is a brain-dead
> > default, but haven't had much luck convincing anyone else that the
> > default should be changed.
> > 
> I totally agree with Tom Lane here. Let me try to give some arguments.

Maybe I've missed something here, but I don't think anyone disagrees that
our stats aren't the best, but I also don't think anyone has stepped up and
provided an alternative...have they?

Personally, I'd like to see some method where stats can, to a certain
extent, be updated automagically when changes are made to the table.  The
generated stats wouldn't *replace* vacuum, just reduce the overall need
for it.

I'm not sure what all is contained in the stats, but the easiest one, I
think, to have done automagically is table sizes...add a tuple, update the
table's row count automatically.  If that number gets "off", at
least it will be more reasonable than not doing anything...no?
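
One way an application could approximate this by hand today, sketched
with invented names and assuming a procedural language such as
PL/pgSQL is installed (note that it also makes every INSERT and DELETE
pay for an extra UPDATE, which is the obvious cost):

    -- Approximate per-table row counts maintained by triggers
    -- (hypothetical; assumes an existing table called orders).
    CREATE TABLE row_counts (tablename text, approx_rows int4);
    INSERT INTO row_counts VALUES ('orders', 0);

    CREATE FUNCTION bump_count() RETURNS opaque AS '
    BEGIN
        IF TG_OP = ''INSERT'' THEN
            UPDATE row_counts SET approx_rows = approx_rows + 1
                WHERE tablename = TG_RELNAME;
        ELSE
            UPDATE row_counts SET approx_rows = approx_rows - 1
                WHERE tablename = TG_RELNAME;
        END IF;
        RETURN NULL;
    END;' LANGUAGE 'plpgsql';

    CREATE TRIGGER orders_count AFTER INSERT OR DELETE ON orders
        FOR EACH ROW EXECUTE PROCEDURE bump_count();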


Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] Really slow query on 6.4.2

From
Bruce Momjian
Date:
> Zeugswetter Andreas IZ5 <Andreas.Zeugswetter@telecom.at> writes:
> > 5. Actually postgresql has behaved in this manner because of certain
> > "bugs" in the optimizer. Recently a lot of those "bugs" have been
> > identified and "fixed", thus destroying the defacto rule based
> > behavior.
> 
> That's a real good point --- I think we've already heard a couple of
> complaints about the new optimizer doing "silly" things that it didn't
> use to do.
> 
> I repeat my proposal: CREATE TABLE should insert a default size (say
> about 1000 tuples) into pg_class.reltuples, rather than inserting 0.
> That way, the optimizer will only choose small-table-oriented plans
> if the table has actually been verified to be small by vacuum.

OK.  Sounds good to me.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: AW: [HACKERS] Really slow query on 6.4.2

From
Tom Lane
Date:
The Hermit Hacker <scrappy@hub.org> writes:
> I'm not sure what all is contained in the stats, but the easiest one, I
> think, to have done automagically is table sizes...add a tuple, update the
> table's row count automatically.  If that number gets "off", at
> least it will be more reasonable than not doing anything...no?

The number of tuples is definitely the most important stat; updating it
automatically would make the optimizer work better.  The stuff in
pg_statistics is not nearly as important.

The only objection I can think of to auto-updating reltuples is that
it'd mean additional computation (to access and rewrite the pg_class
entry) and additional disk I/O (to write back pg_class) for every INSERT
and DELETE.  There's also a potential problem of multiple backends all
trying to write pg_class and being delayed or even deadlocked because of
it.  (Perhaps the MVCC code will help here.)

I'm not convinced that accurate stats are worth that cost, but I don't
know how big the cost would be anyway.  Anyone have a feel for it?
        regards, tom lane


Re: AW: [HACKERS] Really slow query on 6.4.2

From
geek+@cmu.edu
Date:
Then <tgl@sss.pgh.pa.us> spoke up and said:
> I'm not convinced that accurate stats are worth that cost, but I don't
> know how big the cost would be anyway.  Anyone have a feel for it?

They are definitely *not* worth the cost.  Especially since no table
will have the default 0 rows entry after a single vacuum analyze of
that table.  Let's be honest: if you aren't interested in doing a
vacuum, then you really aren't interested in performance, anyway.

-- 
=====================================================================
| JAVA must have been developed in the wilds of West Virginia.      |
| After all, why else would it support only single inheritance??    |
=====================================================================
| Finger geek@cmu.edu for my public key.                            |
=====================================================================

Re: AW: [HACKERS] Really slow query on 6.4.2

From
The Hermit Hacker
Date:
On 25 Mar 1999 geek+@cmu.edu wrote:

> Then <tgl@sss.pgh.pa.us> spoke up and said:
> > I'm not convinced that accurate stats are worth that cost, but I don't
> > know how big the cost would be anyway.  Anyone have a feel for it?
> 
> They are definitely *not* worth the cost.  Especially since no table
> will have the default 0 rows entry after a single vacuum analyze of
> that table.  Let's be honest: if you aren't interested in doing a
> vacuum, then you really aren't interested in performance, anyway.

What I personally am not interested in is having to spend 20 minutes per
day with a totally locked-up database because I want my queries to be
faster, when there are other ways of doing it...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: AW: [HACKERS] Really slow query on 6.4.2

From
The Hermit Hacker
Date:
On Thu, 25 Mar 1999, Tom Lane wrote:

> The Hermit Hacker <scrappy@hub.org> writes:
> > I'm not sure what all is contained in the stats, but the easiest one, I
> > think, to have done automagically is table sizes...add a tuple, update the
> > table's row count automatically.  If that number gets "off", at
> > least it will be more reasonable than not doing anything...no?
> 
> The number of tuples is definitely the most important stat; updating it
> automatically would make the optimizer work better.  The stuff in
> pg_statistics is not nearly as important.
> 
> The only objection I can think of to auto-updating reltuples is that
> it'd mean additional computation (to access and rewrite the pg_class
> entry) and additional disk I/O (to write back pg_class) for every INSERT
> and DELETE.  There's also a potential problem of multiple backends all
> trying to write pg_class and being delayed or even deadlocked because of
> it.  (Perhaps the MVCC code will help here.)
> 
> I'm not convinced that accurate stats are worth that cost, but I don't
> know how big the cost would be anyway.  Anyone have a feel for it?

We're not looking for perfect numbers here; how about something just
stored in cache and periodically written out to disk?  We already have the
shared memory pool to work with...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] Really slow query on 6.4.2

From
The Hermit Hacker
Date:
On Thu, 25 Mar 1999, Tom Lane wrote:

> Zeugswetter Andreas IZ5 <Andreas.Zeugswetter@telecom.at> writes:
> > 5. Actually postgresql has behaved in this manner because of certain
> > "bugs" in the optimizer. Recently a lot of those "bugs" have been
> > identified and "fixed", thus destroying the defacto rule based
> > behavior.
> 
> That's a real good point --- I think we've already heard a couple of
> complaints about the new optimizer doing "silly" things that it didn't
> use to do.
> 
> I repeat my proposal: CREATE TABLE should insert a default size (say
> about 1000 tuples) into pg_class.reltuples, rather than inserting 0.
> That way, the optimizer will only choose small-table-oriented plans
> if the table has actually been verified to be small by vacuum.

inserting 0 is the accurate number; 1000 is not ...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: AW: [HACKERS] Really slow query on 6.4.2

From
geek+@cmu.edu
Date:
Then <scrappy@hub.org> spoke up and said:
> On 25 Mar 1999 geek+@cmu.edu wrote:
> > They are definitely *not* worth the cost.  Especially since no table
> > will have the default 0 rows entry after a single vacuum analyze of
> > that table.  Let's be honest: if you aren't interested in doing a
> > vacuum, then you really aren't interested in performance, anyway.
> 
> What I personally am not interested in is having to spend 20 minutes per
> day with a totally locked-up database because I want my queries to be
> faster, when there are other ways of doing it...

Uhm, no.  The specific case we are talking about here is creation of a
table, inserting rows into it, and NEVER running vacuum analyze on
it.  This would not lock up your database for 20 minutes unless you
are dropping and re-creating a bunch of tables.  Even that case could
be scripted creatively[0], though.  Further, you don't have to run it
on a whole database every night.  Just the tables of interest.

We run a multi-gigabyte Ingres database here for our student systems.
When we want to make sure that good plans are chosen, we sysmod and
optimizedb it.  Since we always want good plans, but rarely load
massive amounts of data, we do this once a week.

One of the things to be kept in mind with performance tuning is
tradeoffs.  Does it make sense to penalize every transaction for the
sake of updating statistics?  (the answer is "maybe") Does it make
sense to penalize every transaction to provide a recovery mechanism?
(yes) Does it make sense to penalize every transaction to prevent any
one transaction from using up more than 1MB/s of bandwidth?  (no)
Should you extract the data to a binary flat file, read it in C,
collect the information of interest and then do something interesting
with it?  (maybe)

[0] Assuming the data are laid out "sequentially" on the index fields:
create the table, chop off and insert only the first and last
thousand[1] rows, vacuum, and then insert the rest (see the sketch
after these notes).

[1] Or perhaps a slightly bigger number.  Or a sampling of the file.
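
A hedged sketch of footnote [0] (the table and file paths are
invented; the point is just to vacuum a representative sample before
the bulk load):

    -- Seed with a sample, vacuum so the statistics reflect a real
    -- table, then load the rest (paths are hypothetical).
    CREATE TABLE big_load (k int4, v text);
    COPY big_load FROM '/tmp/first_and_last_1000.dat';
    VACUUM ANALYZE big_load;
    COPY big_load FROM '/tmp/remaining_rows.dat';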

-- 
=====================================================================
| JAVA must have been developed in the wilds of West Virginia.      |
| After all, why else would it support only single inheritance??    |
=====================================================================
| Finger geek@cmu.edu for my public key.                            |
=====================================================================