Re: TB-sized databases - Mailing list pgsql-performance

From Robert Treat
Subject Re: TB-sized databases
Date
Msg-id 200712051507.22745.xzilla@users.sourceforge.net
In response to Re: TB-sized databases  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: TB-sized databases
List pgsql-performance
On Thursday 29 November 2007 11:14, Simon Riggs wrote:
> On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote:
> > Gregory Stark <stark@enterprisedb.com> writes:
> > > "Simon Riggs" <simon@2ndquadrant.com> writes:
> > >> Tom's previous concerns were along the lines of "How would you know what
> > >> to set it to?", given that the planner costs are mostly arbitrary
> > >> numbers.
> > >
> > > Hm, that's only kind of true.
> >
> > The units are not the problem.  The problem is that you are staking
> > non-failure of your application on the planner's estimates being
> > pretty well in line with reality.  Not merely in line enough that
> > it picks a reasonably cheap plan, but in line enough that if it
> > thinks plan A is 10x more expensive than plan B, then the actual
> > ratio is indeed somewhere near 10.
> >
> > Given that this list spends all day every day discussing cases where the
> > planner is wrong, I'd have to think that that's a bet I wouldn't take.
>
> I think you have a point, but the alternative is often much worse.
>
> If an SQL statement fails because of too high cost, we can investigate
> the problem and re-submit. If a website slows down because somebody
> allowed a very large query to execute then everybody is affected, not
> just the person who ran the bad query. Either way the guy that ran the
> query loses, but without constraints in place one guy can kill everybody
> else also.
>
> > You could probably avoid this risk by setting the cutoff at something
> > like 100 or 1000 times what you really want to tolerate, but how
> > useful is it then?
>
> Still fairly useful, as long as we understand it's a blunt instrument.
>
> If the whole performance of your system depends upon indexed access then
> rogue queries can have disastrous, unpredictable consequences. Many
> sites construct their SQL dynamically, so a mistake in a seldom used
> code path can allow killer queries through. Even the best DBAs have been
> known to make mistakes.
>

If the whole performance of your system depends upon indexed access, then
maybe you need a database that gives you a way to force index access at the
query level?
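
There's no true per-query index hint in PostgreSQL, but as a rough sketch you
can discourage sequential scans for a single statement with the real planner
GUC enable_seqscan (the table and column names below are just placeholders):

  -- Sketch: discourage (not forbid) a seq scan for one statement only.
  -- SET LOCAL confines the change to the current transaction.
  BEGIN;
  SET LOCAL enable_seqscan = off;  -- seq scans now look hugely expensive to the planner
  EXPLAIN ANALYZE
    SELECT * FROM big_table WHERE customer_id = 42;  -- placeholder names
  COMMIT;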

> e.g. An 80GB table has 8 million blocks in it.
> - So putting a statement_cost limit = 1 million would allow some fairly
> large queries but prevent anything that did a SeqScan (or worse).
> - Setting it to 10 million is going to prevent things like sorting the
> whole table without a LIMIT
> - Setting it at 100 million is going to prevent unconstrained product
> joins, etc.
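
(For reference, the arithmetic above uses the real default seq_page_cost = 1.0,
so a sequential scan over roughly 8 million 8 kB blocks plans at a total cost
somewhat above 8,000,000 once per-row CPU costs are added. The
statement_cost_limit setting in this sketch is hypothetical: it is the GUC
being proposed in this thread, not something PostgreSQL has today.)

  -- Hypothetical sketch: statement_cost_limit does not exist in PostgreSQL;
  -- it is the cap being proposed in this thread.
  SET statement_cost_limit = 1000000;  -- reject plans estimated above ~1M cost units

  -- EXPLAIN (which is real) shows the estimated cost such a cap would compare
  -- against. A full scan of an ~8-million-block table is estimated well above
  -- 1,000,000, so it would be rejected up front, while an index lookup costing
  -- a few hundred units would still run.
  EXPLAIN SELECT * FROM eighty_gb_table WHERE id = 42;  -- placeholder names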

I think you're completely overlooking the effect disk latency has on query
times.  We run queries all the time whose run time varies from 4 hours to 12
hours based solely on the amount of concurrent load on the system, even though
they always plan with the same cost.
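
(As a concrete sketch of that gap: EXPLAIN ANALYZE reports both the planner's
estimate and the measured runtime. Run the same statement on an idle system and
again under peak concurrent load; the cost numbers are identical both times,
while the actual time can differ by hours. Table and column names here are
placeholders.)

  -- Sketch: the "cost=" figures come from the planner's model and ignore
  -- concurrent load; the "actual time=" figures are wall-clock and do not.
  EXPLAIN ANALYZE
  SELECT count(*)
  FROM   big_fact f                                   -- placeholder tables
  JOIN   big_dim  d ON d.id = f.dim_id
  WHERE  f.created_at >= now() - interval '30 days';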

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
