Thread: Cost: Big Tables vs. Organized Separation of Data

Cost: Big Tables vs. Organized Separation of Data

From: Bob Dusek
Hey all,

I've been developing an electronic inventory system for a used
clothing store using Postgres and PHP3, with Netscape as my
graphical interface.

And, as the database has grown, my queries (as one would logically
expect) have been getting slower.  In particular, I have a set of
queries that allow the store employees to analyze the current
inventory and the sales inventory.

Some of these queries are required to do joins on several tables.
And, they are also trying to do some simple calculations
(table.price - table.cost).
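
For illustration, a query of that shape might look something like this (the table and column names here are made up, not my actual schema):

```sql
-- hypothetical example: join inventory to sales and categories,
-- computing the margin per item inside the query itself
SELECT i.item_id,
       c.cat_name,
       i.price - i.cost AS margin
FROM   inventory i,
       sales s,
       categories c
WHERE  i.item_id = s.item_id
  AND  i.cat_id  = c.cat_id;
```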

As I attempt to "fix" the slowness of the queries, I have come
to the following cross-road:

1. If I re-organize the data, I would be able to perform my queries
without executing joins on multiple tables per query.

2. As I re-organize the data, the database becomes less and less
intuitive and (seemingly) less "normal".

So, I guess my question is:  how costly are joins?  I've heard that
Postgres pretty much "pukes" (in terms of speed) when you're trying
to do anything more than 6 table joins in one query.  This leads
me to believe that joins are fairly costly... ????

Does anyone have any words of advice for me as I battle this?

PS...
I realize that if I do the calculations in my scripting language
and simply fetch raw data with my queries, I will cut down on the
cost of the queries a little bit.

Thanks,

Bob


Re: [GENERAL] Cost: Big Tables vs. Organized Separation of Data

From: Bruce Momjian
> 1. If I re-organize the data, I would be able to perform my queries
> without executing joins on multiple tables per query.
>
> 2. As I re-organize the data, the database becomes less and less
> intuitive and (seemingly) less "normal".
>
> So, I guess my question is:  how costly are joins?  I've heard that
> Postgres pretty much "pukes" (in terms of speed) when you're trying
> to do anything more than 6 table joins in one query.  This leads
> me to believe that joins are fairly costly... ????
>
> Does anyone have any words of advice for me as I battle this?

We are working on speeding up large table joins right now.  Try using SET
GEQO to lower the threshold and see if that helps.
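
For example, using the 6.4-era syntax (the value 4 below is just an
illustration; 8 is the current default):

```sql
-- lower the GEQO threshold so the genetic optimizer kicks in
-- for joins of four or more tables
SET GEQO TO 'ON=4';

-- ... run your slow multi-table join here and time it ...

-- restore the default threshold afterwards
SET GEQO TO 'ON=8';
```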

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: [GENERAL] Cost: Big Tables vs. Organized Separation of Data

From: Jeff Hoffmann
Bruce Momjian wrote:
>
> > 1. If I re-organize the data, I would be able to perform my queries
> > without executing joins on multiple tables per query.
> >
> > 2. As I re-organize the data, the database becomes less and less
> > intuitive and (seemingly) less "normal".
> >
> > So, I guess my question is:  how costly are joins?  I've heard that
> > Postgres pretty much "pukes" (in terms of speed) when you're trying
> > to do anything more than 6 table joins in one query.  This leads
> > me to believe that joins are fairly costly... ????
> >
> > Does anyone have any words of advice for me as I battle this?
>
> We are working on speeding up large table joins right now.  Try using SET
> GEQO to lower the threshold and see if that helps.

i've noticed a pretty drastic slowdown going from a 4 table join
(instantaneous) to a 5 table join (15-20 seconds); i don't think i've
tried 6 tables yet with the same database.  is there a lower limit to
where using GEQO doesn't really make sense in terms of speed (i.e.,
would setting GEQO to 5 be a toss up and 4 not make sense?)  i'm
guessing that the number of plans the optimizer checks without GEQO goes
up factorially, while GEQO goes up fairly linearly (from my limited
knowledge of genetic algorithms, basically you make a series of first
guesses and gradually refine them and throw away the losers until you
end up with the right one.)  if my join is based purely on primary keys,
shouldn't just about any plan work well, or at least well enough that it
doesn't pay to make an exhaustive search of the plan space, making GEQO
the best choice?  i guess my question is "is there a rule of thumb for
setting GEQO?"  is there a reason it was set to 8 by default?  does GEQO
work better in some cases than others? (small tables, joins on
non-indexed fields, etc.)

it seems like i'm always learning something new with postgres -- i've
never thought about this before; it's just one more thing for me to play
around with...

Re: [GENERAL] Cost: Big Tables vs. Organized Separation of Data

From: "Alex P. Rudnev"
Did you try 'vacuum analyze' and create indexes on the keys that are
_most important_?

It's very important. I can't speak for the PSQL parser/optimizer in
detail, but it can't choose an appropriate join order without knowing
the key distribution. Well-designed indexes and 'vacuum analyze' can
improve your joins dramatically.

Simple rules:
- create indexes on all (nearly) unique attributes you use for joining.
- don't create an index on an attribute that has many duplicate rows
per value.

For example:

 CREATE TABLE employee (enum int, branch int);
 CREATE TABLE empinfo (enum int, surname text, sex int);

Create an index on 'branch' (if you select employees by branch)
and on 'empinfo.enum'. The worst thing you can do is create an index
on 'empinfo.sex', because the misled optimizer can think there are 10,000
different sexes when there are only two of them.

And use the 'explain' statement to check whether the system's behaviour
is right.
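
Continuing the employee/empinfo example, that advice would look
something like this (the index names and the branch value are
arbitrary):

```sql
-- index the join keys, but not low-selectivity columns like 'sex'
CREATE INDEX employee_branch_idx ON employee (branch);
CREATE INDEX empinfo_enum_idx ON empinfo (enum);

-- refresh the planner's statistics
VACUUM ANALYZE;

-- check that the planner now picks a sensible join order
EXPLAIN SELECT e.enum, i.surname
        FROM employee e, empinfo i
        WHERE e.enum = i.enum AND e.branch = 7;
```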


Aleksei Roudnev, Network Operations Center, Relcom, Moscow
(+7 095) 194-19-95 (Network Operations Center Hot Line),(+7 095) 230-41-41, N 13729 (pager)
(+7 095) 196-72-12 (Support), (+7 095) 194-33-28 (Fax)


Re: [GENERAL] Cost: Big Tables vs. Organized Separation of Data

From: Bruce Momjian
> i've noticed a pretty drastic slowdown going from a 4 table join
> (instantaneous) to a 5 table join (15-20 seconds) i don't think i've
> tried 6 tables yet with the same database.  is there a lower limit to
> where using GEQO doesn't really make sense in terms of speed (i.e.,
> would setting GEQO to 5 be a toss up and 4 not make sense?)  i'm
> guessing that the number of plans the optimizer checks without GEQO goes
> up factorially, while GEQO goes up fairly linearly (from my limited
> knowledge of genetic algorithms, basically you make a series of first
> guesses and gradually refine them and throw away the losers until you
> end up with the right one.)  if my join is based purely on primary keys,
> shouldn't just about any plan work well, or at least well enough that it
> doesn't pay to make an exhaustive search of the plan space, making GEQO
> the best choice?  i guess my question is "is there a rule of thumb for
> setting GEQO?"  is there a reason it was set to 8 by default?  does GEQO
> work better in some cases than others? (small tables, joins on
> non-indexed fields, etc.)

The bottom line is that GEQO in 6.4.* and earlier releases counted only
the number of tables as an indicator.  The new 6.5, which will be going
into beta soon, will use the number of tables _and_ indexes, making the
GEQO start location much more reliable.  The current default GEQO
threshold of 8 is too high for some queries and too low for others,
because it does not take into account the number of indexes involved.
So, count the number of indexes and tables, and try a GEQO setting near
that sum.  Let me know what you find to be a good GEQO number.  This
testing will require you to enable/disable GEQO with SET GEQO TO 'ON=1'
and SET GEQO TO 'ON=9999' just before each query.  What sum of tables
and indexes seems to make GEQO a win?
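
Concretely, with a hypothetical five-table join (t1 through t5 are
placeholders for your real tables), the comparison would run like this:

```sql
-- force GEQO on for any join (threshold of 1 table)
SET GEQO TO 'ON=1';
EXPLAIN SELECT * FROM t1, t2, t3, t4, t5
        WHERE t1.id = t2.id AND t2.id = t3.id
          AND t3.id = t4.id AND t4.id = t5.id;

-- effectively disable GEQO (threshold far above any real query),
-- so the exhaustive optimizer is used; compare plan and timing
SET GEQO TO 'ON=9999';
EXPLAIN SELECT * FROM t1, t2, t3, t4, t5
        WHERE t1.id = t2.id AND t2.id = t3.id
          AND t3.id = t4.id AND t4.id = t5.id;
```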

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026