Thread: initdb needed for newest sources
I have just committed changes that require a recompile and initdb when next you pull the current CVS sources. Be warned.

The changes revise the contents of the pg_statistic system table in order to support more accurate selectivity estimation, as per discussions a few days ago. I had hoped that pg_upgrade would be sufficient to deal with this, but it seems to be out of service at the moment...

			regards, tom lane
> The changes revise the contents of the pg_statistic system table
> in order to support more accurate selectivity estimation, as per
> discussions a few days ago.

Hey Tom, I'm not sure I was all that enthused about trying to optimize the selectivity for *any* particular strategy. How about also allowing the value(s) used for selectivity estimates to be manually set in the table so folks can tune things to their satisfaction? Maybe they can already be set, in which case we could add a SET OPTIMIZATION command (the name is negotiable) to make it easier?

                     - Thomas

--
Thomas Lockhart                         lockhart@alumni.caltech.edu
South Pasadena, California
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> The changes revise the contents of the pg_statistic system table
>> in order to support more accurate selectivity estimation, as per
>> discussions a few days ago.

> Hey Tom, I'm not sure I was all that enthused about trying to optimize
> the selectivity for *any* particular strategy. How about also allowing
> the value(s) used for selectivity estimates to be manually set in the
> table so folks can tune things to their satisfaction?

Well, I'm more interested in putting my effort into making the system do the right thing without help. Manual overrides are OK as long as you remember to revisit the settings whenever anything changes ... otherwise your manual optimization can become manual pessimization ... But if you want to spend time on the manual approach, I have no objection. There's room for everyone to play.

> Maybe they can already be set, in which case we could add a SET
> OPTIMIZATION command (the name is negotiable) to make it easier?

There are no manual inputs presently, except for some rather crude control variables (_enable_mergejoin_ and so on --- see backend/optimizer/path/costsize.c). There is a mixture of SET commands and backend command-line switches (ugh) to set these, and some don't have any tweak method short of source code changes or going in with a debugger. This area could use some cleanup and rethinking, for sure.

			regards, tom lane
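For context on those control variables: they are just global booleans consulted by the costing routines, and a disabled strategy is not rejected outright but charged a huge cost so the planner avoids it unless nothing else works. A minimal sketch of the idea follows; apart from the mergejoin toggle named above, the variable and function names are illustrative and are not the actual costsize.c code.

#include <stdbool.h>

typedef double Cost;

/* Planner control toggle; in the real backend these are set through
 * SET commands or backend command-line switches. */
bool enable_mergejoin = true;

/* A disabled strategy is not rejected outright -- it is just made to
 * look absurdly expensive, so it only wins when it is the only path. */
static const Cost disable_cost = 1.0e8;

/* Toy cost model for a merge join over two pre-sorted inputs. */
Cost
cost_mergejoin_sketch(Cost outer_cost, Cost inner_cost,
                      double outer_rows, double inner_rows)
{
    Cost total = outer_cost + inner_cost;

    /* nominal per-tuple CPU charge for advancing the merge */
    total += 0.01 * (outer_rows + inner_rows);

    if (!enable_mergejoin)
        total += disable_cost;      /* effectively vetoes this path */

    return total;
}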
Another thing I found out during installation is that libpq.so is hard to find. Setting LD_LIBRARY_PATH works, but I hate depending on environment settings, mostly because I always screw them up sooner or later. On searching, I found that you can specify -R/usr/local/pgsql/lib during linking in gcc and in cc on Solaris -- is there some reason that would be bad to do in general? I tried forcing it on psql and it seems to work fine, and I don't need to set LD_LIBRARY_PATH any more. I'm talking about Solaris 2.5 here, by the way.
"J. Michael Roberts" <mirobert@cs.indiana.edu> writes: > On searching, I found that you can specify -R/usr/local/pgsql/lib during > linking in gcc and in cc on Solaris -- is there some reason that would be > bad to do in general? Only that it's platform-specific. We do already do the equivalent incantation for HPUX --- they pronounce it differently, of course, but it's the same idea --- and it seems to work well. If you want to submit a patch to make it happen on Solaris, go for it. (The HPUX tweak is in makefiles/Makefile.hpux, so I suppose you'd want to hack on Makefile.solaris_*.) Ultimately I think we want to get out of the business of learning all these grotty details about shared libraries on different flavors of Unix, and instead rely on GNU libtool to do it for us. But converting to libtool takes work too :-( regards, tom lane
Hi, I know of course that the pgsql optimizer is never wrong but for the sake of argument do you think we could have optimizer hints the way say MSSQL has them? Oh, I'll work on the rest of my wish list later. ;-) Regards John Ridout.
John Ridout wrote:
>
> Hi,
>
> I know of course that the pgsql optimizer is never wrong

:)

OTOH, it is never completely right either :)

> but for the sake of argument do you think we could have
> optimizer hints the way say MSSQL has them?

I think it has been discussed (a little) but more on the lines of 'the way Oracle has them' ;)

It seems that this is not currently very high priority for any of the current real developers, but if you would contribute a nice clean implementation I'm sure it would be accepted.

----------------
Hannu
> Hannu Krosing wrote:
>
> John Ridout wrote:
> >
> > Hi,
> >
> > I know of course that the pgsql optimizer is never wrong
>
> :)
>
> OTOH, it is never completely right either :)
>
> > but for the sake of argument do you think we could have
> > optimizer hints the way say MSSQL has them?
>
> I think it has been discussed (a little) but more on the
> lines of 'the way Oracle has them' ;)
>
Oh, you mean properly.

> It seems that this is not currently very high priority for
> any of the current real developers, but if you would
> contribute a nice clean implementation I'm sure it would
> be accepted.
>
> ----------------
> Hannu
>
Come the end of September I will have enough time to play with the internals of pgsql. I suppose there are more useful things that can be done before optimizer hints such as nearly everything on the TODO list.

John.
> Come the end of September I will have enough time
> to play with the internals of pgsql.
> I suppose there are more useful things that
> can be done before optimizer hints such as nearly
> everything on the TODO list.

Yes, that is basically the issue.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Mon, Aug 02, 1999 at 12:01:03AM -0400, Tom Lane wrote:
>
> Well, I'm more interested in putting my effort into making the system
> do the right thing without help. Manual overrides are OK as long as
> you remember to revisit the settings whenever anything changes ...
> otherwise your manual optimization can become manual pessimization ...
>

Hey Tom -
Ran across this paper, about an interesting approach, pulling in the indices to aid in selectivity estimates.

http://db.cs.berkeley.edu/papers/CSD-98-1021.pdf

I grabbed this from a link at:

http://db.cs.berkeley.edu/papers/

while looking at the Mariposa work (http://mariposa.cs.berkeley.edu) from the Sequoia2000 project. I've convinced my team to let me spend a couple days analyzing what it would take to fold the remote access features of Mariposa into the current PostgreSQL tree. I grabbed the alpha-1 code (June 23, 1996) which seems to be based on an early version of Postgres95. Interesting to see all the academic cruft you guys have already cleaned out ;-)

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
> On Mon, Aug 02, 1999 at 12:01:03AM -0400, Tom Lane wrote:
> >
> > Well, I'm more interested in putting my effort into making the system
> > do the right thing without help. Manual overrides are OK as long as
> > you remember to revisit the settings whenever anything changes ...
> > otherwise your manual optimization can become manual pessimization ...
> >
>
> Hey Tom -
> Ran across this paper, about an interesting approach, pulling in the indices
> to aid in selectivity estimates.
>
> http://db.cs.berkeley.edu/papers/CSD-98-1021.pdf
>
> I grabbed this from a link at:
>
> http://db.cs.berkeley.edu/papers/
>
> while looking at the Mariposa work (http://mariposa.cs.berkeley.edu)
> from the Sequoia2000 project. I've convinced my team to let me spend
> a couple days analyzing what it would take to fold the remote access
> features of Mariposa into the current PostgreSQL tree. I grabbed the
> alpha-1 code (June 23, 1996) which seems to be based on an early version
> of Postgres95. Interesting to see all the academic cruft you guys have
> already cleaned out ;-)

We still have a directory called tioga, which is also related to Mariposa. Basically, at the time, no one understood the academic stuff, and we had tons of bugs in general areas. We just didn't see any reason to keep around unusual features while our existing code was so poorly maintained from Berkeley.

The Mariposa remote access features looked like they were heavily done in the executor directory. This makes sense, assuming they wanted the access to be done remotely. They also tried to fix some things while doing Mariposa. A few of those fixes have been added over the years.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
"Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu> writes: > Ran across this paper, about an interesting approach, pulling in the indices > to aid in selectivity estimates. > http://db.cs.berkeley.edu/papers/CSD-98-1021.pdf Looks pretty interesting, but also vastly more complex than I want to tackle at the moment. As of 6.5 the selectivity code is broken for everything except integers. What I'm trying to do for this release cycle is to get it to operate as well as can be expected given the existing design (in which available statistics are not much more than min/max/mode values for each column; strictly speaking the mode was not in the existing design, but given that VACUUM was computing it anyway, we might as well use it). A lot more *could* be done, as this paper suggests; but there are also many other important tasks, and only so many hours in the day. I doubt that building an entirely new selectivity estimation infrastructure is worthwhile until we have cured some more problems elsewhere :-( > while looking at the Mariposa work ( http://mariposa.cs.berkeley.edu) > from the Sequoia2000 project. I've convinced my team to let me spend > a couple days analyzing what it would take to fold the remote access > features of Mariposa into the current PostgreSQL tree. Let us know what you find out... regards, tom lane
On Mon, Aug 02, 1999 at 04:44:10PM -0400, Bruce Momjian wrote:
>
> We still have a directory called tioga, which is also related to
> Mariposa. Basically, at the time, no one understood the academic stuff,
> and we had tons of bugs in general areas. We just didn't see any reason
> to keep around unusual features while our existing code was so poorly
> maintained from Berkeley.

The right thing to do, I concur. Get the basics stable and working well, _then_ tack on the interesting stuff :-) A common complaint about us academics: we only want to do the interesting stuff.

> The Mariposa remote access features looked like they were heavily done
> in the executor directory. This makes sense, assuming they wanted the
> access to be done remotely. They also tried to fix some things while
> doing Mariposa. A few of those fixes have been added over the years.

Right. As I've been able to make out so far, in Mariposa a query passes through the regular parser and single-site optimizer, then the selected plan tree is handed to a 'fragmenter' to break the work up into chunks, which are then handed around to a 'broker' which uses a microeconomic 'bid' process to parcel them out to both local and remote executors. The results from each site then go through a local 'coordinator' which merges the result sets and hands them back to the original client. Whew!

It's interesting to compare the theory describing the workings of Mariposa (such as the paper in VLDB) with the code. For the fragmenter, the paper describes basically a rational decomposition of the plan, while the code applies non-deterministic, but tuneable, methods (lots of calls to random and comparisons to user-specified odds ratios).

It strikes me as a bit odd to optimize the plan for a single site, then break it all apart again. My thoughts on this are to implement two new node types: one a remote table, and one which represents access to a remote table. Remote tables have host info in them, and would always be added to the plan with a remote-access node directly above them. Remote-access nodes would be separate from their remote-table, to allow the communications cost to be slid up the plan tree and merged with other remote-access nodes talking to the same server. This should maintain the order-agnostic nature of the optimizer. The executor will need to build SQL statements from the sub-plans and submit them via standard network db access client libraries.

First step: create a remote-table node and teach the executor how to get info from it. Later, add the separable remote-access node.

How insane does this sound now? Am I still a mad scientist? (...always!)

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
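A sketch of what the two proposed node types might look like follows; the struct and field names are invented for illustration and are not existing PostgreSQL node definitions. Keeping the remote-access node separate from the remote-table node is what lets the communication charge migrate up the plan tree and merge with other accesses to the same server.

/* Hypothetical plan nodes for the scheme described above; the names
 * and fields are illustrative, not actual PostgreSQL structures. */

typedef struct RemoteTable
{
    char   *host;           /* server holding the table */
    int     port;
    char   *dbname;
    char   *relname;        /* table name on the remote side */
} RemoteTable;

typedef struct RemoteAccess
{
    RemoteTable *table;     /* which remote table is being read */
    char        *subquery;  /* SQL text built from the sub-plan and
                             * shipped to the remote server, so the
                             * remote backend does as much of the
                             * work as possible */
    double       est_rows;  /* expected result size, used to charge
                             * network transfer cost */
} RemoteAccess;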
"Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu> writes: > It strikes me as a bit odd to optimize the plan for a single site, > then break it all apart again. Yes, that sounds pretty peculiar, especially considering that the optimizer's choices are all about access costs. A plan generated for entirely-local execution might be far from optimal when broken across multiple nodes. > My thoughts on this are to implement a two new node types: one a > remote table, and one which represents access to a remote > table. Remote tables have host info in them, and always be added to > the plan with a remote-access node directly above them. Remote-access > nodes would be seperate from their remote-table, to allow the > communications cost to be slid up the plan tree, and merged with other > remote-access nodes talking to the same server. I like that approach a lot better. If the access cost estimates for the shared-table node can be set to reflect remote communication costs, you might actually get reasonable plans out of the optimizer... You should not move too far with an actual implementation until you talk to Jan about rangetable entries for sub-selects. If we are going to invent new RTE types we may as well try to deal with that problem at the same time. regards, tom lane
> It strikes me as a bit odd to optimize the plan for a single site,
> then break it all apart again. My thoughts on this are to implement
> two new node types: one a remote table, and one which represents
> access to a remote table. Remote tables have host info in them, and
> would always be added to the plan with a remote-access node directly
> above them. Remote-access nodes would be separate from their remote-table,
> to allow the communications cost to be slid up the plan tree and merged
> with other remote-access nodes talking to the same server. This should
> maintain the order-agnostic nature of the optimizer. The executor will
> need to build SQL statements from the sub-plans and submit them via
> standard network db access client libraries.
>
> First step: create a remote-table node and teach the executor how to get
> info from it. Later, add the separable remote-access node.
>
> How insane does this sound now? Am I still a mad scientist? (...always!)

Sounds interesting, and doable. People have asked from time to time about this. Our access routines are very modular, so if you can get your stuff working inside the tuple access routines, you will have it made. The easiest way may be to just hack up the storage manager (smgr). Create a new access method, and hook your remote stuff to that.

You could try something easy like common backend NFS with some locking protocol to prevent contention. In fact, NFS would be transparent except for locking issues.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
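For anyone unfamiliar with it, the storage manager switch is a table of function pointers that low-level block reads and writes are dispatched through, so a 'remote' manager amounts to supplying one more row of that table. The sketch below shows only the shape of the idea: the real smgr interface has more entry points and different signatures, and the md_xxx and remote_xxx routines here are hypothetical.

/* Sketch of the storage-manager switch: each low-level operation is
 * dispatched through a table of function pointers, so a "remote"
 * manager is just one more row in the table. */

typedef struct f_smgr_sketch
{
    int (*smgr_open)  (const char *relname);
    int (*smgr_read)  (int fd, long blocknum, char *buffer);
    int (*smgr_write) (int fd, long blocknum, const char *buffer);
    int (*smgr_close) (int fd);
} f_smgr_sketch;

/* Hypothetical implementations (bodies not shown). */
extern int md_open(const char *relname);       /* local "magnetic disk" */
extern int md_read(int fd, long blocknum, char *buffer);
extern int md_write(int fd, long blocknum, const char *buffer);
extern int md_close(int fd);

extern int remote_open(const char *relname);   /* talks to another server */
extern int remote_read(int fd, long blocknum, char *buffer);
extern int remote_write(int fd, long blocknum, const char *buffer);
extern int remote_close(int fd);

static const f_smgr_sketch smgrsw_sketch[] = {
    { md_open,     md_read,     md_write,     md_close     },  /* DEFAULT_SMGR */
    { remote_open, remote_read, remote_write, remote_close },
};

/* With the relsmgr field gone from pg_class, every access currently
 * indexes row 0 (DEFAULT_SMGR); a remote manager would need that
 * per-relation field back, or some equivalent way to pick its row. */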
On Mon, Aug 02, 1999 at 10:25:15PM -0400, Bruce Momjian wrote:
>
> Sounds interesting, and doable. People have asked from time to time
> about this. Our access routines are very modular, so if you can get
> your stuff working inside the tuple access routines, you will have it
> made. The easiest way may be to just hack up the storage manager
> (smgr). Create a new access method, and hook your remote stuff to that.
>

I considered two quick and dirty proof-of-concept implementations first: hacking the smgr, and upgrading functions to allow them to return sets, then building views with an ON SELECT rule that fired an arbitrary db access routine.

One advantage of going the smgr route is I automagically get all the buffer and relation caching that's built in. A negative is that it basically just gives me a remote table: every query's going to pull the whole table, or I'm going to have to write somewhat hairy access methods, I think. I think it's a little _too_ low level. I want to be able to have the backend DB do as much work as possible, minimizing the expensive network transfer.

Oh, and BTW, while the smgr infrastructure is still in place, and all accesses vector through it, the relsmgr field has been removed from the pg_class table, and all calls to smgropen() and friends use the DEFAULT_SMGR #define.

The second option, having functions return sets, is still a possibility, but it suffers from the same lack of extensibility - I only get complete tables back. Even worse, I don't get the buffer caches for free. If rules _currently_ could return sets, it'd be a 1-2 day hack to get something working.

So, I think I'll see what Jan has to say about subselect RTEs, as Tom suggested. Seems the way to go for future work.

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
> One advantage of going the smgr route is I automagically get all the
> buffer and relation caching that's built in. A negative is that it
> basically just gives me a remote table: every query's going to pull
> the whole table, or I'm going to have to write somewhat hairy access
> methods, I think. I think it's a little _too_ low level. I want to be
> able to have the backend DB do as much work as possible, minimizing the
> expensive network transfer.
>
> Oh, and BTW, while the smgr infrastructure is still in place, and all
> accesses vector through it, the relsmgr field has been removed from
> the pg_class table, and all calls to smgropen() and friends use the
> DEFAULT_SMGR #define.

That was me. The only other storage manager we had was the stable memory storage manager, and that was just too confusing to the developers. It can be easily added back by just changing the DEFAULT_SMGR define to access the relation tuple. In fact, our code uses the storage manager much more consistently now, because to enable multi-segment tables (>2 gigs for non-supporting OS's) we had to clean that up.

> The second option, having functions return sets, is still a possibility,
> but it suffers from the same lack of extensibility - I only get complete
> tables back. Even worse, I don't get the buffer caches for free. If rules
> _currently_ could return sets, it'd be a 1-2 day hack to get something
> working.
>
> So, I think I'll see what Jan has to say about subselect RTEs, as Tom
> suggested. Seems the way to go for future work.

Yes, I think he needs that for foreign keys, and I think he wants to do that for 6.6.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026