Thread: Threaded Sorting

Threaded Sorting

From
Hans-Jürgen Schönig
Date:
Did anybody think about threaded sorting so far?
Assume an SMP machine. In the case of building an index or in the case 
of sorting a lot of data there is just one backend working. Therefore 
just one CPU is used.
What about starting a thread for every temporary file being created? 
This way CREATE INDEX could use many CPUs.
Maybe this is worth thinking about because it will speed up huge 
databases and enterprise level computing.
   Best regards,
       Hans-Jürgen Schönig

-- 
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at <http://www.postgresql.at>, cluster.postgresql.at 
<http://cluster.postgresql.at>, www.cybertec.at 
<http://www.cybertec.at>, kernel.cybertec.at <http://kernel.cybertec.at>



Re: Threaded Sorting

From
"Shridhar Daithankar"
Date:
On 4 Oct 2002 at 9:46, Hans-Jürgen Schönig wrote:

> Did anybody think about threaded sorting so far?
> Assume an SMP machine. In the case of building an index or in the case 
> of sorting a lot of data there is just one backend working. Therefore 
> just one CPU is used.
> What about starting a thread for every temporary file being created? 
> This way CREATE INDEX could use many CPUs.
> Maybe this is worth thinking about because it will speed up huge 
> databases and enterprise level computing.

I have a better plan. I have a thread architecture ready which acts as generic 
thread templates. Even the function pointers in the thread can be altered on 
the fly.

I suggest we use some such architecture for threading. It can be used in any 
module without hardcoding things. Like say in sorting we assign exclusive 
jobs/data ranges to threads then there would be minimum locking and one thread 
could merge the results.. Something like that.

All it takes to change entry functions to accept one more parameter that 
indicates range of values to act upon. In non-threaded version, it's not there 
because the function acts on entire data set.

Further more, with this model threading support can be turned off easily. In 
non-threaded model, a wrapper function can call the entry point in series with 
necessary arguments. So postgresql does not have to deal with not-so-good-
enough thread implementations. Keeping tradition to conservative defaults we 
can set default threads to off..

The code is in C++ but it's hardly couple of pages. I can convert it to C and 
post it if required..

Let me know..

ByeShridhar

--
Parkinson's Fourth Law:    The number of people in any working group tends to 
increase    regardless of the amount of work to be done.



Re: Threaded Sorting

From
Tom Lane
Date:
Hans-Jürgen Schönig <postgres@cybertec.at> writes:
> Did anybody think about threaded sorting so far?
> Assume an SMP machine. In the case of building an index or in the case 
> of sorting a lot of data there is just one backend working. Therefore 
> just one CPU is used.
> What about starting a thread for every temporary file being created? 
> This way CREATE INDEX could use many CPUs.

In my experience, once you have enough data to force a temp file to be
used, the sort algorithm is I/O bound anyway.  Throwing more CPUs at it
won't help much.
        regards, tom lane


Re: Threaded Sorting

From
Greg Copeland
Date:
I wouldn't hold your breath for any form of threading.  Since PostgreSQL
is process based, you might consider having a pool of sort processes
which address this but I doubt you'll get anywhere talking about threads
here.

Greg


On Fri, 2002-10-04 at 02:46, Hans-Jürgen Schönig wrote:
> Did anybody think about threaded sorting so far?
> Assume an SMP machine. In the case of building an index or in the case
> of sorting a lot of data there is just one backend working. Therefore
> just one CPU is used.
> What about starting a thread for every temporary file being created?
> This way CREATE INDEX could use many CPUs.
> Maybe this is worth thinking about because it will speed up huge
> databases and enterprise level computing.
>
>     Best regards,
>
>         Hans-Jürgen Schönig
>
> --
> *Cybertec Geschwinde u Schoenig*
> Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
> Tel: +43/1/913 68 09; +43/664/233 90 75
> www.postgresql.at <http://www.postgresql.at>, cluster.postgresql.at
> <http://cluster.postgresql.at>, www.cybertec.at
> <http://www.cybertec.at>, kernel.cybertec.at <http://kernel.cybertec.at>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html


Re: Threaded Sorting

From
Greg Copeland
Date:
On Fri, 2002-10-04 at 09:40, Hans-Jürgen Schönig wrote:
>
> I had a brief look at the code used for sorting. It is very well
> documented so maybe it is worth thinking about a parallel algorithm.
>
> When talking about threads: A pool of processes for sorting? Maybe this
> could be useful but I doubt if it the best solution to avoid overhead.
> Somewhere in the TODO it says that there will be experiments with a
> threaded backend. This make me think that threads are not a big no no.
>
>     Hans

That was a fork IIRC.  Threading is not used in baseline PostgreSQL nor
is there any such plans that I'm aware of.  People from time to time ask
about threads for this or that and are always told what I'm telling
you.  The use of threads leads to portability issues not to mention
PostgreSQL is entirely built around the process model.

Tom is right to dismiss the notion of adding additional CPUs to
something that is already I/O bound, however, the concept it self should
not be dismissed.  Applying multiple CPUs to a sort operation is well
accepted and understood technology.

At this point, perhaps Tom or one of the other core developers having
insight in this area would be willing to address how readily such a
mechanism could could be put in place.

Also, don't be so fast to dismiss what the process model can do.  There
is not reason to believe that having a process pool would not be able to
perform wonderful things if implemented properly.  Basically, the notion
would be that the backend processing the query would solicit assistance
from the sort pool if one or more processes were available.  At that
point, several methods could be employed to divide the work.  Some form
of threshold would also have to be created to prevent the pool from
being used when a single backend is capable of addressing the need.
Basically the idea is, you only have the pool assist with large tuple
counts and then, only when resources are available and resource are
available from within the pool.  By doing this, you avoid additional
overhead for small sort efforts and gain when it matters the most.


Regards,
Greg


Re: Threaded Sorting

From
Greg Copeland
Date:
On Fri, 2002-10-04 at 10:37, Hans-Jürgen Schönig wrote:
> My concern was that a process model might be a bit too slow for that but
> if we had processes in memory this would be wonderful thing.

Yes, that's the point of having a pool.  The idea is not only do you
avoid process creation and destruction which is notoriously expensive on
many platforms, they would sit idle until signaled to begin working on
it's assigned sort operation.  Ideally, these would be configurable
options which would include items such as, pool size (maximum number of
processes in the pool), max concurrency level (maximum number of process
from the pool which can contribute to a single backend) and tuple count
threshold (threshold which triggers solicition for assistance from the
sort pool).

> Using it for small amounts of data is pretty useless - I totally agree
> but when it comes to huge amounts of data it can be useful.
>
> It is a mechanism for huge installations with a lot of data.
>
>     Hans

Agreed.  Thus the importance of being able to specify some type of
meaningful threshold.

Any of the core developers wanna chime in here on this concept?

Greg



Re: Threaded Sorting

From
Bruce Momjian
Date:
Hans-J�rgen Sch�nig wrote:
> Did anybody think about threaded sorting so far?
> Assume an SMP machine. In the case of building an index or in the case 
> of sorting a lot of data there is just one backend working. Therefore 
> just one CPU is used.
> What about starting a thread for every temporary file being created? 
> This way CREATE INDEX could use many CPUs.
> Maybe this is worth thinking about because it will speed up huge 
> databases and enterprise level computing.

We haven't thought about it yet because there are too many buggy thread
implementations.  We are probably just now getting to a point where we
can consider it.  However, lots of databases have moved to threads for
all sorts of things and ended up with a royal mess of code.  Threads
can only improve things in a few areas of the backend so it would be
nice if we could limit the exposure to threads to those areas;  sorting
could certainly be one of them, but frankly, I think disk I/O is our
limiting factore there.  I would be interested to see some tests that
showed otherwise.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Threaded Sorting

From
"scott.marlowe"
Date:
On Fri, 4 Oct 2002, Bruce Momjian wrote:

> Hans-Jürgen Schönig wrote:
> > Did anybody think about threaded sorting so far?
> > Assume an SMP machine. In the case of building an index or in the case 
> > of sorting a lot of data there is just one backend working. Therefore 
> > just one CPU is used.
> > What about starting a thread for every temporary file being created? 
> > This way CREATE INDEX could use many CPUs.
> > Maybe this is worth thinking about because it will speed up huge 
> > databases and enterprise level computing.
> 
> We haven't thought about it yet because there are too many buggy thread
> implementations.  We are probably just now getting to a point where we
> can consider it.  However, lots of databases have moved to threads for
> all sorts of things and ended up with a royal mess of code.  Threads
> can only improve things in a few areas of the backend so it would be
> nice if we could limit the exposure to threads to those areas;  sorting
> could certainly be one of them, but frankly, I think disk I/O is our
> limiting factore there.  I would be interested to see some tests that
> showed otherwise.

Wouldn't the type of disk subsystem really make a big difference here?

With a couple of U160 cards and a dozen 15krpm hard drives, I would 
imagine I/O would no longer be as much of an issue as a single drive 
system would be.

It seems like sometimes we consider these issues more from the one or two 
SCSI drives perspective insted of the big box o drives perspective.



Re: Threaded Sorting

From
Bruce Momjian
Date:
Hans-J�rgen Sch�nig wrote:
> Threads are bad - I know ...
> I like the idea of a pool of processes instead of threads - from my 
> point of view this would be useful.
> 
> I am planning to run some tests (GEQO, AIX, sorts) as soon as I have 
> time to do so (still too much work ahead before :( ...).
> If I had time I'd love to do something for the PostgreSQL community :(.
> 
> As far as sorting is concerned: It would be fine if it was possible to 
> define an alternative location for temporary sort files using SET.
> If you had multiple disks this would help in the case of concurrent 
> sorts because this way people could insert and index many tables at once 
> without having to access just one storage system.
> This would be an easy way out of the IO limitation ... - at least for 
> some problems.

Bingo!  Want to increase sorting performance, give it more I/O
bandwidth, and it will take 1/100th of the time to do threading.

Ingres had a nice feature where you could specify sort directories and
it would cycle through those directories while it did the tape sort.

Added to TODO:
* Allow sorting to use multiple work directories

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Threaded Sorting

From
Bruce Momjian
Date:
scott.marlowe wrote:
> > We haven't thought about it yet because there are too many buggy thread
> > implementations.  We are probably just now getting to a point where we
> > can consider it.  However, lots of databases have moved to threads for
> > all sorts of things and ended up with a royal mess of code.  Threads
> > can only improve things in a few areas of the backend so it would be
> > nice if we could limit the exposure to threads to those areas;  sorting
> > could certainly be one of them, but frankly, I think disk I/O is our
> > limiting factore there.  I would be interested to see some tests that
> > showed otherwise.
> 
> Wouldn't the type of disk subsystem really make a big difference here?
> 
> With a couple of U160 cards and a dozen 15krpm hard drives, I would 
> imagine I/O would no longer be as much of an issue as a single drive 
> system would be.
> 
> It seems like sometimes we consider these issues more from the one or two 
> SCSI drives perspective insted of the big box o drives perspective.

Yes, it is mostly for non-RAID drives, but also, sometimes single drives
can be faster.  When you have a drive array, it isn't as easy to hit
each drive and keep it running sequentially.  Of course, I don't have
any hard numbers on that.  ;-)

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Threaded Sorting

From
Justin Clift
Date:
Bruce Momjian wrote:
> 
> scott.marlowe wrote:
<snip>
> > It seems like sometimes we consider these issues more from the one or two
> > SCSI drives perspective insted of the big box o drives perspective.
> 
> Yes, it is mostly for non-RAID drives, but also, sometimes single drives
> can be faster.  When you have a drive array, it isn't as easy to hit
> each drive and keep it running sequentially.  Of course, I don't have
> any hard numbers on that.  ;-)

Arrghh... please remember that "big bunch of drives" != "all in one
array".

It's common to have a bunch of drives and allocate different ones for
different tasks appropriately, whether in array sets, individually,
mirrored, etc.

100% totally feasible to have a separate 15k SCSI drive or two just
purely for doing sorts if it would assist in throughput.

:-)

Regards and best wishes,

Justin Clift


> 
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

-- 
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."  - Indira Gandhi


Re: Threaded Sorting

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Bingo!  Want to increase sorting performance, give it more I/O
> bandwidth, and it will take 1/100th of the time to do threading.

> Added to TODO:
>     * Allow sorting to use multiple work directories

Yeah, I like that.  Actually it should apply to all temp files not only
sorting.

A crude hack would be to allow there to be multiple pg_temp_NNN/
subdirectories (read symlinks) in a database, and then the code would
automatically switch among these.

Probably a cleaner idea would be to somehow integrate this with
tablespace management --- if you could mark some tablespaces as intended
for temp stuff, the system could round-robin among those as it creates
temp files and/or temp tables.
        regards, tom lane


Re: Threaded Sorting

From
Greg Copeland
Date:
On Fri, 2002-10-04 at 12:26, Bruce Momjian wrote:
> Added to TODO:
>
>     * Allow sorting to use multiple work directories

Why wouldn't that fall under the table space effort???

Greg


Re: Threaded Sorting

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Bingo!  Want to increase sorting performance, give it more I/O
> > bandwidth, and it will take 1/100th of the time to do threading.
> 
> > Added to TODO:
> >     * Allow sorting to use multiple work directories
> 
> Yeah, I like that.  Actually it should apply to all temp files not only
> sorting.
> 
> A crude hack would be to allow there to be multiple pg_temp_NNN/
> subdirectories (read symlinks) in a database, and then the code would
> automatically switch among these.

TODO updated:
* Allow sorting/temp files to use multiple work directories     

Tom, what temp files do we use that aren't for sorting;  I forgot.       
> Probably a cleaner idea would be to somehow integrate this with
> tablespace management --- if you could mark some tablespaces as intended
> for temp stuff, the system could round-robin among those as it creates
> temp files and/or temp tables.

Yes, tablespaces would be the place for this.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Threaded Sorting

From
Bruce Momjian
Date:
Greg Copeland wrote:
-- Start of PGP signed section.
> On Fri, 2002-10-04 at 12:26, Bruce Momjian wrote:
> > Added to TODO:
> > 
> >     * Allow sorting to use multiple work directories
> 
> Why wouldn't that fall under the table space effort???

Yes, but we make it a separate item so we are sure that is implemented
as part of tablespaces.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Threaded Sorting

From
Greg Copeland
Date:
I see.  I just always assumed that it would be done as part of table
space effort as it's such a defacto feature.

I am curious as to why no one has commented on the other rather obvious
performance enhancement which was brought up in this thread.  Allowing
for parallel sorting seems rather obvious and is a common enhancement
yet seems to of been completely dismissed as people seem to be fixated
on I/O.  Go figure.

Greg



On Fri, 2002-10-04 at 14:02, Bruce Momjian wrote:
> Greg Copeland wrote:
> -- Start of PGP signed section.
> > On Fri, 2002-10-04 at 12:26, Bruce Momjian wrote:
> > > Added to TODO:
> > >
> > >     * Allow sorting to use multiple work directories
> >
> > Why wouldn't that fall under the table space effort???
>
> Yes, but we make it a separate item so we are sure that is implemented
> as part of tablespaces.
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org


Re: Threaded Sorting

From
Bruce Momjian
Date:
Greg Copeland wrote:
-- Start of PGP signed section.
> I see.  I just always assumed that it would be done as part of table
> space effort as it's such a defacto feature.
> 
> I am curious as to why no one has commented on the other rather obvious
> performance enhancement which was brought up in this thread.  Allowing
> for parallel sorting seems rather obvious and is a common enhancement
> yet seems to of been completely dismissed as people seem to be fixated
> on I/O.  Go figure. 

We think we are fixated on I/O because we think that is where the delay
is.  Is there a reason we shouldn't think that?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Threaded Sorting

From
Greg Copeland
Date:
Well, that's why I was soliciting developer input as to exactly what
goes on with sorts.  From what I seem to be hearing, all sorts result in
temp files being created and/or used.  If that's the case then yes, I
can understand the fixation.  Of course that opens the door for it being
a horrible implementation.  If that's not the case, then parallel sorts
still seem like a rather obvious route to look into.

Greg


On Fri, 2002-10-04 at 14:15, Bruce Momjian wrote:
> Greg Copeland wrote:
> -- Start of PGP signed section.
> > I see.  I just always assumed that it would be done as part of table
> > space effort as it's such a defacto feature.
> >
> > I am curious as to why no one has commented on the other rather obvious
> > performance enhancement which was brought up in this thread.  Allowing
> > for parallel sorting seems rather obvious and is a common enhancement
> > yet seems to of been completely dismissed as people seem to be fixated
> > on I/O.  Go figure.
>
> We think we are fixated on I/O because we think that is where the delay
> is.  Is there a reason we shouldn't think that?
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Threaded Sorting

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom, what temp files do we use that aren't for sorting;  I forgot.

MATERIALIZE plan nodes are the only thing I can think of offhand that
uses a straight temp file.  But ISTM that if this makes sense for
our internal temp files, it makes sense for user-created temp tables
as well.
        regards, tom lane


Re: Threaded Sorting

From
Bruce Momjian
Date:
Greg Copeland wrote:
-- Start of PGP signed section.
> Well, that's why I was soliciting developer input as to exactly what
> goes on with sorts.  From what I seem to be hearing, all sorts result in
> temp files being created and/or used.  If that's the case then yes, I
> can understand the fixation.  Of course that opens the door for it being
> a horrible implementation.  If that's not the case, then parallel sorts
> still seem like a rather obvious route to look into.

We use tape sorts, ala Knuth, meaning we sort in memory as much as
possible, but when there is more data than fits in memory, rather than
swapping, we write to temp files then merge the temp files (aka tapes).

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Threaded Sorting

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom, what temp files do we use that aren't for sorting;  I forgot.
> 
> MATERIALIZE plan nodes are the only thing I can think of offhand that
> uses a straight temp file.  But ISTM that if this makes sense for
> our internal temp files, it makes sense for user-created temp tables
> as well.

Yes, I was thinking that, but of course, those are real tables, rather
than just files.  Not sure how clean it will be to mix those in the same
directory.  We haven't in the past.  Is it a good idea?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Threaded Sorting

From
Greg Copeland
Date:
On Fri, 2002-10-04 at 14:31, Bruce Momjian wrote:
> We use tape sorts, ala Knuth, meaning we sort in memory as much as
> possible, but when there is more data than fits in memory, rather than
> swapping, we write to temp files then merge the temp files (aka tapes).

Right, which is what I originally assumed.  On lower end systems, that
works great.  Once you allow that people may actually have high-end
systems with multiple CPUs and lots of memory, wouldn't it be nice to
allow for huge improvements on large sorts?  Basically, you have two
ends of the spectrum.  One, where you don't have enough memory and
become I/O bound.  The other is where you have enough memory but are CPU
bound; where potentially you have extra CPUs to spare.  Seems to me they
are not mutually exclusive.

Unless I've missed something, the ideal case is to never use tapes for
sorting.  Which is saying, you're trying to optimize an already less an
ideal situation (which is of course good).  I'm trying to discuss making
it a near ideal use of available resources.  I can understand why
addressing the seemingly more common I/O bound case would receive
priority, however, I'm at a loss as to why the other would be completely
ignored.  Seems to me, implementing both would even work in a
complimentary fashion on the low-end cases and yield more breathing room
for the high-end cases.

What am I missing for the other case to be completely ignored?

Greg


Re: Threaded Sorting

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> ...  But ISTM that if this makes sense for
>> our internal temp files, it makes sense for user-created temp tables
>> as well.

> Yes, I was thinking that, but of course, those are real tables, rather
> than just files.  Not sure how clean it will be to mix those in the same
> directory.  We haven't in the past.  Is it a good idea?

Sure we have --- up till recently, pg_temp files just lived in the
database directory.  I think it was you that added the pg_temp
subdirectory, and the reason you did it was to let people symlink the
temp files to someplace else.  But that's just a zeroth-order
approximation to providing a tablespace facility for these things.
        regards, tom lane


Re: Threaded Sorting

From
Tom Lane
Date:
Greg Copeland <greg@CopelandConsulting.Net> writes:
> ... I can understand why
> addressing the seemingly more common I/O bound case would receive
> priority, however, I'm at a loss as to why the other would be completely
> ignored.

Bruce already explained that we avoid threads because of portability and
robustness considerations.

The notion of a sort process pool seems possibly attractive.  I'm
unconvinced that it's going to be a win though because of the cost of
shoving data across address-space boundaries.  Another issue is that
the sort comparison function can be anything, including user-defined
code that does database accesses or other interesting stuff.  This
would mean that the sort auxiliary process would have to adopt the
database user identity of the originating process, and quite possibly
absorb a whole lot of other context information before it could
correctly/safely execute the comparison function.  That pushes the
overhead up a lot more.

(The need to allow arbitrary operations in the comparison function would
put a pretty substantial crimp on a thread-based approach, too, even if
we were willing to ignore the portability issue.)

Still, if you want to try it out, feel free ... this is an open-source
project, and if you can't convince other people that an idea is worth
implementing, that doesn't mean you can't implement it yourself and
prove 'em wrong.
        regards, tom lane


Re: Threaded Sorting

From
Greg Copeland
Date:
On Fri, 2002-10-04 at 15:07, Tom Lane wrote:
> the sort comparison function can be anything, including user-defined
> code that does database accesses or other interesting stuff.  This

This is something that I'd not considered.

> would mean that the sort auxiliary process would have to adopt the
> database user identity of the originating process, and quite possibly
> absorb a whole lot of other context information before it could
> correctly/safely execute the comparison function.  That pushes the
> overhead up a lot more.

Significantly!  Agreed.

>
> Still, if you want to try it out, feel free ... this is an open-source
> project, and if you can't convince other people that an idea is worth
> implementing, that doesn't mean you can't implement it yourself and
> prove 'em wrong.

No Tom, my issue wasn't if I could or could not convince someone but
rather that something has been put on the table requesting additional
feedback on it's feasibility but had been completely ignored.  Fact is,
I knew I didn't know enough about the implementation details to even
attempt to convince anyone of anything.  I simply wanted to explore the
idea or rather the feasibility of the idea.  In theory, it's a great
idea.  In practice, I had no idea, thus my desire to seek additional
input.  As such, it seems a practical implementation may prove
difficult.  I now understand.  Thank you for taking the take to respond
in a manner that satisfies my curiosity.  That's all I was looking for.
:)

Best Regards,
Greg


Re: Threaded Sorting

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> ...  But ISTM that if this makes sense for
> >> our internal temp files, it makes sense for user-created temp tables
> >> as well.
> 
> > Yes, I was thinking that, but of course, those are real tables, rather
> > than just files.  Not sure how clean it will be to mix those in the same
> > directory.  We haven't in the past.  Is it a good idea?
> 
> Sure we have --- up till recently, pg_temp files just lived in the
> database directory.  I think it was you that added the pg_temp
> subdirectory, and the reason you did it was to let people symlink the
> temp files to someplace else.  But that's just a zeroth-order
> approximation to providing a tablespace facility for these things.

OK, TODO updated:
* Allow sorting, temp files, temp tables to use multiple workdirectories  

FYI, I originally created that directory so a postmaster startup could
clear that dir.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Threaded Sorting

From
"Curtis Faith"
Date:
tom lane writes:
>The notion of a sort process pool seems possibly attractive.  I'm
>unconvinced that it's going to be a win though because of the cost of
>shoving data across address-space boundaries.  

What about splitting out parsing, optimization and plan generation from
execution and having a separate pool of exececutor processes.

As an optimizer finished with a query plan it would initiate execution
by grabbing an executor from a pool and passing it the plan.

This would allow the potential for passing partial plans to multiple
executors so a given query might be split up into three or four pieces
and then executed in parallel with the results passed through a
shared memory area owned by each executor process.

This would allow for potential optimization of sorts without threads or 
incurring the overhead problems you mentioned for a separate sorter
process. The optimizer could do things like split a scan into 3 or 4
pieces before sending it off for execution and then join the pieces
back together.

It could also make complex queries much faster if there are idling CPUs
if the optimizer was updated to take advantage of this.

If we are going to split things apart, then this should be done at a
natural communication boundary right? The code has this logical split
right now anyway so the change would be more natural.

OTOH, there are much bigger fish to fry at the moment, I suspect.

- Curtis


Re: Threaded Sorting

From
Tom Lane
Date:
"Curtis Faith" <curtis@galtair.com> writes:
> What about splitting out parsing, optimization and plan generation from
> execution and having a separate pool of exececutor processes.

> As an optimizer finished with a query plan it would initiate execution
> by grabbing an executor from a pool and passing it the plan.

So different executors would potentially handle the queries from a
single transaction?  How will you deal with pushing transaction-local
state from one to the other?

Even if you restrict it to switching at transaction boundaries, you
still have session-local state (at minimum user ID and SET settings)
to worry about.

Being able to apply multiple CPUs to a single query is attractive,
but I've not yet seen schemes for it that don't look like the extra
CPU power would be chewed up in overhead :-(.
        regards, tom lane


Parallel Executors [was RE: Threaded Sorting]

From
"Curtis Faith"
Date:
tom lane wrote:
> "Curtis Faith" <curtis@galtair.com> writes:
> > What about splitting out parsing, optimization and plan generation from
> > execution and having a separate pool of exececutor processes.
> 
> > As an optimizer finished with a query plan it would initiate execution
> > by grabbing an executor from a pool and passing it the plan.
> 
> So different executors would potentially handle the queries from a
> single transaction?  How will you deal with pushing transaction-local
> state from one to the other?
> 
> Even if you restrict it to switching at transaction boundaries, you
> still have session-local state (at minimum user ID and SET settings)
> to worry about.

Hmmm, what transaction boundaries did you mean? Since we are talking
about single statement parallization, there must be some specific
internal semantics that you believe need isolation. It seems like
we'd be able to get most of the benefit and restrict the parallization
in a way that would preserve this isolation but I'm curious what
you were specifically referring to?

The current transaction/user state seems to be stored in process
global space. This could be changed to be a sointer to a struct 
stored in a back-end specific shared memory area which would be
accessed by the executor process at execution start. The backend
would destroy and recreate the shared memory and restart execution
in the case where an executor process dies much like the postmaster
does with backends now.

To the extent the executor process might make changes to the state,
which I'd try to avoid if possible (don't know if it is), the
executors could obtain locks, otherwise if the executions were 
constrained to isolated elements (changes to different indexes for
example) it seems like it would be possible using an architecture
where you have:

Main Executor: Responsible for updating global meta data from
each sub-executor and assembling the results of multiple executions.
In the case of multiple executor sorts, the main executor would
perform a merge sort on the results of it and it's subordinates
pre-sorted sub-sets of the relation.

Subordinate Executor: Executes sub-plans and returns results or
meta-data update information into front-end shared memory directly.

To make this optimal, the index code would have to be changed to
support the idea of partial scans. In the case of btrees it would
be pretty easy using the root page to figure out what index values
delineated different 1/2's, 1/3's, 1/4's etc. of the index space.

I'm not sure what you'd have to do to support this for table scans as
I don't know the PostgreSQL tuple storage mechanism, yet.

This does not seem like too much architectural complexity or
performance overhead (even for the single executor case) for a big
gain for complex query performance.

> Being able to apply multiple CPUs to a single query is attractive,
> but I've not yet seen schemes for it that don't look like the extra
> CPU power would be chewed up in overhead :-(.

Do you remember specifc overhead problems/issues?

- Curtis


Re: Threaded Sorting

From
Hans-Jürgen Schönig
Date:
Greg Copeland wrote:

>I wouldn't hold your breath for any form of threading.  Since PostgreSQL
>is process based, you might consider having a pool of sort processes
>which address this but I doubt you'll get anywhere talking about threads
>here.
>
>Greg
>
>  
>

I came across the problem yesterday. We thought about SMP and did some 
tests on huge tables. The postmaster was running full speed to get the 
stuff sorted - even on an IDE system.
I asked my friends who are doing a lot of work with Oracle on huge SMP 
machines. I was told that Oracle has a mechanism which can run efficient 
sorts on SMP machines. It seems to speed up sorting a lot.

If we could reduce the time needed to build up an index by 25% it would 
be a wonderful thing. Just think of a scenario:
1 thread: 24 hours
many threads: 18 hours

We  could gain 6 hours which is a LOT.
We have many people running PostgreSQL on systems having wonderful IO 
systems - in this case IO is not the bottleneck anymore.

I had a brief look at the code used for sorting. It is very well 
documented so maybe it is worth thinking about a parallel algorithm.

When talking about threads: A pool of processes for sorting? Maybe this 
could be useful but I doubt if it the best solution to avoid overhead.
Somewhere in the TODO it says that there will be experiments with a 
threaded backend. This make me think that threads are not a big no no.
   Hans

-- 
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at <http://www.postgresql.at>, cluster.postgresql.at 
<http://cluster.postgresql.at>, www.cybertec.at 
<http://www.cybertec.at>, kernel.cybertec.at <http://kernel.cybertec.at>



Re: Threaded Sorting

From
Hans-Jürgen Schönig
Date:
Bingo = great :).
The I/O problem seems to be solved :).

A table space concept would be top of the histlist :).

The symlink version is not very comfortable and I think it would be a 
real hack.
Also: If we had a clean table space concept it would be real advantage.
In the first place it would be enough to define a directory (alter 
tablespace, changing sizes etc. could be a lot of work).

How could CREATE TABLESPACE look like?
Personally I like the Oracle Syntax.

Is it already time to work on the parser for CREATE/ALTER/DROP TABLESPACE?
   Hans



Tom Lane wrote:

>Bruce Momjian <pgman@candle.pha.pa.us> writes:
>  
>
>>Bingo!  Want to increase sorting performance, give it more I/O
>>bandwidth, and it will take 1/100th of the time to do threading.
>>    
>>
>
>  
>
>>Added to TODO:
>>    * Allow sorting to use multiple work directories
>>    
>>
>
>Yeah, I like that.  Actually it should apply to all temp files not only
>sorting.
>
>A crude hack would be to allow there to be multiple pg_temp_NNN/
>subdirectories (read symlinks) in a database, and then the code would
>automatically switch among these.
>
>Probably a cleaner idea would be to somehow integrate this with
>tablespace management --- if you could mark some tablespaces as intended
>for temp stuff, the system could round-robin among those as it creates
>temp files and/or temp tables.
>
>            regards, tom lane
>  
>


-- 
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at <http://www.postgresql.at>, cluster.postgresql.at 
<http://cluster.postgresql.at>, www.cybertec.at 
<http://www.cybertec.at>, kernel.cybertec.at <http://kernel.cybertec.at>



Re: Threaded Sorting

From
Hans-Jürgen Schönig
Date:
Threads are not the best solutions when it comes to portability. A 
prefer a process model as well.
My concern was that a process model might be a bit too slow for that but 
if we had processes in memory this would be wonderful thing.
Using it for small amounts of data is pretty useless - I totally agree 
but when it comes to huge amounts of data it can be useful.

It is a mechanism for huge installations with a lot of data.
   Hans



>That was a fork IIRC.  Threading is not used in baseline PostgreSQL nor
>is there any such plans that I'm aware of.  People from time to time ask
>about threads for this or that and are always told what I'm telling
>you.  The use of threads leads to portability issues not to mention
>PostgreSQL is entirely built around the process model.
>
>Tom is right to dismiss the notion of adding additional CPUs to
>something that is already I/O bound, however, the concept it self should
>not be dismissed.  Applying multiple CPUs to a sort operation is well
>accepted and understood technology.
>
>At this point, perhaps Tom or one of the other core developers having
>insight in this area would be willing to address how readily such a
>mechanism could could be put in place.
>
>Also, don't be so fast to dismiss what the process model can do.  There
>is not reason to believe that having a process pool would not be able to
>perform wonderful things if implemented properly.  Basically, the notion
>would be that the backend processing the query would solicit assistance
>from the sort pool if one or more processes were available.  At that
>point, several methods could be employed to divide the work.  Some form
>of threshold would also have to be created to prevent the pool from
>being used when a single backend is capable of addressing the need. 
>Basically the idea is, you only have the pool assist with large tuple
>counts and then, only when resources are available and resource are
>available from within the pool.  By doing this, you avoid additional
>overhead for small sort efforts and gain when it matters the most.
>
>
>Regards,
>
>    Greg
>
>  
>


-- 
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at <http://www.postgresql.at>, cluster.postgresql.at 
<http://cluster.postgresql.at>, www.cybertec.at 
<http://www.cybertec.at>, kernel.cybertec.at <http://kernel.cybertec.at>



Re: Threaded Sorting

From
Hans-Jürgen Schönig
Date:
Threads are bad - I know ...
I like the idea of a pool of processes instead of threads - from my 
point of view this would be useful.

I am planning to run some tests (GEQO, AIX, sorts) as soon as I have 
time to do so (still too much work ahead before :( ...).
If I had time I'd love to do something for the PostgreSQL community :(.

As far as sorting is concerned: It would be fine if it was possible to 
define an alternative location for temporary sort files using SET.
If you had multiple disks this would help in the case of concurrent 
sorts because this way people could insert and index many tables at once 
without having to access just one storage system.
This would be an easy way out of the IO limitation ... - at least for 
some problems.
   Hans



Bruce Momjian wrote:

>
>We haven't thought about it yet because there are too many buggy thread
>implementations.  We are probably just now getting to a point where we
>can consider it.  However, lots of databases have moved to threads for
>all sorts of things and ended up with a royal mess of code.  Threads
>can only improve things in a few areas of the backend so it would be
>nice if we could limit the exposure to threads to those areas;  sorting
>could certainly be one of them, but frankly, I think disk I/O is our
>limiting factore there.  I would be interested to see some tests that
>showed otherwise.
>  
>

-- 
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at <http://www.postgresql.at>, cluster.postgresql.at 
<http://cluster.postgresql.at>, www.cybertec.at 
<http://www.cybertec.at>, kernel.cybertec.at <http://kernel.cybertec.at>



Table spaces again [was Re: Threaded Sorting]

From
"Shridhar Daithankar"
Date:
On 4 Oct 2002 at 21:13, Hans-Jürgen Schönig wrote:

> Bingo = great :).
> The I/O problem seems to be solved :).
> 
> A table space concept would be top of the histlist :).
> 
> The symlink version is not very comfortable and I think it would be a 
> real hack.
> Also: If we had a clean table space concept it would be real advantage.
> In the first place it would be enough to define a directory (alter 
> tablespace, changing sizes etc. could be a lot of work).
> 
> How could CREATE TABLESPACE look like?
> Personally I like the Oracle Syntax.

Well. I (hopefully) understand need to get table spaces. But I absolutely hate 
it the way oracle does it.. I am repeating all the points I posted before. 
There was no follow up. I hope I get some on this.

1) It tries to be a volume assuming OS handles volumes inefficiently. Same 
mentality as handling all disk I/O by in it self. May be worth when oracle did 
it but is it worth now?

2) It allows joining multiple volumes for performance reason. If you want to 
join multiple volume for performance, let RAID handle it. Is it job of RDBMS?

3) It puts multiple objets together. Why? I never fully understood having a 
opeque file sitting on drive v/s neatly laid directory structure. I would 
always prefer the directory structure.

Can anybody please tell me in detail.(Not just a pointing towards TODO items)

1) What a table space supposed to offer?

2) What a directory structure does not offer that table space does?

3) How do they compare for advantages/disadvantages..

Oracle familiarity is out. That's not even close to being good merit IMO. If 
postgresql moves to oracle way of doing things, .. well, I won't be as much 
hapy as I am now..

Thanks for your patience..


ByeShridhar

--
Newton's Little-Known Seventh Law:    A bird in the hand is safer than one 
overhead.



Re: Table spaces again [was Re: Threaded Sorting]

From
Hans-Jürgen Schönig
Date:
>
>
>Can anybody please tell me in detail.(Not just a pointing towards TODO items)
>
>1) What a table space supposed to offer?
>

They allow you to define a maximum amount of storage for a certain set 
of data.
They help you to define the location of data.
They help you to define how much data can be used by which ressource.

>2) What a directory structure does not offer that table space does?
>

You need to the command line in order to manage quotas - you might not 
want that.
Quotas are handled differently on ever platform (if available).
With tablespaces you can assign 30mb to use a, 120mb to user b etc. ...
Table spaces are a nice abstraction layer to the file system.

>
>3) How do they compare for advantages/disadvantages..
>
>Oracle familiarity is out. That's not even close to being good merit IMO. If 
>postgresql moves to oracle way of doing things, .. well, I won't be as much 
>hapy as I am now..
>
>Thanks for your patience..
>

how would you handle table spaces? just propose it to the hackers' list ...
we should definitely discuss that ...
a bad implementation of table spaces would be painful ...

   Hans

-- 
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at <http://www.postgresql.at>, cluster.postgresql.at 
<http://cluster.postgresql.at>, www.cybertec.at 
<http://www.cybertec.at>, kernel.cybertec.at <http://kernel.cybertec.at>




Re: Table spaces again [was Re: Threaded Sorting]

From
"Shridhar Daithankar"
Date:
On 7 Oct 2002 at 15:52, Hans-Jürgen Schönig wrote:

> >Can anybody please tell me in detail.(Not just a pointing towards TODO items)
> >1) What a table space supposed to offer?
> They allow you to define a maximum amount of storage for a certain set 
> of data.

Use quota

> They help you to define the location of data.

Mount/symlink whereever you want assuming database supports one directory per 
object metaphor. This is finer control than tablespaces as tablespaces often 
hosts many objects from possibly many databases. Once a tablespace is created, 
it's difficult to keep track of what all goes on a table space. You look at 
directory structure and you get a clear picture..

> They help you to define how much data can be used by which ressource.

Which resource? I am confused. Disk space is only resource we are talking 
about.

> >2) What a directory structure does not offer that table space does?
> You need to the command line in order to manage quotas - you might not 
> want that.

Mount a directory on a partition. If the data exceeds on that partition, there 
would be disk error. Like tablespace getting overflown. I have seen both the 
scenarios in action..

> Quotas are handled differently on ever platform (if available).

Yeah. But that's sysadmins responsibility not DBA's.

> With tablespaces you can assign 30mb to use a, 120mb to user b etc. ...
> Table spaces are a nice abstraction layer to the file system.

Hmm.. And how does that fit in database metaphor? What practical use is that? I 
can't imagine as I am a developer and not a DBA.

> how would you handle table spaces? just propose it to the hackers' list ...
> we should definitely discuss that ...
> a bad implementation of table spaces would be painful ...

I suggest a directory per object structure. A database gets it's own directory, 
an index gets it's own dir. etc. 

In addition to that, postgresql should offer capability to suspend a 
database/table so that they can be moved without restarting database daemon. 

This is to acknowledge the fact that database storage is relocatable. In 
current implementation, database does not offer any assistance in relocating 
the database structure on disk..

Besides postgresql runs a cluster of databases as opposed to single database 
run by likes of oracle. So relocating a single database should not affect 
others.

Additionally postgresql should handle out of disk space errors gracefully 
noting that out of space for an object may not mean out of space for all 
objects. Obviously tablespaces can do better here.

Let's say for each object that gets storage, e.g. database, table, index and 
transaction log, we maintain a flag in metadata to indicate whether it's 
relocated or not. t can offer a command to set this flag. Typically the command 
sequence would look like this

sql> take <object> offline;
---
relocate/symlink/remount the appropriate dir.
--
sql> take <object> online mark relocated;

Postgresql should continue working if a relocated object experiences disk space 
full.

Of course, if postgresql could accept arguments at object creation time for 
alternate directories to symlink to, that would be better than sliced bread.

I believe giving each database it's own transaction log would be a great 
advantage of this scheme.

These are some thoughts how it would work. I believe this course of action 
would make the transition easier, offer a much better granular control over 
object allocation on disk and ultimately would prove useful to users.

I might have lost couple of points as I took long time composing it. But having 
all the possible objections dealt with should be the starting point. 

Thanks once again..

ByeShridhar

--
Cheit's Lament:    If you help a friend in need, he is sure to remember you--    the 
next time he's in need.



Re: Table spaces again [was Re: Threaded Sorting]

From
Hans-Jürgen Schönig
Date:
>>>2) What a directory structure does not offer that table space does?
>>>      
>>>
>>You need to the command line in order to manage quotas - you might not 
>>want that.
>>    
>>
>
>Mount a directory on a partition. If the data exceeds on that partition, there 
>would be disk error. Like tablespace getting overflown. I have seen both the 
>scenarios in action..
>  
>

Of course it can be done somehow. However, with tablespaces it is more 
db-like and you need not be familiar with the operating system itself.
Just think of a company having several different operating systems 
(suns, linux, bsd, ...).
what do you think could be done in this case? my answer would be an 
abstraction layer called table spaces ...

>  
>
>>Quotas are handled differently on ever platform (if available).
>>    
>>
>
>Yeah. But that's sysadmins responsibility not DBA's.
>

Maybe many people ARE the sysadmins of their PostgreSQL box ...
When developing a database with an open mind people should try to see a 
problem from more than just one perspective.
Why should anybody just forget about sysdbas???


>>With tablespaces you can assign 30mb to use a, 120mb to user b etc. ...
>>Table spaces are a nice abstraction layer to the file system.
>>    
>>
>
>Hmm.. And how does that fit in database metaphor? What practical use is that? I 
>can't imagine as I am a developer and not a DBA.
>
>  
>

One of our customers did some minor hosting projects with PostgreSQL. 
That's what he wanted to have because it is a practical issue.
a. you don't want to have more than one instance per machine.
b. you want to assign a certain amount of space to a certain user 
without using quotas. just think of administration tools - tablespaces 
are as simple as a select.

per directory is a first step - a good step and a good idea but 
tablespaces are a useful invention. just think of hosting companies, 
hybrid environments, etc ...
tablespaces or not a devil and sysdbas may be developers ...

   Hans


-- 
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at <http://www.postgresql.at>, cluster.postgresql.at 
<http://cluster.postgresql.at>, www.cybertec.at 
<http://www.cybertec.at>, kernel.cybertec.at <http://kernel.cybertec.at>




Re: Table spaces again [was Re: Threaded Sorting]

From
"Shridhar Daithankar"
Date:
On 7 Oct 2002 at 16:49, Hans-Jürgen Schönig wrote:

> >Mount a directory on a partition. If the data exceeds on that partition, there 
> >would be disk error. Like tablespace getting overflown. I have seen both the 
> >scenarios in action..
> Of course it can be done somehow. However, with tablespaces it is more 
> db-like and you need not be familiar with the operating system itself.
> Just think of a company having several different operating systems 
> (suns, linux, bsd, ...).
> what do you think could be done in this case? my answer would be an 
> abstraction layer called table spaces ...

OK. Point noted. Suspended till next point.

> >>Quotas are handled differently on ever platform (if available).
> >Yeah. But that's sysadmins responsibility not DBA's.
> Maybe many people ARE the sysadmins of their PostgreSQL box ...
> When developing a database with an open mind people should try to see a 
> problem from more than just one perspective.
> Why should anybody just forget about sysdbas???

If DBA is sysadmin, does it make a difference if he learnes about mount/ln or 
table spaces. Yes it does. Table spaces are limited to databases but mount/ln 
is useful for any general purpose sysadmin work.

That answers the last point as well, I guess..

> >>With tablespaces you can assign 30mb to use a, 120mb to user b etc. ...
> >>Table spaces are a nice abstraction layer to the file system.
> >Hmm.. And how does that fit in database metaphor? What practical use is that? I 
> >can't imagine as I am a developer and not a DBA.
> One of our customers did some minor hosting projects with PostgreSQL. 
> That's what he wanted to have because it is a practical issue.
> a. you don't want to have more than one instance per machine.
> b. you want to assign a certain amount of space to a certain user 
> without using quotas. just think of administration tools - tablespaces 
> are as simple as a select.

Agreed. Perfect point and I didn't thought of it.

But it can be done in directory structure as well. Of course it's quite a 
deviation from what one thinks as plain old directory structure. But if this is 
one point where table spaces win, let's borrow that. There is lot of baggage in 
table spaces that can be left out..

Besides AFAIU, tablespaces implements quota using data files which are pre-
allocated. Pre-claiming space/resource  is the evil of everything likes of 
oracle do and runs in exact opposite direction of postgresql philosophy.

If postgresql has to implement quotas on object, it should do without 
preclaiming space.

Besides if postgresql offers quota on per object basis in directory/object 
scheme, I am sure that's far more granular than tablespaces. Choice is good..
> per directory is a first step - a good step and a good idea but 
> tablespaces are a useful invention. just think of hosting companies, 
> hybrid environments, etc ...
> tablespaces or not a devil and sysdbas may be developers ...

It's not about devil. It's about revaluating need once again. Especially at the 
level of tablespace concept in itself.

ByeShridhar

--
Oblivion together does not frighten me, beloved.        -- Thalassa (in Anne 
Mulhall's body), "Return to Tomorrow",           stardate 4770.3.



Re: Table spaces again [was Re: Threaded Sorting]

From
Tom Lane
Date:
Hans-Jürgen Schönig <postgres@cybertec.at> writes:
> how would you handle table spaces?

The plan that's been discussed simply defines a tablespace as being a
directory somewhere; physical storage of individual tables would remain
basically the same, one or more files under the containing directory.

The point of this being, of course, that the DBA could create the
tablespace directories on different partitions or volumes in order to
provide the behavior he wants.

In my mind this would be primarily a cleaner, more flexible
reimplementation of the existing "database location" feature.
        regards, tom lane


Re: Table spaces again [was Re: Threaded Sorting]

From
Hans-Jürgen Schönig
Date:
>
>
>>>>Quotas are handled differently on ever platform (if available).
>>>>        
>>>>
>>>Yeah. But that's sysadmins responsibility not DBA's.
>>>      
>>>
>>Maybe many people ARE the sysadmins of their PostgreSQL box ...
>>When developing a database with an open mind people should try to see a 
>>problem from more than just one perspective.
>>Why should anybody just forget about sysdbas???
>>    
>>
>
>If DBA is sysadmin, does it make a difference if he learnes about mount/ln or 
>table spaces. Yes it does. Table spaces are limited to databases but mount/ln 
>is useful for any general purpose sysadmin work.
>
>That answers the last point as well, I guess..
>  
>

I agree but still, think of hybrid systems ..

>
>Agreed. Perfect point and I didn't thought of it.
>
>But it can be done in directory structure as well. Of course it's quite a 
>deviation from what one thinks as plain old directory structure. But if this is 
>one point where table spaces win, let's borrow that. There is lot of baggage in 
>table spaces that can be left out..
>
>Besides AFAIU, tablespaces implements quota using data files which are pre-
>allocated. Pre-claiming space/resource  is the evil of everything likes of 
>oracle do and runs in exact opposite direction of postgresql philosophy.
>
>If postgresql has to implement quotas on object, it should do without 
>preclaiming space.
>
>Besides if postgresql offers quota on per object basis in directory/object 
>scheme, I am sure that's far more granular than tablespaces. Choice is good..
> 
>
I didn't think of pre allocation - this is pretty much like Oracle would 
do it.
I was thinking of having a maximum size or something like that.
Overhead such as EXTENDS and things like that don't seem too useful for 
me - that's what a filesystem can be used for.

I agree with Tom: If a tablespace was a directory it would be pretty 
simple and pretty useful. If people could define a maximum size it would 
be more than perfect.
All I think is necessary is:   - having data in different, user defined, locations   - having the chance to define a
maximumsize for that tablespace.
 

Suggestion:
CREATE TABLESPACE: Create a "directory" with a certain size (optional) - 
nothing special here.
ALTER TABLESPACE: resize table space. resizing is possible if the amount 
of data in the tablespace < new size of tablespace
DROP TABLESPACE:  remove table space. the question in this case is - 
what about the objects in the tablespace?   objects can not always be deleted (just think of inheritance and 
parent tables)

>It's not about devil. It's about revaluating need once again. Especially at the 
>level of tablespace concept in itself.
>
>  
>

That's why people should discuss it and think about it :).
People want a good implementation or no implementation :).
This is Open Source - it is designed to be discussed :).
   Hans

-- 
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at <http://www.postgresql.at>, cluster.postgresql.at 
<http://cluster.postgresql.at>, www.cybertec.at 
<http://www.cybertec.at>, kernel.cybertec.at <http://kernel.cybertec.at>




Re: Table spaces again [was Re: Threaded Sorting]

From
"Jim Buttafuoco"
Date:
Is this NOT what I have been after for many months now.  I dropped the tablespace/location idea before 7.2 because
that
didn't seem to be any interest.  Please see my past email's for the SQL commands and on disk directory layout I have
proposed.  I have a working 7.2 system with tablespaces/locations (what ever you want to call them,  I like locations
because tablespace are an Oracle thing).  I would like to get this code ported into 7.4.

Jim


> Hans-Jürgen Schönig <postgres@cybertec.at> writes:
> > how would you handle table spaces?
> 
> The plan that's been discussed simply defines a tablespace as being a
> directory somewhere; physical storage of individual tables would remain
> basically the same, one or more files under the containing directory.
> 
> The point of this being, of course, that the DBA could create the
> tablespace directories on different partitions or volumes in order to
> provide the behavior he wants.
> 
> In my mind this would be primarily a cleaner, more flexible
> reimplementation of the existing "database location" feature.
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster






Re: Table spaces again [was Re: Threaded Sorting]

From
"Michael Paesold"
Date:
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> wrote:
[snip]
> On 7 Oct 2002 at 15:52, Hans-Jürgen Schönig wrote:
[snip]
> > With tablespaces you can assign 30mb to use a, 120mb to user b etc. ...
> > Table spaces are a nice abstraction layer to the file system.
>
> Hmm.. And how does that fit in database metaphor? What practical use is
that? I
> can't imagine as I am a developer and not a DBA.

Virtual hosting at ISP's for example.

> I believe giving each database it's own transaction log would be a great
> advantage of this scheme.

Well, if you think of Tom's recent patch (ganged WAL writes), from a
performance point of view, this would only be good if each transaction
log had it's own disk. Otherwise a single transaction log is still better.

I think tablespaces is a good idea. I also prefer associating tablespaces
with directory structures better over the oracle style.

Regards,
Michael Paesold



Re: Parallel Executors [was RE: Threaded Sorting]

From
Jan Wieck
Date:
Curtis Faith wrote:

> The current transaction/user state seems to be stored in process
> global space. This could be changed to be a sointer to a struct
> stored in a back-end specific shared memory area which would be
> accessed by the executor process at execution start. The backend
> would destroy and recreate the shared memory and restart execution
> in the case where an executor process dies much like the postmaster
> does with backends now.
> 
> To the extent the executor process might make changes to the state,
> which I'd try to avoid if possible (don't know if it is), the
> executors could obtain locks, otherwise if the executions were
> constrained to isolated elements (changes to different indexes for
> example) it seems like it would be possible using an architecture
> where you have:

Imagine there is a PL/Tcl function. On the first call in a session, the
PL/Tcl interpreter get's created (that's during execution, okay?). Now
the procedure that's called inside of that interpreter creates a
"global" variable ... a global Tcl variable inside of that interpreter,
which is totally unknown to the backend since it doesn't know what Tcl
is at all and that variable is nothing than an entry in a private hash
table inside of that interpreter. On a subsequent call to any PL/Tcl
function during that session, it might be good if that darn hashtable
entry exists.

How do you propose to let this happen?

And while at it, the Tcl procedure next calls spi_exec, causing the
PL/Tcl function handler to call SPI_exec(), so your isolated executor
all of the sudden becomes a fully operational backend, doing the
parsing, planning and optimizing, or what?


Jan

-- 

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: Parallel Executors [was RE: Threaded Sorting]

From
"Curtis Faith"
Date:
> Curtis Faith wrote:
>
> > The current transaction/user state seems to be stored in process
> > global space. This could be changed to be a sointer to a struct
> > stored in a back-end specific shared memory area which would be
> > accessed by the executor process at execution start. The backend
> > would destroy and recreate the shared memory and restart execution
> > in the case where an executor process dies much like the postmaster
> > does with backends now.
> >
> > To the extent the executor process might make changes to the state,
> > which I'd try to avoid if possible (don't know if it is), the
> > executors could obtain locks, otherwise if the executions were
> > constrained to isolated elements (changes to different indexes for
> > example) it seems like it would be possible using an architecture
> > where you have:

Jan Wieck replied:
> Imagine there is a PL/Tcl function. On the first call in a session, the
> PL/Tcl interpreter get's created (that's during execution, okay?). Now
> the procedure that's called inside of that interpreter creates a
> "global" variable ... a global Tcl variable inside of that interpreter,
> which is totally unknown to the backend since it doesn't know what Tcl
> is at all and that variable is nothing than an entry in a private hash
> table inside of that interpreter. On a subsequent call to any PL/Tcl
> function during that session, it might be good if that darn hashtable
> entry exists.
>
> How do you propose to let this happen?
>
> And while at it, the Tcl procedure next calls spi_exec, causing the
> PL/Tcl function handler to call SPI_exec(), so your isolated executor
> all of the sudden becomes a fully operational backend, doing the
> parsing, planning and optimizing, or what?

You bring up a good point, we couldn't do what I propose for all
situations. I had never anticipated that splitting things up would be the
rule. For example, the optimizer would have to decide whether it made sense
to split up a query from a strictly performance perspective. So now, if we
consider the fact that some things could not be done with split backend
execution, the logic becomes:

if ( splitting is possible && splitting is faster )do the split execution;
elsedo the normal execution;

Since the design already splits the backend internally into a separate
execution phase, it seems like one could keep the current current
implementation for the typical case where splitting doesn't buy anything or
cases where there is complex state information that needs to be maintained.
If there are no triggers or functions that will be accessed by a given
query then I don't see your concerns applying.

If there are triggers or other conditions which preclude multi-process
execution, we can keep exactly the same behavior as now. The plan execution
entry could easily be a place where it either A) did the same thing it
currently does or B) passed execution off to a pool as per the original
proposal.

I have to believe that most SELECTs won't be affected by your concerns.
Additionally, even in the case of an UPDATE, many times there are large
portions of the operation's actual work that wouldn't be affected even if
there are lots of triggers on the tables being updated. The computation of
the inside of the WHERE could often be split out without causing any
problems with context or state information. The master executor could
always be the original backend as it is now and this would be the place
where the UPDATE part would be processed after the WHERE tuples had been
identified.

As with any optimization, it is more complicated and won't handle all the
cases. It's just an idea to handle common cases that would otherwise be
much slower.

That having been said, I'm sure there are much lower hanging fruit on the
performance tree and likely will be for a little while.

- Curtis



Re: Table spaces again [was Re: Threaded Sorting]

From
Hans-Jürgen Schönig
Date:
Jim Buttafuoco wrote:

>Is this NOT what I have been after for many months now.  I dropped the tablespace/location idea before 7.2 because
that
>didn't seem to be any interest.  Please see my past email's for the SQL commands and on disk directory layout I have
>proposed.  I have a working 7.2 system with tablespaces/locations (what ever you want to call them,  I like locations
>because tablespace are an Oracle thing).  I would like to get this code ported into 7.4.
>
>Jim
>
>
>  
>
>>Hans-Jürgen Schönig <postgres@cybertec.at> writes:
>>    
>>
>>>how would you handle table spaces?
>>>      
>>>
>>The plan that's been discussed simply defines a tablespace as being a
>>directory somewhere; physical storage of individual tables would remain
>>basically the same, one or more files under the containing directory.
>>
>>The point of this being, of course, that the DBA could create the
>>tablespace directories on different partitions or volumes in order to
>>provide the behavior he wants.
>>
>>In my mind this would be primarily a cleaner, more flexible
>>reimplementation of the existing "database location" feature.
>>
>>            
>>

wow :)
can we have the patch? i'd like to try it with my 7.2.2 :).
is it stable?

how did you implement it precisely? is it as you have proposed it?
   Hans


-- 
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at <http://www.postgresql.at>, cluster.postgresql.at 
<http://cluster.postgresql.at>, www.cybertec.at 
<http://www.cybertec.at>, kernel.cybertec.at <http://kernel.cybertec.at>