Thread: Index/Function organized table layout
Hi,

I am running a large and mature mission-critical PostgreSQL implementation with multiple tables that are growing at a rate of several million rows per month (and the rate of addition is growing as well). It is a 24x7 operation, so downtime is a party foul and apparent performance has to be reasonably good all the time.

The problem: My working set is typically several million rows (and growing) at any one time, which has a tendency to thrash the buffers mercilessly. Records are inserted in an order that does not reflect typical retrieval, such that a typical query has to hit many, many blocks to collect all the rows in a given range query. CLUSTER isn't cutting it (more below). Definitely sub-optimal.

Now, I've actually hacked commercial MVCC engines back in the day, and am comfortable playing around in database internals. I have an "itch to scratch" for improving the scalability of Really Large Tables by explicitly allowing control of table layouts as an optional property of the tables. I would like some feedback as to whether this is practical given the current architecture; it appears that it is, but I'm not intimately familiar with it (yet).

1.) B-tree organized tables. The primary key index also contains the entire row. Upside: very fast retrieval for indexed queries over large data sets and very efficient buffer usage. Downside: in practice, the primary key index has to be the only index on the table, and the locking behaves more like an index page than a data page. I've actually used this feature a lot for very large tables in Oracle implementations, and you can get substantial speed improvements versus normal indexed tables if used correctly on large tables. It is vastly more convenient for a 24x7 operation than using the CLUSTER command, which creates nasty contention, doesn't automatically order the table on insert/update (i.e. it has to be run regularly), and basically ends up with an index that is redundant.

2.) Function/Hash organized tables (aka partitioning). Partitioning tuples across blocks according to a user-provided function is the second target on my short list. Doing this well (i.e. being able to take individual partitions offline or set them read-only) takes a hell of a lot of work, but having a table that internally clusters its records according to a user-provided hash function is a good start. This isn't a beginner feature (stupidly designed transactions can be quite costly on partitioned tables), but it is a very nice tool to have. True partitioning would be enterprise-grade gold, but that target may be a bit high for now.

Both of these things really are attempts to address the same basic problem, which is optimizing the number of buffers a given query uses by making the table's layout reflect typical queries. Given the size of our typical working sets, optimizing layout greatly improves performance by reducing buffer thrashing.

I would very much like to work on improving this type of feature (automatic table layout optimization), but I thought I would check with the people currently hacking on the system first, to see if there is a showstopper or if someone is already working on this.

Cheers,

-James Rogers
 jamesr@best.com
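For concreteness, here is a minimal sketch of what (1) looks like in Oracle, where it is spelled ORGANIZATION INDEX; the table and column names are invented purely for illustration:

  -- Oracle dialect; all names are hypothetical. The rows are stored in
  -- the primary key B-tree itself, so a range scan over
  -- (sensor_id, recorded_at) touches a few adjacent blocks instead of
  -- potentially one heap block per row.
  CREATE TABLE sensor_readings (
      sensor_id    NUMBER  NOT NULL,
      recorded_at  DATE    NOT NULL,
      reading      NUMBER,
      CONSTRAINT sensor_readings_pk PRIMARY KEY (sensor_id, recorded_at)
  )
  ORGANIZATION INDEX;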
On Tue, Sep 30, 2003 at 11:31:26PM -0700, James Rogers wrote:

> The problem: My working set is typically several million rows (and growing)
> at any one time, which has a tendency to thrash the buffers mercilessly.
> Records are inserted in an order that does not reflect typical retrieval,
> such that a typical query has to hit many, many blocks to collect all the
> rows in a given range query. CLUSTER isn't cutting it (more below).
> Definitely sub-optimal.

I have a situation that is very similar to yours, and my problems are very similar too. I hope some of your ideas are implementable in some way, because I think they would solve some of my problems as well.

> 1.) B-tree organized tables. The primary key index also contains the
> entire row.

I think this is called a clustered index on some other database systems. Basically you want to replace the content of the btree item with the whole tuple, instead of the pointer to the heap element which contains the tuple.

One thing to keep in mind is that index tuples are limited to BLCKSZ/3 IIRC, so this limits the size of tuples that can be put in such a "table". TOASTing may help alleviate this problem.

As for other indexes, I'm not sure why you say this precludes the use of other indexes. The only thing they would have to do is keep pointers to index elements instead of heap elements. Doesn't sound impossible to me.

Another thing to keep in mind: L&Y requires unique keys. In the current btree implementation this is worked around by using the pointers-to-heap (ctid?) to differentiate items that have the same key. If you use a clustered index you won't have this pointer; maybe it will be required that this index is marked UNIQUE. This may not be too onerous a restriction, given that the index is a primary key after all.

I don't have anything to say about 2).

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"It never stops being humbling for a person of wit to learn that there is no fool who cannot teach him something." (Jean B. Say)
James Rogers <jamesr@best.com> writes:
> Now, I've actually hacked commercial MVCC engines back in the day, and am
> comfortable playing around in database internals. I have an "itch to
> scratch" for improving the scalability of Really Large Tables by explicitly
> allowing control of table layouts as an optional property of the tables. I
> would like some feedback as to whether this is practical given the current
> architecture; it appears that it is, but I'm not intimately familiar with it

I think you'd need to do some basic architectural work first. Right now we have a clean API for index access methods, but there is no comparable abstraction layer for heaps (tables). It'd probably be necessary to create such a layer in order to allow different heap organizations to be supported. Another point of confusion is that heaps and indexes are rigidly distinct. Perhaps some heaps could be considered to be indexes as well, in order to support your idea of b-tree-organized tables. Doing that would undoubtedly break a few places though.

> Both of these things really are attempts to address the same basic problem,
> which is optimizing the number of buffers a given query uses by making the
> table's layout reflect typical queries.

Hm, are you sure that smarter buffer management wouldn't serve the purpose?

			regards, tom lane
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> As for other indexes, I'm not sure why you say this precludes the use of
> other indexes. The only thing they would have to do is keep pointers to
> index elements instead of heap elements. Doesn't sound impossible to me.

However, btree feels free to move index entries around while inserting other entries. I'm not sure that you could usefully use "ctid" as an identifier for tuples in a btree-organized heap. This will break more things than just other indexes :-( ... UPDATE chaining, for one example.

> Another thing to keep in mind: L&Y requires unique keys. In the
> current btree implementation this is worked around by using the
> pointers-to-heap (ctid?) to differentiate items that have the same key.
> If you use a clustered index you won't have this pointer; maybe it will
> be required that this index is marked UNIQUE.

IIRC this assumption is really only used when re-finding the parent downlink after a page split, and so we were able to get rid of it by looking for the downlink by child block number, without using the key at all. So that part doesn't bother me. The mutability of index ctids seems like a much worse problem.

			regards, tom lane
On Wed, Oct 01, 2003 at 11:37:38AM -0400, Tom Lane wrote:
> James Rogers <jamesr@best.com> writes:
> > Both of these things really are attempts to address the same basic problem,
> > which is optimizing the number of buffers a given query uses by making the
> > table's layout reflect typical queries.
>
> Hm, are you sure that smarter buffer management wouldn't serve the
> purpose?

It doesn't help when there is a lot of access locality in searching. In my case I want to select some thousands of records that were inserted far apart from each other, but are logically very near. Having this pseudo-heap that is ordered by definition helps very much with the selection; the current heap requires me to bring lots of uninteresting tuples into the buffers, whichever buffer management algorithm is used, because they are in the same pages as the interesting tuples.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Linux transformed my computer from a 'machine for doing things' into a genuinely entertaining device, from which I learn something new every day." (Jaime Salinas)
On Wed, 2003-10-01 at 09:29, Alvaro Herrera wrote:
> On Wed, Oct 01, 2003 at 11:37:38AM -0400, Tom Lane wrote:
> > Hm, are you sure that smarter buffer management wouldn't serve the
> > purpose?
>
> It doesn't help when there is a lot of access locality in searching. In my
> case I want to select some thousands of records that were inserted far
> apart from each other, but are logically very near. Having this
> pseudo-heap that is ordered by definition helps very much with the
> selection; the current heap requires me to bring lots of uninteresting
> tuples into the buffers, whichever buffer management algorithm is used,
> because they are in the same pages as the interesting tuples.

Yes, what Alvaro said. For very large tables that routinely run modest range queries, it can be very expensive in terms of cache efficiency if tuples that are closely grouped and ordered logically are scattered throughout the heap. The requirement to buffer a lot of unrelated data for typical-case queries can greatly reduce the cache hit rate if the active portion of the data is already quite large relative to the physical RAM available.

To give a real-world example, a standard query on one of our tables that has not been CLUSTER-ed recently (i.e. within the last several days) generates an average of ~2,000 cache misses. Recently CLUSTER-ed, it generates ~0 cache misses on average. Needless to say, one is *much* faster than the other. The problem is that the number of buffers required to satisfy this query with the tuples scattered is enough to make it swap out the buffers of a competing query on another table that is running at the same time. The result is that performance grinds to a halt as processes compete and try to swap out each other's buffers, resulting in a lot less *actual* buffering than should be occurring given the amount of data actually being queried.

In my case, not only does CLUSTER-ing increase the number of concurrent queries possible without disk thrashing by an integer factor, but the number of buffers touched by a query that generates cache misses is greatly reduced as well. The problem is that CLUSTER-ing is costly, and index-organizing some of the tables would reduce the buffer needs, since the index tuples in these cases are almost as large as the heap tuples they reference.

The classic scenario for this is when you have a large collection of time-series data stored in a table, with each series keyed to another table. The typical tuple distribution creates pathological behaviors when buffer space becomes tight.

Cheers,

-James Rogers
 jamesr@best.com
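To make the time-series scenario concrete, here is a minimal sketch of the kind of schema and maintenance being described; all names are invented, and the CLUSTER syntax shown is the 7.x-era form:

  -- Hypothetical time-series table: rows arrive in timestamp order,
  -- but queries read one series at a time in (series_id, ts) order.
  CREATE TABLE samples (
      series_id  integer    NOT NULL,
      ts         timestamp  NOT NULL,
      value      numeric
  );
  CREATE INDEX samples_series_ts ON samples (series_id, ts);

  -- Typical range query: the matching tuples are adjacent in index
  -- order but scattered across the heap in insert order, so each one
  -- can cost a separate buffer.
  SELECT ts, value
    FROM samples
   WHERE series_id = 42
     AND ts >= '2003-09-01' AND ts < '2003-10-01';

  -- The periodic maintenance described above: rewrite the heap in
  -- index order, at the cost of locking out access for the duration.
  CLUSTER samples_series_ts ON samples;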
On Wed, 2003-10-01 at 08:37, Tom Lane wrote:
> I think you'd need to do some basic architectural work first. Right now
> we have a clean API for index access methods, but there is no comparable
> abstraction layer for heaps (tables). It'd probably be necessary to
> create such a layer in order to allow different heap organizations to be
> supported. Another point of confusion is that heaps and indexes are
> rigidly distinct. Perhaps some heaps could be considered to be indexes
> as well, in order to support your idea of b-tree-organized tables.
> Doing that would undoubtedly break a few places though.

Ahhh, okay. So it would require an architectural mod. Still, it would probably be a worthwhile project (for me at least). There are a number of enterprise features that rely on a similar architectural premise that would also be good to have eventually. We are at the point at this company where we need some of these features in our databases for scalability and management purposes, but we aren't all that eager to migrate to Oracle (the alternative). From my standpoint, it would be substantially cheaper to actively work on adding the needed features to Postgres than to move to Oracle to get the same features. I clearly need to take a good look at the relevant source.

> > Both of these things really are attempts to address the same basic problem,
> > which is optimizing the number of buffers a given query uses by making the
> > table's layout reflect typical queries.
>
> Hm, are you sure that smarter buffer management wouldn't serve the
> purpose?

No, the problem isn't really buffer management; the buffer usage behaviors are largely valid for the situation. The problem is that the number of relevant tuples per buffer is low for a given query, so we aren't getting much tuple bang for the buffer buck. I suppose one could look at it as trying to improve the intra-buffer hit rate for a given query.

Cheers,

-James Rogers
 jamesr@best.com
James Rogers wrote on Thu, 02.10.2003 at 20:50:
> To give a real-world example, a standard query on one of our tables that
> has not been CLUSTER-ed recently (i.e. within the last several days)
> generates an average of ~2,000 cache misses. Recently CLUSTER-ed, it
> generates ~0 cache misses on average. Needless to say, one is *much*
> faster than the other.

So what you really need is for the CLUSTER command to leave pages half-empty, and for the tuple placement logic on inserts/updates to place new tuples near the place where they would be put by CLUSTER. I.e. the code that does the actual inserting should be aware of CLUSTERing.

I guess that similar behaviour (half-empty pages, or even every second page empty, which is better as it creates fewer dirty buffers) could also significantly speed up updates on huge numbers of tuples, as the code could then always select a place near the old tuple and thus avoid needless head movements between reading and writing areas.

> In my case, not only does CLUSTER-ing increase the number of concurrent
> queries possible without disk thrashing by an integer factor, but the
> number of buffers touched by a query that generates cache misses is
> greatly reduced as well. The problem is that CLUSTER-ing is costly, and
> index-organizing some of the tables would reduce the buffer needs, since
> the index tuples in these cases are almost as large as the heap tuples
> they reference.

True, but my suggestion above would be much easier to implement near-term. It seems to be a nice incremental improvement that just needs to touch two places:

1. Selecting where new tuples go:
   * updated ones go near the old ones if the table is not clustered, and near the place CLUSTER would put them if it is;
   * inserted ones go to less-than-half-empty pages if the table is not clustered, and near the place CLUSTER would put them if it is.

2. Making the reorganization code (CLUSTER and VACUUM FULL) leave space in pages for clustered updates/inserts.

The "half" above could of course mean anything from 10% to 95%, depending on access patterns.

---------------------
Hannu
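A sketch of how the reorganization half of this might surface at the SQL level, reusing the samples table from the earlier sketch. The fillfactor-style storage parameter is hypothetical in the context of this thread (later PostgreSQL releases did grow a similar knob), and the insert-placement half of the proposal would still need entirely new code:

  -- Hypothetical: ask reorganization to pack pages only half full.
  ALTER TABLE samples SET (fillfactor = 50);

  -- CLUSTER would then leave each page half-empty, so future inserts
  -- and updates have room to land near their index-order neighbours
  -- instead of at the end of the heap.
  CLUSTER samples_series_ts ON samples;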
On Thu, 2003-10-02 at 12:09, Hannu Krosing wrote:
> So what you really need is for the CLUSTER command to leave pages half-empty,
> and for the tuple placement logic on inserts/updates to place new tuples
> near the place where they would be put by CLUSTER. I.e. the code that
> does the actual inserting should be aware of CLUSTERing.

Not exactly. What you are describing is more akin to partitioning or hash-organized tables, i.e. sorting insert/update tuples into various pages according to some hash function. This works well if a table is composed of many well-defined working sets but the access and query patterns for a given working set are varied. A common example of this is large accounting tables, which are frequently partitioned by a function that truncates the record timestamp to the year/month. Note that indexes can be partitioned as well, so that as long as you are doing queries within a given month (using the accounting example), the query performance is generally what you would expect if the entire table were the size of one month's data, no matter how big the table actually gets.

Partitions and hash-organized tables (for RDBMSs that support them) are often handled internally as multiple tables masquerading as one (kind of like a view), with the hash function determining which table is actually being accessed (see the hand-rolled sketch after this message). This allows the possibility of doing things like locking individual partitions or setting them "read-only", and generally having the option of treating them as individual tables for administrative purposes, e.g. deleting a partition of old unused data without adversely affecting the "active" partition the way it would if they were truly one table, even though they look like one table from the SQL-level view.

What I really need in my specific case is B-tree organized tables, though hashing/partitioning would help too. It *is* a similar kind of problem. B-tree organized tables basically make the table and the primary index the same thing, and tend to be most useful for tables that use a single multi-column primary key index for queries. This has the effect of putting all the tuples for a typical query in the same small number of heap pages (and therefore buffers), allowing very efficient access in the typical case, with the caveat that non-indexed queries will be quite slow.

B-tree organized tables are particularly useful when the insert order is orthogonal to the typical query order. As I mentioned before, tables that store parallel collections of time-series data are classic examples. Often the data is inserted into the pages in an order that can roughly be described as (timestamp, id), but is queried using (id, timestamp) as the index. If you have enough ids, you end up with the pathological case where you typically have one relevant tuple per page for a given query.

> True, but my suggestion above would be much easier to implement
> near-term. It seems to be a nice incremental improvement that just needs
> to touch two places:
>
> 1. Selecting where new tuples go:
>    * updated ones go near the old ones if the table is not clustered, and
>      near the place CLUSTER would put them if it is;
>    * inserted ones go to less-than-half-empty pages if the table is not
>      clustered, and near the place CLUSTER would put them if it is.
>
> 2. Making the reorganization code (CLUSTER and VACUUM FULL) leave space
>    in pages for clustered updates/inserts.

The nuisance would be keeping track of which pages are collecting which tuples.
Knowing the CLUSTER index doesn't tell you much about which pages would currently be a good place to put a new tuple. You could always mark up the index that CLUSTER uses to keep track of good candidates (plus some additional structures), but the more I think about that, the more it looks like a nasty hack.

Cheers,

-James Rogers
 jamesr@best.com
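The hand-rolled sketch of the accounting example referred to above; all table and column names are invented. Each month lives in its own physical table behind a single view, with date_trunc() playing the role of the partitioning function:

  -- One physical table per month, each constrained to its own range.
  CREATE TABLE ledger_2003_09 (
      entry_ts  timestamp NOT NULL
                CHECK (date_trunc('month', entry_ts) = TIMESTAMP '2003-09-01'),
      account   integer   NOT NULL,
      amount    numeric   NOT NULL
  );
  CREATE TABLE ledger_2003_10 (
      entry_ts  timestamp NOT NULL
                CHECK (date_trunc('month', entry_ts) = TIMESTAMP '2003-10-01'),
      account   integer   NOT NULL,
      amount    numeric   NOT NULL
  );

  -- The "one table" the application sees.
  CREATE VIEW ledger AS
      SELECT * FROM ledger_2003_09
      UNION ALL
      SELECT * FROM ledger_2003_10;

  -- Retiring a month of old data is then a cheap DROP TABLE rather
  -- than a bulk DELETE that disturbs the active partition.
  DROP TABLE ledger_2003_09;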
On Thu, Oct 02, 2003 at 10:09:12PM +0300, Hannu Krosing wrote:
> So what you really need is for the CLUSTER command to leave pages half-empty,
> and for the tuple placement logic on inserts/updates to place new tuples
> near the place where they would be put by CLUSTER. I.e. the code that
> does the actual inserting should be aware of CLUSTERing.

Oh, you mean like what I half-proposed in
http://archives.postgresql.org/pgsql-general/2002-06/msg00968.php

BTW, this is the one message (the one in the CVS log I cite at the end) that got me into Postgres hacking. The little involvement I've had has been great. Thanks to all PostgreSQL hackers!

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"The realist knows what he wants; the idealist wants what he knows." (Anonymous)
James Rogers wrote on Thu, 02.10.2003 at 23:44:
> On Thu, 2003-10-02 at 12:09, Hannu Krosing wrote:
> > So what you really need is for the CLUSTER command to leave pages half-empty,
> > and for the tuple placement logic on inserts/updates to place new tuples
> > near the place where they would be put by CLUSTER. I.e. the code that
> > does the actual inserting should be aware of CLUSTERing.
>
> Not exactly. What you are describing is more akin to partitioning or
> hash-organized tables, i.e. sorting insert/update tuples into various pages
> according to some hash function.

What I actually thought I was describing is how CLUSTER should work in a Postgres flavour of MVCC storage ;). Not the CLUSTER command, but the whole feature.

> B-tree organized tables basically make the table and the primary index
> the same thing, and tend to be most useful for tables that use a single
> multi-column primary key index for queries. This has the effect of
> putting all the tuples for a typical query in the same small number of
> heap pages (and therefore buffers), allowing very efficient access in
> the typical case, with the caveat that non-indexed queries will be quite
> slow.

AFAICS we could resolve this problem (querying indexes only) by keeping a copy of the visibility info (tmin, tmax, ...) in index tuples. This would make index updates bigger and thus slower, so it should be optional. If you then put all fields in the primary key, the main table could be dropped. If there is no "data" table, then no other indexes would be allowed, or they would have to be "double-indexes" referencing the primary key rather than the tuple, and thus even bigger...

> B-tree organized tables are particularly useful when the insert order is
> orthogonal to the typical query order. As I mentioned before, tables
> that store parallel collections of time-series data are classic
> examples. Often the data is inserted into the pages in an order that can
> roughly be described as (timestamp, id), but is queried using (id,
> timestamp) as the index. If you have enough ids, you end up with the
> pathological case where you typically have one relevant tuple per page
> for a given query.

But if we had clustered the table on (id, timestamp), then the data would be in the right order for queries, if CLUSTER worked well.

> The nuisance would be keeping track of which pages are collecting which
> tuples. Knowing the CLUSTER index doesn't tell you much about which
> pages would currently be a good place to put a new tuple. You could
> always mark up the index that CLUSTER uses to keep track of good
> candidates (plus some additional structures), but the more I think about
> that, the more it looks like a nasty hack.

Yeah, index-organized tables seem an exact fit for your problem, but then my abstract idea of what clustering should do is exactly that - keep tuples in roughly the same order as an index ;)

So what is really needed is a smart tuple-placer which can keep tuples that are close (as defined by the index) together in a small number of pages. These pages themselves need not be contiguous; they can be sprinkled around the whole data table, but they need to stay clusters of index-close tuples.

------------
Hannu
On 10/2/03 11:34 PM, "Hannu Krosing" <hannu@tm.ee> wrote:
> James Rogers wrote on Thu, 02.10.2003 at 23:44:
> > Not exactly. What you are describing is more akin to partitioning or
> > hash-organized tables, i.e. sorting insert/update tuples into various pages
> > according to some hash function.
>
> What I actually thought I was describing is how CLUSTER should work in a
> Postgres flavour of MVCC storage ;). Not the CLUSTER command, but the
> whole feature.

Yeah, I can see this. "Clustering" doesn't really imply ordering, just tuple proximity in the heap. Loosely implemented by default (i.e. only grouping a tuple if it is cheap to do so at insert/update time) on a table's primary key, it might give a measurable query performance improvement in the typical case without impacting the average performance of queries that use other indexes. Even if the tuples were not ordered in the heap, tuple proximity in the heap would solve a significant percentage of the performance issue that index-organized tables solve, i.e. greatly improved cache efficiency.

> AFAICS we could resolve this problem (querying indexes only) by keeping
> a copy of the visibility info (tmin, tmax, ...) in index tuples. This
> would make index updates bigger and thus slower, so it should be optional.

It wouldn't be a feature you would want to use anyway if actually indexing a table. But yes, adding the necessary support to make an index page masquerade as a proper heap page would be most of the effort. That, and making the SQL execution efficiently use the fact that the index and the table are the same relation with two different interfaces, such that you don't have to modify or access "both" when using either one.

> If you then put all fields in the primary key, the main table could be
> dropped. If there is no "data" table, then no other indexes would be
> allowed, or they would have to be "double-indexes" referencing the primary
> key rather than the tuple, and thus even bigger...

Putting an index on an index is pretty horrible. But quite frankly, if you actually require a second index on a table, then B-tree organized tables are not what you need.

> But if we had clustered the table on (id, timestamp), then the data
> would be in the right order for queries, if CLUSTER worked well.

Right, and it doesn't even have to be in perfect index order, really, as what I'm trying to do is significantly decrease the number of buffers required for a typical query. Having to order the tuples later is a small matter.

> Yeah, index-organized tables seem an exact fit for your problem, but then
> my abstract idea of what clustering should do is exactly that - keep
> tuples in roughly the same order as an index ;)

It would be a good approximation of what index-organizing aims to accomplish.

> So what is really needed is a smart tuple-placer which can keep tuples
> that are close (as defined by the index) together in a small number of
> pages. These pages themselves need not be contiguous; they can be
> sprinkled around the whole data table, but they need to stay clusters
> of index-close tuples.

More or less, yes.

Cheers,

-James Rogers
 jamesr@best.com
James Rogers <jamesr@best.com> writes:
> On 10/2/03 11:34 PM, "Hannu Krosing" <hannu@tm.ee> wrote:
> >
> > What I actually thought I was describing is how CLUSTER should work in a
> > Postgres flavour of MVCC storage ;). Not the CLUSTER command, but the
> > whole feature.
>
> Yeah, I can see this. "Clustering" doesn't really imply ordering, just
> tuple proximity in the heap.

So I'm a bit confused about the term "clustering". It seems Postgres uses it to mean simply ordering the tuple storage within an otherwise normal table. However, in other databases it seems to mean something more complex.

I've never used it in Oracle, but from what I read it seems Oracle thinks "clustering" means storing the tuples for one table in the heap of *another* table entirely. The typical usage scenario envisioned is to store the tuples for child records near the parent record in a related table. I.e., say you have a users table and an ACL list, where each user has possibly many entries on the ACL list. You could cluster the ACL entries onto the users table so that the entries for a given user would be stored *in the users table heap* near the record for that user.

I've never seen anyone use this feature, and I never seriously considered it myself. It sort of has the feel of an antiquated feature that traded too much flexibility and abstraction for raw performance on very slow disk hardware. However, I wonder if the "nested tables" feature doesn't use it under the hood; it seems they would both be useful for the same types of tables.

I'm not sure what this means for Postgres. I'm not sure if Postgres should use a different name to avoid confusion, and possibly to leave room in the future for the possibility of supporting something like this. Or perhaps something like this would be useful for Postgres now or in the near future? Or perhaps the consensus is, as I said, that this is an old idea that no longer gets any respect, and Postgres should just pretend it doesn't exist?

--
greg
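For reference, a minimal sketch of the Oracle feature Greg describes, applied to his users/ACL example; Oracle dialect, with all names invented for illustration:

  -- Define the cluster key and the index it requires.
  CREATE CLUSTER user_cluster (user_id NUMBER)
      SIZE 512;  -- rough bytes expected per cluster key value
  CREATE INDEX user_cluster_idx ON CLUSTER user_cluster;

  -- Both tables share the cluster's storage: the ACL entries for a
  -- given user_id land in the same blocks as that user's row.
  CREATE TABLE users (
      user_id  NUMBER PRIMARY KEY,
      name     VARCHAR2(100)
  ) CLUSTER user_cluster (user_id);

  CREATE TABLE acl_entries (
      user_id  NUMBER NOT NULL,
      acl      VARCHAR2(100)
  ) CLUSTER user_cluster (user_id);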
Greg Stark <gsstark@mit.edu> writes:
> So I'm a bit confused about the term "clustering". It seems Postgres uses it
> to mean simply ordering the tuple storage within an otherwise normal table.
> However, in other databases it seems to mean something more complex.

My take is that "clustering" means not only placing related tuples near each other, but taking steps to preserve that organization over time. PG is really misusing the term, because our current CLUSTER command does the first but not the second. If tuple insertions were to try to preserve the heap ordering induced by the latest CLUSTER command, then I think we'd have something closer to what is usually meant by the term.

> I've never used it in Oracle, but from what I read it seems Oracle
> thinks "clustering" means storing the tuples for one table in the heap
> of *another* table entirely.

I think that's an implementation detail rather than the fundamental meaning.

			regards, tom lane
Greg Stark wrote on Sun, 05.10.2003 at 00:17:
> I've never seen anyone use this feature, and I never seriously considered it
> myself. It sort of has the feel of an antiquated feature that traded too much
> flexibility and abstraction for raw performance on very slow disk hardware.

Read "A Conversation with Jim Gray", referenced from this slashdot article:
http://slashdot.org/article.pl?sid=03/09/17/1246255&mode=thread&tid=126
for info on how disk drives are slower than ever (relatively), and how one should treat them as such, especially for large data volumes.

> However, I wonder if the "nested tables" feature doesn't use it under the
> hood; it seems they would both be useful for the same types of tables.
>
> I'm not sure what this means for Postgres. I'm not sure if Postgres should
> use a different name to avoid confusion, and possibly to leave room in the
> future for the possibility of supporting something like this. Or perhaps
> something like this would be useful for Postgres now or in the near future?
> Or perhaps the consensus is, as I said, that this is an old idea that no
> longer gets any respect, and Postgres should just pretend it doesn't exist?

We can't pretend CLUSTER does not exist until we have some better technology to offer instead.

------------
Hannu
Hannu Krosing wrote:
> Greg Stark wrote on Sun, 05.10.2003 at 00:17:
>
> > I've never seen anyone use this feature, and I never seriously considered it
> > myself. It sort of has the feel of an antiquated feature that traded too much
> > flexibility and abstraction for raw performance on very slow disk hardware.
>
> Read "A Conversation with Jim Gray", referenced from this slashdot article:
> http://slashdot.org/article.pl?sid=03/09/17/1246255&mode=thread&tid=126
> for info on how disk drives are slower than ever (relatively), and how
> one should treat them as such, especially for large data volumes.

Too bad PostgreSQL is misspelled ("Postgress") and MySQL dominates the open source discussion. And the MySQL questions are coming from "David Patterson, who holds the Pardee Chair of Computer Science at the University of California at Berkeley." Outrageous! :-)

Mike Mascari
mascarm@mascari.com
Hi,

I've been going through the source tree (pretty nice, BTW) and have found a number of things I'd like to work on. My primary interest is in adding performance and scalability features to the underlying engine. However, one thing I haven't found is a roadmap/schedule for major architectural features, i.e. while some of the features could be incrementally hacked into the current source pretty transparently, others might generate a bigger footprint. I guess what I am wondering is whether there is any kind of quasi-official mechanism for putting certain features in certain future versions, depending on the nature and significance of implementing those features and the number of things they touch.

Features I am interested in working on in the relatively short term:

* Buffer Management
  - A more optimal/adaptive cache replacement algorithm
  - Use of fine-grained adaptive latching to increase efficiency on SMP systems (particularly 4+ processor systems)
  - Add a background cache writer process to opportunistically flush dirty buffers

* Heap Organization
  - Hash partitioned heaps
  - B-tree organized heaps

While these should generate some performance improvements on smaller databases, more important to me is that they will go a long way toward making it scale more like Oracle when dealing with very large and active systems (my current problem). It has been a few years, but I've done implementations of most of these features in software before.

So I guess I have two questions:

1) Does anyone object to me working on these two areas?
2) What version target should I realistically be shooting for?

Cheers,

-James Rogers
 jamesr@best.com
James Rogers wrote:
> While these should generate some performance improvements on smaller
> databases, more important to me is that they will go a long way toward
> making it scale more like Oracle when dealing with very large and active
> systems (my current problem). It has been a few years, but I've done
> implementations of most of these features in software before.
>
> So I guess I have two questions:
>
> 1) Does anyone object to me working on these two areas?

Sounds good. I assume you have read the developer's FAQ. I would ask on the list about each item so we can give you tips and get you in touch with anyone else who knows about that topic.

> 2) What version target should I realistically be shooting for?

They would all be 7.5. We are only doing bug fixes now. There is no limit to what can be added to a major release, if that is your concern.

--
Bruce Momjian                    |  http://candle.pha.pa.us
pgman@candle.pha.pa.us           |  (610) 359-1001
+ If your life is a hard drive,  |  13 Roberts Road
+ Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
James Rogers wrote on Tue, 07.10.2003 at 01:48:
> Hi,
>
> I guess what I am wondering is whether there is any kind of quasi-official
> mechanism for putting certain features in certain future versions,
> depending on the nature and significance of implementing those features
> and the number of things they touch.

I don't know of any (except discussing it on [HACKERS] ;), but I very much like the way Python does it via PEPs (Python Enhancement Proposals); see: http://www.python.org/peps/ .

-----
Hannu