Thread: Equivalent praxis to CLUSTERED INDEX?

Equivalent praxis to CLUSTERED INDEX?

From

Mischa Sandberg

Date:

25 August 2004, 21:50:21

Coming from the MSSQL world, I'm used to the first step in optimization
to be, choose your clustered index and choose it well.

I see that PG has a one-shot CLUSTER command, but doesn't support
continuously-updated clustered indexes.

What I infer from newsgroup browsing is, such an index is impossible,
given the MVCC versioning of records (happy to learn I'm wrong).

I'd be curious to know what other people, who've crossed this same
bridge from MSSQL or Oracle or Sybase to PG, have devised,
faced with the same kind of desired performance gain for retrieving
blocks of rows with the same partial key.

Re: Equivalent praxis to CLUSTERED INDEX?

From

Christopher Kings-Lynne

Date:

25 August 2004, 22:13:16

> I see that PG has a one-shot CLUSTER command, but doesn't support
> continuously-updated clustered indexes.
>
> What I infer from newsgroup browsing is, such an index is impossible,
> given the MVCC versioning of records (happy to learn I'm wrong).
>
> I'd be curious to know what other people, who've crossed this same
> bridge from MSSQL or Oracle or Sybase to PG, have devised,
> faced with the same kind of desired performance gain for retrieving
> blocks of rows with the same partial key.

I just run clusterdb each weekend in a cron job...

Chris

Re: Equivalent praxis to CLUSTERED INDEX?

From

"J. Andrew Rogers"

Date:

26 August 2004, 15:14:37

On Tue, 2004-08-24 at 22:28, Mischa Sandberg wrote:
> I see that PG has a one-shot CLUSTER command, but doesn't support
> continuously-updated clustered indexes.
>
> What I infer from newsgroup browsing is, such an index is impossible,
> given the MVCC versioning of records (happy to learn I'm wrong).

It is possible to have MVCC and ordered/indexed heaps, but it isn't
something you can just tack onto the currently supported types -- I
looked into this myself.  It would take substantial additional code
infrastructure to support it, basically an alternative heap system and
adding support for tables with odd properties to many parts of the
system.  Pretty non-trivial.

This is probably my #1 "I wish postgres had this feature" feature.  It
is a serious scalability enhancer for big systems and a pain to work
around not having.

> I'd be curious to know what other people, who've crossed this same
> bridge from MSSQL or Oracle or Sybase to PG, have devised,
> faced with the same kind of desired performance gain for retrieving
> blocks of rows with the same partial key.

The CLUSTER command is often virtually useless for precisely the kinds
of tables that need to be clustered.  My databases are on-line 24x7, and
the tables that are ideal candidates for clustering are in the range of
50-100 million rows. I can afford to lock these tables up for no more
than 5-10 minutes during off-peak in the hopes that no one notices, and
CLUSTER does not work remotely in the ballpark of that fast for tables
of that size.  People who can run CLUSTER in a cron job must either have
relatively small tables or regular large maintenance windows.

My solution, which may or may not work for you, was to write a table
partitioning system using the natural flexibility and programmability of
postgresql (e.g. table inheritance).  From this I automatically get a
roughly ordered heap according to the index I would cluster on, with
only slightly funky SQL access.  The end result works much better with
CLUSTER too, though CLUSTER is much less necessary at that point
because, at least for my particular purposes, the rows are mostly
ordered due to how the data was partitioned.

So there are ways to work around CLUSTER, but you'll have to be clever
and it will require tailoring the solution to your particular
requirements.

J. Andrew Rogers

Re: Equivalent praxis to CLUSTERED INDEX?

From

Bruce Momjian

Date:

26 August 2004, 15:19:11

How do vendors actually implement auto-clustering?  I assume they move
rows around during quiet periods or have lots of empty space in each
value bucket.

---------------------------------------------------------------------------

J. Andrew Rogers wrote:
> On Tue, 2004-08-24 at 22:28, Mischa Sandberg wrote:
> > I see that PG has a one-shot CLUSTER command, but doesn't support
> > continuously-updated clustered indexes.
> >
> > What I infer from newsgroup browsing is, such an index is impossible,
> > given the MVCC versioning of records (happy to learn I'm wrong).
>
>
> It is possible to have MVCC and ordered/indexed heaps, but it isn't
> something you can just tack onto the currently supported types -- I
> looked into this myself.  It would take substantial additional code
> infrastructure to support it, basically an alternative heap system and
> adding support for tables with odd properties to many parts of the
> system.  Pretty non-trivial.
>
> This is probably my #1 "I wish postgres had this feature" feature.  It
> is a serious scalability enhancer for big systems and a pain to work
> around not having.
>
>
> > I'd be curious to know what other people, who've crossed this same
> > bridge from MSSQL or Oracle or Sybase to PG, have devised,
> > faced with the same kind of desired performance gain for retrieving
> > blocks of rows with the same partial key.
>
>
> The CLUSTER command is often virtually useless for precisely the kinds
> of tables that need to be clustered.  My databases are on-line 24x7, and
> the tables that are ideal candidates for clustering are in the range of
> 50-100 million rows. I can afford to lock these tables up for no more
> than 5-10 minutes during off-peak in the hopes that no one notices, and
> CLUSTER does not work remotely in the ballpark of that fast for tables
> of that size.  People who can run CLUSTER in a cron job must either have
> relatively small tables or regular large maintenance windows.
>
>
> My solution, which may or may not work for you, was to write a table
> partitioning system using the natural flexibility and programmability of
> postgresql (e.g. table inheritance).  From this I automatically get a
> roughly ordered heap according to the index I would cluster on, with
> only slightly funky SQL access.  The end result works much better with
> CLUSTER too, though CLUSTER is much less necessary at that point
> because, at least for my particular purposes, the rows are mostly
> ordered due to how the data was partitioned.
>
> So there are ways to work around CLUSTER, but you'll have to be clever
> and it will require tailoring the solution to your particular
> requirements.
>
>
> J. Andrew Rogers
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Equivalent praxis to CLUSTERED INDEX?

From

Josh Berkus

Date:

26 August 2004, 15:33:02

Bruce,

> How do vendors actually implement auto-clustering?  I assume they move
> rows around during quiet periods or have lots of empty space in each
> value bucket.

That's how SQL Server does it.   In old versions (6.5) you had to manually
send commands to update the cluster, same as PG.   Also, when you create a
cluster (or an index or table for that matter) you can manually set an amount
of "space" to be held open on each data page for updates.

Also keep in mind that SQL Server, as a "single-user database" has a much
easier time with this.  They don't have to hold several versions of an index
in memory and collapse it into a single version at commit time.

All that being said, we could do a better job of "auto-balancing" clustered
tables.   I believe that someone was working on this in Hackers through what
they called "B-Tree Tables".  What happened to that?

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Equivalent praxis to CLUSTERED INDEX?

From

"J. Andrew Rogers"

Date:

26 August 2004, 16:04:52

On Thu, 2004-08-26 at 11:18, Bruce Momjian wrote:
> How do vendors actually implement auto-clustering?  I assume they move
> rows around during quiet periods or have lots of empty space in each
> value bucket.

As far as I know, Oracle does it by having a B-Tree organized heap (a
feature introduced around v8 IIRC), basically making the primary key
index and the heap the same physical structure.  Any non-index columns
are stored in the index along with the index columns.  Implementing it
is slightly weird because searching the index and selecting the rows
from the heap are not separate operations.

The major caveat to having tables of this type is that you can only have
a primary key index.  No other indexes are possible because the "heap"
constantly undergoes local reorganizations if you have a lot of write
traffic, the same kind of reorganization you would normally expect in a
BTree index.

The performance improvements come from two optimizations.  First, you
have to touch significantly fewer blocks to get all the rows, even
compared to a CLUSTERed heap.  Second, the footprint is smaller and
plays nicely with the buffer cache.

When I've used these types of heaps in Oracle 8 on heavily used tables
with tens of millions of rows, we frequently got a 10x or better
performance improvement on queries against those tables.  It is only
really useful for tables with vast quantities of relatively small rows,
but it can be a lifesaver in those cases.

J. Andrew Rogers

Re: Equivalent praxis to CLUSTERED INDEX?

From

"Magnus Hagander"

Date:

26 August 2004, 16:30:33

>> How do vendors actually implement auto-clustering?  I assume
>they move
>> rows around during quiet periods or have lots of empty space in each
>> value bucket.
>
>
>As far as I know, Oracle does it by having a B-Tree organized heap (a
>feature introduced around v8 IIRC), basically making the primary key
>index and the heap the same physical structure.  Any non-index columns
>are stored in the index along with the index columns.  Implementing it
>is slightly weird because searching the index and selecting the rows
>from the heap are not separate operations.

Almost the same for MSSQL. The clustered index is always forced unique.
If you create a non-unique clustered index, SQLServer will internally
pad it with random (or is it sequential? Can't remember right now) data
to make each key unique. The clustered index contains all the data
fields - both the index key and the other columns from the database.

It does support non-clustered indexes as well on the same table. Any
"secondary index" will then contain the index key and the primary key
value. This means a lookup in a non-clustered index means a two-step
index lookup: First look in the non-clustered index for the clustered
key. Then look in the clustered index for the rest of the data.

Naturally a non-clustered index needs better selectivity before it's
actually used than a clustered index does.

IIRC, SQL Server always creates clustered indexes by default for primary
keys.


//Magnus

Re: Equivalent praxis to CLUSTERED INDEX?

From

Josh Berkus

Date:

26 August 2004, 16:48:53

Magnus,

> IIRC, SQL Server always creates clustered indexes by default for primary
> keys.

I think that's a per-database setting; certainly the ones I admin do not.

However, since SQL Server orders its data pages, those data pages tend to be
in the order of the primary key regardless if there is no clustered index.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Equivalent praxis to CLUSTERED INDEX?

From

"J. Andrew Rogers"

Date:

26 August 2004, 17:44:08

On Thu, 2004-08-26 at 12:30, Magnus Hagander wrote:
> Almost the same for MSSQL. The clustered index is always forced unique.
> If you create a non-unique clustered index, SQLServer will internally
> pad it with random (or is it sequential? Can't remember right now) data
> to make each key unique. The clustered index contains all the data
> fields - both the index key and the other columns from the database.
>
> It does support non-clustered indexes as well on the same table. Any
> "secondary index" will then contain the index key and the primary key
> value. This means a lookup in a non-clustered index means a two-step
> index lookup: First look in the non-clustered index for the clustered
> key. Then look in the clustered index for the rest of the data.

Ah, okay.  I see how that would work for a secondary index, though it
would make for a slow secondary index.  Neat workaround.  For all I
know, current versions of Oracle may support secondary indexes on
index-organized tables; all this Postgres usage over the last couple
years has made my Oracle knowledge rusty.

> IIRC, SQL Server always creates clustered indexes by default for primary
> keys.

That would surprise me actually.  For some types of tables, e.g. ones
with multiple well-used indexes or large rows, index-organizing the heap
could easily give worse performance than a normal index/heap pair
depending on access patterns.  It also tends to be more prone to having
locking contention under some access patterns.  This is one of those
options that needs to be used knowledgeably; it is not a general
architectural improvement that you would want to apply to every table
all the time.

J. Andrew Rogers

Re: Equivalent praxis to CLUSTERED INDEX?

From

Gaetano Mendola

Date:

26 August 2004, 18:31:55

Bruce Momjian wrote:
> How do vendors actually implement auto-clustering?  I assume they move
> rows around during quiet periods or have lots of empty space in each
> value bucket.
>
> ---------------------------------------------------------------------------

IIRC informix doesn't have it, and you have to recluster periodically
the table. After having clustered the table with an index in order to
recluster the table with another index you have to release the previous
one ( ALTER index TO NOT CLUSTER ), the CLUSTER is an index attribute and
each table can have only one index with that attribute ON.

Regards
Gaetano Mendola

Re: Equivalent praxis to CLUSTERED INDEX?

From

"Gregory S. Williamson"

Date:

26 August 2004, 19:36:30

FWIW,

Informix does allow the fragmentation of data over named dbspaces by round-robin and expression; this is autosupporting
aslong as the dba keeps enough space available. You may also fragment the index although there are some variations
dependingon type of Informix (XPS, etc.); this is available in at least 9.3 ... I have never used the index
fragmentationas its own beast, but the fragmenting of data works like a charm for spreadling load over more disks. 

Greg Williamson
DBA
GlobeXplorer LLC

-----Original Message-----
From: Gaetano Mendola [mailto:mendola@bigfoot.com]
Sent: Thursday, August 26, 2004 2:10 PM
To: Bruce Momjian; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Equivalent praxis to CLUSTERED INDEX?

Bruce Momjian wrote:
> How do vendors actually implement auto-clustering?  I assume they move
> rows around during quiet periods or have lots of empty space in each
> value bucket.
>
> ---------------------------------------------------------------------------

IIRC informix doesn't have it, and you have to recluster periodically
the table. After having clustered the table with an index in order to
recluster the table with another index you have to release the previous
one ( ALTER index TO NOT CLUSTER ), the CLUSTER is an index attribute and
each table can have only one index with that attribute ON.

Regards
Gaetano Mendola

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Re: Equivalent praxis to CLUSTERED INDEX?

From

Bruce Momjian

Date:

26 August 2004, 22:45:53

Updated TODO item:

        o Automatically maintain clustering on a table

        This would require some background daemon to maintain clustering
        during periods of low usage. It might also require tables to be only
        paritally filled for easier reorganization.  It also might require
        creating a merged heap/index data file so an index lookup would
        automatically access the heap data too.

---------------------------------------------------------------------------

J. Andrew Rogers wrote:
> On Thu, 2004-08-26 at 12:30, Magnus Hagander wrote:
> > Almost the same for MSSQL. The clustered index is always forced unique.
> > If you create a non-unique clustered index, SQLServer will internally
> > pad it with random (or is it sequential? Can't remember right now) data
> > to make each key unique. The clustered index contains all the data
> > fields - both the index key and the other columns from the database.
> >
> > It does support non-clustered indexes as well on the same table. Any
> > "secondary index" will then contain the index key and the primary key
> > value. This means a lookup in a non-clustered index means a two-step
> > index lookup: First look in the non-clustered index for the clustered
> > key. Then look in the clustered index for the rest of the data.
>
>
> Ah, okay.  I see how that would work for a secondary index, though it
> would make for a slow secondary index.  Neat workaround.  For all I
> know, current versions of Oracle may support secondary indexes on
> index-organized tables; all this Postgres usage over the last couple
> years has made my Oracle knowledge rusty.
>
>
> > IIRC, SQL Server always creates clustered indexes by default for primary
> > keys.
>
>
> That would surprise me actually.  For some types of tables, e.g. ones
> with multiple well-used indexes or large rows, index-organizing the heap
> could easily give worse performance than a normal index/heap pair
> depending on access patterns.  It also tends to be more prone to having
> locking contention under some access patterns.  This is one of those
> options that needs to be used knowledgeably; it is not a general
> architectural improvement that you would want to apply to every table
> all the time.
>
>
> J. Andrew Rogers
>
>
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Equivalent praxis to CLUSTERED INDEX?

From

Greg Stark

Date:

27 August 2004, 00:39:55

Bruce Momjian <pgman@candle.pha.pa.us> writes:

> Updated TODO item:
>
>         o Automatically maintain clustering on a table
>
>         This would require some background daemon to maintain clustering
>         during periods of low usage. It might also require tables to be only
>         paritally filled for easier reorganization.  It also might require
>         creating a merged heap/index data file so an index lookup would
>         automatically access the heap data too.

Fwiw, I would say the first "would" is also a "might". None of the previous
discussions here presumed a maintenance daemon. The discussions before talked
about a mechanism to try to place new tuples as close as possible to the
proper index position.

I would also suggest making some distinction between a cluster system similar
to what we have now but improved to maintain the clustering continuously, and
an actual index-organized-table where the tuples are actually only stored in a
btree structure.

They're two different approaches to similar problems. But they might both be
useful to have, and have markedly different implementation details.

--
greg

Re: Equivalent praxis to CLUSTERED INDEX?

From

Bruce Momjian

Date:

27 August 2004, 01:35:13

OK, new wording:

        o Automatically maintain clustering on a table

        This might require some background daemon to maintain clustering
        during periods of low usage. It might also require tables to be only
        paritally filled for easier reorganization.  Another idea would
        be to create a merged heap/index data file so an index lookup would
        automatically access the heap data too.


---------------------------------------------------------------------------

Greg Stark wrote:
>
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>
> > Updated TODO item:
> >
> >         o Automatically maintain clustering on a table
> >
> >         This would require some background daemon to maintain clustering
> >         during periods of low usage. It might also require tables to be only
> >         paritally filled for easier reorganization.  It also might require
> >         creating a merged heap/index data file so an index lookup would
> >         automatically access the heap data too.
>
> Fwiw, I would say the first "would" is also a "might". None of the previous
> discussions here presumed a maintenance daemon. The discussions before talked
> about a mechanism to try to place new tuples as close as possible to the
> proper index position.
>
> I would also suggest making some distinction between a cluster system similar
> to what we have now but improved to maintain the clustering continuously, and
> an actual index-organized-table where the tuples are actually only stored in a
> btree structure.
>
> They're two different approaches to similar problems. But they might both be
> useful to have, and have markedly different implementation details.
>
> --
> greg
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Equivalent praxis to CLUSTERED INDEX?

From

Gaetano Mendola

Date:

27 August 2004, 05:32:06

Greg Stark wrote:

> The discussions before talked about a mechanism to try to place new
 > tuples as close as possible to the proper index position.

Means this that an index shall have a "fill factor" property, similar to
Informix one ?

 From the manual:

The FILLFACTOR option takes effect only when you build an index on a table
that contains more than 5,000 rows and uses more than 100 table pages, when
you create an index on a fragmented table, or when you create a fragmented
index on a nonfragmented table.
Use the FILLFACTOR option to provide for expansion of an index at a later
date or to create compacted indexes.
When the index is created, the database server initially fills only that
percentage of the nodes specified with the FILLFACTOR value.

# Providing a Low Percentage Value
If you provide a low percentage value, such as 50, you allow room for growth
in your index. The nodes of the index initially fill to a certain percentage and
contain space for inserts. The amount of available space depends on the
number of keys in each page as well as the percentage value.
For example, with a 50-percent FILLFACTOR value, the page would be half
full and could accommodate doubling in size. A low percentage value can
result in faster inserts and can be used for indexes that you expect to grow.

# Providing a High Percentage Value
If you provide a high percentage value, such as 99, your indexes are
compacted, and any new index inserts result in splitting nodes. The
maximum density is achieved with 100 percent. With a 100-percent
FILLFACTOR value, the index has no room available for growth; any
additions to the index result in splitting the nodes.
A 99-percent FILLFACTOR value allows room for at least one insertion per
node. A high percentage value can result in faster selects and can be used for
indexes that you do not expect to grow or for mostly read-only indexes.

Regards
Gaetano Mendola

Re: Equivalent praxis to CLUSTERED INDEX?

From

Bruce Momjian

Date:

27 August 2004, 13:24:02

I had FILLFACTOR in the TODO list until just a few months ago, but
because no one had discussed it in 3-4 years, I removed the item.  I
have added mention now in the auto-cluster section because that actually
seems like the only good reason for a non-100% fillfactor.  I don't
think our ordinary btrees have enough of a penalty for splits to make a
non-full fillfactor worthwhile, but having a non-full fillfactor for
autocluster controls how often items have to be shifted around.

---------------------------------------------------------------------------

Gaetano Mendola wrote:
> Greg Stark wrote:
>
> > The discussions before talked about a mechanism to try to place new
>  > tuples as close as possible to the proper index position.
>
> Means this that an index shall have a "fill factor" property, similar to
> Informix one ?
>
>  From the manual:
>
>
> The FILLFACTOR option takes effect only when you build an index on a table
> that contains more than 5,000 rows and uses more than 100 table pages, when
> you create an index on a fragmented table, or when you create a fragmented
> index on a nonfragmented table.
> Use the FILLFACTOR option to provide for expansion of an index at a later
> date or to create compacted indexes.
> When the index is created, the database server initially fills only that
> percentage of the nodes specified with the FILLFACTOR value.
>
> # Providing a Low Percentage Value
> If you provide a low percentage value, such as 50, you allow room for growth
> in your index. The nodes of the index initially fill to a certain percentage and
> contain space for inserts. The amount of available space depends on the
> number of keys in each page as well as the percentage value.
> For example, with a 50-percent FILLFACTOR value, the page would be half
> full and could accommodate doubling in size. A low percentage value can
> result in faster inserts and can be used for indexes that you expect to grow.
>
>
> # Providing a High Percentage Value
> If you provide a high percentage value, such as 99, your indexes are
> compacted, and any new index inserts result in splitting nodes. The
> maximum density is achieved with 100 percent. With a 100-percent
> FILLFACTOR value, the index has no room available for growth; any
> additions to the index result in splitting the nodes.
> A 99-percent FILLFACTOR value allows room for at least one insertion per
> node. A high percentage value can result in faster selects and can be used for
> indexes that you do not expect to grow or for mostly read-only indexes.
>
>
>
>
> Regards
> Gaetano Mendola
>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Equivalent praxis to CLUSTERED INDEX?

From

Josh Berkus

Date:

27 August 2004, 13:31:43

Bruce,

What happened to the B-Tree Table patch discussed on Hackers ad nauseum last
winter?

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Equivalent praxis to CLUSTERED INDEX?

From

Bruce Momjian

Date:

27 August 2004, 13:32:49

Josh Berkus wrote:
> Bruce,
>
> What happened to the B-Tree Table patch discussed on Hackers ad nauseum last
> winter?

I don't remember that. The only issue I remember is sorting btree index
by heap tid on creation.  We eventually got that into CVS for 8.0.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Equivalent praxis to CLUSTERED INDEX?

From

Adi Alurkar

Date:

27 August 2004, 13:51:18

Greetings,

I am not sure if this applies only to clustering but for storage in
general,

IIRC  Oracle has 2 parameters that can be set at table creation :
from Oracle docs

PCTFREE integer :
Specify the percentage of space in each data block of the table, object
table OID index, or partition reserved for future updates to the
table's rows. The value of PCTFREE must be a value from 0 to 99. A
value of 0 allows the entire block to be filled by inserts of new rows.
The default value is 10. This value reserves 10% of each block for
updates to existing rows and allows inserts of new rows to fill a
maximum of 90% of each block.
PCTFREE has the same function in the PARTITION description and in the
statements that create and alter clusters, indexes, materialized views,
and materialized view logs. The combination of PCTFREE and PCTUSED
determines whether new rows will be inserted into existing data blocks
or into new blocks.

PCTUSED integer
Specify the minimum percentage of used space that Oracle maintains for
each data block of the table, object table OID index, or
index-organized table overflow data segment. A block becomes a
candidate for row insertion when its used space falls below PCTUSED.
PCTUSED is specified as a positive integer from 0 to 99 and defaults to
40.
PCTUSED has the same function in the PARTITION description and in the
statements that create and alter clusters, materialized views, and
materialized view logs.
PCTUSED is not a valid table storage characteristic for an
index-organized table (ORGANIZATION INDEX).
The sum of PCTFREE and PCTUSED must be equal to or less than 100. You
can use PCTFREE and PCTUSED together to utilize space within a table
more efficiently.

PostgreSQL could take some hints from the above.

On Aug 27, 2004, at 1:26 AM, Gaetano Mendola wrote:

> Greg Stark wrote:
>
>> The discussions before talked about a mechanism to try to place new
> > tuples as close as possible to the proper index position.
>
> Means this that an index shall have a "fill factor" property, similar
> to
> Informix one ?
>
> From the manual:
>
>
> The FILLFACTOR option takes effect only when you build an index on a
> table
> that contains more than 5,000 rows and uses more than 100 table pages,
> when
> you create an index on a fragmented table, or when you create a
> fragmented
> index on a nonfragmented table.
> Use the FILLFACTOR option to provide for expansion of an index at a
> later
> date or to create compacted indexes.
> When the index is created, the database server initially fills only
> that
> percentage of the nodes specified with the FILLFACTOR value.
>
> # Providing a Low Percentage Value
> If you provide a low percentage value, such as 50, you allow room for
> growth
> in your index. The nodes of the index initially fill to a certain
> percentage and
> contain space for inserts. The amount of available space depends on the
> number of keys in each page as well as the percentage value.
> For example, with a 50-percent FILLFACTOR value, the page would be half
> full and could accommodate doubling in size. A low percentage value can
> result in faster inserts and can be used for indexes that you expect
> to grow.
>
>
> # Providing a High Percentage Value
> If you provide a high percentage value, such as 99, your indexes are
> compacted, and any new index inserts result in splitting nodes. The
> maximum density is achieved with 100 percent. With a 100-percent
> FILLFACTOR value, the index has no room available for growth; any
> additions to the index result in splitting the nodes.
> A 99-percent FILLFACTOR value allows room for at least one insertion
> per
> node. A high percentage value can result in faster selects and can be
> used for
> indexes that you do not expect to grow or for mostly read-only indexes.
>
>
>
>
> Regards
> Gaetano Mendola
>
>
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>
>
--
Adi Alurkar (DBA sf.NET) <adi@vasoftware.com>
1024D/79730470 A491 5724 74DE 956D 06CB  D844 6DF1 B972 7973 0470

Re: Equivalent praxis to CLUSTERED INDEX?

From

Bruce Momjian

Date:

27 August 2004, 14:27:39

But what is the advantage of non-full pages in Oracle?

---------------------------------------------------------------------------

Adi Alurkar wrote:
> Greetings,
>
> I am not sure if this applies only to clustering but for storage in
> general,
>
> IIRC  Oracle has 2 parameters that can be set at table creation :
> from Oracle docs
>
> PCTFREE integer :
> Specify the percentage of space in each data block of the table, object
> table OID index, or partition reserved for future updates to the
> table's rows. The value of PCTFREE must be a value from 0 to 99. A
> value of 0 allows the entire block to be filled by inserts of new rows.
> The default value is 10. This value reserves 10% of each block for
> updates to existing rows and allows inserts of new rows to fill a
> maximum of 90% of each block.
> PCTFREE has the same function in the PARTITION description and in the
> statements that create and alter clusters, indexes, materialized views,
> and materialized view logs. The combination of PCTFREE and PCTUSED
> determines whether new rows will be inserted into existing data blocks
> or into new blocks.
>
> PCTUSED integer
> Specify the minimum percentage of used space that Oracle maintains for
> each data block of the table, object table OID index, or
> index-organized table overflow data segment. A block becomes a
> candidate for row insertion when its used space falls below PCTUSED.
> PCTUSED is specified as a positive integer from 0 to 99 and defaults to
> 40.
> PCTUSED has the same function in the PARTITION description and in the
> statements that create and alter clusters, materialized views, and
> materialized view logs.
> PCTUSED is not a valid table storage characteristic for an
> index-organized table (ORGANIZATION INDEX).
> The sum of PCTFREE and PCTUSED must be equal to or less than 100. You
> can use PCTFREE and PCTUSED together to utilize space within a table
> more efficiently.
>
> PostgreSQL could take some hints from the above.
>
> On Aug 27, 2004, at 1:26 AM, Gaetano Mendola wrote:
>
> > Greg Stark wrote:
> >
> >> The discussions before talked about a mechanism to try to place new
> > > tuples as close as possible to the proper index position.
> >
> > Means this that an index shall have a "fill factor" property, similar
> > to
> > Informix one ?
> >
> > From the manual:
> >
> >
> > The FILLFACTOR option takes effect only when you build an index on a
> > table
> > that contains more than 5,000 rows and uses more than 100 table pages,
> > when
> > you create an index on a fragmented table, or when you create a
> > fragmented
> > index on a nonfragmented table.
> > Use the FILLFACTOR option to provide for expansion of an index at a
> > later
> > date or to create compacted indexes.
> > When the index is created, the database server initially fills only
> > that
> > percentage of the nodes specified with the FILLFACTOR value.
> >
> > # Providing a Low Percentage Value
> > If you provide a low percentage value, such as 50, you allow room for
> > growth
> > in your index. The nodes of the index initially fill to a certain
> > percentage and
> > contain space for inserts. The amount of available space depends on the
> > number of keys in each page as well as the percentage value.
> > For example, with a 50-percent FILLFACTOR value, the page would be half
> > full and could accommodate doubling in size. A low percentage value can
> > result in faster inserts and can be used for indexes that you expect
> > to grow.
> >
> >
> > # Providing a High Percentage Value
> > If you provide a high percentage value, such as 99, your indexes are
> > compacted, and any new index inserts result in splitting nodes. The
> > maximum density is achieved with 100 percent. With a 100-percent
> > FILLFACTOR value, the index has no room available for growth; any
> > additions to the index result in splitting the nodes.
> > A 99-percent FILLFACTOR value allows room for at least one insertion
> > per
> > node. A high percentage value can result in faster selects and can be
> > used for
> > indexes that you do not expect to grow or for mostly read-only indexes.
> >
> >
> >
> >
> > Regards
> > Gaetano Mendola
> >
> >
> >
> >
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 7: don't forget to increase your free space map settings
> >
> >
> --
> Adi Alurkar (DBA sf.NET) <adi@vasoftware.com>
> 1024D/79730470 A491 5724 74DE 956D 06CB  D844 6DF1 B972 7973 0470
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Equivalent praxis to CLUSTERED INDEX?

From

"Jeremy Dunn"

Date:

27 August 2004, 14:39:39


> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of
> Bruce Momjian
> Sent: Friday, August 27, 2004 1:27 PM
> To: Adi Alurkar
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Equivalent praxis to CLUSTERED INDEX?
>
>
>
> But what is the advantage of non-full pages in Oracle?
>

One advantage has to do with updates of variable-length columns, e.g.
varchars.

If the block is fully packed with data, an update to a varchar column
that makes the column wider, causes "row-chaining".  This means that a
portion of the row is stored in a different data block, which may be
somewhere completely different in the storage array.  Retrieving that
row (or even just that column from that row) as a unit may now require
additional disk seek(s).

Leaving some space for updates in each data block doesn't prevent this
problem completely, but mitigates it to a certain extent.  If for
instance a row is typically inserted with a null value for a varchar
column, but the application developer knows it will almost always get
updated with some value later on, then leaving a certain percentage of
empty space in each block allocated to that table makes sense.

Conversely, if you know that your data is never going to get updated
(e.g. a data warehousing application), you might specify to pack the
blocks as full as possible.  This makes for the most efficient data
retrieval performance.

- Jeremy

Re: Equivalent praxis to CLUSTERED INDEX?

From

Adi Alurkar

Date:

27 August 2004, 14:40:04

IIRC it it to reduce the "overflow" of data or what oracle calls
chained rows. i.e if a table has variable length columns and 10 rows
get inserted into a datapage, if this datapage is full and one of the
variable length field gets updated the row will now "overflow" into
another datapage, but if the datapage is created with an appropriate
amount of free space the updated row will be stored in one single
datapage.

On Aug 27, 2004, at 10:27 AM, Bruce Momjian wrote:

>
> But what is the advantage of non-full pages in Oracle?
>
> -----------------------------------------------------------------------
> ----
>
> Adi Alurkar wrote:
>> Greetings,
>>
>> I am not sure if this applies only to clustering but for storage in
>> general,
>>
>> IIRC  Oracle has 2 parameters that can be set at table creation :
>> from Oracle docs
>>
>> PCTFREE integer :
>> Specify the percentage of space in each data block of the table,
>> object
>> table OID index, or partition reserved for future updates to the
>> table's rows. The value of PCTFREE must be a value from 0 to 99. A
>> value of 0 allows the entire block to be filled by inserts of new
>> rows.
>> The default value is 10. This value reserves 10% of each block for
>> updates to existing rows and allows inserts of new rows to fill a
>> maximum of 90% of each block.
>> PCTFREE has the same function in the PARTITION description and in the
>> statements that create and alter clusters, indexes, materialized
>> views,
>> and materialized view logs. The combination of PCTFREE and PCTUSED
>> determines whether new rows will be inserted into existing data blocks
>> or into new blocks.
>>
>> PCTUSED integer
>> Specify the minimum percentage of used space that Oracle maintains for
>> each data block of the table, object table OID index, or
>> index-organized table overflow data segment. A block becomes a
>> candidate for row insertion when its used space falls below PCTUSED.
>> PCTUSED is specified as a positive integer from 0 to 99 and defaults
>> to
>> 40.
>> PCTUSED has the same function in the PARTITION description and in the
>> statements that create and alter clusters, materialized views, and
>> materialized view logs.
>> PCTUSED is not a valid table storage characteristic for an
>> index-organized table (ORGANIZATION INDEX).
>> The sum of PCTFREE and PCTUSED must be equal to or less than 100. You
>> can use PCTFREE and PCTUSED together to utilize space within a table
>> more efficiently.
>>
>> PostgreSQL could take some hints from the above.
>>
>> On Aug 27, 2004, at 1:26 AM, Gaetano Mendola wrote:
>>
>>> Greg Stark wrote:
>>>
>>>> The discussions before talked about a mechanism to try to place new
>>>> tuples as close as possible to the proper index position.
>>>
>>> Means this that an index shall have a "fill factor" property, similar
>>> to
>>> Informix one ?
>>>
>>> From the manual:
>>>
>>>
>>> The FILLFACTOR option takes effect only when you build an index on a
>>> table
>>> that contains more than 5,000 rows and uses more than 100 table
>>> pages,
>>> when
>>> you create an index on a fragmented table, or when you create a
>>> fragmented
>>> index on a nonfragmented table.
>>> Use the FILLFACTOR option to provide for expansion of an index at a
>>> later
>>> date or to create compacted indexes.
>>> When the index is created, the database server initially fills only
>>> that
>>> percentage of the nodes specified with the FILLFACTOR value.
>>>
>>> # Providing a Low Percentage Value
>>> If you provide a low percentage value, such as 50, you allow room for
>>> growth
>>> in your index. The nodes of the index initially fill to a certain
>>> percentage and
>>> contain space for inserts. The amount of available space depends on
>>> the
>>> number of keys in each page as well as the percentage value.
>>> For example, with a 50-percent FILLFACTOR value, the page would be
>>> half
>>> full and could accommodate doubling in size. A low percentage value
>>> can
>>> result in faster inserts and can be used for indexes that you expect
>>> to grow.
>>>
>>>
>>> # Providing a High Percentage Value
>>> If you provide a high percentage value, such as 99, your indexes are
>>> compacted, and any new index inserts result in splitting nodes. The
>>> maximum density is achieved with 100 percent. With a 100-percent
>>> FILLFACTOR value, the index has no room available for growth; any
>>> additions to the index result in splitting the nodes.
>>> A 99-percent FILLFACTOR value allows room for at least one insertion
>>> per
>>> node. A high percentage value can result in faster selects and can be
>>> used for
>>> indexes that you do not expect to grow or for mostly read-only
>>> indexes.
>>>
>>>
>>>
>>>
>>> Regards
>>> Gaetano Mendola
>>>
>>>
>>>
>>>
>>>
>>> ---------------------------(end of
>>> broadcast)---------------------------
>>> TIP 7: don't forget to increase your free space map settings
>>>
>>>
>> --
>> Adi Alurkar (DBA sf.NET) <adi@vasoftware.com>
>> 1024D/79730470 A491 5724 74DE 956D 06CB  D844 6DF1 B972 7973 0470
>>
>>
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 6: Have you searched our list archives?
>>
>>                http://archives.postgresql.org
>>
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania
> 19073
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if
> your
>       joining column's datatypes do not match
>
>
--
Adi Alurkar (DBA sf.NET) <adi@vasoftware.com>
1024D/79730470 A491 5724 74DE 956D 06CB  D844 6DF1 B972 7973 0470

Re: Equivalent praxis to CLUSTERED INDEX?

From

Bruce Momjian

Date:

27 August 2004, 14:49:36

Adi Alurkar wrote:
> IIRC it it to reduce the "overflow" of data or what oracle calls
> chained rows. i.e if a table has variable length columns and 10 rows
> get inserted into a datapage, if this datapage is full and one of the
> variable length field gets updated the row will now "overflow" into
> another datapage, but if the datapage is created with an appropriate
> amount of free space the updated row will be stored in one single
> datapage.

Agreed.  What I am wondering is with our system where every update gets
a new row, how would this help us?  I know we try to keep an update on
the same row as the original, but is there any significant performance
benefit to doing that which would offset the compaction advantage?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Equivalent praxis to CLUSTERED INDEX?

From

Tom Lane

Date:

27 August 2004, 15:19:38

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Agreed.  What I am wondering is with our system where every update gets
> a new row, how would this help us?  I know we try to keep an update on
> the same row as the original, but is there any significant performance
> benefit to doing that which would offset the compaction advantage?

Because Oracle uses overwrite-in-place (undoing from an UNDO log on
transaction abort), while we always write a whole new row, it would take
much larger PCTFREE wastage to get a useful benefit in PG than it does
in Oracle.  That wastage translates directly into increased I/O costs,
so I'm a bit dubious that we should assume there is a win to be had here
just because Oracle offers the feature.

            regards, tom lane

Re: Equivalent praxis to CLUSTERED INDEX?

From

Greg Stark

Date:

27 August 2004, 16:31:37

Bruce Momjian <pgman@candle.pha.pa.us> writes:

> Agreed.  What I am wondering is with our system where every update gets
> a new row, how would this help us?  I know we try to keep an update on
> the same row as the original, but is there any significant performance
> benefit to doing that which would offset the compaction advantage?

Hm. Posit a system where all transactions are short updates executed in
autocommit mode.

In such a system as soon as a transaction commits it would take a very short
time before the previous record was a dead tuple.

If every backend kept a small list of tuples it had marked deleted and
whenever it was idle checked to see if they were dead yet, it might avoid much
of the need for vacuum. And in such a circumstance I think you wouldn't need
more than a pctfree of 50% even on a busy table. Every tuple would need about
one extra slot.

This would only be a reasonable idea if a) if the list of potential dead
tuples is short and if it overflows it just forgets them leaving them for
vacuum to deal with. and b) It only checks the potentially dead tuples when
the backend is otherwise idle.

Even so it would be less efficient than a batch vacuum, and it would be taking
up i/o bandwidth (to maintain indexes even if the heap buffer is in ram), even
if that backend is idle it doesn't mean other backends couldn't have used that
i/o bandwidth.

But I think it would deal with a lot of the complaints about vacuum and it
would make it more feasible to use a pctfree parameter to make clustering more
effective.

--
greg

Re: Equivalent praxis to CLUSTERED INDEX?

From

Greg Stark

Date:

27 August 2004, 16:35:03

Bruce Momjian <pgman@candle.pha.pa.us> writes:

> but is there any significant performance benefit to doing that which would
> offset the compaction advantage?

Just as a side comment. Setting PCTFREE 0 PCTUSED 100 on tables that have no
updates on them has an astonishingly big effect on speed. So the penalty for
leaving some space free really is substantial.

I think the other poster is right. Oracle really needs pctfree because of the
way it handles updates. Postgres doesn't really need as much because it
doesn't try to squeeze the new tuple in the space the old one took up. If it
doesn't fit on the page the worst that happens is it has to store it on some
other page, whereas oracle has to do its strange row chaining thing.

--
greg

Re: Equivalent praxis to CLUSTERED INDEX?

From

Bruce Momjian

Date:

27 August 2004, 16:43:53

Greg Stark wrote:
>
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>
> > but is there any significant performance benefit to doing that which would
> > offset the compaction advantage?
>
> Just as a side comment. Setting PCTFREE 0 PCTUSED 100 on tables that have no
> updates on them has an astonishingly big effect on speed. So the penalty for
> leaving some space free really is substantial.
>
> I think the other poster is right. Oracle really needs pctfree because of the
> way it handles updates. Postgres doesn't really need as much because it
> doesn't try to squeeze the new tuple in the space the old one took up. If it
> doesn't fit on the page the worst that happens is it has to store it on some
> other page, whereas oracle has to do its strange row chaining thing.

Oracle also does that chain thing so moving updates to different pages
might have more of an impact than it does on PostgreSQL.  We have chains
too but just for locking.  Not sure on Oracle.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Equivalent praxis to CLUSTERED INDEX?

From

Gaetano Mendola

Date:

27 August 2004, 22:32:21

Tom Lane wrote:

 > Bruce Momjian <pgman@candle.pha.pa.us> writes:
 >
 >>Agreed.  What I am wondering is with our system where every update gets
 >>a new row, how would this help us?  I know we try to keep an update on
 >>the same row as the original, but is there any significant performance
 >>benefit to doing that which would offset the compaction advantage?
 >
 >
 > Because Oracle uses overwrite-in-place (undoing from an UNDO log on
 > transaction abort), while we always write a whole new row, it would take
 > much larger PCTFREE wastage to get a useful benefit in PG than it does
 > in Oracle.  That wastage translates directly into increased I/O costs,
 > so I'm a bit dubious that we should assume there is a win to be had here
 > just because Oracle offers the feature.

Mmmm. Consider this scenario:

ctid           datas
(0,1)   yyy-xxxxxxxxxxxxxxxxxxx
(0,2)   -------- EMPTY --------
(0,3)   -------- EMPTY --------
(0,4)   -------- EMPTY --------
(0,5)   -------- EMPTY --------
(0,6)   yyy-xxxxxxxxxxxxxxxxxxx
(0,7)   -------- EMPTY --------
....    -------- EMPTY --------
(0,11)  yyy-xxxxxxxxxxxxxxxxxxx


the row (0,2) --> (0,5) are space available for the (0,1) updates.
This will help a table clustered ( for example ) to mantain his
own correct cluster order.



Regards
Gaetano Mendola

Re: Equivalent praxis to CLUSTERED INDEX?

From

Gaetano Mendola

Date:

27 August 2004, 23:08:28

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Bruce Momjian wrote:

| Gaetano Mendola wrote:
|
|>Tom Lane wrote:
|>
|> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
|> >
|> >>Agreed.  What I am wondering is with our system where every update gets
|> >>a new row, how would this help us?  I know we try to keep an update on
|> >>the same row as the original, but is there any significant performance
|> >>benefit to doing that which would offset the compaction advantage?
|> >
|> >
|> > Because Oracle uses overwrite-in-place (undoing from an UNDO log on
|> > transaction abort), while we always write a whole new row, it would take
|> > much larger PCTFREE wastage to get a useful benefit in PG than it does
|> > in Oracle.  That wastage translates directly into increased I/O costs,
|> > so I'm a bit dubious that we should assume there is a win to be had here
|> > just because Oracle offers the feature.
|>
|>Mmmm. Consider this scenario:
|>
|>ctid           datas
|>(0,1)   yyy-xxxxxxxxxxxxxxxxxxx
|>(0,2)   -------- EMPTY --------
|>(0,3)   -------- EMPTY --------
|>(0,4)   -------- EMPTY --------
|>(0,5)   -------- EMPTY --------
|>(0,6)   yyy-xxxxxxxxxxxxxxxxxxx
|>(0,7)   -------- EMPTY --------
|>....    -------- EMPTY --------
|>(0,11)  yyy-xxxxxxxxxxxxxxxxxxx
|>
|>
|>the row (0,2) --> (0,5) are space available for the (0,1) updates.
|>This will help a table clustered ( for example ) to mantain his
|>own correct cluster order.
|
|
| Right.  My point was that non-full fill is valuable for us only when
| doing clustering, while for Oracle it is a win even in non-cluster cases
| because of the way they update in place.

Don't you think this will permit also to avoid extra disk seek and cache
invalidation? If you are updating the row (0,1) I think is less expensive
put the new version in (0,2) instead of thousand line far from that point.


Regards
Gaetano Mendola





-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBL+kA7UpzwH2SGd4RAp6fAJ9rSs5xmTXsy4acUGcnCRTbEUCwrwCgo/o6
0JPtziuf1E/EGLaqjbPMV44=
=pIgX
-----END PGP SIGNATURE-----

Re: Equivalent praxis to CLUSTERED INDEX?

From

Bruce Momjian

Date:

28 August 2004, 00:55:01

Gaetano Mendola wrote:
> | Right.  My point was that non-full fill is valuable for us only when
> | doing clustering, while for Oracle it is a win even in non-cluster cases
> | because of the way they update in place.
>
> Don't you think this will permit also to avoid extra disk seek and cache
> invalidation? If you are updating the row (0,1) I think is less expensive
> put the new version in (0,2) instead of thousand line far from that point.

It would, but does that outweigh the decreased I/O by having things more
densely packed?  I would think not.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Equivalent praxis to CLUSTERED INDEX?

From

Greg Stark

Date:

28 August 2004, 05:15:29

Bruce Momjian <pgman@candle.pha.pa.us> writes:

> > Don't you think this will permit also to avoid extra disk seek and cache
> > invalidation? If you are updating the row (0,1) I think is less expensive
> > put the new version in (0,2) instead of thousand line far from that point.

Well if the other buffer "a thousand lines far from that point" is already in
ram, then no, there's no penalty at the time for storing it there.

However it destroys the clustering, which was the original point.

> It would, but does that outweigh the decreased I/O by having things more
> densely packed?  I would think not.

Well the dense packing is worth something. But so is the clustering. There's
definitely a trade-off.

I always found my largest tables are almost always insert-only tables anyways.
So in Oracle I would have pctused 100 pctfree 0 on them and get the
performance gain.

The tables that would benefit from this would be tables always accessed by
indexes in index scans of more than one record. The better the clustering the
fewer pages the index scan would have to read in. If the data took 10% more
space but the index scan only needs 1/4 as many buffers it could be a big net
win.

--
greg

Re: Equivalent praxis to CLUSTERED INDEX?

From

Gaetano Mendola

Date:

28 August 2004, 09:02:17

Greg Stark wrote:

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>
>
>>>Don't you think this will permit also to avoid extra disk seek and cache
>>>invalidation? If you are updating the row (0,1) I think is less expensive
>>>put the new version in (0,2) instead of thousand line far from that point.
>
>
> Well if the other buffer "a thousand lines far from that point" is already in
> ram, then no, there's no penalty at the time for storing it there.

I was wandering about the cache invalidation, may be the ram is big enough but I
doubt about the cache, the recommendation in this case is to modify adjacent
memory address instead of jumping.

Regards
Gaetano Mendola

Re: Equivalent praxis to CLUSTERED INDEX?

From

Mischa Sandberg

Date:

29 August 2004, 12:15:31

Ummm ... not quite. In MSSQL/Sybase/Oracle, a clustered index maintains
its space saturation as part of each update operation. High activity
does indeed result in less-full pages (typically 60-80% full for tables
with heavy deletions or rowsize changes). To bring the percentage back
up, you run DBCC INDEXDEFRAG, which also does what you'd expect of a
normal file defragmenter -- put related disk pages together on the platter.

But the performance difference is hardly as severe as I gather it can be
if you neglect to vacuum.

As for SQL Server being a 'single-user database' ... ummm ... no, I
don't think so. I'm REALLY happy to be shut of the Microsoft world, but
MSSQL 7/2000/2005 is a serious big DB engine. It also has some serious
bright heads behind it. They hired Goetz Graefe and Paul (aka Per-Ake)
Larsen away from academia, and it shows, in the join and aggregate
processing. I'll be a happy camper if I manage to contribute something
to PG that honks the way their stuff does. Happy to discuss, too.

Josh Berkus wrote:
> Bruce,
>
>
>>How do vendors actually implement auto-clustering?  I assume they move
>>rows around during quiet periods or have lots of empty space in each
>>value bucket.
>
>
> That's how SQL Server does it.   In old versions (6.5) you had to manually
> send commands to update the cluster, same as PG.   Also, when you create a
> cluster (or an index or table for that matter) you can manually set an amount
> of "space" to be held open on each data page for updates.
>
> Also keep in mind that SQL Server, as a "single-user database" has a much
> easier time with this.  They don't have to hold several versions of an index
> in memory and collapse it into a single version at commit time.
>
> All that being said, we could do a better job of "auto-balancing" clustered
> tables.   I believe that someone was working on this in Hackers through what
> they called "B-Tree Tables".  What happened to that?
>

Re: Equivalent praxis to CLUSTERED INDEX?

From

Mischa Sandberg

Date:

29 August 2004, 12:15:33

I think you've probably fingered the kicker of why PG doesn't have this
kind of clustering already. Hence perhaps the need for other approaches
to the issue (the disk-IO efficiency of reading groups of rows related
by a common key) that other DB's (with in-place update) address with
synchronous clustering ('heap rebalancing' ?).

Bruce Momjian wrote:
> Adi Alurkar wrote:
>
>>IIRC it it to reduce the "overflow" of data or what oracle calls
>>chained rows. i.e if a table has variable length columns and 10 rows
>>get inserted into a datapage, if this datapage is full and one of the
>>variable length field gets updated the row will now "overflow" into
>>another datapage, but if the datapage is created with an appropriate
>>amount of free space the updated row will be stored in one single
>>datapage.
>
>
> Agreed.  What I am wondering is with our system where every update gets
> a new row, how would this help us?  I know we try to keep an update on
> the same row as the original, but is there any significant performance
> benefit to doing that which would offset the compaction advantage?
>

Re: Equivalent praxis to CLUSTERED INDEX?

From

Mischa Sandberg

Date:

29 August 2004, 12:15:40

This discussion is starting to sound like the split in HEAP memory
management evolution, into garbage-collecting (e.g. Java) and
non-garbage-collecting (e.g. C++).

Reclamation by GC's these days has become seriously sophisticated.
CLUSTER resembles the first generation of GC's, which were
single-big-pass hold-everything-else threads.

Perhaps the latest in incremental GC algorithms would be worth scouting,
for the next step in PG page management.

Greg Stark wrote:

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>
>>but is there any significant performance benefit to doing that which would
>>offset the compaction advantage?
>
> Just as a side comment. Setting PCTFREE 0 PCTUSED 100 on tables that have no
> updates on them has an astonishingly big effect on speed. So the penalty for
> leaving some space free really is substantial.
>
> I think the other poster is right. Oracle really needs pctfree because of the
> way it handles updates. Postgres doesn't really need as much because it
> doesn't try to squeeze the new tuple in the space the old one took up. If it
> doesn't fit on the page the worst that happens is it has to store it on some
> other page, whereas oracle has to do its strange row chaining thing.

Re: Equivalent praxis to CLUSTERED INDEX?

From

Mischa Sandberg

Date:

29 August 2004, 12:15:59

Sheer nitpick here...

A B-tree is where the records (data) live at all levels of the tree;
B+ tree is where the records are only at the leaf level.
That's what Knuth calls them, anyway.

Clustered indexes for all known dbs are true B+ trees.
Nonclustered indexes could be B-trees (probably aren't),
since there's no big fanout penalty for storing the little
(heap) row locators everywhere at all levels.

J. Andrew Rogers wrote:
> As far as I know, Oracle does it by having a B-Tree organized heap (a
> feature introduced around v8 IIRC), basically making the primary key
> index and the heap the same physical structure.
...

Re: Equivalent praxis to CLUSTERED INDEX?

From

Mischa Sandberg

Date:

29 August 2004, 12:16:51

J. Andrew Rogers wrote:
> On Thu, 2004-08-26 at 12:30, Magnus Hagander wrote:
>>IIRC, SQL Server always creates clustered indexes by default for primary
>>keys.
>
> That would surprise me actually.

Yaz, it should. It doesn't ALWAYS create clustered (unique) index for
primary keys, but clustered is the default if you just specify

CREATE TABLE Foo (col1, ...
    ,PRIMARY KEY(col1, ...)
)

Saying PRIMARY KEY NONCLUSTERED(...) is how you override the default.

((Weird to be discussing so much MSSQL here))

Re: Equivalent praxis to CLUSTERED INDEX?

From

Josh Berkus

Date:

29 August 2004, 15:58:24

Mishca,

> Ummm ... not quite. In MSSQL/Sybase/Oracle, a clustered index maintains
> its space saturation as part of each update operation. High activity
> does indeed result in less-full pages (typically 60-80% full for tables
> with heavy deletions or rowsize changes). To bring the percentage back
> up, you run DBCC INDEXDEFRAG, which also does what you'd expect of a
> normal file defragmenter -- put related disk pages together on the platter.

Sure, it does now, which is a nice thing.  It didn't in the first version
(6.5) where this cluster maint needed to be done manually and asynchronously,
as I recall.

> As for SQL Server being a 'single-user database' ... ummm ... no, I
> don't think so.

Hmmm ... perhaps it would be better if I said "serial-only database".   MSSQL
(like earlier versions of Sybase) handles transactions by spooling everything
out to a serial log, effectively making all transcations SERIAL isolation.
This has some significant benefits in performance for OLAP and data
warehousing, but really kills you on high-concurrency transaction processing.

> I'm REALLY happy to be shut of the Microsoft world, but
> MSSQL 7/2000/2005 is a serious big DB engine. It also has some serious
> bright heads behind it. They hired Goetz Graefe and Paul (aka Per-Ake)
> Larsen away from academia, and it shows, in the join and aggregate
> processing. I'll be a happy camper if I manage to contribute something
> to PG that honks the way their stuff does. Happy to discuss, too.

Yeah, they also have very speedy cursor handling.  I can do things with
cursors in MSSQL that I wouldn't consider with other DBs.   Not that that
makes up for the lack of other functionality, but it is nice when you need
it.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: Equivalent praxis to CLUSTERED INDEX?

From

"Jim C. Nasby"

Date:

31 August 2004, 18:50:21

On Thu, Aug 26, 2004 at 12:04:48PM -0700, J. Andrew Rogers wrote:
> The major caveat to having tables of this type is that you can only have
> a primary key index.  No other indexes are possible because the "heap"
> constantly undergoes local reorganizations if you have a lot of write
> traffic, the same kind of reorganization you would normally expect in a
> BTree index.

This isn't true, at least in 9i. You can create whatever indexes you
want on an index-organized table. I believe that the index stores the PK
value instead of the ROWID.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Equivalent praxis to CLUSTERED INDEX?

From

"Jim C. Nasby"

Date:

31 August 2004, 19:05:07

On Thu, Aug 26, 2004 at 11:39:42PM -0400, Greg Stark wrote:
>
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>
> > Updated TODO item:
> >
> >         o Automatically maintain clustering on a table
> >
> >         This would require some background daemon to maintain clustering
> >         during periods of low usage. It might also require tables to be only
> >         paritally filled for easier reorganization.  It also might require
> >         creating a merged heap/index data file so an index lookup would
> >         automatically access the heap data too.
>
> Fwiw, I would say the first "would" is also a "might". None of the previous
> discussions here presumed a maintenance daemon. The discussions before talked
> about a mechanism to try to place new tuples as close as possible to the
> proper index position.
>
> I would also suggest making some distinction between a cluster system similar
> to what we have now but improved to maintain the clustering continuously, and
> an actual index-organized-table where the tuples are actually only stored in a
> btree structure.
>
> They're two different approaches to similar problems. But they might both be
> useful to have, and have markedly different implementation details.

There's a third approach that I think is worth considering. Half of the
benefit to clustered tables is limiting the number of pages you need to
access when scanning the primary key. The order of tuples in the pages
themselves isn't nearly as important as ordering of the pages. This
means you can get most of the benefit of an index-organized table just
by being careful about what page you place a tuple on. What I'm thinking
of is some means to ensure all the tuples on a page are within some PK
range, but not worrying about the exact order within the page since it's
relatively cheap to scan through the page in memory.

Some pros:
This would probably mean less change to the code that inserts tuples.

No need for a background daemon.

No need to create a new B-Tree table structure.

Ideally, there won't be need to move tuples around, which should mean
that current indexing code doesn't need to change.

Cons:
Need to have some way to deal with pages that fill up.

To gain full benefit some means of indicating what range of PK values
are on a page might be needed.

It's not as beneficial as a true IOT since you don't get the benefit of
storing your tuples inline with your B-Tree.

I'm sure there's a ton of things I'm missing, especially since I'm not
familiar with the postgresql code, but hopefully others can explore this
further.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Equivalent praxis to CLUSTERED INDEX?

From

Mischa Sandberg

Date:

11 September 2004, 06:07:36

Mischa Sandberg wrote:
> Coming from the MSSQL world, I'm used to the first step in optimization
> to be, choose your clustered index and choose it well.
> I see that PG has a one-shot CLUSTER command, but doesn't support
> continuously-updated clustered indexes.
> What I infer from newsgroup browsing is, such an index is impossible,
> given the MVCC versioning of records (happy to learn I'm wrong).
> I'd be curious to know what other people, who've crossed this same
> bridge from MSSQL or Oracle or Sybase to PG, have devised,
> faced with the same kind of desired performance gain for retrieving
> blocks of rows with the same partial key.

Just to let people know, after trying various options, this looks the
most promising:

- segment the original table into four tables (call them A,B,C,D)

- all insertions go into A.
- longterm data lives in B.

- primary keys of all requests to delete rows from (B) go into D -- no
actual deletions are done against B. Deletions against A happen as normal.

- all queries are made against a view: a union of A and B and (not
exists) D.

- daily merge A,B and (where not exists...) D, into C
- run cluster on C, then swap names on B and C, truncate A and D.

Not rocket science, but it seems to give the payback of normal
clustering without locking the table for long periods of time. It also
saves on VACUUM FULL time.

At present, we're only at 1M rows in B on this. More when I know it.
Advance warning on any gotchas with this approach would be much
appreciated. Making a complete copy of (B) is a bit of an ouch.