Re: Index/Function organized table layout - Mailing list pgsql-hackers

From James Rogers
Subject Re: Index/Function organized table layout
Date
Msg-id 1065127491.9267.146.camel@localhost.localdomain
Whole thread Raw
In response to Re: Index/Function organized table layout  (Hannu Krosing <hannu@tm.ee>)
Responses Re: Index/Function organized table layout
List pgsql-hackers
On Thu, 2003-10-02 at 12:09, Hannu Krosing wrote:
> So what you really need is the CLUSTER command to leave pages half-empty
> and the tuple placement logic on inserts/updates to place new tuples
> near the place where they would be placed by CLUSTER. I.e. the code that
> does actual inserting should be aware of CLUSTERing.


Not exactly. What you are describing is more akin to partitioning or
hash-organized tables i.e. sorting insert/update tuples to various pages
according to some hash function.  This works well if you a table is
composed of many well-defined working sets but the access and query
patterns for a given working set are varied.  A common example of this
is for large accounting tables, which are frequently partitioned by a
function which truncates the record timestamp to the year/month.  Note
that indexes can be partitioned as well, so that as long as you are
doing queries within a given month (using the accounting example), the
query performance is generally what you would expect if the entire table
was the size of one month's data, no matter how big the table actually
gets.

Partitions and hash-organized tables (for RDBMS that support them) are
often handled internally as multiple tables masquerading as one (kind of
like a view), with the hash function determining which table is actually
being accessed.  This allows the possibility of doing things like
locking individual partitions or setting them "read-only", and generally
having the option of treating them as individual tables for
administrative purposes e.g. deleting a partition of old unused data
without adversely affecting the "active" partition in the way it would
if they were truly one table, even if they look like one table from a
SQL-level view.

What I really need in my specific case is B-Tree organized tables,
though hashing/partitioning would help too.  It *is* a similar kind of
problem.

B-Tree organized tables basically make the table and the primary index
the same thing, and tend to be most useful for tables that use a single
multi-column primary key index for queries.  This has the effect of
putting all the tuples for a typical query in the same small number of
heap pages (and therefore buffers), allowing very efficient access in
the typical case with the caveat that non-indexed queries will be quite
slow.

B-Tree organized tables are particularly useful when the insert order is
orthogonal to the typical query order.  As I mentioned before, tables
that store parallel collections of time-series data are classic
examples.  Often the data is inserted into the pages in order that can
roughly be described as (timestamp, id), but is queried using (id,
timestamp) as the index.  If you have enough ids, you end up with the
pathological case where you typically have one relevant tuple per page
for a given query.


> True, but my above suggestion would be much easier to implement
> near-term. It seems to be a nice incremental improvement just needs 
> to touch places:
> 
> 1. selecting where new tuples go : 
> 
>   * updated ones go near old ones if not clustered and near the place
>     CLUSTER would place them if clustered. 
> 
>   * inserted ones go to the less than half-empty pages if not clustered
>     and near the place CLUSTER would place them if clustered. 
> 
> 2. making reorganization code (CLUSTER and VACUUM FULL) to leave space 
> in pages for clustered updates/inserts.


The nuisance would be keeping track of which pages are collecting which
tuples.  Knowing the CLUSTER index doesn't tell you much about which
pages would currently be a good place to put a new tuple.  You could
always markup the index that CLUSTER uses to keep track of good
candidates (plus some additional structures), but the more I think about
that, the more it looks like a nasty hack.

Cheers,

-James Rogersjamesr@best.com






pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: minor view creation weirdness
Next
From: Alvaro Herrera Munoz
Date:
Subject: Re: Index/Function organized table layout