Thread: Inherited indexes.

Inherited indexes.

From
Fredrik Olsson
Date:
Hi.

To allow indexes to be inherited so unique, foreign keys and such works 
properly with inheritance has been on the todo for quite some time. I 
thought that most probably it is a very non trivial thing, perhaps 
completely rethinking how indexes are done. Or perhaps it is not a 
feature that is requested allot and therefor no one ever got around to 
it. I am optimistic so I hoped for the second alternative and begun 
browsing the sources to see what could be done.

Well, from what I have been able to figure out it is not trivial, at 
least not to me. To be honest I have not completely figured out how the 
existing indexes works and fit into the constraints.

I am not quite sure what I am asking for, quite allot I guess. Is there 
someone already working on it? If so or if someone is considering 
perhaps I should start with one of the tasks clearly marked as easy, as 
an novice to postgresql hacking might be of better use then.
Or maybe it is quite easy with the right directions, as in; non complex 
but takes time. So some one who knows what needs to be done, but do not 
have the time themselves could give an outline?

regards

-- 
//Fredrik Olsson Treyst AB +46-19-362182 fredrik.olsson@treyst.se



Re: Inherited indexes.

From
Tom Lane
Date:
Fredrik Olsson <fredrik.olsson@treyst.se> writes:
> To allow indexes to be inherited so unique, foreign keys and such works 
> properly with inheritance has been on the todo for quite some time. I 
> thought that most probably it is a very non trivial thing, perhaps 
> completely rethinking how indexes are done.

Yup, you're right.  There are a couple of major problems, to my mind:

1. A cross-table index would need to store a table OID as well as the
existing block/offset information in order to tell you what an entry is
pointing at.  An extra 4 bytes per index entry (8 bytes if MAXALIGN is
8) is a lot of overhead, so you'd not want to pay that all the time.
Which means two index tuple header formats to support, which looks
painful.  How can that be handled cleanly and efficiently?

2. Nobody has any idea how to handle the locking requirements.  For the
most part, we assume that a lock on a table protects its associated
indexes too.  What happens when an index is shared by multiple tables?
Are there deadlock problems?  A particularly nasty example is that in a
unique index, inserting into one table may require visiting other tables
(that you've not even got lock on) to see if potentially conflicting
rows are still live.

> Or perhaps it is not a feature that is requested allot and therefor no
> one ever got around to it.

No, it's been requested plenty, but it looks hard.  See the pghackers
archives for previous discussions.
        regards, tom lane


Re: Inherited indexes.

From
"Jim C. Nasby"
Date:
On Sun, Oct 02, 2005 at 09:46:07PM -0400, Tom Lane wrote:
> Fredrik Olsson <fredrik.olsson@treyst.se> writes:
> > To allow indexes to be inherited so unique, foreign keys and such works 
> > properly with inheritance has been on the todo for quite some time. I 
> > thought that most probably it is a very non trivial thing, perhaps 
> > completely rethinking how indexes are done.
> 
> Yup, you're right.  There are a couple of major problems, to my mind:
> 
> 1. A cross-table index would need to store a table OID as well as the
> existing block/offset information in order to tell you what an entry is
> pointing at.  An extra 4 bytes per index entry (8 bytes if MAXALIGN is
> 8) is a lot of overhead, so you'd not want to pay that all the time.
> Which means two index tuple header formats to support, which looks
> painful.  How can that be handled cleanly and efficiently?

Wouldn't it make more sense to use a smaller pointer to a table of OIDs
that that index covers? I don't know off-hand how much padding there
currently is in index tuples, but hopefully this would allow bringing
the space usage under control for common cases involving less than a few
dozen tables.

Another possibility is optimizing for the special case of indexing on a
partitioning key. In this case, index values would be very localized to
one table, so just storing the table info on each index page (or
something similar) would work well.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Inherited indexes.

From
Tom Lane
Date:
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Sun, Oct 02, 2005 at 09:46:07PM -0400, Tom Lane wrote:
>> 1. A cross-table index would need to store a table OID as well as the
>> existing block/offset information in order to tell you what an entry is
>> pointing at.

> Wouldn't it make more sense to use a smaller pointer to a table of OIDs
> that that index covers?

Smaller than what?  Don't tell me you want to restrict how many tables a
cross-table index can handle :-(

In any case, the gain from doing that would be exactly zero because of
alignment considerations: the size of an index tuple header really has
to be a multiple of MAXALIGN.
        regards, tom lane


Re: Inherited indexes.

From
"Zeugswetter Andreas DAZ SD"
Date:
> Another possibility is optimizing for the special case of
> indexing on a partitioning key. In this case, index values
> would be very localized to one table, so just storing the
> table info on each index page (or something similar) would work well.

If you have the partitioning key in the index and the partitions don't
overlap, it is better to create separate [unique] indexes on the
subtables.
Building separate indexes per partition is usually preferred because of:
1. performance of dropping a partition
2. smaller index for CE

Only if you need an "order by" without a sort step, that spawns more
than one partition
things usually get ugly. Imho the best solution would be a merge node,
that merges results of
several index accesses to avoid a sort and still use separate indexes.
Such
a merge node could probably also detect the case where accessing
partitions in a certain
order still produces ordered results.

Usually you will only want the "one big unique index" when the
partitioning is not
reflectable in the index keys, and then (also in other db's) such an
index is usually a pain ...

Andreas


Re: Inherited indexes.

From
"Jim C. Nasby"
Date:
On Tue, Oct 04, 2005 at 11:05:49AM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > On Sun, Oct 02, 2005 at 09:46:07PM -0400, Tom Lane wrote:
> >> 1. A cross-table index would need to store a table OID as well as the
> >> existing block/offset information in order to tell you what an entry is
> >> pointing at.
> 
> > Wouldn't it make more sense to use a smaller pointer to a table of OIDs
> > that that index covers?
> 
> Smaller than what?  Don't tell me you want to restrict how many tables a
> cross-table index can handle :-(
> 
> In any case, the gain from doing that would be exactly zero because of
> alignment considerations: the size of an index tuple header really has
> to be a multiple of MAXALIGN.

Hrm, I see that IndexTupleData doesn't have room 'left over' like
HeapTupleData does. If it did, it would probably be a win to allow for
indexes that are on less than X number of tables, where X is whatever
value we can fit into the tuple header. Since this could be exceeded at
any time, we'd also need a flag to indicate that a given tuple is for a
table that's not in the lookup table and that an actual OID is stored.

Given that that's not (currently) the case, it seems that the unused bit
could be used to indicate if the tuple was for a table other than the
one the index was originally created on. That would allow for adding a
table to an existing index without re-writing the entire thing. It could
also provide some speed improvement in cases where the table the index
was defined on contained the majority of the data, but that's a pretty
far-out corner case.

Of course, this is all academic performance tuning until we actually
have cross-table indexes.

Does that 'just' leave locking as the issue? I think cross-table indexes
are going to become a lot more important as our partitioning support
increases, so it would be good if this could get done for 8.2 (I think
it's on Simon's list right now).
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Inherited indexes.

From
"Jim C. Nasby"
Date:
Well, I never said unique, but you're correct, it's pretty undesirable
to put a global index on your partitioning key.

On Tue, Oct 04, 2005 at 06:16:21PM +0200, Zeugswetter Andreas DAZ SD wrote:
> 
> > Another possibility is optimizing for the special case of 
> > indexing on a partitioning key. In this case, index values 
> > would be very localized to one table, so just storing the 
> > table info on each index page (or something similar) would work well.
> 
> If you have the partitioning key in the index and the partitions don't
> overlap, it is better to create separate [unique] indexes on the
> subtables.
> Building separate indexes per partition is usually preferred because of:
> 1. performance of dropping a partition
> 2. smaller index for CE
> 
> Only if you need an "order by" without a sort step, that spawns more
> than one partition
> things usually get ugly. Imho the best solution would be a merge node,
> that merges results of
> several index accesses to avoid a sort and still use separate indexes.
> Such
> a merge node could probably also detect the case where accessing
> partitions in a certain 
> order still produces ordered results.
> 
> Usually you will only want the "one big unique index" when the
> partitioning is not 
> reflectable in the index keys, and then (also in other db's) such an
> index is usually a pain ...
> 
> Andreas
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
> 

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Inherited indexes.

From
Simon Riggs
Date:
On Tue, 2005-10-04 at 18:16 +0200, Zeugswetter Andreas DAZ SD wrote:
> > Another possibility is optimizing for the special case of 
> > indexing on a partitioning key. In this case, index values 
> > would be very localized to one table, so just storing the 
> > table info on each index page (or something similar) would work well.
> 
> If you have the partitioning key in the index and the partitions don't
> overlap, it is better to create separate [unique] indexes on the
> subtables.
> Building separate indexes per partition is usually preferred because of:
> 1. performance of dropping a partition
> 2. smaller index for CE

...

> Imho the best solution would be a merge node,
> that merges results of
> several index accesses to avoid a sort and still use separate indexes.
> Such
> a merge node could probably also detect the case where accessing
> partitions in a certain 
> order still produces ordered results.

Yes, that was my conclusion also.

There are a number of intermediate steps along the way, so it will take
some time to achieve it.

> Usually you will only want the "one big unique index" when the
> partitioning is not 
> reflectable in the index keys, and then (also in other db's) such an
> index is usually a pain ...

Agreed^2. The idea of a global index is a non-starter for all of the
reasons that Tom gave and the main one: Its's unusably huge. There's no
point in partitioning a 1TB table if you then have to build a 500GB
index on it. The tree would be so deep that each insert would require
maybe 3 I/Os on index branch blocks before you got to the leaf. Insert
performance would suck real bad, which is a blocker since if you have a
large table you almost certainly have a lot of data to load. If you
don't have a big table you shouldn't be partitioning it anyway. 

Best Regards, Simon Riggs



relational class vs partitioned table (was Inherited indexes)

From
Trent Shipley
Date:
On Tuesday 2005-10-04 13:54, Simon Riggs wrote:
> On Tue, 2005-10-04 at 18:16 +0200, Zeugswetter Andreas DAZ SD wrote:
> > > Another possibility is optimizing for the special case of
> > > indexing on a partitioning key. In this case, index values
> > > would be very localized to one table, so just storing the
> > > table info on each index page (or something similar) would work well.
> >
> > If you have the partitioning key in the index and the partitions don't
> > overlap, it is better to create separate [unique] indexes on the
> > subtables.
> > Building separate indexes per partition is usually preferred because of:
> > 1. performance of dropping a partition
> > 2. smaller index for CE
>
> ...

<snip>
merge node strategy.
</snip>


> > Usually you will only want the "one big unique index" when the
> > partitioning is not
> > reflectable in the index keys, and then (also in other db's) such an
> > index is usually a pain ...
>
> Agreed^2. The idea of a global index is a non-starter for all of the
> reasons that Tom gave and the main one: Its's unusably huge. There's no
> point in partitioning a 1TB table if you then have to build a 500GB
> index on it. The tree would be so deep that each insert would require
> maybe 3 I/Os on index branch blocks before you got to the leaf. Insert
> performance would suck real bad, which is a blocker since if you have a
> large table you almost certainly have a lot of data to load. If you
> don't have a big table you shouldn't be partitioning it anyway.

It has taken me a while to organize my thoughts on this post to the thread, 
but I am struck by the fact that what started out as a discussion of 
relational inheritance and support for multi-relation uniqueness by indexes 
morphed into a discussion of partitioning table storage and how that might be 
supported by indexes.

It is possible that the topical change was simply due to the usual meandering 
of threads but I fear that instead it may not have been random but caused by 
conflating the inheritance problem with partitioning.  The two problems have 
seeming similarities inasmuch as both involve multiple internal tables.  
Unfortunately, they are rather distinct.

Partitioning is a database engineering tool to enhance the performance of HUGE 
databases, most notably biogenetic databases.  Partitioning is a classic 
divide and conquer strategy -- the goal is to chop something unmanageably 
large into things that are manageably small.  

Just as critical partitioning is in no way a relational problem.  It has no 
effect on data representation and it should be irrelevant to a database 
developer or a report writer.  Partitioning is strictly a storage and 
implementation problem.

In general, one partitions by a hash rule (usually a simple modulus operation 
on a serial number or OID) or by a range rule -- typically on a date-time or 
possibly postal codes.  Since the partition rule is "published" the database 
engine can "hash" to the proper index or table.  

Note that just as one can have multi-dimensional arrays, partitioning presents 
an analogous data storage problem and there is no logical reason that a 
relation could not be partitioned by any number of ranges or hashes.  In 
practice, this serves no useful performance advantage, but simply divides the 
logical table (relation) into many small physical tables.

In Oracle indexes of attributes that are used as partitioning criteria are 
simply partitioned in parallel to the (logical) table they reference.  This 
seems to be the sort of solution that the thread was heading toward.

---

In contrast relational inheritance is a design tool that would allow 
polymorphic access to modestly large relations.  Note that an instance of 
relational inheritance explicitly or implicitly establishes a relational 
class.  Furtheremore, a relational class is a type of multi-relation. 
Relational inheritance provides partial semantic unification over a taxonomic 
space.  

The semantic element in relational inheritance is critical.  With a relation 
partitioned on _sequence_result_, given any _sequence_result_ one knows in 
what physical table(s) to look up the associated tuple (because by definition 
partitioned relations have partition rules).  With a multi-relation partially 
unified on _sequence_result_, however, there is no way one can know what 
member table has _sequence_result_'s tuple.  In the case of a relational 
class (or any other partially unified multi-relation) you have to use a layer 
of indirection.

Class-wide uniqueness (partial unification of multi-relations) can be solved 
by using a global index for the relational class.  It cannot be solved using 
partitioning.  In many ways the problems are quite distinct.

=====

Of course, there is no reason a relation in a relational class might not be 
huge.  By way of inclusion, a relational class containing one or more huge 
relations would also be huge.  It seems to me that rather that partitioning 
member relations on unifying attribute(s), implementation would be easier if 
one were required to partition entire semi-unified multi-relations as a 
whole.

Orthoganal partion rules would be created for the class.  The rules would be 
applied to each member relation.  Finally, the rules would be applied to the 
relevant unifying (presumably unique) indexes.

But inasmuch as Postgresql has implemented neither partitioning nor unique 
constraints for relational classes we are getting somewhat ahead of 
ourselves.

====

Partitioning is obviously dominated by partitioning rules.  Oracle's SQL 
dialect provides a negative example of how to elegantly incorporate 
partitioning rules into SQL.  Ideally partitioning rules should be 
first-class objects.  A database engineer or the poor DBA who inherits his 
implementation should be able to query the meta-data to get a listing of all 
partitioned relations.
  



Re: relational class vs partitioned table (was

From
Simon Riggs
Date:
On Mon, 2005-10-10 at 19:58 -0700, Trent Shipley wrote:
> Of course, there is no reason a relation in a relational class might not be 
> huge.  

Well, as a designer, I would make it so.

> Orthoganal partion rules would be created for the class.  The rules would be 
> applied to each member relation.  Finally, the rules would be applied to the 
> relevant unifying (presumably unique) indexes.
> 
> But inasmuch as Postgresql has implemented neither partitioning nor unique 
> constraints for relational classes we are getting somewhat ahead of 
> ourselves.

Maybe you aren't aware of the new constraint_exclusion feature in 8.1 ?

> Partitioning is obviously dominated by partitioning rules.  Oracle's SQL 
> dialect provides a negative example of how to elegantly incorporate 
> partitioning rules into SQL.  Ideally partitioning rules should be 
> first-class objects.  A database engineer or the poor DBA who inherits his 
> implementation should be able to query the meta-data to get a listing of all 
> partitioned relations.

The partitioning doesn't follow Oracle syntax at all. Partitions are
first class objects as you suggest.

Best Regards, Simon Riggs