Thread: Index only scan paving the way for "auto" clustered tables?

Index only scan paving the way for "auto" clustered tables?

From
Royce Ausburn
Date:
Hi all,

I wonder, could the recent work on index only scans pave the way for auto clustered tables?  Consider a wide, mostly
inserttable with some subset of columns that I'd like to cluster on.  I'm after locality of tuples that are very
frequentlyfetched together, but not keen on the downtime for a cluster, nor the maintenance that it requires.  Would it
bea stretch to have an index that branches on the subset of "cluster" columns, but still stores all the columns, making
ita covering index?  Given that we can already index concurrently, such an index would not require downtime, and would
beself maintaining.  From my understanding of the index-only scan implementation, I suspect that such an index would
effectivelygive locality, with some caveats…  

I'd expect the overhead of inserting in to such a table would be high, perhaps prohibitive.  Perhaps better ways have
beendiscussed.  Stupid idea? 

--Royce



Re: Index only scan paving the way for "auto" clustered tables?

From
Robert Haas
Date:
On Tue, Oct 11, 2011 at 7:08 AM, Royce Ausburn <royce.ml@inomial.com> wrote:
> I wonder, could the recent work on index only scans pave the way for auto clustered tables?  Consider a wide, mostly
inserttable with some subset of columns that I'd like to cluster on.  I'm after locality of tuples that are very
frequentlyfetched together, but not keen on the downtime for a cluster, nor the maintenance that it requires.  Would it
bea stretch to have an index that branches on the subset of "cluster" columns, but still stores all the columns, making
ita covering index?  Given that we can already index concurrently, such an index would not require downtime, and would
beself maintaining.  From my understanding of the index-only scan implementation, I suspect that such an index would
effectivelygive locality, with some caveats… 

I guess we could do that, but I'm not convinced there would be much
benefit.  The only thing you'd be saving would be the cost of keeping
the tuples sorted by only the high-order columns rather than all of
them, and I doubt that's significant.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Index only scan paving the way for "auto" clustered tables?

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
>> [implement "clustered index" as a covering index with all columns
>> which are present in the heap]

> I guess we could do that, but I'm not convinced there would be
> much benefit. 
The "traditional" way to implement a clustered index is to have the
leaf level of the index contain the tuples rather than pointers to
the tuples.  If we're going to do clustered tables, we might want to
jump all the way to that, rather than a half-way solution which
stores everything twice.
-Kevin


Re: Index only scan paving the way for "auto" clustered tables?

From
Robert Haas
Date:
On Tue, Oct 11, 2011 at 3:02 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>>> [implement "clustered index" as a covering index with all columns
>>> which are present in the heap]
>> I guess we could do that, but I'm not convinced there would be
>> much benefit.
>
> The "traditional" way to implement a clustered index is to have the
> leaf level of the index contain the tuples rather than pointers to
> the tuples.  If we're going to do clustered tables, we might want to
> jump all the way to that, rather than a half-way solution which
> stores everything twice.

Not a bad thought.

Actually, I've been thinking for a while now that we might need a
pluggable heapam, similar to the pluggable indexams we already have.
Our current heap format has served us pretty well, but there are any
number of things that we can't really do without changing it.  Of
course, if we came up with a new format that was better in every case,
across the board, then perhaps we'd be willing to just replace the
current format outright -- though even then, that would break
pg_upgrade, which would be painful, to put it mildly.  And it seems to
me that there could easily be format changes that would make sense for
particular cases, but not across the board, like:

- index-organized tables (heap is a btree, and secondary indexes
reference the PK rather than the TID; this is how MySQL does it, and
Oracle offers it as an option)
- WORM tables (no updates or deletes, and no inserts after creating
transaction commits, allowing a much smaller tuple header)
- non-transactional tables (tuples visible as soon as they're written,
again allowing for smaller tuple header; useful for internal stuff and
perhaps for insert-only log tables)

Alternatively, we could try to graft the concept of a self-clustering
table on top of the existing heap implementation.  But I'm having
trouble seeing how that would work.  The TODO describes it as
something like "maintain CLUSTER ordering", but that's a gross
oversimplification, because we have no structure that would allow us
to sensibly do any such thing...  the current heap implementation is
just that: a pile of stuff.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Index only scan paving the way for "auto" clustered tables?

From
Florian Pflug
Date:
On Oct11, 2011, at 21:27 , Robert Haas wrote:
> Alternatively, we could try to graft the concept of a self-clustering
> table on top of the existing heap implementation.  But I'm having
> trouble seeing how that would work.  The TODO describes it as
> something like "maintain CLUSTER ordering", but that's a gross
> oversimplification, because we have no structure that would allow us
> to sensibly do any such thing...  the current heap implementation is
> just that: a pile of stuff.

We could still be smarter about where we insert new rows in a clustered
table, though.

Upon INSERT and UPDATE, we'd need to lookup the leaf page where the new
tuple will eventually go in the index we're supposed to maintain CLUSTER
for. Then we'd check if any of the pages referenced there contains enough
space, and if so place the new tuple there. If not it'd go at the end.

best regards,
Florian Pflug



Re: Index only scan paving the way for "auto" clustered tables?

From
Kääriäinen Anssi
Date:
Robert Haas wrote:
"""
And it seems to me that there could easily be format changes that
would make sense for particular cases, but not across the board,
like:

- index-organized tables (heap is a btree, and secondary indexes
reference the PK rather than the TID; this is how MySQL does it, and
Oracle offers it as an option)
- WORM tables (no updates or deletes, and no inserts after creating
transaction commits, allowing a much smaller tuple header)
- non-transactional tables (tuples visible as soon as they're written,
again allowing for smaller tuple header; useful for internal stuff and
perhaps for insert-only log tables)
"""

This is probably a silly idea, but I have been wondering about the
following idea: Instead of having visibility info in the row header,
have a couple of row visibility slots in the page header. These slots
could be shared between rows in the page, so that if you do a bulk
insert/update/delete you would only use one slot. If the slots
overflow, you would use external slots buffer.

When the row is all visible, no slot would be used at all.

The xmin, xmax and cid would be in the slots. ctid would have its
current meaning, except when the external slots would be used,
then ctid would point to the external slot, and it would have the real
row header. I don't know if there would be any other row header
parts which could be shared.

The external slots buffer would then contain xmin, xmax, cid and
the real ctid.

Updates would write the new rows to another page in the heap,
and old rows would stay in place, just as now. So there would not
be any redo log like configuration. Also, the external slots buffer
would be small (18 bytes per row), so it would not get out of
cache too easily.

The performance would suck if you had lots of small updates, or
long running transactions. On the other hand in data warehousing,
where bulk loads are normal, and there are a lot of small rows,
this could actually work.

As said, this is probably a silly idea. But as pluggable heap types
came up, I thought to ask if this could actually work. If this kind of
wondering posts are inappropriate for this list, please tell me so
that I can avoid these in the future.
- Anssi Kääriäinen


Re: Index only scan paving the way for "auto" clustered tables?

From
Dimitri Fontaine
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Alternatively, we could try to graft the concept of a self-clustering
> table on top of the existing heap implementation.  But I'm having
> trouble seeing how that would work.  The TODO describes it as
> something like "maintain CLUSTER ordering", but that's a gross
> oversimplification, because we have no structure that would allow us
> to sensibly do any such thing...  the current heap implementation is
> just that: a pile of stuff.

I currently think that's the way to go, with some coarser granularity
than tuple or page.  Picture HOT inserts, if you will.  That would be at
the page level, but do we need that level of precision?

I'm thinking that we need something more like segment based here, or
maybe some intermediate value would be good between a page of 8Kb and a
segment of 1GB, but I'm not so sure.  We would have to track the bounds
of each segment for the indexed columns, and maintain them, and the
planner would have to exercise pruning at the segment level.

So going down too much in granularity would have negative impacts on
planning performances (too many data to play with), and anyway a server
that needs that kind of optimization can certainly handle a couple of GB
in its file system cache.

So, it's quite hand wavy still, but Segment Exclusion has been discussed
here already, and it seems to me that's the next thing we need. Call it
partial Seq Scan and HOT inserts if new names are your thing :)

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: Index only scan paving the way for "auto" clustered tables?

From
Robert Haas
Date:
On Tue, Oct 11, 2011 at 4:00 PM, Florian Pflug <fgp@phlo.org> wrote:
> On Oct11, 2011, at 21:27 , Robert Haas wrote:
>> Alternatively, we could try to graft the concept of a self-clustering
>> table on top of the existing heap implementation.  But I'm having
>> trouble seeing how that would work.  The TODO describes it as
>> something like "maintain CLUSTER ordering", but that's a gross
>> oversimplification, because we have no structure that would allow us
>> to sensibly do any such thing...  the current heap implementation is
>> just that: a pile of stuff.
>
> We could still be smarter about where we insert new rows in a clustered
> table, though.
>
> Upon INSERT and UPDATE, we'd need to lookup the leaf page where the new
> tuple will eventually go in the index we're supposed to maintain CLUSTER
> for. Then we'd check if any of the pages referenced there contains enough
> space, and if so place the new tuple there. If not it'd go at the end.

That's an interesting idea, but my guess is that the "if not" clause
would trigger frequently enough to make this not work very well.

Of course, we'd need to test it to know for sure....

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Index only scan paving the way for "auto" clustered tables?

From
Robert Haas
Date:
On Tue, Oct 11, 2011 at 4:21 PM, Kääriäinen Anssi
<anssi.kaariainen@thl.fi> wrote:
> This is probably a silly idea, but I have been wondering about the
> following idea: Instead of having visibility info in the row header,
> have a couple of row visibility slots in the page header. These slots
> could be shared between rows in the page, so that if you do a bulk
> insert/update/delete you would only use one slot. If the slots
> overflow, you would use external slots buffer.
>
> When the row is all visible, no slot would be used at all.

I've thought about stuff like this, too.  One idea would be to make
the slots just be items on the page.  Each tuple includes a 2-byte
field which points to the item number (on the same page) where the
XMAX, CID, and CTID information for the update/delete of that tuple is
stored.  If the tuple isn't updated or deleted, it points back to the
tuple's own item ID.  When a tuple is updated or deleted, add an item
to the page with the necessary XMAX/CMAX/CTID and set the pointer to
the new item ID.  If the page is full, allocate an "overflow block"
and store the overflow block number in the page header.  Add the new
item to the overflow block and set the 2-byte field to point to it,
setting the high bit or something like that to indicate that the data
is in the overflow block rather than the data block.  The main
objection to this idea I see is that if the overflow blocks get much
use, the overall performance might end up being poor.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company