Thread: Re: Index/Function organized table layout (from Re:

Re: Index/Function organized table layout (from Re:

From

Hannu Krosing

Date:

04 October 2003, 09:01:08

Christopher Browne kirjutas R, 03.10.2003 kell 00:57:
> jllachan@nsd.ca (Jean-Luc Lachance) writes:
> > That's one of the draw back of MVCC.
> > I once suggested that the transaction number and other house keeping
> > info be included in the index, but was told to forget it...
> > It would solve once and for all the issue of seq_scan vs index_scan.
> > It would simplify the aggregate problem.
>
> It would only simplify _one_ case, namely the case where someone cares
> about the cardinality of a relation, and it would do that at
> _considerable_ cost.
>
> A while back I outlined how this would have to be done, and for it to
> be done efficiently, it would be anything BUT simple.

Could this be made a TODO item, perhaps with your attack plan.
Of course as strictly optional feature useful only for special situations
(see below)

I cross-post this to [HACKERS] as it seem relevant to a problem recently
discussed there.

> It would be very hairy to implement it correctly, and all this would
> cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"

Not really. Just yesterday there was a discussion on [HACKERS] about
implementing btree-organized tables, which would be much less needed if
the visibility info were kept in indexes.

> If you had a single WHERE clause attached, you would have to revert to
> walking through the tuples looking for the ones that are live and
> committed, which is true for any DBMS.

If the WHERE clause could use the same index (or any index with
visibility info) then there would be no need for "walking through the
tuples" in data relation.

the typical usecase cited on [HACKERS] was time series data, where
inserts are roughly in (timestamp,id)order but queries in (id,timestamp)
order. Now if the index would include all relevant fields
(id,timestamp,data1,data2,...,dataN) then the query could run on index
only touching just a few pages and thus vastly improving performance. I
agree that this is not something everybody needs, but when it is needed
it is needed bad.

> And it still begs the same question, of why the result of this query
> would be particularly meaningful to anyone.  I don't see the
> usefulness; I don't see the value of going to the considerable effort
> of "fixing" this purported problem.

Being able to do fast count(*) is just a side benefit.

----------------
Hannu

COUNT(*) again (was Re: Index/Function organized table layout)

From

Tom Lane

Date:

04 October 2003, 16:08:46

Hannu Krosing <hannu@tm.ee> writes:
> Christopher Browne kirjutas R, 03.10.2003 kell 00:57:
>> A while back I outlined how this would have to be done, and for it to
>> be done efficiently, it would be anything BUT simple.

> Could this be made a TODO item, perhaps with your attack plan.

If I recall that discussion correctly, no one including Christopher
thought the attack plan was actually reasonable.

What this keeps coming down to is that an optimization that helps only
COUNT(*)-of-one-table-with-no-WHERE-clause would be too expensive in
development and maintenance effort to justify its existence.

At least if you insist on an exact, MVCC-correct answer.  So far as I've
seen, the actual use cases for unqualified COUNT(*) could be handled
equally well by an approximate answer.  What we should be doing rather
than wasting large amounts of time trying to devise exact solutions is
telling people to look at pg_class.reltuples for approximate answers.
We could also be looking at beefing up support for that approach ---
maybe provide some syntactic sugar for the lookup, maybe see if we can
update reltuples in more places than we do now, make sure that the
autovacuum daemon includes "keep reltuples accurate" as one of its
design goals, etc etc.

            regards, tom lane

Re: [PERFORM] COUNT(*) again (was Re: Index/Function

From

Hannu Krosing

Date:

04 October 2003, 21:01:06

Tom Lane kirjutas L, 04.10.2003 kell 19:07:
> Hannu Krosing <hannu@tm.ee> writes:
> > Christopher Browne kirjutas R, 03.10.2003 kell 00:57:
> >> A while back I outlined how this would have to be done, and for it to
> >> be done efficiently, it would be anything BUT simple.
>
> > Could this be made a TODO item, perhaps with your attack plan.
>
> If I recall that discussion correctly, no one including Christopher
> thought the attack plan was actually reasonable.
>
> What this keeps coming down to is that an optimization that helps only
> COUNT(*)-of-one-table-with-no-WHERE-clause would be too expensive in
> development and maintenance effort to justify its existence.

Please read further in my email ;)

The point I was trying to make was that faster count(*)'s is just a side
effect. If we could (conditionally) keep visibility info in indexes,
then this would also solve the problem fo much more tricky question of
index-structured tables.

Count(*) is *not* the only query that could benefit from not needing to
go to actual data table for visibilty info, The much more needed case
would be the "inveres time series" type of queries, which would
otherways trash cache pages badly.

----------------------------
Hannu

Re: [PERFORM] COUNT(*) again (was Re: Index/Function organized table layout)

From

Tom Lane

Date:

04 October 2003, 21:16:31

Hannu Krosing <hannu@tm.ee> writes:
> The point I was trying to make was that faster count(*)'s is just a side
> effect. If we could (conditionally) keep visibility info in indexes,

I think that's not happening, conditionally or otherwise.  The atomicity
problems alone are sufficient reason why not, even before you look at
the performance issues.

            regards, tom lane

Uses for Index/Function organizing

From

James Rogers

Date:

04 October 2003, 21:18:11

On 10/4/03 2:00 AM, "Hannu Krosing" <hannu@tm.ee> wrote:
>
> If the WHERE clause could use the same index (or any index with
> visibility info) then there would be no need for "walking through the
> tuples" in data relation.
>
> the typical usecase cited on [HACKERS] was time series data, where
> inserts are roughly in (timestamp,id)order but queries in (id,timestamp)
> order. Now if the index would include all relevant fields
> (id,timestamp,data1,data2,...,dataN) then the query could run on index
> only touching just a few pages and thus vastly improving performance. I
> agree that this is not something everybody needs, but when it is needed
> it is needed bad.

I would add that automatically index-organizing tuples isn't just useful for
time-series data (though it is a good example), but can be used to
substantially improve the query performance of any really large table in a
number of different and not always direct ways.  Once working sets routinely
exceed the size of physical RAM, buffer access/utilization efficiency often
becomes the center of performance tuning, but not one that many people know
much about.

One of the less direct ways of using btree-organized tables for improving
scalability is to "materialize" table indexes of tables that *shouldn't* be
btree-organized.  Not only can you turn tables into indexes, but you can
also turn indexes into tables, which can have advantages in some cases.

For example, I did some scalability consulting at a well-known movie rental
company with some very large Oracle databases running on big Sun boxen.  One
of the biggest problems was that their rental history table, which had a
detailed record of every movie ever rented by every customer, had grown so
large that the performance was getting painfully slow.  To make matters
worse, it and a few related tables had high concurrent usage, a mixture of
many performance-sensitive queries grabbing windows of a customer's history
plus a few broader OLAP queries which were not time sensitive.  Everything
was technically optimized in a relational and basic configuration sense, and
the database guys at the company were at a loss on how to fix this problem.
Performance of all queries was essentially bound by how fast pages could be
moved between the disk and buffers.

Issue #1:  The history rows had quite a lot of columns and the OLAP
processes used non-primary indexes, so the table was not particularly
suitable for btree-organizing.

Issue #2:  Partitioning was not an option because it would have exceeded
certain limits in Oracle (at that time, I don't know if that has changed).

Issue #3:  Although customer histories were being constantly queried, data
needed most was really an index view of the customers history, not the
details of the history itself.

The solution I came up with was to use a synced btree-organized partial
clone of the main history table that only contained a small number of key
columns that mattered for generating customer history indexes in the
applications that used them.  While this substantially increased the disk
space footprint for the same data (since we were cloning it), it greatly
reduced the total number of cache misses for the typical query, only
fetching the full history row pages when actually needed.  In other words,
basically concentrating more buffer traffic into a smaller number of page
buffers.  What we had was an exceedingly active but relatively compact
materialized index of the history table that could essentially stay resident
in RAM, and a much less active history table+indexes that while less likely
to be buffered than before, had pages accessed at such a reduced frequency
that there was a huge net performance gain because disk access plummeted.

Average performance improvement for the time sensitive queries: 50-70x

So btree-organized tables can do more than make tables behave like indexes.
They can also make indexes behave like tables.  Both are very useful in some
cases when your working set exceeds the physical buffer space.  For smaller
databases this has much less utility and users need to understand the
limitations, nonetheless when tables and databases get really big it becomes
an important tool in the tool belt.

Cheers,

-James Rogers
 jamesr@best.com

Re: [PERFORM] COUNT(*) again (was Re: Index/Function organized

From

Bruce Momjian

Date:

05 October 2003, 04:21:29

Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
> > The point I was trying to make was that faster count(*)'s is just a side
> > effect. If we could (conditionally) keep visibility info in indexes,
>
> I think that's not happening, conditionally or otherwise.  The atomicity
> problems alone are sufficient reason why not, even before you look at
> the performance issues.

What are the atomicity problems of adding a create/expire xid to the
index tuples?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [PERFORM] COUNT(*) again (was Re: Index/Function organized table layout)

From

Tom Lane

Date:

05 October 2003, 06:09:29

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> I think that's not happening, conditionally or otherwise.  The atomicity
>> problems alone are sufficient reason why not, even before you look at
>> the performance issues.

> What are the atomicity problems of adding a create/expire xid to the
> index tuples?

You can't update a tuple's status in just one place ... you have to
update the copies in the indexes too.

            regards, tom lane

Re: [PERFORM] COUNT(*) again (was Re: Index/Function organized

From

Bruce Momjian

Date:

05 October 2003, 13:37:36

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> I think that's not happening, conditionally or otherwise.  The atomicity
> >> problems alone are sufficient reason why not, even before you look at
> >> the performance issues.
>
> > What are the atomicity problems of adding a create/expire xid to the
> > index tuples?
>
> You can't update a tuple's status in just one place ... you have to
> update the copies in the indexes too.

But we don't update the tuple status for a commit, we just mark the xid
as committed.  We do have lazy status bits that prevent later lookups in
pg_clog, but we have those in the index already also.

What am I missing?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073