Thread: Joins and links

Joins and links

From: Leon
Hello all!

 You probably remember me - recently I complained about the speed
 of joins in Postgres. After a short investigation, a way was found
 in which Postgres's optimizer can do the right job. It was a
 constructive discussion. Now I want to tell you what could
 make Postgres better and faster, and what would make us (our
 development group) happy. Maybe I am bothering someone; if
 I am, just tell me.

 Let me begin.

 First of all, some accounting reports need to be delivered
 very fast - within minutes or so. And what's bad is that
 quite a few of these reports are time-consuming and
 search-intensive. In particular, the internals of these reports
 involve a lot of joins on tables.

 Secondly, almost all accounting information naturally
 fits the network data model, which can be implemented very
 efficiently.

 The stuff described here is not accounting-specific; it
 can be found in every database which uses master-detail
 tables and other such kinds of relations.

 So, how is a join performed in such cases? Although I am
 not an expert, I can imagine the way: first there is an (index)
 scan on the first table, and then an (index) scan on the second.
 That is the best case; reality could be much worse, as we have seen.

 How can we radically improve performance in such cases? There
 is a simple and quite obvious way. (So that you don't think
 I am hallucinating, I will tell you that there exist some
 real servers offering the features I am talking about.)
 We should make a real reference from one table to another! That
 means there could be a special data type called, say, "link",
 which is a physical record number in the foreign table.

 Queries could look like this:

 table1:
 a int4
 b link (->table2)

 table2:
 c int4
 recnum (system auxiliary field, really a record number in the table)

 select * from table1, table2 where table1.a > 5 and table1.b = table2.recnum

 Such joins can fly really fast, as practice shows :)
 Just consider: the clause table1.b = table2.recnum is a READY-MADE
 join, so the server doesn't have to build anything on top of it. It
 can simply perform a lookup through the link, and since it is a
 physical record number, this is done with the efficiency of C
 pointers! Thus the performance gain is ENORMOUS.

 And it simplifies the optimizer, because it doesn't have to decide
 anything about whether to use indices and the like. The join is
 always performed the same way, and it is the best way.
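
 (For comparison, here is how the same join has to be written today,
 with an ordinary key column and an index - the names are only
 illustrative:

     create table table2 (recnum int4, c int4);
     create index table2_recnum_idx on table2 (recnum);

     create table table1 (a int4, b int4);  -- b stores table2.recnum values

     select * from table1, table2
      where table1.a > 5 and table1.b = table2.recnum;

 Here the optimizer must still decide to use the index on table2.recnum,
 and every lookup goes through the index instead of straight to the
 record.)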

 This feature, once implemented, could bring Postgres ahead
 of most commercial servers, proving the creative abilities of the
 free software community. Let us take a step into the future!

Best regards,
 Leon



Re: [GENERAL] Joins and links

From: Herouth Maoz
At 15:46 +0300 on 05/07/1999, Leon wrote:


> How can we radically improve performance in such cases? There
>  is a simple and quite obvious way. (For you not to think that
>  I am hallucinating I will tell you that there exist some
>  real servers that offer such features I am talking about)
>  We should make a real reference in one table to another! That
>  means there could be special data type called, say, "link",
>  which is a physical record number in the foreign table.
>
>  Queries could look like this:
>
>  table1:
>  a int4
>  b link (->table2)
>
>  table2:
>  c int4
>  recnum (system auxiliary field, really a record number in the table)
>
>  select * from table2 where table1.a > 5 and table1.b = table2.recnum
>
>  Such joins can fly really fast, as practice shows :)

If you are interested in such a feature, I would send it to the hackers
list and not the general list, which is not intended for development
issues, but for general problems and issues with existing versions.

The best would be, of course, to get hold of CVS and develop the needed
code yourself. That's what open software is all about. Perhaps if it's so
important to you, you could pay PostgreSQL incorporated, buying programmer
time for this feature.

In any case, a message to the hackers list may help you understand how
things are implemented in Postgres and how much work will be needed for
such a development. On the face of it, I can see several problems with this
idea, namely inefficiency in deletions, the need for fixed-length records
with no ability to add or drop columns or have variable-length fields
without maximum length, and needing to know all the tables that reference
a table for the sake of vacuuming. But maybe the hackers (who write
Postgres) would think differently. Ask them.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma



Re: [GENERAL] Joins and links

From: David Warnock
Leon,

I have a few questions about what you are proposing.

My background was in a non-SQL DBMS (DataFlex) which supported recnums,
which as you pointed out have a number of advantages. However, recnums
also have a significant number of problems. Some features, like
replication, have significant difficulties. Others, such as exporting and
re-loading your data, also need special workarounds.

A solution I have seen in some sql dbms (eg MS SQL Server) is to be able
to choose one index and have the database table sorted by this index.
Then you don't need a separate index for that sort-order. It means that
almost any index can work like a recnum and avoid looking in both the
index and the data. I am trying to remember the name of this feature but
cannot at the moment.

Regards

Dave

--
David Warnock
Sundayta Ltd

Re: [GENERAL] Joins and links

From: Bruce Momjian
> Leon,
>
> I have a few questions about what you are proposing.
>
> My background was in non SQL dbms (DataFlex) which supported recnums
> which as you pointed out had a number of advantages. However, recnums
> also have a significant number of problems. Some features like
> replication have significant difficulties. Others such as exporting and
> re-loading your data also need special work-arounds.
>
> A solution I have seen in some sql dbms (eg MS SQL Server) is to be able
> to choose one index and have the database table sorted by this index.
> Then you don't need a separate index for that sort-order. It means that
> almost any index can work like a recnum and avoid looking in both the
> index and the data. I am trying to remember the name of this feature but
> cannot at the moment.
>

We have the CLUSTER command, but it is something that has to be run manually.
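
For example (the table and index names are made up; older releases spell
it CLUSTER indexname ON tablename, newer ones CLUSTER tablename USING
indexname):

    create index invoice_cust_idx on invoice (customer_id);
    cluster invoice_cust_idx on invoice;
    -- must be re-run by hand once inserts and updates disturb the order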

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: [GENERAL] Joins and links

From: Maarten Boekhold

David Warnock wrote:
> A solution I have seen in some sql dbms (eg MS SQL Server) is to be able
> to choose one index and have the database table sorted by this index.
> Then you don't need a separate index for that sort-order. It means that
> almost any index can work like a recnum and avoid looking in both the
> index and the data. I am trying to remember the name of this feature but
> cannot at the moment.

CLUSTER ?

--

Maarten Boekhold, boekhold@tibco.com
TIBCO Finance Technology Inc.
The Atrium
Strawinskylaan 3051
1077 ZX Amsterdam, The Netherlands
tel: +31 20 3012158, fax: +31 20 3012358
http://www.tibco.com

Re: [GENERAL] Joins and links

From: David Warnock
Maarten Boekhold wrote:
>
> David Warnock wrote:
> > A solution I have seen in some sql dbms (eg MS SQL Server) is to be able
> > to choose one index and have the database table sorted by this index.
> > Then you don't need a separate index for that sort-order. It means that
> > almost any index can work like a recnum and avoid looking in both the
> > index and the data. I am trying to remember the name of this feature but
> > cannot at the moment.
>
> CLUSTER ?

That's the one. Thanks.

Dave

Re: [GENERAL] Joins and links

From: David Warnock
Bruce,

I did not know PostgreSQL had that. I have just looked at the docs, and
it seems that the PostgreSQL CLUSTER is similar to the feature in MS SQL
Server, but at present it is rather more limited:

a) It is static, whereas in MS SQL Server CLUSTER can be set as an
attribute on an index; the index is then not created separately, and the
table is kept permanently sorted by the index. This is obviously very
slow if you add to the middle of the index, and wasteful of space if you
delete rows from the table. However, for a sequential ID it is supposed
to work well.

b) As the PostgreSQL CLUSTER is static, it does not replace the index.
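
(For reference, the MS SQL Server form is roughly the following - the
names are invented:

    create clustered index invoice_id_idx on invoice (id)

and the table's rows are then stored in that order as the leaf level of
the index, so there is no separate heap to consult.)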

I should say at this point that I never actually used the CLUSTER
feature in MS SQL Server, as we decided not to use that product after
evaluating it. So I have no practical experience of how much of the
speed improvement Leon wants it would deliver.

Dave
--
David Warnock
Sundayta Ltd

Re: [GENERAL] Joins and links

From: Bruce Momjian
>
>
> David Warnock wrote:
> > A solution I have seen in some sql dbms (eg MS SQL Server) is to be able
> > to choose one index and have the database table sorted by this index.
> > Then you don't need a separate index for that sort-order. It means that
> > almost any index can work like a recnum and avoid looking in both the
> > index and the data. I am trying to remember the name of this feature but
> > cannot at the moment.
>
> CLUSTER ?
>

man cluster.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: [GENERAL] Joins and links

From: Bruce Momjian
> Bruce,
>
> I did not know Postgresql had that. I have just looked at the docs and
> it seems that the postgresql CLUSTER is similar to the feature in MS SQL
> Server but at present is rather more limited as

It is in the FAQ under Performance too.

> a) It is static whereas CLUSTER can be set as an attribute on an index
> in MS SQL Server, and then the index is not created separately; the table
> is kept permanently sorted by the index. This obviously is very very
> slow if you add to the middle of the index and wasteful of space if you
> delete rows from the table. However, for a sequential ID it is supposed
> to work well.

Well, sometimes it is better to have control over when this happens.
That is why CLUSTER is nice, and we don't have time to add dynamic
clustering right now.

>
> b) as the Postgresql CLUSTER is static it does not replace the index.
>
> I should say at this point that I never actually used the CLUSTER
> feature in MS SQL Server as we decided not to use that product after
> evaluating it. So I have no practical experience to know how much of the
> speed improvement wanted by Leon it would deliver.

See the performance FAQ.  If you look in
pgsql/contrib/fulltextindex/README, we mention it too because without
it, fulltext indexing is slow.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re[2]: [GENERAL] Joins and links

From: Leon
Hello David,

Monday, July 05, 1999 you wrote:

D> I should say at this point that I never actually used the CLUSTER
D> feature in MS SQL Server as we decided not to use that product after
D> evaluating it. So I have no practical experience to know how much of the
D> speed improvement wanted by Leon it would deliver.

Not much, because the main idea is to eliminate index scans
entirely, whereas CLUSTER merely means that "once the index
identifies the heap page for the first row that matches, all other rows that
match are probably already on the same heap page, saving disk accesses and
speeding up the query" - this is at best a few percent gain, and it means
nothing if the database is entirely in memory (as it often is).

Best regards, Leon



Re: [GENERAL] Joins and links

From: David Warnock
Bruce,

Thanks for your informative messages.

> Well, sometimes it is better to have control over when this happens.
> That is why CLUSTER is nice, and we don't have time to add dynamic
> clustering right now.

I personally would not see it as a high priority.

My only reason for suggesting it was as a possible way to help provide
more performance for Leon without adding the full record number support
that he was asking for.

Dave

--
David Warnock
Sundayta Ltd

Re: [GENERAL] Joins and links

From: Bruce Momjian
> Bruce,
>
> Thanks for your informative messages.
>
> > Well, sometimes it is better to have control over when this happens.
> > That is why CLUSTER is nice, and we don't have time to add dynamic
> > clustering right now.
>
> I personally would not see it as a high priority.
>
> My only reason for suggesting it was as a possible way to help provide
> more performance for Leon without adding the full record number support
> that he was asking for.
>

In fact, you were mentioning that inserting into the middle is slow, but
that sequential adding to the end is good - but in fact, the heap already
does this, doesn't it?  I guess if you only add occasionally, it is OK.
Also, our no-overwrite table structure has a tendency to mess up that
ordering, because updated rows do not go into the same place as the
original row.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: [GENERAL] Joins and links

From: David Warnock
Leon,

I agree that the static CLUSTER that Postgresql currently supports will
not help you much. When I suggested looking for a CLUSTER type feature I
only knew of dynamic clustering that removed the need for an index.

The discussion is not going anywhere as static clustering will not help
you and dynamic clustering is not about to be added.



If you are interested in other solutions that do not involve adding
record number support (which I personally still feel to be a mistake in
a set orientated dbms), then have you considered an application server
linked to triggers?

For some applications it is possible for an application server to
maintain the latest reports on-line, recalculating as required when a
trigger notifies it of relevant changes.
Then reporting comes instantly from the app server.
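
As a rough sketch (the table, rule and notification names here are
invented), the plumbing can be as simple as LISTEN/NOTIFY plus a rule:

    -- in the database: signal listeners whenever invoices change
    create rule invoice_notify as on update to invoice do notify report_dirty;
    -- (similar rules would be needed for insert and delete)

    -- in the application server's connection:
    listen report_dirty;

The app server then recalculates the affected reports whenever the
notification arrives.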

If there are a large number of different reports, or the reports have a
lot of selections and options, this may not be possible (but a halfway
house might still be to use a slightly flattened table structure for
reporting).

Regards

Dave

--
David Warnock
Sundayta Ltd

Re: [GENERAL] Joins and links

From: David Warnock
Bruce,

It is amazing when you get responses written this fast (so that the
response arrives before the copy of the message from the list).

> In fact, you were mentioning that inserting into the middle is slow, but
> that sequential adding to the end is good,

Yes, this is what I was told about the way MS SQL Server does clustering.

> but in fact, heap already does this, doesn't it?

heap? I am not sure what you mean.

> I guess if you only add occasionally, it is OK.
> Also, our no-over-write table structure had a tendency to mess up that
> ordering because updated rows do not go into the same place as the
> original row.

I have just been thinking a bit more and have realised that the
multi-generational architecture of 6.5 (which I have used in Interbase)
means that probably both clustering (in the dynamic sense) and full
record number support as requested by Leon are impractical.

It seems to me that record number relationships will fail completely if
there can be more than one version of a record. (well even if they are
forced to work they will lose some/all of their speed advantage).

Dynamically clustered indexes might still work, but unless tables are
only appended to, with no inserts or updates, maintaining the table in
index order when there can be multiple versions of each row would be very
slow.

Dave

--
David Warnock
Sundayta Ltd

Re: [GENERAL] Joins and links

From: Bruce Momjian
> Bruce,
>
> It is amazing when you get responses written this fast (so that the
> response arrives before the copy of the message from the list).

Yep.

> > but in fact, heap already does this, doesn't it?
>
> heap? I am not sure what you mean.

Heap is our base table structure.  Records are always inserted at the
end of the heap file.  Vacuum removes old, superseded rows.
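
You can see the physical position of a row through the system column
ctid.  A made-up example:

    select ctid, amount from invoice where id = 1;   -- e.g. (0,1)
    update invoice set amount = 10 where id = 1;
    select ctid, amount from invoice where id = 1;   -- now e.g. (3,7): the
                                                     -- new version went elsewhere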

>
> > I guess if you only add occasionally, it is OK.
> > Also, our no-over-write table structure had a tendency to mess up that
> > ordering because updated rows do not go into the same place as the
> > original row.
>
> I have just been thinking a bit more and have realised that the
> multi-generational architecture of 6.5 (which I have used in Interbase)
> means that probably both clustering (in the dynamic sense) and full
> record number support as requested by Leon are impractical.

Yes, it would be a big problem.  Most commercial databases have found that
the network data model is impractical in most cases.


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re[2]: [GENERAL] Joins and links

From: Leon
Hello David,

Monday, July 05, 1999 you wrote:

D> If you are interested in other solutions that do not involve adding
D> record number support (which I personally still feel to be a mistake in
D> a set orientated dbms)

Why? There will be no such field as a "record number"; the only
place where it can exist is the field which references another
table. I quite share your feeling about the wrongness of
physical-oriented things in abstract tables, but don't
plain old indices deal with physical record numbers? We could
do the same - hide the value stored in such a field and only
offer the user the ability to use it in queries without knowing
the value.

D>  then have you considered an application server
D> linked to triggers.

Unfortunately, every day the users demand new types of reports
for financial analysis, and nobody knows what the users' wish
will be tomorrow.

And, besides, it is not only my personal wish. What I am
proposing is a huge (dozen-fold) performance gain on widespread
tasks. If you implement this, happy users will erect a gold
monument to the Postgres development team.

Best regards, Leon



Re: Re[2]: [GENERAL] Joins and links

From: Bruce Momjian
> And, besides, it is not only my personal wish. What I am
> proposing is a huge (dozen-fold) performance gain on widespread
> tasks. If you implement this, happy users will erect a gold
> monument to the Postgres development team.

We (Vadim) did MVCC, and I haven't seen any monuments yet.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re[2]: [GENERAL] Joins and links

From: Leon
Hello Bruce,

Monday, July 05, 1999 you wrote:

>> I have just been thinking a bit more and have realised that the
>> multi-generational architecture of 6.5 (which I have used in Interbase)
>> means that probably both clustering (in the dynamic sense) and full
>> record number support as requested by Leon are impractical.

B> Yes, would be a big problem.  Most commercial databases have found that
B> the network data model is impractical in most cases.

That's exactly why such a powerful tool as SQL is incomparably slower
than plain dbf in most cases. Ignoring the network data model was
a big mistake.

Best regards, Leon



Re[2]: [GENERAL] Joins and links

From: Leon
Hello David,

Monday, July 05, 1999 you wrote:

D> I have just been thinking a bit more and have realised that the
D> multi-generational architecture of 6.5 (which I have used in Interbase)
D> means that probably both clustering (in the dynamic sense) and full
D> record number support as requested by Leon are impractical.

D> It seems to me that record number relationships will fail completely if
D> there can be more than one version of a record.

Maybe it is a silly question, but what is "more than one version
of a record"? In my opinion a record is an atomic, unique entity,
isn't it?

D> (well even if they are
D> forced to work they will lose some/all of their speed advantage).


Best regards, Leon



Re: [GENERAL] Joins and links

From: Clark Evans
Leon wrote:
> Why? There will be no such field as "record number", the only
> place where it can exist is the field which references another
> table. I can quite share your feeling about wrongness of
> physical-oriented things in abstract tables, but don't
> plain old indices deal with physical record numbers? We could
> do the same - hide the value stored in such field and only
> offer the user ability to use it in queries without knowing
> the value.

Leon,

In my understanding, pointer-based approaches like the one you
are recommending have been implemented in several prototype
object-oriented databases.  They have been shown to be
orders of magnitude slower than set-oriented techniques, thus
many OO databases are implemented as wrappers over
relational systems!

In general, the best way to handle stuff like this for reports
is to cache small tables which are joined (like product lookups)
in memory to make the queries run much faster.  To do this,
your design has to be smart, separating those tuples which
are "active" products from the "inactive" products, so that
the database can cache the active records and not the inactive
records.  Perhaps something like:

1.  CREATE VIEW PRODUCT AS ( SELECT * FROM PRODUCT_ACTIVE_CACHED
    UNION ALL SELECT * FROM PRODUCT_INACTIVE);

2.  SELECT ORDER_NO, PRODUCT_NAME FROM ORDER_LINE, PRODUCT WHERE
    PRODUCT.PRODUCT = ORDER_LINE.PRODUCT and ORDER_LINE.ORDER = 120;

This would be a general solution, where orders for active
products are brought up quickly, since the join is done
in memory, but orders for inactive products take much
longer, since the query on the active table is a cache
miss, leaving a disk access on the inactive table.

Perhaps there are several other, nicer ways to do this; from my
understanding a hash-based cache could allow frequently accessed
tuples to be cached in memory?  ... anyway, I'm no expert.

A more traditional method (which I use all the time) is to
have canned reports that are pre-generated using common
conditions.  These are then saved on a web server and
updated daily.  It is a bit less accurate, but for 99%
of the purposes, day-old information is just fine....

Hope this helps!

;) Clark

Re[2]: [GENERAL] Joins and links

From: Leon
Hello Clark,

Monday, July 05, 1999 you wrote:

C> In my understanding, pointer based approaches like you
C> are recommending have been implemented in several prototype
C> object-oriented databases.  They have been shown to be
C> orders of magnitude slower than set-oriented techniques, thus
C> many OO databases are implemented as wrappers over
C> relational systems!

I can't guess where you got that information. On the contrary,
I know at least one (commercial) network database server which is
orders of magnitude faster than ANY SQL server. They are simply no
match for it. That experience is exactly what made me write
to the Postgres mailing list. As I wrote (maybe to the hackers' list),
a pointer lookup ideally takes three CPU instructions - read,
multiply, look up - whereas an index scan takes dozens of them and
puts a strain on the optimizer's intellectual abilities, and
as we have seen it can hardly choose the optimum way of
performing a join. In the pointer-field case the optimizer can be
quite dumb, because there is only one way to perform the query.

Best regards, Leon



Re: Re[2]: [GENERAL] Joins and links

From: Bruce Momjian
> Hello David,
>
> Monday, July 05, 1999 you wrote:
>
> D> I have just been thinking a bit more and have realised that the
> D> multi-generational architecture of 6.5 (which I have used in Interbase)
> D> means that probably both clustering (in the dynamic sense) and full
> D> record number support as requested by Leon are impractical.
>
> D> It seems to me that record number relationships will fail completely if
> D> there can be more than one version of a record.
>
> Maybe it is a silly question, but what is "more than one version
> of a record"? In my opinion a record is an atomic, unique entity,
> isn't it?

Read how MVCC works in the manuals.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re[4]: [GENERAL] Joins and links

From: Leon
Hello Bruce,

Tuesday, July 06, 1999 you wrote:

>>
>> Maybe it is a silly question, but what is "more than one version
>> of a record"? In my opinion a record is an atomic, unique entity,
>> isn't it?

B> Read how MVCC works in the manuals.

Ah, you mean MVCC! That's what I replied to Tom Lane:

> This problem can be solved. An offhand solution is to have
> an additional system field which will point to the new tuple left after
> the update. It is filled in at the same time as the original tuple is
> marked invalid.  So the scenario is as follows: we follow the link,
> and if we find that this system field is not NULL in the tuple where
> we arrived, we go to where it is pointing (the same table, of course).
> Sure, VACUUM will eliminate these. The performance penalty is small.

Best regards, Leon



Re: [GENERAL] Joins and links

From: Chris Bitmead
Hi Leon,

In the long wars between the object databases Versant, which has logical
record ids, and ObjectStore, which has physical record ids, Versant has
consistently beaten ObjectStore on the sort of queries that financial
people are likely to do. (On graphic-design-type apps, ObjectStore tends
to win.)

Versant's object ids actually go through an index to get to the physical
location. Admittedly a VERY highly optimised index, but an index
nevertheless.

What this says to me is that Postgres's concept of oids is OK as is. If
your queries are too slow, either the optimiser is not using an index
when it should, or else the indexing mechanism is not fast enough. I
suspect it would be nice, from an object database perspective, if a
special-case oid-index index type were written.
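
For instance, the kind of join Leon wants can already be written against
oids with an ordinary index (the table and column names are only
illustrative):

    create table detail (master_oid oid, qty int4);
    create index detail_master_idx on detail (master_oid);

    select m.*, d.qty
      from master m, detail d
     where d.master_oid = m.oid and m.a > 5;

Whether that is fast enough then comes down to the optimiser and the
index implementation.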

Physical record ids have lots of problems. You can't re-organise space
dynamically, so you have to take your database off-line for a while to
totally re-organise it. You lose space because it can't be re-used
dynamically. There are problems with backup.

I'm writing a web page on what would be needed to make Postgres into an
Object database.....
http://www.tech.com.au/postgres


Re: [GENERAL] Joins and links

From: Vadim Mikheev
Leon wrote:
>
> Ah, you mean MVCC! That's what I replied to Tom Lane:
>
> > This problem can be solved. An offhand solution is to have
> > an additional system field which will point to the new tuple left after
> > the update. It is filled in at the same time as the original tuple is
> > marked invalid.  So the scenario is as follows: we follow the link,
> > and if we find that this system field is not NULL in the tuple where
> > we arrived, we go to where it is pointing (the same table, of course).
> > Sure, VACUUM will eliminate these. The performance penalty is small.

The old tuple version points to the new version right now -:).
How else could we handle updates in read committed mode
(new tuple versions are not visible to a concurrent update)?

Let's go to the hackers list.
But also, let's relax for some more time, Leon... -:)

Vadim