Thread: Firebird 1.0 released

Firebird 1.0 released

From
"Christopher Kings-Lynne"
Date:
The Firebird guys have gotten around to releasing 1.0.  If you read this
front page spiel, you'll notice that they use MVCC, but with an overwriting
storage manager.

http://www.ibphoenix.com/ibp_act_db.html

The relevant extract:

"Multi-version concurrency control uses back versions of modified and
deleted records to maintain a consistent view of data for read transactions.
Each record version is tagged with the identifier of the transaction that
created it. When a record is modified, the old version of the record is
reduced to a "delta record" - a set of differences from the new version -
and written to a new location, ordinarily on the same page where it was
originally stored. Then the new record overwrites the old. The new record
points to the old record. Unless the values of indexed fields are changed,
there's no need to update the index. Even if the values have changed, the
old values remain in the index to keep the record available to older
transactions.

The transaction identifier also permits update transactions to recognize
updates by concurrent transactions and allows Firebird to dispense with
write locks on records. When a transaction encounters a record updated by a
concurrent transaction, it waits for the other transaction to complete. If
the competing transaction commits, the waiting transaction gets an error. If
the competing transaction rolls back, the waiting transaction succeeds. If
the competing transaction attempts to update a record that the waiting
transaction has modified, a deadlock exists and one or the other will
receive an error.

Multi-version concurrency replaces a before-image (rollback) log with
versions stored in the database. When a transaction fails, its changes
remain in the database. The next transaction that reads that record
recognizes that the record version is invalid. Depending on the version of
Firebird and architecture, that transaction either replaces the invalid
record version with its back version or invokes a garbage collect thread. "
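To make the scheme concrete, here is a toy sketch of delta-record back versions in Python. The semantics are pieced together from the extract above; the class names and delta representation are invented for illustration, not Firebird's actual on-disk format.

```python
# Toy model of the back-version scheme described in the extract: the newest
# record overwrites in place, and each update leaves behind a "delta record"
# holding only the fields that differ from the newer version.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Version:
    txn_id: int                        # transaction that created this version
    delta: dict                        # changed fields, relative to the newer version
    older: Optional["Version"] = None

@dataclass
class Record:
    txn_id: int
    data: dict
    back: Optional[Version] = None     # chain of back versions, newest first

def update(rec: Record, txn_id: int, new_data: dict) -> None:
    # Store only the differences from the new version, as the text describes.
    delta = {k: v for k, v in rec.data.items() if new_data.get(k) != v}
    rec.back = Version(rec.txn_id, delta, rec.back)
    rec.txn_id, rec.data = txn_id, new_data

def read(rec: Record, snapshot_txn: int) -> dict:
    # Walk back versions until we reach one created at or before our snapshot,
    # applying each delta to reconstruct the older record as we go.
    data, txn, back = dict(rec.data), rec.txn_id, rec.back
    while txn > snapshot_txn and back is not None:
        data.update(back.delta)
        txn, back = back.txn_id, back.older
    return data

rec = Record(txn_id=10, data={"id": 1, "name": "ann", "city": "boston"})
update(rec, txn_id=20, new_data={"id": 1, "name": "ann", "city": "concord"})
print(read(rec, snapshot_txn=25))   # newer snapshot sees the new version
print(read(rec, snapshot_txn=15))   # older snapshot reconstructs the old one
```

Note how a read transaction older than the update recovers the original row by applying the delta, which is the "consistent view for read transactions" the extract is describing.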

Chris



Re: Firebird 1.0 released

From
Tom Lane
Date:
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> The Firebird guys have gotten around to releasing 1.0.  If you read this
> front page spiel, you'll notice that they use MVCC, but with an overwriting
> storage manager.

Yup.  I've had a couple of long chats with Ann Harrison at the recent
"OSDB summit" meetings.  I think we each came away enlightened about the
other implementation, but not in any large hurry to change our own.

I did steal at least one idea from her, though.  (rummages in CVS logs)
ah, here's a hit:

2001-09-29 19:49  tgl
	* src/backend/access/nbtree/nbtinsert.c: Tweak btree page split
	logic so that when splitting a page that is rightmost on its tree
	level, we split 2/3 to the left and 1/3 to the new right page,
	rather than the even split we use elsewhere.  The idea is that when
	faced with a steadily increasing series of inserted keys (such as
	sequence or timestamp values), we'll end up with a btree that's
	about 2/3ds full not 1/2 full, which is much closer to the desired
	steady-state load for a btree.  Per suggestion from Ann Harrison of
	IBPhoenix.
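The effect of that split ratio is easy to simulate. A toy model (the page capacity and the `fill_after_inserts` helper below are made up for illustration, not btree code) inserting monotonically increasing keys:

```python
# Simulate the split-ratio tweak from the log entry: insert monotonically
# increasing keys and compare the average fill of finished pages under an
# even split vs. a 2/3-1/3 split of the rightmost page.
def fill_after_inserts(n_keys: int, capacity: int, left_fraction: float) -> float:
    pages = [[]]
    for key in range(n_keys):
        right = pages[-1]
        if len(right) == capacity:           # rightmost page full: split it
            cut = int(capacity * left_fraction)
            pages[-1] = right[:cut]          # left page keeps `cut` keys
            pages.append(right[cut:])        # new rightmost page gets the rest
        pages[-1].append(key)
    done = pages[:-1]                        # ignore the still-filling last page
    return sum(len(p) for p in done) / (len(done) * capacity)

print(fill_after_inserts(10_000, 100, 0.5))   # even split: pages 50% full
print(fill_after_inserts(10_000, 100, 2/3))   # 2/3 split:  pages 66% full
```

With increasing keys, every left half produced by an even split never receives another insert, so the tree settles at half full; the 2/3 split leaves the abandoned pages two-thirds full instead.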
 

        regards, tom lane


Re: Firebird 1.0 released

From
Denis Perchine
Date:
Hi,

I was interested in this:
Firebird's indexes are very dense because they compress both the prefix and 
the suffix of each key. Suffix compression is simply the elimination of 
trailing blanks or zeros, depending on the data type. Suffix compression is 
performed on each segment of a segmented key. Prefix compression removes the 
leading bytes that match the previous key. Thus a duplicate value has no key 
stored at all. Dense storage in indexes minimizes the depth of the btrees, 
eliminating the advantage of other index types for most data.

Do we do this? How feasible is this?
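For illustration, a rough sketch of what that prefix + suffix compression might look like. The encoding here is assumed from the description quoted above; Firebird's real key format is certainly more involved.

```python
# Toy key compression per the quoted description: strip trailing pad bytes
# (suffix compression), then store only the bytes that differ from the
# previous key on the page (prefix compression).
def compress_keys(keys: list, pad: bytes = b" ") -> list:
    out, prev = [], b""
    for key in keys:
        key = key.rstrip(pad)                      # suffix compression
        n = 0
        while n < min(len(prev), len(key)) and prev[n] == key[n]:
            n += 1                                 # shared prefix length
        out.append((n, key[n:]))                   # store (prefix_len, rest)
        prev = key
    return out

keys = [b"aardvark  ", b"aardwolf  ", b"aardwolf  ", b"abacus    "]
print(compress_keys(keys))
# The duplicate key compresses to (8, b"") -- "no key stored at all".
```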

On Tuesday 16 April 2002 00:35, Christopher Kings-Lynne wrote:
> The Firebird guys have gotten around to releasing 1.0.  If you read this
> front page spiel, you'll notice that they use MVCC, but with an overwriting
> storage manager.

--
Denis



Is this a better MVCC.

From
mlw
Date:
I just had an interesting idea. It sounds too easy to believe, but hear me out
and correct me if I'm wrong.

Currently, during update, PostgreSQL takes the existing record, modifies it,
and adds it as a new row. The previous record has a pointer to the new version.
If the row is updated twice, the original row is hit first, followed by the
next version, then the last version. Do I understand this correctly?

Now, what if we did it another way: copy the old version of the row into the
new row and update the tuple in place? (Space permitting, of course.) That way,
performance does not degrade over time; also, VACUUM should be easier and less
cumbersome because it simply lops off the end of the list and marks tuples
which are not in any transaction path.

Is this a lot of work, is it inherently wrong?
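A simplified model of the forward chain as described (the `HeapTuple` class below is hypothetical and ignores pages, ctid links, and visibility entirely):

```python
# Minimal model of the old-to-new version chain: each update appends a newer
# version, and a reader entering at the oldest version must hop forward.
class HeapTuple:
    def __init__(self, data):
        self.data = data
        self.newer = None          # next (newer) version, if any

def update(oldest: HeapTuple, new_data) -> None:
    cur = oldest
    while cur.newer is not None:   # walk to the current version
        cur = cur.newer
    cur.newer = HeapTuple(new_data)

def latest(oldest: HeapTuple):
    cur, hops = oldest, 0
    while cur.newer is not None:
        cur, hops = cur.newer, hops + 1
    return cur.data, hops

row = HeapTuple("v1")
update(row, "v2")
update(row, "v3")
print(latest(row))   # ('v3', 2): two hops from the version the reader enters at
```

The hop count grows with each update until the dead versions are reclaimed, which is the degradation being discussed here.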


Re: Is this a better MVCC.

From
Tom Lane
Date:
mlw <markw@mohawksoft.com> writes:
> Now, what if we did it another way, copy the old version of the row into the
> new row and update the tuple in place?

I don't think we can get away with moving the extant tuple.  If we did,
a concurrent scan that should have found the old tuple might miss it.
(This is why VACUUM FULL needs exclusive lock to move tuples.)

It's fairly unclear whether this would actually buy any performance
gain, anyway.  In the case of a seqscan I don't see that it makes any
difference on average, and in the case of an indexscan what matters is
the index ordering not the physical location.  (In this connection,
btree indexes already do the "right thing", cf comments for
_bt_insertonpg.)
        regards, tom lane


Re: Firebird 1.0 released

From
Tom Lane
Date:
Denis Perchine <dyp@perchine.com> writes:
> I was interested in this:
> Firebird's indexes are very dense because they compress both the prefix and 
> the suffix of each key. Suffix compression is simply the elimination of 
> trailing blanks or zeros, depending on the data type. Suffix compression is 
> performed on each segment of a segmented key. Prefix compression removes the 
> leading bytes that match the previous key. Thus a duplicate value has no key 
> stored at all. Dense storage in indexes minimizes the depth of the btrees, 
> eliminating the advantage of other index types for most data.

> Do we do this? How feasible is this?

No, and it seems very bogus to me.  With a storage scheme like that,
you could not do a random-access search --- the only way to know the
true value of each key is if you are doing a linear scan of the page,
so that you can keep track of the previous key value by dead reckoning.
(This assumes that the first key on each page is stored in full;
otherwise it's even worse.)

Another problem is that key inserts and deletes get more complex since
you have to recompute the following tuple.  Life will get interesting
if the following tuple expands; you might have to split the page to hold
it.  (Hmm, is it possible for a *delete* to force a page split under the
Firebird scheme?  Not sure.)

The actual value of leading-byte suppression seems very data-dependent
to me, anyway.  For example, for an integer key column on a
little-endian machine, leading-byte suppression would buy you nothing at
all.  (Perhaps Firebird's implementation has enough datatype-specific
knowledge to trim integer keys at the proper end; I don't know.  But I
wouldn't want to see us try to push datatype dependencies into our index
access methods.)
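The little-endian point is easy to verify by packing consecutive integers both ways and counting shared leading bytes (a quick illustrative check, not a claim about any particular index implementation):

```python
# For sequential integers packed little-endian, the leading byte is the least
# significant one and never matches the previous key, so leading-byte
# suppression saves nothing; big-endian keys share almost all leading bytes.
import struct

def shared_prefix(a: bytes, b: bytes) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

le = [struct.pack("<i", i) for i in range(1000, 1100)]
be = [struct.pack(">i", i) for i in range(1000, 1100)]
print(sum(shared_prefix(a, b) for a, b in zip(le, le[1:])))  # little-endian: 0
print(sum(shared_prefix(a, b) for a, b in zip(be, be[1:])))  # big-endian: ~3 per pair
```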
        regards, tom lane


Re: Is this a better MVCC.

From
Lincoln Yeoh
Date:
On 7.1.x it definitely gets slower even for index scans, e.g. 60 updates/sec 
dropping to 30 and then to 20 over time.

Is this fixed for 7.2?

If not, is it possible to make the pointer point to the latest row instead 
of the most obsolete one, and having the newer rows point to the older 
ones, instead of the other way round (which seems to be happening with 
7.1)? I suppose this could make updates slower - have to update indexes? 
But selects would be faster (other than cases where there are a lot of 
uncommitted updates outstanding).

If that is not possible (or updating the index too painful), how about 
having the first pointer point to first row which then points to latest 
row, which then points to subsequent older rows. That way the miss penalty 
is reduced.

It seems reasonable to me that the newer rows should be more visible, unless 
more people update rows and then roll back rather than update and then 
commit.

I'm missing something, right? :)

Regards,
Link.

At 09:15 AM 4/16/02 -0400, Tom Lane wrote:
>mlw <markw@mohawksoft.com> writes:
> > Now, what if we did it another way, copy the old version of the row 
> into the
> > new row and update the tuple in place?
>
>I don't think we can get away with moving the extant tuple.  If we did,
>a concurrent scan that should have found the old tuple might miss it.
>(This is why VACUUM FULL needs exclusive lock to move tuples.)
>
>It's fairly unclear whether this would actually buy any performance
>gain, anyway.  In the case of a seqscan I don't see that it makes any
>difference on average, and in the case of an indexscan what matters is
>the index ordering not the physical location.  (In this connection,
>btree indexes already do the "right thing", cf comments for
>_bt_insertonpg.)




Re: Firebird 1.0 released

From
Richard Tucker
Date:
While it is true that you can't do binary searches on compressed indexes, you
may get a large payoff from compression since the index fits in fewer pages
and so may be cached more efficiently in the buffer pool.  Even a small
reduction in I/O load may compensate for the higher computational demands of
a compressed index.

Note also that insertion of a key can never cause following entries to
increase in size; they can only remain the same or decrease.  Deletion of an
entry may cause the following entry to increase in size, but never by more
than the size of the entry deleted, so deletes can't cause page splits.
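richt's size argument can be checked with a toy prefix-compression encoder (the `encoded` helper below is made up for illustration): deleting the middle key forces the following key to re-encode against an earlier one and grow, but never by more than the deleted entry's stored size.

```python
# Worked check of the claim: with prefix compression (store only bytes not
# shared with the previous key), deleting b"apple2" makes b"apple3" encode
# against b"ant", growing from 1 stored byte to 5 -- a growth of 4 bytes,
# bounded by the 5 bytes the deleted entry occupied.
def encoded(keys):
    sizes, prev = [], b""
    for k in keys:
        n = 0
        while n < min(len(prev), len(k)) and prev[n] == k[n]:
            n += 1
        sizes.append(len(k) - n)     # bytes actually stored for this key
        prev = k
    return sizes

before = [b"ant", b"apple2", b"apple3"]
after  = [b"ant", b"apple3"]          # delete b"apple2"
print(encoded(before))                # [3, 5, 1]
print(encoded(after))                 # [3, 5]: b"apple3" grew, bounded by 5
```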
-regards
richt

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Tom Lane
Sent: Tuesday, April 16, 2002 9:48 AM
To: Denis Perchine
Cc: Hackers
Subject: Re: [HACKERS] Firebird 1.0 released


Denis Perchine <dyp@perchine.com> writes:
> I was interested in this:
> Firebird's indexes are very dense because they compress both the prefix and
> the suffix of each key. Suffix compression is simply the elimination of
> trailing blanks or zeros, depending on the data type. Suffix compression is
> performed on each segment of a segmented key. Prefix compression removes the
> leading bytes that match the previous key. Thus a duplicate value has no key
> stored at all. Dense storage in indexes minimizes the depth of the btrees,
> eliminating the advantage of other index types for most data.

> Do we do this? How feasible is this?

No, and it seems very bogus to me.  With a storage scheme like that,
you could not do a random-access search --- the only way to know the
true value of each key is if you are doing a linear scan of the page,
so that you can keep track of the previous key value by dead reckoning.
(This assumes that the first key on each page is stored in full;
otherwise it's even worse.)

Another problem is that key inserts and deletes get more complex since
you have to recompute the following tuple.  Life will get interesting
if the following tuple expands; you might have to split the page to hold
it.  (Hmm, is it possible for a *delete* to force a page split under the
Firebird scheme?  Not sure.)

The actual value of leading-byte suppression seems very data-dependent
to me, anyway.  For example, for an integer key column on a
little-endian machine, leading-byte suppression would buy you nothing at
all.  (Perhaps Firebird's implementation has enough datatype-specific
knowledge to trim integer keys at the proper end; I don't know.  But I
wouldn't want to see us try to push datatype dependencies into our index
access methods.)
        regards, tom lane
