Thread: Musings

Musings

From
mlw
Date:
It is Sunday morning and I have been musing about some PostgreSQL issues. As
some of you are aware, my dot com, dot died, and I am working on a business
plan for a consulting company which, amongst other things, will feature
PostgreSQL. As I work on the various aspects, some issues pop up about
PostgreSQL.

Please don't take any of these personally; they are only my observations. If
you say they are non-issues, I would rather just accept that we disagree than
get into a nasty fight. They *are* issues for corporate acceptance: I have
been challenged by IT people about them.

(1) Major version upgrade. This is a hard one: having to dump out and restore
a database to go from 7.1 to 7.2, or from 7.2 to 7.3, is really a hard sell.
If a customer has a very large database, this represents a large amount of
down-time. If they are running on an operating system with file-size
limitations, it is not an easy task. It also means that they have to have
additional storage amounting to at least a copy of the whole database.

(2) Update behavior: the least recently updated (LRU) tuple ordering in
storage is a problem. Having performance degrade as it does from updates is
hard to explain to a customer and, quite honestly, tells me I cannot
recommend PostgreSQL for an environment in which the primary behavior is
updating data.

[Index] --> [Target]->[LRU]->[1]->[2]->[3]->[MRU]

update tbl set foo = x where bar = y

The most recently updated (MRU) tuple becomes [4], and the new tuple becomes
the MRU tuple.

[Index] --> [Target]->[LRU]->[1]->[2]->[3]->[4]->[MRU]

The above represents what PostgreSQL seems to currently do. Correct me if I'm
wrong. (I would love to be wrong here.) If we break the list at the beginning
and put the MRU tuple right after the target tuple (the target tuple being
the one which the index points to), say like this:

[Index] --> [Target]->[MRU]->[1]->[2]->[3]->[LRU]

update tbl set foo = x where bar = y

[Index] --> [Target]->[MRU]->[4]->[3]->[2]->[1]->[LRU]

Again, the old MRU becomes [4], but rather than scanning past each obsolete
tuple to find the end of the chain, the target tuple's next pointer is simply
set to the new MRU.

If updates and deletes could be handled this way, that would limit the update
and select performance degradation between vacuums.
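
To make the idea concrete, here is a toy C sketch of the linked-list model I
have in mind. Everything in it is invented for illustration (it is not
PostgreSQL's actual storage code, which Tom may well tell me looks nothing
like this); it only shows why an MRU-first chain makes the update cheap
instead of a walk over every obsolete version:

#include <stdio.h>
#include <stdlib.h>

/* Toy model of a tuple version chain -- purely illustrative. */
typedef struct Tuple
{
    int           value;
    struct Tuple *next;         /* next (older) version */
} Tuple;

typedef struct
{
    Tuple *chain;               /* what the index entry points at; in the
                                 * MRU-first scheme this is always the
                                 * newest version */
} Target;

/* In the LRU-first scheme an update must walk past every obsolete
 * version to append at the end.  Here the update is O(1): link the
 * new version directly after the target. */
static void
update_mru_first(Target *t, int newvalue)
{
    Tuple *v = malloc(sizeof(Tuple));
    v->value = newvalue;
    v->next = t->chain;         /* old MRU becomes [4], and so on */
    t->chain = v;               /* new tuple becomes the MRU */
}

int
main(void)
{
    Target t = { NULL };
    for (int i = 1; i <= 4; i++)
        update_mru_first(&t, i);

    /* The newest version is one hop from the target. */
    printf("MRU value: %d\n", t.chain->value);  /* prints 4 */
    return 0;
}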


Re: Musings

From
cbbrowne@cbbrowne.com
Date:
On Sun, 05 May 2002 10:01:57 EDT, the world broke into rejoicing as
mlw <markw@mohawksoft.com>  said:
> It is Sunday morning and I have been musing about some PostgreSQL issues.
> As some of you are aware, my dot com, dot died, and I am working on a
> business plan for a consulting company which, amongst other things, will
> feature PostgreSQL. As I work on the various aspects, some issues pop up
> about PostgreSQL.
> 
> Please don't take any of these personally; they are only my observations.
> If you say they are non-issues, I would rather just accept that we disagree
> than get into a nasty fight. They *are* issues for corporate acceptance: I
> have been challenged by IT people about them.
> 
> (1) Major version upgrade. This is a hard one: having to dump out and
> restore a database to go from 7.1 to 7.2, or from 7.2 to 7.3, is really
> a hard sell. If a customer has a very large database, this represents a
> large amount of down-time. If they are running on an operating system
> with file-size limitations, it is not an easy task. It also means that
> they have to have additional storage amounting to at least a copy of
> the whole database.

All of these things are true, and what you should throw back at the IT
people is the question:
 "So what do you do when you upgrade from Oracle 7 to Oracle 8?  How  about the process of doing major Informix
upgrades? Sybase?  Does it  not involve some appreciable amounts of down-time?"
 

There may well be possible improvements to the PostgreSQL upgrade
process; "zero-downtime, zero-extra space upgrades" do not seem likely
to be amongst those things.

The last time I did an SAP upgrade, there were _five days_ of
down-time.  Not 15 minutes, not "none," but a figure rather close to a
week.

For the IT guys to have sour grapes over upgrades requiring some time
and disk space is unsurprising; for them to pretend it is only a problem
with PostgreSQL is just dishonest.
--
(concatenate 'string "cbbrowne" "@ntlug.org")
http://www.cbbrowne.com/info/
"Marketing Division, Sirius Cybernetics Corp: A bunch of mindless
jerks who'll be the first against the wall when the revolution comes."
-- The Hitchhiker's Guide to the Galaxy


Re: Musings

From
Gavin Sherry
Date:
On Sun, 5 May 2002 cbbrowne@cbbrowne.com wrote:

> On Sun, 05 May 2002 10:01:57 EDT, the world broke into rejoicing as
> mlw <markw@mohawksoft.com>  said:
> > It is Sunday morning and I have been musing about some PostgreSQL
> > issues. As some of you are aware, my dot com, dot died, and I am working
> > on a business plan for a consulting company which, amongst other things,
> > will feature PostgreSQL. As I work on the various aspects, some issues
> > pop up about PostgreSQL.
> > 
> > Please don't take any of these personally; they are only my
> > observations. If you say they are non-issues, I would rather just accept
> > that we disagree than get into a nasty fight. They *are* issues for
> > corporate acceptance: I have been challenged by IT people about them.
> > 
> > (1) Major version upgrade. This is a hard one: having to dump out and
> > restore a database to go from 7.1 to 7.2, or from 7.2 to 7.3, is really
> > a hard sell. If a customer has a very large database, this represents a
> > large amount of down-time. If they are running on an operating system
> > with file-size limitations, it is not an easy task. It also means that
> > they have to have additional storage amounting to at least a copy of
> > the whole database.
> 
> All of these things are true, and what you should throw back at the IT
> people is the question:
> 
>   "So what do you do when you upgrade from Oracle 7 to Oracle 8?  How
>    about the process of doing major Informix upgrades?  Sybase?  Does it
>    not involve some appreciable amounts of down-time?"


This is most definitely the wrong way of thinking about this. I'm not
saying that the task Mark sets is a simple one, but the goals of Postgres
should never be limited to those of the other products out there. 

Gavin



Re: Musings

From
Tom Lane
Date:
mlw <markw@mohawksoft.com> writes:
> [Index] --> [Target]->[LRU]->[1]->[2]->[3]->[MRU]

This diagram is entirely unrelated to reality.  See, eg,
http://archives.postgresql.org/pgsql-hackers/2002-05/msg00012.php
        regards, tom lane


Re: Musings

From
mlw
Date:
Tom Lane wrote:
> 
> mlw <markw@mohawksoft.com> writes:
> > [Index] --> [Target]->[LRU]->[1]->[2]->[3]->[MRU]
> 

RE: http://archives.postgresql.org/pgsql-hackers/2002-05/msg00030.php

There are a few variations, but it seems I am making the same assumptions as
Lincoln Yeoh. So, you are saying that when a search for a specific tuple
happens, you have to hit every version of the tuple, no matter what? It isn't
a linked list?

I guess I don't understand. Why does it have to visit all of them? If we
order them from newest to oldest, and then take the first one whose
transaction ID is smaller than the current transaction ID, doesn't that work?


Re: Musings

From
Tom Lane
Date:
mlw <markw@mohawksoft.com> writes:
> I guess I don't understand. Why does it have to visit all of them?

Because it doesn't have any way to know in advance which one(s) are
visible to it.

> If we order them from newest to oldest, and then take the first one whose
> transaction ID is smaller than the current transaction ID, doesn't that
> work?

No.  For starters, we couldn't guarantee that insertion order is the
same as transaction commit order.  Even if we did, your assumption
that commit order is the same as visibility is too simplistic.  And
none of this works if the index isn't unique.
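
To illustrate with a deliberately dumbed-down model (the types and the
visibility rule here are invented for the example; the real rules in tqual.c
must also cope with aborted and in-progress transactions, among other things):

#include <stdio.h>
#include <stdbool.h>

typedef unsigned int Xid;

/* Toy tuple version: each version carries its own creating and
 * deleting transaction IDs. */
typedef struct
{
    Xid xmin;                   /* transaction that created it */
    Xid xmax;                   /* transaction that replaced it, 0 if none */
    int value;
} TupleVersion;

/* Grossly simplified visibility test.  The point is that nothing
 * about a version tells us in advance whether it will pass, so every
 * candidate must be checked individually against the snapshot. */
static bool
visible(const TupleVersion *v, Xid snapshot_xid)
{
    return v->xmin <= snapshot_xid &&
           (v->xmax == 0 || v->xmax > snapshot_xid);
}

int
main(void)
{
    /* Three versions of one logical row, in arbitrary storage order. */
    TupleVersion versions[] = {
        { 10, 20, 1 },
        { 30,  0, 3 },
        { 20, 30, 2 },
    };
    Xid snapshot = 25;          /* snapshot taken between xid 20 and 30 */

    for (int i = 0; i < 3; i++)
        if (visible(&versions[i], snapshot))
            printf("visible value: %d\n", versions[i].value);  /* 2 */
    return 0;
}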
        regards, tom lane


Re: Musings

From
Tom Lane
Date:
I said:
> none of this works if the index isn't unique.

On the other hand --- if the index *is* unique, and we are checking
equality on all columns (a fairly easily checked condition), then we
know we should retrieve at most one visible tuple.  So, without making
any incorrect assumptions, we could terminate the indexscan after the
first successful match.  Hmm ... you might be right that there's a
cheap win to be had there.  I still think that we also need to do
something with propagating tuple deadness flags into the index, though.
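
Here is a standalone toy model of where the savings would come from (all
names invented; this is not the actual IndexNext code):

#include <stdio.h>
#include <stdbool.h>

/* Toy index entry: "visible" stands in for whether the heap tuple it
 * points at would pass the snapshot check. */
typedef struct
{
    bool visible;
    int  value;
} IndexEntry;

static int heap_fetches;

/* Each visibility check costs us a heap fetch. */
static bool
fetch_and_check(const IndexEntry *e)
{
    heap_fetches++;
    return e->visible;
}

static void
scan(const IndexEntry *entries, int n, bool unique_equality)
{
    heap_fetches = 0;
    for (int i = 0; i < n; i++)
    {
        if (fetch_and_check(&entries[i]))
        {
            printf("returned %d after %d heap fetches\n",
                   entries[i].value, heap_fetches);
            if (unique_equality)
                return;         /* at most one visible match can exist */
        }
    }
    printf("scan finished after %d heap fetches\n", heap_fetches);
}

int
main(void)
{
    /* Five index entries for one key: one live version among four
     * dead ones, as after repeated updates of the same row. */
    IndexEntry entries[] = {
        { false, 0 }, { true, 42 }, { false, 0 }, { false, 0 }, { false, 0 }
    };

    scan(entries, 5, false);    /* visits all five entries */
    scan(entries, 5, true);     /* stops after the visible one */
    return 0;
}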
        regards, tom lane


Re: Musings

From
cbbrowne@cbbrowne.com
Date:
On Mon, 06 May 2002 00:50:25 +1000, the world broke into rejoicing as
Gavin Sherry <swm@linuxworld.com.au>  said:
> On Sun, 5 May 2002 cbbrowne@cbbrowne.com wrote:
> > On Sun, 05 May 2002 10:01:57 EDT, the world broke into rejoicing as
> > mlw <markw@mohawksoft.com>  said:
> > > It is Sunday morning and I have been musing about some PostgreSQL
> > > issues. As some of you are aware, my dot com, dot died, and I am
> > > working on a business plan for a consulting company which, amongst
> > > other things, will feature PostgreSQL. As I work on the various
> > > aspects, some issues pop up about PostgreSQL.
> > > 
> > > Please don't take any of these personally; they are only my
> > > observations. If you say they are non-issues, I would rather just
> > > accept that we disagree than get into a nasty fight. They *are* issues
> > > for corporate acceptance: I have been challenged by IT people about
> > > them.
> > > 
> > > (1) Major version upgrade. This is a hard one: having to dump out and
> > > restore a database to go from 7.1 to 7.2, or from 7.2 to 7.3, is
> > > really a hard sell. If a customer has a very large database, this
> > > represents a large amount of down-time. If they are running on an
> > > operating system with file-size limitations, it is not an easy task.
> > > It also means that they have to have additional storage amounting to
> > > at least a copy of the whole database.
> > 
> > All of these things are true, and what you should throw back at the IT
> > people is the question:
> > 
> >   "So what do you do when you upgrade from Oracle 7 to Oracle 8?  How
> >    about the process of doing major Informix upgrades?  Sybase?  Does it
> >    not involve some appreciable amounts of down-time?"

> This is most definitely the wrong way of thinking about this. I'm not
> saying that the task Mark sets is a simple one, but the goals of Postgres
> should never be limited to those of the other products out there.

Apparently you decided to fire back an email before bothering to read
the paragraph that followed, which read:
  There may well be possible improvements to the PostgreSQL upgrade
  process; "zero-downtime, zero-extra space upgrades" do not seem likely
  to be amongst those things.

Yes, there may well be improvements possible.  I'd think it unlikely
that they'd emerge today or tomorrow, and I think it's silly to assume
that all responses must necessarily be of a technical nature.

IT guys that are firing shots to the effect of "We expect zero time
upgrades" are more than likely playing some other agenda than merely
"we'd like instant upgrades."

For them to expect instant upgrades when _much_ more expensive systems
offer nothing of the sort suggests to me that the _true_ agenda has
nothing to do with upgrade time, and everything to do with FUD.

If that's the case, and I expect FUD is in play in this sort of
situation, then the purely technical response of "we might try that
someday" is a Dead Loss of an answer.

If they refuse to move from Oracle to PostgreSQL because PostgreSQL has
no "instant transparent upgrade" scheme, as compared to Oracle, which
_also_ has no "instant transparent upgrade," then do you realistically
think that the lack of an "instant transparent upgrade" has ANYTHING to
do with the choice?

I'm merely suggesting that suitable questions head back their way to
determine whether the objection is an honest one, or merely FUD.
--
(reverse (concatenate 'string "gro.gultn@" "enworbbc"))
http://www.cbbrowne.com/info/lisp.html
When man stands on toilet, man is high on pot. -Confucius


Re: Musings

From
mlw
Date:
Tom Lane wrote:
> No.  For starters, we couldn't guarantee that insertion order is the
> same as transaction commit order.  Even if we did, your assumption
> that commit order is the same as visibility is too simplistic.  And
> none of this works if the index isn't unique.

Ahh, I get it (again, correct me if I am wrong): multiple references in a
non-unique index are handled the same way as multiple versions of the same
tuple. When an index entry is found, presumably, all the tuples are loaded,
all the unique "rows" are identified, and the latest "visible" version of
each of them is returned.

I wonder: is there some way that inexpensive ordering up front, at update
time, can help increase select performance? A very good problem indeed.


Re: Musings

From
Tom Lane
Date:
I said:
> On the other hand --- if the index *is* unique, and we are checking
> equality on all columns (a fairly easily checked condition), then we
> know we should retrieve at most one visible tuple.  So, without making
> any incorrect assumptions, we could terminate the indexscan after the
> first successful match.  Hmm ... you might be right that there's a
> cheap win to be had there.

I tried this out on a quick-hack basis of just teaching IndexNext to
terminate an indexscan once it's gotten a single visible tuple out of
a unique index.  It works pretty much as expected, but I didn't see any
noticeable improvement in pgbench speed.  Investigation with gprof
led to the following conclusions:

1. pgbench spends an awful lot of its time in _bt_check_unique, which
I hadn't touched.  (AFAICS it couldn't be sped up anyway with this
technique, since in the non-error case it won't find any visible
tuples.)  I think the only real hope for speeding up _bt_check_unique
is to mark dead index entries so that we can avoid repeated heap_fetches.

2. When I said that new index entries would be visited first because
they're inserted to the left of existing entries of the same key,
I forgot that that's only true when there's room for them there.
The comments in nbtinsert.c give a more complete picture:
*    NOTE: if the new key is equal to one or more existing keys, we can
*    legitimately place it anywhere in the series of equal keys --- in fact,
*    if the new key is equal to the page's "high key" we can place it on
*    the next page.  If it is equal to the high key, and there's not room
*    to insert the new tuple on the current page without splitting, then
*    we can move right hoping to find more free space and avoid a split.
*    (We should not move right indefinitely, however, since that leads to
*    O(N^2) insertion behavior in the presence of many equal keys.)
*    Once we have chosen the page to put the key on, we'll insert it before
*    any existing equal keys because of the way _bt_binsrch() works.

If we repeatedly update the same tuple (keeping its index key the same),
after awhile the first btree page containing that index key will fill
up, and subsequently we will tend to insert duplicate entries somewhere
in the middle of the multiple-page sequence of duplicates.  We could
guarantee that the newest tuple is visited first only if we were
prepared to split the first btree page in the above-described case,
rather than looking for subsequent pages with sufficient room to insert
the index entry.  This seems like a bad idea --- it would guarantee
inefficient space usage in the index, because as soon as we had a series
of duplicate keys spanning multiple pages, we would *never* attempt to
insert into the middle of that series, and thus freed space within the
series of pages would go forever unused.

So my conclusion is that this idea is probably worth doing, but it's not
earthshaking by itself.

The major difficulty with doing it in a non-hack fashion is that there
isn't a clean place to insert the logic.  index_getnext and subroutines
cannot apply the test, because they don't know anything about tuple
visibility (they don't get passed the snapshot being used).  So without
restructuring, we'd have to teach all the couple-dozen callers of
index_getnext what to do.

I have been thinking for awhile that we need to restructure the
indexscan API, however.  There is no good reason why index_getnext
*shouldn't* be responsible for time qual checking, and if it were
then it could correctly apply the one-returned-tuple rule.  Even
more important, it would then become possible for index_getnext to
also be the place that detects completely-dead tuples and notifies
the index AMs to mark those index entries as uninteresting.

Basically I'd like to make index_getnext have an API essentially the
same as heap_getnext.  There are a few callers that actually want
index_getnext's behavior (fetch index tuples but not heap tuples),
but we could create an alternative subroutine for them to use.
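
Roughly, the shape I have in mind (a sketch only; nothing here is final):

/*
 * Sketch only.  index_getnext would take the same form as heap_getnext
 * and return only tuples visible under the snapshot carried in the
 * scan descriptor:
 *
 *      HeapTuple index_getnext(IndexScanDesc scan, ScanDirection dir);
 *
 * With visibility checking inside index_getnext, it could apply the
 * one-visible-tuple rule for unique indexes itself, and it would also
 * be positioned to notice completely-dead tuples and tell the index AM
 * to mark those entries.  Callers that really want bare index tuples
 * (no heap fetch) would move to a separate routine.
 */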

A more detailed proposal will follow by and by...
        regards, tom lane


Re: Musings

From
Thomas Lockhart
Date:
...
> (1) Major version upgrade. This is a hard one, having to dump out and restore a
> database to go from 7.1 to 7.2 or 7.2 to 7.3 is really a hard sell.

Hmm, maybe it would be more acceptable if we charged $40k per license,
but refunded $40k if you *want* to dump/reload. Gets that motivation
level up a bit... ;)
                    - Thomas

