Thread: Let's drop two obsolete features which are bear-traps for novices

Let's drop two obsolete features which are bear-traps for novices

From
Josh Berkus
Date:
PostgreSQL has two bits of obsolete, incomplete functionality which
entrap and frustrate new users in large numbers.  Both of these features
share the following characteristics:

* added more than 10 years ago
* have the same names as useful features from other databases
* were never finished and lack critical functionality
* have not seen significant work in the last 4 releases

Every other day on IRC I run into a newbie who has used one of these
features under the mistaken impression that it is useful, and then had
to be guided in how to get their data out of this broken feature at some
length.  Unknown are the number of users who didn't ask for help but
simply chose to use a different database instead.

Of course, I'm talking about the MONEY type and hash indexes (not the
hash ops class, which is useful, just the index type).  It's time to put
both of these features out to pasture.  Certainly neither of theise
features would be accepted into PostgreSQL today given the shape they're in.

Having these broken features around is like leaving an armed bear-trap
in a public park.

Now, I know the first thing someone will do is jump up and claim that
they were just about to fix WAL-logging on hash indexes, or add casts to
the money type.  But if that hasn't happened in the last 5 years, it's
not going to happen.

We'd be doing our users a huge favor by just removing them in 9.5.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Let's drop two obsolete features which are bear-traps for novices

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Of course, I'm talking about the MONEY type and hash indexes (not the
> hash ops class, which is useful, just the index type).  It's time to put
> both of these features out to pasture.  Certainly neither of theise
> features would be accepted into PostgreSQL today given the shape they're in.

I don't care one way or the other about the money type, but I will defend
hash indexes, especially seeing that we've already added a pretty
in-your-face warning as of 9.5:

regression=# create table foo(f1 int);
CREATE TABLE
regression=# create index on foo using hash (f1);
WARNING:  hash indexes are not WAL-logged and their use is discouraged
CREATE INDEX

> Now, I know the first thing someone will do is jump up and claim that
> they were just about to fix WAL-logging on hash indexes,

I don't know if/when that will happen as such, but Simon was making noises
about writing code to treat hash indexes as unlogged automatically, which
would more or less fix the worst risks.  That's not just a special case
for hash indexes, but any index AM that lacks WAL support, as third-party
AMs might well do.
        regards, tom lane



Re: Let's drop two obsolete features which are bear-traps for novices

From
Eric Ridge
Date:
On Fri, Oct 31, 2014 at 6:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I don't know if/when that will happen as such, but Simon was making noises
about writing code to treat hash indexes as unlogged automatically, which
would more or less fix the worst risks.  That's not just a special case
for hash indexes, but any index AM that lacks WAL support, as third-party
AMs might well do.

As someone writing a 3rd-party AM, literally right this moment, do you have a link to that thread?  While I follow this list fairly closely I don't remember seeing this.  I'd love to understand the thoughts around handling extension-based AMs.

eric

Re: Let's drop two obsolete features which are bear-traps for novices

From
Amit Langote
Date:
Hi,

On Sat, Nov 1, 2014 at 1:21 PM, Eric Ridge <eebbrr@gmail.com> wrote:
> On Fri, Oct 31, 2014 at 6:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> I don't know if/when that will happen as such, but Simon was making noises
>> about writing code to treat hash indexes as unlogged automatically, which
>> would more or less fix the worst risks.  That's not just a special case
>> for hash indexes, but any index AM that lacks WAL support, as third-party
>> AMs might well do.
>
>
> As someone writing a 3rd-party AM, literally right this moment, do you have
> a link to that thread?  While I follow this list fairly closely I don't
> remember seeing this.  I'd love to understand the thoughts around handling
> extension-based AMs.
>

You are looking for this:

http://www.postgresql.org/message-id/CAPpHfdsXwZmojm6Dx+TJnpYk27kT4o7Ri6X_4OSWcByu1Rm+VA@mail.gmail.com

Thanks,
Amit



Re: Let's drop two obsolete features which are bear-traps for novices

From
Josh Berkus
Date:
On 10/31/2014 03:07 PM, Tom Lane wrote:
> I don't care one way or the other about the money type, but I will defend
> hash indexes, especially seeing that we've already added a pretty
> in-your-face warning as of 9.5:
> 
> regression=# create table foo(f1 int);
> CREATE TABLE
> regression=# create index on foo using hash (f1);
> WARNING:  hash indexes are not WAL-logged and their use is discouraged
> CREATE INDEX

Yes, and I'm arguing that is the wrong decision.  If hash indexes are
"discouraged", then they shouldn't be in core in the first place.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Let's drop two obsolete features which are bear-traps for novices

From
Andres Freund
Date:
On 2014-11-01 10:18:03 -0700, Josh Berkus wrote:
> On 10/31/2014 03:07 PM, Tom Lane wrote:
> > I don't care one way or the other about the money type, but I will defend
> > hash indexes, especially seeing that we've already added a pretty
> > in-your-face warning as of 9.5:
> > 
> > regression=# create table foo(f1 int);
> > CREATE TABLE
> > regression=# create index on foo using hash (f1);
> > WARNING:  hash indexes are not WAL-logged and their use is discouraged
> > CREATE INDEX
> 
> Yes, and I'm arguing that is the wrong decision.  If hash indexes are
> "discouraged", then they shouldn't be in core in the first place.

Last time we discussed it there were people (IIRC Andrew was one of
them) commenting that they use hash indexes *precisely* because they're
not WAL logged and that they can live with the dangers that creates. I
don't think that's sufficient justification for introducing the feature
at all. But it's nothing new that removing a feature has to fit quite
different criteria than adding one.

So, by that argument we could remove hash indexes once we have unlogged
indexes on logged tables. But then there's no need to remove them
anymore...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Let's drop two obsolete features which are bear-traps for novices

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> On 10/31/2014 03:07 PM, Tom Lane wrote:
>> I don't care one way or the other about the money type, but I will defend
>> hash indexes, especially seeing that we've already added a pretty
>> in-your-face warning as of 9.5:
>> 
>> regression=# create table foo(f1 int);
>> CREATE TABLE
>> regression=# create index on foo using hash (f1);
>> WARNING:  hash indexes are not WAL-logged and their use is discouraged
>> CREATE INDEX

> Yes, and I'm arguing that is the wrong decision.  If hash indexes are
> "discouraged", then they shouldn't be in core in the first place.

There's an awful lot of stuff in core that probably shouldn't be there,
but we've not made a move to rip it out, especially not if there wasn't
a way to make it an external module.

In the case of hash indexes, because we still have to have the hash
opclasses in core, there's no way that it could be pushed out as an
extension module even if we otherwise had full support for AMs as
extensions.  So what I hear you proposing is "let's break this so
thoroughly that it *can't* be fixed".  I'm not on board with that.
I think the WARNING will do just fine to discourage novices who are
not familiar with the state of the hash AM.  In the meantime, we
could push forward with the idea of making hash indexes automatically
unlogged, so that recovering from a crash wouldn't be quite so messy/
dangerous.
        regards, tom lane



Re: Let's drop two obsolete features which are bear-traps for novices

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-11-01 10:18:03 -0700, Josh Berkus wrote:
>> Yes, and I'm arguing that is the wrong decision.  If hash indexes are
>> "discouraged", then they shouldn't be in core in the first place.

> Last time we discussed it there were people (IIRC Andrew was one of
> them) commenting that they use hash indexes *precisely* because they're
> not WAL logged and that they can live with the dangers that creates. I
> don't think that's sufficient justification for introducing the feature
> at all. But it's nothing new that removing a feature has to fit quite
> different criteria than adding one.

> So, by that argument we could remove hash indexes once we have unlogged
> indexes on logged tables. But then there's no need to remove them
> anymore...

Yeah.  When we last discussed this, the difficulty was around how to make
that combination work.  An unlogged index on an unlogged table is no
problem: the init-fork mechanism is able to make them both go to empty
after a crash.  But for an unlogged index on a logged table, just making
the index go to empty is not the correct recovery action.

We'd been wondering how to make crash recovery change the index's pg_index
entry into not indisvalid/indisready status.  That's hard to do.  But
maybe we don't have to.  What about having the init-fork mechanism restore
a hash index into a state with the metapage marked as invalid?  hashinsert
etc could simply do nothing when they see this metapage state.
hashgettuple could throw an error saying the index isn't usable until it's
been REINDEXed.

This is not quite as nice as an indisvalid-based solution, because it
requires some extra cooperation from the index AM's code.  But setting up
an init fork requires work from the index AM anyway, so that objection
doesn't seem terribly strong to me.

Thoughts?
        regards, tom lane



Re: Let's drop two obsolete features which are bear-traps for novices

From
Andres Freund
Date:
On 2014-11-01 13:58:02 -0400, Tom Lane wrote:
> Yeah.  When we last discussed this, the difficulty was around how to make
> that combination work.  An unlogged index on an unlogged table is no
> problem: the init-fork mechanism is able to make them both go to empty
> after a crash.  But for an unlogged index on a logged table, just making
> the index go to empty is not the correct recovery action.

Just removing the relfilenode would probably be better than what we have
today. It sure is pretty unsatisfactory to get a ERROR after a
crash/immediate restart, but it's far better than getting wrong results
back (or even crash).

> We'd been wondering how to make crash recovery change the index's pg_index
> entry into not indisvalid/indisready status.  That's hard to do.

It really is hard.

What I've been wondering about whether we should be boring and do it on
backend startup. Roughly similar to the way the relcache init file is
handled. The first backend that starts up for a certain database* scans
pg_index and rechecks all unlogged indexes of logged tables and marks
them as invalid.

* There's a couple ways we could detect that. The most trivial is to
just have a 'initialized' marker file in the database whose presence we
test on startup. That's probably acceptable costwise in comparison to
all the other stuff happening at backend init. We could get more fancy,
but it's probably not needed.

> But
> maybe we don't have to.  What about having the init-fork mechanism restore
> a hash index into a state with the metapage marked as invalid?  hashinsert
> etc could simply do nothing when they see this metapage state.
> hashgettuple could throw an error saying the index isn't usable until it's
> been REINDEXed.

> This is not quite as nice as an indisvalid-based solution, because it
> requires some extra cooperation from the index AM's code.  But setting up
> an init fork requires work from the index AM anyway, so that objection
> doesn't seem terribly strong to me.

The most annoying thing I see with that kind of approach is that we'd
need to read the metapage pretty early during planning. The second most
annoying is that it's harder to for the user to detect which indexes are
in that state - we'd probably need to provide a SQL level function to
detect it.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Let's drop two obsolete features which are bear-traps for novices

From
Andrew Dunstan
Date:
On 11/01/2014 01:26 PM, Andres Freund wrote:
> On 2014-11-01 10:18:03 -0700, Josh Berkus wrote:
>> On 10/31/2014 03:07 PM, Tom Lane wrote:
>>> I don't care one way or the other about the money type, but I will defend
>>> hash indexes, especially seeing that we've already added a pretty
>>> in-your-face warning as of 9.5:
>>>
>>> regression=# create table foo(f1 int);
>>> CREATE TABLE
>>> regression=# create index on foo using hash (f1);
>>> WARNING:  hash indexes are not WAL-logged and their use is discouraged
>>> CREATE INDEX
>> Yes, and I'm arguing that is the wrong decision.  If hash indexes are
>> "discouraged", then they shouldn't be in core in the first place.
> Last time we discussed it there were people (IIRC Andrew was one of
> them) commenting that they use hash indexes *precisely* because they're
> not WAL logged and that they can live with the dangers that creates. I
> don't think that's sufficient justification for introducing the feature
> at all. But it's nothing new that removing a feature has to fit quite
> different criteria than adding one.
>
> So, by that argument we could remove hash indexes once we have unlogged
> indexes on logged tables. But then there's no need to remove them
> anymore...
>



Yes, although there might not be much reason to use them either. I think 
Tom's first step of making hash indexes automatically unlogged makes 
sense. Longer term I'd like to see unlogged as an option for all AMs - 
especially btree. It makes plenty of sense to me to be able to make the 
data resilient even if the indexes, which can after all be recreated in 
the unlikely event of a crash, are not.

cheers

andrew




Re: Let's drop two obsolete features which are bear-traps for novices

From
Andres Freund
Date:
On 2014-11-01 14:13:02 -0400, Andrew Dunstan wrote:
> Yes, although there might not be much reason to use them either. I think
> Tom's first step of making hash indexes automatically unlogged makes sense.
> Longer term I'd like to see unlogged as an option for all AMs - especially
> btree.

If we implement unlogged indexes, it should definitely work for all
builtin indexes.

> It makes plenty of sense to me to be able to make the data resilient
> even if the indexes, which can after all be recreated in the unlikely event
> of a crash, are not.

While I agree that it can be useful, I think the fact that replicas/HS
can't use such index makes them far less useful.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Let's drop two obsolete features which are bear-traps for novices

From
Andrew Dunstan
Date:
On 11/01/2014 01:58 PM, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> On 2014-11-01 10:18:03 -0700, Josh Berkus wrote:
>>> Yes, and I'm arguing that is the wrong decision.  If hash indexes are
>>> "discouraged", then they shouldn't be in core in the first place.
>> Last time we discussed it there were people (IIRC Andrew was one of
>> them) commenting that they use hash indexes *precisely* because they're
>> not WAL logged and that they can live with the dangers that creates. I
>> don't think that's sufficient justification for introducing the feature
>> at all. But it's nothing new that removing a feature has to fit quite
>> different criteria than adding one.
>> So, by that argument we could remove hash indexes once we have unlogged
>> indexes on logged tables. But then there's no need to remove them
>> anymore...
> Yeah.  When we last discussed this, the difficulty was around how to make
> that combination work.  An unlogged index on an unlogged table is no
> problem: the init-fork mechanism is able to make them both go to empty
> after a crash.  But for an unlogged index on a logged table, just making
> the index go to empty is not the correct recovery action.
>
> We'd been wondering how to make crash recovery change the index's pg_index
> entry into not indisvalid/indisready status.  That's hard to do.  But
> maybe we don't have to.  What about having the init-fork mechanism restore
> a hash index into a state with the metapage marked as invalid?  hashinsert
> etc could simply do nothing when they see this metapage state.
> hashgettuple could throw an error saying the index isn't usable until it's
> been REINDEXed.
>
> This is not quite as nice as an indisvalid-based solution, because it
> requires some extra cooperation from the index AM's code.  But setting up
> an init fork requires work from the index AM anyway, so that objection
> doesn't seem terribly strong to me.
>
> Thoughts?



Isn't the planner still going to try to use the index in that case? If 
it's not then I'd be OK with it, but if it's going to make the table 
largely unusable until it's reindexed that would be rather sad.

cheers

andrew



Re: Let's drop two obsolete features which are bear-traps for novices

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-11-01 13:58:02 -0400, Tom Lane wrote:
>> maybe we don't have to.  What about having the init-fork mechanism restore
>> a hash index into a state with the metapage marked as invalid?  hashinsert
>> etc could simply do nothing when they see this metapage state.
>> hashgettuple could throw an error saying the index isn't usable until it's
>> been REINDEXed.

> The most annoying thing I see with that kind of approach is that we'd
> need to read the metapage pretty early during planning.

No, I was specifically *not* proposing that.  What I proposed was if the
planner picks the index for use in a query, you get an error.

Yeah, if we were trying to duplicate the behavior of indisvalid, there'd
need to be a way to detect the invalid index at plan time and not use it.
But I'm not sure that that's actually an improvement from the user's
standpoint: what they'd see is queries suddenly, and silently, performing
a lot worse than they expect.  An explicit complaint about the necessary
REINDEX seems more user-friendly from where I sit.

However, if the consensus is that silently ignoring the index is the best
behavior, I would not be too concerned about the cost of checking the
metapage to see if the index is valid.  A hash index's metapage would tend
to stay pinned in shared buffers anyway, because it's necessarily touched
on each use of the index.  If that opinion proves wrong, the AM could take
steps to cache the state in the index's relcache entry (btree already
maintains similar cached state).

> The second most
> annoying is that it's harder to for the user to detect which indexes are
> in that state - we'd probably need to provide a SQL level function to
> detect it.

Sure, we could do that.
        regards, tom lane



Re: Let's drop two obsolete features which are bear-traps for novices

From
Andres Freund
Date:
On 2014-11-01 14:19:22 -0400, Andrew Dunstan wrote:
> Isn't the planner still going to try to use the index in that case? If it's
> not then I'd be OK with it, but if it's going to make the table largely
> unusable until it's reindexed that would be rather sad.

Both the planner (for querying) and the executor (to avoid inserting
tuples into the index) would have to query the state of such indexes. I
don't think it can reasonably work otherwise.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Let's drop two obsolete features which are bear-traps for novices

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-11-01 14:19:22 -0400, Andrew Dunstan wrote:
>> Isn't the planner still going to try to use the index in that case? If it's
>> not then I'd be OK with it, but if it's going to make the table largely
>> unusable until it's reindexed that would be rather sad.

> Both the planner (for querying) and the executor (to avoid inserting
> tuples into the index) would have to query the state of such indexes. I
> don't think it can reasonably work otherwise.

The executor doesn't need to know anything, since the AM can trivially
make aminsert be a no-op if the index is internally invalid.  The planner
only needs to know something if we think that silently being slow for a
query meant to search the index is better than throwing an error reminding
the user that the index needs to be reindexed.
        regards, tom lane



Re: Let's drop two obsolete features which are bear-traps for novices

From
Andres Freund
Date:
On 2014-11-01 14:23:07 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-11-01 13:58:02 -0400, Tom Lane wrote:
> >> maybe we don't have to.  What about having the init-fork mechanism restore
> >> a hash index into a state with the metapage marked as invalid?  hashinsert
> >> etc could simply do nothing when they see this metapage state.
> >> hashgettuple could throw an error saying the index isn't usable until it's
> >> been REINDEXed.
> 
> > The most annoying thing I see with that kind of approach is that we'd
> > need to read the metapage pretty early during planning.
> 
> No, I was specifically *not* proposing that.  What I proposed was if the
> planner picks the index for use in a query, you get an error.

Ugh. I think that'll be pretty ugly. At the very least we'd need to
provide a way to mark such indexes as 'actually invalid' in the sense of
indisready. It'll not only block queries that sensibly would end up
using that index, but also all modifications - making the relation
essentially read only.

If we just want this, we can just remove the main fork of such relations
on startup, and not bother with index specific stuff.

> Yeah, if we were trying to duplicate the behavior of indisvalid, there'd
> need to be a way to detect the invalid index at plan time and not use it.
> But I'm not sure that that's actually an improvement from the user's
> standpoint: what they'd see is queries suddenly, and silently, performing
> a lot worse than they expect.  An explicit complaint about the necessary
> REINDEX seems more user-friendly from where I sit.

A REINDEX is imo unlikely to be acceptable. It takes long (why would you
bother on a small table?) and locks the relation/indexes.

> However, if the consensus is that silently ignoring the index is the best
> behavior, I would not be too concerned about the cost of checking the
> metapage to see if the index is valid.  A hash index's metapage would tend
> to stay pinned in shared buffers anyway, because it's necessarily touched
> on each use of the index.  If that opinion proves wrong, the AM could take
> steps to cache the state in the index's relcache entry (btree already
> maintains similar cached state).

We also could just put it in the generic relcache code...

Yea.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Let's drop two obsolete features which are bear-traps for novices

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> A REINDEX is imo unlikely to be acceptable. It takes long (why would you
> bother on a small table?) and locks the relation/indexes.

I think the goalposts just took a vacation to Acapulco.

What exactly do you think is going to make a crashed unlogged index valid
again without a REINDEX?  Certainly the people who are currently using
hash indexes in the way Andrew describes are expecting to have to REINDEX
them after a crash.
        regards, tom lane



Re: Let's drop two obsolete features which are bear-traps for novices

From
Andres Freund
Date:
On 2014-11-01 14:39:21 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > A REINDEX is imo unlikely to be acceptable. It takes long (why would you
> > bother on a small table?) and locks the relation/indexes.
> 
> I think the goalposts just took a vacation to Acapulco.

I think that might be caused by a misunderstanding.

> What exactly do you think is going to make a crashed unlogged index valid
> again without a REINDEX?  Certainly the people who are currently using
> hash indexes in the way Andrew describes are expecting to have to REINDEX
> them after a crash.

Obviously that individual index needs to be recreated. What I mean is
that I don't think it'll be acceptable that the table essentially can't
be queried before that's done. The situations in which I'd found
unlogged indexes useful is where there's some indexes are critical for
the OLTP business (those would continue to be logged), but some other
large ones are for things that aren't absolutely essential. Reports and
such.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Let's drop two obsolete features which are bear-traps for novices

From
Andrew Dunstan
Date:
On 11/01/2014 02:34 PM, Andres Freund wrote:
>> Yeah, if we were trying to duplicate the behavior of indisvalid, there'd
>> need to be a way to detect the invalid index at plan time and not use it.
>> But I'm not sure that that's actually an improvement from the user's
>> standpoint: what they'd see is queries suddenly, and silently, performing
>> a lot worse than they expect.  An explicit complaint about the necessary
>> REINDEX seems more user-friendly from where I sit.
> A REINDEX is imo unlikely to be acceptable. It takes long (why would you
> bother on a small table?) and locks the relation/indexes.


It's a bit of a pity we don't have REINDEX CONCURRENTLY.


cheers

andrew



Re: Let's drop two obsolete features which are bear-traps for novices

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-11-01 14:39:21 -0400, Tom Lane wrote:
>> What exactly do you think is going to make a crashed unlogged index valid
>> again without a REINDEX?  Certainly the people who are currently using
>> hash indexes in the way Andrew describes are expecting to have to REINDEX
>> them after a crash.

> Obviously that individual index needs to be recreated. What I mean is
> that I don't think it'll be acceptable that the table essentially can't
> be queried before that's done. The situations in which I'd found
> unlogged indexes useful is where there's some indexes are critical for
> the OLTP business (those would continue to be logged), but some other
> large ones are for things that aren't absolutely essential. Reports and
> such.

Sure.  And as long as you aren't issuing queries that would want to scan
the crashed index, it won't matter either way.  The question is whether
you'd rather that your "inessential reporting queries" fail without the
broken index, or that they take extreme amounts of time/resources.
I don't think it's obvious that the first alternative is bad.
        regards, tom lane



Re: Let's drop two obsolete features which are bear-traps for novices

From
Andres Freund
Date:
On 2014-11-01 14:45:47 -0400, Andrew Dunstan wrote:
> 
> On 11/01/2014 02:34 PM, Andres Freund wrote:
> >>Yeah, if we were trying to duplicate the behavior of indisvalid, there'd
> >>need to be a way to detect the invalid index at plan time and not use it.
> >>But I'm not sure that that's actually an improvement from the user's
> >>standpoint: what they'd see is queries suddenly, and silently, performing
> >>a lot worse than they expect.  An explicit complaint about the necessary
> >>REINDEX seems more user-friendly from where I sit.
> >A REINDEX is imo unlikely to be acceptable. It takes long (why would you
> >bother on a small table?) and locks the relation/indexes.
> 
> 
> It's a bit of a pity we don't have REINDEX CONCURRENTLY.

We essentially don't have it because people opined towards the end of
9.4 that a brief (as in two pg_class updates) AccessExclusive lock
window makes the feature moot. I still think that's quite heavily
disregarding the practial reality.
Luckily opinion seems to have shifted a bit again.

It'd also be really helpful if REINDEX CONCURRENTLY had a way to only
reindex invalid indexes. But that probably is just a smop.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Let's drop two obsolete features which are bear-traps for novices

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> It's a bit of a pity we don't have REINDEX CONCURRENTLY.

Well, that's an entirely different thread.  Let's not get distracted
by that topic.
        regards, tom lane



Re: Let's drop two obsolete features which are bear-traps for novices

From
Andres Freund
Date:
On 2014-11-01 14:48:20 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-11-01 14:39:21 -0400, Tom Lane wrote:
> >> What exactly do you think is going to make a crashed unlogged index valid
> >> again without a REINDEX?  Certainly the people who are currently using
> >> hash indexes in the way Andrew describes are expecting to have to REINDEX
> >> them after a crash.
> 
> > Obviously that individual index needs to be recreated. What I mean is
> > that I don't think it'll be acceptable that the table essentially can't
> > be queried before that's done. The situations in which I'd found
> > unlogged indexes useful is where there's some indexes are critical for
> > the OLTP business (those would continue to be logged), but some other
> > large ones are for things that aren't absolutely essential. Reports and
> > such.
> 
> Sure.  And as long as you aren't issuing queries that would want to scan
> the crashed index, it won't matter either way.  The question is whether
> you'd rather that your "inessential reporting queries" fail without the
> broken index, or that they take extreme amounts of time/resources.
> I don't think it's obvious that the first alternative is bad.

In some of these cases the unlogged index would still be used for a
subset of the OLTP workload, e.g. because they're smaller. We e.g. have
a client that has smaller (as in 50GB instead of 600GB) indexes for rows
of a certain type in the table, but also one that spans the whole thing.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Let's drop two obsolete features which are bear-traps for novices

From
Andrew Dunstan
Date:
On 11/01/2014 02:39 PM, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> A REINDEX is imo unlikely to be acceptable. It takes long (why would you
>> bother on a small table?) and locks the relation/indexes.
> I think the goalposts just took a vacation to Acapulco.
>
> What exactly do you think is going to make a crashed unlogged index valid
> again without a REINDEX?  Certainly the people who are currently using
> hash indexes in the way Andrew describes are expecting to have to REINDEX
> them after a crash.
>
>             


That's certainly true. They were warned of the risks and found them 
acceptable.

The real question here is whether the table should continue to be usable 
in a degraded state until it's reindexed.

cheers

andrew



Re: Let's drop two obsolete features which are bear-traps for novices

From
Andres Freund
Date:
On 2014-11-01 14:56:35 -0400, Andrew Dunstan wrote:
> 
> On 11/01/2014 02:39 PM, Tom Lane wrote:
> >Andres Freund <andres@2ndquadrant.com> writes:
> >>A REINDEX is imo unlikely to be acceptable. It takes long (why would you
> >>bother on a small table?) and locks the relation/indexes.
> >I think the goalposts just took a vacation to Acapulco.
> >
> >What exactly do you think is going to make a crashed unlogged index valid
> >again without a REINDEX?  Certainly the people who are currently using
> >hash indexes in the way Andrew describes are expecting to have to REINDEX
> >them after a crash.

> That's certainly true. They were warned of the risks and found them
> acceptable.
> 
> The real question here is whether the table should continue to be usable in
> a degraded state until it's reindexed.

One argument in that direction imo is HS. We certainly would just
generally ignore unlogged indexes for querying while InRecovery, right?
Because otherwise HS would become pretty useless. And I think it'd be
pretty wierd if things worked on HS and not on the primary (or the HS
after promotion).

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Let's drop two obsolete features which are bear-traps for novices

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-11-01 14:48:20 -0400, Tom Lane wrote:
>> Sure.  And as long as you aren't issuing queries that would want to scan
>> the crashed index, it won't matter either way.  The question is whether
>> you'd rather that your "inessential reporting queries" fail without the
>> broken index, or that they take extreme amounts of time/resources.
>> I don't think it's obvious that the first alternative is bad.

> In some of these cases the unlogged index would still be used for a
> subset of the OLTP workload, e.g. because they're smaller. We e.g. have
> a client that has smaller (as in 50GB instead of 600GB) indexes for rows
> of a certain type in the table, but also one that spans the whole thing.

Meh.  There are no existing use-cases where anyone would dare use a hash
index in that way, because they couldn't be sure it would be valid.

However, it seems like we're arguing over not much.  We'd certainly need a
new AM entry point to test whether the index is internally valid, so as to
support a SQL-level function that can report that.  This argument boils
down to whether or not the planner should expend a few cycles to call
that, which in the end is unlikely to cost enough to notice (especially if
the AM arranges to cache that state in indexes' relcache entries).
        regards, tom lane



Re: Let's drop two obsolete features which are bear-traps for novices

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> The real question here is whether the table should continue to be usable 
> in a degraded state until it's reindexed.

It certainly will be, as long as your notion of "usable in a degraded
state" doesn't include issuing queries that would prefer to use the broken
index.  The discussion is about what exactly should happen if you do that.
        regards, tom lane



Re: Let's drop two obsolete features which are bear-traps for novices

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> One argument in that direction imo is HS. We certainly would just
> generally ignore unlogged indexes for querying while InRecovery, right?
> Because otherwise HS would become pretty useless. And I think it'd be
> pretty wierd if things worked on HS and not on the primary (or the HS
> after promotion).

I don't see how HS has anything to do with this discussion.  We would
certainly have the index marked as unlogged in the catalogs, and we
would therefore not use it while InRecovery.  Once you promote a
slave to live, it would be in the same state as a post-crash-recovery
master.
        regards, tom lane



Re: Let's drop two obsolete features which are bear-traps for novices

From
Andres Freund
Date:
On 2014-11-01 15:11:40 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > One argument in that direction imo is HS. We certainly would just
> > generally ignore unlogged indexes for querying while InRecovery, right?
> > Because otherwise HS would become pretty useless. And I think it'd be
> > pretty wierd if things worked on HS and not on the primary (or the HS
> > after promotion).
> 
> I don't see how HS has anything to do with this discussion.  We would
> certainly have the index marked as unlogged in the catalogs, and we
> would therefore not use it while InRecovery.

Consider:
SELECT * FROM tbl WHERE active AND value = $1;
that can be satisfied by two indexes. One on (value), and an unlogged
index on (value) WHERE active. While in HS only the logged one will be
used.  But if we don't silently ignore invalid unlogged indexes, hell
will break loose after promition because suddenly the predicated index
will be used in plans.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Let's drop two obsolete features which are bear-traps for novices

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-11-01 15:11:40 -0400, Tom Lane wrote:
>> I don't see how HS has anything to do with this discussion.

> Consider:
> SELECT * FROM tbl WHERE active AND value = $1;
> that can be satisfied by two indexes. One on (value), and an unlogged
> index on (value) WHERE active. While in HS only the logged one will be
> used.  But if we don't silently ignore invalid unlogged indexes, hell
> will break loose after promition because suddenly the predicated index
> will be used in plans.

This example is utterly unrelated to HS: the same hazard would exist
on the master after a crash, with or without HS.

With current usage of hash indexes, nobody would ever construct such an
arrangement, or at least they would not expect to be able to start up
querying without having reindexed first.  My proposal to throw errors
was based on the assumption that unlogged indexes would only be used to
support a readily identified subset of inessential queries.  If we think
examples like the above are things people would actually want to do, then
yeah we would want to silently skip broken indexes.  I personally find
this example less than compelling and think it's far more likely that
we'll get complaints about the system *not* complaining when a broken
index would be needed.  However, it's only one line of code either way,
so I'm not going to argue about it further.
        regards, tom lane



Re: Let's drop two obsolete features which are bear-traps for novices

From
Andres Freund
Date:
On 2014-11-01 15:33:27 -0400, Tom Lane wrote:
> With current usage of hash indexes, nobody would ever construct such an
> arrangement

Do you think this should only be implemented for hash indexes? I'd think
we'd do it for all existing index AMs?

I'm not all that excited about unlogged hash indexes. Yes, that'll
remove a annoying hazard, but I probably won't use them anyway. I am
somewhat excited about the more general unlogged indexes feature.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Let's drop two obsolete features which are bear-traps for novices

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-11-01 15:33:27 -0400, Tom Lane wrote:
>> With current usage of hash indexes, nobody would ever construct such an
>> arrangement

> Do you think this should only be implemented for hash indexes? I'd think
> we'd do it for all existing index AMs?

I think we should eventually get to that point, perhaps.  I'd want to do
it first for hash, since that's what direly needs it.  I can see the value
of doing it for btree as well, but I'm less sure that the other index
types would have enough usage to justify the work.
        regards, tom lane



Re: Let's drop two obsolete features which are bear-traps for novices

From
Peter Geoghegan
Date:
On Sat, Nov 1, 2014 at 12:40 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> I'm not all that excited about unlogged hash indexes. Yes, that'll
> remove a annoying hazard, but I probably won't use them anyway. I am
> somewhat excited about the more general unlogged indexes feature.

I strongly agree.


-- 
Peter Geoghegan



Re: Let's drop two obsolete features which are bear-traps for novices

From
Josh Berkus
Date:
All,

While there's argument about hash indexes, it looks like nobody minds if
the MONEY type goes bye-bye.  So, time for a patch ...


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Let's drop two obsolete features which are bear-traps for novices

From
David Rowley
Date:
On Sun, Nov 2, 2014 at 12:59 PM, Josh Berkus <josh@agliodbs.com> wrote:
All,

While there's argument about hash indexes, it looks like nobody minds if
the MONEY type goes bye-bye.  So, time for a patch ...


Will the patch move the feature to a contrib module? 

Or will our release notes state that all MONEY columns must be changed to an appropriate NUMERIC type pre-upgrade?

Regards

David Rowley

Re: Let's drop two obsolete features which are bear-traps for novices

From
Andres Freund
Date:
On 2014-11-01 16:59:35 -0700, Josh Berkus wrote:
> All,
> 
> While there's argument about hash indexes, it looks like nobody minds if
> the MONEY type goes bye-bye.  So, time for a patch ...

FWIW there have been somewhat recent patches for money and it was
undeprecated not too long ago. So apparently there's users for it out
there. I personally would never use money, but I doubt the gain is worth
the cost of breaking existing setups without a warning period.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Let's drop two obsolete features which are bear-traps for novices

From
Jaime Casanova
Date:
<p dir="ltr"><br /> El nov 2, 2014 7:54 AM, "Andres Freund" <<a
href="mailto:andres@2ndquadrant.com">andres@2ndquadrant.com</a>>escribió:<br /> ><br /> > On 2014-11-01
16:59:35-0700, Josh Berkus wrote:<br /> > > All,<br /> > ><br /> > > While there's argument about
hashindexes, it looks like nobody minds if<br /> > > the MONEY type goes bye-bye.  So, time for a patch ...<br />
><br/> > FWIW there have been somewhat recent patches for money and it was<br /> > undeprecated not too long
ago.So apparently there's users for it out<br /> > there. I personally would never use money, but I doubt the gain
isworth<br /> > the cost of breaking existing setups without a warning period.<br /> ><p dir="ltr">Not knowing
howdifficult it could be maybe a fair compromise is to move MONEY datatype to a contrib. And documenting its
limitations.<pdir="ltr">--<br /> Jaime Casanova<br /> 2ndQuadrant Consultant<br /> PostgreSQL Development, 24x7
Support,Training & Services 

Re: Let's drop two obsolete features which are bear-traps for novices

From
Andrew Dunstan
Date:
On 11/02/2014 10:01 AM, Jaime Casanova wrote:
>
>
> El nov 2, 2014 7:54 AM, "Andres Freund" <andres@2ndquadrant.com
> <mailto:andres@2ndquadrant.com>> escribió:
> >
> > On 2014-11-01 16:59:35 -0700, Josh Berkus wrote:
> > > All,
> > >
> > > While there's argument about hash indexes, it looks like nobody
> minds if
> > > the MONEY type goes bye-bye.  So, time for a patch ...
> >
> > FWIW there have been somewhat recent patches for money and it was
> > undeprecated not too long ago. So apparently there's users for it out
> > there. I personally would never use money, but I doubt the gain is worth
> > the cost of breaking existing setups without a warning period.
> >
>
> Not knowing how difficult it could be maybe a fair compromise is to
> move MONEY datatype to a contrib. And documenting its limitations.
>
>


That's pretty much dead in the water, I think. Builtin types and
functions have Oid values in different ranges from non-builtin types
such as those in contrib. It's one reason we have no chance of bringing
hstore into core as people have previously asked for. And for the same
reason I think moving a core type out to contrib won't work. In any
case, contrib shouldn't be a rubbish heap of old deprecated features.

cheers

andrew



Re: Let's drop two obsolete features which are bear-traps for novices

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 11/02/2014 10:01 AM, Jaime Casanova wrote:
>> Not knowing how difficult it could be maybe a fair compromise is to 
>> move MONEY datatype to a contrib. And documenting its limitations.

> That's pretty much dead in the water, I think. Builtin types and 
> functions have Oid values in different ranges from non-builtin types 
> such as those in contrib. It's one reason we have no chance of bringing 
> hstore into core as people have previously asked for. And for the same 
> reason I think moving a core type out to contrib won't work.

Well, the OID compatibility issue could be dodged by saying that we can't
do a pg_upgrade (in-place upgrade) of a database containing MONEY
columns.  In fact, we might be able to just reject databases containing
MONEY[] (array) columns, which seems like it might be only a minor hazard.
Either way, requiring a dump/reload for upgrade is surely a better answer
for users of the type than just summarily screwing them.

> In any 
> case, contrib shouldn't be a rubbish heap of old deprecated features.

There's a fair amount of contrib that was never anything else, so I don't
agree with that reasoning too much.
        regards, tom lane



Re: Let's drop two obsolete features which are bear-traps for novices

From
Tom Lane
Date:
I wrote:
> Either way, requiring a dump/reload for upgrade is surely a better answer
> for users of the type than just summarily screwing them.

BTW, after reflecting a bit more I'm less than convinced that this
datatype is completely useless.  Even if you prefer to store currency
values in numeric columns, casting to or from money provides a way to
accept or emit values in whatever monetary format the LC_MONETARY locale
setting specifies.  That seems like a useful feature, and it's one you
could not easily duplicate using to_char/to_number (not to mention that
those functions aren't without major shortcomings of their own).

A quick trawl through the git history for cash.c indicates that most of
the bug fixes it's gotten recently have concerned doing formatting
correctly for non-US-like LC_MONETARY settings, so I think there are
indeed people out there who care about that capability.

Certainly we could package that capability in another way besides a
datatype, but I think we should actually do so (and give users a release
or two to switch over) before we consider nuking the type.
        regards, tom lane



Re: Let's drop two obsolete features which are bear-traps for novices

From
Andrew Dunstan
Date:
On 11/02/2014 11:53 AM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 11/02/2014 10:01 AM, Jaime Casanova wrote:
>>> Not knowing how difficult it could be maybe a fair compromise is to
>>> move MONEY datatype to a contrib. And documenting its limitations.
>> That's pretty much dead in the water, I think. Builtin types and
>> functions have Oid values in different ranges from non-builtin types
>> such as those in contrib. It's one reason we have no chance of bringing
>> hstore into core as people have previously asked for. And for the same
>> reason I think moving a core type out to contrib won't work.
> Well, the OID compatibility issue could be dodged by saying that we can't
> do a pg_upgrade (in-place upgrade) of a database containing MONEY
> columns.  In fact, we might be able to just reject databases containing
> MONEY[] (array) columns, which seems like it might be only a minor hazard.
> Either way, requiring a dump/reload for upgrade is surely a better answer
> for users of the type than just summarily screwing them.


Well, OK, yes, if we're prepared to abandon pg_upgrade-ability.

>
>> In any
>> case, contrib shouldn't be a rubbish heap of old deprecated features.
> There's a fair amount of contrib that was never anything else, so I don't
> agree with that reasoning too much.
>
>             


Maybe my memory is failing. What in contrib is stuff that used to be in 
core?

cheers

andrew



Re: Let's drop two obsolete features which are bear-traps for novices

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 11/02/2014 11:53 AM, Tom Lane wrote:
>> Well, the OID compatibility issue could be dodged by saying that we can't
>> do a pg_upgrade (in-place upgrade) of a database containing MONEY
>> columns.  In fact, we might be able to just reject databases containing
>> MONEY[] (array) columns, which seems like it might be only a minor hazard.
>> Either way, requiring a dump/reload for upgrade is surely a better answer
>> for users of the type than just summarily screwing them.

> Well, OK, yes, if we're prepared to abandon pg_upgrade-ability.

Not following your point?  Removing the type entirely would certainly
break pg_upgrade-ability as well.

>>> In any
>>> case, contrib shouldn't be a rubbish heap of old deprecated features.

>> There's a fair amount of contrib that was never anything else, so I don't
>> agree with that reasoning too much.

> Maybe my memory is failing. What in contrib is stuff that used to be in 
> core?

Nothing that I recall at the moment, but there is certainly plenty of
stuff of dubious quality in there.  I'd argue that chkpass, intagg,
intarray, isn, spi, and xml2 are all in worse shape than the money type.
        regards, tom lane



Re: Let's drop two obsolete features which are bear-traps for novices

From
Andrew Dunstan
Date:
On 11/02/2014 02:41 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 11/02/2014 11:53 AM, Tom Lane wrote:
>>> Well, the OID compatibility issue could be dodged by saying that we can't
>>> do a pg_upgrade (in-place upgrade) of a database containing MONEY
>>> columns.  In fact, we might be able to just reject databases containing
>>> MONEY[] (array) columns, which seems like it might be only a minor hazard.
>>> Either way, requiring a dump/reload for upgrade is surely a better answer
>>> for users of the type than just summarily screwing them.
>> Well, OK, yes, if we're prepared to abandon pg_upgrade-ability.
> Not following your point?  Removing the type entirely would certainly
> break pg_upgrade-ability as well.
>
>

I'm not entirely convinced that we should remove it.

cheers

andrew



Re: Let's drop two obsolete features which are bear-traps for novices

From
Michael Banck
Date:
Am Sonntag, den 02.11.2014, 12:41 -0500 schrieb Tom Lane:
> BTW, after reflecting a bit more I'm less than convinced that this
> datatype is completely useless.  Even if you prefer to store currency
> values in numeric columns, casting to or from money provides a way to
> accept or emit values in whatever monetary format the LC_MONETARY locale
> setting specifies.  That seems like a useful feature, and it's one you
> could not easily duplicate using to_char/to_number (not to mention that
> those functions aren't without major shortcomings of their own).

As an additional datapoint, Vitesse Data changed the DB schema from
NUMERIC to MONEY for their TPCH benchmark for performance reasons: "The
modification to data types is easy to understand -- money and double
types are faster than Numeric (and no one on this planet has a bank
account that overflows the money type, not any time soon)."[1] And
"Replaced NUMERIC fields representing currency with MONEY"[2].

Not sure whether they modified/optimized PostgreSQL with respect to the
MONEY data type and/or how much performance that gained, so CCing CK Tan
as well.


Michael

[1] http://vitesse-timing-on.blogspot.de/2014/10/running-tpch-on-postgresql-part-1.html
[2] http://vitessedata.com/benchmark/

-- 
Michael Banck
Projektleiter / Berater
Tel.: +49 (2161) 4643-171
Fax:  +49 (2161) 4643-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer




Re: Let's drop two obsolete features which are bear-traps for novices

From
Jim Nasby
Date:
On 11/1/14, 1:45 PM, Andrew Dunstan wrote:
> On 11/01/2014 02:34 PM, Andres Freund wrote:
>>> Yeah, if we were trying to duplicate the behavior of indisvalid, there'd
>>> need to be a way to detect the invalid index at plan time and not use it.
>>> But I'm not sure that that's actually an improvement from the user's
>>> standpoint: what they'd see is queries suddenly, and silently, performing
>>> a lot worse than they expect.  An explicit complaint about the necessary
>>> REINDEX seems more user-friendly from where I sit.
>> A REINDEX is imo unlikely to be acceptable. It takes long (why would you
>> bother on a small table?) and locks the relation/indexes.
>
>
> It's a bit of a pity we don't have REINDEX CONCURRENTLY.

Reviews welcome: https://commitfest.postgresql.org/action/patch_view?id=1563
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Let's drop two obsolete features which are bear-traps for novices

From
David Fetter
Date:
On Mon, Nov 03, 2014 at 01:54:09PM +0100, Michael Banck wrote:
> Am Sonntag, den 02.11.2014, 12:41 -0500 schrieb Tom Lane:
> > BTW, after reflecting a bit more I'm less than convinced that this
> > datatype is completely useless.  Even if you prefer to store currency
> > values in numeric columns, casting to or from money provides a way to
> > accept or emit values in whatever monetary format the LC_MONETARY locale
> > setting specifies.  That seems like a useful feature, and it's one you
> > could not easily duplicate using to_char/to_number (not to mention that
> > those functions aren't without major shortcomings of their own).
> 
> As an additional datapoint, Vitesse Data changed the DB schema from
> NUMERIC to MONEY for their TPCH benchmark for performance reasons: "The
> modification to data types is easy to understand -- money and double
> types are faster than Numeric (and no one on this planet has a bank
> account that overflows the money type, not any time soon)."[1] And
> "Replaced NUMERIC fields representing currency with MONEY"[2].
> 
> Not sure whether they modified/optimized PostgreSQL with respect to the
> MONEY data type and/or how much performance that gained, so CCing CK Tan
> as well.

How does our NUMERIC type's performance compare to other systems'
precise types?  I realize that some of those systems might have
restrictions on publishing numbers they don't authorize, but they
might have pushed some authorized numbers if those numbers make them
look good.

Also, just a general three-state comparison, "we beat them handily,"
"we're somewhere near them" or "we're not even close" would be enough
to work from.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Let's drop two obsolete features which are bear-traps for novices

From
Alvaro Herrera
Date:
David Fetter wrote:
> On Mon, Nov 03, 2014 at 01:54:09PM +0100, Michael Banck wrote:

> > As an additional datapoint, Vitesse Data changed the DB schema from
> > NUMERIC to MONEY for their TPCH benchmark for performance reasons: "The
> > modification to data types is easy to understand -- money and double
> > types are faster than Numeric (and no one on this planet has a bank
> > account that overflows the money type, not any time soon)."[1] And
> > "Replaced NUMERIC fields representing currency with MONEY"[2].
> > 
> > Not sure whether they modified/optimized PostgreSQL with respect to the
> > MONEY data type and/or how much performance that gained, so CCing CK Tan
> > as well.
> 
> How does our NUMERIC type's performance compare to other systems'
> precise types?  I realize that some of those systems might have
> restrictions on publishing numbers they don't authorize, but they
> might have pushed some authorized numbers if those numbers make them
> look good.

There is a real advantage of money over numeric in the performance
front.  I haven't measured it, but suffice to say that money uses
integer operations which map almost directly to CPU instructions,
whereas numeric needs to decode from our varlena base-10000 digit
format, operate on digits at a time, then encode back.  No matter how
much you optimize numeric, it's never going to outperform stuff that
runs practically on bare electrons.  As far as I recall, some TPCH
queries run aggregations on currency columns.

Now, whether this makes a measurable difference or not in TPCH terms, I
have no idea.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Let's drop two obsolete features which are bear-traps for novices

From
Jeff Janes
Date:
On Sat, Nov 1, 2014 at 10:26 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2014-11-01 10:18:03 -0700, Josh Berkus wrote:
> On 10/31/2014 03:07 PM, Tom Lane wrote:
> > I don't care one way or the other about the money type, but I will defend
> > hash indexes, especially seeing that we've already added a pretty
> > in-your-face warning as of 9.5:
> >
> > regression=# create table foo(f1 int);
> > CREATE TABLE
> > regression=# create index on foo using hash (f1);
> > WARNING:  hash indexes are not WAL-logged and their use is discouraged
> > CREATE INDEX
>
> Yes, and I'm arguing that is the wrong decision.  If hash indexes are
> "discouraged", then they shouldn't be in core in the first place.

Last time we discussed it there were people (IIRC Andrew was one of
them) commenting that they use hash indexes *precisely* because they're
not WAL logged and that they can live with the dangers that creates. I
don't think that's sufficient justification for introducing the feature
at all. But it's nothing new that removing a feature has to fit quite
different criteria than adding one.

So, by that argument we could remove hash indexes once we have unlogged
indexes on logged tables. But then there's no need to remove them
anymore...

I would object to removing hash indexes as long as the alternative way to index oversized value is to write all my SQL to look like:

select count(*) from foo where substr(x,1,2700)=substr($1,1,2700) and x=$1

Now, if the planner were smart enough to realize that x=$1 implies substr(x,1,2700)=substr($1,1,2700), that might be a different matter.  But it is not.

Or, if there were a way to create a view on foo which would do this implication automatically, but again as far as I know there is not a way to do that either.

Cheers,

Jeff

Re: Let's drop two obsolete features which are bear-traps for novices

From
Jim Nasby
Date:
On 11/3/14, 1:34 PM, Alvaro Herrera wrote:
> David Fetter wrote:
>> On Mon, Nov 03, 2014 at 01:54:09PM +0100, Michael Banck wrote:
>>> As an additional datapoint, Vitesse Data changed the DB schema from
>>> NUMERIC to MONEY for their TPCH benchmark for performance reasons: "The
>>> modification to data types is easy to understand -- money and double
>>> types are faster than Numeric (and no one on this planet has a bank
>>> account that overflows the money type, not any time soon)."[1] And
>>> "Replaced NUMERIC fields representing currency with MONEY"[2].
>>>
>>> Not sure whether they modified/optimized PostgreSQL with respect to the
>>> MONEY data type and/or how much performance that gained, so CCing CK Tan
>>> as well.
>>
>> How does our NUMERIC type's performance compare to other systems'
>> precise types?  I realize that some of those systems might have
>> restrictions on publishing numbers they don't authorize, but they
>> might have pushed some authorized numbers if those numbers make them
>> look good.
>
> There is a real advantage of money over numeric in the performance
> front.  I haven't measured it, but suffice to say that money uses
> integer operations which map almost directly to CPU instructions,
> whereas numeric needs to decode from our varlena base-10000 digit
> format, operate on digits at a time, then encode back.  No matter how
> much you optimize numeric, it's never going to outperform stuff that
> runs practically on bare electrons.  As far as I recall, some TPCH
> queries run aggregations on currency columns.
>
> Now, whether this makes a measurable difference or not in TPCH terms, I
> have no idea.

The performance of our numeric vs Oracle's was a common complaint when I was at EnterpriseDB (in 2007).

Perhaps numeric's performance could be greatly improved in cases where the precision is low enough to map to an
int/bigint.That would get us closer to eliminating money as well as give other uses a big win. Of course, how to do
thatand not break pg_upgrade is a huge question...
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Let's drop two obsolete features which are bear-traps for novices

From
Tatsuo Ishii
Date:
> The performance of our numeric vs Oracle's was a common complaint when
> I was at EnterpriseDB (in 2007).
> 
> Perhaps numeric's performance could be greatly improved in cases where
> the precision is low enough to map to an int/bigint. That would get us
> closer to eliminating money as well as give other uses a big win. Of
> course, how to do that and not break pg_upgrade is a huge question...

Just out of curiosity, why is Oracle's NUMBER (I assume you are
talking about this) so fast?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: Let's drop two obsolete features which are bear-traps for novices

From
Kevin Grittner
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> There is a real advantage of money over numeric in the performance
> front.  I haven't measured it, but suffice to say that money uses
> integer operations which map almost directly to CPU instructions,
> whereas numeric needs to decode from our varlena base-10000 digit
> format, operate on digits at a time, then encode back.  No matter how
> much you optimize numeric, it's never going to outperform stuff that
> runs practically on bare electrons.  As far as I recall, some TPCH
> queries run aggregations on currency columns.
>
> Now, whether this makes a measurable difference or not in TPCH terms, I
> have no idea.

Best of 3 tries on each statement...

A count(*) as a baseline:

test=# do $$ begin perform count(*) from generate_series(1,10000000); end; $$;
DO
Time: 3260.920 ms

A sum of money:

test=# do $$ begin perform sum('10000.01'::money) from generate_series(1,10000000); end; $$;
DO
Time: 3572.709 ms

A sum of numeric:

test=# do $$ begin perform sum('10000.01'::numeric) from generate_series(1,10000000); end; $$;
DO
Time: 4805.559 ms

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Let's drop two obsolete features which are bear-traps for novices

From
David Fetter
Date:
On Tue, Nov 04, 2014 at 07:51:06AM +0900, Tatsuo Ishii wrote:
> > The performance of our numeric vs Oracle's was a common complaint when
> > I was at EnterpriseDB (in 2007).
> > 
> > Perhaps numeric's performance could be greatly improved in cases where
> > the precision is low enough to map to an int/bigint. That would get us
> > closer to eliminating money as well as give other uses a big win. Of
> > course, how to do that and not break pg_upgrade is a huge question...
> 
> Just out of curiosity, why is Oracle's NUMBER (I assume you are
> talking about this) so fast?

I suspect that what happens is that NUMBER is stored as a native type
(int2, int4, int8, int16) that depends on its size and then cast to
the next upward thing as needed, taking any performance hits at that
point.  The documentation hints (38 decimal places) at a 128-bit
internal representation as the maximum.  I don't know what happens
when you get past what 128 bits can represent.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Let's drop two obsolete features which are bear-traps for novices

From
Josh Berkus
Date:
On 11/02/2014 11:41 AM, Tom Lane wrote:
> Nothing that I recall at the moment, but there is certainly plenty of
> stuff of dubious quality in there.  I'd argue that chkpass, intagg,
> intarray, isn, spi, and xml2 are all in worse shape than the money type.

Why are we holding on to xml2 again?

FWIW, I'd be fine with moving ISN, intarray and intagg to PGXN.  In
fact, I think it would be better for them to be there.  And they're not
core types, so there shouldn't be an issue with that.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Let's drop two obsolete features which are bear-traps for novices

From
Michael Paquier
Date:
<div dir="ltr">On Sun, Nov 2, 2014 at 2:30 AM, Tom Lane <span dir="ltr"><<a href="mailto:tgl@sss.pgh.pa.us"
target="_blank">tgl@sss.pgh.pa.us</a>></span>wrote:<br /><div class="gmail_extra"><div
class="gmail_quote"><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">Inthe case of hash indexes, because we still have to have the hash<br /> opclasses in core,
there'sno way that it could be pushed out as an<br /> extension module even if we otherwise had full support for AMs
as<br/> extensions.  So what I hear you proposing is "let's break this so<br /> thoroughly that it *can't* be fixed". 
I'mnot on board with that.<br /> I think the WARNING will do just fine to discourage novices who are<br /> not familiar
withthe state of the hash AM.  In the meantime, we<br /> could push forward with the idea of making hash indexes
automatically<br/> unlogged, so that recovering from a crash wouldn't be quite so messy/<br /> dangerous.<br
/></blockquote></div>Thereis as well another way: finally support WAL-logging for hash indexes.<br />-- <br /><div
class="gmail_signature">Michael<br/></div></div></div> 

Re: Let's drop two obsolete features which are bear-traps for novices

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> On 11/02/2014 11:41 AM, Tom Lane wrote:
>> Nothing that I recall at the moment, but there is certainly plenty of
>> stuff of dubious quality in there.  I'd argue that chkpass, intagg,
>> intarray, isn, spi, and xml2 are all in worse shape than the money type.

> Why are we holding on to xml2 again?

IIRC, there's some xpath-related functionality in there that's not
yet available in core (and needs some redesign before it'd ever get
accepted into core, so there's not a real quick fix to be had).
        regards, tom lane



Re: Let's drop two obsolete features which are bear-traps for novices

From
Tom Lane
Date:
Michael Paquier <michael.paquier@gmail.com> writes:
> There is as well another way: finally support WAL-logging for hash indexes.

Sure, but that's a bit easier said than done.  I think Robert looked into
that a year or two back and found some stumbling blocks that it wasn't
obvious how to surmount.  I do hope it will happen eventually, but we
should assume that it's a long-term answer not a short-term one.
        regards, tom lane



Re: Let's drop two obsolete features which are bear-traps for novices

From
Feng Tian
Date:
Hi,

This is Feng from Vitesse.   Performance different between Money and Numeric is *HUGE*.   For TPCH Q1, the performance difference is 5x for stock postgres, and ~20x for vitesse.  

Stock postgres, for my laptop, TPCH 1G, Q1, use money type ~ 9s, use Numeric (15, 2) is ~53s.  

Kevin, 
 test=# do $$ begin perform sum('10000.01'::numeric) from generate_series(1,10000000); end; $$;

This may not reflect the difference of the two data type.   One aggregate is not where most of the time is spent.  TPCH Q1 has many more computing.

















On Mon, Nov 3, 2014 at 4:54 AM, Michael Banck <michael.banck@credativ.de> wrote:
Am Sonntag, den 02.11.2014, 12:41 -0500 schrieb Tom Lane:
> BTW, after reflecting a bit more I'm less than convinced that this
> datatype is completely useless.  Even if you prefer to store currency
> values in numeric columns, casting to or from money provides a way to
> accept or emit values in whatever monetary format the LC_MONETARY locale
> setting specifies.  That seems like a useful feature, and it's one you
> could not easily duplicate using to_char/to_number (not to mention that
> those functions aren't without major shortcomings of their own).

As an additional datapoint, Vitesse Data changed the DB schema from
NUMERIC to MONEY for their TPCH benchmark for performance reasons: "The
modification to data types is easy to understand -- money and double
types are faster than Numeric (and no one on this planet has a bank
account that overflows the money type, not any time soon)."[1] And
"Replaced NUMERIC fields representing currency with MONEY"[2].

Not sure whether they modified/optimized PostgreSQL with respect to the
MONEY data type and/or how much performance that gained, so CCing CK Tan
as well.


Michael

[1] http://vitesse-timing-on.blogspot.de/2014/10/running-tpch-on-postgresql-part-1.html
[2] http://vitessedata.com/benchmark/

--
Michael Banck
Projektleiter / Berater
Tel.: +49 (2161) 4643-171
Fax:  +49 (2161) 4643-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Let's drop two obsolete features which are bear-traps for novices

From
Craig Ringer
Date:
On 11/03/2014 03:41 AM, Tom Lane wrote:
> Nothing that I recall at the moment, but there is certainly plenty of
> stuff of dubious quality in there.  I'd argue that chkpass, intagg,
> intarray, isn, spi, and xml2 are all in worse shape than the money type.

What's wrong with intarray?

I'm aware of issues with the rest (and I think citext has a few of its
own too) but I didn't think intarray had any real issues.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Let's drop two obsolete features which are bear-traps for novices

From
CK Tan
Date:
Josh,

Do you have a list of what needs to be done to keep the MONEY type?
What is wrong with it?

Thanks,
-cktan

On Mon, Nov 3, 2014 at 10:30 PM, Feng Tian <ftian@vitessedata.com> wrote:
> Hi,
>
> This is Feng from Vitesse.   Performance different between Money and Numeric
> is *HUGE*.   For TPCH Q1, the performance difference is 5x for stock
> postgres, and ~20x for vitesse.
>
> Stock postgres, for my laptop, TPCH 1G, Q1, use money type ~ 9s, use Numeric
> (15, 2) is ~53s.
>
> Kevin,
>  test=# do $$ begin perform sum('10000.01'::numeric) from
> generate_series(1,10000000); end; $$;
>
> This may not reflect the difference of the two data type.   One aggregate is
> not where most of the time is spent.  TPCH Q1 has many more computing.
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Mon, Nov 3, 2014 at 4:54 AM, Michael Banck <michael.banck@credativ.de>
> wrote:
>>
>> Am Sonntag, den 02.11.2014, 12:41 -0500 schrieb Tom Lane:
>> > BTW, after reflecting a bit more I'm less than convinced that this
>> > datatype is completely useless.  Even if you prefer to store currency
>> > values in numeric columns, casting to or from money provides a way to
>> > accept or emit values in whatever monetary format the LC_MONETARY locale
>> > setting specifies.  That seems like a useful feature, and it's one you
>> > could not easily duplicate using to_char/to_number (not to mention that
>> > those functions aren't without major shortcomings of their own).
>>
>> As an additional datapoint, Vitesse Data changed the DB schema from
>> NUMERIC to MONEY for their TPCH benchmark for performance reasons: "The
>> modification to data types is easy to understand -- money and double
>> types are faster than Numeric (and no one on this planet has a bank
>> account that overflows the money type, not any time soon)."[1] And
>> "Replaced NUMERIC fields representing currency with MONEY"[2].
>>
>> Not sure whether they modified/optimized PostgreSQL with respect to the
>> MONEY data type and/or how much performance that gained, so CCing CK Tan
>> as well.
>>
>>
>> Michael
>>
>> [1]
>> http://vitesse-timing-on.blogspot.de/2014/10/running-tpch-on-postgresql-part-1.html
>> [2] http://vitessedata.com/benchmark/
>>
>> --
>> Michael Banck
>> Projektleiter / Berater
>> Tel.: +49 (2161) 4643-171
>> Fax:  +49 (2161) 4643-100
>> Email: michael.banck@credativ.de
>>
>> credativ GmbH, HRB Mönchengladbach 12080
>> USt-ID-Nummer: DE204566209
>> Hohenzollernstr. 133, 41061 Mönchengladbach
>> Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
>>
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>
>



Re: Let's drop two obsolete features which are bear-traps for novices

From
Albe Laurenz
Date:
David Fetter wrote:
> On Tue, Nov 04, 2014 at 07:51:06AM +0900, Tatsuo Ishii wrote:
>> Just out of curiosity, why is Oracle's NUMBER (I assume you are
>> talking about this) so fast?
> 
> I suspect that what happens is that NUMBER is stored as a native type
> (int2, int4, int8, int16) that depends on its size and then cast to
> the next upward thing as needed, taking any performance hits at that
> point.  The documentation hints (38 decimal places) at a 128-bit
> internal representation as the maximum.  I don't know what happens
> when you get past what 128 bits can represent.

No, Oracle stores NUMBERs as variable length field (up to 22 bytes),
where the first byte encodes the sign and the comma position and the
remaining bytes encode the digits, each byte representing two digits
in base-100 notation (see Oracle Metalink note 1007641.6).

So it's not so different from PostgreSQL.
No idea why their arithmetic should be faster.

Yours,
Laurenz Albe

Re: Let's drop two obsolete features which are bear-traps for novices

From
"ktm@rice.edu"
Date:
On Tue, Nov 04, 2014 at 11:44:22AM +0900, Michael Paquier wrote:
> On Sun, Nov 2, 2014 at 2:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> > In the case of hash indexes, because we still have to have the hash
> > opclasses in core, there's no way that it could be pushed out as an
> > extension module even if we otherwise had full support for AMs as
> > extensions.  So what I hear you proposing is "let's break this so
> > thoroughly that it *can't* be fixed".  I'm not on board with that.
> > I think the WARNING will do just fine to discourage novices who are
> > not familiar with the state of the hash AM.  In the meantime, we
> > could push forward with the idea of making hash indexes automatically
> > unlogged, so that recovering from a crash wouldn't be quite so messy/
> > dangerous.
> >
> There is as well another way: finally support WAL-logging for hash indexes.

+1

Ken



Re: Let's drop two obsolete features which are bear-traps for novices

From
Kevin Grittner
Date:
Feng Tian <ftian@vitessedata.com> wrote:

> Performance different between Money and Numeric is *HUGE*.  For
> TPCH Q1, the performance difference is 5x for stock postgres, and
> ~20x for vitesse.
>
> Stock postgres, for my laptop, TPCH 1G, Q1, use money type ~ 9s,
> use Numeric (15, 2) is ~53s.
>
>> test=# do $$ begin perform sum('10000.01'::numeric) from generate_series(1,10000000); end; $$;
>
> This may not reflect the difference of the two data type.   One
> aggregate is not where most of the time is spent.  TPCH Q1 has
> many more computing.

That's why I gave the count(*) baseline.  If you subtract that from
the sum() timings for money and numeric, numeric takes 5x as long
as money in my quick-and-dirty benchmark. I have no problem
believing that some applications can see that level of performance
hit from using numeric instead of money.  It's a little surprising
to me to see more than a factor of five slowdown from using numeric
instead of money; it leaves me a little curious how that happens.
Perhaps that workload is more conducive to keeping those money
amounts in registers and doing register arithmetic.

In any event, I'm against removing or re-deprecating the money
type.  There are some problems with money; there are other problems
with numeric.  If the docs are failing to make the trade-offs
clear, we should fix the docs.  And we can always look at improving
either or both types.  The fact that numeric is 5x to 20x slower
than money in important applications, combined with the fact that
it performs far worse than a similar type in another product,
suggest that numeric could stand a serious optimization pass.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Let's drop two obsolete features which are bear-traps for novices

From
Andreas Karlsson
Date:
On 11/02/2014 06:41 PM, Tom Lane wrote:
> I wrote:
>> Either way, requiring a dump/reload for upgrade is surely a better answer
>> for users of the type than just summarily screwing them.
>
> BTW, after reflecting a bit more I'm less than convinced that this
> datatype is completely useless.  Even if you prefer to store currency
> values in numeric columns, casting to or from money provides a way to
> accept or emit values in whatever monetary format the LC_MONETARY locale
> setting specifies.  That seems like a useful feature, and it's one you
> could not easily duplicate using to_char/to_number (not to mention that
> those functions aren't without major shortcomings of their own).

I personally find that one of the money type's missfeatures, since the 
currency symbol does not vary just per language/locale but also for 
which currency you are working with. For databases with multiple 
currencies the output format of money is just confusing.

Andreas




Re: Let's drop two obsolete features which are bear-traps for novices

From
Merlin Moncure
Date:
On Tue, Nov 4, 2014 at 8:16 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> In any event, I'm against removing or re-deprecating the money
> type.  There are some problems with money; there are other problems
> with numeric.  If the docs are failing to make the trade-offs
> clear, we should fix the docs.  And we can always look at improving
> either or both types.  The fact that numeric is 5x to 20x slower
> than money in important applications, combined with the fact that
> it performs far worse than a similar type in another product,
> suggest that numeric could stand a serious optimization pass.

Money should stay.  It is the only fixed point integer based numeric
that comes with the database.  Should a non locale dependent fixed
point type that offers the same performance come along, then you have
a good argument to deprecate.

numeric is definitely a dog which the fundamental problem IMNSHO.  I'm
guessing optimization paths are going to revolve around utilizing
binary integer ops in cases where it's known to be safe to do so.

merlin



Re: Let's drop two obsolete features which are bear-traps for novices

From
Andrew Dunstan
Date:
On 11/03/2014 10:11 PM, Tom Lane wrote:
> Josh Berkus<josh@agliodbs.com>  writes:
>> On 11/02/2014 11:41 AM, Tom Lane wrote:
>>> Nothing that I recall at the moment, but there is certainly plenty of
>>> stuff of dubious quality in there.  I'd argue that chkpass, intagg,
>>> intarray, isn, spi, and xml2 are all in worse shape than the money type.
>> Why are we holding on to xml2 again?
> IIRC, there's some xpath-related functionality in there that's not
> yet available in core (and needs some redesign before it'd ever get
> accepted into core, so there's not a real quick fix to be had).
>
>             

Yes, xpath_table is badly broken, as Robert Haas documented and I 
expanded on a while back. See for example 
<http://www.postgresql.org/message-id/4D6BCC1E.3010406@dunslane.net> I 
don't have any time of my own to work on this any time soon. But I know 
that it has users, so just throwing it out would upset some people.

cheers

andrew




Re: Let's drop two obsolete features which are bear-traps for novices

From
Tom Lane
Date:
Craig Ringer <craig@2ndquadrant.com> writes:
> On 11/03/2014 03:41 AM, Tom Lane wrote:
>> Nothing that I recall at the moment, but there is certainly plenty of
>> stuff of dubious quality in there.  I'd argue that chkpass, intagg,
>> intarray, isn, spi, and xml2 are all in worse shape than the money type.

> What's wrong with intarray?

The single biggest practical problem with it is that it creates new
definitions of the <@ and @> operators, which cause confusion with the
core operators of those names.  For example, we've repeatedly seen bug
reports along the lines of "why does this query not use this index"
because of people trying to use the contrib @> operator with a core GIN
index or vice versa.  (In fairness, I think intarray might be older
than the core operators, but that doesn't make this less of a problem.)

Another thing that grates on me is the mostly-arbitrary restriction
to non-null-containing arrays.  That's an implementation artifact rather
than something semantically required by the operations.

More generally, it seems like a grab bag of not terribly well designed
features, and the features that do seem well designed seem like they
ought to be more generic than just for int4 arrays.  So to me it feels
like proof-of-concept experimentation rather than a production-grade
thing that we could feel good about moving into core.
        regards, tom lane



Re: Let's drop two obsolete features which are bear-traps for novices

From
Josh Berkus
Date:
On 11/04/2014 07:33 AM, Tom Lane wrote:
> More generally, it seems like a grab bag of not terribly well designed
> features, and the features that do seem well designed seem like they
> ought to be more generic than just for int4 arrays.  So to me it feels
> like proof-of-concept experimentation rather than a production-grade
> thing that we could feel good about moving into core.

Yah, as a user of intarray, I think it belongs as an external extension.Heck, I think it might get more attention as an
externalextension, and
 
if it were external then folks could fork it and create versions that
don't have operator conflicts.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Let's drop two obsolete features which are bear-traps for novices

From
David Fetter
Date:
On Tue, Nov 04, 2014 at 08:30:21AM +0000, Laurenz Albe wrote:
> David Fetter wrote:
> > On Tue, Nov 04, 2014 at 07:51:06AM +0900, Tatsuo Ishii wrote:
> >> Just out of curiosity, why is Oracle's NUMBER (I assume you are
> >> talking about this) so fast?
> > 
> > I suspect that what happens is that NUMBER is stored as a native
> > type (int2, int4, int8, int16) that depends on its size and then
> > cast to the next upward thing as needed, taking any performance
> > hits at that point.  The documentation hints (38 decimal places)
> > at a 128-bit internal representation as the maximum.  I don't know
> > what happens when you get past what 128 bits can represent.
> 
> No, Oracle stores NUMBERs as variable length field (up to 22 bytes),
> where the first byte encodes the sign and the comma position and the
> remaining bytes encode the digits, each byte representing two digits
> in base-100 notation (see Oracle Metalink note 1007641.6).

Thanks for clearing that up, and sorry for spreading misinformed
guesses.

> So it's not so different from PostgreSQL.
> No idea why their arithmetic should be faster.

I have an idea, but this time, I think it's right.  They have at least
one team of people whose job it is to make sure that it is fast.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate