Thread: logical column ordering

logical column ordering

From
Alvaro Herrera
Date:
So I've been updating my very old patch to allow logical and physical
column reordering.  Here's a WIP first cut for examination.  There are
plenty of rough edges here; most importantly there is no UI at all for
column reordering other than direct UPDATEs of pg_attribute, which most
likely falls afoul of cache invalidation problems.  For now I'm focusing
on correct processing when columns are moved logically; I haven't yet
started to see how to move columns physically, but I think that part is
much more localized than the logical one.

Just as a reminder, this effort is an implementation of ideas that have
been discussed previously; in particular, see these threads:

http://www.postgresql.org/message-id/20414.1166719407@sss.pgh.pa.us (2006)
http://www.postgresql.org/message-id/6843.1172126270@sss.pgh.pa.us (2007)
http://www.postgresql.org/message-id/23035.1227659434@sss.pgh.pa.us (2008)

To recap, this is based on the idea of having three numbers for each
attribute rather than a single attnum; the first of these is attnum (a
number that uniquely identifies an attribute since its inception and may
or may not have any relationship to its storage position and the place
it expands to through user interaction).  The second is attphysnum,
which indicates where it is stored in the physical structure.  The third
is attlognum, which indicates where it expands in "*", where must its
values be placed in COPY or VALUES lists, etc --- the logical position
as the user sees it.

The first thing where this matters is tuple descriptor expansion in
parse analysis; at this stage, things such as "*" (in "select *") are
turned into a target list, which must be sorted according to attlognum.
To achieve this I added a new routine to tupledescs,
TupleDescGetSortedAttrs() which computes a new Attribute array and
caches it in the TupleDesc for later uses; this array points to the
same elements in the normal attribute list but is order by attlognum.

Additionally there are a number of places that iterate on such target
lists and use the iterator as the attribute number; those were modified
to have a separate attribute number as attnum within the loop.

Another place that needs tweaking is heapam.c, which must construct a
physical tuple from Datum/nulls arrays (heap_form_tuple).  In some cases
the input arrays are sorted in logical column order.  I have opted to
add a flag that indicates whether the array is in logical order; if it
is the routines compute the correct physical order.  (Actually as I
mentioned above I still haven't made any effort to make sure they work
in the case that attnum differs from attphysnum, but this should be
reasonably contained changes.)


The part where I stopped just before sending the current state is this
error message:

alvherre=# select * from quux where (a,c) in ( select a,c from quux );
select * from quux where (a,c) in ( select a,c from quux );
ERROR:  failed to find unique expression in subplan tlist

I'm going to see about it while I get feedback on the rest of this patch; in
particular, extra test cases that fail to work when columns have been
moved around are welcome, so that I can add them to the regress test.
What I have now is the basics I'm building as I go along.  The
regression tests show examples of some logical column renumbering (which
can be done after the table already contains some data) but none of
physical column renumbering (which can only be done when the table is
completely empty.)  My hunch is that the sample foo, bar, baz, quux
tables should present plenty of opportunities to display brokenness in
the planner and executor.


PS: Phil Currier allegedly had a patch back in 2007-2008 that did this,
or something very similar ... though he never posted a single bit of it,
and then he vanished without a trace.  If he's still available it would
be nice to see his WIP patch, even if outdated, as it might serve as
inspiration and let us know what other places need tweaking.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: logical column ordering

From
Jim Nasby
Date:
On 12/9/14, 11:41 AM, Alvaro Herrera wrote:
> I'm going to see about it while I get feedback on the rest of this patch; in
> particular, extra test cases that fail to work when columns have been
> moved around are welcome, so that I can add them to the regress test.
> What I have now is the basics I'm building as I go along.  The
> regression tests show examples of some logical column renumbering (which
> can be done after the table already contains some data) but none of
> physical column renumbering (which can only be done when the table is
> completely empty.)  My hunch is that the sample foo, bar, baz, quux
> tables should present plenty of opportunities to display brokenness in
> the planner and executor.

The ideal case would be to do something like randomizing logical and physical ordering as tables are created throughout
theentire test suite (presumably as an option). That should work for physical ordering, but presumably it would
pointlesslyblow up on logical ordering because the expected output is hard-coded.
 

Perhaps instead of randomizing logical ordering we could force that to be the same ordering in which fields were
suppliedand actually randomize attnum?
 

In particular, I'm thinking that in DefineRelation we can randomize stmt->tableElts before merging in inheritance
attributes.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: logical column ordering

From
Josh Berkus
Date:
On 12/09/2014 09:41 AM, Alvaro Herrera wrote:
> The first thing where this matters is tuple descriptor expansion in
> parse analysis; at this stage, things such as "*" (in "select *") are
> turned into a target list, which must be sorted according to attlognum.
> To achieve this I added a new routine to tupledescs,

The two other major cases are:

INSERT INTO table SELECT|VALUES ...

COPY table FROM|TO ...

... although copy should just be a subclass of SELECT.

Question on COPY, though: there's reasons why people would want COPY to
dump in either physical or logical order.  If you're doing COPY to
create CSV files for output, then you want the columns in logical order.If you're doing COPY for pg_dump, then you want
themin physical order
 
for faster dump/reload.  So we're almost certainly going to need to have
an option for COPY.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: logical column ordering

From
Andrew Dunstan
Date:
On 12/09/2014 06:19 PM, Josh Berkus wrote:
> On 12/09/2014 09:41 AM, Alvaro Herrera wrote:
>> The first thing where this matters is tuple descriptor expansion in
>> parse analysis; at this stage, things such as "*" (in "select *") are
>> turned into a target list, which must be sorted according to attlognum.
>> To achieve this I added a new routine to tupledescs,
> The two other major cases are:
>
> INSERT INTO table SELECT|VALUES ...
>
> COPY table FROM|TO ...
>
> ... although copy should just be a subclass of SELECT.
>
> Question on COPY, though: there's reasons why people would want COPY to
> dump in either physical or logical order.  If you're doing COPY to
> create CSV files for output, then you want the columns in logical order.
>   If you're doing COPY for pg_dump, then you want them in physical order
> for faster dump/reload.  So we're almost certainly going to need to have
> an option for COPY.
>
>


I seriously doubt it, although I could be wrong. Unless someone can show 
a significant performance gain from using physical order, which would be 
a bit of a surprise to me, I would just stick with logical ordering as 
the default.

cheers

andrew




Re: logical column ordering

From
Alvaro Herrera
Date:
Andrew Dunstan wrote:
> 
> On 12/09/2014 06:19 PM, Josh Berkus wrote:
> >On 12/09/2014 09:41 AM, Alvaro Herrera wrote:
> >>The first thing where this matters is tuple descriptor expansion in
> >>parse analysis; at this stage, things such as "*" (in "select *") are
> >>turned into a target list, which must be sorted according to attlognum.
> >>To achieve this I added a new routine to tupledescs,
> >The two other major cases are:
> >
> >INSERT INTO table SELECT|VALUES ...
> >
> >COPY table FROM|TO ...

Yes, both are covered.

> >... although copy should just be a subclass of SELECT.

It is not.  There's one part of COPY that goes through SELECT
processing, but only when the "table" being copied is a subselect.
Normal COPY does not use the same code path.

> >Question on COPY, though: there's reasons why people would want COPY to
> >dump in either physical or logical order.  If you're doing COPY to
> >create CSV files for output, then you want the columns in logical order.
> >  If you're doing COPY for pg_dump, then you want them in physical order
> >for faster dump/reload.  So we're almost certainly going to need to have
> >an option for COPY.
> 
> I seriously doubt it, although I could be wrong. Unless someone can show a
> significant performance gain from using physical order, which would be a bit
> of a surprise to me, I would just stick with logical ordering as the
> default.

Well, we have an optimization that avoids a projection step IIRC by
using the "physical tlist" instead of having to build a tailored one.  I
guess the reason that's there is because somebody did measure an
improvement.  Maybe it *is* worth having as an option for pg_dump ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Question on COPY, though: there's reasons why people would want COPY to
> dump in either physical or logical order.  If you're doing COPY to
> create CSV files for output, then you want the columns in logical order.
>  If you're doing COPY for pg_dump, then you want them in physical order
> for faster dump/reload.  So we're almost certainly going to need to have
> an option for COPY.

This is complete nonsense, Josh, or at least it is until you can provide
some solid evidence to believe that column ordering would make any
noticeable performance difference in COPY.  I know of no reason to believe
that the existing user-defined-column-ordering option makes any difference.
        regards, tom lane



Re: logical column ordering

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Andrew Dunstan wrote:
>> I seriously doubt it, although I could be wrong. Unless someone can show a
>> significant performance gain from using physical order, which would be a bit
>> of a surprise to me, I would just stick with logical ordering as the
>> default.

> Well, we have an optimization that avoids a projection step IIRC by
> using the "physical tlist" instead of having to build a tailored one.  I
> guess the reason that's there is because somebody did measure an
> improvement.  Maybe it *is* worth having as an option for pg_dump ...

The physical tlist thing is there because it's demonstrable that
ExecProject() takes nonzero time.  COPY does not go through ExecProject
though.  What's more, it already has code to deal with a user-specified
column order, and nobody's ever claimed that that code imposes a
measurable performance overhead.
        regards, tom lane



Re: logical column ordering

From
Robert Haas
Date:
On Wed, Dec 10, 2014 at 12:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Andrew Dunstan wrote:
>>> I seriously doubt it, although I could be wrong. Unless someone can show a
>>> significant performance gain from using physical order, which would be a bit
>>> of a surprise to me, I would just stick with logical ordering as the
>>> default.
>
>> Well, we have an optimization that avoids a projection step IIRC by
>> using the "physical tlist" instead of having to build a tailored one.  I
>> guess the reason that's there is because somebody did measure an
>> improvement.  Maybe it *is* worth having as an option for pg_dump ...
>
> The physical tlist thing is there because it's demonstrable that
> ExecProject() takes nonzero time.  COPY does not go through ExecProject
> though.  What's more, it already has code to deal with a user-specified
> column order, and nobody's ever claimed that that code imposes a
> measurable performance overhead.

Also, if we're adding options to use the physical rather than the
logical column ordering in too many places, that's probably a sign
that we need to rethink this whole concept.  The concept of a logical
column ordering doesn't have much meaning if you're constantly forced
to fall back to some other column ordering whenever you want good
performance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: logical column ordering

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Wed, Dec 10, 2014 at 12:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> >> Andrew Dunstan wrote:
> >>> I seriously doubt it, although I could be wrong. Unless someone can show a
> >>> significant performance gain from using physical order, which would be a bit
> >>> of a surprise to me, I would just stick with logical ordering as the
> >>> default.
> >
> >> Well, we have an optimization that avoids a projection step IIRC by
> >> using the "physical tlist" instead of having to build a tailored one.  I
> >> guess the reason that's there is because somebody did measure an
> >> improvement.  Maybe it *is* worth having as an option for pg_dump ...
> >
> > The physical tlist thing is there because it's demonstrable that
> > ExecProject() takes nonzero time.  COPY does not go through ExecProject
> > though.  What's more, it already has code to deal with a user-specified
> > column order, and nobody's ever claimed that that code imposes a
> > measurable performance overhead.
> 
> Also, if we're adding options to use the physical rather than the
> logical column ordering in too many places, that's probably a sign
> that we need to rethink this whole concept.  The concept of a logical
> column ordering doesn't have much meaning if you're constantly forced
> to fall back to some other column ordering whenever you want good
> performance.

FWIW I have no intention to add options for physical/logical ordering
anywhere.  All users will see is that tables will follow the same
(logical) order everywhere.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Robert Haas
Date:
On Wed, Dec 10, 2014 at 9:25 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> FWIW I have no intention to add options for physical/logical ordering
> anywhere.  All users will see is that tables will follow the same
> (logical) order everywhere.

Just to be clear, I wasn't in any way attending to say that the patch
had a problem in this area.  I was just expressing concern about the
apparent rush to judgement on whether converting between physical and
logical column ordering would be expensive.  I certainly think that's
something that we should test - for example, we might want to consider
whether there are cases where you could maybe convince the executor to
spend a lot of time pointlessly reorganizing tuples in ways that
wouldn't happen today.  But I have no particular reason to think that
any issues we hit there issues won't be solvable.

To the extent that I have any concern about the patch at this point,
it's around stability.  I would awfully rather see something like this
get committed at the beginning of a development cycle than the end.
It's quite possible that I'm being more nervous than is justified, but
given that we're *still* fixing bugs related to dropped-column
handling (cf. 9b35ddce93a2ef336498baa15581b9d10f01db9c from July of
this year) which was added in July 2002, maybe not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: logical column ordering

From
Stephen Frost
Date:
Robert, all,

* Robert Haas (robertmhaas@gmail.com) wrote:
> To the extent that I have any concern about the patch at this point,
> it's around stability.  I would awfully rather see something like this
> get committed at the beginning of a development cycle than the end.

I tend to agree with this; we have a pretty bad habit of bouncing
around big patches until the end and then committing them.  That's not
as bad when the patch has been getting reviews and feedback over a few
months (or years) but this particular patch isn't even code-complete at
this point, aiui.

> It's quite possible that I'm being more nervous than is justified, but
> given that we're *still* fixing bugs related to dropped-column
> handling (cf. 9b35ddce93a2ef336498baa15581b9d10f01db9c from July of
> this year) which was added in July 2002, maybe not.

I'm not quite sure that I see how that's relevant.  Bugs will happen,
unfortunately, no matter how much review is done of a given patch.  That
isn't to say that we shouldn't do any review, but it's a trade-off.
This change, at least, strikes me as less likely to have subtle bugs
in it as compared to the dropped column case.
Thanks,
    Stephen

Re: logical column ordering

From
Robert Haas
Date:
On Wed, Dec 10, 2014 at 11:22 AM, Stephen Frost <sfrost@snowman.net> wrote:
> I'm not quite sure that I see how that's relevant.  Bugs will happen,
> unfortunately, no matter how much review is done of a given patch.  That
> isn't to say that we shouldn't do any review, but it's a trade-off.
> This change, at least, strikes me as less likely to have subtle bugs
> in it as compared to the dropped column case.

Yeah, that's possible.  They seem similar to me because they both
introduce new ways for the tuple as it is stored on disk to be
different from what must be shown to the user.  But I don't really
know how well that translates to what needs to be changed on a code
level.  If we're basically going back over all the same places that
needed special handling for attisdropped, then the likelihood of bugs
may be quite a bit lower than it was for that patch, because now we
know (mostly!) which places require attisdropped handling and we can
audit them all to make sure they handle this, too.  But if it's a
completely different set of places that need to be touched, then I
think there's a lively possibility for bugs of omission.

Ultimately, I think this is mostly going to be a question of what
Alvaro feels comfortable with; he's presumably going to have a better
sense of where and to what extent there might be bugs lurking than any
of the rest of us.  But the point is worth considering, because I
think we would probably all agree that having a release that is stable
and usable right out of the gate is more important than having any
single feature get into that release.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: logical column ordering

From
Andres Freund
Date:
On 2014-12-10 12:06:11 -0500, Robert Haas wrote:
> Ultimately, I think this is mostly going to be a question of what
> Alvaro feels comfortable with; he's presumably going to have a better
> sense of where and to what extent there might be bugs lurking than any
> of the rest of us.  But the point is worth considering, because I
> think we would probably all agree that having a release that is stable
> and usable right out of the gate is more important than having any
> single feature get into that release.

Sure, 9.4 needs to be out of the gate. I don't think anybody doubts
that.

But the scheduling of commits with regard to the 9.5 schedule actually
opens a relevant question: When are we planning to release 9.5? Because
If we try ~ one year from now it's a whole different ballgame than if we
try to go back to september. And I think there's pretty good arguments
for both.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: logical column ordering

From
Stephen Frost
Date:
* Andres Freund (andres@2ndquadrant.com) wrote:
> But the scheduling of commits with regard to the 9.5 schedule actually
> opens a relevant question: When are we planning to release 9.5? Because
> If we try ~ one year from now it's a whole different ballgame than if we
> try to go back to september. And I think there's pretty good arguments
> for both.

This should really be on its own thread for discussion...  I'm leaning,
at the moment at least, towards the September release schedule.  I agree
that having a later release would allow us to get more into it, but
there's a lot to be said for the consistency we've kept up over the past
few years with a September (our last non-September release was 8.4).

I'd certainly vote against planning for a mid-December release as, at
least in my experience, it's a bad idea to try and do December (or
January 1..) major releases.  October might work, but that's not much of
a change from September.  Late January or February would probably work
but that's quite a shift from September and don't think it'd be
particularly better.  Worse, I'm nervous we might get into a habit of
longer and longer releases.  Having yearly releases, imv at least, is
really good for the project and those who depend on it.  New features
are available pretty quickly to end-users and people can plan around our
schedule pretty easily (eg- plan to do DB upgrades in January/February).
Thanks,
    Stephen

Re: logical column ordering

From
Josh Berkus
Date:
On 12/10/2014 05:14 PM, Stephen Frost wrote:
> * Andres Freund (andres@2ndquadrant.com) wrote:
>> > But the scheduling of commits with regard to the 9.5 schedule actually
>> > opens a relevant question: When are we planning to release 9.5? Because
>> > If we try ~ one year from now it's a whole different ballgame than if we
>> > try to go back to september. And I think there's pretty good arguments
>> > for both.
> This should really be on its own thread for discussion...  I'm leaning,
> at the moment at least, towards the September release schedule.  I agree
> that having a later release would allow us to get more into it, but
> there's a lot to be said for the consistency we've kept up over the past
> few years with a September (our last non-September release was 8.4).

Can we please NOT discuss this in the thread about someone's patch?  Thanks.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: logical column ordering

From
Josh Berkus
Date:
On 12/09/2014 09:11 PM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> Question on COPY, though: there's reasons why people would want COPY to
>> dump in either physical or logical order.  If you're doing COPY to
>> create CSV files for output, then you want the columns in logical order.
>>  If you're doing COPY for pg_dump, then you want them in physical order
>> for faster dump/reload.  So we're almost certainly going to need to have
>> an option for COPY.
> 
> This is complete nonsense, Josh, or at least it is until you can provide
> some solid evidence to believe that column ordering would make any
> noticeable performance difference in COPY.  I know of no reason to believe
> that the existing user-defined-column-ordering option makes any difference.


Chill, dude, chill.  When we have a patch, I'll do some performance
testing, and we'll see if it's something we care about or not. There's
no reason to be belligerent about it.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



9.5 release scheduling (was Re: logical column ordering)

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> On 12/10/2014 05:14 PM, Stephen Frost wrote:
>> * Andres Freund (andres@2ndquadrant.com) wrote:
>>> But the scheduling of commits with regard to the 9.5 schedule actually
>>> opens a relevant question: When are we planning to release 9.5? Because
>>> If we try ~ one year from now it's a whole different ballgame than if we
>>> try to go back to september. And I think there's pretty good arguments
>>> for both.

>> This should really be on its own thread for discussion...  I'm leaning,
>> at the moment at least, towards the September release schedule.  I agree
>> that having a later release would allow us to get more into it, but
>> there's a lot to be said for the consistency we've kept up over the past
>> few years with a September (our last non-September release was 8.4).

> Can we please NOT discuss this in the thread about someone's patch?  Thanks.

Quite.  So, here's a new thread.

MHO is that, although 9.4 has slipped more than any of us would like,
9.5 development launched right on time in August.  So I don't see a
good reason to postpone 9.5 release just because 9.4 has slipped.
I think we should stick to the schedule agreed to in Ottawa last spring.

Comments?
        regards, tom lane



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Josh Berkus
Date:
On 12/10/2014 09:35 PM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> On 12/10/2014 05:14 PM, Stephen Frost wrote:
>>> * Andres Freund (andres@2ndquadrant.com) wrote:
>>>> But the scheduling of commits with regard to the 9.5 schedule actually
>>>> opens a relevant question: When are we planning to release 9.5? Because
>>>> If we try ~ one year from now it's a whole different ballgame than if we
>>>> try to go back to september. And I think there's pretty good arguments
>>>> for both.
> 
>>> This should really be on its own thread for discussion...  I'm leaning,
>>> at the moment at least, towards the September release schedule.  I agree
>>> that having a later release would allow us to get more into it, but
>>> there's a lot to be said for the consistency we've kept up over the past
>>> few years with a September (our last non-September release was 8.4).
> 
>> Can we please NOT discuss this in the thread about someone's patch?  Thanks.
> 
> Quite.  So, here's a new thread.
> 
> MHO is that, although 9.4 has slipped more than any of us would like,
> 9.5 development launched right on time in August.  So I don't see a
> good reason to postpone 9.5 release just because 9.4 has slipped.
> I think we should stick to the schedule agreed to in Ottawa last spring.
> 
> Comments?

If anything, it seems like a great reason to try to get 9.5 out BEFORE
we open the tree for 9.6/10.0/Cloud++.  While there were technical
issues, 9.4 dragged a considerable amount because most people were
ignoring it in favor of 9.5 development.

So far, I haven't seen any features for 9.5 which would delay a more
timely release the way we did for 9.4.  Anybody know of a bombshell
someone's going to drop on us for CF5?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Peter Geoghegan
Date:

On Wed, Dec 10, 2014 at 10:18 PM, Josh Berkus <josh@agliodbs.com> wrote:
So far, I haven't seen any features for 9.5 which would delay a more
timely release the way we did for 9.4.  Anybody know of a bombshell
someone's going to drop on us for CF5?

I had wondered about that myself. What about jsquery? Is that something that is planned for submission some time in the current cycle?

FWIW, I strongly doubt that I'll find time to work on anything like that for 9.5.

--
Peter Geoghegan

Re: 9.5 release scheduling (was Re: logical column ordering)

From
Robert Haas
Date:
On Thu, Dec 11, 2014 at 12:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Quite.  So, here's a new thread.
>
> MHO is that, although 9.4 has slipped more than any of us would like,
> 9.5 development launched right on time in August.  So I don't see a
> good reason to postpone 9.5 release just because 9.4 has slipped.
> I think we should stick to the schedule agreed to in Ottawa last spring.
>
> Comments?

I'm fine with that, but in the spirit of playing the devil's advocate:

1. At the development meeting, Simon argued for the 5CF schedule for
this release, with CF5 not starting until February, as a way of making
sure that there was time after the release of 9.4 to get feedback from
actual users in time to do something about it for 9.5.  If anything,
we're going to end up being significantly worse off in that regard
than we would have been, because we're releasing in late December
instead of early September; an extra month of time to get patches in
does not make up for a release that was delayed nearly three months.

2. It's not clear that we're going to have a particularly-impressive
list of major features for 9.5.  So far we've got RLS and BRIN. I
expect that GROUPING SETS is far enough along that it should be
possible to get it in before development ends, and there are a few
performance patches pending (Andres's lwlock scalability patches,
Rahila's work on compressing full-page writes) that I think will
probably make the grade.  But after that it seems to me that it gets
pretty thin on the ground.  Are we going to bill commit timestamp
tracking - with replication node ID tracking as the real goal, despite
the name - as a major feature, or DDL deparsing if that goes in, as
major features?  As useful as they may be for BDR, they don't strike
me as things we can publicize as major features independent of BDR.
And it's getting awfully late for any other major work that people are
thinking of to start showing up.

Now, against all that, if we don't get back on our usual release
schedule then (a) it will look like we're losing momentum, which I'm
actually afraid may be true rather than merely a perception, and (b)
people whose stuff did get in will have to wait longer to see it
released.  So, I'm not sure waiting is any better.  But there are
certainly some things not to like about where we are.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Robert Haas
Date:
On Thu, Dec 11, 2014 at 1:18 AM, Josh Berkus <josh@agliodbs.com> wrote:
> While there were technical
> issues, 9.4 dragged a considerable amount because most people were
> ignoring it in favor of 9.5 development.

I think 9.4 dragged almost entirely because of one issue: the
compressibility of JSONB.  And it became pretty clear early on in that
discussion that it was not going to be resolved until Tom signed off
on some proposed fix.  Which I think points to the issue Bruce
mentioned about how busy many of our key contributors are.  I could
have easily spent four times as much time doing reviewing and
committing over the last few months, but I'm not willing to work an 80
or 100 hour week to make that happen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Dec 11, 2014 at 1:18 AM, Josh Berkus <josh@agliodbs.com> wrote:
>> While there were technical
>> issues, 9.4 dragged a considerable amount because most people were
>> ignoring it in favor of 9.5 development.

> I think 9.4 dragged almost entirely because of one issue: the
> compressibility of JSONB.

Meh.  While we certainly weren't very speedy about resolving that,
I don't think that issue deserves all or even most of the blame.
I agree with Josh: the problem really was that people were not focusing
on getting 9.4 tested and releasable.  One way in which that lack of
focus manifested was not having any urgency about resolving JSONB ...
but it struck me as a systemic problem and not that specific issue's
fault.

I'd finger two underlying issues here:

1. As Bruce points out in a nearby thread, we've been in commitfest mode
more or less continuously since August.  That inherently sucks away
developer mindshare from testing already-committed stuff.

2. The amount of pre-release testing we get from people outside the
hard-core development crowd seems to be continuing to decrease.
We were fortunate that somebody found the JSONB issue before it was
too late to do anything about it.  Personally, I'm very worried that
there are other such bugs in 9.4.  But I've given up hoping that any
more testing will happen until we put out something that calls itself
9.4.0, which is why I voted to release in the core discussion about it.

I don't know what to do about either of those things, but I predict
future release cycles will be just as bad unless we can fix them.
        regards, tom lane



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Bruce Momjian
Date:
On Thu, Dec 11, 2014 at 10:37:32AM -0500, Robert Haas wrote:
> 2. It's not clear that we're going to have a particularly-impressive
> list of major features for 9.5.  So far we've got RLS and BRIN. I
> expect that GROUPING SETS is far enough along that it should be
> possible to get it in before development ends, and there are a few
> performance patches pending (Andres's lwlock scalability patches,
> Rahila's work on compressing full-page writes) that I think will
> probably make the grade.  But after that it seems to me that it gets
> pretty thin on the ground.  Are we going to bill commit timestamp
> tracking - with replication node ID tracking as the real goal, despite
> the name - as a major feature, or DDL deparsing if that goes in, as
> major features?  As useful as they may be for BDR, they don't strike
> me as things we can publicize as major features independent of BDR.
> And it's getting awfully late for any other major work that people are
> thinking of to start showing up.

How bad is the 9.5 feature list going to be compared to the 9.4 one that
had JSONB, but also a lot of infrastructure additions.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> On Thu, Dec 11, 2014 at 10:37:32AM -0500, Robert Haas wrote:
>> 2. It's not clear that we're going to have a particularly-impressive
>> list of major features for 9.5.

> How bad is the 9.5 feature list going to be compared to the 9.4 one that
> had JSONB, but also a lot of infrastructure additions.

Well, whatever the list ends up being, "let's wait until we have some
more features" isn't a tenable scheduling policy.  We learned years ago
the folly of delaying a release until not-quite-ready feature X was ready.
Are we going to delay 9.5 until not-even-proposed-yet features are ready?

More abstractly, there's a lot of value in having a predictable release
schedule.  That's going to mean that some release cycles are thin on
user-visible features, even if just as much work went into them.  It's
the nature of the game.
        regards, tom lane



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Robert Haas
Date:
On Thu, Dec 11, 2014 at 11:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think 9.4 dragged almost entirely because of one issue: the
>> compressibility of JSONB.
>
> Meh.  While we certainly weren't very speedy about resolving that,
> I don't think that issue deserves all or even most of the blame.
> I agree with Josh: the problem really was that people were not focusing
> on getting 9.4 tested and releasable.  One way in which that lack of
> focus manifested was not having any urgency about resolving JSONB ...
> but it struck me as a systemic problem and not that specific issue's
> fault.
>
> I'd finger two underlying issues here:
>
> 1. As Bruce points out in a nearby thread, we've been in commitfest mode
> more or less continuously since August.  That inherently sucks away
> developer mindshare from testing already-committed stuff.

The problem is that, on the one hand, we have a number of serious
problems with things that got committed and turned out to have
problems - the multixact stuff, and JSONB, in particular - and on the
other hand, we are lacking in adequate committer bandwidth to properly
handle all of the new patches that come in.  We can fix the first
problem by tightening up on the requirements for committing things,
but that exacerbates the second problem.  Or we can fix the second
problem by loosening up on the requirements for commit, but that
exacerbates the first problem.  Promoting more or fewer committers is
really the same trade-off: if you're very careful about who you
promote, you'll get better people but not as many of them, so less
will get done but with fewer mistakes; if you're more generous in
handing out commit bits, you reduce the bottleneck to stuff getting
done but, inevitably, you'll be trusting people in whom you have at
least slightly less confidence.  There's an inherent tension between
quality and rate of progress that we can't get rid of, and the fact
that some of our best people are busier than ever with things other
than PostgreSQL hacking is not helping - not only because less actual
review/commit happens, but because newcomers to the community don't
have as much contact with the more senior people who could help mentor
them if they only had the time.

> 2. The amount of pre-release testing we get from people outside the
> hard-core development crowd seems to be continuing to decrease.
> We were fortunate that somebody found the JSONB issue before it was
> too late to do anything about it.  Personally, I'm very worried that
> there are other such bugs in 9.4.  But I've given up hoping that any
> more testing will happen until we put out something that calls itself
> 9.4.0, which is why I voted to release in the core discussion about it.
>
> I don't know what to do about either of those things, but I predict
> future release cycles will be just as bad unless we can fix them.

I agree.  We have had a few people, Jeff Janes perhaps foremost among
them, who have done a lot of really useful testing, but overall it
does feel pretty thin.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: 9.5 release scheduling (was Re: logical column ordering)

From
David G Johnston
Date:
Tom Lane-2 wrote
> Robert Haas <

> robertmhaas@

> > writes:
>> On Thu, Dec 11, 2014 at 1:18 AM, Josh Berkus <

> josh@

> > wrote:
>>> While there were technical
>>> issues, 9.4 dragged a considerable amount because most people were
>>> ignoring it in favor of 9.5 development.
> 
>> I think 9.4 dragged almost entirely because of one issue: the
>> compressibility of JSONB.
> 
> 2. The amount of pre-release testing we get from people outside the
> hard-core development crowd seems to be continuing to decrease.
> We were fortunate that somebody found the JSONB issue before it was
> too late to do anything about it.  Personally, I'm very worried that
> there are other such bugs in 9.4.  But I've given up hoping that any
> more testing will happen until we put out something that calls itself
> 9.4.0, which is why I voted to release in the core discussion about it.

The compressibility properties of a new type seem like something that should
be mandated before it is committed - it shouldn't require good fortune that



--
View this message in context: http://postgresql.nabble.com/logical-column-ordering-tp5829775p5830115.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: 9.5 release scheduling (was Re: logical column ordering)

From
David G Johnston
Date:
David G Johnston wrote
> 
> Tom Lane-2 wrote
>> Robert Haas <

>> robertmhaas@

>> > writes:
>>> On Thu, Dec 11, 2014 at 1:18 AM, Josh Berkus <

>> josh@

>> > wrote:
>>>> While there were technical
>>>> issues, 9.4 dragged a considerable amount because most people were
>>>> ignoring it in favor of 9.5 development.
>> 
>>> I think 9.4 dragged almost entirely because of one issue: the
>>> compressibility of JSONB.
>> 
>> 2. The amount of pre-release testing we get from people outside the
>> hard-core development crowd seems to be continuing to decrease.
>> We were fortunate that somebody found the JSONB issue before it was
>> too late to do anything about it.  Personally, I'm very worried that
>> there are other such bugs in 9.4.  But I've given up hoping that any
>> more testing will happen until we put out something that calls itself
>> 9.4.0, which is why I voted to release in the core discussion about it.
> The compressibility properties of a new type seem like something that
> should be mandated before it is committed - it shouldn't require good
> fortune that

... someone thought to test it out.  Is there anywhere to effectively add
that to maximize the chance someone remembers for next time?

3. Effort and brain power was also diverted to fixing 9.3 this time around.

I don't have any answers but if a release is thin on features but thick on
review and testing I wouldn't complain.  That testing, especially applied to
existing releases, would have considerable benefit to those still using
older supported releases instead of only benefitting those who can upgrade
to the latest and greatest.  An existing user on an older release is just,
if not more, important than the potential user who is looking for new
features before deciding to migrate or begin using PostgreSQL.

David J.




--
View this message in context: http://postgresql.nabble.com/logical-column-ordering-tp5829775p5830116.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Heikki Linnakangas
Date:
On 12/11/2014 06:59 PM, Robert Haas wrote:
> On Thu, Dec 11, 2014 at 11:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I think 9.4 dragged almost entirely because of one issue: the
>>> compressibility of JSONB.
>>
>> Meh.  While we certainly weren't very speedy about resolving that,
>> I don't think that issue deserves all or even most of the blame.
>> I agree with Josh: the problem really was that people were not focusing
>> on getting 9.4 tested and releasable.  One way in which that lack of
>> focus manifested was not having any urgency about resolving JSONB ...
>> but it struck me as a systemic problem and not that specific issue's
>> fault.
>>
>> I'd finger two underlying issues here:
>>
>> 1. As Bruce points out in a nearby thread, we've been in commitfest mode
>> more or less continuously since August.  That inherently sucks away
>> developer mindshare from testing already-committed stuff.
>
> The problem is that, on the one hand, we have a number of serious
> problems with things that got committed and turned out to have
> problems - the multixact stuff, and JSONB, in particular - and on the
> other hand, we are lacking in adequate committer bandwidth to properly
> handle all of the new patches that come in.  We can fix the first
> problem by tightening up on the requirements for committing things,
> but that exacerbates the second problem.  Or we can fix the second
> problem by loosening up on the requirements for commit, but that
> exacerbates the first problem.

There is a third option: reject more patches, more quickly, with less 
discussion. IOW, handle new patches "less properly".

The commitfest is good at making sure that no patch completely falls off 
the radar. That's also a problem. Before the commitfest process, many 
patches were not actively rejected, but they just fell to the side if no 
committer was interested enough in it to pick it up, review, and commit.

There are a lot of patches in the October commitfest that I personally 
don't care about, and if I was a dictator I would just drop as "not 
worth the trouble to review". Often a patch just doesn't feel quite 
right, or would require some work to clean up, but you can't immediately 
point to anything outright wrong with it. It takes some effort to review 
such a patch enough to give feedback on it, if you want more meaningful 
feedback than "This smells bad".

I imagine that it's the same for everyone else. Many of the patches that 
sit in the commitfest for weeks are patches that no-one really cares 
much about. I'm not sure what to do about that. It would be harsh to 
reject a patch just because no-one's gotten around to reviewing it, and 
if we start doing that, it's not clear what the point of a commitfest is 
any more.

Perhaps we should change the process so that it is the patch author's 
responsibility to find a reviewer, and a committer, for the patch. If 
you can't cajole anyone to review your patch, it's a sign that no-one 
cares about it, and the patch is rejected by default. Or put a quick 
+1/-1 voting option to each patch in the commitfest, to get a quick 
gauge of how the committers feel about it.

- Heikki




Re: 9.5 release scheduling (was Re: logical column ordering)

From
Peter Geoghegan
Date:
On Thu, Dec 11, 2014 at 7:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> 2. It's not clear that we're going to have a particularly-impressive
> list of major features for 9.5.  So far we've got RLS and BRIN. I
> expect that GROUPING SETS is far enough along that it should be
> possible to get it in before development ends, and there are a few
> performance patches pending (Andres's lwlock scalability patches,
> Rahila's work on compressing full-page writes) that I think will
> probably make the grade.  But after that it seems to me that it gets
> pretty thin on the ground.

I'm slightly surprised that you didn't mention abbreviated keys in
that list of performance features. You're reviewing that patch; how do
you feel about it now?

> Are we going to bill commit timestamp
> tracking - with replication node ID tracking as the real goal, despite
> the name - as a major feature, or DDL deparsing if that goes in, as
> major features?  As useful as they may be for BDR, they don't strike
> me as things we can publicize as major features independent of BDR.
> And it's getting awfully late for any other major work that people are
> thinking of to start showing up.

Version 1.0 of INSERT ... ON CONFLICT UPDATE was posted in August -
when development launched. It still doesn't have a reviewer, and it
isn't actually in evidence that someone else has so much as downloaded
and applied the patch (I'm sure someone has done that much, but the
fact is that all the feedback that I've received this year concerns
the semantics/syntax, which you can form an opinion on by just looking
at the extensive documentation and other supplementary material I've
written). It's consistently one of the most requested features, and
yet major aspects of the design, that permeate through every major
subsystem go unremarked on for months now. This feature is
*definitely* major feature list material, since people have been
loudly requesting it for over a decade, and yet no one mentions it in
this thread (only Bruce mentioned it in the other thread about the
effectiveness of the CF process). It's definitely in the top 2 or 3
most requested features, alongside much harder problems like parallel
query and comprehensive partitioning support.

If there is a lesson here for people that are working on major
features, or me personally, I don't know what it is -- if anyone else
knows, please tell me. I've bent over backwards to make the patch as
accessible as possible, and as easy to review as possible. I also
think its potential to destabilize the system (as major features go)
is only about average. What am I doing wrong here?

There is an enormous amount of supplementary documentation associated
with the patch:

https://wiki.postgresql.org/wiki/UPSERT
https://wiki.postgresql.org/wiki/Value_locking

Both of these pages are far larger than the Wiki page for RLS, for
example. The UPSERT wiki page is kept right up to date.

-- 
Peter Geoghegan



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Jeff Janes
Date:
On Thu, Dec 11, 2014 at 8:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

2. The amount of pre-release testing we get from people outside the
hard-core development crowd seems to be continuing to decrease.
We were fortunate that somebody found the JSONB issue before it was
too late to do anything about it.  Personally, I'm very worried that
there are other such bugs in 9.4.  But I've given up hoping that any
more testing will happen until we put out something that calls itself
9.4.0, which is why I voted to release in the core discussion about it.


We are not particularly inviting of feedback for whatever testing has been done.

The definitive guide seems to be https://wiki.postgresql.org/wiki/HowToBetaTest, and says:

You can report tests by email. You can subscribe to any PostgreSQL mailing list from the subscription form.

  • pgsql-bugs: this is the preferred mailing list if you think you have found a bug in the beta. You can also use the Bug Reporting Form.
  • pgsql-hackers: bugs, questions, and successful test reports are welcome here if you are already subscribed to pgsql-hackers. Note that pgsql-hackers is a high-traffic mailing list with a lot of development discussion.

=========

So if you find a bug, you can report it on the bug reporting form (which doesn't have a drop-down entry for 9.4RC1).

If you have positive results rather than negative ones (or even complaints that are not actually bugs), you can subscribe to a mailing list which generates a lot of traffic which is probably over your head and not interesting to you.

Does the core team keep a mental list of items they want to see tested by the public, and they will spend their own time testing those things themselves if they don't hear back on some positive tests for them?

If we find reports of public testing that yields good results (or at least no bugs) to be useful, we should be more clear on how to go about doing it.  But are positive reports useful?  If I report a bug, I can write down the steps to reproduce it, and then follow my own instructions to make sure it does actually reproduce it.  If I find no bugs, it is just "I did a bunch of random stuff and nothing bad happened, that I noticed".
 
Chees,

Jeff

Re: 9.5 release scheduling (was Re: logical column ordering)

From
Robert Haas
Date:
On Thu, Dec 11, 2014 at 1:06 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, Dec 11, 2014 at 7:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> 2. It's not clear that we're going to have a particularly-impressive
>> list of major features for 9.5.  So far we've got RLS and BRIN. I
>> expect that GROUPING SETS is far enough along that it should be
>> possible to get it in before development ends, and there are a few
>> performance patches pending (Andres's lwlock scalability patches,
>> Rahila's work on compressing full-page writes) that I think will
>> probably make the grade.  But after that it seems to me that it gets
>> pretty thin on the ground.
>
> I'm slightly surprised that you didn't mention abbreviated keys in
> that list of performance features. You're reviewing that patch; how do
> you feel about it now?

I'm not sure it's where I think it needs to be yet, but yeah, I think
that will get in.  I thought of it after I hit send.

>> Are we going to bill commit timestamp
>> tracking - with replication node ID tracking as the real goal, despite
>> the name - as a major feature, or DDL deparsing if that goes in, as
>> major features?  As useful as they may be for BDR, they don't strike
>> me as things we can publicize as major features independent of BDR.
>> And it's getting awfully late for any other major work that people are
>> thinking of to start showing up.
>
> Version 1.0 of INSERT ... ON CONFLICT UPDATE was posted in August -
> when development launched. It still doesn't have a reviewer, and it
> isn't actually in evidence that someone else has so much as downloaded
> and applied the patch (I'm sure someone has done that much, but the
> fact is that all the feedback that I've received this year concerns
> the semantics/syntax, which you can form an opinion on by just looking
> at the extensive documentation and other supplementary material I've
> written). It's consistently one of the most requested features, and
> yet major aspects of the design, that permeate through every major
> subsystem go unremarked on for months now. This feature is
> *definitely* major feature list material, since people have been
> loudly requesting it for over a decade, and yet no one mentions it in
> this thread (only Bruce mentioned it in the other thread about the
> effectiveness of the CF process). It's definitely in the top 2 or 3
> most requested features, alongside much harder problems like parallel
> query and comprehensive partitioning support.

I'm not sure whether that patch is going to go anywhere.  There has
been so much arguing about different aspects of the design that felt
rather unproductive that I think most of the people who would be
qualified to commit it have grown a bit gun-shy.  That would be a good
problem to get fixed, but I don't have a proposal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Josh Berkus
Date:
On 12/11/2014 09:22 AM, Heikki Linnakangas wrote:

> I imagine that it's the same for everyone else. Many of the patches that
> sit in the commitfest for weeks are patches that no-one really cares
> much about. I'm not sure what to do about that. It would be harsh to
> reject a patch just because no-one's gotten around to reviewing it, and
> if we start doing that, it's not clear what the point of a commitfest is
> any more.

So the "nobody cares" argument is manifestly based on false assumptions.Are you contending that nobody cares about
UPSERTor GROUPING SETS?  In
 
my experience, the patches that sit for weeks on the CF fall into 4 groups:

1. Large complicated patches that only a handful of committers are even
capable of reviewing.

2. Obscure patches which require specialized knowledge our outside input
to review.

3. Inconsequential patches where the submitter doesn't work the social
process.

4. Patches with contentious spec or syntax.

5. Patches which everyone wants but have persistent hard-to-resolve issues.

Of these only (3) would fit into "nobody cares", and that's a pretty
small group.

There's also a chicken-and-egg problem there.  Say that we started not
reviewing DW features because "nobody cares".  Then the people who
contributed those features don't go on to become major contributors,
which means they won't review further DW patches.  Which means that
we've just closed off an entire use-case for PostgreSQL.  I'd think that
PostGIS would have taught us the value of the "nobody cares" fallacy.

Also, if we go back on the promise that "every patch gets a review",
then we're definitely headed towards no more new contributors.  As Noah
said at one developer meeting (to paraphrase), one of the few things
which keeps contributors persisting through our baroque,
poorly-documented, excessively political contribution process is the
promise that they'll get a fair shake for their invested time.  If we
drop that promise, we'll solve our workflow problem by cutting off the
flow of new patches entirely.

> Perhaps we should change the process so that it is the patch author's
> responsibility to find a reviewer, and a committer, for the patch. If
> you can't cajole anyone to review your patch, it's a sign that no-one
> cares about it, and the patch is rejected by default. Or put a quick
> +1/-1 voting option to each patch in the commitfest, to get a quick
> gauge of how the committers feel about it.

Again, that process would favor existing contributors and other folks
who know how to "work the Postgres community".  It would be effectively
the same as hanging up a sign which says "no new contributors wanted".
It would also be dramatically increasing the amount of politics around
submitted patches, which take up far more time than the technical work.

Overall, we're experiencing this issue because of a few predictable reasons:

1. a gradual increase in the number of submitted patches, especially
large patches

2. Tom Lane cutting back on the number of other people's patches he
reviews and revises.

3. Other major committers getting busier with their day jobs.

4. Failure to recruit, mentor and promote new committers at a rate
proportional to the number of new contributors or the size of our community.

5. Failure to adopt or develop automated systems to remove some of the
grunt work from patch submission and review.

6. Failure to adhere to any uniform standards of patch handing for the
Commitfests.

(2) out of these is actually the biggest thing we're seeing right now, I
think.   Tom was historically a one-man-patch-fixing machine, at one
time responsible for 70% of the patches committed to PostgreSQL.  This
was never a good thing for the community, even if it was a good thing
for the code base, becuase it discouraged others from stepping into a
senior committer role.  Now Tom has cut back, and others have to take up
the slack.  In the long run this will be good for our development
community; in the short run it's kind of painful.

I will also point out that some of the existing senior committers, who
are the heaviest burdened under the existing system, have also been the
most resistant to any change in the system.  You (Heikki) yourself
expressed a strong opposition to any attempt to recruit more reviewers.
So, given that you are inside the heart of the problem, do you have a
solution other than your proposal above that we simply stop accepting
new contributors?  Or is that your solution?  It would work, for some
definition of "work".

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Josh Berkus
Date:
On 12/11/2014 08:59 AM, Tom Lane wrote:
> More abstractly, there's a lot of value in having a predictable release
> schedule.  That's going to mean that some release cycles are thin on
> user-visible features, even if just as much work went into them.  It's
> the nature of the game.

+ 1,000,000 from me.  ;-)

Frankly, BRIN, UPSERT and a couple other things are plenty for a
Postgres release.  Other SQL databases would be thrilled to have that
much ... can you name 3 major advances in the last MySQL release?

And given that I've seen nothing about jquery/VODKA since pgCon, I'm
expecting them for 9.6/whatever, not 9.5.  There's a whole longish
syntax discussion we haven't even started yet, let alone actual
technical review.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Heikki Linnakangas
Date:
On 12/11/2014 08:51 PM, Josh Berkus wrote:
> On 12/11/2014 09:22 AM, Heikki Linnakangas wrote:
>
>> I imagine that it's the same for everyone else. Many of the patches that
>> sit in the commitfest for weeks are patches that no-one really cares
>> much about. I'm not sure what to do about that. It would be harsh to
>> reject a patch just because no-one's gotten around to reviewing it, and
>> if we start doing that, it's not clear what the point of a commitfest is
>> any more.
>
> So the "nobody cares" argument is manifestly based on false assumptions.
>   Are you contending that nobody cares about UPSERT or GROUPING SETS?

No. I was thinking of patches like "Add IF NOT EXISTS to CREATE TABLE AS 
and CREATE MATERIALIZED VIEW", "event triggers: more DROP info", "Point 
to polygon distance operator" and "pgcrypto: support PGP signatures". 
And nobody was an exaggeration; clearly *someone* cares about those 
things, or they wouldn't have written a patch in the first place. But 
none of the committers care enough to pick them up. Even in the case 
that someone reviews them, it's often not because the reviewer is 
interested in the patch, it's just to help out with the process.

(Apologies to the authors of those patches; they were just the first few 
that caught my eye)

> There's also a chicken-and-egg problem there.  Say that we started not
> reviewing DW features because "nobody cares".  Then the people who
> contributed those features don't go on to become major contributors,
> which means they won't review further DW patches.  Which means that
> we've just closed off an entire use-case for PostgreSQL.  I'd think that
> PostGIS would have taught us the value of the "nobody cares" fallacy.
>
> Also, if we go back on the promise that "every patch gets a review",
> then we're definitely headed towards no more new contributors.  As Noah
> said at one developer meeting (to paraphrase), one of the few things
> which keeps contributors persisting through our baroque,
> poorly-documented, excessively political contribution process is the
> promise that they'll get a fair shake for their invested time.  If we
> drop that promise, we'll solve our workflow problem by cutting off the
> flow of new patches entirely.

Yeah, there is that.

>> Perhaps we should change the process so that it is the patch author's
>> responsibility to find a reviewer, and a committer, for the patch. If
>> you can't cajole anyone to review your patch, it's a sign that no-one
>> cares about it, and the patch is rejected by default. Or put a quick
>> +1/-1 voting option to each patch in the commitfest, to get a quick
>> gauge of how the committers feel about it.
>
> Again, that process would favor existing contributors and other folks
> who know how to "work the Postgres community".  It would be effectively
> the same as hanging up a sign which says "no new contributors wanted".
> It would also be dramatically increasing the amount of politics around
> submitted patches, which take up far more time than the technical work.

I was thinking that by getting candid feedback that none of the existing 
contributors are interested to look at your patch, the author could 
revise the patch to garner more interest, or perhaps promise to review 
someone else's patch in return. Right now, the patches just linger, and 
the author doesn't know why, what's going to happen or what to do about it.

> I will also point out that some of the existing senior committers, who
> are the heaviest burdened under the existing system, have also been the
> most resistant to any change in the system.  You (Heikki) yourself
> expressed a strong opposition to any attempt to recruit more reviewers.

Huh, I did? Can you elaborate?

> So, given that you are inside the heart of the problem, do you have a
> solution other than your proposal above that we simply stop accepting
> new contributors?  Or is that your solution?  It would work, for some
> definition of "work".

I don't have any good solutions, I'm afraid. It might help to ping the 
reviewers who have signed up more often, to make the reviews to happen 
more quickly. I did some nudge people in the August commitfest, but I 
felt that it didn't actually achieve much. The most efficient way to 
move the commitfest forward was to just review more patches myself.

That's one thought. Robert said the same thing about when he was the 
commitfest manager; he just reviewed most the patches himself in the 
end. And you mentioned that Tom used to review 70% of all incoming 
patches. How about we make that official? It's the commitfest manager's 
duty to review all the patches. He can recruit others if he can, but at 
the end of the day he's expected to take a look at every patch, and 
commit or return with some feedback.

The problem with that is that we'll have a hard time to find volunteers 
for that. But we only need to find one sucker for each commitfest. I can 
volunteer to do that once a year; if the other active committers do the 
same, we're covered.

- Heikki




Re: 9.5 release scheduling (was Re: logical column ordering)

From
Alvaro Herrera
Date:
Heikki Linnakangas wrote:

> That's one thought. Robert said the same thing about when he was the
> commitfest manager; he just reviewed most the patches himself in the end.
> And you mentioned that Tom used to review 70% of all incoming patches. How
> about we make that official? It's the commitfest manager's duty to review
> all the patches. He can recruit others if he can, but at the end of the day
> he's expected to take a look at every patch, and commit or return with some
> feedback.
> 
> The problem with that is that we'll have a hard time to find volunteers for
> that. But we only need to find one sucker for each commitfest. I can
> volunteer to do that once a year; if the other active committers do the
> same, we're covered.

Hm, I was commitfest manager once, too, and this exact same thing
happened.  Maybe we need to make that official, and give the sucker some
perk.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Peter Geoghegan
Date:
On Thu, Dec 11, 2014 at 11:52 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
>> The problem with that is that we'll have a hard time to find volunteers for
>> that. But we only need to find one sucker for each commitfest. I can
>> volunteer to do that once a year; if the other active committers do the
>> same, we're covered.
>
> Hm, I was commitfest manager once, too, and this exact same thing
> happened.  Maybe we need to make that official, and give the sucker some
> perk.

I should do it at least once, if only because I haven't experienced it.
-- 
Peter Geoghegan



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Peter Geoghegan
Date:
On Thu, Dec 11, 2014 at 10:43 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Version 1.0 of INSERT ... ON CONFLICT UPDATE was posted in August -
>> when development launched. It still doesn't have a reviewer, and it
>> isn't actually in evidence that someone else has so much as downloaded
>> and applied the patch

> I'm not sure whether that patch is going to go anywhere.  There has
> been so much arguing about different aspects of the design that felt
> rather unproductive that I think most of the people who would be
> qualified to commit it have grown a bit gun-shy.  That would be a good
> problem to get fixed, but I don't have a proposal.

Really?

I have acted comprehensively on 100% of feedback to date on the
semantics/syntax of the ON CONFLICT UPDATE patch. The record reflects
that. I don't believe that the problem is that people are gun shy. We
haven't even had a real disagreement since last year, and last year's
discussion of value locking was genuinely very useful (I hate to say
it, but the foreign key locking patch might have considered the
possibility of "unprincipled deadlocks" more closely, as we saw
recently).

Lots of other systems have a comparable feature. Most recently, VoltDB
added a custom "UPSERT", even though they don't have half the SQL
features we do. It's a complicated feature, but it's not a
particularly complicated feature as big, impactful features go (like
LATERAL, or logical decoding, or the foreign key locking stuff). It's
entirely possible to get the feature in in the next few months, if
someone will work with me on it.

Even Heikki, who worked on this with me far more than everyone else,
found the value locking page a useful summary. He didn't even know
that there was a third design advocated by Simon and Andres until he
saw it! The discussion was difficult, but had a useful outcome,
because accounting for "unprincipled deadlocks" is a major source of
complexity. I have seen a lot of benefit from comprehensively
documenting stuff in one place (the wiki pages), but that has tapered
off.
-- 
Peter Geoghegan



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Alvaro Herrera
Date:
Peter Geoghegan wrote:
> On Thu, Dec 11, 2014 at 11:52 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> >> The problem with that is that we'll have a hard time to find volunteers for
> >> that. But we only need to find one sucker for each commitfest. I can
> >> volunteer to do that once a year; if the other active committers do the
> >> same, we're covered.
> >
> > Hm, I was commitfest manager once, too, and this exact same thing
> > happened.  Maybe we need to make that official, and give the sucker some
> > perk.
> 
> I should do it at least once, if only because I haven't experienced it.

We could make an slogan out of it: you've never experienced the
PostgreSQL development process if you haven't been a CFM at least once
in your life.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Bruce Momjian
Date:
On Thu, Dec 11, 2014 at 11:59:58AM -0500, Robert Haas wrote:
> The problem is that, on the one hand, we have a number of serious
> problems with things that got committed and turned out to have
> problems - the multixact stuff, and JSONB, in particular - and on the
> other hand, we are lacking in adequate committer bandwidth to properly
> handle all of the new patches that come in.  We can fix the first
> problem by tightening up on the requirements for committing things,
> but that exacerbates the second problem.  Or we can fix the second
> problem by loosening up on the requirements for commit, but that
> exacerbates the first problem.  Promoting more or fewer committers is
> really the same trade-off: if you're very careful about who you
> promote, you'll get better people but not as many of them, so less
> will get done but with fewer mistakes; if you're more generous in
> handing out commit bits, you reduce the bottleneck to stuff getting
> done but, inevitably, you'll be trusting people in whom you have at
> least slightly less confidence.  There's an inherent tension between
> quality and rate of progress that we can't get rid of, and the fact
> that some of our best people are busier than ever with things other
> than PostgreSQL hacking is not helping - not only because less actual
> review/commit happens, but because newcomers to the community don't
> have as much contact with the more senior people who could help mentor
> them if they only had the time.

Great outline of the tradeoffs involved in  being more aggressive about
committing. I do think the multixact and JSONB problems have spooked
some of us to be slower/more careful about committing.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Jeff Janes
Date:
On Thu, Dec 11, 2014 at 11:40 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 12/11/2014 08:51 PM, Josh Berkus wrote:
On 12/11/2014 09:22 AM, Heikki Linnakangas wrote:


Perhaps we should change the process so that it is the patch author's
responsibility to find a reviewer, and a committer, for the patch. If
you can't cajole anyone to review your patch, it's a sign that no-one
cares about it, and the patch is rejected by default. Or put a quick
+1/-1 voting option to each patch in the commitfest, to get a quick
gauge of how the committers feel about it.

Again, that process would favor existing contributors and other folks
who know how to "work the Postgres community".  It would be effectively
the same as hanging up a sign which says "no new contributors wanted".
It would also be dramatically increasing the amount of politics around
submitted patches, which take up far more time than the technical work.

I was thinking that by getting candid feedback that none of the existing contributors are interested to look at your patch, the author could revise the patch to garner more interest, or perhaps promise to review someone else's patch in return. Right now, the patches just linger, and the author doesn't know why, what's going to happen or what to do about it.

I agree.  Having your patch disappear into the void is not friendly at all.  But I don't think a commentless "-1" is the answer, either.  That might one of the few things worse than silence.  Even if the comment is just "This seems awfully complicated for a minimally useful feature" or "This will probably have unintended side effects in the executor that I don't have time to trace down, can someone make an argument for correctness", or even "this format is unreadable, I won't review this until it is fixed" then at least the author (and perhaps more important, junior contributors who are not the author) will know either what argument to make to lobby for it, what avenue to take when reviewing it, or whether to just forget it and work on something more important.


Cheers,

Jeff

Re: 9.5 release scheduling (was Re: logical column ordering)

From
Robert Haas
Date:
On Thu, Dec 11, 2014 at 3:47 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I agree.  Having your patch disappear into the void is not friendly at all.
> But I don't think a commentless "-1" is the answer, either.  That might one
> of the few things worse than silence.  Even if the comment is just "This
> seems awfully complicated for a minimally useful feature" or "This will
> probably have unintended side effects in the executor that I don't have time
> to trace down, can someone make an argument for correctness", or even "this
> format is unreadable, I won't review this until it is fixed" then at least
> the author (and perhaps more important, junior contributors who are not the
> author) will know either what argument to make to lobby for it, what avenue
> to take when reviewing it, or whether to just forget it and work on
> something more important.

Agreed!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Bruce Momjian
Date:
On Thu, Dec 11, 2014 at 10:04:43AM -0700, David G Johnston wrote:
> Tom Lane-2 wrote
> > Robert Haas <
> 
> > robertmhaas@
> 
> > > writes:
> >> On Thu, Dec 11, 2014 at 1:18 AM, Josh Berkus <
> 
> > josh@
> 
> > > wrote:
> >>> While there were technical
> >>> issues, 9.4 dragged a considerable amount because most people were
> >>> ignoring it in favor of 9.5 development.
> > 
> >> I think 9.4 dragged almost entirely because of one issue: the
> >> compressibility of JSONB.
> > 
> > 2. The amount of pre-release testing we get from people outside the
> > hard-core development crowd seems to be continuing to decrease.
> > We were fortunate that somebody found the JSONB issue before it was
> > too late to do anything about it.  Personally, I'm very worried that
> > there are other such bugs in 9.4.  But I've given up hoping that any
> > more testing will happen until we put out something that calls itself
> > 9.4.0, which is why I voted to release in the core discussion about it.
> 
> The compressibility properties of a new type seem like something that should
> be mandated before it is committed - it shouldn't require good fortune that

Odd are the next problem will have nothing to do with compressibility 
--- we can't assume old failure will repeat themselves.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: 9.5 release scheduling (was Re: logical column ordering)

From
David Johnston
Date:
On Thu, Dec 11, 2014 at 2:05 PM, Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Dec 11, 2014 at 10:04:43AM -0700, David G Johnston wrote:
> Tom Lane-2 wrote
> > Robert Haas &lt;
>
> > robertmhaas@
>
> > &gt; writes:
> >> On Thu, Dec 11, 2014 at 1:18 AM, Josh Berkus &lt;
>
> > josh@
>
> > &gt; wrote:
> >>> While there were technical
> >>> issues, 9.4 dragged a considerable amount because most people were
> >>> ignoring it in favor of 9.5 development.
> >
> >> I think 9.4 dragged almost entirely because of one issue: the
> >> compressibility of JSONB.
> >
> > 2. The amount of pre-release testing we get from people outside the
> > hard-core development crowd seems to be continuing to decrease.
> > We were fortunate that somebody found the JSONB issue before it was
> > too late to do anything about it.  Personally, I'm very worried that
> > there are other such bugs in 9.4.  But I've given up hoping that any
> > more testing will happen until we put out something that calls itself
> > 9.4.0, which is why I voted to release in the core discussion about it.
>
> The compressibility properties of a new type seem like something that should
> be mandated before it is committed - it shouldn't require good fortune that

Odd are the next problem will have nothing to do with compressibility
--- we can't assume old failure will repeat themselves.



​tl;dr: assign two people, a manager/curator and a lead reviewer​.  Give the curator better tools and the responsibility to engage the community.  If the primary reviewer cannot review a patch in the current commitfest it can be marked "awaiting reviewer" and moved to the next CF for evaluation by the next pair of individuals.  At minimum, though, if it does get moved the manager AND reviewer need to comment on why it was not handled during their CF.  Also, formalize documentation targeting developers and reviewers just like the documentation for users has been formalized and committed to the repository.  That knowledge and history is probably more important that source code commit log and definitely should be more accessible to newcomers. 


​While true a checklist of things to look for and evaluate when adding a new type to the system still has value.  How new types interact, if at all, with TOAST seems like something that warrants explicit attention before commit, and there are probably others (e.g., OIDs, input/output function volatility and configuration, etc...)​.  Maybe this exists somewhere but it you are considering improvements to the commitfest application having an top-of-page & always-visible checklist that can bootstrapped based upon the patch/feature type and modified for specific nuances for the item in question seems like it would be valuable.

If this was in place before 9.3 then the whatever category the multi-xact patches fall into would want to have their template modified to incorporate the various research and testing - along with links to the discussions - the resulted from the various bug reports that were submitted.  It could even be structured to both be an interactive checklist as well as acting as curated documentation for developers and reviewers.  The wiki has some of this but if the goal is to encourage people to learn how to contribute to PostgreSQL it should receive a similar level of attention and quality control that our documentation for people wanting to learn how to use PostgreSQL receive.  But that is above and beyond the simple goal of having meaningful checklists attached to each of the major commit-fest items whose TODO items can be commented upon and serve as a reference for how close to commit a feature may be.  Comments can be as simple as a place for people to upload a psql script and say "this is what I did and everything seemed to work/fail in the way I expected - on this platform".

Curation is hard though so I get why easier - just provide links to the mailing list mainly - actions are what is currently being used.  Instead of the CF manager being a reviewer (which is a valid approach) having them be a curator and providing better tools geared toward that role (both to do the work and to share the results) seem like a better ideal.  Having that role a CF reviewer should maybe also be assigned.  The two people - reviewer and manager - would then be responsible for ensuring that reviews are happening and that the communication to and recruitment from the community is being handled - respectively.

Just some off-the-cuff thoughts...

David J.


Re: 9.5 release scheduling (was Re: logical column ordering)

From
Alvaro Herrera
Date:
FWIW I don't think any amount of process would have gotten multixact to
not have the copious bugs it had.  It was just too complex a patch,
doing ugly things to parts too deeply linked to the inner guts of the
server.  We might have spared a few with some extra testing (such as the
idiotic wraparound bugs, and perhaps the pg_upgrade problems), but most
of the rest of it would have taken a real production load to notice --
which is exactly what happened.

(Of course, review from Tom might have unearthed many of these bugs
before they hit final release, but we're saying we don't want to be
dependent on Tom's review for every patch so that doesn't count.)

Review is good, but (as history shows) some bugs can slip through even
extensive review such as the one multixacts got from Noah and Andres.
Had anyone put some real stress on the beta, we could have noticed some
of these bugs much earlier.  One of the problems we have is that our
serious users don't seem to take the beta period seriously.  All our
users are too busy for that, it seems, and they expect that "somebody
else will test the point-zero release", and that by the time it hits
point-one or point-two it will be bug-free, which is quite clearly not
the case.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Peter Geoghegan
Date:
On Thu, Dec 11, 2014 at 1:49 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Review is good, but (as history shows) some bugs can slip through even
> extensive review such as the one multixacts got from Noah and Andres.
> Had anyone put some real stress on the beta, we could have noticed some
> of these bugs much earlier.  One of the problems we have is that our
> serious users don't seem to take the beta period seriously.  All our
> users are too busy for that, it seems, and they expect that "somebody
> else will test the point-zero release", and that by the time it hits
> point-one or point-two it will be bug-free, which is quite clearly not
> the case.

Good, strategic stress testing has a big role to play here IMV. I have
had some difficulty generating interest in that, but I think it's
essential.

-- 
Peter Geoghegan



Re: logical column ordering

From
Andres Freund
Date:
On 2014-12-10 19:06:28 -0800, Josh Berkus wrote:
> On 12/10/2014 05:14 PM, Stephen Frost wrote:
> > * Andres Freund (andres@2ndquadrant.com) wrote:
> >> > But the scheduling of commits with regard to the 9.5 schedule actually
> >> > opens a relevant question: When are we planning to release 9.5? Because
> >> > If we try ~ one year from now it's a whole different ballgame than if we
> >> > try to go back to september. And I think there's pretty good arguments
> >> > for both.
> > This should really be on its own thread for discussion...  I'm leaning,
> > at the moment at least, towards the September release schedule.  I agree
> > that having a later release would allow us to get more into it, but
> > there's a lot to be said for the consistency we've kept up over the past
> > few years with a September (our last non-September release was 8.4).
> 
> Can we please NOT discuss this in the thread about someone's patch?  Thanks.

Well, it's relevant for the arguments made about the patches future...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Stephen Frost
Date:
* Robert Haas (robertmhaas@gmail.com) wrote:
> 2. It's not clear that we're going to have a particularly-impressive
> list of major features for 9.5.  So far we've got RLS and BRIN. I
> expect that GROUPING SETS is far enough along that it should be
> possible to get it in before development ends, and there are a few
> performance patches pending (Andres's lwlock scalability patches,
> Rahila's work on compressing full-page writes) that I think will
> probably make the grade.  But after that it seems to me that it gets
> pretty thin on the ground.

When it comes to a list of major features for 9.5, it'd be pretty
terrible, imv, if we manage to screw up and not get UPSERT taken care
of (again..).  BDR will be great to have too, but we lose out far more
often for lack of what those outside the community perceive as a simple
and obvious feature that nearly every other system they deal with has.

> Now, against all that, if we don't get back on our usual release
> schedule then (a) it will look like we're losing momentum, which I'm
> actually afraid may be true rather than merely a perception, and (b)
> people whose stuff did get in will have to wait longer to see it
> released.  So, I'm not sure waiting is any better.  But there are
> certainly some things not to like about where we are.

I agree with both of these.  It doesn't help that we have non-committers
working on major patches for years and aren't able to see the fruits of
their labors.  I'm as much at fault for that happening at times as
anyone and I don't have any silver bullets but I certainly don't like
it.
Thanks,
    Stephen

Re: logical column ordering

From
Andres Freund
Date:
On 2014-12-09 14:41:46 -0300, Alvaro Herrera wrote:
> So I've been updating my very old patch to allow logical and physical
> column reordering.  Here's a WIP first cut for examination.

Do you have a updated patch that has ripened further?

> The first thing where this matters is tuple descriptor expansion in
> parse analysis; at this stage, things such as "*" (in "select *") are
> turned into a target list, which must be sorted according to attlognum.
> To achieve this I added a new routine to tupledescs,
> TupleDescGetSortedAttrs() which computes a new Attribute array and
> caches it in the TupleDesc for later uses; this array points to the
> same elements in the normal attribute list but is order by attlognum.

That sounds sane.

> Another place that needs tweaking is heapam.c, which must construct a
> physical tuple from Datum/nulls arrays (heap_form_tuple).  In some cases
> the input arrays are sorted in logical column order.

I'm not sure that changing heaptuple.c's API (you mean that, not
heapam.c, right?)  is a good level to tackle this at. I think some
function to reorder values/isnull arrays into logical order and reverse
might end up being less invasive and actually faster.

Greetings,

Andres Freund



Re: logical column ordering

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2014-12-09 14:41:46 -0300, Alvaro Herrera wrote:
> > So I've been updating my very old patch to allow logical and physical
> > column reordering.  Here's a WIP first cut for examination.
> 
> Do you have a updated patch that has ripened further?

Not yet.  Phil was kind enough to send me his old patch for study; I
am stealing a few interesting ideas from there, in particular:

> > Another place that needs tweaking is heapam.c, which must construct a
> > physical tuple from Datum/nulls arrays (heap_form_tuple).  In some cases
> > the input arrays are sorted in logical column order.
> 
> I'm not sure that changing heaptuple.c's API (you mean that, not
> heapam.c, right?)  is a good level to tackle this at. I think some
> function to reorder values/isnull arrays into logical order and reverse
> might end up being less invasive and actually faster.

Phil took a different route here than I did, and I think his design is
better than mine.  The main idea is that the Datum/nulls arrays in a
TupleTableSlot always follows physical order (he calls it "storage
order"), rather than this very strange mixture of things I did by
hacking the heaptuple.c API.  So I'm reworking my patch with that in
mind.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Michael Paquier
Date:
On Sun, Jan 4, 2015 at 10:37 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> So I'm reworking my patch with that in mind.
Switching to returned with feedback. Alvaro, feel free to add an entry
to the next CF if you are planning to work on it again.
-- 
Michael



Re: logical column ordering

From
Alvaro Herrera
Date:
I've decided to abandon this patch.  I have spent too much time looking
at it now.

If anyone is interested in trying to study, I can provide the patches I
came up with, explanations, and references to prior discussion -- feel
free to ask.

My main motivation for this work is to enable a later patch for column
stores.  Right now, since columns have monotonically increasing attnums,
it's rather difficult to have columns that are stored elsewhere.  My
plan for that now is much more modest, something like adding a constant
10000 to attnums and that would let us identify columns that are outside
the heap -- or something like that.  I haven't fully worked it out yet.


Just a few quick notes about this patch: last thing I was doing was mess
with setrefs.c so that Var nodes carry "varphysnum" annotations, which
are set to 0 during initial planner phases, and are turned into the
correct attphysnum (the value extracted from catalogs) so that
TupleDescs constructed from targetlists by ExecTypeFromTL and friends
can have the correct attphysnum too.  I think this part works correctly,
with the horrible exception that I had to do a relation_open() in
setrefs.c to get hold of the right attphysnum from a tupledesc obtained
from catalogs.  That's not acceptable at all; I think the right way to
do this would be to store a list of numbers earlier (not sure when) and
store it in RelOptInfo or RangeTableEntry; that would be accesible
during setrefs.c.

The other bit I did was modify all the heaptuple.c code so that it could
deal correctly with tupledescs that have attphysnums and attlognum in an
order different from stock attnum.  That took some time to get right,
but I think it's also correct now.

One issue I had was finding places that use "attnum" as an index into
the tupledesc "attrs" array.  I had to examine all these places and
change them to use a "physattrs" array, which is an array that has been
sorted by physical number.  I don't think all the changes are correct,
and I'm not sure that I caught them all.


Anyway it seems to me that this is "mostly there".  If somebody is
interested in learning executor code, this project would be damn cool to
get done.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Tomas Vondra
Date:
On 20.1.2015 22:30, Alvaro Herrera wrote:
> I've decided to abandon this patch.  I have spent too much time looking
> at it now.
> 
> If anyone is interested in trying to study, I can provide the patches I
> came up with, explanations, and references to prior discussion -- feel
> free to ask.

I'll take look. Can you share the patches etc. - either here, or maybe
send it to me directly?

regards
Tomas

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Tomas Vondra
Date:
Hi,

attached is the result of my first attempt to make the logical column
ordering patch work. This touches a lot of code in the executor that is
mostly new to me, so if you see something that looks like an obvious
bug, it probably is (so let me know).


improvements
------------
The main improvements of this version are that:

* initdb actually works (while before it was crashing)

* regression tests work, with two exceptions

  (a) 'subselect' fails because EXPLAIN prints columns in physical order
      (but we expect logical)

  (b) col_order crashes works because of tuple descriptor mismatch in a
      function call (this actually causes a segfault)

The main change is this patch is that tlist_matches_tupdesc() now checks
target list vs. physical attribute order, which may result in doing a
projection (in cases when that would not be done previously).

I don not claim this is the best approach - maybe it would be better to
keep the physical tuple and reorder it lazily. That's why I kept a few
pieces of code (fix_physno_mutator) and a few unused fields in Var.

Over the time I've heard various use cases for this patch, but in most
cases it was quite speculative. If you have an idea where this might be
useful, can you explain it here, or maybe point me to a place where it's
described?

There's also a few FIXMEs, mostly from Alvaro's version of the patch.
Some of them are probably obsolete, but I wasn't 100% sure by that so
I've left them in place until I understand the code sufficiently.


randomized testing
------------------
I've also attached a python script for simple randomized testing. Just
execute it like this:

    $ python randomize-attlognum.py -t test_1 test_2 \
           --init-script attlognum-init.sql \
           --test-script attlognum-test.sql

and it will do this over and over

    $ dropdb test
    $ createdb test
    $ run init script
    $ randomly set attlognums for the tables (test_1 and test_2)
    $ run test script

It does not actually check the result, but my experience is that when
there's a bug in handling the descriptor, it results in segfault pretty
fast (just put some varlena columns into the table).


plans / future
--------------
After discussing this with Alvaro, we've both agreed that this is far
too high-risk change to commit in the very last CF (even if it was in a
better shape). So while it's added to 2015-02 CF, we're aiming for 9.6
if things go well.


regards

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: logical column ordering

From
Jim Nasby
Date:
On 2/23/15 5:09 PM, Tomas Vondra wrote:
> Over the time I've heard various use cases for this patch, but in most
> cases it was quite speculative. If you have an idea where this might be
> useful, can you explain it here, or maybe point me to a place where it's
> described?

For better or worse, table structure is a form of documentation for a 
system. As such, it's very valuable to group related fields in a table 
together. When creating a table, that's easy, but as soon as you need to 
alter your careful ordering can easily end up out the window.

Perhaps to some that just sounds like pointless window dressing, but my 
experience is that on a complex system the less organized things are the 
more bugs you get due to overlooking something.

The other reason for this patch (which it maybe doesn't support 
anymore?) is to allow Postgres to use an optimal physical ordering of 
fields on a page to reduce space wasted on alignment, as well as taking 
nullability and varlena into account.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: logical column ordering

From
Josh Berkus
Date:
On 02/26/2015 10:49 AM, Jim Nasby wrote:
> On 2/23/15 5:09 PM, Tomas Vondra wrote:
>> Over the time I've heard various use cases for this patch, but in most
>> cases it was quite speculative. If you have an idea where this might be
>> useful, can you explain it here, or maybe point me to a place where it's
>> described?
> 
> For better or worse, table structure is a form of documentation for a
> system. As such, it's very valuable to group related fields in a table
> together. When creating a table, that's easy, but as soon as you need to
> alter your careful ordering can easily end up out the window.
> 
> Perhaps to some that just sounds like pointless window dressing, but my
> experience is that on a complex system the less organized things are the
> more bugs you get due to overlooking something.

Yes.  Consider a BI table which has 110 columns.  Having these columns
in a sensible order,even though some were added after table creation,
would be a significant usability benefit for DBAs.

A second usability benefit is making it easy to keep table columns for
import tables sync'd with COPY.

Neither of these is sufficient to overshadow performance penalties, but
they are both common activities/annoyances, and not speculative in the
least.  And I haven't heard that there are any performance issues
associated with this patch.  Are there?

> The other reason for this patch (which it maybe doesn't support
> anymore?) is to allow Postgres to use an optimal physical ordering of
> fields on a page to reduce space wasted on alignment, as well as taking
> nullability and varlena into account.

... and that's the bigger reason.  I was under the impression that we'd
get LCO in first, and then add the column arrangement optimization in
the next version.

In fact, I would argue that LCO *needs* to be a feature at least one
version before we try to add column ordering optimization (COO).  The
reason being that with LCO, users can implement COO as a client tool or
extension, maybe even an addition to pg_repack.  This will allow users
to experiment with, and prove, algorithms for COO, allowing us to put in
a tested algorithm when we're ready to add it to core.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: logical column ordering

From
Alvaro Herrera
Date:
Josh Berkus wrote:
> On 02/26/2015 10:49 AM, Jim Nasby wrote:

> > The other reason for this patch (which it maybe doesn't support
> > anymore?) is to allow Postgres to use an optimal physical ordering of
> > fields on a page to reduce space wasted on alignment, as well as taking
> > nullability and varlena into account.
> 
> ... and that's the bigger reason.  I was under the impression that we'd
> get LCO in first, and then add the column arrangement optimization in
> the next version.

The changes in this patch aren't really optimizations -- they are a
complete rework on the design of storage of columns.  In the current
system, columns exist physically on disk in the same order as they are
created, and in the same order as they appear logically (i.e. SELECT *,
psql's \d, etc).  This patch decouples these three things so that they
can changed freely -- but provides no user interface to do so.  I think
that trying to only decouple the thing we currently have in two pieces,
and then have a subsequent patch to decouple again, is additional
conceptual complexity for no gain.

A future patch can implement physical ordering optimization (group
columns physically to avoid alignment padding, for instance, and also
put not-nullable fixed-length column at the start of the physical tuple,
so that the "attcacheoffset" thing can be applied in more cases), but I
think it's folly to try to attempt that in the current patch, which is
already hugely complicated.

> In fact, I would argue that LCO *needs* to be a feature at least one
> version before we try to add column ordering optimization (COO).  The
> reason being that with LCO, users can implement COO as a client tool or
> extension, maybe even an addition to pg_repack.  This will allow users
> to experiment with, and prove, algorithms for COO, allowing us to put in
> a tested algorithm when we're ready to add it to core.

The current patch will clear the road for such experimentation.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Josh Berkus
Date:
On 02/26/2015 01:54 PM, Alvaro Herrera wrote:
> This patch decouples these three things so that they
> can changed freely -- but provides no user interface to do so.  I think
> that trying to only decouple the thing we currently have in two pieces,
> and then have a subsequent patch to decouple again, is additional
> conceptual complexity for no gain.

Oh, I didn't realize there weren't commands to change the LCO.  Without
at least SQL syntax for LCO, I don't see why we'd take it; this sounds
more like a WIP patch.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: logical column ordering

From
Alvaro Herrera
Date:
Josh Berkus wrote:
> On 02/26/2015 01:54 PM, Alvaro Herrera wrote:
> > This patch decouples these three things so that they
> > can changed freely -- but provides no user interface to do so.  I think
> > that trying to only decouple the thing we currently have in two pieces,
> > and then have a subsequent patch to decouple again, is additional
> > conceptual complexity for no gain.
> 
> Oh, I didn't realize there weren't commands to change the LCO.  Without
> at least SQL syntax for LCO, I don't see why we'd take it; this sounds
> more like a WIP patch.

The reason for doing it this way is that changing the underlying
architecture is really hard, without having to bear an endless hackers
bike shed discussion about the best userland syntax to use.  It seems a
much better approach to do the actually difficult part first, then let
the rest to be argued to death by others and let those others do the
easy part (and take all the credit along with that); that way, that
discussion does not kill other possible uses that the new architecture
allows.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Jim Nasby
Date:
On 2/26/15 4:01 PM, Alvaro Herrera wrote:
> Josh Berkus wrote:
>> On 02/26/2015 01:54 PM, Alvaro Herrera wrote:
>>> This patch decouples these three things so that they
>>> can changed freely -- but provides no user interface to do so.  I think
>>> that trying to only decouple the thing we currently have in two pieces,
>>> and then have a subsequent patch to decouple again, is additional
>>> conceptual complexity for no gain.
>>
>> Oh, I didn't realize there weren't commands to change the LCO.  Without
>> at least SQL syntax for LCO, I don't see why we'd take it; this sounds
>> more like a WIP patch.
>
> The reason for doing it this way is that changing the underlying
> architecture is really hard, without having to bear an endless hackers
> bike shed discussion about the best userland syntax to use.  It seems a
> much better approach to do the actually difficult part first, then let
> the rest to be argued to death by others and let those others do the
> easy part (and take all the credit along with that); that way, that
> discussion does not kill other possible uses that the new architecture
> allows.

+1. This patch is already several years old; lets not delay it further.

Plus, I suspect that you could hack the catalog directly if you really 
wanted to change LCO. Worst case you could create a C function to do it.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: logical column ordering

From
Andres Freund
Date:
On 2015-02-26 16:16:54 -0600, Jim Nasby wrote:
> On 2/26/15 4:01 PM, Alvaro Herrera wrote:
> >The reason for doing it this way is that changing the underlying
> >architecture is really hard, without having to bear an endless hackers
> >bike shed discussion about the best userland syntax to use.  It seems a
> >much better approach to do the actually difficult part first, then let
> >the rest to be argued to death by others and let those others do the
> >easy part (and take all the credit along with that); that way, that
> >discussion does not kill other possible uses that the new architecture
> >allows.

I agree that it's a sane order of developing things. But: I don't think
it makes sense to commit it without the capability to change the
order. Not so much because it'll not serve a purpose at that point, but
rather because it'll essentially untestable. And this is a patch that
needs a fair amount of changes, both automated, and manual.

At least during development I'd even add a function that randomizes the
physical order of columns, and keeps the logical one. Running the
regression tests that way seems likely to catch a fair number of bugs.

> +1. This patch is already several years old; lets not delay it further.
> 
> Plus, I suspect that you could hack the catalog directly if you really
> wanted to change LCO. Worst case you could create a C function to do it.

I don't think that's sufficient for testing purposes. Not only for the
patch itself, but also for getting it right in new code.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: logical column ordering

From
Tom Lane
Date:
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> On 2/26/15 4:01 PM, Alvaro Herrera wrote:
>> Josh Berkus wrote:
>>> Oh, I didn't realize there weren't commands to change the LCO.  Without
>>> at least SQL syntax for LCO, I don't see why we'd take it; this sounds
>>> more like a WIP patch.

>> The reason for doing it this way is that changing the underlying
>> architecture is really hard, without having to bear an endless hackers
>> bike shed discussion about the best userland syntax to use.  It seems a
>> much better approach to do the actually difficult part first, then let
>> the rest to be argued to death by others and let those others do the
>> easy part (and take all the credit along with that); that way, that
>> discussion does not kill other possible uses that the new architecture
>> allows.

> +1. This patch is already several years old; lets not delay it further.

I tend to agree with that, but how are we going to test things if there's
no mechanism to create a table in which the orderings are different?
        regards, tom lane



Re: logical column ordering

From
Kevin Grittner
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> Over the time I've heard various use cases for this patch, but in most
> cases it was quite speculative. If you have an idea where this might be
> useful, can you explain it here, or maybe point me to a place where it's
> described?


One use case is to be able to suppress default display of columns that are
used for internal purposes.  For example, incremental maintenance of
materialized views will require storing a "count(t)" column, and sometimes
state information for aggregate columns, in addition to what the users
explicitly request.  At the developers' meeting there was discussion of
whether and how to avoid displaying these by default, and it was felt that
when we have this logical column ordering it would be good to have a way to
suppress default display.  Perhaps this could be as simple as a special
value for logical position.
-- 

Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: logical column ordering

From
Jim Nasby
Date:
On 2/26/15 4:34 PM, Andres Freund wrote:
> On 2015-02-26 16:16:54 -0600, Jim Nasby wrote:
>> On 2/26/15 4:01 PM, Alvaro Herrera wrote:
>>> The reason for doing it this way is that changing the underlying
>>> architecture is really hard, without having to bear an endless hackers
>>> bike shed discussion about the best userland syntax to use.  It seems a
>>> much better approach to do the actually difficult part first, then let
>>> the rest to be argued to death by others and let those others do the
>>> easy part (and take all the credit along with that); that way, that
>>> discussion does not kill other possible uses that the new architecture
>>> allows.
>
> I agree that it's a sane order of developing things. But: I don't think
> it makes sense to commit it without the capability to change the
> order. Not so much because it'll not serve a purpose at that point, but
> rather because it'll essentially untestable. And this is a patch that
> needs a fair amount of changes, both automated, and manual.

It's targeted for 9.6 anyway, so we have a while to decide on what's 
committed when. It's possible that there's no huge debate on the syntax.

> At least during development I'd even add a function that randomizes the
> physical order of columns, and keeps the logical one. Running the
> regression tests that way seems likely to catch a fair number of bugs.

Yeah, I've thought that would be a necessary part of testing. Not really 
sure how we'd go about it with existing framework though...

>> +1. This patch is already several years old; lets not delay it further.
>>
>> Plus, I suspect that you could hack the catalog directly if you really
>> wanted to change LCO. Worst case you could create a C function to do it.
>
> I don't think that's sufficient for testing purposes. Not only for the
> patch itself, but also for getting it right in new code.

We'll want to do testing that it doesn't make sense for the API to 
support. For example, randomizing the storage ordering; that's not a 
sane use case. Ideally we wouldn't even expose physical ordering because 
the code would be smart enough to figure it out.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: logical column ordering

From
David Steele
Date:
On 2/26/15 1:49 PM, Jim Nasby wrote:
> On 2/23/15 5:09 PM, Tomas Vondra wrote:
>> Over the time I've heard various use cases for this patch, but in most
>> cases it was quite speculative. If you have an idea where this might be
>> useful, can you explain it here, or maybe point me to a place where it's
>> described?
>
> For better or worse, table structure is a form of documentation for a
> system. As such, it's very valuable to group related fields in a table
> together. When creating a table, that's easy, but as soon as you need to
> alter your careful ordering can easily end up out the window.
>
> Perhaps to some that just sounds like pointless window dressing, but my
> experience is that on a complex system the less organized things are the
> more bugs you get due to overlooking something.

I agree with Jim's comments.  I've generally followed column ordering
that goes something like:

1) primary key
2) foreign keys
3) flags
4) other programmatic data fields (type, order, etc.)
5) non-programmatic data fields (name, description, etc.)

The immediate practical benefit of this is that users are more likely to
see fields that they need without scrolling right.  Documentation is
also clearer because fields tend to go from most to least important
(left to right, top to bottom).  Also, if you are consistent enough then
users just *know* where to look.

I wrote a function a while back that reorders columns in tables (it not
only deals with reordering, but triggers, constraints, indexes, etc.,
though there are some caveats).  It's painful and not very efficient,
but easy to use.

Most dimension tables that I've worked with are in the millions of rows
so reordering is not problem.  With fact tables, I assess on a
case-by-case basis. It would be nice to not have to do that triage.

The function is attached if anyone is interested.

--
- David Steele
david@pgmasters.net

Attachment

Re: logical column ordering

From
Gavin Flower
Date:
On 27/02/15 14:08, David Steele wrote:
[...]
> I agree with Jim's comments.  I've generally followed column ordering
> that goes something like:
>
> 1) primary key
> 2) foreign keys
> 3) flags
> 4) other programmatic data fields (type, order, etc.)
> 5) non-programmatic data fields (name, description, etc.)
>
> The immediate practical benefit of this is that users are more likely to
> see fields that they need without scrolling right.  Documentation is
> also clearer because fields tend to go from most to least important
> (left to right, top to bottom).  Also, if you are consistent enough then
> users just *know* where to look.
>
> I wrote a function a while back that reorders columns in tables (it not
> only deals with reordering, but triggers, constraints, indexes, etc.,
> though there are some caveats).  It's painful and not very efficient,
> but easy to use.
>
> Most dimension tables that I've worked with are in the millions of rows
> so reordering is not problem.  With fact tables, I assess on a
> case-by-case basis. It would be nice to not have to do that triage.
>
> The function is attached if anyone is interested.
>
I've never formally written down the order of how I define 
fields^H^H^H^H^H^H columns in a table, but David's list is the same 
order I use.

The only additional ordering I do: is to put fields that are likely to 
be long text fields, at the end of the record.

So I would certainly appreciate my logical ordering to be the natural 
order they appear.


Cheers,
Gavin



Re: logical column ordering

From
Tomas Vondra
Date:
On 26.2.2015 23:36, Tom Lane wrote:
> Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
>> On 2/26/15 4:01 PM, Alvaro Herrera wrote:
>>> Josh Berkus wrote:
>>>> Oh, I didn't realize there weren't commands to change the LCO.  Without
>>>> at least SQL syntax for LCO, I don't see why we'd take it; this sounds
>>>> more like a WIP patch.
> 
>>> The reason for doing it this way is that changing the underlying
>>> architecture is really hard, without having to bear an endless hackers
>>> bike shed discussion about the best userland syntax to use.  It seems a
>>> much better approach to do the actually difficult part first, then let
>>> the rest to be argued to death by others and let those others do the
>>> easy part (and take all the credit along with that); that way, that
>>> discussion does not kill other possible uses that the new architecture
>>> allows.
> 
>> +1. This patch is already several years old; lets not delay it further.
> 
> I tend to agree with that, but how are we going to test things if there's
> no mechanism to create a table in which the orderings are different?

Physical or logical orderings?

Physical ordering is still determined by the CREATE TABLE command, so
you can do either
   CREATE TABLE order_1 (       a INT,       b INT   );

or (to get the reverse order)
   CREATE TABLE order_2 (       b INT,       a INT   );

Logical ordering may be updated directly in pg_attribute catalog, by
tweaking the attlognum column
   UPDATE pg_attribute SET attlognum = 10    WHERE attrelid = 'order_1'::regclass AND attname = 'a';

Of course, this does not handle duplicities, ranges and such, so that
needs to be done manually. But I think inventing something like
   ALTER TABLE order_1 ALTER COLUMN a SET lognum = 11;

should be straightforward. But IMHO getting the internals working
properly first is more important.

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Alvaro Herrera
Date:
Tomas Vondra wrote:

> Physical ordering is still determined by the CREATE TABLE command,

In theory, you should be able to UPDATE attphysnum in pg_attribute too
when the table is empty, and new tuples inserted would follow the
specified ordering.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Tomas Vondra
Date:
On 27.2.2015 19:23, Alvaro Herrera wrote:
> Tomas Vondra wrote:
> 
>> Physical ordering is still determined by the CREATE TABLE command,
> 
> In theory, you should be able to UPDATE attphysnum in pg_attribute
> too when the table is empty, and new tuples inserted would follow
> the specified ordering.

Good idea - that'd give you descriptors with
   (attnum != attphysnum)

which might trigger some bugs.

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Tomas Vondra
Date:
On 26.2.2015 23:42, Kevin Grittner wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> 
>> Over the time I've heard various use cases for this patch, but in 
>> most cases it was quite speculative. If you have an idea where
>> this might be useful, can you explain it here, or maybe point me to
>> a place where it's described?
> 
> 
> One use case is to be able to suppress default display of columns 
> that are used for internal purposes. For example, incremental 
> maintenance of materialized views will require storing a "count(t)" 
> column, and sometimes state information for aggregate columns, in 
> addition to what the users explicitly request. At the developers' 
> meeting there was discussion of whether and how to avoid displaying 
> these by default, and it was felt that when we have this logical 
> column ordering it would be good to have a way tosuppress default
> display. Perhaps this could be as simple as a special value for
> logical position.

I don't see how hiding columns is related to this patch at all. That's
completely unrelated thing, and it certainly is not part of this patch.


-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Alvaro Herrera
Date:
Tomas Vondra wrote:
> On 26.2.2015 23:42, Kevin Grittner wrote:

> > One use case is to be able to suppress default display of columns 
> > that are used for internal purposes. For example, incremental 
> > maintenance of materialized views will require storing a "count(t)" 
> > column, and sometimes state information for aggregate columns, in 
> > addition to what the users explicitly request. At the developers' 
> > meeting there was discussion of whether and how to avoid displaying 
> > these by default, and it was felt that when we have this logical 
> > column ordering it would be good to have a way tosuppress default
> > display. Perhaps this could be as simple as a special value for
> > logical position.
> 
> I don't see how hiding columns is related to this patch at all. That's
> completely unrelated thing, and it certainly is not part of this patch.

It's not directly related to the patch, but I think the intent is that
once we have this patch it will be possible to apply other
transformations, such as having columns that are effectively hidden --
consider for example the idea that attlognum be set to a negative
number.  (For instance, consider the idea that system columns may all
have -1 as attlognum, which would just be an indicator that they are
never present in logical column expansion.  That makes sense to me; what
reason do we have to keep them using the current attnums they have?)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Tomas Vondra
Date:
On 27.2.2015 19:50, Alvaro Herrera wrote:
> Tomas Vondra wrote:
>> On 26.2.2015 23:42, Kevin Grittner wrote:
> 
>>> One use case is to be able to suppress default display of columns 
>>> that are used for internal purposes. For example, incremental 
>>> maintenance of materialized views will require storing a "count(t)" 
>>> column, and sometimes state information for aggregate columns, in 
>>> addition to what the users explicitly request. At the developers' 
>>> meeting there was discussion of whether and how to avoid displaying 
>>> these by default, and it was felt that when we have this logical 
>>> column ordering it would be good to have a way tosuppress default
>>> display. Perhaps this could be as simple as a special value for
>>> logical position.
>>
>> I don't see how hiding columns is related to this patch at all. That's
>> completely unrelated thing, and it certainly is not part of this patch.
> 
> It's not directly related to the patch, but I think the intent is that
> once we have this patch it will be possible to apply other
> transformations, such as having columns that are effectively hidden --
> consider for example the idea that attlognum be set to a negative
> number.  (For instance, consider the idea that system columns may all
> have -1 as attlognum, which would just be an indicator that they are
> never present in logical column expansion.  That makes sense to me; what
> reason do we have to keep them using the current attnums they have?)

My vote is against reusing attlognums in this way - I see that as a
misuse, making the code needlessly complex. A better way to achieve this
is to introduce a simple 'is hidden' flag into pg_attribute, and
something as simple as
  ALTER TABLE t ALTER COLUMN a SHOW;  ALTER TABLE t ALTER COLUMN a HIDE;

or something along the lines. Not only that's cleaner, but it's also a
better interface for the users than 'you have to set attlognum to a
negative value.'


-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Tomas Vondra
Date:
On 26.2.2015 23:34, Andres Freund wrote:
> On 2015-02-26 16:16:54 -0600, Jim Nasby wrote:
>> On 2/26/15 4:01 PM, Alvaro Herrera wrote:
>>> The reason for doing it this way is that changing the underlying
>>> architecture is really hard, without having to bear an endless hackers
>>> bike shed discussion about the best userland syntax to use.  It seems a
>>> much better approach to do the actually difficult part first, then let
>>> the rest to be argued to death by others and let those others do the
>>> easy part (and take all the credit along with that); that way, that
>>> discussion does not kill other possible uses that the new architecture
>>> allows.
>
> I agree that it's a sane order of developing things. But: I don't
> think it makes sense to commit it without the capability to change
> the order. Not so much because it'll not serve a purpose at that
> point, but rather because it'll essentially untestable. And this is a
> patch that needs a fair amount of changes, both automated, and
> manual.

I agree that committing something without a code that actually uses it
in practice is not the best approach. That's one of the reasons why I
was asking for the use cases we expect from this.


> At least during development I'd even add a function that randomizes
> the physical order of columns, and keeps the logical one. Running
> the regression tests that way seems likely to catch a fair number of
> bugs.

That's not all that difficult, and can be done in 10 lines of PL/pgSQL -
see the attached file. Should be sufficient for development testing (and
in production there's not much use for that anyway).

>> +1. This patch is already several years old; lets not delay it further.
>>
>> Plus, I suspect that you could hack the catalog directly if you
>> really wanted to change LCO. Worst case you could create a C
>> function to do it.
>
> I don't think that's sufficient for testing purposes. Not only for
> the patch itself, but also for getting it right in new code.

I think we could calls to the randomization functions into some of the
regression tests (say 'create_tables.sql'), but that makes regression
tests ... well, random, and I'm not convinced that's a good thing.

Also, this makes regression tests harder to think, because "SELECT *"
does different things depending on the attlognum order.


--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: logical column ordering

From
Alvaro Herrera
Date:
Tomas Vondra wrote:

> I think we could calls to the randomization functions into some of the
> regression tests (say 'create_tables.sql'), but that makes regression
> tests ... well, random, and I'm not convinced that's a good thing.
> 
> Also, this makes regression tests harder to think, because "SELECT *"
> does different things depending on the attlognum order.

No, that approach doesn't seem very useful.  Rather, randomize the
columns in the CREATE TABLE statement, and then fix up the attlognums so
that the SELECT * expansion is the same as it would be with the
not-randomized CREATE TABLE.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Tomas Vondra
Date:
On 27.2.2015 20:34, Alvaro Herrera wrote:
> Tomas Vondra wrote:
> 
>> I think we could calls to the randomization functions into some of the
>> regression tests (say 'create_tables.sql'), but that makes regression
>> tests ... well, random, and I'm not convinced that's a good thing.
>>
>> Also, this makes regression tests harder to think, because "SELECT *"
>> does different things depending on the attlognum order.
> 
> No, that approach doesn't seem very useful.  Rather, randomize the
> columns in the CREATE TABLE statement, and then fix up the attlognums so
> that the SELECT * expansion is the same as it would be with the
> not-randomized CREATE TABLE.

Yes, that's a possible approach too - possibly a better one for
regression tests as it fixes the 'SELECT *' but it effectively uses
fixed 'attlognum' and 'attnum' values (it's difficult to randomize
those, as they may be referenced in other catalogs).

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Alvaro Herrera
Date:
Tomas Vondra wrote:
> On 27.2.2015 20:34, Alvaro Herrera wrote:
> > Tomas Vondra wrote:
> > 
> >> I think we could calls to the randomization functions into some of the
> >> regression tests (say 'create_tables.sql'), but that makes regression
> >> tests ... well, random, and I'm not convinced that's a good thing.
> >>
> >> Also, this makes regression tests harder to think, because "SELECT *"
> >> does different things depending on the attlognum order.
> > 
> > No, that approach doesn't seem very useful.  Rather, randomize the
> > columns in the CREATE TABLE statement, and then fix up the attlognums so
> > that the SELECT * expansion is the same as it would be with the
> > not-randomized CREATE TABLE.
> 
> Yes, that's a possible approach too - possibly a better one for
> regression tests as it fixes the 'SELECT *' but it effectively uses
> fixed 'attlognum' and 'attnum' values (it's difficult to randomize
> those, as they may be referenced in other catalogs).

Why would you care what values are used as attnum?  If anything
misbehaves, surely that would be a bug in the patch.  (Of course, you
can't just change the numbers too much later after the fact, because the
attnum values could have propagated into other tables via foreign keys
and such; it needs to be done during executing CREATE TABLE or
immediately thereafter.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Arthur Silva
Date:

On Fri, Feb 27, 2015 at 4:44 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 27.2.2015 20:34, Alvaro Herrera wrote:
> Tomas Vondra wrote:
>
>> I think we could calls to the randomization functions into some of the
>> regression tests (say 'create_tables.sql'), but that makes regression
>> tests ... well, random, and I'm not convinced that's a good thing.
>>
>> Also, this makes regression tests harder to think, because "SELECT *"
>> does different things depending on the attlognum order.
>
> No, that approach doesn't seem very useful.  Rather, randomize the
> columns in the CREATE TABLE statement, and then fix up the attlognums so
> that the SELECT * expansion is the same as it would be with the
> not-randomized CREATE TABLE.

Yes, that's a possible approach too - possibly a better one for
regression tests as it fixes the 'SELECT *' but it effectively uses
fixed 'attlognum' and 'attnum' values (it's difficult to randomize
those, as they may be referenced in other catalogs).

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Sorry to intrude, I've been following this post and I was wondering if it would allow (in the currently planed form or in the future) a wider set of non-rewriting DDLs to Postgres. For example, drop a column without rewriting the table.

Re: logical column ordering

From
Stephen Frost
Date:
* Arthur Silva (arthurprs@gmail.com) wrote:
> Sorry to intrude, I've been following this post and I was wondering if it
> would allow (in the currently planed form or in the future) a wider set of
> non-rewriting DDLs to Postgres. For example, drop a column without
> rewriting the table.

Uh, we already support that.  Ditto for add column (but you have to add
it with all NULL values; you can't add it with a default value).
Thanks,
    Stephen

Re: logical column ordering

From
Andres Freund
Date:
On 2015-02-27 16:51:30 -0300, Arthur Silva wrote:
> Sorry to intrude, I've been following this post and I was wondering if it
> would allow (in the currently planed form or in the future) a wider set of
> non-rewriting DDLs to Postgres.

I don't think it makes a big difference.

> For example, drop a column without rewriting the table.

That's already possible.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: logical column ordering

From
Alvaro Herrera
Date:
Arthur Silva wrote:

> Sorry to intrude, I've been following this post and I was wondering if it
> would allow (in the currently planed form or in the future) a wider set of
> non-rewriting DDLs to Postgres. For example, drop a column without
> rewriting the table.

Perhaps.  But dropping a column already does not rewrite the table, only
marks the column as dropped in system catalogs, so do you have a better
example.

One obvious example is that you have 

CREATE TABLE t (  t1 int,  t3 int
);

and later want to add t2 in the middle, the only way currently is to
drop the table and start again (re-creating all dependant views, FKs,
etc).  With the patch you will be able to add the column at the right
place.  If no default value is supplied for the new column, no table
rewrite is necessary at all.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Josh Berkus
Date:
Tomas,

So for an API, 100% of the use cases I have for this feature would be
satisfied by:

ALTER TABLE ______ ALTER COLUMN _____ SET ORDER #

and:

ALTER TABLE _____ ADD COLUMN colname coltype ORDER #

If that's infeasible, a function would be less optimal, but would work:

SELECT pg_column_order(schemaname, tablename, colname, attnum)

If you set the order # to one where a column already exists, other
column attnums would get "bumped down", closing up any gaps in the
process.  Obviously, this would require some kind of exclusive lock, but
then ALTER TABLE usually does, doesn't it?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: logical column ordering

From
Tomas Vondra
Date:
On 27.2.2015 20:49, Alvaro Herrera wrote:
> Tomas Vondra wrote:
>> On 27.2.2015 20:34, Alvaro Herrera wrote:
>>> Tomas Vondra wrote:
>>>
>>>> I think we could calls to the randomization functions into some of the
>>>> regression tests (say 'create_tables.sql'), but that makes regression
>>>> tests ... well, random, and I'm not convinced that's a good thing.
>>>>
>>>> Also, this makes regression tests harder to think, because "SELECT *"
>>>> does different things depending on the attlognum order.
>>>
>>> No, that approach doesn't seem very useful.  Rather, randomize the
>>> columns in the CREATE TABLE statement, and then fix up the attlognums so
>>> that the SELECT * expansion is the same as it would be with the
>>> not-randomized CREATE TABLE.
>>
>> Yes, that's a possible approach too - possibly a better one for
>> regression tests as it fixes the 'SELECT *' but it effectively uses
>> fixed 'attlognum' and 'attnum' values (it's difficult to randomize
>> those, as they may be referenced in other catalogs).
> 
> Why would you care what values are used as attnum?  If anything
> misbehaves, surely that would be a bug in the patch.  (Of course, you
> can't just change the numbers too much later after the fact, because the
> attnum values could have propagated into other tables via foreign keys
> and such; it needs to be done during executing CREATE TABLE or
> immediately thereafter.)

Because attnums are referenced in other catalogs? For example when you
define PRIMARY KEY or UNIQUE constraint in the table, an index is
created, which gets a row in pg_index catalog, and that references the
attnum values in indkey column.

If you just randomize the attnums in pg_attribute (withouth fixing all
the attnum references), it's going to go BOOOOM.


-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Tomas Vondra
Date:
On 27.2.2015 21:09, Josh Berkus wrote:
> Tomas,
> 
> So for an API, 100% of the use cases I have for this feature would be
> satisfied by:
> 
> ALTER TABLE ______ ALTER COLUMN _____ SET ORDER #
> 
> and:
> 
> ALTER TABLE _____ ADD COLUMN colname coltype ORDER #

Yes, I imagined an interface like that. Just to be clear, you're talking
about logical order (and not a physical one), right?

Do we need an API to modify physical column order? (I don't think so.)


> If that's infeasible, a function would be less optimal, but would work:
> 
> SELECT pg_column_order(schemaname, tablename, colname, attnum)

If we need a user interface, let's have a proper one (ALTER TABLE).


> If you set the order # to one where a column already exists, other 
> column attnums would get "bumped down", closing up any gaps in the 
> process. Obviously, this would require some kind of exclusive lock,
> but then ALTER TABLE usually does, doesn't it?

If we ignore the system columns, the current implementation assumes that
the values in each of the three columns (attnum, attlognum and
attphysnum) are distinct and within 1..natts. So there are no gaps and
you'll always set the value to an existing one (so yes, shuffling is
necessary).

And yes, that certainly requires an exclusive lock on the pg_attribute
(I don't think we need a lock on the table itself).

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Gavin Flower
Date:
On 28/02/15 09:09, Josh Berkus wrote:
> Tomas,
>
> So for an API, 100% of the use cases I have for this feature would be
> satisfied by:
>
> ALTER TABLE ______ ALTER COLUMN _____ SET ORDER #
>
> and:
>
> ALTER TABLE _____ ADD COLUMN colname coltype ORDER #
>
> If that's infeasible, a function would be less optimal, but would work:
>
> SELECT pg_column_order(schemaname, tablename, colname, attnum)
>
> If you set the order # to one where a column already exists, other
> column attnums would get "bumped down", closing up any gaps in the
> process.  Obviously, this would require some kind of exclusive lock, but
> then ALTER TABLE usually does, doesn't it?
>
Might be an idea to allow lists of columns and their starting position.

ALTER TABLE customer ALTER COLUMN (job_id, type_id, account_num) SET 
ORDER 3;

So in a table with fields:
1. id2. *account_num*3. dob4. start_date5. *type_id*6. preferred_status7. */job_id/*8. comment


would end up like:
1. id2. dob3. *job_id*4. *type_id*5. *account_num*6. start_date7. preferred_status8. comment

Am assuming positions are numbered from 1 onwards.


Cheers,
Gavin



Re: logical column ordering

From
Josh Berkus
Date:
On 02/27/2015 12:25 PM, Tomas Vondra wrote:
> On 27.2.2015 21:09, Josh Berkus wrote:
>> Tomas,
>>
>> So for an API, 100% of the use cases I have for this feature would be
>> satisfied by:
>>
>> ALTER TABLE ______ ALTER COLUMN _____ SET ORDER #
>>
>> and:
>>
>> ALTER TABLE _____ ADD COLUMN colname coltype ORDER #
> 
> Yes, I imagined an interface like that. Just to be clear, you're talking
> about logical order (and not a physical one), right?

Correct.  The only reason to rearrange the physical columns is in order
to optimize, which presumably would be carried out by a utility command,
e.g. VACUUM FULL OPTIMIZE.

> If we ignore the system columns, the current implementation assumes that
> the values in each of the three columns (attnum, attlognum and
> attphysnum) are distinct and within 1..natts. So there are no gaps and
> you'll always set the value to an existing one (so yes, shuffling is
> necessary).
> 
> And yes, that certainly requires an exclusive lock on the pg_attribute
> (I don't think we need a lock on the table itself).

If MVCC on pg_attribute is good enough to not lock against concurrent
"SELECT *", then that would be lovely.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: logical column ordering

From
Tomas Vondra
Date:
OK, so let me summarize here (the other posts in this subthread are
discussing the user interface, not the use cases, so I'll respond here).


There are two main use cases:

1) change the order of columns in "SELECT *"
  - display columns so that related ones grouped together    (irrespectedly whether they were added later, etc.)
  - keep columns synced with COPY
  - requires user interface (ALTER TABLE _ ALTER COLUMN _ SET ORDER _)


2) optimization of physical order (efficient storage / tuple deforming)
  - more efficient order for storage (deforming)
  - may be done manually by reordering columns in CREATE TABLE
  - should be done automatically (no user interface required)
  - seems useful both for on-disk physical tuples, and virtual tuples


Anything else?

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Tomas Vondra
Date:
On 27.2.2015 21:42, Josh Berkus wrote:
> On 02/27/2015 12:25 PM, Tomas Vondra wrote:
>> On 27.2.2015 21:09, Josh Berkus wrote:
>>> Tomas,
>>>
>>> So for an API, 100% of the use cases I have for this feature would be
>>> satisfied by:
>>>
>>> ALTER TABLE ______ ALTER COLUMN _____ SET ORDER #
>>>
>>> and:
>>>
>>> ALTER TABLE _____ ADD COLUMN colname coltype ORDER #
>>
>> Yes, I imagined an interface like that. Just to be clear, you're
>> talking about logical order (and not a physical one), right?
> 
> Correct. The only reason to rearrange the physical columns is in
> order to optimize, which presumably would be carried out by a utility
> command, e.g. VACUUM FULL OPTIMIZE.

I was thinking more about CREATE TABLE at this moment, but yeah, VACUUM
FULL OPTIMIZE might do the same thing.

>> If we ignore the system columns, the current implementation assumes
>> that the values in each of the three columns (attnum, attlognum
>> and attphysnum) are distinct and within 1..natts. So there are no
>> gaps and you'll always set the value to an existing one (so yes,
>> shuffling is necessary).
>>
>> And yes, that certainly requires an exclusive lock on the
>> pg_attribute (I don't think we need a lock on the table itself).
> 
> If MVCC on pg_attribute is good enough to not lock against concurrent
> "SELECT *", then that would be lovely.

Yeah, I think this will need a bit more thought. We certainly don't want
blocking queries on the table, but we need a consistent list of column
definitions from pg_attribute.


-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Alvaro Herrera
Date:
Tomas Vondra wrote:

> 1) change the order of columns in "SELECT *"
> 
>    - display columns so that related ones grouped together
>      (irrespectedly whether they were added later, etc.)
> 
>    - keep columns synced with COPY
> 
>    - requires user interface (ALTER TABLE _ ALTER COLUMN _ SET ORDER _)

Not sure about the "ORDER #" syntax.  In ALTER ENUM we have "AFTER
<value>" and such .. I'd consider that instead.

> 2) optimization of physical order (efficient storage / tuple deforming)
> 
>    - more efficient order for storage (deforming)
> 
>    - may be done manually by reordering columns in CREATE TABLE
> 
>    - should be done automatically (no user interface required)

A large part of it can be done automatically: for instance, not-nullable
fixed length types ought to come first, because that enables the
attcacheoff optimizations in heaptuple.c to fire for more columns.  But
what column comes next?  The offset of the column immediately after them
can also be cached, and so it would be faster to obtain than other
attributes.  Which one to choose here is going to be a coin toss in most
cases, but I suppose there are cases out there which can benefit from
having a particular column there.


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Gavin Flower
Date:
On 28/02/15 09:49, Alvaro Herrera wrote:
> Tomas Vondra wrote:
>
>> 1) change the order of columns in "SELECT *"
>>
>>     - display columns so that related ones grouped together
>>       (irrespectedly whether they were added later, etc.)
>>
>>     - keep columns synced with COPY
>>
>>     - requires user interface (ALTER TABLE _ ALTER COLUMN _ SET ORDER _)
> Not sure about the "ORDER #" syntax.  In ALTER ENUM we have "AFTER
> <value>" and such .. I'd consider that instead.
>
>> 2) optimization of physical order (efficient storage / tuple deforming)
>>
>>     - more efficient order for storage (deforming)
>>
>>     - may be done manually by reordering columns in CREATE TABLE
>>
>>     - should be done automatically (no user interface required)
> A large part of it can be done automatically: for instance, not-nullable
> fixed length types ought to come first, because that enables the
> attcacheoff optimizations in heaptuple.c to fire for more columns.  But
> what column comes next?  The offset of the column immediately after them
> can also be cached, and so it would be faster to obtain than other
> attributes.  Which one to choose here is going to be a coin toss in most
> cases, but I suppose there are cases out there which can benefit from
> having a particular column there.
>
>
Possible, if there is no obvious (to the system) way of deciding the 
order of 2 columns, then the logical order should be used?

As either the order does not really matter, or an expert DBA might know 
which is more efficient.


Cheers,
Gavin



Re: logical column ordering

From
Arthur Silva
Date:
<p dir="ltr"><br /> On Feb 27, 2015 5:02 PM, "Alvaro Herrera" <<a
href="mailto:alvherre@2ndquadrant.com">alvherre@2ndquadrant.com</a>>wrote:<br /> ><br /> > Arthur Silva
wrote:<br/> ><br /> > > Sorry to intrude, I've been following this post and I was wondering if it<br /> >
>would allow (in the currently planed form or in the future) a wider set of<br /> > > non-rewriting DDLs to
Postgres.For example, drop a column without<br /> > > rewriting the table.<br /> ><br /> > Perhaps.  But
droppinga column already does not rewrite the table, only<br /> > marks the column as dropped in system catalogs, so
doyou have a better<br /> > example.<br /> ><br /> > One obvious example is that you have<br /> ><br />
>CREATE TABLE t (<br /> >    t1 int,<br /> >    t3 int<br /> > );<br /> ><br /> > and later want to
addt2 in the middle, the only way currently is to<br /> > drop the table and start again (re-creating all dependant
views,FKs,<br /> > etc).  With the patch you will be able to add the column at the right<br /> > place.  If no
defaultvalue is supplied for the new column, no table<br /> > rewrite is necessary at all.<br /> ><br /> >
--<br/> > Álvaro Herrera                <a href="http://www.2ndQuadrant.com/">http://www.2ndQuadrant.com/</a><br />
>PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services<p dir="ltr">Cool! I didn't know I could
dropstuff without rewriting.<p dir="ltr">Ya, that's another example, people do these from GUI tools. That's a nice side
effect.Cool (again)! <br /> 

Re: logical column ordering

From
Josh Berkus
Date:
On 02/27/2015 12:48 PM, Tomas Vondra wrote:
> On 27.2.2015 21:42, Josh Berkus wrote:
>> On 02/27/2015 12:25 PM, Tomas Vondra wrote:
>>> On 27.2.2015 21:09, Josh Berkus wrote:
>>>> Tomas,
>>>>
>>>> So for an API, 100% of the use cases I have for this feature would be
>>>> satisfied by:
>>>>
>>>> ALTER TABLE ______ ALTER COLUMN _____ SET ORDER #
>>>>
>>>> and:
>>>>
>>>> ALTER TABLE _____ ADD COLUMN colname coltype ORDER #
>>>
>>> Yes, I imagined an interface like that. Just to be clear, you're
>>> talking about logical order (and not a physical one), right?
>>
>> Correct. The only reason to rearrange the physical columns is in
>> order to optimize, which presumably would be carried out by a utility
>> command, e.g. VACUUM FULL OPTIMIZE.
> 
> I was thinking more about CREATE TABLE at this moment, but yeah, VACUUM
> FULL OPTIMIZE might do the same thing.

Actually, I'm going to go back on what I said.

We need an API for physical column reordering, even if it's just pg_
functions.  The reason is that we want to enable people writing their
own physical column re-ordering tools, so that our users can figure out
for us what the best reordering algorithm is.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: logical column ordering

From
Tomas Vondra
Date:
On 27.2.2015 23:48, Josh Berkus wrote:
> On 02/27/2015 12:48 PM, Tomas Vondra wrote:
>> On 27.2.2015 21:42, Josh Berkus wrote:
>>> On 02/27/2015 12:25 PM, Tomas Vondra wrote:
>>>> On 27.2.2015 21:09, Josh Berkus wrote:
>>>>> Tomas,
>>>>>
>>>>> So for an API, 100% of the use cases I have for this feature would be
>>>>> satisfied by:
>>>>>
>>>>> ALTER TABLE ______ ALTER COLUMN _____ SET ORDER #
>>>>>
>>>>> and:
>>>>>
>>>>> ALTER TABLE _____ ADD COLUMN colname coltype ORDER #
>>>>
>>>> Yes, I imagined an interface like that. Just to be clear, you're
>>>> talking about logical order (and not a physical one), right?
>>>
>>> Correct. The only reason to rearrange the physical columns is in
>>> order to optimize, which presumably would be carried out by a utility
>>> command, e.g. VACUUM FULL OPTIMIZE.
>>
>> I was thinking more about CREATE TABLE at this moment, but yeah, VACUUM
>> FULL OPTIMIZE might do the same thing.
> 
> Actually, I'm going to go back on what I said.
> 
> We need an API for physical column reordering, even if it's just pg_
> functions.  The reason is that we want to enable people writing their
> own physical column re-ordering tools, so that our users can figure out
> for us what the best reordering algorithm is.

I doubt that. For example, do you realize you can only do that while the
table is completely empty, and in that case you can just do a CREATE
TABLE with the proper order?

I also doubt the users will be able to optimize the order better than
users, who usually have on idea of how this actually works internally.

But if we want to allow users to define this, I'd say let's make that
part of CREATE TABLE, i.e. the order of columns defines logical order,
and you use something like 'AFTER' to specify physical order.
   CREATE TABLE test (       a INT AFTER b,    -- attlognum = 1, attphysnum = 2       b INT             -- attlognum =
2,attphysnum = 1   );
 

It might get tricky because of cycles, though.


-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
David G Johnston
Date:
Tomas Vondra-4 wrote
> But if we want to allow users to define this, I'd say let's make that
> part of CREATE TABLE, i.e. the order of columns defines logical order,
> and you use something like 'AFTER' to specify physical order.
> 
>     CREATE TABLE test (
>         a INT AFTER b,    -- attlognum = 1, attphysnum = 2
>         b INT             -- attlognum = 2, attphysnum = 1
>     );

Why not memorialize this as a storage parameter?

CREATE TABLE test (   a INT, b INT
)
WITH (layout = 'b, a')
;

A canonical form should be defined and then ALTER TABLE can either directly
update the current value or require the user to specify a new layout before
it will perform the alteration.

David J.




--
View this message in context: http://postgresql.nabble.com/logical-column-ordering-tp5829775p5839825.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: logical column ordering

From
Josh Berkus
Date:
On 02/27/2015 03:06 PM, Tomas Vondra wrote:
> On 27.2.2015 23:48, Josh Berkus wrote:
>> Actually, I'm going to go back on what I said.
>>
>> We need an API for physical column reordering, even if it's just pg_
>> functions.  The reason is that we want to enable people writing their
>> own physical column re-ordering tools, so that our users can figure out
>> for us what the best reordering algorithm is.
> 
> I doubt that. For example, do you realize you can only do that while the
> table is completely empty, and in that case you can just do a CREATE
> TABLE with the proper order?

Well, you could recreate the table as the de-facto API, although as you
point out below that still requires new syntax.

But I was thinking of something which would re-write the table, just
like ADD COLUMN x DEFAULT '' does now.

> I also doubt the users will be able to optimize the order better than
> users, who usually have on idea of how this actually works internally.

We have a lot of power users, including a lot of the people on this
mailing list.

Among the things we don't know about ordering optimization:

* How important is it for index performance to keep key columns adjacent?

* How important is it to pack values < 4 bytes, as opposed to packing
values which are non-nullable?

* How important is it to pack values of the same size, as opposed to
packing values which are non-nullable?

> But if we want to allow users to define this, I'd say let's make that
> part of CREATE TABLE, i.e. the order of columns defines logical order,
> and you use something like 'AFTER' to specify physical order.
> 
>     CREATE TABLE test (
>         a INT AFTER b,    -- attlognum = 1, attphysnum = 2
>         b INT             -- attlognum = 2, attphysnum = 1
>     );
> 
> It might get tricky because of cycles, though.

It would be a lot easier to allow the user to specific a scalar.

CREATE TABLE test (a INT NOT NULL WITH ( lognum 1, physnum 2 )b INT WITH ( lognum 2, physnum 1 )

... and just throw an error if the user creates duplicates or gaps.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: logical column ordering

From
David G Johnston
Date:
David G Johnston wrote
> 
> Tomas Vondra-4 wrote
>> But if we want to allow users to define this, I'd say let's make that
>> part of CREATE TABLE, i.e. the order of columns defines logical order,
>> and you use something like 'AFTER' to specify physical order.
>> 
>>     CREATE TABLE test (
>>         a INT AFTER b,    -- attlognum = 1, attphysnum = 2
>>         b INT             -- attlognum = 2, attphysnum = 1
>>     );
> Why not memorialize this as a storage parameter?
> 
> CREATE TABLE test (
>     a INT, b INT
> )
> WITH (layout = 'b, a')
> ;
> 
> A canonical form should be defined and then ALTER TABLE can either
> directly update the current value or require the user to specify a new
> layout before it will perform the alteration.
> 
> David J.

Extending the idea a bit further why not have "CREATE TABLE" be the API; or
at least something similar to it?

CREATE OR REARRANGE TABLE test (
[...]
)
WITH ();

The "[...]" part would be logical and the WITH() would define the physical. 
The "PK" would simply be whatever is required to make the command work.

David J.




--
View this message in context: http://postgresql.nabble.com/logical-column-ordering-tp5829775p5839828.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: logical column ordering

From
Gavin Flower
Date:
On 28/02/15 12:21, Josh Berkus wrote:
> On 02/27/2015 03:06 PM, Tomas Vondra wrote:
>> On 27.2.2015 23:48, Josh Berkus wrote:
>>> Actually, I'm going to go back on what I said.
>>>
>>> We need an API for physical column reordering, even if it's just pg_
>>> functions.  The reason is that we want to enable people writing their
>>> own physical column re-ordering tools, so that our users can figure out
>>> for us what the best reordering algorithm is.
>> I doubt that. For example, do you realize you can only do that while the
>> table is completely empty, and in that case you can just do a CREATE
>> TABLE with the proper order?
> Well, you could recreate the table as the de-facto API, although as you
> point out below that still requires new syntax.
>
> But I was thinking of something which would re-write the table, just
> like ADD COLUMN x DEFAULT '' does now.
>
>> I also doubt the users will be able to optimize the order better than
>> users, who usually have on idea of how this actually works internally.
> We have a lot of power users, including a lot of the people on this
> mailing list.
>
> Among the things we don't know about ordering optimization:
>
> * How important is it for index performance to keep key columns adjacent?
>
> * How important is it to pack values < 4 bytes, as opposed to packing
> values which are non-nullable?
>
> * How important is it to pack values of the same size, as opposed to
> packing values which are non-nullable?
>
>> But if we want to allow users to define this, I'd say let's make that
>> part of CREATE TABLE, i.e. the order of columns defines logical order,
>> and you use something like 'AFTER' to specify physical order.
>>
>>      CREATE TABLE test (
>>          a INT AFTER b,    -- attlognum = 1, attphysnum = 2
>>          b INT             -- attlognum = 2, attphysnum = 1
>>      );
>>
>> It might get tricky because of cycles, though.
> It would be a lot easier to allow the user to specific a scalar.
>
> CREATE TABLE test (
>     a INT NOT NULL WITH ( lognum 1, physnum 2 )
>     b INT WITH ( lognum 2, physnum 1 )
>
> ... and just throw an error if the user creates duplicates or gaps.
>
I thinks gaps should be okay.

Remember BASIC?  Lines numbers tended to be in 10's so you could easily 
slot new lines in between the existing ones - essential when using the 
Teletype input/output device.

Though the difference here is that you would NOT want to remember the 
original order numbers (at least I don't think that would be worth the 
coding effort & space).  However, the key is the actual order, not the 
numbering.  However, that might require a WARNING message to say that 
the columns would be essentially renumbered from 1 onwards when the 
table was actually created - if gaps had been left.


Cheers,
Gavin



Re: logical column ordering

From
Jim Nasby
Date:
On 2/27/15 2:37 PM, Gavin Flower wrote:
> Might be an idea to allow lists of columns and their starting position.
>
> ALTER TABLE customer ALTER COLUMN (job_id, type_id, account_num) SET
> ORDER 3;

I would certainly want something along those lines because it would be 
*way* less verbose (and presumably a lot faster) than a slew of ALTER 
TABLE statements.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: logical column ordering

From
Jim Nasby
Date:
On 2/27/15 2:49 PM, Alvaro Herrera wrote:
> Tomas Vondra wrote:
>
>> 1) change the order of columns in "SELECT *"
>>
>>     - display columns so that related ones grouped together
>>       (irrespectedly whether they were added later, etc.)

FWIW, I find the ordering more important for things like \d than SELECT *.

Hey, after we get this done the next step is allowing different logical 
ordering for different uses! ;P

>>     - keep columns synced with COPY
>>
>>     - requires user interface (ALTER TABLE _ ALTER COLUMN _ SET ORDER _)
>
> Not sure about the "ORDER #" syntax.  In ALTER ENUM we have "AFTER
> <value>" and such .. I'd consider that instead.

+1. See also Gavin's suggestion of ALTER COLUMN (a, b, c) SET ORDER ...

>> 2) optimization of physical order (efficient storage / tuple deforming)
>>
>>     - more efficient order for storage (deforming)
>>
>>     - may be done manually by reordering columns in CREATE TABLE
>>
>>     - should be done automatically (no user interface required)
>
> A large part of it can be done automatically: for instance, not-nullable
> fixed length types ought to come first, because that enables the

I would think that eliminating wasted space due to alignment would be 
more important than optimizing attcacheoff, at least for a database that 
doesn't fit in memory. Even if it does fit in memory I suspect memory 
bandwidth is more important than clock cycles.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: logical column ordering

From
Matt Kelly
Date:
Even if it does fit in memory I suspect memory bandwidth is more important than clock cycles.

http://people.freebsd.org/~lstewart/articles/cpumemory.pdf

This paper is old but the ratios should still be pretty accurate.  Main memory is 240 clock cycles away and L1d is only 3.  If the experiments in this paper still hold true loading the 8K block into L1d is far more expensive than the CPU processing done once the block is in cache.

When one adds in NUMA to the contention on this shared resource, its not that hard for a 40 core machine to starve for memory bandwidth, and for cores to sit idle waiting for main memory.  Eliminating wasted space seems far more important even when everything could fit in memory already.

Re: logical column ordering

From
Gavin Flower
Date:
On 28/02/15 18:34, Jim Nasby wrote:
> On 2/27/15 2:49 PM, Alvaro Herrera wrote:
>> Tomas Vondra wrote:
>>
>>> 1) change the order of columns in "SELECT *"
>>>
>>>     - display columns so that related ones grouped together
>>>       (irrespectedly whether they were added later, etc.)
>
> FWIW, I find the ordering more important for things like \d than 
> SELECT *.
>
> Hey, after we get this done the next step is allowing different 
> logical ordering for different uses! ;P
[...]

How about CREATE COLUMN SELECTION my_col_sel (a, g, b, e) FROM TABLE 
my_table;

Notes:
1. The column names must match those of the table
2. The COLUMN SELECTION is associated with the specified table
3. If a column gets renamed, then the COLUMN SELECTION effectively gets   updated to use the new column name   (This
canprobably be done automatically, by virtue of storing   references to the appropriate column definitions)
 
4. Allow fewer columns in the COLUMN SELECTION than the original table
5. Allow the the same column to be duplicated   (trivial, simply don't raise an error for duplicates)
6. Allow the COLUMN SELECTION name to appear instead of the list of   columns after the SELECT key word   (SELECT
COLUMNSELECTION my_col_sel FROM my_table WHERE ... - must   match table in FROM clause)
 

If several tables are defined in the FROM clause, and 2 different tables 
have COLUMN SELECTION with identical names, then the COLUMN SELECTION 
names in the SELECT must be prefixed either the table name or its alias.


Cheers,
Gavin



Re: logical column ordering

From
Alvaro Herrera
Date:
Tomas Vondra wrote:

> > We need an API for physical column reordering, even if it's just pg_
> > functions.  The reason is that we want to enable people writing their
> > own physical column re-ordering tools, so that our users can figure out
> > for us what the best reordering algorithm is.
> 
> I doubt that. For example, do you realize you can only do that while the
> table is completely empty, and in that case you can just do a CREATE
> TABLE with the proper order?

Not if you have views or constraints depending on the table definition
-- it's not trivial to drop/recreate the table in that case, but you can
of course think about truncating it, then reorder columns, then
repopulate.

Even better you can cause a full table rewrite if needed.

> But if we want to allow users to define this, I'd say let's make that
> part of CREATE TABLE, i.e. the order of columns defines logical order,
> and you use something like 'AFTER' to specify physical order.
> 
>     CREATE TABLE test (
>         a INT AFTER b,    -- attlognum = 1, attphysnum = 2
>         b INT             -- attlognum = 2, attphysnum = 1
>     );

Surely you want an ALTER command as a minimum; perhaps that is enough
and there is no need for options in CREATE.

> It might get tricky because of cycles, though.

If there's a cycle, just raise an error.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Alvaro Herrera
Date:
Jim Nasby wrote:
> On 2/27/15 2:37 PM, Gavin Flower wrote:
> >Might be an idea to allow lists of columns and their starting position.
> >
> >ALTER TABLE customer ALTER COLUMN (job_id, type_id, account_num) SET
> >ORDER 3;
> 
> I would certainly want something along those lines because it would be *way*
> less verbose (and presumably a lot faster) than a slew of ALTER TABLE
> statements.

You know you can issue multiple subcommands in one ALTER TABLE
statement, right?  There's no reason to do more than one table rewrite.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Alvaro Herrera
Date:
Jim Nasby wrote:
> On 2/27/15 2:49 PM, Alvaro Herrera wrote:
> >Tomas Vondra wrote:
> >
> >>1) change the order of columns in "SELECT *"
> >>
> >>    - display columns so that related ones grouped together
> >>      (irrespectedly whether they were added later, etc.)
> 
> FWIW, I find the ordering more important for things like \d than SELECT *.

Logical column ordering is (or should be) used in all places where
column lists are expanded user-visibly.  \d would be one of those.
Think column order in the output of a JOIN also, for instance -- they
must follow logical column order.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Bruce Momjian
Date:
On Thu, Feb 26, 2015 at 01:55:44PM -0800, Josh Berkus wrote:
> On 02/26/2015 01:54 PM, Alvaro Herrera wrote:
> > This patch decouples these three things so that they
> > can changed freely -- but provides no user interface to do so.  I think
> > that trying to only decouple the thing we currently have in two pieces,
> > and then have a subsequent patch to decouple again, is additional
> > conceptual complexity for no gain.
> 
> Oh, I didn't realize there weren't commands to change the LCO.  Without
> at least SQL syntax for LCO, I don't see why we'd take it; this sounds
> more like a WIP patch.

FYI, pg_upgrade is going to need pg_dump --binary-upgrade to output the
columns in physical order with some logical ordering information, i.e.
pg_upgrade cannot be passed only logical ordering from pg_dump.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: logical column ordering

From
Jim Nasby
Date:
On 3/3/15 11:23 AM, Bruce Momjian wrote:
> On Thu, Feb 26, 2015 at 01:55:44PM -0800, Josh Berkus wrote:
>> On 02/26/2015 01:54 PM, Alvaro Herrera wrote:
>>> This patch decouples these three things so that they
>>> can changed freely -- but provides no user interface to do so.  I think
>>> that trying to only decouple the thing we currently have in two pieces,
>>> and then have a subsequent patch to decouple again, is additional
>>> conceptual complexity for no gain.
>>
>> Oh, I didn't realize there weren't commands to change the LCO.  Without
>> at least SQL syntax for LCO, I don't see why we'd take it; this sounds
>> more like a WIP patch.
>
> FYI, pg_upgrade is going to need pg_dump --binary-upgrade to output the
> columns in physical order with some logical ordering information, i.e.
> pg_upgrade cannot be passed only logical ordering from pg_dump.

Wouldn't it need attno info too, so all 3 orderings?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: logical column ordering

From
Bruce Momjian
Date:
On Tue, Mar  3, 2015 at 11:24:38AM -0600, Jim Nasby wrote:
> On 3/3/15 11:23 AM, Bruce Momjian wrote:
> >On Thu, Feb 26, 2015 at 01:55:44PM -0800, Josh Berkus wrote:
> >>On 02/26/2015 01:54 PM, Alvaro Herrera wrote:
> >>>This patch decouples these three things so that they
> >>>can changed freely -- but provides no user interface to do so.  I think
> >>>that trying to only decouple the thing we currently have in two pieces,
> >>>and then have a subsequent patch to decouple again, is additional
> >>>conceptual complexity for no gain.
> >>
> >>Oh, I didn't realize there weren't commands to change the LCO.  Without
> >>at least SQL syntax for LCO, I don't see why we'd take it; this sounds
> >>more like a WIP patch.
> >
> >FYI, pg_upgrade is going to need pg_dump --binary-upgrade to output the
> >columns in physical order with some logical ordering information, i.e.
> >pg_upgrade cannot be passed only logical ordering from pg_dump.
> 
> Wouldn't it need attno info too, so all 3 orderings?

Uh, what is the third ordering?  Physical, logical, and ?  It already
gets information about dropped columns, if that is the third one.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: logical column ordering

From
Jim Nasby
Date:
On 3/3/15 11:26 AM, Bruce Momjian wrote:
> On Tue, Mar  3, 2015 at 11:24:38AM -0600, Jim Nasby wrote:
>> On 3/3/15 11:23 AM, Bruce Momjian wrote:
>>> On Thu, Feb 26, 2015 at 01:55:44PM -0800, Josh Berkus wrote:
>>>> On 02/26/2015 01:54 PM, Alvaro Herrera wrote:
>>>>> This patch decouples these three things so that they
>>>>> can changed freely -- but provides no user interface to do so.  I think
>>>>> that trying to only decouple the thing we currently have in two pieces,
>>>>> and then have a subsequent patch to decouple again, is additional
>>>>> conceptual complexity for no gain.
>>>>
>>>> Oh, I didn't realize there weren't commands to change the LCO.  Without
>>>> at least SQL syntax for LCO, I don't see why we'd take it; this sounds
>>>> more like a WIP patch.
>>>
>>> FYI, pg_upgrade is going to need pg_dump --binary-upgrade to output the
>>> columns in physical order with some logical ordering information, i.e.
>>> pg_upgrade cannot be passed only logical ordering from pg_dump.
>>
>> Wouldn't it need attno info too, so all 3 orderings?
>
> Uh, what is the third ordering?  Physical, logical, and ?  It already
> gets information about dropped columns, if that is the third one.

attnum; used in other catalogs to reference columns.

If you're shuffling everything though pg_dump anyway then maybe it's not 
needed...
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: logical column ordering

From
Bruce Momjian
Date:
On Tue, Mar  3, 2015 at 11:41:20AM -0600, Jim Nasby wrote:
> >>Wouldn't it need attno info too, so all 3 orderings?
> >
> >Uh, what is the third ordering?  Physical, logical, and ?  It already
> >gets information about dropped columns, if that is the third one.
> 
> attnum; used in other catalogs to reference columns.
> 
> If you're shuffling everything though pg_dump anyway then maybe it's
> not needed...

Yes, all those attno system table links are done with pg_dump SQL
commands.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: logical column ordering

From
Peter Eisentraut
Date:
On 12/9/14 12:41 PM, Alvaro Herrera wrote:
> To recap, this is based on the idea of having three numbers for each
> attribute rather than a single attnum; the first of these is attnum (a
> number that uniquely identifies an attribute since its inception and may
> or may not have any relationship to its storage position and the place
> it expands to through user interaction).  The second is attphysnum,
> which indicates where it is stored in the physical structure.  The third
> is attlognum, which indicates where it expands in "*", where must its
> values be placed in COPY or VALUES lists, etc --- the logical position
> as the user sees it.

Side idea:  Let attnum be the logical number, introduce attphysnum as
the storage position, and add an oid to pg_attribute as the eternal
identifier.

That way you avoid breaking pretty much all user code that looks at
pg_attribute, which will probably do something like ORDER BY attnum.

Also, one could get rid of all sorts of ugly code that works around the
lack of an oid in pg_attribute, such as in the dependency tracking.




Re: logical column ordering

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Side idea:  Let attnum be the logical number, introduce attphysnum as
> the storage position, and add an oid to pg_attribute as the eternal
> identifier.

> That way you avoid breaking pretty much all user code that looks at
> pg_attribute, which will probably do something like ORDER BY attnum.

> Also, one could get rid of all sorts of ugly code that works around the
> lack of an oid in pg_attribute, such as in the dependency tracking.

I think using an OID would break more stuff than it fixes in dependency
tracking; in particular you would now need an explicit dependency link
from each column to its table, because the "sub-object" knowledge would
no longer work.  In any case this patch is going to be plenty big enough
already without saddling it with a major rewrite of the dependency system.

I agree though that it's worth considering defining pg_attribute.attnum as
the logical column position so as to minimize the effects on client-side
code.  I doubt there is much stuff client-side that cares about column
creation order, but there is plenty that cares about logical column order.
OTOH this would introduce confusion into the backend code, since Alvaro's
definition of attnum is what most of the backend should care about.
        regards, tom lane



Re: logical column ordering

From
Tomas Vondra
Date:
On 12.3.2015 03:16, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Side idea: Let attnum be the logical number, introduce attphysnum 
>> as the storage position, and add an oid to pg_attribute as the 
>> eternal identifier.
> 
>> That way you avoid breaking pretty much all user code that looks at
>> pg_attribute, which will probably do something like ORDER BY 
>> attnum.
> 
>> Also, one could get rid of all sorts of ugly code that works around
>> the lack of an oid in pg_attribute, such as in the dependency
>> tracking.
> 
> I think using an OID would break more stuff than it fixes in 
> dependency tracking; in particular you would now need an explicit 
> dependency link from each column to its table, because the 
> "sub-object" knowledge would no longer work. In any case this patch 
> is going to be plenty big enough already without saddling it with a 
> major rewrite of the dependency system.

Exactly. I believe Alvaro considered that option in the past.

> I agree though that it's worth considering defining 
> pg_attribute.attnum as the logical column position so as to minimize 
> the effects on client-side code. I doubt there is much stuff 
> client-side that cares about column creation order, but there is 
> plenty that cares about logical column order. OTOH this would 
> introduce confusion into the backend code, since Alvaro's definition 
> of attnum is what most of the backend should care about.

IMHO reusing attnum for logical column order would actually make it more
complex, especially if we allow users to modify the logical order using
ALTER TABLE. Because if you change it, you have to walk through all the
places where it might be referenced and update those too (say, columns
referenced in indexes and such). Keeping attnum immutable makes this
much easier and simpler.

regards

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Alvaro Herrera
Date:
Tomas Vondra wrote:
> On 12.3.2015 03:16, Tom Lane wrote:

> > I agree though that it's worth considering defining 
> > pg_attribute.attnum as the logical column position so as to minimize 
> > the effects on client-side code. I doubt there is much stuff 
> > client-side that cares about column creation order, but there is 
> > plenty that cares about logical column order. OTOH this would 
> > introduce confusion into the backend code, since Alvaro's definition 
> > of attnum is what most of the backend should care about.
> 
> IMHO reusing attnum for logical column order would actually make it more
> complex, especially if we allow users to modify the logical order using
> ALTER TABLE. Because if you change it, you have to walk through all the
> places where it might be referenced and update those too (say, columns
> referenced in indexes and such). Keeping attnum immutable makes this
> much easier and simpler.

I think you're misunderstanding.  The suggestion, as I understand it, is
to rename the attnum column to something else (maybe, say, attidnum),
and rename attlognum to attnum.  That preserves the existing property
that "ORDER BY attnum" gives you the correct view of the table from the
point of view of the user.  That's very useful because it means clients
looking at pg_attribute need less changes, or maybe none at all.

I think this wouldn't be too difficult to implement, because there
aren't that many places that refer to the column-identity attribute by
name; most of them just grab the TupleDesc->attrs array in whatever
order is appropriate and scan that in a loop.  Only a few of these use
att->attnum inside the loop --- that's what would need to be changed,
and it should be pretty mechanical.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Tomas Vondra
Date:
On 12.3.2015 14:17, Alvaro Herrera wrote:
> Tomas Vondra wrote:
>> On 12.3.2015 03:16, Tom Lane wrote:
> 
>>> I agree though that it's worth considering defining 
>>> pg_attribute.attnum as the logical column position so as to minimize 
>>> the effects on client-side code. I doubt there is much stuff 
>>> client-side that cares about column creation order, but there is 
>>> plenty that cares about logical column order. OTOH this would 
>>> introduce confusion into the backend code, since Alvaro's definition 
>>> of attnum is what most of the backend should care about.
>>
>> IMHO reusing attnum for logical column order would actually make it more
>> complex, especially if we allow users to modify the logical order using
>> ALTER TABLE. Because if you change it, you have to walk through all the
>> places where it might be referenced and update those too (say, columns
>> referenced in indexes and such). Keeping attnum immutable makes this
>> much easier and simpler.
> 
> I think you're misunderstanding. The suggestion, as I understand it,
> is to rename the attnum column to something else (maybe, say,
> attidnum), and rename attlognum to attnum. That preserves the
> existing property that "ORDER BY attnum" gives you the correct view
> of the table from the point of view of the user. That's very useful
> because it means clients looking at pg_attribute need less changes,
> or maybe none at all.

Hmm ... I understood it as a suggestion to drop attlognum and just
define (attnum, attphysnum).

> I think this wouldn't be too difficult to implement, because there 
> aren't that many places that refer to the column-identity attribute
> by name; most of them just grab the TupleDesc->attrs array in
> whatever order is appropriate and scan that in a loop. Only a few of
> these use att->attnum inside the loop --- that's what would need to
> be changed, and it should be pretty mechanical.

I think it's way more complicated. We may fix all the pieces of the
code, but that's not all - attnum is referenced in various system views,
catalogs and such. For example pg_stats view does this:
 FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)   JOIN pg_attribute a ON (c.oid = attrelid AND attnum =
s.staattnum)    LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace) WHERE NOT attisdropped   AND
has_column_privilege(c.oid,a.attnum, 'select');
 

information_schema also uses attnum on many places too.

I see the catalogs as a kind of public API, and redefining the meaning
of an existing column this way seems tricky, especially when we
reference it from other catalogs - I'm pretty sure there's plenty of SQL
queries in various tools that rely on this. Just google for "pg_indexes
indkeys unnest" and you'll find posts like this one from Craig:


http://stackoverflow.com/questions/18121103/how-to-get-the-index-column-orderasc-desc-nulls-first-from-postgresql

specifically tell people to do this:

SELECT   ...   FROM (     SELECT       pg_class.relname,       ...       unnest(pg_index.indkey) AS k     FROM pg_index
   INNER JOIN pg_class ON pg_index.indexrelid = pg_class.oid   ) i   ...   INNER JOIN pg_attribute ON
(pg_attribute.attrelid= i.indrelid                           AND pg_attribute.attnum = k);
 

which specifically tells people to match attnum vs. indkeys. If we
redefine the meaning of attnum, and instead match indkeys against a
different column (say, attidnum), all those queries will be broken.

Which actually breaks the catalog definition as specified here:
 http://www.postgresql.org/docs/devel/static/catalog-pg-index.html

which explicitly says that indkey references pg_attribute.attnum.

But maybe we don't really care about breaking this API and it is a good
approach - I need to think about it and try it.

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Alvaro Herrera
Date:
Tomas Vondra wrote:
> On 12.3.2015 14:17, Alvaro Herrera wrote:
> > Tomas Vondra wrote:
> >> On 12.3.2015 03:16, Tom Lane wrote:
> > 
> >>> I agree though that it's worth considering defining 
> >>> pg_attribute.attnum as the logical column position so as to minimize 
> >>> the effects on client-side code. I doubt there is much stuff 
> >>> client-side that cares about column creation order, but there is 
> >>> plenty that cares about logical column order. OTOH this would 
> >>> introduce confusion into the backend code, since Alvaro's definition 
> >>> of attnum is what most of the backend should care about.
> >>
> >> IMHO reusing attnum for logical column order would actually make it more
> >> complex, especially if we allow users to modify the logical order using
> >> ALTER TABLE. Because if you change it, you have to walk through all the
> >> places where it might be referenced and update those too (say, columns
> >> referenced in indexes and such). Keeping attnum immutable makes this
> >> much easier and simpler.
> > 
> > I think you're misunderstanding. The suggestion, as I understand it,
> > is to rename the attnum column to something else (maybe, say,
> > attidnum), and rename attlognum to attnum. That preserves the
> > existing property that "ORDER BY attnum" gives you the correct view
> > of the table from the point of view of the user. That's very useful
> > because it means clients looking at pg_attribute need less changes,
> > or maybe none at all.
> 
> Hmm ... I understood it as a suggestion to drop attlognum and just
> define (attnum, attphysnum).

Pretty sure it wasn't that.

> > I think this wouldn't be too difficult to implement, because there 
> > aren't that many places that refer to the column-identity attribute
> > by name; most of them just grab the TupleDesc->attrs array in
> > whatever order is appropriate and scan that in a loop. Only a few of
> > these use att->attnum inside the loop --- that's what would need to
> > be changed, and it should be pretty mechanical.
> 
> I think it's way more complicated. We may fix all the pieces of the
> code, but that's not all - attnum is referenced in various system views,
> catalogs and such. For example pg_stats view does this:
> 
>   FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
>     JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
>       LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
>   WHERE NOT attisdropped
>     AND has_column_privilege(c.oid, a.attnum, 'select');
> 
> information_schema also uses attnum on many places too.

Those can be fixed with relative ease to refer to attidnum instead.

> I see the catalogs as a kind of public API, and redefining the meaning
> of an existing column this way seems tricky, especially when we
> reference it from other catalogs - I'm pretty sure there's plenty of SQL
> queries in various tools that rely on this.

That's true, but then we've never promised that system catalogs remain
unchanged forever.  That would essentially stop development.

However, there's a difference between making a query silently given
different results, and breaking it completely forcing the user to
re-study how to write it.  I think the latter is better.  In that light
we should just drop attnum as a column name, and use something else:
maybe (attidnum, attlognum, attphysnum).  So all queries in the wild
would be forced to be updated, but we would not silently change
semantics instead.

> Which actually breaks the catalog definition as specified here:
> 
>   http://www.postgresql.org/docs/devel/static/catalog-pg-index.html
> 
> which explicitly says that indkey references pg_attribute.attnum.

That's a simple doc fix.

> But maybe we don't really care about breaking this API and it is a good
> approach - I need to think about it and try it.

Yeah, thanks.


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Andres Freund
Date:
Hi,

On 2015-03-11 22:16:52 -0400, Tom Lane wrote:
> I agree though that it's worth considering defining pg_attribute.attnum as
> the logical column position so as to minimize the effects on client-side
> code.

I actually wonder if it'd not make more sense to define it as the
physical column number. That'd reduce the invasiveness and risk of the
patch considerably. It means that most existing code doesn't have to be
changed and can just continue to refer to attnum like today. There's
much less risk of it being wrongly used to refer to the physical offset
instead of creation order.  Queries against attnum would still give a
somewhat sane response.

It would make some ALTER TABLE commands a bit more complex if we want to
allow reordering the physical order. But that seems like a much more
localized complexity than previous patches in this thread (although I've
not looked at the last version).

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: logical column ordering

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> However, there's a difference between making a query silently given
> different results, and breaking it completely forcing the user to
> re-study how to write it.  I think the latter is better.  In that light
> we should just drop attnum as a column name, and use something else:
> maybe (attidnum, attlognum, attphysnum).  So all queries in the wild
> would be forced to be updated, but we would not silently change
> semantics instead.

Hm.  I'm on board with renaming like that inside the backend, but
I'm very much less excited about forcibly breaking client queries.
I think there is little if any client-side code that will care about
either attidnum or attphysnum, so forcing people to think about it
will just create make-work.
        regards, tom lane



Re: logical column ordering

From
Peter Eisentraut
Date:
On 3/12/15 10:07 AM, Andres Freund wrote:
> I actually wonder if it'd not make more sense to define it as the
> physical column number. That'd reduce the invasiveness and risk of the
> patch considerably.

Clearly, the number of places where attnum has to be changed to
something else is not zero, and so it doesn't matter if a lot or a few
have to be changed.  They all have to be looked at and considered.




Re: logical column ordering

From
Peter Eisentraut
Date:
On 3/11/15 10:16 PM, Tom Lane wrote:

> I think using an OID would break more stuff than it fixes in dependency
> tracking; in particular you would now need an explicit dependency link
> from each column to its table, because the "sub-object" knowledge would
> no longer work.

That might not be a bad thing, but ...

> In any case this patch is going to be plenty big enough
> already without saddling it with a major rewrite of the dependency system.

... is true.




Re: logical column ordering

From
Jeff Janes
Date:
On Mon, Feb 23, 2015 at 3:09 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,

attached is the result of my first attempt to make the logical column
ordering patch work. This touches a lot of code in the executor that is
mostly new to me, so if you see something that looks like an obvious
bug, it probably is (so let me know).

There is an apply conflict in src/backend/parser/parse_relation.c in the comments for scanRTEForColumn.

It seems like it should be easy to ignore, but when I ignore it I get make check failing all over the place.

(The first patch posted in this thread seems to work for me, I did some testing on it before I realized it was obsolete.)

Cheers,

Jeff

Re: logical column ordering

From
Robert Haas
Date:
On Thu, Mar 12, 2015 at 9:57 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> However, there's a difference between making a query silently given
> different results, and breaking it completely forcing the user to
> re-study how to write it.  I think the latter is better.  In that light
> we should just drop attnum as a column name, and use something else:
> maybe (attidnum, attlognum, attphysnum).  So all queries in the wild
> would be forced to be updated, but we would not silently change
> semantics instead.

+1 for that approach.  Much better to break all of the third-party
code out there definitively than to bet on which attribute people are
going to want to use most commonly.

I'm a little confused as to the status of this patch.  It's marked as
Waiting on Author in the CommitFest application, and the last patch
version was posted in December.  The fact that the new CommitFest
application encourages people to blindly move things to the next CF
instead of forcing patch authors to reopen the record when they update
the patch is, IMHO, not good.  It's just going to lead to the CF
application filling up with things that the authors aren't really
working on.  We've got enough work to do with the patches that are
actually under active development.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: logical column ordering

From
Peter Geoghegan
Date:
On Mon, Mar 23, 2015 at 10:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I'm a little confused as to the status of this patch.  It's marked as
> Waiting on Author in the CommitFest application, and the last patch
> version was posted in December.  The fact that the new CommitFest
> application encourages people to blindly move things to the next CF
> instead of forcing patch authors to reopen the record when they update
> the patch is, IMHO, not good.  It's just going to lead to the CF
> application filling up with things that the authors aren't really
> working on.  We've got enough work to do with the patches that are
> actually under active development.

Maybe there should be a "stalled" patch status summary, that
highlights patches that have not had their status change in (say) 2
weeks. Although it wouldn't really be a status summary, since that
they're mutually exclusive with each other in the CF app (e.g. a patch
cannot be both "Waiting on Author" and "Ready for Committer").


-- 
Peter Geoghegan



Re: logical column ordering

From
Jeff Janes
Date:
On Mon, Mar 23, 2015 at 10:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Mar 12, 2015 at 9:57 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> However, there's a difference between making a query silently given
> different results, and breaking it completely forcing the user to
> re-study how to write it.  I think the latter is better.  In that light
> we should just drop attnum as a column name, and use something else:
> maybe (attidnum, attlognum, attphysnum).  So all queries in the wild
> would be forced to be updated, but we would not silently change
> semantics instead.

+1 for that approach.  Much better to break all of the third-party
code out there definitively than to bet on which attribute people are
going to want to use most commonly.

I'm a little confused as to the status of this patch.  It's marked as
Waiting on Author in the CommitFest application, and the last patch
version was posted in December. 

There was a patch here, which in the commit fest is "hidden" behind other non-attachments in the same email:

Attachment (randomize.sql) at 2015-02-27 19:10:21 from Tomas Vondra <tomas.vondra at 2ndquadrant.com>

But that patch failed the majority of "make check" checks in my hands.  So I also don't know what the status is.

Cheers,

Jeff

Re: logical column ordering

From
Andres Freund
Date:
On 2015-03-23 13:01:48 -0400, Robert Haas wrote:
> I'm a little confused as to the status of this patch.  It's marked as
> Waiting on Author in the CommitFest application, and the last patch
> version was posted in December.

I think it fairly can be marked as "returned with feedback" for now?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: logical column ordering

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2015-03-23 13:01:48 -0400, Robert Haas wrote:
> > I'm a little confused as to the status of this patch.  It's marked as
> > Waiting on Author in the CommitFest application, and the last patch
> > version was posted in December.
> 
> I think it fairly can be marked as "returned with feedback" for now?

... which means that no useful feedback was received at all in this
round for this patch.  (There was lots of feedback, mind you, but as far
as I can see it was all on the subject of how the patch is going to be
summarily rejected unless user-visible controls are offered -- and you
already know my opinion on that matter.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Andres Freund
Date:
On 2015-03-23 14:19:50 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2015-03-23 13:01:48 -0400, Robert Haas wrote:
> > > I'm a little confused as to the status of this patch.  It's marked as
> > > Waiting on Author in the CommitFest application, and the last patch
> > > version was posted in December.
> > 
> > I think it fairly can be marked as "returned with feedback" for now?
> 
> ... which means that no useful feedback was received at all in this
> round for this patch.  (There was lots of feedback, mind you, but as far
> as I can see it was all on the subject of how the patch is going to be
> summarily rejected unless user-visible controls are offered -- and you
> already know my opinion on that matter.)

To me the actual blocker seems to be the implementation. Which doesn't
look like it's going to be ready for 9.5; there seems to be loads of
work left to do. It's hard to provide non flame-bait feedback if the
patch isn't ready. I'm not sure what review you'd like to see at this
stage?

I think your approach of concentrating on the technical parts is sane,
and I'd continue going that way.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: logical column ordering

From
Tomas Vondra
Date:
Hi,

On 23.3.2015 18:01, Robert Haas wrote:
> On Thu, Mar 12, 2015 at 9:57 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> However, there's a difference between making a query silently given
>> different results, and breaking it completely forcing the user to
>> re-study how to write it.  I think the latter is better.  In that light
>> we should just drop attnum as a column name, and use something else:
>> maybe (attidnum, attlognum, attphysnum).  So all queries in the wild
>> would be forced to be updated, but we would not silently change
>> semantics instead.
> 
> +1 for that approach.  Much better to break all of the third-party
> code out there definitively than to bet on which attribute people are
> going to want to use most commonly.
> 
> I'm a little confused as to the status of this patch.  It's marked as
> Waiting on Author in the CommitFest application, and the last patch
> version was posted in December.  The fact that the new CommitFest
> application encourages people to blindly move things to the next CF
> instead of forcing patch authors to reopen the record when they update
> the patch is, IMHO, not good.  It's just going to lead to the CF
> application filling up with things that the authors aren't really
> working on.  We've got enough work to do with the patches that are
> actually under active development.

The last version of the patch was submitted on 24/2 by me. Not sure why
it's not listed in the CF app, but it's here:

http://www.postgresql.org/message-id/54EBB312.7090000@2ndquadrant.com

I'm working on a new version of the patch, based on the ideas that were
mentioned in this thread. I plan to post a new version within a few
days, hopefully.

Anyway, it's obvious this patch won't make it into 9.5 - it's a lot of
subtle changes on many places, so it's not suitable for the last
commitfest. But the feedback is welcome, of course.


-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Tomas Vondra
Date:
Hi,

On 23.3.2015 18:07, Peter Geoghegan wrote:
> On Mon, Mar 23, 2015 at 10:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I'm a little confused as to the status of this patch.  It's marked as
>> Waiting on Author in the CommitFest application, and the last patch
>> version was posted in December.  The fact that the new CommitFest
>> application encourages people to blindly move things to the next CF
>> instead of forcing patch authors to reopen the record when they update
>> the patch is, IMHO, not good.  It's just going to lead to the CF
>> application filling up with things that the authors aren't really
>> working on.  We've got enough work to do with the patches that are
>> actually under active development.
> 
> Maybe there should be a "stalled" patch status summary, that
> highlights patches that have not had their status change in (say) 2
> weeks. Although it wouldn't really be a status summary, since that
> they're mutually exclusive with each other in the CF app (e.g. a patch
> cannot be both "Waiting on Author" and "Ready for Committer").

Not sure how that's supposed to improve the situation? Also, when you
change the status to 'stalled', it only makes it more difficult to
identify why it was stalled (was it waiting for author or a review?).

What might be done is tracking "time since last patch/review", but I
really don't know how we're going to identify that considering the
problems with identifying which messages are patches.

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Peter Geoghegan
Date:
On Mon, Mar 23, 2015 at 11:50 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Not sure how that's supposed to improve the situation? Also, when you
> change the status to 'stalled', it only makes it more difficult to
> identify why it was stalled (was it waiting for author or a review?).
>
> What might be done is tracking "time since last patch/review", but I
> really don't know how we're going to identify that considering the
> problems with identifying which messages are patches.


Perhaps I explained myself poorly. I am proposing having a totally
automated/mechanical way of highlighting no actual change in status in
the CF app. So I think we are in agreement here, or close enough. I
was just talking about a somewhat arbitrary point at which patches are
considered to have stalled within the CF app.

-- 
Peter Geoghegan



Re: logical column ordering

From
Tomas Vondra
Date:
On 23.3.2015 18:08, Jeff Janes wrote:
> On Mon, Mar 23, 2015 at 10:01 AM, Robert Haas <robertmhaas@gmail.com
> <mailto:robertmhaas@gmail.com>> wrote:
> 
> There was a patch here, which in the commit fest is "hidden" behind
> other non-attachments in the same email:
> 
> Attachment (randomize.sql
> <http://www.postgresql.org/message-id/attachment/37076/randomize.sql>)
> at 2015-02-27 19:10:21
> <http://www.postgresql.org/message-id/54F0C11D.7000906@2ndquadrant.com/> from
> Tomas Vondra <tomas.vondra at 2ndquadrant.com <http://2ndquadrant.com>>
> 
> But that patch failed the majority of "make check" checks in my hands. 
> So I also don't know what the status is.

Ummm, that's not a patch but a testing script ...

There was a patch submitted on 23/2, and I believe that passes most make
check tests, except for two IIRC. But it's not perfect - it was the
first version that mostly worked, and was somehow suitable for getting
feedback.

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Tomas Vondra
Date:
On 23.3.2015 19:52, Peter Geoghegan wrote:
> On Mon, Mar 23, 2015 at 11:50 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> Not sure how that's supposed to improve the situation? Also, when you
>> change the status to 'stalled', it only makes it more difficult to
>> identify why it was stalled (was it waiting for author or a review?).
>>
>> What might be done is tracking "time since last patch/review", but
>> I really don't know how we're going to identify that considering
>> the problems with identifying which messages are patches.
> 
> 
> Perhaps I explained myself poorly. I am proposing having a totally 
> automated/mechanical way of highlighting no actual change in status
> in the CF app. So I think we are in agreement here, or close enough.
> I was just talking about a somewhat arbitrary point at which patches
> are considered to have stalled within the CF app.

Oh, right. Yes, tracking time since the last status change like this
might be useful, although my experience is that many patches are stuck
at some status yet there was a long discussion on the list ...  Not sure
if that counts as 'stalled'.


-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Josh Berkus
Date:
On 03/23/2015 02:32 PM, Tomas Vondra wrote:
> Oh, right. Yes, tracking time since the last status change like this
> might be useful, although my experience is that many patches are stuck
> at some status yet there was a long discussion on the list ...  Not sure
> if that counts as 'stalled'.

"Time since last email" maybe.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: logical column ordering

From
Tomas Vondra
Date:
On 23.3.2015 18:30, Andres Freund wrote:
>>>
>>> I think it fairly can be marked as "returned with feedback" for
>>> now?

That will eventually be the end result, yes. If it's time to do that
now, or leave the patch in the CF and only bounce it at the end, I don't
know.

>>
>> ... which means that no useful feedback was received at all in
>> this round for this patch. (There was lots of feedback, mind you,
>> but as far as I can see it was all on the subject of how the patch
>> is going to be summarily rejected unless user-visible controls are
>> offered -- and you already know my opinion on that matter.)
> 
> To me the actual blocker seems to be the implementation. Which
> doesn't look like it's going to be ready for 9.5; there seems to be
> loads of work left to do. It's hard to provide non flame-bait
> feedback if the patch isn't ready. I'm not sure what review you'd
> like to see at this stage?

The version I posted at the end of February is certainly incomplete (and
some of the regression tests fail), but it seemed reasonably complete to
get some feedback. That is not to say parts of the patch are probably
wrong / need rework.

> I think your approach of concentrating on the technical parts is
> sane, and I'd continue going that way.

I do work in that direction. OTOH I think it's useful to provide some
sort of "minimum usable API" so that people can actually use it without
messing with catalogs directly. It certainly won't have all the bells
and whistles, though.

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Tomas Vondra
Date:
On 23.3.2015 22:53, Jeff Janes wrote:
> On Mon, Mar 23, 2015 at 11:52 AM, Tomas Vondra
> 
> Sorry, the 23/2 one is the one I meant.  I got confused over which of
> the emails listed as having an attachment but no patch was the one that
> actually had a patch.  (If the commitfest app can't correctly deal with
> more than one attachment, it needs to at least give an indication that
> this condition may exist).
> 
> But I am still getting a lot of errors during make check.
> 
> 60 of 153 tests failed
> 
> Some of them look like maybe a change in the expected output file didn't
> get included in the patch, but at least one was a coredump.

Yes, there were two coredumps (as noted in the message with the patch).

Not sure of the other errors - it certainly is possible I forgot to
include something in the patch. Thanks for noticing this, will look into
that.


-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Kevin Grittner
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Mar 12, 2015 at 9:57 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> However, there's a difference between making a query silently given
>> different results, and breaking it completely forcing the user to
>> re-study how to write it.  I think the latter is better.  In that light
>> we should just drop attnum as a column name, and use something else:
>> maybe (attidnum, attlognum, attphysnum).  So all queries in the wild
>> would be forced to be updated, but we would not silently change
>> semantics instead.
>
> +1 for that approach.  Much better to break all of the third-party
> code out there definitively than to bet on which attribute people are
> going to want to use most commonly.

+1

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: logical column ordering

From
Alvaro Herrera
Date:
I've been looking at this again.  It has become apparent to me that what
we're doing in parse analysis is wrong, and the underlying node
representation is wrong too.  Here's a different approach, which I hope
will give better fruits.  I'm still working on implementing the ideas
here (and figuring out what the fallout is).

Currently, the patch stores RangeTblEntry->eref->colnames in logical
order; and it also adds a "map" from logical colnums to "attnum" (called
"lognums").  Now this is problematic for two reasons:

1. the lognums map becomes part of the stored representation of a rule;
any time you modified the logical ordering of a table underlying some
view, the view's _RETURN rule would have to be modified as well.  Not
good.

2. RTE->eref->colnames is in attlognum order and thus can only be sanely
interpreted if RTE->lognums is available, so not only lognums would have
to be modified, but colnames as well.

I think the solution to both these issues is to store colnames in attnum
ordering not logical, and *not* output RTE->lognums as part of
_outRangeTblEntry.  This means that every time we read the RTE for the
table we need to obtain lognums from its tupledesc.  RTE->eref->colnames
can then be sorted appropriately at plan time.

At RTE creation time (addRangeTableEntry and siblings) we can obtain
lognums and physnums.  Both these arrays are available for later
application in setrefs.c, avoiding the need of the obviously misplaced
relation_open() call we currently have there.

There is one gotcha, which is that expandTupleDesc (and, really,
everything from expandRTE downwards) will need to be split in two
somehow: one part needs to fill in the colnames array in attnum order,
and the other part needs to expand the attribute array into Var nodes in
logical order.

(If you recall, we need attphysnums at setrefs.c time so that we can
fix-up any TupleDesc created from a targetlist so that it contains the
proper attphysnum values.  The attphysnum values for each attribute do
not propagate properly there, and I believe this is the mechanism to do
so.)

As I said, I'm still writing the first pieces of this so I'm not sure
what other ramifications it will have.  If there are any thoughts, I
would appreciate them.  (Particularly useful input on whether it is
acceptable to omit lognums/physnums from _outRangeTblEntry.)

An alternative idea would be to add lognums and physnums to RelOptInfo
instead of RangeTblEntry (we would do so during get_relation_info).  I'm
not sure how this works for setrefs.c though, if at all; the advantage
is that RelOptInfo is not part of stored rules so we don't have to worry
about not saving them there.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical column ordering

From
Robert Haas
Date:
On Tue, Apr 14, 2015 at 2:38 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> As I said, I'm still writing the first pieces of this so I'm not sure
> what other ramifications it will have.  If there are any thoughts, I
> would appreciate them.  (Particularly useful input on whether it is
> acceptable to omit lognums/physnums from _outRangeTblEntry.)

I think the general rule is that an RTE should not contain any
structure information about the underlying relation that can
potentially change: the OID is OK because it's immutable for any given
relation.  The relkind is not quite immutable because you can create a
_SELECT rule on a table and turn it into a view; I'm not sure how we
handle that, but it's a fairly minor exception anyhow.  Everything
else in the RTE, with the new and perhaps-unfortunate exception of
security quals, is stuff derived from what's in the query, not the
table.  I think it would be good for this to work the same way: the
structural information about the table should be found in the
relcache, not the RTE.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: 9.5 release scheduling (was Re: logical column ordering)

From
Noah Misch
Date:
On Thu, Dec 11, 2014 at 10:24:20AM -0800, Jeff Janes wrote:
> On Thu, Dec 11, 2014 at 8:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > 2. The amount of pre-release testing we get from people outside the
> > hard-core development crowd seems to be continuing to decrease.
> > We were fortunate that somebody found the JSONB issue before it was
> > too late to do anything about it.

> We are not particularly inviting of feedback for whatever testing has been
> done.
> 
> The definitive guide seems to be
> https://wiki.postgresql.org/wiki/HowToBetaTest, and says:
> 
> You can report tests by email. You can subscribe to any PostgreSQL mailing
> list from the subscription form <http://www.postgresql.org/community/lists/>
> .
> 
>    - pgsql-bugs: this is the preferred mailing list if you think you have
>    found a bug in the beta. You can also use the Bug Reporting Form
>    <http://www.postgresql.org/support/submitbug/>.
>    - pgsql-hackers: bugs, questions, and successful test reports are
>    welcome here if you are already subscribed to pgsql-hackers. Note that
>    pgsql-hackers is a high-traffic mailing list with a lot of development
>    discussion.
> 
> 
> =========
> 
> So if you find a bug, you can report it on the bug reporting form (which
> doesn't have a drop-down entry for 9.4RC1).

Let's get 9.5 alpha/beta/rc releases into that drop-down as we release them.

> If you have positive results rather than negative ones (or even complaints
> that are not actually bugs), you can subscribe to a mailing list which
> generates a lot of traffic which is probably over your head and not
> interesting to you.

Feel welcome to revise that part.  Don't messages from non-subscribed people
make it to the list after manual moderation?  Testers might want to create a
no-delivery subscription to avoid moderation delay, but the decision to
receive all -hackers mail is separate.

> Does the core team keep a mental list of items they want to see tested by
> the public, and they will spend their own time testing those things
> themselves if they don't hear back on some positive tests for them?

Not sure about the core team.  I myself would test essentially the same things
during beta regardless of what end users report having tested, because end
users will pick different test scenarios for the same features.

> If we find reports of public testing that yields good results (or at least
> no bugs) to be useful, we should be more clear on how to go about doing
> it.  But are positive reports useful?  If I report a bug, I can write down
> the steps to reproduce it, and then follow my own instructions to make sure
> it does actually reproduce it.  If I find no bugs, it is just "I did a
> bunch of random stuff and nothing bad happened, that I noticed".

Positive reports have potential to be useful.  In particular, mention the new
features you took action to try.  Areas like BRIN, pg_rewind, foreign tables,
event triggers, CREATE POLICY, INSERT ... ON CONFLICT, and GROUPING SETS are
either completely new or have new sub-features.  If nothing else, we can CC
reporters when considering changes to features they reported upon.  Other
analysis would become attractive given a larger corpus of positive reports.

Thanks,
nm