Thread: pgsql: Add reusable routine for making arrays unique.

pgsql: Add reusable routine for making arrays unique.

From
Thomas Munro
Date:
Add reusable routine for making arrays unique.

Introduce qunique() and qunique_arg(), which can be used after qsort()
and qsort_arg() respectively to remove duplicate values.  Use it where
appropriate.

Author: Thomas Munro
Reviewed-by: Tom Lane (in an earlier version)
Discussion: https://postgr.es/m/CAEepm%3D2vmFTNpAmwbGGD2WaryM6T3hSDVKQPfUwjdD_5XY6vAA%40mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/7815e7efdb4ce9575b5d8460beb0dd2569d7ca3a

Modified Files
--------------
contrib/hstore/hstore_io.c           |  5 +++
contrib/intarray/_int_tool.c         | 19 +++--------
contrib/pg_trgm/trgm_op.c            | 25 ++------------
src/backend/access/nbtree/nbtutils.c | 19 ++---------
src/backend/executor/nodeTidscan.c   | 13 ++------
src/backend/utils/adt/acl.c          | 15 +++------
src/backend/utils/adt/tsgistidx.c    | 29 ++--------------
src/backend/utils/adt/tsquery_op.c   | 29 +++-------------
src/backend/utils/adt/tsvector.c     |  5 +--
src/backend/utils/adt/tsvector_op.c  | 59 +++++---------------------------
src/backend/utils/adt/txid.c         | 19 ++---------
src/backend/utils/cache/syscache.c   | 21 ++++--------
src/include/lib/qunique.h            | 65 ++++++++++++++++++++++++++++++++++++
13 files changed, 115 insertions(+), 208 deletions(-)
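
For readers who have not seen the pattern the commit message describes (sort with qsort(), then drop adjacent duplicates in place), here is a minimal standalone sketch. It is not the committed qunique() code: the names dedup_sorted(), cmp_int() and sort_unique_ints() are hypothetical, chosen only for this illustration, and the argument order is simply assumed to mirror qsort().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, NOT the committed qunique(): compact an already
 * sorted array in place so that each run of equal elements keeps only its
 * first member.  Returns the new number of elements.
 */
static size_t
dedup_sorted(void *array, size_t elements, size_t width,
             int (*compare) (const void *, const void *))
{
    char       *bytes = (char *) array;
    size_t      i;
    size_t      j;

    assert(width > 0);

    if (elements <= 1)
        return elements;

    /* j indexes the last element kept so far; i scans the rest. */
    for (i = 1, j = 0; i < elements; i++)
    {
        if (compare(bytes + i * width, bytes + j * width) != 0)
        {
            j++;
            if (j != i)
                memcpy(bytes + j * width, bytes + i * width, width);
        }
    }

    return j + 1;
}

/* Three-way comparator for ints, written to avoid overflow on subtraction. */
static int
cmp_int(const void *a, const void *b)
{
    int         x = *(const int *) a;
    int         y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Illustrative caller: sort, then remove duplicates; returns the new length. */
static size_t
sort_unique_ints(int *values, size_t n)
{
    qsort(values, n, sizeof(int), cmp_int);
    return dedup_sorted(values, n, sizeof(int), cmp_int);
}
```

The caller is expected to use the returned count as the new array length; elements beyond it are left in an unspecified state. Passing the same comparator to both steps keeps the notion of "equal" consistent between sorting and deduplication.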


Re: pgsql: Add reusable routine for making arrays unique.

From
Peter Geoghegan
Date:
On Wed, Nov 6, 2019 at 8:03 PM Thomas Munro <tmunro@postgresql.org> wrote:
> Add reusable routine for making arrays unique.

I noticed that some of the copyright notices added by this commit look
like this:

+ * Portions Copyright (c) 2019, PostgreSQL Global Development Group

Shouldn't this simply read "Copyright (c) 2019 ..."? We're not sharing
the copyright with the Regents of the University of California, or anyone
else, so it is not just a portion.

It would be nice to have some general guidance on copyright notices in
new files. I usually just copy what I see in nearby files, but that
doesn't seem particularly principled.

-- 
Peter Geoghegan



Re: pgsql: Add reusable routine for making arrays unique.

From
Alvaro Herrera
Date:
On 2020-Jan-04, Peter Geoghegan wrote:

> On Wed, Nov 6, 2019 at 8:03 PM Thomas Munro <tmunro@postgresql.org> wrote:
> > Add reusable routine for making arrays unique.
> 
> I noticed that some of the copyright notices added by this commit look
> like this:
> 
> + * Portions Copyright (c) 2019, PostgreSQL Global Development Group
> 
> Shouldn't this simply read "Copyright (c) 2019 ..."? We're not sharing
> the copyright with the Regents of the University of California, or anyone
> else, so it is not just a portion.

Looking at the stats, it's clear that it took a lot of code from other
files, so it seems disingenuous to claim that it doesn't have even a
single line that is copyrighted by UCB regents.  I've seen this
argument used even for files that are completely new: they are likely
affected by some API design aspect of existing code copyrighted by the
UCB regents, so they too would be Portions (c) them.

> It would be nice to have some general guidance on copyright notices in
> new files. I usually just copy what I see in nearby files, but that
> doesn't seem particularly principled.

I think the easiest is to state that all files, even new files are
Portions (c) each of these entities, period.  Trying to distinguish code
that's not even a single line derived from UCB Regents seems really
labor-intensive.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: Add reusable routine for making arrays unique.

From
Peter Geoghegan
Date:
On Sun, Jan 5, 2020 at 5:39 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Looking at the stats, it's clear that it took a lot of code from other
> files, so it seems disingenuous to claim that it doesn't have even a
> single line that is copyrighted by UCB regents.

I took that claim at face value. Perhaps it was just an oversight.
But if it really wasn't, then it's not just "portions" of the
copyright that go to the PGDG.

> I think the easiest is to state that all files, even new files are
> Portions (c) each of these entities, period.  Trying to distinguish code
> that's not even a single line derived from UCB Regents seems really
> labor-intensive.

That seems like a reasonable policy to me, outside of third-party code
that gets vendored into the tree. I am also in favor of being
conservative about this.

-- 
Peter Geoghegan



Re: pgsql: Add reusable routine for making arrays unique.

From
Michael Paquier
Date:
On Sun, Jan 05, 2020 at 05:46:48PM -0800, Peter Geoghegan wrote:
> On Sun, Jan 5, 2020 at 5:39 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Looking at the stats, it's clear that it took a lot of code from other
>> files, so it seems disingenuous to claim that it doesn't have even a
>> single line that is copyrighted by UCB regents.
>
> I took that claim at face value. Perhaps it was just an oversight.
> But if it really wasn't, then it's not just "portions" of the
> copyright that go to the PGDG.
>
>> I think the easiest is to state that all files, even new files are
>> Portions (c) each of these entities, period.  Trying to distinguish code
>> that's not even a single line derived from UCB Regents seems really
>> labor-intensive.
>
> That seems like a reasonable policy to me, outside of third-party code
> that gets vendored into the tree. I am also in favor of being
> conservative about this.

That's also a no-brainer.  So +1 to that.  If we were to make that
more formal, could we add something in the docs in [1]?  An idea could
be a new section dedicated to it.

[1]: https://www.postgresql.org/docs/12/source.html
--
Michael


Re: pgsql: Add reusable routine for making arrays unique.

From
Alvaro Herrera
Date:
On 2020-Jan-06, Michael Paquier wrote:

> On Sun, Jan 05, 2020 at 05:46:48PM -0800, Peter Geoghegan wrote:
> > On Sun, Jan 5, 2020 at 5:39 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> >> I think the easiest is to state that all files, even new files are
> >> Portions (c) each of these entities, period.  Trying to distinguish code
> >> that's not even a single line derived from UCB Regents seems really
> >> labor-intensive.
> > 
> > That seems like a reasonable policy to me, outside of third-party code
> > that gets vendored into the tree. I am also in favor of being
> > conservative about this.
> 
> That's also a no-brainer.  So +1 to that.  If we were to make that
> more formal, could we add something in the docs in [1]?  An idea could
> be a new section dedicated to it.

That seems too extreme for me.  I'm happy to leave it as a gentlemen's
agreement.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services