Thread: Find duplicates in a column then print Information to a file

Find duplicates in a column then print Information to a file

From

Sherman Willden

Date:

31 December 2017, 21:19:15

Development Platform: Ubuntu 17.10 mainly command line work

Tools: perl 5.26 and postgresql 9.6

Goal: Display duplicate aria titles on screen and to a local file

Database name: arias

Table name: aria_precis

csv delimiter: the # symbol

arias=# \d aria_precis
   Table "public.aria_precis"
   Column    | Type | Modifiers
-------------+------+-----------
id          | text | not null
aria        | text |
artist      | text |
a_artist    | text |
album_title | text |
Indexes:
    "aria_precis_pkey" PRIMARY KEY, btree (id)

Problems:
1. Can't connect aria title to id
2. Can't write discovered information to file

I know about this link but I probably didn't understand what I read there.
https://www.postgresql.org/docs/current/static/sql-select.html#SQL-WITH

What I know displays the 46 duplicate aria titles of 413 entries but not the id or artist:
SELECT aria FROM aria_precis WHERE aria IN (SELECT aria FROM aria_precis GROUP BY aria HAVING COUNT(aria)>1);

When I perform the following I get (0 rows):
SELECT aria FROM aria_precis WHERE aria IN (SELECT aria FROM aria_precis GROUP BY id, aria HAVING COUNT(aria)>1);
aria
------
(0 rows)

After I get the above information how do I write that information to a file? The only thing I know writes the entire database to a file:
\COPY aria_precis TO '/home/sherman/aria_precis_2.csv' WITH DELIMITER '#,' CSV HEADER;

Thank you;

Sherman

Re: Find duplicates in a column then print Information to a file

From

Martin Moore

Date:

31 December 2017, 21:23:59

From: Sherman Willden <operasopranos@gmail.com>
Date: Sunday, 31 December 2017 at 18:19
To: <pgsql-general@postgresql.org>
Subject: Find duplicates in a column then print Information to a file

========================================

On the cmdline just do:

psql arias <pwd> -c ‘SELECT aria FROM aria_precis WHERE aria IN (SELECT aria FROM aria_precis GROUP BY aria HAVING COUNT(aria)>1)‘ > outfile.txt

Martin.

Re: Find duplicates in a column then print Information to a file

From

Francisco Olarte

Date:

31 December 2017, 21:53:43

Mmmmm, I notice a faint homework smell here ;-> , but the question is
nicely asked so:

On Sun, Dec 31, 2017 at 7:19 PM, Sherman Willden
<operasopranos@gmail.com> wrote:
...
> SELECT aria FROM aria_precis WHERE aria IN (SELECT aria FROM aria_precis
> GROUP BY aria  HAVING COUNT(aria)>1);

The outer select is fully redundant. Just use the inner one. In fact,
not redundnat, harmful, as it will spit many copies of aria for the
duplicated ones. If you add id (and/or artist) to the outer query then
it is useful again.

> When I perform the following I get (0 rows):
> SELECT aria FROM aria_precis WHERE aria IN (SELECT aria FROM aria_precis
> GROUP BY id, aria  HAVING COUNT(aria)>1);

As expected. See the inner query, you are grouping by ID which is the
primary key, by PK definition all counts are going to be one, so no
results, nothing goes to the outer query.

Use the first query, but adding id and artist to the OUTER level.

> After I get the above information how do I write that information to a file?
> The only thing I know writes the entire database to a file:
>  \COPY aria_precis TO '/home/sherman/aria_precis_2.csv' WITH DELIMITER '#,'
> CSV HEADER;

If copy format is ok to you, knowing \copy look at the docs, looking
at https://www.postgresql.org/docs/9.6/static/app-psql.html you will
read:

>>>
\copy { table [ ( column_list ) ] | ( query ) } { from | to } {
'filename' | program 'command' | stdin | stdout | pstdin | pstdout } [
[ with ] ( option [, ...] ) ]

Performs a frontend (client) copy. This is an operation that runs an
SQL COPY command, but instead of the server reading or writing the
specified file, psql reads or writes the file and routes the data
between the server and the local file system. This means that file
accessibility and privileges are those of the local user, not the
server, and no SQL superuser privileges are required.
<<<

Notice the (query) option? Your copy is using the table+optional
column list format, (aria_precis), just change it to the "( query )"
format ( NOTICE THE MANDATORY PARENS, this is how copy knows what to
do. "t" => table, "t(c1)"=>table+columns, "(xx)" => query ( nothing
before the opening parens ). )

Happy new year.

Francisco Olarte.

Re: Find duplicates in a column then print Information to a file

From

Sherman Willden

Date:

31 December 2017, 22:04:21

Thank you for the replies. I will start working on them now. Not a student but since I now have the time I may look into it. I am 71 retired working at Home Depot. I have a collection of CDs by various artists and I have the time to create and maintain my own database concerning these subjects. I retired from USAF, SUN Microsystems, and HP where I worked with the UNIX systems. As I said my main question right now is which duplicate songs do I have and who performs them. I really appreciate the support I get here.

Sherman

On Sun, Dec 31, 2017 at 11:53 AM, Francisco Olarte <folarte@peoplecall.com> wrote:

Mmmmm, I notice a faint homework smell here ;-> , but the question is
nicely asked so:

On Sun, Dec 31, 2017 at 7:19 PM, Sherman Willden
<operasopranos@gmail.com> wrote:
...
> SELECT aria FROM aria_precis WHERE aria IN (SELECT aria FROM aria_precis
> GROUP BY aria HAVING COUNT(aria)>1);

The outer select is fully redundant. Just use the inner one. In fact,
not redundnat, harmful, as it will spit many copies of aria for the
duplicated ones. If you add id (and/or artist) to the outer query then
it is useful again.

> When I perform the following I get (0 rows):
> SELECT aria FROM aria_precis WHERE aria IN (SELECT aria FROM aria_precis
> GROUP BY id, aria HAVING COUNT(aria)>1);

As expected. See the inner query, you are grouping by ID which is the
primary key, by PK definition all counts are going to be one, so no
results, nothing goes to the outer query.

Use the first query, but adding id and artist to the OUTER level.

> After I get the above information how do I write that information to a file?
> The only thing I know writes the entire database to a file:
> \COPY aria_precis TO '/home/sherman/aria_precis_2.csv' WITH DELIMITER '#,'
> CSV HEADER;

If copy format is ok to you, knowing \copy look at the docs, looking
at https://www.postgresql.org/docs/9.6/static/app-psql.html you will
read:

>>>
\copy { table [ ( column_list ) ] | ( query ) } { from | to } {
'filename' | program 'command' | stdin | stdout | pstdin | pstdout } [
[ with ] ( option [, ...] ) ]

Performs a frontend (client) copy. This is an operation that runs an
SQL COPY command, but instead of the server reading or writing the
specified file, psql reads or writes the file and routes the data
between the server and the local file system. This means that file
accessibility and privileges are those of the local user, not the
server, and no SQL superuser privileges are required.
<<<

Notice the (query) option? Your copy is using the table+optional
column list format, (aria_precis), just change it to the "( query )"
format ( NOTICE THE MANDATORY PARENS, this is how copy knows what to
do. "t" => table, "t(c1)"=>table+columns, "(xx)" => query ( nothing
before the opening parens ). )

Happy new year.

Francisco Olarte.

Re: Find duplicates in a column then print Information to a file

From

Rob Sargent

Date:

31 December 2017, 22:23:21

Keep in mind there is a quick write-to-file in psql with ‘\o <filename>’. And don’t forget to turn it off with ‘\o’

On Dec 31, 2017, at 12:04 PM, Sherman Willden <operasopranos@gmail.com> wrote:

Thank you for the replies. I will start working on them now. Not a student but since I now have the time I may look into it. I am 71 retired working at Home Depot. I have a collection of CDs by various artists and I have the time to create and maintain my own database concerning these subjects. I retired from USAF, SUN Microsystems, and HP where I worked with the UNIX systems. As I said my main question right now is which duplicate songs do I have and who performs them. I really appreciate the support I get here.

Sherman

On Sun, Dec 31, 2017 at 11:53 AM, Francisco Olarte <folarte@peoplecall.com> wrote:
Mmmmm, I notice a faint homework smell here ;-> , but the question is
nicely asked so:

On Sun, Dec 31, 2017 at 7:19 PM, Sherman Willden
<operasopranos@gmail.com> wrote:
...
> SELECT aria FROM aria_precis WHERE aria IN (SELECT aria FROM aria_precis
> GROUP BY aria HAVING COUNT(aria)>1);

The outer select is fully redundant. Just use the inner one. In fact,
not redundnat, harmful, as it will spit many copies of aria for the
duplicated ones. If you add id (and/or artist) to the outer query then
it is useful again.

> When I perform the following I get (0 rows):
> SELECT aria FROM aria_precis WHERE aria IN (SELECT aria FROM aria_precis
> GROUP BY id, aria HAVING COUNT(aria)>1);

As expected. See the inner query, you are grouping by ID which is the
primary key, by PK definition all counts are going to be one, so no
results, nothing goes to the outer query.

Use the first query, but adding id and artist to the OUTER level.

> After I get the above information how do I write that information to a file?
> The only thing I know writes the entire database to a file:
> \COPY aria_precis TO '/home/sherman/aria_precis_2.csv' WITH DELIMITER '#,'
> CSV HEADER;

If copy format is ok to you, knowing \copy look at the docs, looking
at https://www.postgresql.org/docs/9.6/static/app-psql.html you will
read:

>>>
\copy { table [ ( column_list ) ] | ( query ) } { from | to } {
'filename' | program 'command' | stdin | stdout | pstdin | pstdout } [
[ with ] ( option [, ...] ) ]

Performs a frontend (client) copy. This is an operation that runs an
SQL COPY command, but instead of the server reading or writing the
specified file, psql reads or writes the file and routes the data
between the server and the local file system. This means that file
accessibility and privileges are those of the local user, not the
server, and no SQL superuser privileges are required.
<<<

Notice the (query) option? Your copy is using the table+optional
column list format, (aria_precis), just change it to the "( query )"
format ( NOTICE THE MANDATORY PARENS, this is how copy knows what to
do. "t" => table, "t(c1)"=>table+columns, "(xx)" => query ( nothing
before the opening parens ). )

Happy new year.

Francisco Olarte.

Re: Find duplicates in a column then print Information to a file

From

Tom Lane

Date:

31 December 2017, 23:08:49

Rob Sargent <robjsargent@gmail.com> writes:
> Keep in mind there is a quick write-to-file in psql with ‘\o <filename>’. And don’t forget to turn it off with ‘\o’

See also "\g filename", for one-shot output.

            regards, tom lane