Thread: Count of non-null values per table column

Count of non-null values per table column

From
David Nelson
Date:
Hello list,<br><br>Apologies if this has been asked before. My search only turned up ways to list the total non-null
valuesfor all columns as a single number. I want the count for each column by column.<br><br>I have inherited a
databaseconsisting of two related huge monolithic tables that lack referential integrity between them, or even basic
dataconstraints. One of the problems these tables have is every single non-PK column is NULLable. I am trying to
understandthe information that is actually stored and used so that I can implement a (hopefully) better design. Towards
thatend I would like to know the count of non-null values in each column per column. In other words I would like to get
thefollowing output from a table (the numbers are totally made up):<br><br>column_name |
num_values<br>------------+-----------<br>col1       
|      5787<br>------------+-----------<br>col2       
|        
17<br>------------+-----------<br>col3       
|       
567<br>------------+-----------<br>col4       |      
5787<br>------------+-----------<br>col5       
|       
143<br>------------+-----------<br>col6       
|         1<br>------------+-----------<br>...<br><br>Is this possible
throughone or more of the system views, or will I need to write a function to do this? Obviously I can just issue
multipleSELECT COUNT(column)... statements, but I'd rather not.<br><br>Thanks,<br>David                            

Re: Count of non-null values per table column

From
David Nelson
Date:
Well it is certainly nice to see that my choice to send my question using plain text was honored by this email service.
Apologiesfor that mess. The output I am looking for is a series of rows with two columns, one the name of the table
column,and the other the count of non-null values in a table's column of that same name, for all column names in the
table.<br><br>Thanks<br><br>----------------------------------------<br>>From: dlnelson77808@outlook.com<br>> To:
pgsql-general@postgresql.org<br>>Subject: [GENERAL] Count of non-null values per table column<br>> Date: Fri, 14
Aug2015 14:32:36 +0000<br>><br>> Hello list,<br><br>Apologies if this has been asked before. My
searchonly turned up ways to list the total non-null values for all columns as a single number. I want the count for
eachcolumn by column.<br><br>I have inherited a database consisting of two related huge monolithic tables
thatlack referential integrity between them, or even basic data constraints. One of the problems these tables have is
everysingle non-PK column is NULLable. I am trying to understand the information that is actually stored and used so
thatI can implement a (hopefully) better design. Towards that end I would like to know the count of non-null values in
eachcolumn per column. In other words I would like to get the following output from a table (the numbers are totally
madeup):<br><br>column_name |
num_values<br>------------+-----------<br>col1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
5787<br>------------+-----------<br>col2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
17<br>------------+-----------<br>col3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
567<br>------------+-----------<br>col4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
5787<br>------------+-----------<br>col5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
143<br>------------+-----------<br>col6&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
1<br>------------+-----------<br>...<br><br>Isthis possible through one or more of the system
views,or will I need to write a function to do this? Obviously I can just issue multiple SELECT COUNT(column)...
statements,but I'd rather not.<br><br>Thanks,<br>David<br>><br>> --<br>> Sent via
pgsql-generalmailing list (pgsql-general@postgresql.org)<br>> To make changes to your subscription:<br>>
http://www.postgresql.org/mailpref/pgsql-general<br>                           

Re: Count of non-null values per table column

From
David Rowley
Date:
On 15 August 2015 at 02:32, David Nelson <dlnelson77808@outlook.com> wrote:
Hello list,<br><br>Apologies if this has been asked before. My search only turned up ways to list the total non-null values for all columns as a single number. I want the count for each column by column.<br><br>I have inherited a database consisting of two related huge monolithic tables that lack referential integrity between them, or even basic data constraints. One of the problems these tables have is every single non-PK column is NULLable. I am trying to understand the information that is actually stored and used so that I can implement a (hopefully) better design. Towards that end I would like to know the count of non-null values in each column per column. In other words I would like to get the following output from a table (the numbers are totally made up):<br><br>column_name | num_values<br>------------+-----------<br>col1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 5787<br>------------+-----------<br>col2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 17<br>------------+-----------<br>col3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 567<br>------------+-----------<br>col4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 5787<br>------------+-----------<br>col5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 143<br>------------+-----------<br>col6&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1<br>------------+-----------<br>...<br><br>Is this possible through one or more of the system views, or will I need to write a function to do this? Obviously I can just issue multiple SELECT COUNT(column)... statements, but I'd rather not.<br><br>Thanks,<br>David


I didn't quite catch a Postgres version number in that mess :)

I assume the tables are quite large if you don't want to just issue a: SELECT COUNT(a) FILTER(WHERE a IS NULL),COUNT(b) FILTER(WHERE b IS NULL) ... (assuming you're on a version new enough to support agg FILTER) 

On the other hand if you were happy with just an approximation then you could look at pg_stats;

create table abc(a int, b int,c int);
insert into abc values(1, 1, NULL),(1, NULL, NULL),(NULL, NULL, NULL);
analyze abc;

select attname, null_frac from pg_stats where tablename = 'abc';

 attname | null_frac
---------+-----------
 a       |  0.333333
 b       |  0.666667
 c       |         1

Keep in mind though that this is an *approximation* and possibly could be inaccurate. If you want exact do the COUNT(col).

If you're not on a version new enough for COUNT(col) FILTER WHERE ..., you could just SUM(CASE WHEN col IS NULL THEN 1 ELSE 0 END)

Regards

David Rowley

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
 

Re: Count of non-null values per table column

From
John McKown
Date:
David, 

It still came through as junk. But I reconstructed it below

=== original message ===
Apologies if this has been asked before. My search only turned up ways to list the total non-null values for all columns as a single number. I want the count for each column by column.

I have inherited a database consisting of two related huge monolithic tables that lack referential integrity between them, or even basic data constraints. One of the problems these tables have is every single non-PK column is NULLable. I am trying to understand the information that is actually stored and used so that I can implement a (hopefully) better design. Towards that end I would like to know the count of non-null values in each column per column. In other words I would like to get the following output from a table (the numbers are totally made up):

column_name | num_values
------------+-----------
col1        |       5787
------------+-----------
col2        |         17
------------+-----------
col3        |        567
------------+-----------
col4        |       5787
------------+-----------
col5        |        143
------------+-----------
col6        |          1
------------+-----------

====

​I assume what "num_values" contains is the _distinctly different_ number of values in "column_name" for each and every column name in a table.​ E.g. if "col1" contains value 'x' twice and 'y' ten times,then "num_values" would be 2, not 12. Or do you really want the 12? I'm unsure.


--

Schrodinger's backup: The condition of any backup is unknown until a restore is attempted.

Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be.

He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! <><
John McKown

Re: Count of non-null values per table column

From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 15 August 2015 at 02:32, David Nelson <dlnelson77808@outlook.com> wrote:
>> Hello list,<br><br>Apologies if this has been asked before. My search only
>> turned up ways to list the total non-null values for all columns as a
>> single number. I want the count for each column by column.

> I assume the tables are quite large if you don't want to just issue a:
> SELECT COUNT(a) FILTER(WHERE a IS NULL),COUNT(b) FILTER(WHERE b IS NULL)
> ... (assuming you're on a version new enough to support agg FILTER)

AFAIK this should work in any version, or indeed any SQL-compliant DBMS:

      select count(col1), count(col2), ... from table;

COUNT with an argument counts the non-null values of that argument.

> On the other hand if you were happy with just an approximation then you
> could look at pg_stats;

Yeah; you might want to ANALYZE the table first to be sure the stats are
up to date.

            regards, tom lane


Re: Count of non-null values per table column

From
David Nelson
Date:
On Fri, Aug 14, 2015 at 9:59 AM, John McKown <john.archie.mckown@gmail.com> wrote:
>
> David,
>
> It still came through as junk. But I reconstructed it below
>
> === original message ===
> Apologies if this has been asked before. My search only turned up ways to list the total non-null values for all columns as a single number. I want the count for each column by column.
>
> I have inherited a database consisting of two related huge monolithic tables that lack referential integrity between them, or even basic data constraints. One of the problems these tables have is every single non-PK column is NULLable. I am trying to understand the information that is actually stored and used so that I can implement a (hopefully) better design. Towards that end I would like to know the count of non-null values in each column per column. In other words I would like to get the following output from a table (the numbers are totally made up):
>
> column_name | num_values
> ------------+-----------
> col1        |       5787
> ------------+-----------
> col2        |         17
> ------------+-----------
> col3        |        567
> ------------+-----------
> col4        |       5787
> ------------+-----------
> col5        |        143
> ------------+-----------
> col6        |          1
> ------------+-----------
>
> ====
>
> I assume what "num_values" contains is the _distinctly different_ number of values in "column_name" for each and every column name in a table. E.g. if "col1" contains value 'x' twice and 'y' ten times,then "num_values" would be 2, not 12. Or do you really want the 12? I'm unsure.
>

Thanks John for fixing that which microsoft screwed up, and I've switched to my gmail account for this.

For my purposes 12 would be fine. I just want to know of the total number of
rows in the table, how many have a value in each column. I guess I'm actually trying to get the complement of the number of NULLs per column in the end.

BTW, aplologies for omitting basic info:
SELECT version();
                                                    version
                             
----------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.1.18 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11), 64-bit
 
>
>
> --
>
> Schrodinger's backup: The condition of any backup is unknown until a restore is attempted.
>
> Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be.
>
> He's about as useful as a wax frying pan.
>
> 10 to the 12th power microphones = 1 Megaphone
>
> Maranatha! <><
> John McKown

Re: Count of non-null values per table column

From
David Nelson
Date:
On Fri, Aug 14, 2015 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > On 15 August 2015 at 02:32, David Nelson <dlnelson77808@outlook.com> wrote:
> >> Hello list,<br><br>Apologies if this has been asked before. My search only
> >> turned up ways to list the total non-null values for all columns as a
> >> single number. I want the count for each column by column.
>
> > I assume the tables are quite large if you don't want to just issue a:
> > SELECT COUNT(a) FILTER(WHERE a IS NULL),COUNT(b) FILTER(WHERE b IS NULL)
> > ... (assuming you're on a version new enough to support agg FILTER)
>
> AFAIK this should work in any version, or indeed any SQL-compliant DBMS:
>
>       select count(col1), count(col2), ... from table;

Thanks Tom (I've switched to my gmail account for this conversation). Tthat is the way I would do it for a table with a small number of columns, but these have several dozen so this would get tedious. Although I just realized I could output the list of column names from the pg_stat view to a file and whip up some vi find and replace to create the entire statement pretty quickly. I was just wondering if that was the only way or not.

BTW, aplologies for omitting basic info:
SELECT version();
                                                    version
                             
----------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.1.18 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11), 64-bit

>
> COUNT with an argument counts the non-null values of that argument.
>
> > On the other hand if you were happy with just an approximation then you
> > could look at pg_stats;
>
> Yeah; you might want to ANALYZE the table first to be sure the stats are
> up to date.
>
>                         regards, tom lane
>

Re: Count of non-null values per table column

From
Ken Tanzer
Date:
On Fri, Aug 14, 2015 at 6:35 PM, David Nelson <dnelson77808@gmail.com> wrote:
On Fri, Aug 14, 2015 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:
 
Tthat is the way I would do it for a table with a small number of columns, but these have several dozen so this would get tedious. Although I just realized I could output the list of column names from the pg_stat view to a file and whip up some vi find and replace to create the entire statement pretty quickly. I was just wondering if that was the only way or not.

You could use this to generate the SQL:

\set my_table my_real_table_name

 SELECT 'SELECT COUNT(*) AS total_rows, '||array_to_string(array(SELECT 'COUNT('||column_name::text ||') AS ' || column_name::text FROM information_schema.columns WHERE table_name=:'my_table'),E',\n') || ' FROM ' || :'my_table' || ';';

Cheers,
Ken
--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.

Re: Count of non-null values per table column

From
David Nelson
Date:
On Fri, Aug 14, 2015 at 9:17 PM, Ken Tanzer <ken.tanzer@gmail.com> wrote:
On Fri, Aug 14, 2015 at 6:35 PM, David Nelson <dnelson77808@gmail.com> wrote:
On Fri, Aug 14, 2015 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:
 
Tthat is the way I would do it for a table with a small number of columns, but these have several dozen so this would get tedious. Although I just realized I could output the list of column names from the pg_stat view to a file and whip up some vi find and replace to create the entire statement pretty quickly. I was just wondering if that was the only way or not.

You could use this to generate the SQL:

\set my_table my_real_table_name

 SELECT 'SELECT COUNT(*) AS total_rows, '||array_to_string(array(SELECT 'COUNT('||column_name::text ||') AS ' || column_name::text FROM information_schema.columns WHERE table_name=:'my_table'),E',\n') || ' FROM ' || :'my_table' || ';';

Oh yeah, SQL to generate SQL. Done it many times to create ETL scripts. Don't know why I didn't think of that here. Definitely a viable option.

Thanks Ken.
 

Cheers,
Ken
--
learn more about AGENCY or
follow the discussion.