Thread: Easiest way to compare the results of two queries row by row and column by column

Can someone suggest the easiest way to compare the results from two
queries to make sure they are identical?

I am rewriting a large number of views and I want to make sure that
nothing is changes in the results.

Something like

select compare_results('select * from v_old', 'select * from v_new');

I would want this to check that the row count and each row matched
column by column.

I am hoping someone has already written something for this...



Jason Long-2 wrote
> Can someone suggest the easiest way to compare the results from two
> queries to make sure they are identical?

First thing that comes to mind:

     WITH
       before_qry (col1, col2, col3) AS ( VALUES (1,1,1),(2,2,2),(3,3,3) )
     , after_qry  (col1, col2, col3) AS ( VALUES (1,1,1),(2,2,2),(3,3,3) )
     , before_array AS (SELECT array_agg(before_qry) AS before_agg_array
FROM before_qry)
     , after_array  AS (SELECT array_agg(before_qry) AS after_agg_array FROM
before_qry)
     SELECT *, before_agg_array = after_agg_array
     FROM before_array CROSS JOIN after_array

Basically turn the resultsets into arrays (of composites) and then see if
the arrays are the same.  This has issues with respect to column names and
comparable datatypes (i.e., if one column is bigint and the other is integer
they still compare equally).

One thought would to only allow a view name (and possibly, separately, the
ORDER BY clause).  Catalog lookups can be used to check for identical view
output types.

No idea of something like this exists and is readily available.

David J.






--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Easiest-way-to-compare-the-results-of-two-queries-row-by-row-and-column-by-column-tp5760209p5760215.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


On Thu, Jun 20, 2013 at 3:18 PM, Jason Long <mailing.lists@octgsoftware.com> wrote:
Can someone suggest the easiest way to compare the results from two
queries to make sure they are identical?

I am rewriting a large number of views and I want to make sure that
nothing is changes in the results.

Something like

select compare_results('select * from v_old', 'select * from v_new');

I'd run:

select * from v_old 
except
select * from v_new ;

And then 

select * from v_new
except
select * from v_old ;

Both should return no rows.

However, if the queries can contain duplicate rows this will not detect differences in the number of times a row is replicated, i.e. if one query has a row 2 times and the other has it 3 times.  If you need to detect such cases, I'd probably \copy out each query to a file, then use system tools to sort and diff the files.

Cheers,

Jeff
Thank you.  I will give it a try.  I have never used WITH before.

Thank you for the tips.

On Thu, 2013-06-20 at 16:05 -0700, David Johnston wrote:
Jason Long-2 wrote
> Can someone suggest the easiest way to compare the results from two
> queries to make sure they are identical?

First thing that comes to mind:
    WITH       before_qry (col1, col2, col3) AS ( VALUES (1,1,1),(2,2,2),(3,3,3) )    , after_qry  (col1, col2, col3) AS ( VALUES (1,1,1),(2,2,2),(3,3,3) )    , before_array AS (SELECT array_agg(before_qry) AS before_agg_array
FROM before_qry)    , after_array  AS (SELECT array_agg(before_qry) AS after_agg_array FROM
before_qry)    SELECT *, before_agg_array = after_agg_array    FROM before_array CROSS JOIN after_array

Basically turn the resultsets into arrays (of composites) and then see if
the arrays are the same.  This has issues with respect to column names and
comparable datatypes (i.e., if one column is bigint and the other is integer
they still compare equally).

One thought would to only allow a view name (and possibly, separately, the
ORDER BY clause).  Catalog lookups can be used to check for identical view
output types.

No idea of something like this exists and is readily available.

David J.






--
View this message in context: http://postgresql.1045698.n5.nabble.com/Easiest-way-to-compare-the-results-of-two-queries-row-by-row-and-column-by-column-tp5760209p5760215.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.



On 06/21/2013 01:07 AM, Jeff Janes wrote:
On Thu, Jun 20, 2013 at 3:18 PM, Jason Long <mailing.lists@octgsoftware.com> wrote:
Can someone suggest the easiest way to compare the results from two
queries to make sure they are identical?

I am rewriting a large number of views and I want to make sure that
nothing is changes in the results.

Something like

select compare_results('select * from v_old', 'select * from v_new');

I'd run:

select * from v_old 
except
select * from v_new ;

And then 

select * from v_new
except
select * from v_old ;

Both should return no rows.

This is my solution as well.

However, if the queries can contain duplicate rows this will not detect differences in the number of times a row is replicated, i.e. if one query has a row 2 times and the other has it 3 times.  If you need to detect such cases, I'd probably \copy out each query to a file, then use system tools to sort and diff the files.

No need, just use EXCEPT ALL.

Basically, the following query should return nothing:

(TABLE v_old EXCEPT ALL TABLE v_new)
UNION ALL
(TABLE v_new EXCEPT ALL TABLE v_old);

--
Vik

Vik, note that your new suggestion for a query might be more
readable/efficient than Jeff's, but IT STILL DOESN'T SOLVE THE PROBLEM OF
DETECTING DUPLICATE ROWS.



--
View this message in context:
http://postgresql.nabble.com/Easiest-way-to-compare-the-results-of-two-queries-row-by-row-and-column-by-column-tp5760209p5916761.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


On Wed, Aug 17, 2016 at 4:58 AM, gilad905 <gilad905@gmail.com> wrote:
> Vik, note that your new suggestion for a query might be more
> readable/efficient than Jeff's, but IT STILL DOESN'T SOLVE THE PROBLEM OF
> DETECTING DUPLICATE ROWS.
>

I've done this in the past and in my use-case it was easy enough to
export the rows to CSV with a sort, and then use a diff tool to ensure
they're identical. My data had 100's of thousands of rows, but
obviously with massive tables this may not be appropriate.

If you want more details on this I can find the exact psql client
query form I used to get a CSV file of the results.

Hope that helps.

Branden
>
>
> --
> View this message in context:
http://postgresql.nabble.com/Easiest-way-to-compare-the-results-of-two-queries-row-by-row-and-column-by-column-tp5760209p5916761.html
> Sent from the PostgreSQL - general mailing list archive at Nabble.com.
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


On 17/08/16 10:58, gilad905 wrote:
> Vik, note that your new suggestion for a query might be more
> readable/efficient than Jeff's, but IT STILL DOESN'T SOLVE THE PROBLEM OF
> DETECTING DUPLICATE ROWS.

Yes, it does.

And please don't shout at me.
--
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Didn't mean to give the impression I'm 'shouting' :) just wanted to make sure
other users will notice my reply.
But you're right! it does. It even disregards the order of the rows. For
some reason in a previous test I looked to me like it didn't.



--
View this message in context:
http://postgresql.nabble.com/Easiest-way-to-compare-the-results-of-two-queries-row-by-row-and-column-by-column-tp5760209p5917059.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


select * from (select id, my_age, ur_age  as urage from datatable where 
agegp != 'S') m where m.myage <> m.urage



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html