Thread: Easiest way to compare the results of two queries row by row and column by column
Easiest way to compare the results of two queries row by row and column by column
From
Jason Long
Date:
Can someone suggest the easiest way to compare the results from two queries to make sure they are identical? I am rewriting a large number of views and I want to make sure that nothing is changes in the results. Something like select compare_results('select * from v_old', 'select * from v_new'); I would want this to check that the row count and each row matched column by column. I am hoping someone has already written something for this...
Re: Easiest way to compare the results of two queries row by row and column by column
From
David Johnston
Date:
Jason Long-2 wrote > Can someone suggest the easiest way to compare the results from two > queries to make sure they are identical? First thing that comes to mind: WITH before_qry (col1, col2, col3) AS ( VALUES (1,1,1),(2,2,2),(3,3,3) ) , after_qry (col1, col2, col3) AS ( VALUES (1,1,1),(2,2,2),(3,3,3) ) , before_array AS (SELECT array_agg(before_qry) AS before_agg_array FROM before_qry) , after_array AS (SELECT array_agg(before_qry) AS after_agg_array FROM before_qry) SELECT *, before_agg_array = after_agg_array FROM before_array CROSS JOIN after_array Basically turn the resultsets into arrays (of composites) and then see if the arrays are the same. This has issues with respect to column names and comparable datatypes (i.e., if one column is bigint and the other is integer they still compare equally). One thought would to only allow a view name (and possibly, separately, the ORDER BY clause). Catalog lookups can be used to check for identical view output types. No idea of something like this exists and is readily available. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Easiest-way-to-compare-the-results-of-two-queries-row-by-row-and-column-by-column-tp5760209p5760215.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Re: Easiest way to compare the results of two queries row by row and column by column
From
Jeff Janes
Date:
On Thu, Jun 20, 2013 at 3:18 PM, Jason Long <mailing.lists@octgsoftware.com> wrote:
Can someone suggest the easiest way to compare the results from two
queries to make sure they are identical?
I am rewriting a large number of views and I want to make sure that
nothing is changes in the results.
Something like
select compare_results('select * from v_old', 'select * from v_new');
I'd run:
select * from v_old
except
select * from v_new ;
And then
select * from v_new
except
select * from v_old ;
Both should return no rows.
However, if the queries can contain duplicate rows this will not detect differences in the number of times a row is replicated, i.e. if one query has a row 2 times and the other has it 3 times. If you need to detect such cases, I'd probably \copy out each query to a file, then use system tools to sort and diff the files.
Cheers,
Jeff
Re: Re: Easiest way to compare the results of two queries row by row and column by column
From
Jason Long
Date:
Thank you. I will give it a try. I have never used WITH before.
Thank you for the tips.
On Thu, 2013-06-20 at 16:05 -0700, David Johnston wrote:
Thank you for the tips.
On Thu, 2013-06-20 at 16:05 -0700, David Johnston wrote:
Jason Long-2 wrote > Can someone suggest the easiest way to compare the results from two > queries to make sure they are identical? First thing that comes to mind: WITH before_qry (col1, col2, col3) AS ( VALUES (1,1,1),(2,2,2),(3,3,3) ) , after_qry (col1, col2, col3) AS ( VALUES (1,1,1),(2,2,2),(3,3,3) ) , before_array AS (SELECT array_agg(before_qry) AS before_agg_array FROM before_qry) , after_array AS (SELECT array_agg(before_qry) AS after_agg_array FROM before_qry) SELECT *, before_agg_array = after_agg_array FROM before_array CROSS JOIN after_array Basically turn the resultsets into arrays (of composites) and then see if the arrays are the same. This has issues with respect to column names and comparable datatypes (i.e., if one column is bigint and the other is integer they still compare equally). One thought would to only allow a view name (and possibly, separately, the ORDER BY clause). Catalog lookups can be used to check for identical view output types. No idea of something like this exists and is readily available. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Easiest-way-to-compare-the-results-of-two-queries-row-by-row-and-column-by-column-tp5760209p5760215.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Re: Easiest way to compare the results of two queries row by row and column by column
From
Vik Fearing
Date:
On 06/21/2013 01:07 AM, Jeff Janes wrote:
On Thu, Jun 20, 2013 at 3:18 PM, Jason Long <mailing.lists@octgsoftware.com> wrote:Can someone suggest the easiest way to compare the results from two
queries to make sure they are identical?
I am rewriting a large number of views and I want to make sure that
nothing is changes in the results.
Something like
select compare_results('select * from v_old', 'select * from v_new');I'd run:select * from v_oldexceptselect * from v_new ;And thenselect * from v_newexceptselect * from v_old ;Both should return no rows.
This is my solution as well.
However, if the queries can contain duplicate rows this will not detect differences in the number of times a row is replicated, i.e. if one query has a row 2 times and the other has it 3 times. If you need to detect such cases, I'd probably \copy out each query to a file, then use system tools to sort and diff the files.
No need, just use EXCEPT ALL.
Basically, the following query should return nothing:
(TABLE v_old EXCEPT ALL TABLE v_new)
UNION ALL
(TABLE v_new EXCEPT ALL TABLE v_old);
--
Vik
Re: Easiest way to compare the results of two queries row by row and column by column
From
gilad905
Date:
Vik, note that your new suggestion for a query might be more readable/efficient than Jeff's, but IT STILL DOESN'T SOLVE THE PROBLEM OF DETECTING DUPLICATE ROWS. -- View this message in context: http://postgresql.nabble.com/Easiest-way-to-compare-the-results-of-two-queries-row-by-row-and-column-by-column-tp5760209p5916761.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Re: Re: Easiest way to compare the results of two queries row by row and column by column
From
Branden Visser
Date:
On Wed, Aug 17, 2016 at 4:58 AM, gilad905 <gilad905@gmail.com> wrote: > Vik, note that your new suggestion for a query might be more > readable/efficient than Jeff's, but IT STILL DOESN'T SOLVE THE PROBLEM OF > DETECTING DUPLICATE ROWS. > I've done this in the past and in my use-case it was easy enough to export the rows to CSV with a sort, and then use a diff tool to ensure they're identical. My data had 100's of thousands of rows, but obviously with massive tables this may not be appropriate. If you want more details on this I can find the exact psql client query form I used to get a CSV file of the results. Hope that helps. Branden > > > -- > View this message in context: http://postgresql.nabble.com/Easiest-way-to-compare-the-results-of-two-queries-row-by-row-and-column-by-column-tp5760209p5916761.html > Sent from the PostgreSQL - general mailing list archive at Nabble.com. > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general
Re: Re: Easiest way to compare the results of two queries row by row and column by column
From
Vik Fearing
Date:
On 17/08/16 10:58, gilad905 wrote: > Vik, note that your new suggestion for a query might be more > readable/efficient than Jeff's, but IT STILL DOESN'T SOLVE THE PROBLEM OF > DETECTING DUPLICATE ROWS. Yes, it does. And please don't shout at me. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: Easiest way to compare the results of two queries row by row and column by column
From
gilad905
Date:
Didn't mean to give the impression I'm 'shouting' :) just wanted to make sure other users will notice my reply. But you're right! it does. It even disregards the order of the rows. For some reason in a previous test I looked to me like it didn't. -- View this message in context: http://postgresql.nabble.com/Easiest-way-to-compare-the-results-of-two-queries-row-by-row-and-column-by-column-tp5760209p5917059.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Re: Easiest way to compare the results of two queries row by rowand column by column
From
anupping
Date:
select * from (select id, my_age, ur_age as urage from datatable where agegp != 'S') m where m.myage <> m.urage -- Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html