Thread: [PROPOSAL] : Use of ORDER BY clause in insert.sql

[PROPOSAL] : Use of ORDER BY clause in insert.sql

From
Nishant Sharma
Date:
Hi,


We would like to share a proposal of a patch, where we have added order by clause in two select statements in src/test/regress/sql/insert.sql file and respective changes in src/test/regress/expected/insert.out output file.

This would help in generating output in consistent sequence, as sometimes we have observed change in sequence in output.

Please find the patch attached <Proposal_OrderBy_insert.sql.out.patch>


Regards,
Nishant Sharma
EDB: http://www.enterprisedb.com

Attachment

Re: [PROPOSAL] : Use of ORDER BY clause in insert.sql

From
Tom Lane
Date:
Nishant Sharma <nishant.sharma@enterprisedb.com> writes:
> We would like to share a proposal of a patch, where we have added order by
> clause in two select statements in src/test/regress/sql/insert.sql file and
> respective changes in src/test/regress/expected/insert.out output file.

> This would help in generating output in consistent sequence, as sometimes
> we have observed change in sequence in output.

Please be specific about the circumstances in which the output is
unstable for you.  With zero information to go on, it seems about as
likely that this change is masking a bug as that it's a good idea.

            regards, tom lane



Re: [PROPOSAL] : Use of ORDER BY clause in insert.sql

From
Amul Sul
Date:
On Thu, Oct 27, 2022 at 6:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Nishant Sharma <nishant.sharma@enterprisedb.com> writes:
> > We would like to share a proposal of a patch, where we have added order by
> > clause in two select statements in src/test/regress/sql/insert.sql file and
> > respective changes in src/test/regress/expected/insert.out output file.
>
> > This would help in generating output in consistent sequence, as sometimes
> > we have observed change in sequence in output.
>
> Please be specific about the circumstances in which the output is
> unstable for you.  With zero information to go on, it seems about as
> likely that this change is masking a bug as that it's a good idea.
>

At the first glance, I thought the patch is pretty much obvious, and
we usually add an ORDER BY clause to ensure stable output.   If we
are too sure that the output usually comes in the same order then the
ORDER BY clause that exists in other tests seems useless. I am a bit
confused & what could be a possible bug?

I have tested on my Centos and the Mac OS, insert.sql test is giving
stable output, I didn't find failure in the subsequent runs too but I
am not sure if that is enough evidence to skip the ORDER BY clause.

Regards,
Amul



Re: [PROPOSAL] : Use of ORDER BY clause in insert.sql

From
David Rowley
Date:
On Fri, 28 Oct 2022 at 16:51, Amul Sul <sulamul@gmail.com> wrote:
>
> On Thu, Oct 27, 2022 at 6:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Please be specific about the circumstances in which the output is
> > unstable for you.  With zero information to go on, it seems about as
> > likely that this change is masking a bug as that it's a good idea.
> >
>
> At the first glance, I thought the patch is pretty much obvious, and
> we usually add an ORDER BY clause to ensure stable output.

Unfortunately, you'll need to do better than that. We're not in the
business of accepting patches with zero justification for why they're
required. If you're not willing to do the analysis on why the order
changes sometimes, why should we accept your patch?

If you can't find the problem then you should modify insert.sql to
EXPLAIN the problem query to see if the plan has changed between the
passing and failing run. The only thing that comes to mind about why
this test might produce rows in a different order would be if a
parallel Append was sorting the subpaths by cost (See
create_append_path's call to list_sort) and the costs were for some
reason coming out differently sometimes. It's hard to imagine why this
query would be parallelised though. If you show us the EXPLAIN from a
passing and failing run, it might help us see the problem.

> If we
> are too sure that the output usually comes in the same order then the
> ORDER BY clause that exists in other tests seems useless. I am a bit
> confused & what could be a possible bug?

You can't claim that if this test shouldn't get an ORDER BY that all
tests shouldn't have an ORDER BY. That's just crazy. What if the test
is doing something like testing sort?!

David



Re: [PROPOSAL] : Use of ORDER BY clause in insert.sql

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> On Fri, 28 Oct 2022 at 16:51, Amul Sul <sulamul@gmail.com> wrote:
>> If we
>> are too sure that the output usually comes in the same order then the
>> ORDER BY clause that exists in other tests seems useless. I am a bit
>> confused & what could be a possible bug?

> You can't claim that if this test shouldn't get an ORDER BY that all
> tests shouldn't have an ORDER BY. That's just crazy. What if the test
> is doing something like testing sort?!

The general policy is that we'll add ORDER BY when a test is demonstrated
to have unstable output order for identifiable environmental reasons
(e.g. locale dependency) or timing reasons (e.g. background autovacuum
sometimes changing statistics).  But the key word there is "identifiable".
Without some evidence as to what's causing this, it remains possible
that it's a code bug not the fault of the test case.

regress.sgml explains the policy further:

  You might wonder why we don't order all the regression test queries explicitly
  to get rid of this issue once and for all.  The reason is that that would
  make the regression tests less useful, not more, since they'd tend
  to exercise query plan types that produce ordered results to the
  exclusion of those that don't.

            regards, tom lane



Re: [PROPOSAL] : Use of ORDER BY clause in insert.sql

From
Amul Sul
Date:
On Fri, Oct 28, 2022 at 10:28 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Fri, 28 Oct 2022 at 16:51, Amul Sul <sulamul@gmail.com> wrote:
> >
> > On Thu, Oct 27, 2022 at 6:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > Please be specific about the circumstances in which the output is
> > > unstable for you.  With zero information to go on, it seems about as
> > > likely that this change is masking a bug as that it's a good idea.
> > >
> >
> > At the first glance, I thought the patch is pretty much obvious, and
> > we usually add an ORDER BY clause to ensure stable output.
>
> Unfortunately, you'll need to do better than that. We're not in the
> business of accepting patches with zero justification for why they're
> required. If you're not willing to do the analysis on why the order
> changes sometimes, why should we accept your patch?
>

Unfortunately the test is not failing at me. Otherwise, I would have
done that analysis. When I saw the patch for the first time, somehow,
I didn't think anything spurious due to my misconception that we
usually add the ORDER BY clause for the select queries just to be
sure.

> If you can't find the problem then you should modify insert.sql to
> EXPLAIN the problem query to see if the plan has changed between the
> passing and failing run. The only thing that comes to mind about why
> this test might produce rows in a different order would be if a
> parallel Append was sorting the subpaths by cost (See
> create_append_path's call to list_sort) and the costs were for some
> reason coming out differently sometimes. It's hard to imagine why this
> query would be parallelised though. If you show us the EXPLAIN from a
> passing and failing run, it might help us see the problem.
>

Understood.

> > If we
> > are too sure that the output usually comes in the same order then the
> > ORDER BY clause that exists in other tests seems useless. I am a bit
> > confused & what could be a possible bug?
>
> You can't claim that if this test shouldn't get an ORDER BY that all
> tests shouldn't have an ORDER BY. That's just crazy. What if the test
> is doing something like testing sort?!
>

That I can understand that the sorted output doesn't need further
sorting. I am just referring to the simple SELECT queries that do not
have any sorting.

Thanks & Regards,
Amul



Re: [PROPOSAL] : Use of ORDER BY clause in insert.sql

From
Amul Sul
Date:
On Fri, Oct 28, 2022 at 10:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > On Fri, 28 Oct 2022 at 16:51, Amul Sul <sulamul@gmail.com> wrote:
> >> If we
> >> are too sure that the output usually comes in the same order then the
> >> ORDER BY clause that exists in other tests seems useless. I am a bit
> >> confused & what could be a possible bug?
>
> > You can't claim that if this test shouldn't get an ORDER BY that all
> > tests shouldn't have an ORDER BY. That's just crazy. What if the test
> > is doing something like testing sort?!
>
> The general policy is that we'll add ORDER BY when a test is demonstrated
> to have unstable output order for identifiable environmental reasons
> (e.g. locale dependency) or timing reasons (e.g. background autovacuum
> sometimes changing statistics).  But the key word there is "identifiable".
> Without some evidence as to what's causing this, it remains possible
> that it's a code bug not the fault of the test case.
>
> regress.sgml explains the policy further:
>
>   You might wonder why we don't order all the regression test queries explicitly
>   to get rid of this issue once and for all.  The reason is that that would
>   make the regression tests less useful, not more, since they'd tend
>   to exercise query plan types that produce ordered results to the
>   exclusion of those that don't.
>

Understood. Thanks for the clarification.

Regards,
Amul