Re: Parallel append plan instability/randomness - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel append plan instability/randomness
Msg-id CAA4eK1Jh+8VXDFaxUF7A4v10sHHzDM0XV8pimJKPVP+2GaBKGg@mail.gmail.com
In response to Re: Parallel append plan instability/randomness  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Jan 9, 2018 at 12:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Jan 7, 2018 at 11:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> One theory that can explain the above failure is that the costs of
>>> scanning some of the sub-paths are very close, so the results can
>>> sometimes vary.  If that is the case, then using fuzz_factor in the
>>> cost comparison (as is done in the attached patch) can probably
>>> improve the situation; maybe we also have to consider other factors,
>>> like the number of rows in each subpath.
>
>> This isn't an acceptable solution because sorting requires that the
>> comparison operator satisfy the transitive property; that is, if a = b
>> and b = c then a = c.  With your proposed comparator, you could have a
>> = b and b = c but a < c.  That will break stuff.
>
>> It seems like the obvious fix here is to use a query where the
>> contents of the partitions are such that the sorting always produces
>> the same result.  We could do that either by changing the query or by
>> changing the data in the partitions or, maybe, by inserting ANALYZE
>> someplace.
>
> The foo_star tables are made in create_table.sql, filled in
> create_misc.sql, and not modified thereafter.  The fact that we have
> accurate rowcounts for them in select_parallel.sql is because of the
> database-wide VACUUM that happens at the start of sanity_check.sql.
> Given the lack of any WHERE condition, the costs in this particular query
> depend only on the rowcount and physical table size, so inserting an
> ANALYZE shouldn't (and doesn't, for me) change anything.  I would be
> concerned about side-effects on other queries anyway if we were to ANALYZE
> tables that have never been ANALYZEd in the regression tests before.
>

Fair point.  This suggests that wrong rowcounts are probably not the
reason for the failure.  However, I think it might still be good to use
a different set of tables (probably new tables created with appropriate
data for these queries) and to ANALYZE them explicitly before these
queries, rather than relying on the execution order of other,
indirectly related tests.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

