Thread: Parallel Scan Bug: invalid attnum: 0

Parallel Scan Bug: invalid attnum: 0

From: Steve Randall
running pg 9.6.1 on AWS

max_parallel_workers_per_gather = 4

This only occurs in some schemas, not all.  The affected schemas have
approx 1 million rows in the table.  Much larger schemas (65 million rows)
and smaller schemas (100K rows) do not have the problem.

table definition:
 CREATE TABLE ro_summaries  (
day             date NOT NULL,
entry_firm       varchar(15) NOT NULL,
route_firm       varchar(15) NOT NULL,
order_type       varchar(3) NOT NULL,
symbol           varchar(15) NOT NULL,
bucket_id       int4 NOT NULL,
entry_status     varchar(2) NOT NULL,
covered_order   bool NOT NULL,
orders_sent     int4 NOT NULL DEFAULT 0,
shares_sent     int4 NOT NULL DEFAULT 0,
orders_pos       int4 NOT NULL DEFAULT 0,
orders_zero     int4 NOT NULL DEFAULT 0,
orders_neg       int4 NOT NULL DEFAULT 0,
orders_canceled int4 NOT NULL DEFAULT 0,
orders_null     int4 NOT NULL DEFAULT 0,
shares_pos       int4 NOT NULL DEFAULT 0,
shares_zero     int4 NOT NULL DEFAULT 0,
shares_neg       int4 NOT NULL DEFAULT 0,
shares_canceled int4 NOT NULL DEFAULT 0,
shares_null     int4 NOT NULL DEFAULT 0,
pi_pos           numeric NOT NULL DEFAULT 0,
pi_neg           numeric NOT NULL DEFAULT 0,
efq             numeric NULL,
effective_spread numeric NULL,
quoted_spread   numeric NULL,
realized_spread numeric NULL,
speed           numeric NULL,
part1_shares     int4 NULL,
trades_total     int4 NULL,
orders_filled   int4 NULL DEFAULT 0,
efq_shares       int4 NULL,
exchange         varchar(10) NULL,
notional_value   numeric NULL
)
WITHOUT OIDS
TABLESPACE pg_default;
CREATE UNIQUE INDEX rosum_ux01
ON ro_summaries USING btree (day date_ops, order_type text_ops, bucket_id
int4_ops, route_firm text_ops, entry_firm text_ops, symbol text_ops,
entry_status text_ops, covered_order bool_ops);

view definition:
CREATE OR REPLACE VIEW ro_summaries_v
AS
 SELECT ro_summaries.day,
    'S3'::character varying AS source_code,
    ro_summaries.entry_firm,
    ro_summaries.route_firm,
    ro_summaries.order_type,
    ro_summaries.symbol,
    ro_summaries.bucket_id,
    ro_summaries.entry_status,
    ro_summaries.covered_order,
    ro_summaries.orders_sent,
    ro_summaries.shares_sent,
    (((ro_summaries.shares_pos + ro_summaries.shares_neg) +
ro_summaries.shares_zero) + ro_summaries.shares_null) AS shares_total,
    (ro_summaries.shares_pos + ro_summaries.shares_zero) AS
shares_atorbetter,
    ro_summaries.shares_pos,
    ro_summaries.shares_neg,
    ro_summaries.trades_total,
    ro_summaries.pi_pos,
    ro_summaries.pi_neg,
    ro_summaries.efq,
    ro_summaries.effective_spread,
    ro_summaries.quoted_spread,
    ro_summaries.realized_spread,
    ro_summaries.speed,
    ro_summaries.part1_shares,
    ro_summaries.orders_filled,
    ro_summaries.efq_shares,
    ro_summaries.exchange,
    ro_summaries.notional_value
   FROM msco_data.ro_summaries;

offending query:
select
count(*) OVER() AS full_count,
source_code,
sum(shares_atorbetter) shares_atorbetter,
'All ' day_display, 'All ' entry_firm, 'All ' route_firm, 'All ' symbol,
'All ' order_type,
0 bucket_id,
case when sum(shares_total) > 0 then
(sum(shares_atorbetter)::numeric/sum(shares_total)::numeric)*100 end
at_or_better,
case WHEN sum((shares_total)) > 0 THEN
(sum(shares_pos)::numeric/sum((shares_total))::numeric)::numeric*100 END
pi_percent, coalesce(sum(pi_pos),0) pi_amount,
case WHEN sum(shares_pos) > 0 THEN (sum(pi_pos)::numeric /
sum(shares_pos)::numeric)::numeric ELSE 0 END pi_per_unit,
case WHEN sum((shares_total)) > 0 THEN
(sum(shares_neg)::numeric/sum((shares_total))::numeric)::numeric*100 END
slippage_percent,
coalesce(sum(pi_neg),0) slippage_amount,
case WHEN sum((efq_shares)) > 0 THEN
sum((efq_shares)*effective_spread)::numeric/sum((efq_shares))::numeric END
effective_spread,
case WHEN sum((efq_shares)) > 0 THEN
sum((efq_shares)*quoted_spread)::numeric/sum((efq_shares))::numeric END
quoted_spread,
case WHEN sum((efq_shares)) > 0 THEN
sum((efq_shares)*realized_spread)::numeric/sum((efq_shares))::numeric END
realized_spread,
case when sum((efq_shares)*quoted_spread) > 0 THEN
(sum((efq_shares)*effective_spread)::numeric/sum((efq_shares)*quoted_spread)::numeric)
* 100 END EFQ,
sum(trades_total) trades, sum((shares_total)) shares,
sum(shares_pos) shares_pos, sum(shares_neg) shares_neg,
sum(orders_sent) orders, sum(shares_sent) ordered_shares,
sum(pi_pos + pi_neg) variance,
case when sum((shares_total)) > 0 THEN sum(pi_pos + pi_neg) /
sum((shares_total))::numeric END avg_variance,
case when sum(orders_sent) > 0 THEN sum(shares_sent) / sum(orders_sent) END
avg_shares_order,
case when sum(shares_sent) > 0 then
least((sum((shares_total))::numeric/sum(shares_sent)::numeric)::numeric*100,100)
END pct_shares_executed,
sum(notional_value) notional_value,
sum(orders_filled) orders_filled,
case when sum(part1_shares) > 0 then sum(part1_shares * speed)::numeric /
sum(part1_shares)::numeric end speed
from ro_summaries_v
where day = '2016-11-07'
group by source_code
ORDER BY day_display desc NULLS LAST
LIMIT 25 OFFSET 0;

Commenting out source_code (and the GROUP BY) allows the query to run.
Alternatively, commenting out several of the CASE expressions also allows
it to run.

In the meantime I've set max_parallel_workers_per_gather = 0 and the query
works fine.
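The same workaround can also be scoped to a single session instead of
server-wide, for example:

```sql
-- Turn off parallel gather for this session only, run the affected
-- query, then return to the server default:
SET max_parallel_workers_per_gather = 0;

-- ... run the offending query here ...

RESET max_parallel_workers_per_gather;
```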

Please let me know if you need any more information.

Re: Parallel Scan Bug: invalid attnum: 0

From: Tom Lane
Date: Wed, Nov 9, 2016
Steve Randall <srandall@s3.com> writes:
> running pg 9.6.1 on AWS
> max_parallel_workers_per_gather = 4
> This only occurs in some schemas, not all.  The affected schemas have
> approx 1 million rows in the table.  Much larger schemas (65 million rows)
> and smaller schemas (100K rows) do not have the problem.

FWIW, I failed to duplicate this from the information provided.
Can you get us EXPLAIN output for the troublesome case, or does it
fail before printing the EXPLAIN output?  How about a stack trace
from the errfinish call?

            regards, tom lane

Re: Parallel Scan Bug: invalid attnum: 0

From: Steve Randall
I have been able to reproduce the error in my development environment.  EXPLAIN output is attached.  EXPLAIN ANALYZE fails with the error mentioned.  

I have to do a little configuring to get the stack trace logs, but it will be a couple of weeks before they are ready.  Tomorrow I head to San Francisco for a few days off prior to pgConf.

On Wed, Nov 9, 2016 at 12:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Steve Randall <srandall@s3.com> writes:
> running pg 9.6.1 on AWS
> max_parallel_workers_per_gather = 4
> This only occurs in some schemas, not all.  The affected schemas have
> approx 1 million rows in the table.  Much larger schemas (65 million rows)
> and smaller schemas (100K rows) do not have the problem.

FWIW, I failed to duplicate this from the information provided.
Can you get us EXPLAIN output for the troublesome case, or does it
fail before printing the EXPLAIN output?  How about a stack trace
from the errfinish call?

                        regards, tom lane

Attachment

Re: Parallel Scan Bug: invalid attnum: 0

From: Tom Lane
Date: Thu, Nov 10, 2016
Steve Randall <srandall@s3.com> writes:
> I have been able to reproduce the error in my development environment.
> EXPLAIN output is attached.  EXPLAIN ANALYZE fails with the error
> mentioned.

Ah, I've duplicated it.  My previous attempt to reverse-engineer your
test case was not selecting a partial-aggregation plan.

Immediate impression is that the logic for planning partial grouped
aggregation did not get the zero-sort-keys case right.  In my hands
it produces

TRAP: BadArgument("!(nkeys > 0)", File: "tuplesort.c", Line: 763)
2016-11-10 10:44:26.955 EST [14415] LOG:  server process (PID 15726) was terminated by signal 6: Aborted

but in a non-assert build of course you'd get some other misbehavior.

            regards, tom lane

Re: Parallel Scan Bug: invalid attnum: 0

From: Tom Lane
Date: Thu, Nov 10, 2016
I wrote:
> Immediate impression is that the logic for planning partial grouped
> aggregation did not get the zero-sort-keys case right.

Yeah: after flattening your view, the planner was left with GROUP BY
'S3'::character varying, which is a no-op, but it mistakenly inserted
a no-op Sort into the plan to sort by that.  One-liner fix is here
if you need it right away:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7defc3b97a31537547053946808a83e7234d1b61
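For illustration, the shape the planner is left with after pulling up the
view is roughly this (table and constant taken from the view definition
earlier in the thread; the aggregate is just a stand-in):

```sql
-- After the view is flattened, the constant source_code column is
-- inlined, so the GROUP BY reduces to grouping by a constant:
SELECT 'S3'::character varying AS source_code,
       sum(shares_sent) AS ordered_shares
  FROM msco_data.ro_summaries
 WHERE day = '2016-11-07'
 GROUP BY 'S3'::character varying;
-- Every row has the same grouping key, so no sort order is needed;
-- the bug was emitting a zero-key Sort node in the partial-aggregate path.
```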


            regards, tom lane

Re: Parallel Scan Bug: invalid attnum: 0

From: Steve Randall
Thank you.

On Thu, Nov 10, 2016 at 10:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I wrote:
> > Immediate impression is that the logic for planning partial grouped
> > aggregation did not get the zero-sort-keys case right.
>
> Yeah: after flattening your view, the planner was left with GROUP BY
> 'S3'::character varying, which is a no-op, but it mistakenly inserted
> a no-op Sort into the plan to sort by that.  One-liner fix is here
> if you need it right away:
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=
> 7defc3b97a31537547053946808a83e7234d1b61
>
>
>                         regards, tom lane
>