Re: BUG #14948: cost overflow - Mailing list pgsql-bugs

From Jan Schulz
Subject Re: BUG #14948: cost overflow
Date
Msg-id CAAc324hPZuLJiwQK_zVxEp3PEcpxpgnee-hxD55jUVp=x2hhCA@mail.gmail.com
Whole thread Raw
In response to Re: BUG #14948: cost overflow  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #14948: cost overflow  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Hello!

On 5 December 2017 at 16:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm not sure what to make of that, but it seems to be broken in more ways
> than just the wacko cost estimates. 

We run our server with basically no data security in mind as we do regenerate all tables from the source systems each time. So the OOM kill might have broken some tables/data? m_dim wasn't regenerated since then because the regeneration of that schema triggers the OOM.

relevant postgresql.conf entries:

wal_level = minimal
# doesn't start if wal_level=minimal and this is >0
max_wal_senders = 0
fsync = off
synchronous_commit = off
full_page_writes = off
wal_buffers = -1
autovacuum_max_workers = 1

> Can you get a similarly broken plan if you try something involving just
> this table, say "select * from converting_touchpoints_attribution where
> touchpoint_fk = 42 and performance_attribution_model_fk = 2" ?

EXPLAIN ( ANALYSE, BUFFERS )
select * from m_dim.converting_touchpoints_attribution where
touchpoint_fk = 42 and performance_attribution_model_fk = 2

Bitmap Heap Scan on converting_touchpoints_attribution  
     (cost=-25769803761.69..-25769803757.67 rows=1 width=16) (actual time=0.005..0.005 rows=0 loops=1)
  Recheck Cond: ((performance_attribution_model_fk = 2) AND (touchpoint_fk = 42))
  ->  BitmapAnd  (cost=-25769803761.69..-25769803761.69 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=1)
        ->  Bitmap Index Scan on converting_touchpoints_attribution__performance_attribution_mod  
                (cost=0.00..-12884901880.97 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=1)
              Index Cond: (performance_attribution_model_fk = 2)
        ->  Bitmap Index Scan on converting_touchpoints_attribution__touchpoint_fk  
                (cost=0.00..-12884901880.97 rows=1 width=0) (never executed)
              Index Cond: (touchpoint_fk = 42)
Planning time: 0.122 ms
Execution time: 0.036 ms

This is how it looks on a working system (generated with 2MB work_mem...)

Index Scan using converting_touchpoints_attribution__touchpoint_fk_performance_a on converting_touchpoints_attribution  
    (cost=0.56..2.58 rows=1 width=16) (actual time=2.314..2.314 rows=0 loops=1)
  Index Cond: ((touchpoint_fk = 42) AND (performance_attribution_model_fk = 2))
  Buffers: shared hit=6 read=4
Planning time: 5.371 ms
Execution time: 2.342 ms

> Could we see the pg_stats rows for those two columns?  What is in
> pg_class.reltuples and relpages for this table and these two indexes?

SELECT
  nspname || '.' ||relname,
  relpages,
  reltuples
FROM pg_class
  JOIN pg_catalog.pg_namespace n
    ON n.oid = pg_class.relnamespace
WHERE relname ILIKE 'converting_touchpoints_attributio%';

+-----------------------------------------------------------------------+----------+-----------+
| nspname+relname                                                       | relpages | reltuples |
+-----------------------------------------------------------------------+----------+-----------+
| m_dim.converting_touchpoints_attribution__touchpoint_fk_performance_a | 84153    | 24282896  |
| m_dim.converting_touchpoints_attribution___day_id_performance_attribu | 84153    | 24282896  |
| m_dim.converting_touchpoints_attribution__touchpoint_fk               | 6        | 1026      |
| m_dim.converting_touchpoints_attribution__performance_attribution_mod | 5        | 1026      |
| m_dim.converting_touchpoints_attribution                              | 131259   | 24282892  |
+-----------------------------------------------------------------------+----------+-----------+
5 rows in set.

On a good system it looks like this:

+-----------------------------------------------------------------------+----------+-----------+
| nspname+relname                                                       | relpages | reltuples |
+-----------------------------------------------------------------------+----------+-----------+
| m_dim.converting_touchpoints_attribution___day_id_performance_attribu | 84656    | 24428338  |
| m_dim.converting_touchpoints_attribution                              | 132046   | 24428338  |
| m_dim.converting_touchpoints_attribution__touchpoint_fk_performance_a | 84656    | 24428338  |
| m_dim.converting_touchpoints_attribution__touchpoint_fk               | 6        | 1032      |
| m_dim.converting_touchpoints_attribution__performance_attribution_mod | 5        | 1032      |
| m_tmp.converting_touchpoints_attribution                              | 132046   | 24428338  |
+-----------------------------------------------------------------------+----------+-----------+
6 rows in set.

-> Seems like no real difference (the data has a few days differences now + m_tmp.converting_touchpoints_attribution is an UNLOGGED table so won't survive the crash)

Best regards,

Jan
--
Jan Schulz
mail: jasc@gmx.net


>
>                         regards, tom lane

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #14948: cost overflow
Next
From: Jan Schulz
Date:
Subject: Fwd: BUG #14948: cost overflow