BUG #15532: Access a file segment that does not exist - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #15532: Access a file segment that does not exist
Date
Msg-id 15532-bdaabfecf52cdbf1@postgresql.org
Whole thread Raw
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      15532
Logged by:          Chen huajun
Email address:      chjischj@163.com
PostgreSQL version: 10.2
Operating system:   CentOS 7.3
Description:

In one of our systems, the following problem occurred, but failed to
replicate in the test environment.

# environment
- PostgreSQL 10.2
- CentOS 7.3 x64

# Phenomenon 1

The following error occurred when accessing a table using SELECT

    ERROR:  could not open file "base/16400/2079769372.1" (target block
519307264): previous segment is only 100153 blocks

Check the actual data file and find that there is no
"base/16400/2079769372.1" file.
(only "base/16400/2079769372" exists and is 700 MB)

This file is the gin index gin_tdm_ous_les_stock_mon_rt of the
tdm_ous_les_stock_mon_rt table, but the index is only 700 MB.

Note: The table will be deleted and inserted into a large amount of data
frequently(and may trigger autovacuum).

# Phenomenon 2

After a while(about 10+ day), when SELECT accesses the same table, the
following similar error occurs.

ERROR:  could not open file "base/16400/2347719250.2" (target block 306107):
previous segment is only 81847 blocks
CONTEXT:  parallel worker

The actual size of the "base/16400/2347719250.2" file is 0, which is the
heap table data file of tdm_ous_les_stock_mon_rt, and the actual size of
tdm_ous_les_stock_mon_rt is 1.7 GB.
This error occurs when the following SQL is executed using bitmap index
scan.

    explain select count(*) from bi_dm.tdm_ous_les_stock_mon_rt where zdtday <
'20181027';
    Finalize Aggregate (cost=202269.05..202269.08 rows=1 width=8)
    -> Gather (cost=202268.82..202269.05 rows=2 width=8)
    Workers Planned: 2
    -> Partial Aggregate (cost=201268.82..201268.85 rows=1 width=8)
    -> Parallel Bitmap Heap Scan on tdm_ous_les_stock_mon_rt
(cost=3043.71..200919.59 rows=139689 width=0)
    Recheck Cond: (zdtday < 20181027)
    -> Bitmap Index Scan on gin_tdm_ous_les_stock_mon_rt (cost=0.00..2959.90
rows=335253 width=0)
    Index Cond: (zdtday < 20181027)
    
    explain analyze select count(*) from bi_dm.tdm_ous_les_stock_mon_rt where
zdtday < '20181027'
    ERROR:  could not open file "base/16400/2347719250.2" (target block
306107): previous segment is only 81847 blocks
    CONTEXT:  parallel worker

Modify the where condition so that it does not go wrong when performing a
sequential scan.
    
    explain analyze select count(*) from bi_dm.tdm_ous_les_stock_mon_rt where
zdtday > 20181027
    Finalize Aggregate (cost=267492.74..267492.77 rows=1 width=8) (actual
time=522.957..522.957 rows=1 loops=1)
    -> Gather (cost=267492.51..267492.74 rows=2 width=8) (actual
time=522.938..522.950 rows=3 loops=1)
    Workers Planned: 2
    Workers Launched: 2
    -> Partial Aggregate (cost=266492.51..266492.54 rows=1 width=8) (actual
time=520.111..520.111 rows=1 loops=3)
    -> Parallel Seq Scan on tdm_ous_les_stock_mon_rt (cost=0.00..263083.51
rows=1363600 width=0) (actual time=0.027..470.640 rows=618628 loops=3)
    Filter: (zdtday > 20181027)
    Rows Removed by Filter: 82143
    Planning time: 2.523 ms
    Execution time: 562.728 ms

# appendix
## Table definition
------------------------------------------------------------
CREATE TABLE tdm_ous_les_stock_mon_rt (
    zdtday integer,
    s_werks character varying(10),
    scity_flg integer,
    dept_cd character varying(10),
    zsize character varying(4),
    vb_num integer,
    lfimg_num integer,
    online_flag integer,
    num_not_1020 integer,
    lfimg_not_1020 integer,
    num_not_task integer,
    lfimg_not_task integer,
    num_not_1030 integer,
    lfimg_not_1030 integer,
    num_not_1052_s integer,
    lfimg_not_1052_s integer,
    num_not_1060 integer,
    lfimg_not_1060 integer,
    num_not_1055 integer,
    lfimg_not_1055 integer,
    num_fnsh integer,
    lfimg_fnsh integer,
    etl_time timestamp without time zone,
    deopt_cd character varying(4),
    intime_num integer,
    num_fnsh_1 integer,
    tmall_flg integer,
    scitynew_flg integer,
    intime_lfimg bigint,
    lfimg_fnsh_1 bigint,
    endpointdesc character varying(100),
    end_site character varying(100),
    actual_vb_num integer,
    actual_lfimg_num integer
);
CREATE INDEX gin_tdm_ous_les_stock_mon_rt ON tdm_ous_les_stock_mon_rt USING
gin (zdtday, s_werks, dept_cd, zsize, online_flag, deopt_cd, tmall_flg,
scitynew_flg, scity_flg, end_site, endpointdesc);
CREATE UNIQUE INDEX idx_tdm_ous_les_stock_mon_rt ON tdm_ous_les_stock_mon_rt
USING btree (zdtday, s_werks, dept_cd, zsize, online_flag, deopt_cd,
tmall_flg, scitynew_flg, scity_flg, end_site, endpointdesc);
------------------------------------------------------------

## Related logs of Phenomenon 2 
------------------------------------------------------------
    2018-11-24 03:50:08.736 CST 40614     LOG:  automatic vacuum of table
"lma.bi_dm.tdm_ous_les_stock_mon_rt": index scans: 1
            pages: 0 removed, 212919 remain, 0 skipped due to pins, 148918
skipped frozen
            tuples: 217325 removed, 4135969 remain, 0 are dead but not yet
removable, oldest xmin: 2745011834
            buffer usage: 1101870 hits, 1465216 misses, 96217 dirtied
            avg read rate: 51.116 MB/s, avg write rate: 3.357 MB/s
            system usage: CPU: user: 25.34 s, system: 6.88 s, elapsed: 223.94
s
            
    2018-11-24 04:10:30.875 CST 18188     LOG:  automatic vacuum of table
"lma.bi_dm.tdm_ous_les_stock_mon_rt": index scans: 1
            pages: 0 removed, 212919 remain, 0 skipped due to pins, 146429
skipped frozen
            tuples: 22 removed, 4374202 remain, 264467 are dead but not yet
removable, oldest xmin: 2746163661
            buffer usage: 1091742 hits, 1463463 misses, 446 dirtied
            avg read rate: 45.279 MB/s, avg write rate: 0.014 MB/s
            system usage: CPU: user: 11.02 s, system: 5.32 s, elapsed: 252.50
s
            
    2018-11-24 04:19:58.698 CST 13085 10.96.49.233(61176) lma lma [unknown]
ERROR:  could not open file "base/16400/2347719250.2" (target block 306107):
previous segment is only 81847 blocks
    2018-11-24 04:19:58.698 CST 13085 10.96.49.233(61176) lma lma [unknown]
STATEMENT:  delete from bi_dm.tdm_ous_les_stock_mon_rt where  zdtday
<to_char(now()::date-31,'yyyymmdd')::INTEGER
------------------------------------------------------------


pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #15531: python-sqlobject shipped with EPEL 7 not compatible withPostgreSQL 9.6.11
Next
From: Tom Lane
Date:
Subject: Re: BUG #15528: on v11.0 version still get error "ERROR: catalog is missing 1 attribute(s) for relid 6855092"