RE: [BUGS] Loosing files after backend crash - Mailing list pgsql-hackers

From KS
Subject RE: [BUGS] Loosing files after backend crash
Date
Msg-id D078291436B0D411B08800805F6D131501E6A4@fantom.tcnet.ru
Whole thread Raw
List pgsql-hackers

> -----Original Message-----
> From: Mikheev, Vadim [mailto:vmikheev@SECTORBASE.COM]
> Sent: Wednesday, April 04, 2001 3:37 AM
> To: 'Tom Lane'
> Subject: RE: [BUGS] Loosing files after backend crash 
> 
> 
> 1. Indices could be recreated with REINDEX or pg_class could 
> be queried
> with seq scan (something like where relname like 
> '%seq_i___data_buffer%')...
> Konstantin?

Well, bad news. After a few more tries to crash the backend, the whole
postmaster crashed and didn't rise back.
It fails to start up reporting "Apr  4 18:53:05 wale postgres[71618]: [9]
FATAL 2:  XLogWrite: write request is past end of log" to syslog.
And the last line of errlog sounds like "/usr/local/pgsql/bin/postmaster:
Startup proc 72905 exited with status 512 - abort"
I wanted to ask, if I need to re-initdb or there are some other ways to fix
the problem?
If I need to re-init, can I preserve the database in it's current state, to
continue my investigation from the point I was interrupted?

I hope I will be able to answer your questions after I heal the postmaster.

> 3. Could you help us reproduce this bug, Konstantin?
> What exactly did you do after sequence creation?

Here's the script which creates the sequence and the temp table:

---------------------8<---------------------
begin transaction;
drop sequence _seq_i___data_buffer;
create sequence _seq_i___data_buffer;

CREATE TEMPORARY TABLE __data_buffer (  buff_id int4 UNIQUE NOT NULL default NEXTVAL( '_seq_i___data_buffer' ),
rule_idint4 NOT NULL,  _value decimal( 18, 0) NOT NULL,  _count decimal( 18, 0) NOT NULL,  value_time timestamp NOT
NULL);
 

insert into __data_buffer
(buff_id     , rule_id     , _value                 , _count
, value_time                  )
(
... [689 UNION'd selects]
);
commit;
---------------------8<---------------------
After that I run my function.

Shall I send you it's code? (It's 23 Kbytes big).

> Does your function reference temp table you've mentioned?
Yes. And it also creates three more temporary tables (Let's call them A, B
and C).
Actually, temp table 'A' is populated with the values from the external
temptable ('__data_buffer').
And the 'B' temp table is populated with a query from temptable A.
Then table C is populated with a huge query, which joins many tables and
table 'B' among them.
But tables 'A' and the external one are not referenced in this huge query.
Well, this very query crashes the postmaster.
Our team is playing with this query to locate the reasons for this failure.
We will report you the results of our investigation.

If you want to have a look at the query, here it is.
'__vars_info' is what I referenced as temptable C.
'__rule_data_with_tis' is what I referenced as temptable 'B'. 
All other tables are not temporary.
int1 is my self-written type. To prevent from blaming my type I can say,
that replacing it with, say, float gives the same results.
datediff() is my own function (MS SQL analog, works well alone).

---------------------8<--------------------- INSERT INTO __vars_info ( var_id, min_old_ti, max_old_ti, lifetime,
timeinterval )  ( SELECT variable.var_id,     CASE      WHEN year = 1 THEN MIN( datediff(''year'', basetime,
timebegin))     WHEN month = 1 THEN MIN( datediff(''month'', basetime,  timebegin))      WHEN week = 1 THEN MIN(
datediff(''week'',basetime,  timebegin))      WHEN day = 1 THEN MIN( datediff(''day'', basetime,  timebegin))      WHEN
hour= 1 THEN MIN( datediff(''hour'', basetime,  timebegin))      WHEN five_minute = 1 THEN MIN( datediff(''minute'',
basetime,
timebegin) / 5)     END as min_old_ti,     CASE      WHEN year = 1 THEN MAX( datediff(''year'', basetime,  timebegin))
   WHEN month = 1 THEN MAX( datediff(''month'', basetime,  timebegin))      WHEN week = 1 THEN MAX( datediff(''week'',
basetime, timebegin))      WHEN day = 1 THEN MAX( datediff(''day'', basetime,  timebegin))      WHEN hour = 1 THEN MAX(
datediff(''hour'',basetime,  timebegin))      WHEN five_minute = 1 THEN MAX( datediff(''minute'', basetime,
 
timebegin) / 5)     END as max_old_ti,     lifetime, timeinterval    FROM     variable      LEFT JOIN     ( SELECT
var_id,timebegin FROM var_value WHERE var_id IN      ( SELECT DISTINCT var_id FROM variable WHERE vset_id IN       (
SELECTDISTINCT vset_id FROM vset_to_rule WHERE rule_id IN        ( SELECT DISTINCT rule_id FROM __rule_data_with_tis )
)) )     AS sel_var_value      ON variable.var_id = sel_var_value.var_id,     ( SELECT ti_id, (year)::int1 AS year,
(month)::int1AS month,        (week)::int1 AS week, (day)::int1 AS day,        (hour)::int1 AS hour,
(five_minute)::int1AS five_minute       FROM timeinterval )     AS timeinterval    WHERE     variable.ti_id =
timeinterval.ti_idAND     variable.isactive = 1 AND variable.vset_id IN      ( SELECT DISTINCT vset_id FROM
vset_to_rule       WHERE vset_id IN ( SELECT DISTINCT vset_id FROM variable_set where
 
isactive = 1 ) AND              rule_id IN ( SELECT DISTINCT rule_id FROM
__rule_data_with_tis ) )    GROUP BY variable.var_id, lifetime, timeinterval, year::int4,
month::int4, week::int4, day::int4, hour::int4, five_minute::int4)    ;
---------------------8<---------------------

> What cause crash? Maybe crash is related somehow...
See above.

> Could you try to reproduce failure with wal_debug = 1 and
> post me postmaster' log?
I'll do it after I succeed in bringing the postmaster up.

Regards, 
Konstantin Solodovnikov.

P.S. Almost forgot:
Here's what psql tells about our sequence:

---------------------8<---------------------
Netflow_Test=# \d seq_i___data_buffer                                 Table "seq_i___data_buffer"Attribute  |
Type          |                       Modifier
 
------------+--------------------------+------------------------------------
-------------------buff_id    | integer                  | not null default
nextval('seq_i___data_buffer'::text)^^^^^^^buff_id    | integer                  |^^^^^^^rule_id    | integer
      | not null_value     | numeric(18,0)            | not null_count     | numeric(18,0)            | not
nullvalue_time| timestamp with time zone | not null
 
Index: pg_temp.96430.1      ^^^^^^^^^^^^^
---------------------8<---------------------

It shows almost the same structure as __data_buffer should have.
Except for the fact, that it has 'buff_id' doubled :)

K.S.


pgsql-hackers by date:

Previous
From: Jie Liang
Date:
Subject: Re: [ADMIN] pgmonitor completed
Next
From: missive@frontiernet.net (Lee Harr)
Date:
Subject: Re: Table Structure