Thread: Postgres 7.2.2 Segment Error
I just saw this in my logs:

2002-09-18 12:13:10 ERROR:  cannot open segment 1 of relation users_sessions
(target block 1342198864): No such file or directory

This query caused it:

DELETE FROM users_sessions WHERE changed < ('now'::timestamp - '1440
minutes'::interval) AND name = 'fhnid';

However, I cannot repeat the error now.  Is this a bug in Postgres
somewhere?  Also, what should I do to fix the table properly?  I haven't
vacuumed it or anything yet in case someone wants to analyze it.

Chris
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> I just saw this in my logs:
> 2002-09-18 12:13:10 ERROR:  cannot open segment 1 of relation users_sessions
> (target block 1342198864): No such file or directory

> This query caused it:
> DELETE FROM users_sessions WHERE changed < ('now'::timestamp - '1440
> minutes'::interval) AND name = 'fhnid';

What does EXPLAIN show as the plan for that query?  I'm guessing an
indexscan, and that the error was caused by reading a broken item
pointer from the index.  (1342198864 = hex 50005450, which sure looks
like the upper 5 shouldn't be there ... how big is the table, anyway?)

> However, I cannot repeat the error now.  Is this a bug in Postgres
> somewhere?

If the broken item pointer were indeed in the index, I'd expect it to be
100% repeatable.  I'm wondering about flaky memory or some such.  Have
you run any hardware diagnostics?

			regards, tom lane
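The arithmetic behind Tom's "upper 5 shouldn't be there" remark can be checked directly: masking off the stray high nibble of the reported block number leaves an offset that actually fits inside the heap file size reported later in the thread.  A quick sketch in Python, assuming the standard 8192-byte Postgres block size:

```python
# The bogus target block number from the error message.
bogus_block = 1342198864
assert hex(bogus_block) == "0x50005450"   # Tom's hex decomposition

# Drop the suspicious upper "5" nibble, as Tom suggests.
plausible_block = bogus_block & 0x0FFFFFFF   # 0x5450 = 21584

# With the default 8 kB block size, the corrected block lands well
# inside the ~357 MB heap file reported in the follow-up message.
BLOCK_SIZE = 8192
offset = plausible_block * BLOCK_SIZE        # ~176 MB into the file
print(plausible_block, offset)
```

The corrupted value thus looks like a single-byte stomp on an otherwise valid item pointer, which is exactly the failure mode Tom describes.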
> > DELETE FROM users_sessions WHERE changed < ('now'::timestamp - '1440
> > minutes'::interval) AND name = 'fhnid';
>
> What does EXPLAIN show as the plan for that query?  I'm guessing an
> indexscan, and that the error was caused by reading a broken item
> pointer from the index.  (1342198864 = hex 50005450, which sure looks
> like the upper 5 shouldn't be there ... how big is the table, anyway?)

NOTICE:  QUERY PLAN:

Index Scan using users_sessions_cha_name_idx on users_sessions
(cost=0.00..12738.07 rows=1275 width=6) (actual time=231.74..239.39 rows=2
loops=1)
Total runtime: 239.81 msec

EXPLAIN

The size of the table:

canaveral# ls -al 44632
-rw-------  1 pgsql  pgsql  357130240 Sep 19 18:52 44632

The size of the index:

canaveral# ls -al 7331245
-rw-------  1 pgsql  pgsql  8151040 Sep 19 18:51 7331245

Holy crap - that table is huge.  It's as if it's never had a VACUUM FULL.
Doing SELECT count(*) takes _ages_ even though there are only 1451 rows in
it - and they're not particularly large rows: the longest text entry is
3832 characters and the average is 677.

The sessions table holds normal site session data: a uid, username, some
other stuff, etc.  Entries older than two hours or so get deleted.  We
VACUUM every night, so why is the on-disk relation growing so huge?

> > However, I cannot repeat the error now.  Is this a bug in Postgres
> > somewhere?
>
> If the broken item pointer were indeed in the index, I'd expect it to be
> 100% repeatable.  I'm wondering about flaky memory or some such.  Have
> you run any hardware diagnostics?

No - the thought occurred to me that there might be something wacky going
on.  We've had problems with users_sessions before.  Remember when I
mailed about VACUUM failing on it?  You suggested doing a SELECT FOR
UPDATE on the relation, and that fixed it.

Chris
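The scale of the bloat is easy to put a number on: 1451 rows averaging roughly 677 bytes of text should fit in a couple of megabytes, yet the heap file is ~357 MB.  A rough back-of-envelope check (the per-tuple overhead figure is an assumed ballpark, not from the thread):

```python
# Estimate live data vs. actual on-disk size, using figures from the
# thread; ~40 bytes is an assumed guess at tuple header plus the
# non-text columns.
rows = 1451
avg_text = 677            # average text column length, bytes
tuple_overhead = 40       # assumed overhead per row, bytes

live_bytes = rows * (avg_text + tuple_overhead)
file_bytes = 357130240    # reported heap file size

print(live_bytes)                 # roughly 1 MB of live data
print(file_bytes // live_bytes)   # bloat factor in the hundreds
```

A bloat factor in the hundreds means nearly every page in the file is dead space that nightly plain VACUUMs have recorded but the free space map evidently isn't retaining, which is where Tom's FSM suggestion below comes in.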
On Fri, 20 Sep 2002, Christopher Kings-Lynne wrote:

> > > DELETE FROM users_sessions WHERE changed < ('now'::timestamp - '1440
> > > minutes'::interval) AND name = 'fhnid';
> >
> > What does EXPLAIN show as the plan for that query?  I'm guessing an
> > indexscan, and that the error was caused by reading a broken item
> > pointer from the index.  (1342198864 = hex 50005450, which sure looks
> > like the upper 5 shouldn't be there ... how big is the table, anyway?)
>
> NOTICE:  QUERY PLAN:
>
> Index Scan using users_sessions_cha_name_idx on users_sessions
> (cost=0.00..12738.07 rows=1275 width=6) (actual time=231.74..239.39 rows=2
> loops=1)
> Total runtime: 239.81 msec
>
> EXPLAIN
>
> The size of the table:
>
> canaveral# ls -al 44632
> -rw-------  1 pgsql  pgsql  357130240 Sep 19 18:52 44632

This seems remarkably large.  Does pg_filedump reveal anything of
interest?

Gavin
> > Index Scan using users_sessions_cha_name_idx on users_sessions
> > (cost=0.00..12738.07 rows=1275 width=6) (actual time=231.74..239.39 rows=2
> > loops=1)
> > Total runtime: 239.81 msec
> >
> > EXPLAIN
> >
> > The size of the table:
> >
> > canaveral# ls -al 44632
> > -rw-------  1 pgsql  pgsql  357130240 Sep 19 18:52 44632
>
> This seems remarkably large.  Does pg_filedump reveal anything of
> interest?

Where on earth do I find that?

BTW - I want to VACUUM FULL this table, but I'm holding off until someone
like Tom tells me there's nothing more to be gained from it...

Chris
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> The sessions table holds normal site session data, like a uid, username,
> some other stuff, etc.  However entries older than two hours or so get
> deleted.  We VACUUM every night, so why is the on-disk relation growing
> so huge?

FSM not big enough, perhaps?  Try doing a vacuum full, then looking to
see how big the table is (in physical blocks) after one day's normal
usage.  You need at least enough FSM space for that many blocks ...
unless you want to vacuum it more often.

> However, I cannot repeat the error now.

If you can't reproduce the error then I'm pretty well convinced that
there is no problem in the stored data itself.  This was either a
hardware glitch or a software bug causing a memory stomp on the top byte
of an item pointer retrieved from the index.  Although I can't rule out
the latter, I find it unlikely given that we don't have similar reports
from other people.  You may as well do the VACUUM FULL --- I doubt we
can learn anything from examining the table.

			regards, tom lane
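Tom's FSM-sizing advice translates into a concrete procedure.  A sketch of the steps, assuming the free-space-map GUCs of that era (max_fsm_relations, max_fsm_pages); the numeric values shown are placeholders to illustrate the shape of the settings, not recommendations:

```sql
-- Step 1: compact the table once.
VACUUM FULL users_sessions;

-- Step 2: after one day of normal traffic, check its size in blocks.
-- relpages is refreshed by VACUUM/ANALYZE.
VACUUM ANALYZE users_sessions;
SELECT relname, relpages FROM pg_class WHERE relname = 'users_sessions';

-- Step 3: make sure the FSM can track at least that many pages across
-- all regularly-vacuumed tables (postgresql.conf; illustrative values):
--   max_fsm_relations = 100
--   max_fsm_pages     = 20000
```

If the FSM is too small to remember the table's free pages between nightly vacuums, each day's dead space is simply lost and the file keeps growing, which matches the bloat observed above.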