Re: i'm really desperate: invalid memory alloc request - Mailing list pgsql-general
From | Janning Vygen |
---|---|
Subject | Re: i'm really desperate: invalid memory alloc request |
Date | |
Msg-id | 200410041913.24845.vygen@gmx.de |
In response to | Re: i'm really desperate: invalid memory alloc request (Richard Huxton <dev@archonet.com>) |
List | pgsql-general |
On Friday, 1 October 2004 10:56, Richard Huxton wrote:
> Janning Vygen wrote:
> > tonight my database got corrupted. before it worked fine.
> > in the morning some sql queries failed. it seems only one table was
> > affected. i stopped all web access and tried to backup the current
> > database:
> >
> > pg_dump: ERROR: invalid memory alloc request size 0
> > pg_dump: SQL command to dump the contents of table "fragentipps" failed:
> > PQendcopy() failed.
> > pg_dump: Error message from server: ERROR: invalid memory alloc request
> > size 0
> > pg_dump: The command was: COPY public.fragentipps (tr_kurzname, mg_name,
> > fr_id, aw_antworttext) TO stdout;
>
> Does it do this consistently at the same place?

Yes. It happens in one table whenever i select a certain row. How can stuff like this happen?

> > i tried to recover from backup which was made just before clustering
> > but i got
> > ERROR: index row requires 77768 bytes, maximum size is 8191
>
> There are a few steps - you've already done the first
> 1. Stop PG and take a full copy of the data/ directory
> 2. Check your installation - make sure you don't have multiple
> versions of pg_dump/libraries/etc installed
> 3. Try dumping individual tables (pg_dump -t table1 ...)
> 4. Reindex/repair files
> 5. Check hardware to make sure it doesn't happen again.
>
> Once you've dumped as many individual tables as you can, you can even
> try selecting data to a file avoiding certain rows if they are causing
> the problem.

Ok, i can recreate most of the data. My main questions are now:
- Why do things like this happen?
- How often do they happen?

> There's more you can do after that, but let's see how that works out.
>
> PS - your next mail mentions sig11 which usually implies hardware
> problems, so don't forget to test the machine thoroughly once this is over.
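Step 3 quoted above (dumping tables one at a time with `pg_dump -t`) could be scripted roughly as follows. This is only a sketch: the database name `mydb`, the output directory, and every table name other than `fragentipps` are hypothetical, and the `runner` parameter is an assumption added here so the loop can be tried without a live database.

```python
import subprocess
from pathlib import Path


def dump_command(table, dbname, outdir="."):
    """Build a pg_dump call for one table (-t picks the table, -f the output file)."""
    outfile = str(Path(outdir) / f"{table}.sql")
    return ["pg_dump", "-t", table, "-f", outfile, dbname]


def dump_tables(tables, dbname, outdir=".", runner=subprocess.run):
    """Dump each table separately and report which ones failed.

    Returns (succeeded, failed) as two lists of table names. `runner` is
    injectable (hypothetical convenience) so the logic can be tested offline.
    """
    succeeded, failed = [], []
    for table in tables:
        result = runner(dump_command(table, dbname, outdir))
        if result.returncode == 0:
            succeeded.append(table)
        else:
            # A non-zero exit status means pg_dump gave up on this table.
            failed.append(table)
    return succeeded, failed
```

Calling `dump_tables(["table1", "fragentipps"], "mydb")` would then leave one `.sql` file per healthy table and return the names of the tables that still need row-by-row rescue.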
first i ran the long smart selftest:

*************
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description   Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended off-line  Completed without error  00%        4097             -
*************

AND

*************
# smartctl -Hc /dev/hda
smartctl version 5.1-18 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[...]
*************

so SMART tells me that everything is fine. but in my messages

*************
Oct 2 14:50:45 p15154389 smartd[11205]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 62 to 61
Oct 2 14:50:45 p15154389 smartd[11205]: Device: /dev/hda, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 62 to 61
Oct 2 14:59:00 p15154389 /USR/SBIN/CRON[11428]: (root) CMD ( rm -f /var/spool/cron/lastrun/cron.hourly)
Oct 2 15:19:55 p15154389 -- MARK --
Oct 2 15:20:46 p15154389 smartd[11205]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 61 to 63
Oct 2 15:20:46 p15154389 smartd[11205]: Device: /dev/hda, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 61 to 63
Oct 2 15:31:22 p15154389 su: pam_unix2: session finished for user root, service su
Oct 2 15:50:45 p15154389 smartd[11205]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 63 to 61
Oct 2 15:50:45 p15154389 smartd[11205]: Device: /dev/hda, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 63 to 61
*************

i don't know what it means.

after that i ran memtest via a serial console for hours and hours, but no errors were found!

It's a little bit strange. It would feel much nicer if the hard disk or memory were damaged. so what could be the reason for SIG11?? is it safe to use this machine again after testing memory and hardware?

kind regards
janning
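To see how far the attributes in the smartd messages above actually move, the syslog lines can be summarized per attribute. The sketch below is an illustration only: it assumes the line format shown in this message, and the function name `attribute_ranges` is made up here. It says nothing about whether the values are healthy; the resulting ranges would still have to be compared with the thresholds that `smartctl -A` lists for the drive.

```python
import re
from collections import defaultdict

# Matches smartd attribute-change messages in the format seen in this thread.
LINE_RE = re.compile(
    r"smartd\[\d+\]: Device: (?P<device>[^,\s]+), "
    r"SMART (?P<kind>Prefailure|Usage) Attribute: "
    r"(?P<id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)


def attribute_ranges(lines):
    """Return {(device, attribute_name): (lowest, highest)} over all reported values."""
    seen = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # cron, MARK, su and other unrelated syslog lines
        key = (match.group("device"), match.group("name"))
        seen[key].extend([int(match.group("old")), int(match.group("new"))])
    return {key: (min(values), max(values)) for key, values in seen.items()}
```

Fed with the log excerpt above, this groups the entries for `Raw_Read_Error_Rate` and `Hardware_ECC_Recovered` on `/dev/hda` and reports the lowest and highest value seen for each.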