Thread: invalid memory alloc request size
Hi all, I am running PostgreSQL 9.3.5 on Ubuntu Server 14.04 64 bit with 64 GB of RAM. When running pg_dump on a specific table, I get the following error: pg_dump: Dumping the contents of table "x_20131111" failed: PQgetResult() failed. pg_dump: Error message from server: ERROR: invalid memory alloc request size 18446744073709551613 pg_dump: The command was: COPY public.x_20131111 (...) TO stdout; pg_dump: [parallel archiver] a worker process died unexpectedly If I run a COPY TO file from psql I get the same error. Is this an indication of corrupted data? What steps should I take? Thanks in advance, Gabriel
On 12/10/2014 08:07 AM, Gabriel Sánchez Martínez wrote: > Hi all, > > I am running PostgreSQL 9.3.5 on Ubuntu Server 14.04 64 bit with 64 GB > of RAM. When running pg_dump on a specific table, I get the following > error: > > pg_dump: Dumping the contents of table "x_20131111" failed: > PQgetResult() failed. > pg_dump: Error message from server: ERROR: invalid memory alloc request > size 18446744073709551613 > pg_dump: The command was: COPY public.x_20131111 (...) TO stdout; > pg_dump: [parallel archiver] a worker process died unexpectedly > > If I run a COPY TO file from psql I get the same error. > > Is this an indication of corrupted data? What steps should I take? What is the data that is being dumped, for example is there binary data in there? What is the table definition? Would it be possible to show the header and some sample data from the COPY? > > Thanks in advance, > Gabriel > > -- Adrian Klaver adrian.klaver@aklaver.com
On 12/10/2014 11:16 AM, Adrian Klaver wrote: > On 12/10/2014 08:07 AM, Gabriel Sánchez Martínez wrote: >> Hi all, >> >> I am running PostgreSQL 9.3.5 on Ubuntu Server 14.04 64 bit with 64 GB >> of RAM. When running pg_dump on a specific table, I get the following >> error: >> >> pg_dump: Dumping the contents of table "x_20131111" failed: >> PQgetResult() failed. >> pg_dump: Error message from server: ERROR: invalid memory alloc request >> size 18446744073709551613 >> pg_dump: The command was: COPY public.x_20131111 (...) TO stdout; >> pg_dump: [parallel archiver] a worker process died unexpectedly >> >> If I run a COPY TO file from psql I get the same error. >> >> Is this an indication of corrupted data? What steps should I take? > > > What is the data that is being dumped, for example is there binary > data in there? The data types are bigserial, integer, and character varying. > > What is the table definition? The data is confidential so I have obfuscated the names: CREATE TABLE x ( c01 bigserial NOT NULL, c02 integer NOT NULL, c03 integer NOT NULL, c04 integer NOT NULL, c05 integer, c06 integer, c07 integer, c08 integer, c09 integer, c10 integer, c11 integer, c12 integer, c13 character varying(8), c14 integer, c15 integer, c16 character varying(8), c17 integer, c18 character varying(8), c19 integer, c20 integer, c21 integer, c22 integer, c23 integer, c24 integer, c25 integer, c26 integer, c27 integer, c28 integer, c29 integer, c30 integer, c31 integer, c32 integer, c33 integer, CONSTRAINT "PK_x" PRIMARY KEY (c01) ) > > Would it be possible to show the header and some sample data from the > COPY? Here is the first row using SELECT * LIMIT 1: 776696816;12368;47728024;3959;0;256;765;645;309;336;36;102;"";;;"";;"";;-1;0;0;0;0;-1;-1;0;0;5;28;;0;37 > >> >> Thanks in advance, >> Gabriel >> >> > >
On 12/10/2014 08:31 AM, Gabriel Sánchez Martínez wrote: > > On 12/10/2014 11:16 AM, Adrian Klaver wrote: >> On 12/10/2014 08:07 AM, Gabriel Sánchez Martínez wrote: >>> Hi all, >>> >>> I am running PostgreSQL 9.3.5 on Ubuntu Server 14.04 64 bit with 64 GB >>> of RAM. When running pg_dump on a specific table, I get the following >>> error: >>> >>> pg_dump: Dumping the contents of table "x_20131111" failed: >>> PQgetResult() failed. >>> pg_dump: Error message from server: ERROR: invalid memory alloc request >>> size 18446744073709551613 >>> pg_dump: The command was: COPY public.x_20131111 (...) TO stdout; >>> pg_dump: [parallel archiver] a worker process died unexpectedly >>> >>> If I run a COPY TO file from psql I get the same error. >>> >>> Is this an indication of corrupted data? What steps should I take? >> >> >> What is the data that is being dumped, for example is there binary >> data in there? > > The data types are bigserial, integer, and character varying. > >> >> What is the table definition? > > The data is confidential so I have obfuscated the names: > > CREATE TABLE x > ( > c01 bigserial NOT NULL, > c02 integer NOT NULL, > c03 integer NOT NULL, > c04 integer NOT NULL, > c05 integer, > c06 integer, > c07 integer, > c08 integer, > c09 integer, > c10 integer, > c11 integer, > c12 integer, > c13 character varying(8), > c14 integer, > c15 integer, > c16 character varying(8), > c17 integer, > c18 character varying(8), > c19 integer, > c20 integer, > c21 integer, > c22 integer, > c23 integer, > c24 integer, > c25 integer, > c26 integer, > c27 integer, > c28 integer, > c29 integer, > c30 integer, > c31 integer, > c32 integer, > c33 integer, > CONSTRAINT "PK_x" PRIMARY KEY (c01) > ) > >> >> Would it be possible to show the header and some sample data from the >> COPY? > > Here is the first row using SELECT * LIMIT 1: > > 776696816;12368;47728024;3959;0;256;765;645;309;336;36;102;"";;;"";;"";;-1;0;0;0;0;-1;-1;0;0;5;28;;0;37 How did the database get to this machine? Was it created from scratch or did it come from somewhere else? Also how did the Postgres server get installed? > > >> >>> >>> Thanks in advance, >>> Gabriel >>> >>> >> >> > -- Adrian Klaver adrian.klaver@aklaver.com
On 12/10/2014 11:49 AM, Adrian Klaver wrote: > On 12/10/2014 08:31 AM, Gabriel Sánchez Martínez wrote: >> >> On 12/10/2014 11:16 AM, Adrian Klaver wrote: >>> On 12/10/2014 08:07 AM, Gabriel Sánchez Martínez wrote: >>>> Hi all, >>>> >>>> I am running PostgreSQL 9.3.5 on Ubuntu Server 14.04 64 bit with 64 GB >>>> of RAM. When running pg_dump on a specific table, I get the following >>>> error: >>>> >>>> pg_dump: Dumping the contents of table "x_20131111" failed: >>>> PQgetResult() failed. >>>> pg_dump: Error message from server: ERROR: invalid memory alloc >>>> request >>>> size 18446744073709551613 >>>> pg_dump: The command was: COPY public.x_20131111 (...) TO stdout; >>>> pg_dump: [parallel archiver] a worker process died unexpectedly >>>> >>>> If I run a COPY TO file from psql I get the same error. >>>> >>>> Is this an indication of corrupted data? What steps should I take? >>> >>> >>> What is the data that is being dumped, for example is there binary >>> data in there? >> >> The data types are bigserial, integer, and character varying. >> >>> >>> What is the table definition? >> >> The data is confidential so I have obfuscated the names: >> >> CREATE TABLE x >> ( >> c01 bigserial NOT NULL, >> c02 integer NOT NULL, >> c03 integer NOT NULL, >> c04 integer NOT NULL, >> c05 integer, >> c06 integer, >> c07 integer, >> c08 integer, >> c09 integer, >> c10 integer, >> c11 integer, >> c12 integer, >> c13 character varying(8), >> c14 integer, >> c15 integer, >> c16 character varying(8), >> c17 integer, >> c18 character varying(8), >> c19 integer, >> c20 integer, >> c21 integer, >> c22 integer, >> c23 integer, >> c24 integer, >> c25 integer, >> c26 integer, >> c27 integer, >> c28 integer, >> c29 integer, >> c30 integer, >> c31 integer, >> c32 integer, >> c33 integer, >> CONSTRAINT "PK_x" PRIMARY KEY (c01) >> ) >> >>> >>> Would it be possible to show the header and some sample data from the >>> COPY? >> >> Here is the first row using SELECT * LIMIT 1: >> >> 776696816;12368;47728024;3959;0;256;765;645;309;336;36;102;"";;;"";;"";;-1;0;0;0;0;-1;-1;0;0;5;28;;0;37 >> > > How did the database get to this machine? > > Was it created from scratch or did it come from somewhere else? The database was created from a pg_dump backup using pg_restore. The table has not changed since the backup date, so I could try re-creating it the same way, but that doesn't solve the mystery. > > Also how did the Postgres server get installed? apt-get install > > >> >> >>> >>>> >>>> Thanks in advance, >>>> Gabriel >>>> >>>> >>> >>> >> > >
On 12/10/2014 09:25 AM, Gabriel Sánchez Martínez wrote: > > On 12/10/2014 11:49 AM, Adrian Klaver wrote: >> On 12/10/2014 08:31 AM, Gabriel Sánchez Martínez wrote: >>> >>> On 12/10/2014 11:16 AM, Adrian Klaver wrote: >>>> On 12/10/2014 08:07 AM, Gabriel Sánchez Martínez wrote: >>>>> Hi all, >>>>> >>>>> I am running PostgreSQL 9.3.5 on Ubuntu Server 14.04 64 bit with 64 GB >>>>> of RAM. When running pg_dump on a specific table, I get the following >>>>> error: >>>>> >>>>> pg_dump: Dumping the contents of table "x_20131111" failed: >>>>> PQgetResult() failed. >>>>> pg_dump: Error message from server: ERROR: invalid memory alloc >>>>> request >>>>> size 18446744073709551613 >>>>> pg_dump: The command was: COPY public.x_20131111 (...) TO stdout; >>>>> pg_dump: [parallel archiver] a worker process died unexpectedly >>>>> >>>>> If I run a COPY TO file from psql I get the same error. >>>>> >>>>> Is this an indication of corrupted data? What steps should I take? >>>> >>>> >>>> What is the data that is being dumped, for example is there binary >>>> data in there? >>> >>> The data types are bigserial, integer, and character varying. >>> >>>> >> >> How did the database get to this machine? >> >> Was it created from scratch or did it come from somewhere else? > > The database was created from a pg_dump backup using pg_restore. The > table has not changed since the backup date, so I could try re-creating > it the same way, but that doesn't solve the mystery. So where did the data get dumped from? An instance of the same version of Postgres or something different? Which version(s) of pg_dump/pg_restore where used? > >> >> Also how did the Postgres server get installed? > > apt-get install Install from what repo? What was the exact package installed? Was there another instance of Postgres already(or currently) installed on this machine? > >> >> >>> >>> >>>> >>>>> >>>>> Thanks in advance, >>>>> Gabriel >>>>> -- Adrian Klaver adrian.klaver@aklaver.com
On 12/10/2014 12:47 PM, Adrian Klaver wrote: > On 12/10/2014 09:25 AM, Gabriel Sánchez Martínez wrote: >> >> On 12/10/2014 11:49 AM, Adrian Klaver wrote: >>> On 12/10/2014 08:31 AM, Gabriel Sánchez Martínez wrote: >>>> >>>> On 12/10/2014 11:16 AM, Adrian Klaver wrote: >>>>> On 12/10/2014 08:07 AM, Gabriel Sánchez Martínez wrote: >>>>>> Hi all, >>>>>> >>>>>> I am running PostgreSQL 9.3.5 on Ubuntu Server 14.04 64 bit with >>>>>> 64 GB >>>>>> of RAM. When running pg_dump on a specific table, I get the >>>>>> following >>>>>> error: >>>>>> >>>>>> pg_dump: Dumping the contents of table "x_20131111" failed: >>>>>> PQgetResult() failed. >>>>>> pg_dump: Error message from server: ERROR: invalid memory alloc >>>>>> request >>>>>> size 18446744073709551613 >>>>>> pg_dump: The command was: COPY public.x_20131111 (...) TO stdout; >>>>>> pg_dump: [parallel archiver] a worker process died unexpectedly >>>>>> >>>>>> If I run a COPY TO file from psql I get the same error. >>>>>> >>>>>> Is this an indication of corrupted data? What steps should I take? >>>>> >>>>> >>>>> What is the data that is being dumped, for example is there binary >>>>> data in there? >>>> >>>> The data types are bigserial, integer, and character varying. >>>> >>>>> > >>> >>> How did the database get to this machine? >>> >>> Was it created from scratch or did it come from somewhere else? >> >> The database was created from a pg_dump backup using pg_restore. The >> table has not changed since the backup date, so I could try re-creating >> it the same way, but that doesn't solve the mystery. > > So where did the data get dumped from? > > An instance of the same version of Postgres or something different? > > Which version(s) of pg_dump/pg_restore where used? > >> >>> >>> Also how did the Postgres server get installed? >> >> apt-get install > > Install from what repo? The default Ubuntu repository. > > What was the exact package installed? I don't recall, but it was probably "postgresql". The database was upgraded to 9.3 when Ubuntu was upgraded to 14.04. But I ran it for a while without issues. This issue is recent. > > Was there another instance of Postgres already(or currently) installed > on this machine? No. There has always been only one instance. > >> >>> >>> >>>> >>>> >>>>> >>>>>> >>>>>> Thanks in advance, >>>>>> Gabriel >>>>>> > >
On 12/10/2014 09:54 AM, Gabriel Sánchez Martínez wrote: > > On 12/10/2014 12:47 PM, Adrian Klaver wrote: >> On 12/10/2014 09:25 AM, Gabriel Sánchez Martínez wrote: >>> >>> On 12/10/2014 11:49 AM, Adrian Klaver wrote: >>>> On 12/10/2014 08:31 AM, Gabriel Sánchez Martínez wrote: >>>>> >>>>> On 12/10/2014 11:16 AM, Adrian Klaver wrote: >>>>>> On 12/10/2014 08:07 AM, Gabriel Sánchez Martínez wrote: >>>>>>> Hi all, >>>>>>> >>>> >>>> Also how did the Postgres server get installed? >>> >>> apt-get install >> >> Install from what repo? > > The default Ubuntu repository. > >> >> What was the exact package installed? > > I don't recall, but it was probably "postgresql". Try: dpkg -l|grep postgresql to get the actual package. The database was > upgraded to 9.3 when Ubuntu was upgraded to 14.04. But I ran it for a > while without issues. This issue is recent. So what was the version before? Where I am going with this, is trying to determine whether you have a 'contaminated' data directory. > >> >> Was there another instance of Postgres already(or currently) installed >> on this machine? > > No. There has always been only one instance. > >> >>> >>>> >>>> >>>>> >>>>> >>>>>> >>>>>>> >>>>>>> Thanks in advance, >>>>>>> Gabriel >>>>>>> >> >> > -- Adrian Klaver adrian.klaver@aklaver.com
On 12/10/2014 01:00 PM, Adrian Klaver wrote: > On 12/10/2014 09:54 AM, Gabriel Sánchez Martínez wrote: >> >> On 12/10/2014 12:47 PM, Adrian Klaver wrote: >>> On 12/10/2014 09:25 AM, Gabriel Sánchez Martínez wrote: >>>> >>>> On 12/10/2014 11:49 AM, Adrian Klaver wrote: >>>>> On 12/10/2014 08:31 AM, Gabriel Sánchez Martínez wrote: >>>>>> >>>>>> On 12/10/2014 11:16 AM, Adrian Klaver wrote: >>>>>>> On 12/10/2014 08:07 AM, Gabriel Sánchez Martínez wrote: >>>>>>>> Hi all, >>>>>>>> > >>>>> >>>>> Also how did the Postgres server get installed? >>>> >>>> apt-get install >>> >>> Install from what repo? >> >> The default Ubuntu repository. >> >>> >>> What was the exact package installed? >> >> I don't recall, but it was probably "postgresql". > > Try: > > dpkg -l|grep postgresql > > to get the actual package. Output is : ii postgresql 9.3+154 all object-relational SQL database (supported version) ii postgresql-9.3 9.3.5-0ubuntu0.14.04.1 amd64 object-relational SQL database, version 9.3 server ii postgresql-9.3-pgrouting 2.0.0-release-ppa1~trusty1 amd64 Routing functionality support for PostgreSQL/PostGIS ii postgresql-9.3-postgis-2.1 2.1.2+dfsg-2 amd64 Geographic objects support for PostgreSQL 9.3 ii postgresql-9.3-postgis-2.1-scripts 2.1.2+dfsg-2 all PostGIS for PostgreSQL 9.3 -- scripts -- dummy package ii postgresql-9.3-postgis-scripts 2.1.2+dfsg-2 all Geographic objects support for PostgreSQL 9.3 -- scripts ii postgresql-client-9.3 9.3.5-0ubuntu0.14.04.1 amd64 front-end programs for PostgreSQL 9.3 ii postgresql-client-common 154 all manager for multiple PostgreSQL client versions ii postgresql-common 154 all PostgreSQL database-cluster manager ii postgresql-contrib 9.3+154 all additional facilities for PostgreSQL (supported version) ii postgresql-contrib-9.3 9.3.5-0ubuntu0.14.04.1 amd64 additional facilities for PostgreSQL ii postgresql-server-dev-9.3 9.3.5-0ubuntu0.14.04.1 amd64 development files for PostgreSQL 9.3 server-side programming > > The database was >> upgraded to 9.3 when Ubuntu was upgraded to 14.04. But I ran it for a >> while without issues. This issue is recent. > > So what was the version before? 9.1. It has always been whatever was on the Ubuntu repos. > > Where I am going with this, is trying to determine whether you have a > 'contaminated' data directory. I appreciate it! Perhaps it is worth mentioning that the server had some issues with the linux md raid a while back. Two disks failed in quick succession and some data was corrupted. Those disks have been replaced and the RAID has been fine ever since. The database cluster was recreated from scratch after that, from backups of a few months before the disks started misbehaving. This is when that table was created with pg_restore. There were no issues with the restore, and the backup was from a few months before the disk issue, so I don't suspect corruption in the backup. >> >>> >>> Was there another instance of Postgres already(or currently) installed >>> on this machine? >> >> No. There has always been only one instance. > > >> >>> >>>> >>>>> >>>>> >>>>>> >>>>>> >>>>>>> >>>>>>>> >>>>>>>> Thanks in advance, >>>>>>>> Gabriel >>>>>>>> >>> >>> >> > >
On 10.12.2014 17:07, Gabriel Sánchez Martínez wrote: > Hi all, > > I am running PostgreSQL 9.3.5 on Ubuntu Server 14.04 64 bit with 64 GB > of RAM. When running pg_dump on a specific table, I get the following > error: > > pg_dump: Dumping the contents of table "x_20131111" failed: > PQgetResult() failed. > pg_dump: Error message from server: ERROR: invalid memory alloc request > size 18446744073709551613 > pg_dump: The command was: COPY public.x_20131111 (...) TO stdout; > pg_dump: [parallel archiver] a worker process died unexpectedly > > If I run a COPY TO file from psql I get the same error. > > Is this an indication of corrupted data? What steps should I take? In my experience, issues like this are caused by a corrupted varlena header (i.e. corruption in text/varchar/... columns). How exactly that corruption happened is difficult to say - it might be a faulty hardware (RAM, controller, storage), it might be a bug (e.g. piece of memory gets overwritten by random data). Or it might be a consequence of incorrect hardware configuration (e.g. leaving the on-disk write cache enabled). If you have a backup of the data, use that instead of recovering the data from the current database - it's faster and safer. However, it might be worth spending some time analyzing the corruption to identify the cause, so that you can prevent it next time. The are tools that might help you with that - "pageinspect" extension is a way to look at the data files on a low-level. It may be quite tedious, though, and it may not work with badly broken data. Another option is "pg_check" - an extension I wrote a few years back. It analyzes the data file and prints info on all corruption occurences. It's available at https://github.com/tvondra/pg_check and I just pushed some minor fixes to make it 9.3-compatible. regards Tomas
On 12/10/2014 01:48 PM, Tomas Vondra wrote: > On 10.12.2014 17:07, Gabriel Sánchez Martínez wrote: >> Hi all, >> >> I am running PostgreSQL 9.3.5 on Ubuntu Server 14.04 64 bit with 64 GB >> of RAM. When running pg_dump on a specific table, I get the following >> error: >> >> pg_dump: Dumping the contents of table "x_20131111" failed: >> PQgetResult() failed. >> pg_dump: Error message from server: ERROR: invalid memory alloc request >> size 18446744073709551613 >> pg_dump: The command was: COPY public.x_20131111 (...) TO stdout; >> pg_dump: [parallel archiver] a worker process died unexpectedly >> >> If I run a COPY TO file from psql I get the same error. >> >> Is this an indication of corrupted data? What steps should I take? > In my experience, issues like this are caused by a corrupted varlena > header (i.e. corruption in text/varchar/... columns). > > How exactly that corruption happened is difficult to say - it might be a > faulty hardware (RAM, controller, storage), it might be a bug (e.g. > piece of memory gets overwritten by random data). Or it might be a > consequence of incorrect hardware configuration (e.g. leaving the > on-disk write cache enabled). > > If you have a backup of the data, use that instead of recovering the > data from the current database - it's faster and safer. > > However, it might be worth spending some time analyzing the corruption > to identify the cause, so that you can prevent it next time. > > The are tools that might help you with that - "pageinspect" extension is > a way to look at the data files on a low-level. It may be quite tedious, > though, and it may not work with badly broken data. > > Another option is "pg_check" - an extension I wrote a few years back. It > analyzes the data file and prints info on all corruption occurences. > It's available at https://github.com/tvondra/pg_check and I just pushed > some minor fixes to make it 9.3-compatible. Thanks. I'll restore from the earliest available backup, and I'll take a look at pg_check. > > regards > Tomas
On 12/10/2014 10:08 AM, Gabriel Sánchez Martínez wrote: > > On 12/10/2014 01:00 PM, Adrian Klaver wrote: >> On 12/10/2014 09:54 AM, Gabriel Sánchez Martínez wrote: >>> >>> On 12/10/2014 12:47 PM, Adrian Klaver wrote: >>>> On 12/10/2014 09:25 AM, Gabriel Sánchez Martínez wrote: >>>>> >>>>> On 12/10/2014 11:49 AM, Adrian Klaver wrote: >>>>>> On 12/10/2014 08:31 AM, Gabriel Sánchez Martínez wrote: >>>>>>> >>>>>>> On 12/10/2014 11:16 AM, Adrian Klaver wrote: >>>>>>>> On 12/10/2014 08:07 AM, Gabriel Sánchez Martínez wrote: >>>>>>>>> Hi all, >>>>>>>>> >> >>>>>> >>>>>> Also how did the Postgres server get installed? >>>>> >>>>> apt-get install >>>> >>>> Install from what repo? >>> >>> The default Ubuntu repository. >>> >>>> > > > >> >> The database was >>> upgraded to 9.3 when Ubuntu was upgraded to 14.04. But I ran it for a >>> while without issues. This issue is recent. >> >> So what was the version before? > > 9.1. It has always been whatever was on the Ubuntu repos. > >> >> Where I am going with this, is trying to determine whether you have a >> 'contaminated' data directory. > > I appreciate it! Perhaps it is worth mentioning that the server had > some issues with the linux md raid a while back. Two disks failed in > quick succession and some data was corrupted. Those disks have been > replaced and the RAID has been fine ever since. The database cluster > was recreated from scratch after that, from backups of a few months > before the disks started misbehaving. This is when that table was > created with pg_restore. There were no issues with the restore, and the > backup was from a few months before the disk issue, so I don't suspect > corruption in the backup. I would investigate the possibility that the raid is having problems again. > -- Adrian Klaver adrian.klaver@aklaver.com
On 12/10/2014 02:34 PM, Adrian Klaver wrote: > On 12/10/2014 10:08 AM, Gabriel Sánchez Martínez wrote: >> >> On 12/10/2014 01:00 PM, Adrian Klaver wrote: >>> On 12/10/2014 09:54 AM, Gabriel Sánchez Martínez wrote: >>>> >>>> On 12/10/2014 12:47 PM, Adrian Klaver wrote: >>>>> On 12/10/2014 09:25 AM, Gabriel Sánchez Martínez wrote: >>>>>> >>>>>> On 12/10/2014 11:49 AM, Adrian Klaver wrote: >>>>>>> On 12/10/2014 08:31 AM, Gabriel Sánchez Martínez wrote: >>>>>>>> >>>>>>>> On 12/10/2014 11:16 AM, Adrian Klaver wrote: >>>>>>>>> On 12/10/2014 08:07 AM, Gabriel Sánchez Martínez wrote: >>>>>>>>>> Hi all, >>>>>>>>>> >>> >>>>>>> >>>>>>> Also how did the Postgres server get installed? >>>>>> >>>>>> apt-get install >>>>> >>>>> Install from what repo? >>>> >>>> The default Ubuntu repository. >>>> >>>>> > >> >> >> >>> >>> The database was >>>> upgraded to 9.3 when Ubuntu was upgraded to 14.04. But I ran it for a >>>> while without issues. This issue is recent. >>> >>> So what was the version before? >> >> 9.1. It has always been whatever was on the Ubuntu repos. >> >>> >>> Where I am going with this, is trying to determine whether you have a >>> 'contaminated' data directory. >> >> I appreciate it! Perhaps it is worth mentioning that the server had >> some issues with the linux md raid a while back. Two disks failed in >> quick succession and some data was corrupted. Those disks have been >> replaced and the RAID has been fine ever since. The database cluster >> was recreated from scratch after that, from backups of a few months >> before the disks started misbehaving. This is when that table was >> created with pg_restore. There were no issues with the restore, and the >> backup was from a few months before the disk issue, so I don't suspect >> corruption in the backup. > > I would investigate the possibility that the raid is having problems > again. Everything looks good right now. I ran SMART checks on all drives and there were no errors. I will restore the table from an earlier backup and let you know if that takes care of the issue. Thanks for your help, and please let me know if there is anything else I could do. I'll reply to the list if there are any developments. > >> > > >
On 12/10/2014 01:48 PM, Tomas Vondra wrote: > On 10.12.2014 17:07, Gabriel Sánchez Martínez wrote: >> Hi all, >> >> I am running PostgreSQL 9.3.5 on Ubuntu Server 14.04 64 bit with 64 GB >> of RAM. When running pg_dump on a specific table, I get the following >> error: >> >> pg_dump: Dumping the contents of table "x_20131111" failed: >> PQgetResult() failed. >> pg_dump: Error message from server: ERROR: invalid memory alloc request >> size 18446744073709551613 >> pg_dump: The command was: COPY public.x_20131111 (...) TO stdout; >> pg_dump: [parallel archiver] a worker process died unexpectedly >> >> If I run a COPY TO file from psql I get the same error. >> >> Is this an indication of corrupted data? What steps should I take? > In my experience, issues like this are caused by a corrupted varlena > header (i.e. corruption in text/varchar/... columns). > > How exactly that corruption happened is difficult to say - it might be a > faulty hardware (RAM, controller, storage), it might be a bug (e.g. > piece of memory gets overwritten by random data). Or it might be a > consequence of incorrect hardware configuration (e.g. leaving the > on-disk write cache enabled). > > If you have a backup of the data, use that instead of recovering the > data from the current database - it's faster and safer. > > However, it might be worth spending some time analyzing the corruption > to identify the cause, so that you can prevent it next time. > > The are tools that might help you with that - "pageinspect" extension is > a way to look at the data files on a low-level. It may be quite tedious, > though, and it may not work with badly broken data. > > Another option is "pg_check" - an extension I wrote a few years back. It > analyzes the data file and prints info on all corruption occurences. > It's available at https://github.com/tvondra/pg_check and I just pushed > some minor fixes to make it 9.3-compatible. Thanks for providing and updating the extension. I used pg_check and got the following messages: WARNING: [104112:52] tuple has too many attributes. 150 found, 33 expected WARNING: [104112] is probably corrupted, there were 1 errors reported The table has 33 columns. Running with DEBUG3 message levels, I get the following: DEBUG: [104112:52] tuple is LP_NORMAL DEBUG: [104112:52] checking attributes for the tuple WARNING: [104112:52] tuple has too many attributes. 150 found, 33 expected I assume this was disk data corruption. Is there anything I should do to investigate further? At this point the table has been restored from a backup, so I could drop the corrupted version of the table, which would allow me to do pg_dumps of the whole database without memory errors. > > regards > Tomas