Thread: ERROR: "invalid memory alloc request size" or "unexpected end of data" on large table

From: Stephane Bailliez
(Posting on pgsql-perf as I'm questioning the pertinence of the
settings; this might not be the best place for the overall problem: apologies.)

Postgresql 8.1.10
Linux Ubuntu: 2.6.17-12-server
4GB RAM; the machine is only used for this. I have fewer than 30 tables, 4
of them with between 10 and 40 million rows; size on disk is approximately 50GB.
Nothing spectacular about the install; it's mainly a sandbox.

Relevant bits of the postgresql.conf
max_connections = 15
shared_buffers = 49152
work_mem = 16384
maintenance_work_mem = 32768
max_fsm_pages = 40000
effective_cache_size = 100000
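For reference, in 8.1 these settings take no units in the conf file: shared_buffers and effective_cache_size are counted in 8kB pages, and work_mem and maintenance_work_mem in kB. A quick sketch of what the values above work out to, assuming the default 8kB block size:

```python
# Convert the postgresql.conf values above to MB.
# PostgreSQL 8.1: shared_buffers / effective_cache_size are in 8 kB pages,
# work_mem / maintenance_work_mem are in kB (assuming BLCKSZ = 8 kB).
PAGE_KB = 8

settings_kb = {
    "shared_buffers": 49152 * PAGE_KB,         # 384 MB
    "work_mem": 16384,                         # 16 MB per sort/hash node
    "maintenance_work_mem": 32768,             # 32 MB
    "effective_cache_size": 100000 * PAGE_KB,  # ~781 MB
}

for name, kb in settings_kb.items():
    print(f"{name}: {kb / 1024:.0f} MB")
```

Note that work_mem is per sort or hash operation, not per connection, so a single query can use several multiples of it.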


I'm doing a rather 'simplistic' query, though one heavy on hashing and
aggregation:

For the record:
select count(*) from action where action_date between '2007-10-01' and
'2007-10-31';
9647980


The query is:

select tspent, count(*) from (
select sum(time_spent)/60 as tspent from action
where action_date between '2007-10-01' and '2007-10-31'
group by action_date, user_id
) as a
group by tspent
order by tspent asc;

I receive a memory alloc error for a 1.5GB request size, so I may have
significantly oversized something that is exploding (work_mem?).
(I was running an explain analyze and had a pgsql_tmp dir reaching 2.9GB
until it died with an error similar to the one from the query alone.)

ERROR: invalid memory alloc request size 1664639562
SQL state: XX000
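One likely reason for this specific error: PostgreSQL refuses any single memory allocation larger than its internal MaxAllocSize limit (0x3fffffff, i.e. 1GB minus one byte), so a ~1.55GB request trips that sanity check regardless of how much RAM is free. A small sketch, assuming that limit as defined in the server sources:

```python
# PostgreSQL caps any single palloc() request at MaxAllocSize,
# defined in the server sources as 0x3fffffff (1 GB - 1 byte).
MAX_ALLOC_SIZE = 0x3FFFFFFF

request = 1664639562  # size from the error message
print(f"{request / 2**30:.2f} GiB")      # 1.55 GiB
print(request > MAX_ALLOC_SIZE)          # True -> "invalid memory alloc request size"
```

Hitting this check with such an arbitrary-looking number usually means a bogus value was read where a length was expected, rather than a genuine oversized allocation.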

Sometimes I do get:

ERROR: unexpected end of data
SQL state: XX000


The table is along the lines of (sorry, I cannot give you the full table):

CREATE TABLE action (
  id SERIAL,
  action_date  DATE NOT NULL,
  time_spent INT NOT NULL,
  user_id TEXT NOT NULL,  -- user id is a 38 character string
  ...
);

CREATE INDEX action_action_date_idx
  ON action USING btree(action_date);

Here is an explain analyze for just 1 day:

"HashAggregate  (cost=709112.04..709114.54 rows=200 width=8) (actual
time=9900.994..9902.188 rows=631 loops=1)"
"  ->  HashAggregate  (cost=706890.66..708001.35 rows=74046 width=49)
(actual time=9377.654..9687.964 rows=122644 loops=1)"
"        ->  Bitmap Heap Scan on action  (cost=6579.73..701337.25
rows=740455 width=49) (actual time=2409.697..6756.027 rows=893351 loops=1)"
"              Recheck Cond: ((action_date >= '2007-10-01'::date) AND
(action_date <= '2007-10-02'::date))"
"              ->  Bitmap Index Scan on action_action_date_idx
(cost=0.00..6579.73 rows=740455 width=0) (actual time=2373.837..2373.837
rows=893351 loops=1)"
"                    Index Cond: ((action_date >= '2007-10-01'::date)
AND (action_date <= '2007-10-02'::date))"
"Total runtime: 9933.165 ms"



-- stephane

Stephane Bailliez <sbailliez@gmail.com> writes:
> ERROR: invalid memory alloc request size 1664639562

This sounds like corrupt data --- specifically, 1664639562 showing
up where a variable-width field's length word ought to be.  It
may or may not be relevant that the ASCII equivalent of that bit
pattern is Jb8c ... do you work with data that contains such
substrings?
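The ASCII reading checks out if you interpret the 32-bit value byte by byte (a varlena length word is stored in machine byte order, little-endian on x86):

```python
n = 1664639562                 # the bogus length word from the error
print(hex(n))                  # 0x6338624a
# Little-endian byte order, as on the x86 machine in question:
print(n.to_bytes(4, "little").decode("ascii"))  # 'Jb8c'
```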

> Sometimes I do get:
> ERROR: unexpected end of data

If it's not 100% repeatable I'd start to wonder about flaky hardware.
Have you run memory and disk diagnostics on this machine recently?

            regards, tom lane

Tom Lane wrote:
> This sounds like corrupt data --- specifically, 1664639562 showing
> up where a variable-width field's length word ought to be.  It
> may or may not be relevant that the ASCII equivalent of that bit
> pattern is Jb8c ... do you work with data that contains such
> substrings?
>

Not specifically, but I cannot rule it out entirely; it 'could' be in one
of the columns, which may contain a mix of uppercase/lowercase/numbers,
whereas all other text entries would be lowercase.

> If it's not 100% repeatable I'd start to wonder about flaky hardware.
> Have you run memory and disk diagnostics on this machine recently?
I did extensive tests a month or two ago (long crunching queries running
non-stop for 24h) which were OK, but honestly I can't say I'm very
trusting of this particular hardware. I would obviously need to take it
offline and memtest it for good.

I moved some of the data (and a bit more; the same table has 35M rows) to
another machine (also 8.1.10 on Ubuntu with 2.6.17-10 SMP, 2 Xeon 2GHz
instead of a P4 3GHz) and it passes with flying colors (though it's much,
much slower with the same settings, so I need to check a few things there;
I had the tmp dir topping 3GB, so I'm not sure if I could fit more in memory).

Thanks for the insight, Tom.

-- stephane