Thread: invalid memory alloc request size 576460752438159360
Hi,
We have been getting the same error frequently, for more than a day now, from different schemas in the same DB.
< user=myuser db=mydb host=mydbip pid=18883 app=[unknown] time=2017-12-31 14:28:16.056 +03 > ERROR: invalid memory alloc request size 576460752438159360
CentOS Linux release 7.4.1708 (Core)
DB version: 9.6.6
Memory: 256GB
CPU: 2.2 GHz, 24 cores
Disk: SAN Multipath
* write_cache is disabled
* there is no incorrect work_mem parameter setting.
* a logical dump completes successfully, so there is (probably) no corruption in the data.
* there is streaming replication, and the error does not occur on the replicas (the replicas are on different minor versions, 9.6.4 and 9.6.3 respectively).
* we have large object (large_object) fields; the logical dump works with those fields as well.
Any idea?
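[Editor's note: one common way to narrow down which table triggers this kind of allocation failure (a huge "invalid memory alloc request size" usually points at a corrupted varlena length header) is to force a full read of every row in every table. A hedged sketch, not from the thread; the schema exclusion list is an assumption:

```sql
-- Sketch: scan and detoast every row of every ordinary table; the table whose
-- scan raises the "invalid memory alloc request size" error is the damaged one.
-- Casting the row to text (t::text) forces detoasting of all columns, which a
-- plain count(*) would not do.
DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT n.nspname, c.relname
             FROM pg_class c
             JOIN pg_namespace n ON c.relnamespace = n.oid
             WHERE c.relkind = 'r'
               AND n.nspname NOT IN ('pg_catalog', 'information_schema')
    LOOP
        RAISE NOTICE 'checking %.%', r.nspname, r.relname;
        EXECUTE format('SELECT sum(length(t::text)) FROM %I.%I t',
                       r.nspname, r.relname);
    END LOOP;
END $$;
```
]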
On Sun, Dec 31, 2017 at 1:50 PM, Ibrahim Edib Kokdemir <kokdemir@gmail.com> wrote:
> * write_cache is disabled
> * there is no incorrect work_mem parameter setting.
> * logical dump is working, (maybe) no corruption in data.
> * there is streaming replication, we do not repeat the error in the
> replicas. (replicas in different minor versions, 9.6.4, 9.6.3 respectively)
> * we have large_object field, logical_dump also works with large_objects
> fields.
>
> Any idea?

This is very likely to be corruption. It's important to determine the cause and extent of this corruption. I suggest using amcheck for this, which is available for those Postgres versions from:

https://github.com/petergeoghegan/amcheck

Note that there are Debian and Redhat packages available.

You'll definitely want to use the "heapallindexed" option here, at least for primary key indexes (pass "pg_index.indisprimary" as "heapallindexed" argument, while generalizing from the example SQL query for bt_index_check()). This process has a good chance of isolating the problem, especially if you let this list see any errors raised by the tool.

--
Peter Geoghegan
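[Editor's note: getting started with the tool looks roughly like this, assuming the amcheck_next package from the GitHub page above is already installed for your major version; the index name below is a placeholder:

```sql
-- Enable the extension (the backported version for 9.6 is named amcheck_next):
CREATE EXTENSION amcheck_next;

-- Smoke test on a single index: verifies btree invariants, and with
-- heapallindexed => true also checks that every heap tuple has an index entry.
SELECT bt_index_check(index => 'my_table_pkey'::regclass,
                      heapallindexed => true);
```

A clean run returns an empty result; corruption raises an ERROR describing the problem.]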
Hi Peter,
I just installed and used amcheck_next. I ran your sample query from the git page (changing the schema name); it listed all indexes across the different schemas and produced the same output as yours, with the bt_index_check column empty, which should mean no errors.
Am I doing right?
2017-12-31 16:58 GMT+03:00 Peter Geoghegan <pg@bowt.ie>:
On Sun, Dec 31, 2017 at 1:50 PM, Ibrahim Edib Kokdemir
<kokdemir@gmail.com> wrote:
> * write_cache is disabled
> * there is no incorrect work_mem parameter setting.
> * logical dump is working, (maybe) no corruption in data.
> * there is streaming replication, we do not repeat the error in the
> replicas. (replicas in different minor versions, 9.6.4, 9.6.3 accordingly)
> * we have large_object field, logical_dump also works with large_objects
> fields.
>
> Any idea?
This is very likely to be corruption. It's important to determine the
cause and extent of this corruption. I suggest using amcheck for this,
which is available for those Postgres versions from:
https://github.com/petergeoghegan/amcheck
Note that there are Debian and Redhat packages available.
You'll definitely want to use the "heapallindexed" option here, at
least for primary key indexes (pass "pg_index.indisprimary" as
"heapallindexed" argument, while generalizing from the example SQL
query for bt_index_check()). This process has a good chance of
isolating the problem, especially if you let this list see any errors
raised by the tool.
--
Peter Geoghegan
On Sun, Dec 31, 2017 at 1:10 PM, Ibrahim Edib Kokdemir <kokdemir@gmail.com> wrote:
> I just installed and used amcheck_next, I have used your sample query on the
> git page (changed the schema name) and that listed all indexes different
> schemes and produced same outputs like yours with bt_index_check field as
> empty, that means no error.
> Am I doing right?

You should give the argument of heapallindexed as 'true'. So:

SELECT bt_index_check(index => c.oid, heapallindexed => true),
       c.relname,
       c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog'
-- Don't check temp tables, which may be from another session:
AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC LIMIT 10;

As I mentioned earlier, if this takes too long, you could only do heapallindexed checking once per table (not once per index) by giving "indisprimary" as the heapallindexed argument. That way, only primary keys would be verified against the heap, which is potentially a lot faster.

--
Peter Geoghegan
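[Editor's note: the faster variant Peter describes, passing "indisprimary" instead of a constant, can be written out by substituting the catalog column into his query. A sketch, not from the thread:

```sql
-- Sketch: structural checks on all btree indexes, but heap-vs-index
-- verification (heapallindexed) only where the index is a primary key,
-- so each table's heap is read at most once.
SELECT bt_index_check(index => c.oid, heapallindexed => i.indisprimary),
       c.relname,
       c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog'
AND c.relpersistence != 't'
AND i.indisready AND i.indisvalid;
```
]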
On Sun, Dec 31, 2017 at 1:39 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> SELECT bt_index_check(index => c.oid, heapallindexed => true),
>        c.relname,
>        c.relpages
> FROM pg_index i
> JOIN pg_opclass op ON i.indclass[0] = op.oid
> JOIN pg_am am ON op.opcmethod = am.oid
> JOIN pg_class c ON i.indexrelid = c.oid
> JOIN pg_namespace n ON c.relnamespace = n.oid
> WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog'
> -- Don't check temp tables, which may be from another session:
> AND c.relpersistence != 't'
> -- Function may throw an error when this is omitted:
> AND i.indisready AND i.indisvalid
> ORDER BY c.relpages DESC LIMIT 10;
>
> As I mentioned earlier, if this takes too long, you could only do
> heapallindexed checking once per table (not once per index) by giving
> "indisprimary" as the heapallindexed argument. That way, only primary
> keys would be verified against the heap, which is potentially a lot
> faster.

Oh, and I think that you should remove the "ORDER BY c.relpages DESC
LIMIT 10", too.

--
Peter Geoghegan
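[Editor's note: since the reported errors come from application schemas rather than pg_catalog, a full pass would also widen the schema filter and drop the LIMIT as Peter suggests. A sketch; the schema exclusion list is an assumption:

```sql
-- Sketch: check every btree index in the application schemas, with no LIMIT,
-- so nothing is skipped.
SELECT bt_index_check(index => c.oid, heapallindexed => true),
       n.nspname,
       c.relname,
       c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree'
AND n.nspname NOT IN ('pg_catalog', 'pg_toast', 'information_schema')
AND c.relpersistence != 't'
AND i.indisready AND i.indisvalid;
```
]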
> As I mentioned earlier, if this takes too long, you could only do
> heapallindexed checking once per table (not once per index) by giving
> "indisprimary" as the heapallindexed argument. That way, only primary
> keys would be verified against the heap, which is potentially a lot
> faster.
Oh, and I think that you should remove the "ORDER BY c.relpages DESC
LIMIT 10", too.
Thanks a lot.
Hi Peter,
Today we took a dump of the database, restored it into another empty cluster, and ran the queries there for testing; there were no problems at all. All the errors are gone.