Re: pt_toast table seems to be - Mailing list pgsql-admin
| From | Mark Kirkwood |
| --- | --- |
| Subject | Re: pt_toast table seems to be |
| Date | |
| Msg-id | 635fe574-b95a-dc26-24d9-6518c86f0363@catalyst.net.nz |
| In response to | Re: pt_toast table seems to be (Rui DeSousa <rui.desousa@icloud.com>) |
| List | pgsql-admin |
Also, the Ceph version might be important (if you are running pre-Jewel, there was a data corruption bug with db-type workloads in 0.94.9 and earlier).

regards

Mark

On 04/01/18 10:42, Rui DeSousa wrote:
>
> Interesting setup. What are your Ceph settings? Are you using caching,
> and is it set to write-back or write-through?
>
> If you're using caching, it should be write-through, with the max
> dirty set to zero:
>
>     rbd cache max dirty = 0
>
> http://docs.ceph.com/docs/master/rbd/rbd-config-ref/
>
>> On Jan 3, 2018, at 3:38 PM, Jorge Daniel <elgaita@hotmail.com> wrote:
>>
>> We're using Ceph; the ext4 volumes where our PGDATA resides are
>> mounted like this:
>>
>> /dev/rbd5 on /pg_data type ext4 (rw,relatime,discard,stripe=1024,data=ordered)
>>
>> Jorge Daniel Fernandez
>>
>> ------------------------------------------------------------------------
>> *From:* Rui DeSousa <rui.desousa@icloud.com>
>> *Sent:* Wednesday, January 3, 2018 5:20 PM
>> *To:* Jorge Daniel
>> *Subject:* Re: pt_toast table seems to be
>>
>> I'm not a Docker expert, nor have I ever run Postgres in a Docker
>> container, but my thought is that Docker is not handling sync() true
>> to form, i.e. it is buffering for performance instead. If it buffers
>> the write but then crashes before the data is actually saved, that
>> will lead to corruption. I've seen similar issues with RAID
>> controllers that buffer the sync() call but eventually fail to
>> actually save the data for whatever reason: buffer overrun, system
>> crash, etc.
>>
>> How is the Postgres data stored in the Docker container? What file
>> system is it using? Is it using a Docker data volume, a directory
>> mount, or a storage plugin?
>>
>> A quick Google search shows that Docker has its own storage model,
>> which introduces its own COW semantics, plugins, etc., although I
>> can't find anything on whether it guarantees sync() calls.
>>
>>> On Jan 3, 2018, at 2:43 PM, Jorge Daniel <elgaita@hotmail.com> wrote:
>>>
>>> Hi Rui, every time auto-analyze ran against that table, after a long
>>> time running the OOM killer acted up and killed the autovacuum. In
>>> some cases the engine went into recovery, and in other cases it
>>> crashed the whole container. We're pretty sure those last crashes
>>> are what led to this issue.
>>>
>>> We were able to reproduce the OOM kill, and our findings led us to a
>>> strange Docker setting for SHMMAX:
>>>
>>> Linux 434d18e30d83 4.10.0-42-generic #46~16.04.1-Ubuntu SMP Mon Dec 4 15:57:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>>> postgres@434d18e30d83:~$ cat /proc/sys/kernel/shmmax
>>> 18446744073692774399
>>>
>>> After we fixed that number, we were able to run the ANALYZE and
>>> VACUUM to completion without any OOM kill.
>>>
>>> This is related to "Container is reporting incorrect
>>> /proc/sys/kernel/shmmax value":
>>> https://github.com/moby/moby/issues/16315
>>>
>>> I don't have that syslog any more, so I can't examine whether sync()
>>> worked properly or not.
>>>
>>> Jorge Daniel Fernandez
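For reference, a minimal sketch of the two fixes discussed above, assuming a root shell on the Docker host; the 8 GB shmmax value and the ceph.conf section shown are illustrative assumptions, not values taken from this thread:

    # Compare the shmmax the container sees with the host's value; the
    # bogus figure reported above is close to 2^64.
    cat /proc/sys/kernel/shmmax

    # Pin shmmax to a sane value (8 GB here is purely illustrative):
    sudo sysctl -w kernel.shmmax=8589934592

    # If RBD client caching is enabled, force write-through behaviour
    # by zeroing the dirty limit in ceph.conf, per the docs linked above:
    #   [client]
    #   rbd cache = true
    #   rbd cache max dirty = 0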
>>> ------------------------------------------------------------------------
>>> *From:* Rui DeSousa <rui.desousa@icloud.com>
>>> *Sent:* Wednesday, January 3, 2018 4:23 PM
>>> *To:* Jorge Daniel
>>> *Cc:* pgsql-admin@lists.postgresql.org
>>> *Subject:* Re: pt_toast table seems to be
>>>
>>> Do you mean the Postgres instance is killed, or the Docker instance?
>>> For the OOM, what is the memory configuration and how much swap is
>>> allocated?
>>>
>>> I would look at your disk subsystem from the perspective of
>>> Postgres: when it issues a sync(), does your setup honor it? If your
>>> Docker instance is crashing, then it seems like sync() might not be
>>> honored, which would lead to the corruption issues you're
>>> describing.
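One standard way to check whether the storage stack honors flushes is pg_test_fsync, which ships with Postgres; a minimal sketch, assuming a scratch file on the affected volume:

    # Benchmark fsync/fdatasync behaviour on the RBD-backed volume;
    # implausibly fast results suggest flushes are being absorbed by a
    # cache rather than reaching stable storage.
    pg_test_fsync -f /pg_data/pg_test_fsync.tmp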
>>>> On Jan 3, 2018, at 10:25 AM, Jorge Daniel <elgaita@hotmail.com> wrote:
>>>>
>>>> Hi guys,
>>>>
>>>> This is my first post in the community, so my apologies in advance
>>>> about the formalities.
>>>>
>>>> In a production db, after several OOM-killer events on the Postgres
>>>> running in a Docker container (now fixed), we are left with a kind
>>>> of corrupted pg_toast:
>>>>
>>>> 2017-11-29 23:46:13.147 PST rhost=10.149.54.5(54750) app=PostgreSQL JDBC Driver:user=veon:db=veon:ERROR: unexpected chunk number 0 (expected 1) for toast value 17143928 in pg_toast_77809
>>>> 2017-11-29 23:46:13.147 PST rhost=10.149.54.5(54750) app=PostgreSQL JDBC Driver:user=veon:db=veon:STATEMENT: SELECT "id",
>>>> "needs_pumping", "needs_lucene", "surveybatchid", "ip_address",
>>>> "is_open_proxy", "is_partial", "cookies", "social_media_source",
>>>> "social_media_review_id", "social_media_pull_id",
>>>> "social_media_pull_id_long", "source_config_unitid",
>>>> "source_config_source_id", "updated",
>>>> "external_id_printable_coder", "unique_reusable_id",
>>>> "invite_errmsg", "reminder_errmsg", "last_seen_pagename",
>>>> "last_submitted_pagename", "has_content_fields", "logid",
>>>> "alertid", "language_name", "e_status", "e_reminder_status",
>>>> "e_lastupdated", "e_creationdate", "e_sampleddate",
>>>> "e_responsedate", "e_invitationdate", "reminderdate",
>>>> "e_delivereddate", "e_remindeddate", "e_accepteddate",
>>>> "e_initialfinishdate", "e_ta_completed_date", "e_expirationdate",
>>>> "e_survey_method", "e_survey_source", "e_survey_type",
>>>> "parse_status", "tagging_attempts", "e_optout",
>>>> "e_bounce_category", "feed_fileid", "feed_file_recordid",
>>>> "startdate", "e_lastname", "e_firstname", "e_address",
>>>> "e_address2", "e_city", "e_state", "e_postalcode", "e_phone",
>>>> "e_email", "is_mobile", "is_mobile_first", "is_mobile_finished",
>>>> "is_cookie_confirmation_needed", "exclude_from_problem_tracker",
>>>> "is_invitation_error", "page_number_seen_last",
>>>> "page_number_seen_highest", "pages_submitted",
>>>> "pages_validation_failed", "last_submit_date", "user_agent",
>>>> "jsonb_fields", "thrift", "episodeid", "e_unitid",
>>>> "e_committed_survey_specid", "customer_veonid", "xmin" FROM
>>>> "survey_prior" WHERE survey_veon.surveyid = ANY($1)
>>>>
>>>> The usual fix is:
>>>>
>>>> REINDEX TABLE pg_toast_77809;
>>>>
>>>> After this we can SELECT * FROM survey_prior with no problem, and
>>>> we can dump the entire table if we wish.
>>>>
>>>> Then the jobs resume and we're happy until we hit the same error
>>>> again. It occurs randomly; we suspect an UPDATE on a certain row is
>>>> creating the corruption.
>>>>
>>>> We can't reproduce the ERROR because we can't do a full scan of the
>>>> table while the error is still live (the on-call guys apply the fix
>>>> immediately to resume production).
>>>>
>>>> So how can we discover the bad rows and expose them, and how do we
>>>> check the consistency of the pg_toast table?
>>>>
>>>> We need this evidence to ask for a window and a complete
>>>> dump/restore that will wipe out this problem.
>>>>
>>>> Thanks in advance,
>>>>
>>>> Jorge Daniel Fernandez
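As a hedged sketch of how such bad rows are usually hunted down (none of this is from the thread itself): map the toast relation back to its parent table, reindex it schema-qualified, then detoast every row one at a time and log the ids that error out. It assumes "id" is a numeric primary key of survey_prior and "veon" is the database, as the log lines above suggest:

    # 77809 in pg_toast_77809 is the OID of the parent table, so this
    # should print survey_prior:
    psql -At -d veon -c "SELECT 77809::regclass;"

    # Reindex the toast table via its schema-qualified name:
    psql -d veon -c "REINDEX TABLE pg_toast.pg_toast_77809;"

    # Walk the table row by row, casting the whole row to text to force
    # detoasting of every column; rows with broken toast chunks raise
    # the "unexpected chunk number" error and are logged here.
    for id in $(psql -At -d veon -c "SELECT id FROM survey_prior ORDER BY id"); do
      psql -At -d veon -c "SELECT length(t::text) FROM survey_prior t WHERE id = $id" \
        >/dev/null 2>&1 || echo "bad row: id=$id"
    done

Once the failing ids are known, those rows can be exported column by column to isolate the damaged value, or deleted, before asking for the dump/restore window.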