Re: pt_toast table seems to be - Mailing list pgsql-admin
| From | Mark Kirkwood |
| --- | --- |
| Subject | Re: pt_toast table seems to be |
| Date | |
| Msg-id | 635fe574-b95a-dc26-24d9-6518c86f0363@catalyst.net.nz |
| In response to | Re: pt_toast table seems to be (Rui DeSousa <rui.desousa@icloud.com>) |
| List | pgsql-admin |
Also, the Ceph version might be important (if you are running pre-Jewel, there was a data corruption bug with db-type workloads in 0.94.9 and earlier).

regards

Mark

On 04/01/18 10:42, Rui DeSousa wrote:
>
> Interesting setup. What are your Ceph settings? Are you using caching,
> and is it set to write-back or write-through?
>
> If you're using caching, it should be write-through, with the max
> dirty set to zero:
>
>     rbd cache max dirty = 0
>
> http://docs.ceph.com/docs/master/rbd/rbd-config-ref/
>
>> On Jan 3, 2018, at 3:38 PM, Jorge Daniel <elgaita@hotmail.com> wrote:
>>
>> We're using Ceph; the ext4 volumes where our PGDATA resides are
>> mounted like this:
>>
>> /dev/rbd5 on /pg_data type ext4 (rw,relatime,discard,stripe=1024,data=ordered)
>>
>> Jorge Daniel Fernandez
>>
>> ------------------------------------------------------------------------
>> *From:* Rui DeSousa <rui.desousa@icloud.com>
>> *Sent:* Wednesday, January 3, 2018 5:20 PM
>> *To:* Jorge Daniel
>> *Subject:* Re: pt_toast table seems to be
>>
>> I'm not a Docker expert, nor have I ever run Postgres in a Docker
>> container, but my thought is that Docker is not handling sync() true
>> to form, i.e. it is buffering for performance instead. If it buffers
>> the write but then crashes before the data is actually saved, that
>> will lead to corruption. I've seen similar issues with RAID
>> controllers that buffer the sync() call but eventually fail to
>> actually save the data for whatever reason: buffer overrun, system
>> crash, etc.
>>
>> How is the Postgres data stored in the Docker container? What file
>> system is it using? Is it using a Docker data volume, a directory
>> mount, or a storage plugin?
>>
>> A quick Google search shows that Docker has its own storage model,
>> which introduces its own COW semantics, plugins, etc., although I
>> can't find anything on whether it guarantees sync() calls.
>>
>>> On Jan 3, 2018, at 2:43 PM, Jorge Daniel <elgaita@hotmail.com> wrote:
>>>
>>> Hi Rui, every time auto-analyze ran against that table, after a long
>>> time running the OOM killer acted up and killed the autovacuum. In
>>> some cases the engine went into recovery, and in other cases it
>>> crashed the whole container. We're pretty sure those last crashes
>>> are what led to this issue.
>>>
>>> We were able to reproduce the OOM kill, and our findings led us to a
>>> strange Docker setting for SHMMAX:
>>>
>>> Linux 434d18e30d83 4.10.0-42-generic #46~16.04.1-Ubuntu SMP Mon Dec 4 15:57:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>>> postgres@434d18e30d83:~$ cat /proc/sys/kernel/shmmax
>>> 18446744073692774399
>>>
>>> After we fixed that number, we were able to run the ANALYZE and
>>> VACUUM to completion without any OOM kill.
>>>
>>> This is related to "Container is reporting incorrect
>>> /proc/sys/kernel/shmmax value":
>>> https://github.com/moby/moby/issues/16315
>>>
>>> I don't have that syslog any more, so I can't examine whether sync()
>>> worked properly or not.
>>>
>>> Jorge Daniel Fernandez
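For reference, a minimal sketch of the two fixes discussed above, assuming a root shell on the Docker host; the 8 GB shmmax value and the ceph.conf section shown are illustrative assumptions, not values taken from this thread:

    # Compare the shmmax the container sees with the host's value; the
    # bogus figure reported above is close to 2^64.
    cat /proc/sys/kernel/shmmax

    # Pin shmmax to a sane value (8 GB here is purely illustrative):
    sudo sysctl -w kernel.shmmax=8589934592

    # If RBD client caching is enabled, force write-through behaviour
    # by zeroing the dirty limit in ceph.conf, per the docs linked above:
    #   [client]
    #   rbd cache = true
    #   rbd cache max dirty = 0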
>>> ------------------------------------------------------------------------
>>> *From:* Rui DeSousa <rui.desousa@icloud.com>
>>> *Sent:* Wednesday, January 3, 2018 4:23 PM
>>> *To:* Jorge Daniel
>>> *Cc:* pgsql-admin@lists.postgresql.org
>>> *Subject:* Re: pt_toast table seems to be
>>>
>>> Do you mean the Postgres instance is killed, or the Docker instance?
>>> For the OOM, what is the memory configuration and how much swap is
>>> allocated?
>>>
>>> I would look at your disk subsystem from the perspective of
>>> Postgres: when it issues a sync(), does your setup honor it? If your
>>> Docker instance is crashing, then it seems like sync() might not be
>>> honored, which would lead to the corruption issues you're
>>> describing.
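One standard way to check whether the storage stack honors flushes is pg_test_fsync, which ships with Postgres; a minimal sketch, assuming a scratch file on the affected volume:

    # Benchmark fsync/fdatasync behaviour on the RBD-backed volume;
    # implausibly fast results suggest flushes are being absorbed by a
    # cache rather than reaching stable storage.
    pg_test_fsync -f /pg_data/pg_test_fsync.tmp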
>>>> On Jan 3, 2018, at 10:25 AM, Jorge Daniel <elgaita@hotmail.com> wrote:
>>>>
>>>> Hi guys,
>>>>
>>>> This is my first post in the community, so my apologies in advance
>>>> about the formalities.
>>>>
>>>> In a production db, after several OOM-killer events on the Postgres
>>>> running in a Docker container (now fixed), we are left with a kind
>>>> of corrupted pg_toast:
>>>>
>>>> 2017-11-29 23:46:13.147 PST rhost=10.149.54.5(54750) app=PostgreSQL JDBC Driver:user=veon:db=veon:ERROR: unexpected chunk number 0 (expected 1) for toast value 17143928 in pg_toast_77809
>>>> 2017-11-29 23:46:13.147 PST rhost=10.149.54.5(54750) app=PostgreSQL JDBC Driver:user=veon:db=veon:STATEMENT: SELECT "id",
>>>> "needs_pumping", "needs_lucene", "surveybatchid", "ip_address",
>>>> "is_open_proxy", "is_partial", "cookies", "social_media_source",
>>>> "social_media_review_id", "social_media_pull_id",
>>>> "social_media_pull_id_long", "source_config_unitid",
>>>> "source_config_source_id", "updated",
>>>> "external_id_printable_coder", "unique_reusable_id",
>>>> "invite_errmsg", "reminder_errmsg", "last_seen_pagename",
>>>> "last_submitted_pagename", "has_content_fields", "logid",
>>>> "alertid", "language_name", "e_status", "e_reminder_status",
>>>> "e_lastupdated", "e_creationdate", "e_sampleddate",
>>>> "e_responsedate", "e_invitationdate", "reminderdate",
>>>> "e_delivereddate", "e_remindeddate", "e_accepteddate",
>>>> "e_initialfinishdate", "e_ta_completed_date", "e_expirationdate",
>>>> "e_survey_method", "e_survey_source", "e_survey_type",
>>>> "parse_status", "tagging_attempts", "e_optout",
>>>> "e_bounce_category", "feed_fileid", "feed_file_recordid",
>>>> "startdate", "e_lastname", "e_firstname", "e_address",
>>>> "e_address2", "e_city", "e_state", "e_postalcode", "e_phone",
>>>> "e_email", "is_mobile", "is_mobile_first", "is_mobile_finished",
>>>> "is_cookie_confirmation_needed", "exclude_from_problem_tracker",
>>>> "is_invitation_error", "page_number_seen_last",
>>>> "page_number_seen_highest", "pages_submitted",
>>>> "pages_validation_failed", "last_submit_date", "user_agent",
>>>> "jsonb_fields", "thrift", "episodeid", "e_unitid",
>>>> "e_committed_survey_specid", "customer_veonid", "xmin" FROM
>>>> "survey_prior" WHERE survey_veon.surveyid = ANY($1)
>>>>
>>>> The usual fix is:
>>>>
>>>> REINDEX TABLE pg_toast_77809;
>>>>
>>>> After this we can SELECT * FROM survey_prior with no problem, and
>>>> we can dump the entire table if we wish.
>>>>
>>>> Then the jobs resume and we're happy until we hit the same error
>>>> again. It occurs randomly; we suspect an UPDATE on a certain row is
>>>> creating the corruption.
>>>>
>>>> We can't reproduce the ERROR because we can't do a full scan of the
>>>> table while the error is still live (the on-call guys apply the fix
>>>> immediately to resume production).
>>>>
>>>> So how can we discover the bad rows and expose them, and how do we
>>>> check the consistency of the pg_toast table?
>>>>
>>>> We need this evidence to ask for a window and a complete
>>>> dump/restore that will wipe out this problem.
>>>>
>>>> Thanks in advance,
>>>>
>>>> Jorge Daniel Fernandez
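As a hedged sketch of how such bad rows are usually hunted down (none of this is from the thread itself): map the toast relation back to its parent table, reindex it schema-qualified, then detoast every row one at a time and log the ids that error out. It assumes "id" is a numeric primary key of survey_prior and "veon" is the database, as the log lines above suggest:

    # 77809 in pg_toast_77809 is the OID of the parent table, so this
    # should print survey_prior:
    psql -At -d veon -c "SELECT 77809::regclass;"

    # Reindex the toast table via its schema-qualified name:
    psql -d veon -c "REINDEX TABLE pg_toast.pg_toast_77809;"

    # Walk the table row by row, casting the whole row to text to force
    # detoasting of every column; rows with broken toast chunks raise
    # the "unexpected chunk number" error and are logged here.
    for id in $(psql -At -d veon -c "SELECT id FROM survey_prior ORDER BY id"); do
      psql -At -d veon -c "SELECT length(t::text) FROM survey_prior t WHERE id = $id" \
        >/dev/null 2>&1 || echo "bad row: id=$id"
    done

Once the failing ids are known, those rows can be exported column by column to isolate the damaged value, or deleted, before asking for the dump/restore window.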