Re: [External] postgres 9.5 DB corruption: invalid byte sequence forencoding "UTF8" - Mailing list pgsql-general

From Vijaykumar Jain
Subject Re: [External] postgres 9.5 DB corruption: invalid byte sequence forencoding "UTF8"
Date
Msg-id CAE7uO5iDycP=aDVvxZgdfMCnd0Co1e2WQet_Y0wzWwp8bL1WAg@mail.gmail.com
Whole thread Raw
In response to postgres 9.5 DB corruption: invalid byte sequence for encoding"UTF8"  (Thomas Tignor <tptignor@yahoo.com>)
Responses Re: [External] postgres 9.5 DB corruption: invalid byte sequence forencoding "UTF8"  ("Brad Nicholson" <bradn@ca.ibm.com>)
Re: [External] postgres 9.5 DB corruption: invalid byte sequencefor encoding "UTF8"  (Thomas Tignor <tptignor@yahoo.com>)
List pgsql-general
i think the experts will chime in soon,

but why do you think this as db corruption and not just a bad input?
https://github.com/postgres/postgres/blob/master/src/pl/plperl/expected/plperl_lc_1.out

or it may also be encoding issue.
https://pganalyze.com/docs/log-insights/app-errors/U137

can you do a pg_dump and restore on a parallel instance? does it
result in failure?

we also ask the app to log data (temporarily) inserted  so that we
could figure out directly if there was bad data upstream or have
validations to prevent inserts when there is bad data.

also, in our case the query was stuck at "PARSE"  (if you do ps aux |
grep postgres) and in some cases did result in oom.
but upgrading the client and using session mode pooling in pgbouncer
worked for us.




Regards,
Vijay

On Tue, Mar 26, 2019 at 12:17 AM Thomas Tignor <tptignor@yahoo.com> wrote:
>
> Hoping someone may be able to offer some guidance on this recurring problem. I am providing this "problem report" to
thegeneral list as I understand the bugs list requires a set of reproduction steps we do not yet have. Please advise if
Ihave the process wrong. I have tried to provide all known relevant info here. Thanks in advance for any insights. 
>
> --> A description of what you are trying to achieve and what results you expect.:
>
> We are experiencing intermittent DB corruption in postgres 9.5.14. We are trying to identify and eliminate all
sources.We are using two independent services for data replication, Slony-I v2.2.6 and a custom service developed
in-house.Both are based on COPY operations. DB corruption is observed when COPY operations fail with an error of the
form:'invalid byte sequence for encoding "UTF8"'. This occurs with a frequency ranging between a few days and several
weeks.Each incident is followed by a race to find and repair or remove corrupted data, which we are getting good at.
Withwell over a dozen incidents, the great majority originally showed corruption in a single VARCHAR(2000) column
(value)in a single table (alert_attribute). In this time, we read about suspected and real problems with TOAST
functionalityand so made the decision to change alert_attribute.value to PLAIN storage. Since that change was made,
mostnew incidents show corruption in the alert_attribute.name column instead (VARCHAR(200)). Another table
(alert_instance)has been impacted as well. See below for their schemas. 
>
> We have looked high and low through system logs and device reporting utility output for any sign of hardware
failures.We haven't turned up anything yet. We also tried rebuilding an entire DB from scratch. That did not seem to
help.
>
> We have not been performing routine reindexing. This is a problem we are working to correct. Normally our master DB
servesfor an 8-12 week period without reindexing before we failover to a peer. Before assuming the master role, the
peeralways begins by truncating the alert_instance and alert_attribute tables and loading all data from the current
master.
>
> Hardware specs are listed below. For storage, we have 8 INTEL SSDSA2BW12 direct-attached disks. We can provide
additionalinfo as needed. 
>
> ams=# \d ams.alert_attribute
>                Table "ams.alert_attribute"
>       Column       |          Type           | Modifiers
> -------------------+-------------------------+-----------
>  alert_instance_id | integer                 | not null
>  name              | character varying(200)  | not null
>  data_type         | smallint                | not null
>  value             | character varying(2000) |
> Indexes:
>     "pk_alert_attributes" PRIMARY KEY, btree (alert_instance_id, name), tablespace "tbls5"
>     "idx_aa_aval" btree (name, value)
> Foreign-key constraints:
>     "fk_alert_attr_instance_id" FOREIGN KEY (alert_instance_id) REFERENCES ams.alert_instance(alert_instance_id) ON
DELETECASCADE 
> Triggers:
>     _ams_cluster_logtrigger AFTER INSERT OR DELETE OR UPDATE ON ams.alert_attribute FOR EACH ROW EXECUTE PROCEDURE
_ams_cluster.logtrigger('_ams_cluster','2', 'kk') 
>     _ams_cluster_truncatetrigger BEFORE TRUNCATE ON ams.alert_attribute FOR EACH STATEMENT EXECUTE PROCEDURE
_ams_cluster.log_truncate('2')
> Disabled user triggers:
>     _ams_cluster_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON ams.alert_attribute FOR EACH ROW EXECUTE PROCEDURE
_ams_cluster.denyaccess('_ams_cluster')
>     _ams_cluster_truncatedeny BEFORE TRUNCATE ON ams.alert_attribute FOR EACH STATEMENT EXECUTE PROCEDURE
_ams_cluster.deny_truncate()
>
> ams=#
> ams=# \d ams.alert_instance
>                     Table "ams.alert_instance"
>        Column        |              Type              | Modifiers
> ---------------------+--------------------------------+-----------
>  alert_instance_id   | integer                        | not null
>  alert_definition_id | integer                        | not null
>  alert_instance_key  | character varying(500)         | not null
>  start_active_date   | timestamp(0) without time zone | not null
>  stop_active_date    | timestamp(0) without time zone |
>  active              | smallint                       | not null
>  acknowledged        | smallint                       | not null
>  ack_clear_time      | timestamp(0) without time zone |
>  user_set_clear_time | smallint                       |
>  category_id         | integer                        | not null
>  condition_start     | timestamp(0) without time zone | not null
>  unack_reason        | character varying(1)           |
>  viewer_visible      | smallint                       | not null
> Indexes:
>     "pk_alert_instance" PRIMARY KEY, btree (alert_instance_id), tablespace "tbls5"
>     "idx_alert_inst_1" btree (alert_instance_key, alert_definition_id, alert_instance_id, active, acknowledged,
ack_clear_time),tablespace "tbls5" 
>     "idx_alert_inst_cat_id" btree (category_id), tablespace "tbls5"
>     "idx_alert_inst_def_id" btree (alert_definition_id), tablespace "tbls5"
> Check constraints:
>     "ck_alert_inst_acked" CHECK (acknowledged = 0 OR acknowledged = 1)
>     "ck_alert_inst_active" CHECK (active = 0 OR active = 1)
>     "ck_alert_inst_set_cl_tm" CHECK (user_set_clear_time = 0 OR user_set_clear_time = 1)
>     "ck_alert_inst_viewer_vis" CHECK (viewer_visible = 0 OR viewer_visible = 1)
> Foreign-key constraints:
>     "fk_alert_inst_cat_id" FOREIGN KEY (category_id) REFERENCES ams.category(category_id)
>     "fk_alert_inst_def_id" FOREIGN KEY (alert_definition_id) REFERENCES ams.alert_definition(alert_definition_id)
>     "fk_alert_inst_unack_reason" FOREIGN KEY (unack_reason) REFERENCES ams.unack_reason(unack_reason)
> Referenced by:
>     TABLE "ams.alert_attribute" CONSTRAINT "fk_alert_attr_instance_id" FOREIGN KEY (alert_instance_id) REFERENCES
ams.alert_instance(alert_instance_id)ON DELETE CASCADE 
> Triggers:
>     _ams_cluster_logtrigger AFTER INSERT OR DELETE OR UPDATE ON ams.alert_instance FOR EACH ROW EXECUTE PROCEDURE
_ams_cluster.logtrigger('_ams_cluster','1', 'k') 
>     _ams_cluster_truncatetrigger BEFORE TRUNCATE ON ams.alert_instance FOR EACH STATEMENT EXECUTE PROCEDURE
_ams_cluster.log_truncate('1')
> Disabled user triggers:
>     _ams_cluster_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON ams.alert_instance FOR EACH ROW EXECUTE PROCEDURE
_ams_cluster.denyaccess('_ams_cluster')
>     _ams_cluster_truncatedeny BEFORE TRUNCATE ON ams.alert_instance FOR EACH STATEMENT EXECUTE PROCEDURE
_ams_cluster.deny_truncate()
>
> ams=#
>
> --> PostgreSQL version number you are running:
> 9.5.14
>
> --> How you installed PostgreSQL:
> We use a customized installer. It copies a “make install”-based distribution into place on a target install host.
> Our compiler is gcc 5.4.0.
>
> CFLAGS: -m32 -march=opteron -mno-3dnow -ggdb -O2 -Wall
>
> $P/CONFIGURE_OPTIONS := \
>         --with-perl \
>         --with-openssl \
>         --with-libxml \
>         --with-libxslt \
>         --with-includes="$(COMMON)/include/perl $(COMMON)/include" \
>         --with-libs="$(COMMON)/lib" --prefix=$($P/PREFIX) \
>         --enable-debug \
>         --disable-nls
>
> --> Changes made to the settings in the postgresql.conf file:  see Server Configuration for a quick way to list them
all.
> ams=# SELECT version();
>                                                         version
>
------------------------------------------------------------------------------------------------------------------------
>  PostgreSQL 9.5.14 on i686-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9aka9.0.3) 5.4.0 20160609,
32-bit
> (1 row)
>
> ams=# SELECT name, current_setting(name), SOURCE
> ams-#   FROM pg_settings
> ams-#   WHERE SOURCE NOT IN ('default', 'override');
>               name              |  current_setting   |        source
> --------------------------------+--------------------+----------------------
>  application_name               | psql               | client
>  autovacuum_vacuum_cost_delay   | 100ms              | configuration file
>  checkpoint_completion_target   | 0.9                | configuration file
>  client_encoding                | SQL_ASCII          | client
>  cpu_index_tuple_cost           | 0.001              | configuration file
>  cpu_operator_cost              | 0.0005             | configuration file
>  cpu_tuple_cost                 | 0.002              | configuration file
>  DateStyle                      | ISO, MDY           | configuration file
>  default_statistics_target      | 150                | configuration file
>  default_text_search_config     | pg_catalog.english | configuration file
>  effective_cache_size           | 8007MB             | configuration file
>  lc_messages                    | C                  | configuration file
>  lc_monetary                    | C                  | configuration file
>  lc_numeric                     | C                  | configuration file
>  lc_time                        | C                  | configuration file
>  listen_addresses               | 0.0.0.0            | configuration file
>  log_autovacuum_min_duration    | 0                  | configuration file
>  log_checkpoints                | on                 | configuration file
>  log_connections                | on                 | configuration file
>  log_disconnections             | on                 | configuration file
>  log_line_prefix                | %m %p %u %h %d     | configuration file
>  log_min_duration_statement     | 1min               | configuration file
>  log_min_messages               | notice             | configuration file
>  log_temp_files                 | 0                  | configuration file
>  maintenance_work_mem           | 512MB              | configuration file
>  max_connections                | 600                | configuration file
>  max_stack_depth                | 2MB                | environment variable
>  max_wal_size                   | 15360TB            | configuration file
>  random_page_cost               | 2.5                | configuration file
>  restart_after_crash            | off                | configuration file
>  shared_buffers                 | 2GB                | configuration file
>  ssl                            | on                 | configuration file
>  ssl_ca_file                    | root.crt           | configuration file
>  standard_conforming_strings    | on                 | configuration file
>  statement_timeout              | 1min               | configuration file
>  superuser_reserved_connections | 10                 | configuration file
>  syslog_ident                   | postgresql9.5-1.12 | configuration file
>  tcp_keepalives_idle            | 0                  | configuration file
>  unix_socket_permissions        | 0770               | configuration file
>  work_mem                       | 30MB               | configuration file
> (40 rows)
>
> ams=#
>
> --> Operating system and version:
> Ubuntu 16.04.5 LTS
>
> --> What program you're using to connect to PostgreSQL:
> Most DML is performed by a standalone JVM using the PostgreSQL JDBC Driver v42.0.0.
>
> --> Is there anything relevant or unusual in the PostgreSQL server logs?:
> We have sometimes but not always seen "invalid memory alloc request size" errors around the time of these events. Our
experiencesuggests that "invalid memory alloc request size" errors are a symptom specifically of index corruption and
areremediated by reindexing. 
>
> --> For questions about any kind of error:
> --> What you were doing when the error happened / how to cause the error:
> Periodic DML (inserts and updates) focused primarily on two DB tables (ams.alert_instance and ams.alert_attribute)
withhourly batched transfers of inactive alerts to history tables (deletes from ams.alert_instance, ams.alert_attribute
andcorresponding inserts to ams.alert_instance_hist, ams.alert_attribute_hist within the same transaction). 
>
> --> The EXACT TEXT of the error message you're getting, if there is one: (Copy and paste the message to the email, do
notsend a screenshot) 
> 2018-12-27 07:25:24.527 GMT 30462 appuser 127.0.0.1 ams ERROR:  invalid byte sequence for encoding "UTF8": 0x86
> The specific identified byte varies with each incident.
>
> --> CPU Info
> 2*4-core Intel(R) Xeon(R) CPU E31270 @ 3.40GHz
>
> --> Mem Info
> 16GB RAM
>
> --> Storage Info
> root@ams4.dfw.netmgmt:~# lsscsi -l
>
> [0:0:0:0]    disk    ATA      INTEL SSDSA2BW12 0362  /dev/sda
>   state=running queue_depth=32 scsi_level=7 type=0 device_blocked=0 timeout=30
> [0:0:1:0]    disk    ATA      INTEL SSDSA2BW12 0362  /dev/sdb
>   state=running queue_depth=32 scsi_level=7 type=0 device_blocked=0 timeout=30
> [0:0:2:0]    disk    ATA      INTEL SSDSA2BW12 0362  /dev/sdc
>   state=running queue_depth=32 scsi_level=7 type=0 device_blocked=0 timeout=30
> [0:0:3:0]    disk    ATA      INTEL SSDSA2BW12 0362  /dev/sdd
>   state=running queue_depth=32 scsi_level=7 type=0 device_blocked=0 timeout=30
> [0:0:4:0]    disk    ATA      INTEL SSDSA2BW12 0362  /dev/sde
>   state=running queue_depth=32 scsi_level=7 type=0 device_blocked=0 timeout=30
> [0:0:5:0]    disk    ATA      INTEL SSDSA2BW12 0362  /dev/sdf
>   state=running queue_depth=32 scsi_level=7 type=0 device_blocked=0 timeout=30
> [0:0:6:0]    disk    ATA      INTEL SSDSA2BW12 0362  /dev/sdg
>   state=running queue_depth=32 scsi_level=7 type=0 device_blocked=0 timeout=30
> [0:0:7:0]    disk    ATA      INTEL SSDSA2BW12 0362  /dev/sdh
>   state=running queue_depth=32 scsi_level=7 type=0 device_blocked=0 timeout=30
> root@ams4.dfw.netmgmt:~#
>
> root@ams4.dfw.netmgmt:~# modinfo md_mod
> filename:       /lib/modules/4.14.68-4.14.3-amd64-3adf3675665129fa/kernel/drivers/md/md-mod.ko
> alias:          block-major-9-*
> alias:          md
> description:    MD RAID framework
> license:        GPL
> depends:
> retpoline:      Y
> intree:         Y
> name:           md_mod
> vermagic:       4.14.68-4.14.3-amd64-3adf3675665129fa SMP mod_unload modversions
> parm:           start_dirty_degraded:int
> parm:           create_on_open:bool
> root@ams4.dfw.netmgmt:~#
>
>
> Tom    :-)


pgsql-general by date:

Previous
From: Thomas Tignor
Date:
Subject: postgres 9.5 DB corruption: invalid byte sequence for encoding"UTF8"
Next
From: "Brad Nicholson"
Date:
Subject: Re: [External] postgres 9.5 DB corruption: invalid byte sequence forencoding "UTF8"