Thread: How to Resolve Data Being Truncated or Rounded Up During PostgreSQL Migration from v9.623 to v12.8?

I manage a PostgreSQL databases - we currently have clusters on PostgreSQL v9.6.23 & PostgreSQL v12.8.
Our database clusters are on Linux VMs, with OS:
  Flavor: redhat_7
  Release: 3.10.0-1160.45.1.el7.x86_64

We have repmgr clusters of 1 Primary & 2 Standby servers & use another server with PgBouncer to direct the connections to the current Primary.  

I am in the process of migrating the v9.6.23 databases to the v12.8 cluster, which already has live databases on it, so I'm doing a pg_dump on the v9.6 cluster for the individual databases to be migrated & restoring the backups to the v12.8 cluster.  I'm currently testing in a sandbox cluster.  The restore completes successfully.  

After the restore, I compare the rowcounts of the dbs from both versions to verify that the data loaded correctly.
I also do a pg_dump of just the data from both clusters & compare them with the diff utility.  For one of the databases, I'm discovering some differences in the data.  It looks like some data is being truncated:

5,6c5,6 

< -- Dumped from database version 9.6.23 

< -- Dumped by pg_dump version 9.6.23 

--- 

> -- Dumped from database version 12.8 

> -- Dumped by pg_dump version 12.8 

34085c34085 

< xxxxxxxx      xxxxxxxx        xxx     P       108     xxxxxxx UP      FI      xxxx-xx-xx      53809.6016      53809.6016      52W      0       xxx     0       xxxxx   \N 

--- 

> xxxxxxxx      xxxxxxxx        xxx     P       108     xxxxxxx UP      FI      xxxx-xx-xx      53809.6        53809.6        52W        0       xxx     0       xxxxx   \N 

34088c34088 

< xxxxxxxx      xxxxxxxx        xxx     P       108     xxxxxxx UP      FI      xxxx-xx-xx      53809.6016      53809.6016      52W      0       xxx     0       xxxxx   \N 

--- 

> xxxxxxxx      xxxxxxxx        xxx     P       108     xxxxxxx UP      FI      xxxx-xx-xx      53809.6        53809.6        52W        0       xxx     0       xxxxx   \N  

       ß data is truncated in new database 

147825,147826c147825,147826 

< xxxxxxxx      \N      \N      \N      46716.8008      \N      \N      \N      \N      \N      \N      \N 

< xxxxxxxx      \N      \N      \N      38729.6016      \N      \N      \N      \N      \N      \N      \N 

--- 

> xxxxxxxx      \N      \N      \N      46716.8          \N      \N      \N      \N      \N      \N      \N 

> xxxxxxxx      \N      \N      \N      38729.6          \N      \N      \N      \N      \N      \N      \N 


When I looked at the table specification, it is the same in both versions & the affected columns are specified as datatype real:

               Table "tablex" 

      Column      |         Type         | Modifiers 

------------------+----------------------+----------- 

 id               | integer              | not null 

 column2          | character(8)         | not null 

 column3          | character(3)         | not null 

 column4          | character(1)        

 column5          | character(4)        

 column6          | character(10)       

 column7          | character(2)        

 column8          | date                

 column9          | real                 | 

 column10         | real                 | 


When I do a select on each database version, the results both display the truncated data:

 id         | column9      | column10 

------------+--------------+------------------ 

   xxxxxxxx |      53809.6 |          53809.6 

(1 row) 

And when I try to export the data from both versions, the data also exports with a 1-digit decimal for those columns.
It's only when I do the pg_dump that I can see the extra digits from the v9.6.23 tables.

In other tables, I'm seeing differences with only 2 digits showing for columns where the datatype is real - they are being rounded up.  For example:

xxxxxxxx        19.8199997      \N      \N      3435    \N      1       \N      \N      \N      3435    0      

       3435    \N      \N      \N     

… 

xxxxxxxx        25.8700008      \N      \N      4484.12988      80      \N      \N      \N      \N      2069.6001 

       0       0       2069.6001       \N      \N      \N      vs.

xxxxxxxx        19.82   \N      \N      3435    \N      1       \N      \N      \N      3435    0       0       3435 

    \N      \N      \N     

… 

xxxxxxxx        25.87   \N      \N      4484.13 80      \N      \N      \N      \N      2069.6  0       0       2069.6  \N      \N      \N     



How can I ensure that the data was migrated correctly - that the data hasn't been truncated or rounded up in the v12.8 tables?
Any help would be greatly appreciated.

Thanks,

Karin Hilbert

"Hilbert, Karin" <ioh1@psu.edu> writes:
> [ PG12 displays float values a tad differently from 9.6 ]

This is not a bug; we just changed the behavior of the
"extra_float_digits" display option, so that it's less likely
to print garbage digits.  A float4 value only has about six
decimal digits of precision to begin with, and those extra
digits you are seeing in the 9.6 dump are basically fictional.

pg_dump does "set extra_float_digits to 3", which used to be
necessary to get reproducible results when dumping from old
servers.  That has this effect on 9.6:

regression=# select '53809.6'::float4;
 float4  
---------
 53809.6
(1 row)

regression=# set extra_float_digits to 3;
SET
regression=# select '53809.6'::float4;
   float4   
------------
 53809.6016
(1 row)

But it has no effect on new servers, because the six digits are
already enough to recreate the float4 value exactly.  The "016"
added by the old server is basically roundoff noise.

The reason for extra_float_digits is that with the old output
algorithm, there were corner cases where we had to print more than
six digits to ensure the value reloads exactly.  The new algorithm
automatically prints the minimum number of digits needed to ensure
exact reload.

All the same comments apply to float8, of course, with a
different number of digits.

            regards, tom lane



On 10/28/21 07:44, Hilbert, Karin wrote:
> I manage aPostgreSQL databases - we currently have clusters on 
> *PostgreSQL v9.6.23* & **PostgreSQL v12.8**.
> Our database clusters are on Linux VMs, with OS:
>    Flavor:*redhat_7*
>    Release: *3.10.0-1160.45.1.el7.x86_64*
> 
> We have repmgr clusters of 1 Primary & 2 Standby servers & use another 
> server with PgBouncer to direct the connections to the current Primary.
> 
> How can I ensure that the data was migrated correctly - that the data 
> hasn't been truncated or rounded up in the v12.8 tables?
> Any help would be greatly appreciated.

In postgresql.conf what are the settings for?:

https://www.postgresql.org/docs/12/runtime-config-client.html

extra_float_digits

"
Note

The meaning of this parameter, and its default value, changed in 
PostgreSQL 12; see Section 8.1.3 for further discussion.
"

Section 8.1.3

https://www.postgresql.org/docs/12/datatype-numeric.html#DATATYPE-FLOAT


> 
> Thanks,
> 
> Karin Hilbert
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com





Thanks for the explanation, Tom.  🙂

From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Thursday, October 28, 2021 11:04 AM
To: Hilbert, Karin <ioh1@psu.edu>
Cc: pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: How to Resolve Data Being Truncated or Rounded Up During PostgreSQL Migration from v9.623 to v12.8?
 
"Hilbert, Karin" <ioh1@psu.edu> writes:
> [ PG12 displays float values a tad differently from 9.6 ]

This is not a bug; we just changed the behavior of the
"extra_float_digits" display option, so that it's less likely
to print garbage digits.  A float4 value only has about six
decimal digits of precision to begin with, and those extra
digits you are seeing in the 9.6 dump are basically fictional.

pg_dump does "set extra_float_digits to 3", which used to be
necessary to get reproducible results when dumping from old
servers.  That has this effect on 9.6:

regression=# select '53809.6'::float4;
 float4 
---------
 53809.6
(1 row)

regression=# set extra_float_digits to 3;
SET
regression=# select '53809.6'::float4;
   float4  
------------
 53809.6016
(1 row)

But it has no effect on new servers, because the six digits are
already enough to recreate the float4 value exactly.  The "016"
added by the old server is basically roundoff noise.

The reason for extra_float_digits is that with the old output
algorithm, there were corner cases where we had to print more than
six digits to ensure the value reloads exactly.  The new algorithm
automatically prints the minimum number of digits needed to ensure
exact reload.

All the same comments apply to float8, of course, with a
different number of digits.

                        regards, tom lane