Thread: Insane behaviour in 8.3.3

Insane behaviour in 8.3.3

From
Achilleas Mantzios
Date:
Hello, one remote user reported a problem, and I was surprised to witness the following behaviour.
It is on PostgreSQL 8.3.3.

dynacom=# BEGIN;
BEGIN
dynacom=# 
dynacom=# 
dynacom=# insert into xadmin(appname,apptbl_tmp,gao,id,comment)
dynacom-# values('PMS','overhaul_report_tmp','INSERT',nextval('overhaul_report_tmp_pkid_seq'),' zzz ');
INSERT 0 1
dynacom=# 
dynacom=# insert into items_tmp(id,vslwhid,serialno,rh,lastinspdate,classused,classsurvey,classsurveydate,classduedate,
dynacom(# classpostponed,classcomment,defid,machtypecount,totalrh,comment,attachments,lastrepdate,pmsstate,xid,classaa)
dynacom-# select id,vslwhid,serialno,rh,lastinspdate,classused,classsurvey,classsurveydate,classduedate,classpostponed,
dynacom-# classcomment,defid,machtypecount,totalrh,comment,attachments,lastrepdate,pmsstate,currval('xadmin_xid_seq'),
dynacom-# classaa from items where id=1261319;
INSERT 0 1
dynacom=# -- in the above 'xadmin_xid_seq' has taken a new value in the first insert
dynacom=# SELECT currval('xadmin_xid_seq');
 currval
---------
   61972
(1 row)
dynacom=# SELECT id from items_tmp WHERE id=1261319 AND xid=61972;
   id
---------
 1261319
(1 row)
dynacom=# -- ok this is how it should be
dynacom=# SELECT id from items_tmp WHERE id=1261319 AND xid=currval('xadmin_xid_seq');
 id
----
(0 rows)
dynacom=# -- THIS IS INSANE

This code has run fine (the last SELECT returns exactly one row) for 5,409,779 total transactions thus far, in 70 
different postgresql slave installations (mixture of 8.3.3 and 8.3.13) (we are a shipping company), 
until i got this error report from a user yesterday.

What could be causing this? How could I investigate further? Unfortunately the remote installations are neither
physically accessible nor reachable over TCP/IP (comms are done via UUCP and administration via minicom, and the costs
are just huge: 5 USD/min for 33 Kbit/s). So I would like to exhaust all possibilities before deciding to ship a new
PostgreSQL version there and upgrade remotely, physically travel to the ship, or even try a backup/initdb/restore
in the existing version.
Any help would be really, really appreciated.

Also, as you might have understood, upgrading, although generally a good idea, does not apply so easily in our case.

Some information about the schema :

dynacom=# \d xadmin
                                      Table "public.xadmin"
   Column   |            Type             |                          Modifiers
------------+-----------------------------+--------------------------------------------------------------
 xid        | bigint                      | not null default nextval(('xadmin_xid_seq'::text)::regclass)
 appname    | text                        | not null
 apptbl_tmp | text                        | not null
 gao        | character varying(40)       | not null
 id         | integer                     | not null
 comment    | text                        |
 state      | text                        | not null default 'NPY'::text
 arcedon    | timestamp without time zone | default now()
Indexes:
    "xa_pk" PRIMARY KEY, btree (xid)
    "xa_appname_idx" btree (appname)
    "xa_appname_state_idx" btree (appname, state)
    "xa_state_idx" btree (state)
 


dynacom=# \d items_tmp
               Table "public.items_tmp"
     Column      |          Type          | Modifiers
-----------------+------------------------+-----------
 id              | integer                | not null
 vslwhid         | integer                |
 serialno        | character varying(40)  |
 rh              | integer                |
 lastinspdate    | date                   |
 classused       | integer                |
 classaa         | text                   |
 classsurvey     | character varying(100) |
 classsurveydate | date                   |
 classduedate    | date                   |
 classpostponed  | date                   |
 classcomment    | text                   |
 defid           | integer                |
 machtypecount   | integer                |
 totalrh         | integer                |
 comment         | character varying(200) |
 attachments     | text[]                 |
 lastrepdate     | date                   |
 pmsstate        | character varying(200) |
 xid             | bigint                 | not null
Indexes:
    "it_tmp_pk" PRIMARY KEY, btree (id, xid)
Foreign-key constraints:
    "items_tmp_xid_fkey" FOREIGN KEY (xid) REFERENCES xadmin(xid)


-
Achilleas Mantzios
IT DEPT


Re: Insane behaviour in 8.3.3

From
Adrian Klaver
Date:
On 06/14/2012 01:39 AM, Achilleas Mantzios wrote:
> Hello,one remote user reported a problem and i was surprised to witness the following behaviour.
> It is on postgresql 8.3.3
> 
> dynacom=# BEGIN;
> BEGIN
> dynacom=#
> dynacom=#
> dynacom=# insert into xadmin(appname,apptbl_tmp,gao,id,comment)
> dynacom-# values('PMS','overhaul_report_tmp','INSERT',nextval('overhaul_report_tmp_pkid_seq'),' zzz ');
> INSERT 0 1
> dynacom=#
> dynacom=# insert into items_tmp(id,vslwhid,serialno,rh,lastinspdate,classused,classsurvey,classsurveydate,classduedate,
> dynacom(# classpostponed,classcomment,defid,machtypecount,totalrh,comment,attachments,lastrepdate,pmsstate,xid,classaa)
> dynacom-# select id,vslwhid,serialno,rh,lastinspdate,classused,classsurvey,classsurveydate,classduedate,classpostponed,
> dynacom-# classcomment,defid,machtypecount,totalrh,comment,attachments,lastrepdate,pmsstate,currval('xadmin_xid_seq'),
> dynacom-# classaa from items where id=1261319;
> INSERT 0 1
> dynacom=# -- in the above 'xadmin_xid_seq' has taken a new value in the first insert
> dynacom=# SELECT currval('xadmin_xid_seq');
>   currval
> ---------
>     61972
> (1 row)
> dynacom=# SELECT id from items_tmp WHERE id=1261319 AND xid=61972;
>     id
> ---------
>   1261319
> (1 row)
> dynacom=# -- ok this is how it should be
> dynacom=# SELECT id from items_tmp WHERE id=1261319 AND xid=currval('xadmin_xid_seq');
>   id
> ----
> (0 rows)
> dynacom=# -- THIS IS INSANE
> 
> This code has run fine (the last SELECT returns exactly one row) for 5,409,779 total transactions thus far, in 70
> different postgresql slave installations (mixture of 8.3.3 and 8.3.13) (we are a shipping company),
> until i got this error report from a user yesterday.
> 
> What could be causing this? How could i further investigate this?

The only thing I could come up with is:

SELECT id, currval('xadmin_xid_seq') from items_tmp WHERE id=1261319 ;

It's grasping at straws, but I cannot come up with a logical reason for the above.

> Achilleas Mantzios
> IT DEPT
> 


-- 
Adrian Klaver
adrian.klaver@gmail.com


Re: Insane behaviour in 8.3.3

From
Richard Huxton
Date:
On 14/06/12 09:39, Achilleas Mantzios wrote:
> dynacom=# SELECT id from items_tmp WHERE id=1261319 AND xid=61972;
>     id
> ---------
>   1261319
> (1 row)
> dynacom=# -- ok this is how it should be
> dynacom=# SELECT id from items_tmp WHERE id=1261319 AND xid=currval('xadmin_xid_seq');
>   id
> ----
> (0 rows)
> dynacom=# -- THIS IS INSANE

Perhaps just do an EXPLAIN ANALYSE on both of those. If for some reason 
one is using the index and the other isn't then it could be down to a 
corrupted index. Seems unlikely though.
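
A minimal sketch of that comparison, assuming it is run in the same session as the inserts (currval is only defined after nextval has been called in that session). The table, sequence and values are the ones from the original post. The enable_indexscan and enable_bitmapscan settings are standard planner switches; turning them off discourages index use for the current session only, so a row that shows up here but not in the indexed plan would point at it_tmp_pk rather than at the heap.

```sql
-- Plans chosen when the planner is free to use the index
EXPLAIN ANALYZE SELECT id FROM items_tmp WHERE id = 1261319 AND xid = 61972;
EXPLAIN ANALYZE SELECT id FROM items_tmp WHERE id = 1261319 AND xid = currval('xadmin_xid_seq');

-- Discourage index and bitmap scans for this session, then repeat the query
SET enable_indexscan = off;
SET enable_bitmapscan = off;
SELECT id FROM items_tmp WHERE id = 1261319 AND xid = currval('xadmin_xid_seq');

-- Restore the defaults afterwards
RESET enable_indexscan;
RESET enable_bitmapscan;
```

If the two variants disagree about the number of rows, a REINDEX of the table is the obvious next thing to try.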

--   Richard Huxton  Archonet Ltd


Re: Insane behaviour in 8.3.3

From
Robert Edwards
Date:
On 14/06/12 18:39, Achilleas Mantzios wrote:

> dynacom=# SELECT id from items_tmp WHERE id=1261319 AND xid=currval('xadmin_xid_seq');
>   id
> ----
> (0 rows)
> dynacom=# -- THIS IS INSANE

Have you tried:

SELECT id from items_tmp WHERE id=1261319 AND 
xid=currval('xadmin_xid_seq'::text)

or even:

SELECT id from items_tmp WHERE id=1261319 AND 
xid=currval(('xadmin_xid_seq'::text)::regclass)

Bob Edwards.


Re: Insane behaviour in 8.3.3

From
Achilleas Mantzios
Date:
On Fri 15 Jun 2012 09:34:16 Richard Huxton wrote:
> On 14/06/12 09:39, Achilleas Mantzios wrote:
> > dynacom=# SELECT id from items_tmp WHERE id=1261319 AND xid=61972;
> >
> >     id
> >
> > ---------
> >
> >   1261319
> >
> > (1 row)
> > dynacom=# -- ok this is how it should be
> > dynacom=# SELECT id from items_tmp WHERE id=1261319 AND
> > xid=currval('xadmin_xid_seq');
> >
> >   id
> >
> > ----
> > (0 rows)
> > dynacom=# -- THIS IS INSANE
>
> Perhaps just do an EXPLAIN ANALYSE on both of those. If for some reason
> one is using the index and the other isn't then it could be down to a
> corrupted index. Seems unlikely though.

Hello Richard,
I had the same thought and ran EXPLAIN ANALYZE; it gave results which looked pretty much
like the below (unfortunately I didn't keep the original exact output, because I was in a hurry to solve the problem):

dynacom=# EXPLAIN ANALYZE SELECT id from items_tmp where id=1261319 AND xid=62035;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using it_tmp_pk on items_tmp  (cost=0.00..8.28 rows=1 width=4) (actual time=0.017..0.018 rows=1 loops=1)
   Index Cond: ((id = 1261319) AND (xid = 62035))
 Total runtime: 0.042 ms
(3 rows)

dynacom=#
dynacom=# EXPLAIN ANALYZE SELECT id from items_tmp where id=1261319 AND xid=currval('xadmin_xid_seq');
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on items_tmp  (cost=4.53..120.32 rows=1 width=4) (actual time=58.212..58.212 rows=1 loops=1)
   Recheck Cond: (id = 1261319)
   Filter: (xid = currval('xadmin_xid_seq'::regclass))
   ->  Bitmap Index Scan on it_tmp_pk  (cost=0.00..4.53 rows=37 width=0) (actual time=0.021..0.021 rows=39 loops=1)
         Index Cond: (id = 1261319)
 Total runtime: 58.235 ms
(6 rows)

dynacom=#

After that, I tried to REINDEX items_tmp, which succeeded and also made the last select correctly return one row.
Being suspicious of the general condition of the database, I then tried to REINDEX DATABASE the whole db, which failed
at some point because of corrupted data, but it didn't indicate which table had the corruption. I then wrote a script to
report which table was being reindexed at any time, and this time I got no errors. I also re-issued the batch
REINDEX DATABASE command, again with no errors. So it was indeed an index/data corruption problem.
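
The script itself was not posted. As a rough sketch of one way to get the same per-table visibility (an assumption, not the script actually used), the REINDEX statements can be generated from the pg_tables system view, one per user table, so that any error can be tied to the statement that raised it:

```sql
-- Emit one REINDEX statement per user table; system schemas are skipped.
-- quote_ident() protects mixed-case or otherwise unusual names.
SELECT 'REINDEX TABLE '
       || quote_ident(schemaname) || '.' || quote_ident(tablename) || ';'
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY schemaname, tablename;
```

In psql, \t followed by \o reindex.sql captures the generated statements in a file (a bare \o ends the redirection). Running that file with \i after \set ECHO queries prints each statement before it is executed.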

Thanx to Richard and Adrian

-
Achilleas Mantzios
IT DEPT


Re: Insane behaviour in 8.3.3

From
Scott Marlowe
Date:
You do realize you're missing four years of bug fixes right?


Re: Insane behaviour in 8.3.3

From
Achilleas Mantzios
Date:
On Fri 15 Jun 2012 10:28:20 Scott Marlowe wrote:
> You do realize you're missing four years of bug fixes right?

On Thu 14 Jun 2012 11:39:35 Achilleas Mantzios wrote:
> Unfortunately the remote installations are neither physically accessible
> nor reachable over TCP/IP (comms are done via UUCP and administration via
> minicom, and the costs are just huge: 5 USD/min for 33 Kbit/s). So I
> would like to exhaust all possibilities before deciding to ship a new PostgreSQL
> version there and upgrade remotely, physically travel to the ship, or even
> try a backup/initdb/restore in the existing version. Any help
> would be really, really appreciated.
>
> Also, as you might have understood, upgrading, although generally a good
> idea, does not apply so easily in our case.

And I forgot to mention, minicom terminal emulation quality sucks; even giving simple
shell commands is a PITA. Upgrading the whole fleet would mean, in the best-case scenario,
a minimum of 21K USD for the whole fleet plus suspension of all other activities for two months.
If physical travel were involved, the cost would rise even higher.


-
Achilleas Mantzios
IT DEPT


Re: Insane behaviour in 8.3.3

From
Samuel Gendler
Date:


On Fri, Jun 15, 2012 at 1:24 AM, Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:

> And i forgot to mention, minicom term emulation quality sucks, even giving simple
> shell commands is a PITA, upgrading the whole fleet would mean best case scenario
> minimum 21K USD for the whole fleet + suspension of all other activities for two months.
> If physical travel was involved, the cost would be increased at even higher levels.

And what is the cost of data corruption on large numbers of systems?  And how much to fix that, especially if multiple systems fail at the same time?  Some things aren't free. $21K in exchange for NOT having had to keep systems up to date for 4 years seems like a decent trade.  


Re: Insane behaviour in 8.3.3

From
Samuel Gendler
Date:


On Fri, Jun 15, 2012 at 2:28 AM, Samuel Gendler <sgendler@ideasculptor.com> wrote:
> On Fri, Jun 15, 2012 at 1:24 AM, Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
>> And i forgot to mention, minicom term emulation quality sucks, even giving simple
>> shell commands is a PITA, upgrading the whole fleet would mean best case scenario
>> minimum 21K USD for the whole fleet + suspension of all other activities for two months.
>> If physical travel was involved, the cost would be increased at even higher levels.
>
> And what is the cost of data corruption on large numbers of systems?  And how much to fix that, especially if multiple systems fail at the same time?  Some things aren't free. $21K in exchange for NOT having had to keep systems up to date for 4 years seems like a decent trade.


Just call up an oracle sales rep and get a price quote for a single baseline system.  Put that next to the postgresql upgrade cost for your whole fleet.
 

Re: Insane behaviour in 8.3.3

From
Achilleas Mantzios
Date:
On Fri 15 Jun 2012 12:28:21 Samuel Gendler wrote:
> On Fri, Jun 15, 2012 at 1:24 AM, Achilleas Mantzios <
>
> achill@matrix.gatewaynet.com> wrote:
> > And i forgot to mention, minicom term emulation quality sucks, even
> > giving simple
> > shell commands is a PITA, upgrading the whole fleet would mean bast case
> > scenario
> > minimum 21K USD for the whole fleet + suspension of all other activities
> > for two months.
> > If physical travel was involved, the cost would be increased at even
> > higher levels.
>
> And what is the cost of data corruption on large numbers of systems?  And
> how much to fix that, especially if multiple systems fail at the same time?
>  Some things aren't free. $21K in exchange for NOT having had to keep
> systems up to date for 4 years seems like a decent trade.

After 12 years of using PostgreSQL in this environment, I can assure you that things are not so scary.
We have multiple plans of action in case a slave installation gets totally damaged,
plus PostgreSQL has been a real beast of reliability.
Also, here you neglect the cost of the actual migration and test plans, which adds to the whole picture.
Moving from 7.4 to 8.3 back in 2008 was not easy at all (tsearch, intarray, loads of black magic, etc.).
You could not just send a tech guy to the ship to perform the migration; he would have to know
what he is actually doing and why when executing the 100-line script line by line.
Some time in the future we will commence another round of migration
(at any point in time we need to support all currently working versions),
but we will need a substantial reason to do so.

-
Achilleas Mantzios
IT DEPT


Re: Insane behaviour in 8.3.3

From
Achilleas Mantzios
Date:
On Fri 15 Jun 2012 12:29:38 Samuel Gendler wrote:
> On Fri, Jun 15, 2012 at 2:28 AM, Samuel Gendler
>
> <sgendler@ideasculptor.com>wrote:
> > On Fri, Jun 15, 2012 at 1:24 AM, Achilleas Mantzios <
> >
> > achill@matrix.gatewaynet.com> wrote:
> >> And i forgot to mention, minicom term emulation quality sucks, even
> >> giving simple
> >> shell commands is a PITA, upgrading the whole fleet would mean bast case
> >> scenario
> >> minimum 21K USD for the whole fleet + suspension of all other activities
> >> for two months.
> >> If physical travel was involved, the cost would be increased at even
> >> higher levels.
> >
> > And what is the cost of data corruption on large numbers of systems?  And
> > how much to fix that, especially if multiple systems fail at the same
> > time?
> >
> >  Some things aren't free. $21K in exchange for NOT having had to keep
> >
> > systems up to date for 4 years seems like a decent trade.
>
> Just call up an oracle sales rep and get a price quote for a single
> baseline system.  Put that next to the postgresql upgrade cost for your
> whole fleet.

:) I know, I have used this argument sometimes successfully, sometimes not.
The problem with oracle is not the price. The problem is that it just cannot do
what postgresql does, or what we have managed to do ourselves with postgresql.
Our replication system is unique, built in-house, and no commercial alternative existed,
exists, or (most probably) will exist.

-
Achilleas Mantzios
IT DEPT


Re: Insane behaviour in 8.3.3

From
Scott Marlowe
Date:
Not talking about going to something after 8.3.19, just updating to
the latest 8.3 version.  On most systems it's a simple:

sudo apt-get upgrade

or similar and sit back and watch.

On Fri, Jun 15, 2012 at 2:24 AM, Achilleas Mantzios
<achill@matrix.gatewaynet.com> wrote:
> On Fri 15 Jun 2012 10:28:20 Scott Marlowe wrote:
>> You do realize you're missing four years of bug fixes right?
>
> On Thu 14 Jun 2012 11:39:35 Achilleas Mantzios wrote:
>> Unfortunately the remote installations are neither physically accessible
>> nor by TCP/IP accesible (comms are done via UUCP and administration via
>> minicom, and the costs are just huge 5 USD/min for 33Kbits/sec). So, i
>> would exhaust all posibilities before deciding to ship a new postgresql
>> version there, and remotely upgrade, physically travel to the ship or even
>> trying to do a backup/initdb/restore in the existing version. Any help
>> would be really really appreciated.
>>
>> Also, as you might have understood, upgrading, although generally a good
>> idea, does not apply so easily in our case.
>
> And i forgot to mention, minicom term emulation quality sucks, even giving simple
> shell commands is a PITA, upgrading the whole fleet would mean bast case scenario
> minimum 21K USD for the whole fleet + suspension of all other activities for two months.
> If physical travel was involved, the cost would be increased at even higher levels.
>
>
> -
> Achilleas Mantzios
> IT DEPT



--
To understand recursion, one must first understand recursion.


Re: Insane behaviour in 8.3.3

From
Achilleas Mantzios
Date:
On Fri 15 Jun 2012 18:03:26 Scott Marlowe wrote:
> Not talking about going to something after 8.3.19, just updating to
> the latest 8.3 version.  On most systems it's a simple:
>
> sudo apt-get upgrade
>
> or similar and sit back and watch.

Thanx, unfortunately we don't have TCP/IP connectivity to (most of) the ships, and AFAIK apt-get does not yet work
over advanced UUCP/minicom/kermit or other equivalent high-tech dial-up connection.
Just joking :)

-
Achilleas Mantzios
IT DEPT


Re: Insane behaviour in 8.3.3

From
Richard Huxton
Date:
On 15/06/12 16:32, Achilleas Mantzios wrote:
> On Fri 15 Jun 2012 18:03:26 Scott Marlowe wrote:
>> Not talking about going to something after 8.3.19, just updating to
>> the latest 8.3 version.  On most systems it's a simple:
>>
>> sudo apt-get upgrade
>>
>> or similar and sit back and watch.
>
> Thanx, unfortunately we dont have TCP/IP connectivity to (most of) the ships, and AFAIK apt-get does not yet work
> over advanced UUCP/minicom/kermit or other equivalent hich-tech dial up connection.
> just joking :)

Can you run rsync over a serial connection? Never tried, but if you had
something that took the same options as ssh I daresay you could get it
working.


--   Richard Huxton  Archonet Ltd


Re: Insane behaviour in 8.3.3

From
Scott Marlowe
Date:
Well, I'd see about finding a way to upgrade to 8.3.19.  8.3.3 has
known data-eating bugs.

On Fri, Jun 15, 2012 at 9:32 AM, Achilleas Mantzios
<achill@matrix.gatewaynet.com> wrote:
> On Fri 15 Jun 2012 18:03:26 Scott Marlowe wrote:
>> Not talking about going to something after 8.3.19, just updating to
>> the latest 8.3 version.  On most systems it's a simple:
>>
>> sudo apt-get upgrade
>>
>> or similar and sit back and watch.
>
> Thanx, unfortunately we dont have TCP/IP connectivity to (most of) the ships, and AFAIK apt-get does not yet work
> over advanced UUCP/minicom/kermit or other equivalent hich-tech dial up connection.
> just joking :)


--
To understand recursion, one must first understand recursion.


Re: Insane behaviour in 8.3.3

From
"Raj Mathur (राज माथुर)"
Date:
On Friday 15 Jun 2012, Samuel Gendler wrote:
> On Fri, Jun 15, 2012 at 1:24 AM, Achilleas Mantzios <
> 
> achill@matrix.gatewaynet.com> wrote:
> > And i forgot to mention, minicom term emulation quality sucks, even
> > giving simple
> > shell commands is a PITA, upgrading the whole fleet would mean bast
> > case scenario
> > minimum 21K USD for the whole fleet + suspension of all other
> > activities for two months.
> > If physical travel was involved, the cost would be increased at
> > even higher levels.
> 
> And what is the cost of data corruption on large numbers of systems? 
> And how much to fix that, especially if multiple systems fail at the
> same time? Some things aren't free. $21K in exchange for NOT having
> had to keep systems up to date for 4 years seems like a decent
> trade.

While I agree in principle with what you're saying, this specific 
comparison would be better stated as "What is the cost of data 
corruption multiplied by the risk of that corruption occurring?"

The cost of upgrading is known and unavoidable.  The cost of data 
corruption, while probably higher (unless Achilleas has an effective 
backup/restore system), needs to be factored by its probability of 
occurrence.

Of course, neither you nor I are in Achilleas' shoes, so trying to figure 
out where they pinch is academic at best.

Regards,

-- Raj
-- 
Raj Mathur                          || raju@kandalaya.org   || GPG:
http://otheronepercent.blogspot.com || http://kandalaya.org || CC68
It is the mind that moves           || http://schizoid.in   || D17F


Re: Insane behaviour in 8.3.3

From
Achilleas Mantzios
Date:
Thanx, well said, and especially after this incident we should seriously
consider an upgrade.


On Fri 15 Jun 2012 19:59:05 Scott Marlowe wrote:
> Well, I'd see about finding a way to upgrade to 8.3.19.  8.3.3 has
> known data-eating bugs.

-
Achilleas Mantzios
IT DEPT


Re: Insane behaviour in 8.3.3

From
Achilleas Mantzios
Date:
On Sat 16 Jun 2012 03:22:16 you wrote:
> Just to be clear, I wasn't, in any way, sugggesting you actually use
> Oracle.  I was merely suggesting that if someone is up in arms about
> upgrading multiple systems for a cost of $21K, you might show them the
> price of a single oracle license as a point of comparison.
>

It's ok, you were pretty clear, I was just trying to enhance/enrich our position with more arguments ;)

-
Achilleas Mantzios
IT DEPT


Re: Insane behaviour in 8.3.3

From
Karsten Hilbert
Date:
On Mon, Jun 18, 2012 at 11:36:14AM +0300, Achilleas Mantzios wrote:

> > >> Not talking about going to something after 8.3.19, just updating to
> > >> the latest 8.3 version.  On most systems it's a simple:
> > >> 
> > >> sudo apt-get upgrade
> > >> 
> > >> or similar and sit back and watch.
> > > 
> > > Thanx, unfortunately we dont have TCP/IP connectivity to (most of) the
> > > ships, and AFAIK apt-get does not yet work over advanced
> > > UUCP/minicom/kermit or other equivalent hich-tech dial up connection.
> > > just joking :)

You might consider shipping .debs over the dialup and either
"dpkg -i"ing them or even setting up a local repository on the
ships from which to "apt-get upgrade".

Karsten
-- 
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346


Re: Insane behaviour in 8.3.3

From
Jasen Betts
Date:
On 2012-06-15, Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
> On Fri 15 Jun 2012 18:03:26 Scott Marlowe wrote:
>> Not talking about going to something after 8.3.19, just updating to
>> the latest 8.3 version.  On most systems it's a simple:
>> 
>> sudo apt-get upgrade
>> 
>> or similar and sit back and watch.
>
> Thanx, unfortunately we dont have TCP/IP connectivity to (most of) the ships, and AFAIK apt-get does not yet work 
> over advanced UUCP/minicom/kermit or other equivalent hich-tech dial up connection.
> just joking :)

If you were serious I'd suggest writing a backend for it :)

minicom's terminal emulation sucks; use ckermit instead. It does no
terminal emulation at all. Less is not more, but in this case none is all.

the worst bit about upgrades is when you hit some newly invalid characters
in a UTF8 text column.


-- 
⚂⚃ 100% natural



Re: Insane behaviour in 8.3.3

From
Jasen Betts
Date:
On 2012-06-15, Richard Huxton <dev@archonet.com> wrote:
> On 15/06/12 16:32, Achilleas Mantzios wrote:
>> On Fri 15 Jun 2012 18:03:26 Scott Marlowe wrote:
>>> Not talking about going to something after 8.3.19, just updating to
>>> the latest 8.3 version.  On most systems it's a simple:
>>>
>>> sudo apt-get upgrade
>>>
>>> or similar and sit back and watch.
>>
>> Thanx, unfortunately we dont have TCP/IP connectivity to (most of) the ships, and AFAIK apt-get does not yet work
>> over advanced UUCP/minicom/kermit or other equivalent hich-tech dial up connection.
>> just joking :)
>
> Can you run rsync over a serial connection? Never tried, but if you had 
> something that took the same options as ssh I daresay you could get it 
> working.

Probably possible. It may be possible to create a wrapper that makes
its user interface behave like lrzsz, so that it can be launched over an
existing serial connection.

But for uploading packages lrzsz should be enough.

-- 
⚂⚃ 100% natural



Re: Insane behaviour in 8.3.3

From
Achilleas Mantzios
Date:
We had another corruption incident on the very same machine, this time in the jboss subsystem (a "jar cvf" produced
a corrupted .jar).
IMHO this means faulty RAM/disk.
If that is true, then I guess HW sanity checks are even more important than SW upgrades.

On Mon 18 Jun 2012 11:36:14 you wrote:
> Thanx, well said, and especially after this incident we should very well
> consider seriously an upgrade.
>
> On Fri 15 Jun 2012 19:59:05 Scott Marlowe wrote:
> > Well, I'd see about finding a way to upgrade to 8.3.19.  8.3.3 has
> > known data eating bugs.
> >
> > On Fri, Jun 15, 2012 at 9:32 AM, Achilleas Mantzios
> >
> > <achill@matrix.gatewaynet.com> wrote:
> > > On Fri 15 Jun 2012 18:03:26 Scott Marlowe wrote:
> > >> Not talking about going to something after 8.3.19, just updating to
> > >> the latest 8.3 version.  On most systems it's a simple:
> > >>
> > >> sudo apt-get upgrade
> > >>
> > >> or similar and sit back and watch.
> > >
> > > Thanx, unfortunately we dont have TCP/IP connectivity to (most of) the
> > > ships, and AFAIK apt-get does not yet work over advanced
> > > UUCP/minicom/kermit or other equivalent hich-tech dial up connection.
> > > just joking :)
> > >
> > >> On Fri, Jun 15, 2012 at 2:24 AM, Achilleas Mantzios
> > >>
> > >> <achill@matrix.gatewaynet.com> wrote:
> > >> > On Fri 15 Jun 2012 10:28:20 Scott Marlowe wrote:
> > >> >> You do realize you're missing four years of bug fixes right?
> > >> >
> > >> > On Thu 14 Jun 2012 11:39:35 Achilleas Mantzios wrote:
> > >> >> Unfortunately the remote installations are neither physically
> > >> >> accessible nor by TCP/IP accesible (comms are done via UUCP and
> > >> >> administration via minicom, and the costs are just huge 5 USD/min
> > >> >> for 33Kbits/sec). So, i would exhaust all posibilities before
> > >> >> deciding to ship a new postgresql version there, and remotely
> > >> >> upgrade, physically travel to the ship or even trying to do a
> > >> >> backup/initdb/restore in the existing version. Any help would be
> > >> >> really really appreciated.
> > >> >>
> > >> >> Also, as you might have understood, upgrading, although generally a
> > >> >> good idea, does not apply so easily in our case.
> > >> >
> > >> > And i forgot to mention, minicom term emulation quality sucks, even
> > >> > giving simple shell commands is a PITA, upgrading the whole fleet
> > >> > would mean best case scenario minimum 21K USD for the whole fleet +
> > >> > suspension of all other activities for two months. If physical
> > >> > travel was involved, the cost would be increased at even higher
> > >> > levels.
> > >> >
> > >> >
> > >> > -
> > >> > Achilleas Mantzios
> > >> > IT DEPT
> > >
> > > -
> > > Achilleas Mantzios
> > > IT DEPT
> > >
> > > --
> > > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> > > To make changes to your subscription:
> > > http://www.postgresql.org/mailpref/pgsql-sql
>
> -
> Achilleas Mantzios
> IT DEPT

-
Achilleas Mantzios
IT DEPT


Re: Insane behaviour in 8.3.3

From
Craig Ringer
Date:
On 06/19/2012 05:17 PM, Achilleas Mantzios wrote:
> We had another corruption incident on the very same machine, this time in the jboss subsystem (a "jar cvf" produced
> corrupted .jar).
> IMHO this means faulty RAM/disk.
> If that is true, then i guess HW sanity checks are even more important than SW upgrades.

... and a lot more difficult :S

Log monitoring is often the most important part - monitoring for NMIs and
other hardware notifications, checking the kernel log for odd issues or 
reports of unexpected segfaults from userspace programs, etc.
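
As a rough illustration of that kind of check, here is a small sketch that pulls suspicious lines out of a kernel log. The function name is hypothetical and the pattern list is only a starting point, not an exhaustive set of hardware error messages. The log path also varies by distribution (/var/log/kern.log on Debian-style systems is an assumption).

```shell
#!/bin/sh
# scan_hw_errors FILE: print lines from a kernel log that hint at hardware
# trouble (NMIs, machine checks, EDAC memory reports, disk I/O errors) or at
# userspace segfaults. The pattern list is illustrative, not exhaustive.
scan_hw_errors() {
    grep -Ei 'NMI|machine check|EDAC|I/O error|segfault' "$1"
}
```

Run from cron as something like scan_hw_errors /var/log/kern.log, with any output sent back with the regular UUCP traffic, it would at least flag the obvious cases.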

--
Craig Ringer


Re: Insane behaviour in 8.3.3

From
Achilleas Mantzios
Date:
On Wed 20 Jun 2012 07:08:09 Craig Ringer wrote:
> On 06/19/2012 05:17 PM, Achilleas Mantzios wrote:
> > We had another corruption incident on the very same machine, this time in
> > the jboss subsystem (a "jar cvf" produced corrupted .jar). IMHO this
> > means faulty RAM/disk.
> > If that is true, then i guess HW sanity checks are even more important
> > than SW upgrades.
>
> ... and a lot more difficult :S
>
> Log monitoring is often the most important part - monitoring for NMIs and
> other hardware notifications, checking the kernel log for odd issues or
> reports of unexpected segfaults from userspace programs, etc.
>

That's right, we have written a whole framework for this, but there are always cases
which escape our attention.

> --
> Craig Ringer

-
Achilleas Mantzios
IT DEPT