Thread: weird duplicate data problem

weird duplicate data problem

From
Carolyn Lu Wong
Date:
a table was created with:

create table bills (id        serial    not null,account_no     int4    not null,date_issued    date    not null,.....
);

The following SQL script returns only 1 row of data:
select oid, * from bills where id = xxxxx' order by oid;



However with the following SQL:
select oid, * from bills where date_issued = '01/01/2001' order by oid;

some of the data is duplicated, including the OID.

When trying to 'vaccum' the database, get

NOTICE:  Index bills_id_key: number of index tupples (4755) is not the
same as heap (7800).

What is happening? What could have caused this?

Database version used is V6.50. I know it's old, but i hope it's not the
version that's causing this problem.

Thanks in advance for any suggestions.


Re: weird duplicate data problem

From
Jeff Self
Date:
On Fri, 14 Dec 2001, Carolyn Lu Wong wrote:

> a table was created with:
>
> create table bills (
>     id        serial    not null,
>     account_no     int4    not null,
>     date_issued    date    not null,
>     .....
> );
There is no reason to set id to not null. It is a serial field. It will
automatically increase. Your account_no should probably be set to a char
field. My old rule of thumb is not to make something int unless you plan
to perform some type of math on it.

>
> The following SQL script returns only 1 row of data:
>
>     select oid, * from bills where id = xxxxx' order by oid;
>
If your id field is serial, all the records will have unique id's in them.
Therefore you are only going to get one row of data.

>
>
> However with the following SQL:
>
>     select oid, * from bills where date_issued = '01/01/2001'
>     order by oid;
>

Can't help you here. I'm running 7.1.3 and the query works fine.

> some of the data is duplicated, including the OID.
>
> When trying to 'vaccum' the database, get
>
> NOTICE:  Index bills_id_key: number of index tupples (4755) is not the
> same as heap (7800).
>
> What is happening? What could have caused this?
>
> Database version used is V6.50. I know it's old, but i hope it's not the
> version that's causing this problem.
>
> Thanks in advance for any suggestions.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

-- 
Jeff Self
Powered by Debian GNU/Linux
http://members.home.net/jocknerd



Re: weird duplicate data problem

From
"Josh Berkus"
Date:
Carolyn, Jeff,

> > a table was created with:
> >
> > create table bills (
> >  id  serial not null,
> >  account_no  int4 not null,
> >  date_issued date not null,
> >  .....
> > );

> There is no reason to set id to not null. It is a serial field. It
> will
> automatically increase. Your account_no should probably be set to a
> char
> field. My old rule of thumb is not to make something int unless you
> plan
> to perform some type of math on it.

I have to disagree with this, Jeff.  A six-digit integer takes up only
20% of the storage space of a 6-character string.  As such, it will be
faster to index and faster to retrieve, and faster to perform JOINs and
sorts.  So if Carolyn will never want to store anything but numbers in
the field, integer is ideal.

> > The following SQL script returns only 1 row of data:
> >
> >  select oid, * from bills where id = xxxxx' order by oid;
> >
> If your id field is serial, all the records will have unique id's in
> them.
> Therefore you are only going to get one row of data.

Yup.  Why, BTW, is Carloyn interested in the OID?  It is not a useful
piece of information for query purposes ... the OID is a tool for the
Postgres DB engine and should not be used as data, since it may change
unexpectedly.

> > However with the following SQL:
> >
> >  select oid, * from bills where date_issued = '01/01/2001'
> >  order by oid;
> >
> 
> Can't help you here. I'm running 7.1.3 and the query works fine.

Try date_issued = '2001-01-01'.  Some platforms (Unix for one) get fussy
about using universal date format.

> > some of the data is duplicated, including the OID.
> >
> > When trying to 'vaccum' the database, get
> >
> > NOTICE:  Index bills_id_key: number of index tupples (4755) is not
> the
> > same as heap (7800).
> >
> > What is happening? What could have caused this?
> >
> > Database version used is V6.50. I know it's old, but i hope it's
> not the
> > version that's causing this problem.

Sorry, Carolyn, can't help you on Vacuum problems unless you upgrade.
6.5 does indeed have several major problems with Vacuum; time to get
7.1.3.  The core team ain't releasing new versions for the fun alone,
y'know!

-Josh

______AGLIO DATABASE SOLUTIONS___________________________                                      Josh Berkus Complete
informationtechnology      josh@agliodbs.com  and data management solutions       (415) 565-7293 for law firms, small
businesses       fax 621-2533   and non-profit organizations.      San Francisco
 


Re: weird duplicate data problem

From
Jan Wieck
Date:
Josh Berkus wrote:
> Carolyn, Jeff,
>
> I have to disagree with this, Jeff.  A six-digit integer takes up only
> 20% of the storage space of a 6-character string.  As such, it will be
> faster to index and faster to retrieve, and faster to perform JOINs and
> sorts.  So if Carolyn will never want to store anything but numbers in
> the field, integer is ideal.
   That  is  all  true,  and  still  you should not restrict the   possible content to something that got nothing to do
withthe   nature of the data.
 
   Things  like  account  numbers,  zip  codes,  phone  numbers,   whatnot ARE  CHARACTER  SEQUENCES  -  period.  How
sure can   Carolyn  be  that there will never be an account number where   leading zeroes are significant?  That true
already for  zip   codes (and by the way, european zipcodes are alphanumeric).
 
   And  what  is the cosmic significance of my phone number plus   your creditcard number divided by Carolyn's
shoesize?:-)
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com



Re: weird duplicate data problem

From
"Henshall, Stuart - WCP"
Date:

-----Original Message-----
From: Carolyn Lu Wong [mailto:carolyn@greatpacific.com.au]
Sent: 14 December 2001 07:29
To: pgsql-sql@postgresql.org
Subject: weird duplicate data problem


a table was created with:

create table bills (id        serial    not null,account_no     int4    not null,date_issued    date    not null,.....
);

The following SQL script returns only 1 row of data:
select oid, * from bills where id = xxxxx' order by oid;

--
Seems reasonable
--


However with the following SQL:
select oid, * from bills where date_issued = '01/01/2001' order by oid;

some of the data is duplicated, including the OID.

--
OIDs are not guaranteed unique and can wraparound (unlesss there's a unique
index). However this seems a little unlikely in the same day so the only
thing I can think of is that there is a corrupt index on date_issued. Try
dropping and recreating any indexes.However I havn't really used 6.5 so
couldn't be sure.
--

When trying to 'vaccum' the database, get

NOTICE:  Index bills_id_key: number of index tupples (4755) is not the
same as heap (7800).

What is happening? What could have caused this?

Database version used is V6.50. I know it's old, but i hope it's not the
version that's causing this problem.

Thanks in advance for any suggestions.