Thread: Partial key usage
PostgreSQL 7.4.3
I have a table that 'translates' codes between two types. The structure is:
recno | integer | not null default nextval('public.kah_kahxlate_recno_seq'::text)
kahcode | character(25) | not null
othercodetype | character varying(40) | not null
othercode | character varying(40) | not null
othercoden | numeric(20,0) |
Indexes:
"kah_kahxlate_cpk" primary key, btree (recno)
"ka_kahxlate_2" btree (othercodetype, othercode)
"kah_kahxlate_1" btree (kahcode, othercodetype)
What can happen is that 'othercode' can be a partial value, so it is matched with LIKE - e.g.:
SELECT kahCode FROM kah_kahXlate
WHERE otherCodeType = 'FRED'
AND otherCode LIKE 'ABC%';
According to the EXPLAIN plan, this uses a sequential scan.
In Informix, the workaround was to change the LIKE to address a substring of 'otherCode' - i.e.:
....
AND otherCode[1,3] = 'ABC'
This would then use the index.
I have tried the PostgreSQL equivalent:
AND SUBSTRING(otherCode FROM 1 FOR 3) = 'ABC'
But this still comes up with a sequential scan. Is there a way to force an indexed read?
(I did prove that it was using a sequential scan by dropping the index and retrying the query - same time - so the EXPLAIN plan is accurate.)
Regards,

Steve Tucknott
ReTSol Ltd
DDI: 01903 828769
Steve Tucknott <steve@retsol.co.uk> writes:
> recno | integer | not null default nextval('public.kah_kahxlate_recno_seq'::text)
> kahcode | character(25) | not null
> othercodetype | character varying(40) | not null
> othercode | character varying(40) | not null
> othercoden | numeric(20,0) |
> Indexes:
> "kah_kahxlate_cpk" primary key, btree (recno)
> "ka_kahxlate_2" btree (othercodetype, othercode)
> "kah_kahxlate_1" btree (kahcode, othercodetype)
>
> What can happen is that the 'othercode' can be partial - so can be
> accessed with LIKE - ie
> SELECT kahCode FROM kah_kahXlate
> WHERE otherCodeType = 'FRED'
> AND otherCode LIKE 'ABC%';

This should be able to use an index on (othercodetype, othercode). If it's not, I would speculate that your database collation is not C (check "SHOW LC_COLLATE"). Non-C locales usually sort in an order that isn't compatible with pattern matching. You can either re-initdb in C locale, or make a specialized index using LIKE-compatible comparison operators. See the docs about specialized index operator classes.

regards, tom lane
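The operator-class approach can be sketched like this (index, table, and query names are taken from the thread; `varchar_pattern_ops` is the operator class the PostgreSQL docs describe for LIKE-compatible comparisons on varchar columns - treat it as a sketch to adapt, not a verified drop-in fix):

```sql
-- Rebuild the (othercodetype, othercode) index so that othercode uses
-- the pattern-matching operator class; a left-anchored LIKE can then
-- use the index even under a non-C locale.
DROP INDEX ka_kahxlate_2;
CREATE INDEX ka_kahxlate_2
    ON kah_kahxlate (othercodetype, othercode varchar_pattern_ops);

-- Then check the plan again:
EXPLAIN
SELECT kahcode
  FROM kah_kahxlate
 WHERE othercodetype = 'FRED'
   AND othercode LIKE 'ABC%';
```

Note that the leading `othercodetype` column keeps its default operator class, since it is only compared with plain `=`.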
Dear group,

I am a biologist dealing with tons of gene chip data. I made a schema for storing the data using PostgreSQL.

To make my question short, I have (say) 3 tables. My data is in tab-delimited text. I have to parse the data and direct the contents of each column to columns in each table.

How can I do this using Python? Any ideas please.

Thank you

PS
Hi

> To make my question short, I have a (say) 3 tables. My data is in tab
> delimted text. I have to parse the data and direct the contents of
> each colums to columns in each table.
>
> How can I do this using Python? Any ideas please.

You don't even have to use Python. The COPY command should be enough:
http://www.postgresql.org/docs/7.4/interactive/sql-copy.html

Arthur
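For a straightforward tab-delimited file, the call looks roughly like this (the table and column names are invented for illustration; the file path is read server-side and must be absolute):

```sql
-- Bulk-load one tab-delimited file into one table.
-- Columns not listed (e.g. a SERIAL key) receive their defaults.
COPY gene_chip_raw (probe_id, signal, detection, pvalue)
    FROM '/tmp/chip_data.tab'
    WITH DELIMITER '\t';
```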
I'm not a python user, but a quick google search for python postgresql database turned up:

http://www.pygresql.org/

Might check that out.

Sean
Thank you Arthur for your e-mail.

I did not face the real problem until I tried to populate the tables. I am developing this database to store microarray data (gene chip data). The data is distributed across 3 kinds of tab-delimited text file. Each file has about 45K rows, and the first approx. 40 rows are normal text without tab delimitation; from then on the text is in 4 columns.

The problems with COPY were:

1. The columns in the tables did not match the number of columns in the flat file.
2. Each column in the flat file goes to a different place in different tables (not a problem in itself, but the difference in the number of columns raises an error).
3. The foreign keys in the tables are defined as SERIAL type, and COPY fails because of duplicate keys.

I cannot do 45K INSERT statements for the contents of the files.

My colleague suggested changing the schema from what is essentially first normal form to third normal form (I hope I am conveying the correct terminology): defining entities exactly like the flat files and defining more relationship tables instead of foreign keys. This would also make indexing and querying easier.

I have two questions (stumbling blocks) and am stuck:

1. Is there a way to populate the tables without the COPY tool and without changing the schema?
2. If I change the schema, will populating the tables become easier?

Could anyone please help me.

Thank you,

Kumar
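One preprocessing pattern for this (a sketch only - the four-column layout, the split of columns between two tables, and all names here are my invention, not Kumar's actual schema) is to skip the free-text preamble in Python and write one clean tab-delimited stream per target table, which COPY can then load with explicit column lists so the SERIAL keys fill themselves:

```python
import csv
import io

def split_chip_file(infile, out_a, out_b):
    """Skip the free-text preamble, then route the columns of each
    tab-delimited data row to two per-table output streams.
    Assumes a (probe, signal, detection, pvalue) column layout."""
    reader = csv.reader(infile, delimiter="\t")
    writer_a = csv.writer(out_a, delimiter="\t", lineterminator="\n")
    writer_b = csv.writer(out_b, delimiter="\t", lineterminator="\n")
    for row in reader:
        if len(row) != 4:          # preamble lines lack the 4 tab-separated fields
            continue
        probe, signal, detection, pvalue = row
        writer_a.writerow([probe, signal])            # -> first table
        writer_b.writerow([probe, detection, pvalue])  # -> second table

# Example on an in-memory file standing in for one of the 45K-row files:
sample = ("Chip description\n"
          "more free-text header\n"
          "AFFX-1\t120.5\tP\t0.002\n"
          "AFFX-2\t8.1\tA\t0.55\n")
out_a, out_b = io.StringIO(), io.StringIO()
split_chip_file(io.StringIO(sample), out_a, out_b)
print(out_a.getvalue())
```

The two output files can then be loaded with `COPY table (col, ...) FROM '...'`, listing only the non-SERIAL columns so the keys are generated automatically.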