Thread: Partial key usage
PostgreSQL 7.4.3
I have a table that 'translates' codes between two types. The structure is:
recno | integer | not null default nextval('public.kah_kahxlate_recno_seq'::text)
kahcode | character(25) | not null
othercodetype | character varying(40) | not null
othercode | character varying(40) | not null
othercoden | numeric(20,0) |
Indexes:
"kah_kahxlate_cpk" primary key, btree (recno)
"ka_kahxlate_2" btree (othercodetype, othercode)
"kah_kahxlate_1" btree (kahcode, othercodetype)
What can happen is that 'othercode' can be a partial value, so it is matched with LIKE - e.g.:
SELECT kahCode FROM kah_kahXlate
WHERE otherCodeType = 'FRED'
AND otherCode LIKE 'ABC%';
According to the EXPLAIN plan, this uses a sequential scan.
In Informix, the workaround was to change the LIKE to address a substring of 'otherCode' - i.e.:
....
AND otherCode[1,3] = 'ABC'
This would then use the index.
I have tried the PostgreSQL equivalent:
AND SUBSTRING(otherCode FROM 1 FOR 3) = 'ABC'
But this still comes up with a sequential scan. Is there a way to force an indexed read?
(I did prove that it was using a sequential scan by dropping the index and retrying the query - same time - so the EXPLAIN plan is accurate.)
Regards,

Steve Tucknott
ReTSol Ltd
DDI: 01903 828769
Steve Tucknott <steve@retsol.co.uk> writes:
> recno | integer | not null default nextval('public.kah_kahxlate_recno_seq'::text)
> kahcode | character(25) | not null
> othercodetype | character varying(40) | not null
> othercode | character varying(40) | not null
> othercoden | numeric(20,0) |
> Indexes:
> "kah_kahxlate_cpk" primary key, btree (recno)
> "ka_kahxlate_2" btree (othercodetype, othercode)
> "kah_kahxlate_1" btree (kahcode, othercodetype)
>
> What can happen is that the 'othercode' can be partial - so can be
> accessed with LIKE - ie
> SELECT kahCode FROM kah_kahXlate
> WHERE otherCodeType = 'FRED'
> AND otherCode LIKE 'ABC%';

This should be able to use an index on (othercodetype, othercode). If it's not, I would speculate that your database collation is not C (check "SHOW LC_COLLATE"). Non-C locales usually sort in an order that isn't compatible with pattern matching. You can either re-initdb in C locale, or make a specialized index using LIKE-compatible comparison operators. See the docs about specialized index operator classes.

regards, tom lane
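The operator-class approach can be sketched like this (index, table, and query names are taken from the thread; `varchar_pattern_ops` is the operator class the PostgreSQL docs describe for LIKE-compatible comparisons on varchar columns - treat it as a sketch to adapt, not a verified drop-in fix):

```sql
-- Rebuild the (othercodetype, othercode) index so that othercode uses
-- the pattern-matching operator class; a left-anchored LIKE can then
-- use the index even under a non-C locale.
DROP INDEX ka_kahxlate_2;
CREATE INDEX ka_kahxlate_2
    ON kah_kahxlate (othercodetype, othercode varchar_pattern_ops);

-- Then check the plan again:
EXPLAIN
SELECT kahcode
  FROM kah_kahxlate
 WHERE othercodetype = 'FRED'
   AND othercode LIKE 'ABC%';
```

Note that the leading `othercodetype` column keeps its default operator class, since it is only compared with plain `=`.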
Dear group,

I am a biologist dealing with tons of gene chip data. I made a schema for storing the data using PostgreSQL.

To make my question short, I have (say) 3 tables. My data is in tab-delimited text. I have to parse the data and direct the contents of each column to columns in each table.

How can I do this using Python? Any ideas please.

Thank you

PS
Hi

> To make my question short, I have a (say) 3 tables. My data is in tab
> delimted text. I have to parse the data and direct the contents of
> each colums to columns in each table.
>
> How can I do this using Python? Any ideas please.

You don't even have to use Python. The COPY command should be enough:
http://www.postgresql.org/docs/7.4/interactive/sql-copy.html

Arthur
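For a straightforward tab-delimited file, the call looks roughly like this (the table and column names are invented for illustration; the file path is read server-side and must be absolute):

```sql
-- Bulk-load one tab-delimited file into one table.
-- Columns not listed (e.g. a SERIAL key) receive their defaults.
COPY gene_chip_raw (probe_id, signal, detection, pvalue)
    FROM '/tmp/chip_data.tab'
    WITH DELIMITER '\t';
```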
I'm not a python user, but a quick google search for python postgresql database turned up:

http://www.pygresql.org/

Might check that out.

Sean
Thank you Arthur for your e-mail.

I did not face the real problem until I tried to populate the tables. I am developing this database to store microarray data (gene chip data). The data is distributed across 3 kinds of tab-delimited text file. Each file has about 45K rows, and the first approx. 40 rows are normal text without tab delimitation; from then on the text is in 4 columns.

The problems with COPY were:

1. The columns in the tables did not match the number of columns in the flat file.
2. Each column in the flat file goes to a different place in different tables (not a problem in itself, but the difference in the number of columns raises an error).
3. The foreign keys in the tables are defined as SERIAL type, and COPY fails because of duplicate keys.

I cannot do 45K INSERT statements for the contents of the files.

My colleague suggested changing the schema from what is essentially first normal form to third normal form (I hope I am conveying the correct terminology): defining entities exactly like the flat files and defining more relationship tables instead of foreign keys. This would also make indexing and querying easier.

I have two questions (stumbling blocks) and am stuck:

1. Is there a way to populate the tables without the COPY tool and without changing the schema?
2. If I change the schema, will populating the tables become easier?

Could anyone please help me.

Thank you,

Kumar
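One preprocessing pattern for this (a sketch only - the four-column layout, the split of columns between two tables, and all names here are my invention, not Kumar's actual schema) is to skip the free-text preamble in Python and write one clean tab-delimited stream per target table, which COPY can then load with explicit column lists so the SERIAL keys fill themselves:

```python
import csv
import io

def split_chip_file(infile, out_a, out_b):
    """Skip the free-text preamble, then route the columns of each
    tab-delimited data row to two per-table output streams.
    Assumes a (probe, signal, detection, pvalue) column layout."""
    reader = csv.reader(infile, delimiter="\t")
    writer_a = csv.writer(out_a, delimiter="\t", lineterminator="\n")
    writer_b = csv.writer(out_b, delimiter="\t", lineterminator="\n")
    for row in reader:
        if len(row) != 4:          # preamble lines lack the 4 tab-separated fields
            continue
        probe, signal, detection, pvalue = row
        writer_a.writerow([probe, signal])            # -> first table
        writer_b.writerow([probe, detection, pvalue])  # -> second table

# Example on an in-memory file standing in for one of the 45K-row files:
sample = ("Chip description\n"
          "more free-text header\n"
          "AFFX-1\t120.5\tP\t0.002\n"
          "AFFX-2\t8.1\tA\t0.55\n")
out_a, out_b = io.StringIO(), io.StringIO()
split_chip_file(io.StringIO(sample), out_a, out_b)
print(out_a.getvalue())
```

The two output files can then be loaded with `COPY table (col, ...) FROM '...'`, listing only the non-SERIAL columns so the keys are generated automatically.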