Big wide datasets - Mailing list pgsql-novice

From	Michael Lush
Subject	Big wide datasets
Date	December 8, 2011 12:05:41
Msg-id	CACXX7MdoDdACfJMfhnugNoGxAhe-n5kxr716tGt6iUZ1n4ZKyQ@mail.gmail.com
Responses	Re: Big wide datasets Re: Big wide datasets Re: Big wide datasets
List	pgsql-novice

I have a dataset with ~10,000 columns and ~200,000 rows (GWAS data (1)) in the form

sample1, A T, A A, G C, ....
sample2, A C, C T, A A, ....

I'd like to take subsets of both columns and rows for analysis.

Two approaches spring to mind. One is to unpack it into something like an RDF triple, i.e.
CREATE TABLE long_table (
    sample_id     varchar(20),
    column_number int,
    snp_data      varchar(3));

for a table with ~2 billion rows (~10,000 columns x ~200,000 samples)
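
With this long layout, taking a subset of rows and columns would be an ordinary WHERE clause. A rough sketch only, assuming the long_table definition above; the sample ids, column numbers and index name are made up for illustration:

```sql
-- Hypothetical index so that lookups by SNP column do not scan the whole table.
CREATE INDEX long_table_col_sample_idx
    ON long_table (column_number, sample_id);

-- Subset of rows (samples) and columns (SNP positions); values are illustrative.
SELECT sample_id, column_number, snp_data
FROM long_table
WHERE sample_id IN ('sample1', 'sample2')
  AND column_number IN (1, 3, 42)
ORDER BY sample_id, column_number;
```

The result comes back one genotype per row, so it would still need pivoting if the analysis expects one row per sample.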

The other is to use the array datatype:

CREATE TABLE wide_table (
    sample_id varchar(20),   -- types assumed to match long_table
    snp_data  varchar(3)[]);
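
With the array layout, a subset would be taken using array subscripts and slices (PostgreSQL arrays are 1-based by default). A rough sketch only, assuming snp_data is declared as varchar(3)[] and holds one element per SNP column; the sample ids and positions are made up for illustration:

```sql
-- Contiguous range of columns: an array slice returns a sub-array.
SELECT sample_id, snp_data[1:3] AS snps
FROM wide_table
WHERE sample_id IN ('sample1', 'sample2');

-- Non-contiguous columns: pick single elements and rebuild an array.
SELECT sample_id,
       ARRAY[snp_data[1], snp_data[3], snp_data[42]] AS snps
FROM wide_table
WHERE sample_id IN ('sample1', 'sample2');
```

Here each sample stays on one row, but PostgreSQL still has to read the whole array value for every selected row before extracting elements from it.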

Does anyone have any experience of this sort of thing?

(1) http://en.wikipedia.org/wiki/Genome-wide_association_study

--
Michael Lush
