Confronting the maximum column limitation - Mailing list pgsql-general
From: Jeff Gentry
Subject: Confronting the maximum column limitation
Msg-id: Pine.SOL.4.20.0808121606020.10620-100000@noah.dfci.harvard.edu
Responses: Re: Confronting the maximum column limitation (4 replies)
List: pgsql-general
Hi there ... I recently discovered that there is a hard cap on the number of columns in a table, which is 1600. I also understand that it is generally unfathomable that anyone would ever feel limited by that number ... however, I've managed to bump into it myself and was looking to see if anyone had advice on how to manage the situation.

As a bit of background, we have a Postgres database to manage information revolving around genomic datasets, including the dataset itself. The actual data is treated in other applications as a matrix, and while it has made the DB design sub-optimal, the model has worked: we just stash the entire matrix in the DB. The rest of the DB design is proper; it is only the storage of these matrices straight up that is unorthodox. For the convenience of having everything in the same storage unit with all of the other information, it has been worth the extra headache and potential performance dings.

In these matrices, columns represent biological samples, rows represent fragments of the genome, and the cells are populated with values. There are a variety of row configurations (depending on what chip the samples were handled on), which currently range from a few thousand rows to a few hundred thousand, and that range is constantly expanding upwards. The real problem lies with the columns (biological samples): it is rarely the case that we'll have multiple matrices with overlapping columns, and even where that happens, it is almost never a good idea to treat them as the same thing.

Mind you, this is a world where having a set with a few hundred samples is still considered pretty grandiose. I just happen to have one of the very few datasets out there that comes anywhere close to breaking the 1600 barrier, and it is unlikely to really be an issue for at least a few (if not more) years ... but looking down the road, it'd be better to nip this in the bud now than to punt until it becomes a real issue.

So I've seen the header file where the 1600-column limit is defined, and I know the arguments that no one should ever want to come anywhere close to that limit. I'm willing to accept that these matrices could be stored in some alternate configuration, although I don't really know what that would be. It's possible that the right answer is "pgsql just isn't the right tool for this job", or even that punting it down the road is the correct choice. I was just hoping that some folks here might be able to give their thoughts.
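For context, one alternate configuration commonly suggested for matrices that would exceed the column limit is a "long" layout: one row per (fragment, sample) cell instead of one column per sample. Below is a minimal sketch of that idea; all table and column names are hypothetical, not taken from the post.

    -- One row per matrix (dataset), plus one row per cell of that matrix.
    CREATE TABLE matrix (
        matrix_id  serial PRIMARY KEY,
        name       text NOT NULL
    );

    CREATE TABLE matrix_cell (
        matrix_id   integer NOT NULL REFERENCES matrix (matrix_id),
        fragment_id integer NOT NULL,  -- matrix "row": genome fragment
        sample_id   integer NOT NULL,  -- matrix "column": biological sample
        value       double precision,
        PRIMARY KEY (matrix_id, fragment_id, sample_id)
    );

    -- Reading one "column" of the matrix becomes a filter, not a column
    -- reference: every value for sample 42 in matrix 1.
    SELECT fragment_id, value
      FROM matrix_cell
     WHERE matrix_id = 1
       AND sample_id = 42
     ORDER BY fragment_id;

Under this layout, adding a 1601st sample is just more rows, so the column cap never comes into play; the trade-off is that whole-matrix reads require pivoting on the client side.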