Home > mailing lists

Re: TEXT column > 1Gb - Mailing list pgsql-general

From	Rob Sargent
Subject	Re: TEXT column > 1Gb
Date	April 12, 2023 21:29:34
Msg-id	4e976074-5708-01a8-4cbf-f317e4f424b0@gmail.com Whole thread Raw
In response to	Re: TEXT column > 1Gb (Joe Carlson <jwcarlson@lbl.gov>)
List	pgsql-general

Tree view

On 4/12/23 15:03, Joe Carlson wrote:

On Apr 12, 2023, at 12:21 PM, Rob Sargent <robjsargent@gmail.com> wrote:

On 4/12/23 13:02, Ron wrote:
Must the genome all be in one big file, or can you store them one line per table row?

The assumption in the schema I’m using is 1 chromosome per record. Chromosomes are typically strings of continuous sequence (A, C, G, or T) separated by gaps (N) of approximately known, or completely unknown size. In the past this has not been a problem since sequenced chromosomes were maybe 100 megabases. But sequencing is better now with the technology improvements and tackling more complex genomes. So gigabase chromosomes are common.

A typical use case might be from someone interested in seeing if they can identify the regulatory elements (the on or off switches) of a gene. The protein coding part of a gene can be predicted pretty reliably, but the upstream untranslated region and regulatory elements are tougher. So they might come to our web site and want to extract the 5 kb bit of sequence before the start of the gene and look for some of the common motifs that signify a protein binding site. Being able to quickly pull out a substring of the genome to drive a web app is something we want to do quickly.

Well if you're actually using the sequence, both text and bytea are inherently substring friendly. Your problem goes back to transferring large strings and that's where http/tomcat is you friend. Sounds like you're web friendly already. You have to stream from the client/supplier, of course.

pgsql-general by date:

From: Joe Carlson
Date: 12 April 2023, 21:03:36
Subject: Re: TEXT column > 1Gb

From: Kirk Wolak
Date: 12 April 2023, 21:35:20
Subject: Re: Guidance on INSERT RETURNING order

Re: TEXT column > 1Gb - Mailing list pgsql-general

Previous

Next