Home > mailing lists

Re: EXTERNAL storage and substring on long strings - Mailing list pgsql-performance

From	Tom Lane
Subject	Re: EXTERNAL storage and substring on long strings
Date	July 31, 2003 20:54:01
Msg-id	18305.1059680697@sss.pgh.pa.us Whole thread Raw
In response to	EXTERNAL storage and substring on long strings (Scott Cain <cain@cshl.org>)
List	pgsql-performance

Tree view

Scott Cain <cain@cshl.org> writes:
> explain analyze select substring(residues from 1000000 for 20000)
> from feature where feature_id=1;

> where feature is a table with ~3 million rows, and residues is a text
> column, where for the majority of the rows of feature, it is null, for a
> large minority, it is shortish strings (a few thousand characters), and
> for 6 rows, residues contains very long strings (~20 million characters
> (it's chromosome DNA sequence from fruit flies)).

I think the reason uncompressed storage loses here is that the runtime
is dominated by the shortish strings, and you have to do more I/O to get
at those if they're uncompressed, negating any advantage from not having
to fetch all of the longish strings.

Or it could be that there's a bug preventing John Gray's substring-slice
optimization from getting used.  The only good way to tell that I can
think of is to rebuild PG with profiling enabled and try to profile the
execution both ways.  Are you up for that?

(BTW, if you are using a multibyte database encoding, then that's your
problem right there --- the optimization is practically useless unless
character and byte indexes are the same.)

            regards, tom lane

pgsql-performance by date:

From: "Jim C. Nasby"
Date: 31 July 2003, 20:53:51
Subject: Odd explain estimate

From: Joe Conway
Date: 31 July 2003, 20:56:45
Subject: Re: EXTERNAL storage and substring on long strings

Re: EXTERNAL storage and substring on long strings - Mailing list pgsql-performance

Previous

Next