Robert Haas wrote:
>On Fri, Jan 2, 2009 at 3:23 PM, Stephen R. van den Berg <srb@cuci.nl> wrote:
>> Three things:
>> a. Shouldn't it in theory be possible to have a decompression algorithm
>> which is IO-bound because it decompresses faster than the disk can
>> supply the data? (On common current hardware).
>> b. Has the current algorithm been carefully benchmarked, optimised,
>> or chosen to fit the IO-bound target as closely as possible?
>> c. Are there any well-known pitfalls/objections which would prevent me from
>> changing the algorithm to something more efficient (read: IO-bound)?
>Any compression algorithm is going to require you to decompress the
>entire string before extracting a substring at a given offset. When
>the data is uncompressed, you can jump directly to the offset you want
>to read. Even if the compression algorithm requires no overhead at
>all, it's going to make the location of the data nondeterministic, and
>therefore force additional disk reads.
That shouldn't be insurmountable:
- I currently have difficulty imagining applications that actually do lots
  of substring extractions from large compressible fields. The most likely
  case would be a table containing large tsearch-indexed text fields, but
  those are unlikely to participate in many substring extractions.
- Even if substring operations were likely, I could envision a compressed
  format which compresses in independent chunks of, say, 64KB, each of
  which can then be decompressed on its own; a sketch of that idea
  follows below.
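Roughly something along these lines, using zlib purely for illustration
(the function names, the ChunkedBlob struct, and the chunk-directory
layout are my own invention for this sketch, not anything that exists in
the backend; error checking is elided):

    /* chunk_demo.c - sketch of randomly addressable chunked compression.
     * Compile with: cc chunk_demo.c -lz
     * A substring at uncompressed offset N only needs chunk N/CHUNK_SIZE
     * decompressed, not the whole datum. */
    #include <stdlib.h>
    #include <zlib.h>

    #define CHUNK_SIZE (64 * 1024)

    typedef struct
    {
        size_t  nchunks;
        size_t *off;     /* offset of each compressed chunk in data[] */
        size_t *len;     /* compressed length of each chunk */
        size_t  raw_len; /* total uncompressed length */
        unsigned char *data;
    } ChunkedBlob;

    /* Compress 'src' in independent CHUNK_SIZE pieces, recording a
     * directory of per-chunk offsets for later random access. */
    static ChunkedBlob *
    chunked_compress(const unsigned char *src, size_t srclen)
    {
        ChunkedBlob *b = malloc(sizeof(*b));
        size_t cap = compressBound(CHUNK_SIZE);

        b->nchunks = (srclen + CHUNK_SIZE - 1) / CHUNK_SIZE;
        b->raw_len = srclen;
        b->off = malloc(b->nchunks * sizeof(size_t));
        b->len = malloc(b->nchunks * sizeof(size_t));
        b->data = malloc(b->nchunks * cap);

        for (size_t i = 0, pos = 0; i < b->nchunks; i++)
        {
            uLongf clen = cap;
            size_t raw = (i + 1) * CHUNK_SIZE <= srclen
                         ? CHUNK_SIZE : srclen - i * CHUNK_SIZE;

            compress(b->data + pos, &clen, src + i * CHUNK_SIZE, raw);
            b->off[i] = pos;
            b->len[i] = clen;
            pos += clen;
        }
        return b;
    }

    /* Decompress only the chunk containing uncompressed 'offset' into
     * 'out' (CHUNK_SIZE bytes); the caller then reads the wanted byte
     * at out[offset % CHUNK_SIZE]. */
    static int
    chunked_fetch(const ChunkedBlob *b, size_t offset, unsigned char *out)
    {
        size_t idx = offset / CHUNK_SIZE;
        uLongf outlen = CHUNK_SIZE;

        if (idx >= b->nchunks)
            return -1;
        return uncompress(out, &outlen,
                          b->data + b->off[idx], b->len[idx]) == Z_OK
               ? 0 : -1;
    }

The trade-off is that chunks can't share compression history across
their boundaries, so the overall ratio gets slightly worse in exchange
for not having to decompress everything up to the requested offset.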
--
Sincerely, Stephen R. van den Berg.
"Always remember that you are unique. Just like everyone else."