Re: Converting MySQL tinyint to PostgreSQL - Mailing list pgsql-general
From | Jim C. Nasby |
---|---|
Subject | Re: Converting MySQL tinyint to PostgreSQL |
Date | |
Msg-id | 20050714163036.GO92165@decibel.org Whole thread Raw |
In response to | Re: Converting MySQL tinyint to PostgreSQL (Bruce Momjian <pgman@candle.pha.pa.us>) |
Responses |
Re: Converting MySQL tinyint to PostgreSQL
|
List | pgsql-general |
On Wed, Jul 13, 2005 at 07:52:04PM -0400, Bruce Momjian wrote: > Ron Mayer wrote: > > Martijn van Oosterhout wrote: > > > > > > Well, you get another issue, alignment. If you squeeze your string > > > down, the next field, if it is an int or string, will get padded to a > > > multiple of 4 negating most of the gains. Like in C structures, there > > > is padding to optimise access. > > > > Anecdotally I hear at least as many people say that their database > > is more I/O bound than CPU bound; and it seems that adding bytes > > for alignment is a way of reducing CPU for more disk I/O. > > > > I guess unaligned access so expensive that it makes up for the extra i/o? > > This is a good point. We have always stored data on disk that exactly > matches its layout in memory. We could change that, but no one has > shown it would be a win. Out of curiosity, what would be involved in hacking the backend enough to be able to test this theory out? I'm guessing you'd want to convert between on-disk and in-memory formats as you read pages in, so either on-disk pages would become variable size (and smaller than memory pages) or in-memory pages would become variable size (and larger than on-disk pages). Or maybe as an alternative, would it be possible to determine how much space in a given relation was being wasted due to padding? That could be used to figure out how much IO could be saved on different transactions. While there would be a slight CPU penalty every time you read or write a page, I find it hard to believe it could come close to equaling IO cost. On a side note, I think it might be useful to have a seperate TODO catagory for ideas that need to be tested to see if they're worth implementing for real. This is a case where it's probably substantially easier to estimate (or maybe even measure) how much there is to gain from this than to do the actual work and then see if it helps. It's also likely that a less experienced hacker could test the theory out. Some likely items for this list: Reduce WAL traffic so only modified values are written rather than entire rows? Find a way to reduce rotational delay when repeatedly writing last WAL page Precompile SQL functions to avoid overheadDo async I/O for faster random read-ahead of data Not on todo: Estimate gains from not using the in-memory format of data for on-disk storage Estimate gains from reducing the amount of space used by visibility information in each tuple BTW, what ever happened to the idea of having a list of projects for beginners? (Or maybe it'd be better to assign a numeric difficulty to each TODO item?) -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
pgsql-general by date: