Thread: Client/Server compression?
Just curious, and honestly I haven't looked, but is there any form of compression between clients and servers? Has this been looked at?

Greg
Greg Copeland wrote:
> Just curious, and honestly I haven't looked, but is there any form of
> compression between clients and servers? Has this been looked at?

This issue has never come up before. It is sort of like compressing an FTP session. No one really does that. Is there value in trying it with PostgreSQL?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Well, it occurred to me that if a large result set were to be identified before transport between a client and server, a significant amount of bandwidth may be saved by using a moderate level of compression. Especially with something like result sets, which I tend to believe may lend itself well to compression.

Unlike FTP, which may be (and often is) transferring previously compressed data, raw result sets being transferred between the server and a remote client would, IMHO, tend to compress rather well, as I doubt much of it would be truly random data.

This may be of value for users with low-bandwidth connectivity to their servers or where bandwidth may already be at a premium. The zlib exploit posting got me thinking about this.

Greg

On Thu, 2002-03-14 at 12:20, Bruce Momjian wrote:
> Greg Copeland wrote:
> > Just curious, and honestly I haven't looked, but is there any form of
> > compression between clients and servers? Has this been looked at?
>
> This issue has never come up before. It is sort of like compressing an
> FTP session. No one really does that. Is there value in trying it with
> PostgreSQL?
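For illustration only (not part of the original thread): a minimal C sketch, assuming zlib is installed, of how redundant row-like text of the sort a result set carries behaves under compress2() at a moderate compression level. The sample row contents and the file name in the build comment are hypothetical.

/* Illustrative sketch only: compress a synthetic, redundant "result set"
 * buffer with zlib and report the ratio.  Build with: cc demo.c -lz */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

int main(void)
{
    /* Hypothetical row-like text; real result sets are similarly repetitive. */
    const char *row = "10042|Copeland|Greg|Dallas|TX|2002-03-14|open\n";
    size_t      rowlen = strlen(row);
    size_t      srclen = rowlen * 1000;
    char       *src = malloc(srclen);
    uLongf      dstlen = compressBound(srclen);
    Bytef      *dst = malloc(dstlen);
    size_t      i;

    for (i = 0; i < 1000; i++)
        memcpy(src + i * rowlen, row, rowlen);

    /* Level 6 is zlib's "moderate" setting, between 1 (fast) and 9 (best). */
    if (compress2(dst, &dstlen, (const Bytef *) src, srclen, 6) != Z_OK)
        return 1;

    printf("raw: %zu bytes, compressed: %lu bytes (%.1f%%)\n",
           srclen, (unsigned long) dstlen, 100.0 * dstlen / srclen);

    free(src);
    free(dst);
    return 0;
}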
Greg Copeland wrote:
> Well, it occurred to me that if a large result set were to be identified
> before transport between a client and server, a significant amount of
> bandwidth may be saved by using a moderate level of compression.
> Especially with something like result sets, which I tend to believe may
> lend itself well to compression.
>
> Unlike FTP, which may be (and often is) transferring previously
> compressed data, raw result sets being transferred between the server and
> a remote client would, IMHO, tend to compress rather well, as I doubt
> much of it would be truly random data.

I should have said compressing the HTTP protocol, not FTP.

> This may be of value for users with low-bandwidth connectivity to their
> servers or where bandwidth may already be at a premium.

But don't slow links do the compression themselves, like PPP over a modem?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
> Greg Copeland wrote:
> > Well, it occurred to me that if a large result set were to be identified
> > before transport between a client and server, a significant amount of
> > bandwidth may be saved by using a moderate level of compression.
> > Especially with something like result sets, which I tend to believe may
> > lend itself well to compression.
>
> I should have said compressing the HTTP protocol, not FTP.
>
> > This may be of value for users with low-bandwidth connectivity to their
> > servers or where bandwidth may already be at a premium.
>
> But don't slow links do the compression themselves, like PPP over a
> modem?

Yes, but that's packet-level compression. You'll never get even close to the result you can achieve compressing the set as a whole.

Speaking of HTTP, it's fairly common for web servers (Apache has mod_gzip) to gzip content before sending it to the client (which unzips it silently), especially when dealing with somewhat static content (so it can be cached zipped). This can provide great bandwidth savings.

I'm sceptical of the benefit such compression would provide in this setting, though. We're dealing with sets that would have to be compressed every time (no caching), which might be a bit expensive on a database server. Having it as a default-off option for psql might be nice, but I wonder if it's worth the time, effort, and CPU cycles.
Bruce Momjian wrote:
> Greg Copeland wrote:
> [snip]
> > This may be of value for users with low-bandwidth connectivity to their
> > servers or where bandwidth may already be at a premium.
>
> But don't slow links do the compression themselves, like PPP over a
> modem?

Yes, and not really. Modems have very, very small buffers, so the compression is extremely ineffectual. Link-level compression can be *highly* effective in making client/server communication snappy, since faster processors are tending to push the speed bottleneck onto the wire.

We use HTTP Content-Encoding of gzip for our company and the postgis.refractions.net site, and save about 60% on all the text content on the wire. For highly redundant data (like result sets) the savings would be even greater. I have nothing but good things to say about client/server compression.

-- 
      __
     /
     | Paul Ramsey
     | Refractions Research
     | Email: pramsey@refractions.net
     | Phone: (250) 885-0632
     \_
On Thu, 2002-03-14 at 14:35, Bruce Momjian wrote:
> Greg Copeland wrote:
> > Unlike FTP, which may be (and often is) transferring previously
> > compressed data, raw result sets being transferred between the server and
> > a remote client would, IMHO, tend to compress rather well, as I doubt
> > much of it would be truly random data.
>
> I should have said compressing the HTTP protocol, not FTP.

Except that lots of people compress HTTP traffic (or rather should, if they were smart). Bandwidth is much more expensive than CPU time, and most browsers have built-in support for gzip-encoded data. Take a look at mod_gzip or mod_deflate (2 Apache modules) for more info on this.

IMHO, compressing data would be valuable iff there are lots of people with a low-bandwidth link between Postgres and their database clients. In my experience, that is rarely the case. For example, people using Postgres as a backend for a dynamically generated website usually have their database on the same server (for a low-end site), or on a separate server connected via 100mbit ethernet to a bunch of webservers. In this situation, compressing the data between the database and the webservers will just add more latency and increase the load on the database.

Perhaps I'm incorrect, though -- are there lots of people using Postgres with a slow link between the database server and the clients?

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> This may be of value for users with low-bandwidth connectivity to their
>> servers or where bandwidth may already be at a premium.

> But don't slow links do the compression themselves, like PPP over a
> modem?

Even if the link doesn't compress, shoving the feature into PG itself isn't necessarily the answer. I'd suggest running such a connection through an ssh tunnel, which would give you encryption as well as compression.

			regards, tom lane
You can get some tremendous gains by compressing HTTP sessions - mod_gzip for Apache does this very well. I believe Slashdot saves in the order of 30% of their bandwidth by using compression, as do sites like http://www.whitepages.com.au/ and http://www.yellowpages.com.au/

The mod_gzip trick is effectively very similar to what Greg is proposing. Of course, how often would you connect to your database over anything less than a fast (100mbit+) LAN connection?

In any case, the conversation regarding FE/BE protocol changes occurs frequently, and this thread would certainly impact that protocol. Has any thought ever been put into using an existing standard such as HTTP instead of the current Postgres proprietary protocol? There are a lot of advantages:

* You could leverage the existing client libraries (java.net.URL etc.) to make writing PG clients (JDBC/ODBC/custom) an absolute breeze.
* Result sets / server responses could be returned in XML.
* The protocol handles extensions well (X-* headers).
* Load balancing across a Postgres cluster would be trivial with any number of software/hardware HTTP load balancers.
* The prepared statement work needs to hit the FE/BE protocol anyway...

If the project gurus thought this was worthwhile, I would certainly like to have a crack at it.

Regards,

Mark

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Bruce Momjian
> Sent: Friday, 15 March 2002 6:36 AM
> To: Greg Copeland
> Cc: PostgreSQL Hackers Mailing List
> Subject: Re: [HACKERS] Client/Server compression?
>
> [snip]
>
> But don't slow links do the compression themselves, like PPP over a
> modem?
On Thu, 2002-03-14 at 13:35, Bruce Momjian wrote:
> Greg Copeland wrote:
> [snip]
> > This may be of value for users with low-bandwidth connectivity to their
> > servers or where bandwidth may already be at a premium.
>
> But don't slow links do the compression themselves, like PPP over a
> modem?

Yes and no. Modem compression doesn't understand the nature of the data that is actually flowing through it. As a result, a modem is going to spend an equal amount of time trying to compress the PPP/IP/NetBEUI protocols as it does trying to compress the data contained within those protocol envelopes. Furthermore, modems tend to have a very limited amount of time in which to even attempt compression, which, combined with the fact that they have very limited buffer space, usually limits their ability to provide effective compression. Because of these issues, it's not uncommon for a modem to actually yield a larger compressed block than its input.

I'd also like to point out that there are other low-speed connections in use which do not make use of modems, as well as modems which do not support compression (long-haul modems, for example).

As for your specific example of HTTP versus FTP, I would also like to point out that it is becoming more and more common for gzip'd data to be transported within the HTTP protocol, whereby each end is explicitly aware of the compression taking place on the link and knows what to do with it. Also, believe it or not, one of the common uses of SSH is to provide session compression. It is not unheard of for people to disable the encryption and simply use it as a compression tunnel, which also provides for modest session obscurantism.

Greg
On Thu, 2002-03-14 at 14:14, Neil Conway wrote:
[snip]
> IMHO, compressing data would be valuable iff there are lots of people
> with a low-bandwidth link between Postgres and their database clients.
> In my experience, that is rarely the case.
>
> Perhaps I'm incorrect, though -- are there lots of people using Postgres
> with a slow link between the database server and the clients?

What about remote support of these databases where a VPN may not be available? In my past experience, this was very common, as many companies do not want to expose their database to the outside world, even via a VPN, while allowing only modem access. Not to mention, road warriors that may need to remotely support their databases may find value here too. Would they not?

...I think I'm pretty well coming to the conclusion that it may be of some value... even if only for a limited number of users.

Greg
On Thu, 2002-03-14 at 14:29, Tom Lane wrote:
> Even if the link doesn't compress, shoving the feature into PG itself
> isn't necessarily the answer. I'd suggest running such a connection
> through an ssh tunnel, which would give you encryption as well as
> compression.

Couldn't the same be said for SSL support?

I'd also like to point out that it's *possible* that this could also be a speed boost under certain workloads where extra CPU is available, as less data would have to be transferred through the OS, networking layers, and device drivers. Until zero-copy transfers become common on all platforms for all devices, I would think that it's certainly *possible* that this *could* offer an improvement... well, perhaps a break-even at any rate... Such claims, again, given specific workloads, are not unheard of for compressed file systems, as less device I/O has to take place.

Greg
On the subject of client/server compression, does the server decompress TOAST data before sending it to the client? If so, why (other than requiring modifications to the protocol)?

On the flip side, does/could the client toast insert/update data before sending it to the server?

-Kyle
Kyle wrote:
> On the subject of client/server compression, does the server
> decompress TOAST data before sending it to the client? If so, why
> (other than requiring modifications to the protocol)?
>
> On the flip side, does/could the client toast insert/update data
> before sending it to the server?

It has to decompress it so the server functions can process it too. Hard to avoid that. Of course, in some cases it doesn't need to be processed on the server, just passed along, so it would have to be done conditionally.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Thu, 2002-03-14 at 14:03, Arguile wrote:

[snip]

> I'm sceptical of the benefit such compression would provide in this setting,
> though. We're dealing with sets that would have to be compressed every time
> (no caching), which might be a bit expensive on a database server. Having it
> as a default-off option for psql might be nice, but I wonder if it's worth
> the time, effort, and CPU cycles.

I dunno. That's a good question. For now, I'm making what tends to be a safe assumption (oops... that word) that most database servers will be I/O bound rather than CPU bound. *IF* that assumption holds true, it sounds like it may make even more sense to implement this. I do know that in the past I've seen 90+% compression ratios on many databases and 50%-90% compression ratios on result sets using tunneled compression schemes (which were compressing things other than datasets, which probably hurt overall compression ratios).

Depending on the workload and the available resources on a database system, it's possible that latency could actually be reduced, depending on where you measure it. That is, do you measure latency as first packet back to the remote or last packet back to the remote? If you use last packet, compression may actually win.

My current thoughts are to allow for enabled/disabled compression and variable compression settings (1-9) within a database configuration. Worst case, it may be fun to implement, and I'm thinking there may actually be some surprises as an end result if it's done properly.

In looking at the communication code, it looks like only an 8k buffer is used. I'm currently looking to bump this up to 32k, as most OSes tend to have a sweet throughput spot with buffer sizes between 32k and 64k. Others, depending on the devices in use, like even bigger buffers. Because this may be a minor optimization, especially on a heavily loaded server, we may want to consider making this a configurable parameter.

Greg
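For illustration only (not part of the original thread): a hedged C sketch, assuming zlib, of what a per-connection, level-configurable compression step over the outbound protocol buffer might look like. The function names compress_stream_init() and compress_message() are hypothetical, not existing PostgreSQL routines; Z_SYNC_FLUSH is used so each message remains decodable by the client as soon as it arrives.

/* Hypothetical sketch: deflate an outbound protocol buffer before it is
 * handed to the existing send routine.  "level" would come from a config
 * parameter (1-9); Z_SYNC_FLUSH keeps each message independently decodable
 * by the client without waiting for the stream to end. */
#include <string.h>
#include <zlib.h>

static z_stream out_stream;             /* one stream per connection */

int
compress_stream_init(int level)         /* level: 1 (fast) .. 9 (best) */
{
    memset(&out_stream, 0, sizeof(out_stream));
    return deflateInit(&out_stream, level) == Z_OK ? 0 : -1;
}

/* Compress 'len' bytes from 'src' into 'dst'; returns compressed size or -1. */
long
compress_message(const char *src, size_t len, char *dst, size_t dstsize)
{
    out_stream.next_in = (Bytef *) src;
    out_stream.avail_in = (uInt) len;
    out_stream.next_out = (Bytef *) dst;
    out_stream.avail_out = (uInt) dstsize;

    if (deflate(&out_stream, Z_SYNC_FLUSH) != Z_OK)
        return -1;
    if (out_stream.avail_in != 0)       /* dst too small; caller must retry */
        return -1;

    return (long) (dstsize - out_stream.avail_out);
}

In a sketch like this, the existing send path would call compress_message() on each filled buffer before writing to the socket, with the level taken from the proposed configuration setting.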
On Thu, 2002-03-14 at 19:43, Bruce Momjian wrote:
> Kyle wrote:
> > On the subject of client/server compression, does the server
> > decompress TOAST data before sending it to the client?
>
> It has to decompress it so the server functions can process it too. Hard
> to avoid that. Of course, in some cases it doesn't need to be
> processed on the server, just passed along, so it would have to be done
> conditionally.

Along those lines, it occurred to me that if the compressor somehow knew the cardinality of the data rows involved with the result set being returned, a compressor data dictionary (think of it as a heads-up on patterns to be looking for) could be created using the unique cardinality values, which, I'm thinking, could dramatically improve the level of compression for the data being transmitted. Just some food for thought.

After all, these two seem to be somewhat related, as you wouldn't want the communication layer attempting to recompress data which was natively compressed and needed to be transparently transmitted.

Greg
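For what it's worth, zlib already exposes a mechanism along the lines Greg describes: preset dictionaries. The sketch below is illustrative only (not from the thread); compress_with_dictionary() is a hypothetical helper, and a real implementation would have to transmit or otherwise agree on the identical dictionary bytes with the client, since inflateSetDictionary() needs them for decompression.

/* Hypothetical sketch: prime the compressor with frequently occurring
 * column values ("the dictionary") before compressing a result set.
 * Both ends must agree on the dictionary bytes for this to work. */
#include <string.h>
#include <zlib.h>

int
compress_with_dictionary(const char *dict, size_t dictlen,
                         const char *src, size_t srclen,
                         char *dst, size_t dstlen, unsigned long *outlen)
{
    z_stream zs;
    int      rc;

    memset(&zs, 0, sizeof(zs));
    if (deflateInit(&zs, Z_DEFAULT_COMPRESSION) != Z_OK)
        return -1;

    /* Seed the LZ77 window with common values (e.g. repeated enum-like
     * column contents) so early matches can reference the dictionary. */
    if (deflateSetDictionary(&zs, (const Bytef *) dict, (uInt) dictlen) != Z_OK)
    {
        deflateEnd(&zs);
        return -1;
    }

    zs.next_in = (Bytef *) src;
    zs.avail_in = (uInt) srclen;
    zs.next_out = (Bytef *) dst;
    zs.avail_out = (uInt) dstlen;

    rc = deflate(&zs, Z_FINISH);
    *outlen = zs.total_out;
    deflateEnd(&zs);

    return (rc == Z_STREAM_END) ? 0 : -1;
}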
Greg Copeland wrote:
> On Thu, 2002-03-14 at 14:03, Arguile wrote:
> [snip]
> > I'm sceptical of the benefit such compression would provide in this setting,
> > though. We're dealing with sets that would have to be compressed every time
> > (no caching), which might be a bit expensive on a database server. Having it
> > as a default-off option for psql might be nice, but I wonder if it's worth
> > the time, effort, and CPU cycles.
>
> I dunno. That's a good question. For now, I'm making what tends to be
> a safe assumption (oops... that word) that most database servers will be
> I/O bound rather than CPU bound. *IF* that assumption holds true, it

If you have too much CPU idle time, you wasted money by oversizing the machine. And as soon as you add ORDER BY to your queries, you'll see some CPU used.

I only make the assumption that whenever there is a database server, there is an application server as well (or multiple of them). Scenarios that require direct end-user connectivity to the database server (a la Access -> MSSQL) should NOT be encouraged. The db and app should be very close together, coupled with a dedicated backbone net. No need for encryption, and if volume is a problem, gigabit is the answer.

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
Greg Copeland wrote:
> [cut]
> My current thoughts are to allow for enabled/disabled compression and
> variable compression settings (1-9) within a database configuration.
> Worst case, it may be fun to implement, and I'm thinking there may
> actually be some surprises as an end result if it's done properly.
> [cut]
>
> Greg

Wouldn't Tom's suggestion of riding on top of ssh give similar results? Anyway, it'd probably be a good proof of concept of whether or not it's worth the effort. And that brings up the question: how would you measure the benefit? I'd assume you'd get a good cut in network traffic, but you'll take a hit in CPU time. What's an acceptable tradeoff?

That's one reason I was thinking about the TOAST stuff. If the backend could serve toast, you'd get an improvement in server-to-client network traffic without the server spending CPU time on compression, since the data has previously been compressed.

Let me know if this is feasible (or slap me if this is how things already are): when the backend detoasts data, keep both copies in memory. When it comes time to put data on the wire, instead of putting the whole enchilada down, give the client the compressed toast instead. And yeah, I guess this would require a protocol change to flag the compressed data. But it seems like a way to leverage work already done.

-kf
On Fri, 2002-03-15 at 19:44, Kyle wrote:
[snip]
> Wouldn't Tom's suggestion of riding on top of ssh give similar
> results? Anyway, it'd probably be a good proof of concept of whether
> or not it's worth the effort. And that brings up the question: how
> would you measure the benefit? I'd assume you'd get a good cut in
> network traffic, but you'll take a hit in CPU time. What's an
> acceptable tradeoff?

Good question. I've been trying to think of meaningful testing methods; however, I can still think of reasons all day long where it's not an issue of a "tradeoff". Simply put, if you have a low-bandwidth connection, as long as there are extra cycles available on the server, who really cares... except for the guy at the end of the slow connection.

As for SSH, well, that should be rather obvious. It often is simply not available. While SSH is nice, I can think of many situations where this is a win/win. At least in business settings... which is where I'm assuming the goal is to get Postgres. Also, along those lines, if SSH is the answer, then surely the SSL support should be removed too... as SSH provides for encryption too. Simply put, removing SSL support makes about as much sense as asserting that SSH is the final compression solution.

Also, it keeps being stated that a tangible tradeoff between CPU and bandwidth must be realized. This is, of course, a false assumption. Simply put, if you need bandwidth, you need bandwidth. Its need is not a function of CPU; rather, it's a lack of bandwidth. Having said that, I of course would still like to have something meaningful which reveals the impact on CPU and bandwidth.

I'm talking about something that would be optional. So, what's the cost of having a little extra optional code in place? The only issue, best I can tell, is whether it can be implemented in a backward-compatible manner.

> That's one reason I was thinking about the TOAST stuff. If the
> backend could serve toast, you'd get an improvement in server-to-client
> network traffic without the server spending CPU time on compression,
> since the data has previously been compressed.
>
> Let me know if this is feasible (or slap me if this is how things
> already are): when the backend detoasts data, keep both copies in
> memory. When it comes time to put data on the wire, instead of
> putting the whole enchilada down, give the client the compressed toast
> instead. And yeah, I guess this would require a protocol change to
> flag the compressed data. But it seems like a way to leverage work
> already done.

I agree with that; however, I'm guessing that implementation would require a significantly larger effort than what I'm suggesting... then again, that's probably because I'm not aware of all the code yet. Pretty much, the basic implementation could be in place by the end of this weekend with only a couple hours' worth of work... and then some, mostly because I still don't know lots of the code. The changes you are talking about are going to require not only protocol changes but changes at several layers within the engine.

Of course, something else to keep in mind is that using the TOAST solution requires that TOAST already be in use. What I'm suggesting benefits (size-wise) all types of data being sent back to a client.

Greg
Greg Copeland <greg@CopelandConsulting.Net> writes:
> I'm talking about something that would be optional. So, what's the cost
> of having a little extra optional code in place?

It costs just as much in maintenance effort even if hardly anyone uses it. Actually, probably it costs *more*, since seldom-used features tend to break without being noticed until late beta or post-release, when it's a lot more painful to fix 'em.

FWIW, I was not in favor of the SSL addition either, since (just as you say) it does nothing that couldn't be done with an SSH tunnel. If I had sole control of this project I would rip out the SSL code, in preference to fixing its many problems. For your entertainment I will attach the section of my private TODO list that deals with SSL problems, and you may ask yourself whether you'd rather see that development time expended on fixing a feature that really adds zero functionality, or on fixing things that are part of Postgres' core functionality. (Also note that this list covers *only* problems in libpq's SSL support. Multiply this by jdbc, odbc, etc. to get an idea of what we'd be buying into to support our own encryption handling across the board.)

The short answer: we should be standing on the shoulders of the SSH people, not reimplementing (probably badly) what they do well.

			regards, tom lane


SSL support problems
--------------------

Fix USE_SSL code in fe-connect: move to CONNECTION_MADE case, always do initial connect() in nonblock mode. Per my msg 10/26/01 21:43.

Even better would be to be able to do the SSL negotiation in nonblock mode. Seems like it should be possible from looking at openssl man pages: SSL_connect is documented to work on a nonblock socket. Need to pay attention to SSL_WANT_READ vs WANT_WRITE return codes, however, to determine how to set polling flag.

Error handling for SSL connections is a joke in general, not just lack of attention to WANT READ/WRITE.

Nonblock socket operations are somewhat broken by SSL because of assumption that library will only block waiting for read-ready. Under SSL it could theoretically block waiting for write-ready, though that should be a relatively small problem normally. Possibly add some API to distinguish which case applies? Not clear that it's needed, since worst possible penalty is a busy-wait loop, and it doesn't seem probable that we could ever so block. (Sure? COPY IN could well block that way ... of course COPY IN hardly works in nonblock mode anyway ...)

Fix docs that probably say SSL-enabled lib doesn't support nonblock. Note extreme sloppiness of SSL docs in general, eg the PQREQUIRESSL env var is not docd...

Ought to add API to set allow_ssl_try = FALSE to suppress initial SSL try in an SSL-enabled lib. (Perhaps requiressl = -1? Probably a separate var is better.) Also fix connectDB so that params are accepted but ignored if no SSL support --- or perhaps better, should requiressl=1 fail in that case?

Connection restart after protocol error is a tad ugly: closing/reopening sock is bad for callers, cf note at end of PQconnectPoll, if the sock # should happen to have changed. Fortunately that's just a legacy-server case (pre-7.0).
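For reference only (not part of the thread, and not PostgreSQL source): a minimal sketch of the nonblocking SSL_connect() retry loop that the second TODO item above describes, assuming an already-connected nonblocking socket and an initialized SSL object; nonblocking_ssl_connect() is a hypothetical helper.

/* Illustrative only: drive SSL_connect() on a nonblocking socket,
 * polling for readability or writability as OpenSSL requests. */
#include <poll.h>
#include <openssl/ssl.h>

int
nonblocking_ssl_connect(SSL *ssl, int sock)
{
    for (;;)
    {
        int ret = SSL_connect(ssl);

        if (ret == 1)
            return 0;                      /* handshake complete */

        switch (SSL_get_error(ssl, ret))
        {
            case SSL_ERROR_WANT_READ:
            {
                struct pollfd pfd = { sock, POLLIN, 0 };
                if (poll(&pfd, 1, -1) < 0)
                    return -1;
                break;                     /* retry SSL_connect() */
            }
            case SSL_ERROR_WANT_WRITE:
            {
                struct pollfd pfd = { sock, POLLOUT, 0 };
                if (poll(&pfd, 1, -1) < 0)
                    return -1;
                break;                     /* retry SSL_connect() */
            }
            default:
                return -1;                 /* hard failure */
        }
    }
}

In libpq proper, the WANT_READ/WANT_WRITE distinction would be surfaced to the caller's polling loop rather than blocking in poll() as this standalone sketch does.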
Some questions for you at the end of this, Tom... which I'd been thinking about... and you touched on... hey, you did tell me to ask! :)

On Sat, 2002-03-16 at 14:38, Tom Lane wrote:
> Greg Copeland <greg@CopelandConsulting.Net> writes:
> > I'm talking about something that would be optional. So, what's the cost
> > of having a little extra optional code in place?
>
> It costs just as much in maintenance effort even if hardly anyone uses
> it. Actually, probably it costs *more*, since seldom-used features
> tend to break without being noticed until late beta or post-release,
> when it's a lot more painful to fix 'em.

That wasn't really what I was asking...

> FWIW, I was not in favor of the SSL addition either, since (just as you
> say) it does nothing that couldn't be done with an SSH tunnel. If I had
> sole control of this project I would rip out the SSL code, in preference

Except we seemingly don't see eye to eye on it. SSH just is not very useful in many situations simply because it may not always be available. Now, bring Win32 platforms into the mix and SSH really isn't an option at all... not without bringing extra boxes to the mix. Ack!

I guess I don't really understand why you seem to feel that items such as compression and encryption don't belong... compression I can sorta see; however, without supporting evidence one way or another, I guess I don't understand the resistance without knowing the whole picture. I would certainly hope the jury would be out on this until some facts to paint a picture are at least available.

Encryption, on the other hand, clearly DOES belong in the database (and not just I think so) and should not be thrust onto other applications, such as SSH, when it may not be available or may be politically risky to use. That, of course, doesn't even address the issues of where it may be impractical for some users, types of applications, or platforms. SSH is a fine application which addresses many issues; however, it certainly is not an end-all, do-all encryption/compression solution. Does that mean SSL should be the native encryption solution? I'm not sure I have an answer to that; however, encryption should be natively available, IMHO.

As for the laundry list of items... those are simply issues that should have been worked out prior to the code being merged; it migrated to being a maintenance issue. That's not really applicable to most situations if an implementation is well coded and complete prior to it being merged into the code base. Lastly, stating that a maintenance cost of one implementation is a shared cost for all unrelated sections of code is naive at best. Generally speaking, the level of maintenance is inversely proportional to the quality of a specific design and implementation.

At this point in time, I'm fairly sure I'm going to code up a compression layer to play with. If it never gets accepted, I'm pretty sure I'm okay with that. I guess if it's truly worthy, it can always reside in the contributed section. On the other hand, if value can be found in such an implementation, and all things being equal, I guess I wouldn't understand why it wouldn't be accepted.

================================ questions ================================

If I implement compression between the BE and the FE libpq, does that mean that it needs to be added to the other interfaces as well? Do all interfaces (JDBC, ODBC, etc.) receive the same BE messages?

Is there any documentation which covers the current protocol implementation? Specifically, I'm interested in the negotiation section... I have been reading code already.

Have you never had to support a database via modem? I have, and I can tell you, compression was God-sent. You do realize that this situation is more common than you seem to think it is? Maybe not for Postgres databases now... but for databases in general.

Greg
You can also use stunnel for SSL. Preferable to having SSL in postgresql, I'd think.

Cheerio,

Link.

At 03:38 PM 3/16/02 -0500, Tom Lane wrote:
>FWIW, I was not in favor of the SSL addition either, since (just as you
>say) it does nothing that couldn't be done with an SSH tunnel. If I had
Greg Copeland <greg@copelandconsulting.net> writes:
> Except we seemingly don't see eye to eye on it. SSH just is not very
> useful in many situations simply because it may not always be
> available. Now, bring Win32 platforms into the mix and SSH really isn't
> an option at all... not without bringing extra boxes to the mix. Ack!

Not so. See http://www.openssh.org/windows.html.

> If I implement compression between the BE and the FE libpq, does that
> mean that it needs to be added to the other interfaces as well?

Yes.

> Is there any documentation which covers the current protocol
> implementation?

Yes. See the protocol chapter in the developer's guide.

> Have you never had to support a database via modem?

Yes. ssh has always worked fine for me ;-)

> You do realize that this situation
> is more common than you seem to think it is?

I was not the person claiming that low-bandwidth situations are of no interest. I was the person claiming that the Postgres project should not expend effort on coding and maintaining our own solutions, when there are perfectly good solutions available that we can sit on top of. Yes, a solution integrated into Postgres would be easier to use and perhaps a bit more efficient --- but do the incremental advantages of an integrated solution justify the incremental cost? I don't think so. The advantages seem small to me, and the long-term costs not so small.

			regards, tom lane