Re: libpq compression - Mailing list pgsql-hackers

From: Konstantin Knizhnik
Subject: Re: libpq compression
Msg-id: f02d6f64-fe5e-f507-4870-f433cf253441@postgrespro.ru
In response to: Re: libpq compression (Justin Pryzby <pryzby@telsasoft.com>)
Responses: Re: libpq compression; Re: libpq compression
List: pgsql-hackers
On 09.01.2021 23:31, Justin Pryzby wrote:
> On Thu, Dec 17, 2020 at 05:54:28PM +0300, Konstantin Knizhnik wrote:
>> I am maintaining this code in
>> git@github.com:postgrespro/libpq_compression.git repository.
>> I will be pleased if anybody, who wants to suggest any bug
>> fixes/improvements of libpq compression, create pull requests: it will be
>> much easier for me to merge them.
> Thanks for working on this.
> I have a patch for zstd compression in pg_dump so I looked at your patch.
> I'm attaching some language fixes.

Thank you very much. I applied your patch on top of the pull request from Daniil Zakhlystov, who has implemented support for using different compressors in the two directions.

Frankly speaking, I am still very skeptical about too much flexibility in the compression configuration:
- toggling compression on the fly
- using different compression algorithms in both directions
- specifying arbitrary compression levels

According to Daniil's results there is only a 30% difference in compression ratio between zstd:1 and zstd:19, but by making it possible to specify an arbitrary compression level we give users a simple tool to attack the server (by causing CPU/memory exhaustion).

>> +zstd_create(int level, zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg, char* rx_data, size_t rx_data_size)
>> +zlib_create(int level, zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg, char* rx_data, size_t rx_data_size)
>> +build_compressors_list(PGconn *conn, char** client_compressors, bool build_descriptors)
> Are you able to run pg_indent to fix all the places where "*" is before the
> space ? (And similar style issues).

Also done by Daniil.

> There are several compression patches in the commitfest, I'm not sure how much
> they need to be coordinated, but for sure we should coordinate the list of
> compressions available at compile time.
>
> Maybe there should be a central structure for this, or maybe just the ID
> numbers of compressors should be a common enum/define. In my patch, I have:
>
> +struct compressLibs {
> +	const CompressionAlgorithm alg;
> +	const char *name;		/* Name in -Z alg= */
> +	const char *suffix;		/* file extension */
> +	const int defaultlevel;		/* Default compression level */
> +};
>
> Maybe we'd also want to store the "magic number" of each compression library.
> Maybe there'd also be a common parsing of compression options.
>
> You're supporting a syntax like zlib,zstd:5, but zstd also supports long-range,
> checksum, and rsyncable modes (rsyncable is relevant to pg_dump, but not to
> libpq).

There are at least three places in Postgres where compression is used right now:
1. TOAST and extended attributes compression
2. pg_basebackup compression
3. pg_dump compression

And there are also patches for:
4. page compression
5. protocol compression

It is awful that all five of these places use compression in their own way. It seems to me that compression (as well as all other system-dependent stuff, such as socket IO, file IO, synchronization primitives, ...) should be extracted into a SAL (system abstraction layer) where it can then be used by the backend, the frontend and utilities, including external utilities like pg_probackup, pg_bouncer, Odyssey, ... Unfortunately such refactoring requires so much effort that it has a chance of being committed only if the work is coordinated by one of the core committers.
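Just to illustrate the idea of such a shared piece (this is only a sketch with made-up names, not code from any of the existing patches), it could be little more than a stable enum plus a descriptor:

	#include <stdbool.h>

	typedef enum pg_compression_id
	{
		PG_COMPRESSION_NONE = 0,
		PG_COMPRESSION_ZLIB = 1,
		PG_COMPRESSION_ZSTD = 2		/* IDs fixed once assigned, never reused */
	} pg_compression_id;

	typedef struct pg_compression_spec
	{
		pg_compression_id id;		/* stable identifier, usable on the wire */
		const char *name;		/* "zlib", "zstd", ... */
		int	default_level;		/* default compression level */
		bool	available;		/* was the library compiled in? */
	} pg_compression_spec;

Then libpq, pg_basebackup and pg_dump would differ only in how they move the compressed bytes around, not in how they name and describe the algorithms.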
> I think your patch has an issue here. You have this:
>
> src/interfaces/libpq/fe-connect.c
>
> +		pqGetc(&resp, conn);
> +		index = resp;
> +		if (index == (char)-1)
> +		{
> +			appendPQExpBuffer(&conn->errorMessage,
> +				libpq_gettext(
> +				"server is not supported requested compression algorithms%s\n"),
> +				conn->compression);
> +			goto error_return;
> +		}
> +		Assert(!conn->zstream);
> +		conn->zstream = zpq_create(conn->compressors[index].impl,
> +			conn->compressors[index].level,
> +			(zpq_tx_func)pqsecure_write, (zpq_rx_func)pqsecure_read,conn,
> +			&conn->inBuffer[conn->inCursor], conn->inEnd-conn->inCursor);
>
> This takes the "index" returned by the server and then accesses
> conn->compressors[index] without first checking if the index is out of range,
> so a malicious server could (at least) crash the client by returning index=666.

Thank you for pointing out this problem. I will add the check, although I think that the problem of a malicious server is less critical than that of a malicious client. Also, such a "byzantine" server may return wrong data, which in many cases is more fatal than a client crash.
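For reference, a minimal form of that check could look roughly like this (only a sketch; I assume here that the client keeps the number of entries of conn->compressors[] in a field such as n_compressors):

	/* n_compressors is assumed to hold the length of conn->compressors[] */
	if (index < 0 || index >= conn->n_compressors)
	{
		appendPQExpBuffer(&conn->errorMessage,
			libpq_gettext("server returned invalid compression algorithm index: %d\n"),
			(int) index);
		goto error_return;
	}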
> I suggest that there should be an enum of algorithms, which is constant across
> all servers. They would be unconditionally included and not #ifdef depending
> on compilation options.

I do not think that this is possible (even right now it is possible to build Postgres without zlib support). Also, if new compression algorithms are added, then in any case we have to somehow handle the situation when an old client is connected to a new server and vice versa.

> That would affect the ZpqAlgorithm data structure, which would include an ID
> number similar to
> src/bin/pg_dump/compress_io.h:typedef enum...CompressionAlgorithm;
>
> The CompressionAck would send the ID rather than the "index".
> A protocol analyzer like wireshark could show "Compression: Zstd".
> You'd have to verify that the ID is supported (and not bogus).
>
> Right now, when I try to connect to an unpatched server, I get:
> psql: error: expected authentication request from server, but received v

Thank you for pointing this out: fixed.

> +/*
> + * Array with all supported compression algorithms.
> + */
> +static ZpqAlgorithm const zpq_algorithms[] =
> +{
> +#if HAVE_LIBZSTD
> +	{zstd_name, zstd_create, zstd_read, zstd_write, zstd_free, zstd_error, zstd_buffered_tx, zstd_buffered_rx},
> +#endif
> +#if HAVE_LIBZ
> +	{zlib_name, zlib_create, zlib_read, zlib_write, zlib_free, zlib_error, zlib_buffered_tx, zlib_buffered_rx},
> +#endif
> +	{no_compression_name}
> +};
>
> In config.sgml, it says that libpq_compression defaults to on (on the server
> side), but in libpq.sgml it says that it defaults to off (on the client side).
> Is that what's intended ? I would've thought the defaults would match, or that
> the server would enforce a default more conservative than the client's (the DBA
> should probably have to explicitly enable compression, and need to "opt-in"
> rather than "opt-out").

Yes, this is the intended behavior: the libpq_compression GUC makes it possible to prohibit compression requests from clients if for some reason (security, CPU consumption) compression is not desired. But by default the server should support compression if it is requested by a client. The client, on the other hand, should not request compression by default: it makes sense only for queries returning large result sets or for transferring a lot of data (like COPY).

> Maybe instead of a boolean, this should be a list of permitted compression
> algorithms. This allows the admin to set a "policy" rather than using the
> server's hard-coded preferences. This could be important to disable an
> algorithm at run-time if there's a vulnerability found, or performance problem,
> or buggy client, or for diagnostics, or performance testing. Actually, I think
> it may be important to allow the admin to disable not just an algorithm, but
> also its options. It should be possible for the server to allow "zstd"
> compression but not "zstd --long", or zstd:99 or arbitrarily large window
> sizes. This seems similar to SSL cipher strings. I think we'd want to be able
> to allow (or prohibit) at least alg:* (with options) or alg (with default
> options).

Sorry, maybe you are not looking at the latest version of the patch? Right now the "compression" parameter accepts not only boolean values but also a list of suggested algorithms with an optional compression level, like "zstd:1,zlib".

> Your patch documents "libpq_compression=auto" but that doesn't work:
> WARNING: none of specified algirthms auto is supported by client
> I guess you mean "any" which is what's implemented.
> I suggest to just remove that.

This was an inconsistency with the documentation. It seems to me that I have already fixed it: "auto" was renamed to "any".

> I think maybe your patch should include a way to trivially change the client's
> compile-time default:
> "if (!conn->compression) conn->compression = DefaultCompression"

It can be done using the PG_COMPRESSION environment variable. Do we need some other mechanism for it?

> Your patch warns if *none* of the specified algorithms are supported, but I
> wonder if it should warn if *any* of them are unsupported (like if someone
> writes libz instead of zlib, or zst/libzstd instead of zstd, which I guess
> about half of us will do).

Ugh... Handling typos in the connection string does not seem like a good idea to me. And the situation where some of the algorithms are not supported by the server seems normal (a new client connecting to an old server), so I do not want to produce a warning in this case. I once again want to repeat my opinion: the choice of compression algorithm should be made automatically. The client should just express its intention to use compression (using compression=on or compression=any), and the server should choose the most efficient algorithm which is supported by both of them.
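To make that concrete, the selection I have in mind is only a few lines (a sketch with illustrative names, not code from the patch): the server scans its own preference-ordered list and picks the first algorithm the client also offered.

	#include <string.h>

	typedef struct { const char *name; } algorithm;	/* illustrative only */

	/*
	 * Return the index of the most preferred server algorithm that the client
	 * also offered, or -1 if there is no common one (use no compression).
	 */
	static int
	choose_algorithm(const algorithm *server, int n_server,
			 char **client, int n_client)
	{
		for (int i = 0; i < n_server; i++)
			for (int j = 0; j < n_client; j++)
				if (strcmp(server[i].name, client[j]) == 0)
					return i;
		return -1;
	}

With something like this, the client never has to know which algorithms a particular server build supports.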
> $ LD_LIBRARY_PATH=./src/interfaces/libpq PGCOMPRESSION=abc,def src/bin/psql/psql 'host=localhost port=1111'
> WARNING: none of the specified algorithms are supported by client: abc,def
> $ LD_LIBRARY_PATH=./src/interfaces/libpq PGCOMPRESSION=abc,zlib src/bin/psql/psql 'host=localhost port=1111'
> (no warning)
>
> The libpq_compression GUC can be set for a user, like ALTER ROLE .. SET ...
> Is there any utility in making it configurable by client address? Or by
> encryption? I'm thinking of a policy like "do not allow compression from LOCAL
> connection" or "do not allow compression on encrypted connection". Maybe this
> would be somehow integrated into pg_hba. But maybe it's not needed (a separate
> user could be used to allow/disallow compression).

There is definitely no sense in using compression together with encryption: if compression is desired in that case, it can be done at the SSL level. But I do not think that we need a more sophisticated mechanism to prohibit compression requests at the server level. Yes, it is possible to grant the privilege to use compression to some particular role, or to prohibit it for SSL connections, but I cannot imagine natural arguments for it. Maybe I am wrong. I will be pleased if somebody can describe some realistic scenarios where any of these features may be needed (taking into account that compression of libpq traffic is not something very bad, insecure or expensive that can harm the server). Misuse of libpq compression may just consume some extra CPU (but not so much as to overload the server). In any case, we do not have such a mechanism to restrict the use of SSL connections to particular roles or to disable it for local connections. Why do we need it for compression, which is very similar to encryption?

The new version of the libpq compression patch is attached. It can also be found at git@github.com:postgrespro/libpq_compression.git

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company