Thread: How to store files into the DB with PHP. (bytea ?)
Hi, I know this question came quite often, but I never found a clear answer in the mailing list. My problem : - I need to store files (pdf, png, jpg) from a web page in PHP. - I have a strong need of security and high availability. I must minimize the risk of having these documents stolen if the web server is compromised. My questions : - Which datatype ? (once again, sorry.). (Text, bytea, lo) ? - Most of samples redirect to lo (large objects), and PHP has a set of dedicated functions for them. But in the documentation we can read : All large objects are placed in a single system table called pg_largeobject. PostgreSQL also supports a storage system called "TOAST" that automatically stores values larger than a single database page into a secondary storage area per table. This makes the large object facility partially obsolete. Why obsolete ? Is it now because of TOAST that we can store up to 1GB into bytea ? If bytea is the easiest solution, why do so many people redirect to LO ? LO stores the datas within only one table, pg_largeobject, I really prefer to store the big datas within my custom table stored on a dedicated tablespace. Due to the high security level of my web application, all the datas must be accessed through stored procedures. Does it change something to the choice ? Where can I find samples of files stored through PHP & stored procedures ? Regards, -- François Delpierre
> Hi, > > I know this question came quite often, but I never found a clear answer in > the > mailing list. > > My problem : > - I need to store files (pdf, png, jpg) from a web page in PHP. > - I have a strong need of security and high availability. I must minimize > the > risk of having these documents stolen if the web server is compromised. If the web server is compromised, the attacker will have full access to the database just as the web application (the connection details are stored in some configuration file, and he may load all the data he wants using a custom PHP script uploaded to the server). You may increase the security by encrypting the files, but the password must not be saved in the database. For example if the files are not shared by multiple users, each file may be encrypted using the user password. > My questions : > - Which datatype ? (once again, sorry.). (Text, bytea, lo) ? > - Most of samples redirect to lo (large objects), and PHP has a set of > dedicated functions for them. But in the documentation we can read : > > All large objects are placed in a single system table called > pg_largeobject. > PostgreSQL also supports a storage system called "TOAST" that > automatically > stores values larger than a single database page into a secondary storage > area > per table. This makes the large object facility partially obsolete. > > Why obsolete ? Is it now because of TOAST that we can store up to 1GB into > bytea ? If bytea is the easiest solution, why do so many people redirect > to LO > ? > LO stores the datas within only one table, pg_largeobject, I really prefer > to > store the big datas within my custom table stored on a dedicated > tablespace. Choice between BYTEA and LOB is mostly about memory requirements - when loading / storing BYTEA columns, you need enough memory to keep the whole object (data stored in the column). So when constructing the INSERT statement you have to place the (escaped) data into it, and that may exceed the memory_limit. With LOB objects this is not a problem thanks to streaming (you may open the LOB as a stream and read / write). On the other side, LOB columns do not have some interesting features (foreign keys, etc.) > Due to the high security level of my web application, all the datas must > be > accessed through stored procedures. Does it change something to the choice > ? > > Where can I find samples of files stored through PHP & stored procedures ? Sorry, but you have some custom layer of stored procedures, so we may only guess. But LOBs are identified using an OID, so the procedures might return OIDs. regards Tomas
On Monday 02 February 2009 12:55:04 tv@fuzzy.cz wrote: > > Hi, > > > > I know this question came quite often, but I never found a clear answer > > in the > > mailing list. > > > > My problem : > > - I need to store files (pdf, png, jpg) from a web page in PHP. > > - I have a strong need of security and high availability. I must minimize > > the > > risk of having these documents stolen if the web server is compromised. > > If the web server is compromised, the attacker will have full access to > the database just as the web application (the connection details are > stored in some configuration file, and he may load all the data he wants > using a custom PHP script uploaded to the server). > > You may increase the security by encrypting the files, but the password > must not be saved in the database. For example if the files are not shared > by multiple users, each file may be encrypted using the user password. I do not agree, this is in fact the reason why stored procedures are mandatory here. If you use stored procedures for security, the credentials that are stored on your web server are restricted to execute procedures, and eventually have a read only access to some tables with no critical informations (ie translations), all other tables are simply forbidden for the web server user. For this security level to work, it means that you use a "login" stored procedure that returns a "ticket" to the web server and/or web browser, valid for a certain amount of time and for the source IP. Every other procedures will require a valid "ticket" in the arguments before it returns data.) This way, the sessions are handled within the database, and the user/password stored in the front end web server cannot read any data from the database without going through a stored procedure that enforce the security. An other option maybe would be to authenticate users on the database directly, and only use a stored procedure (with security definer) to create new users. I do not know exactly the pro/cons to work this way. I would prefer to keep my users within an application table instead of the DB (and in a system table), but this may be the easiest solutions as the stored procedures would no more be required. > > > My questions : > > - Which datatype ? (once again, sorry.). (Text, bytea, lo) ? > > - Most of samples redirect to lo (large objects), and PHP has a set of > > dedicated functions for them. But in the documentation we can read : > > > > All large objects are placed in a single system table called > > pg_largeobject. > > PostgreSQL also supports a storage system called "TOAST" that > > automatically > > stores values larger than a single database page into a secondary storage > > area > > per table. This makes the large object facility partially obsolete. > > > > Why obsolete ? Is it now because of TOAST that we can store up to 1GB > > into bytea ? If bytea is the easiest solution, why do so many people > > redirect to LO > > ? > > LO stores the datas within only one table, pg_largeobject, I really > > prefer to > > store the big datas within my custom table stored on a dedicated > > tablespace. > > Choice between BYTEA and LOB is mostly about memory requirements - when > loading / storing BYTEA columns, you need enough memory to keep the whole > object (data stored in the column). So when constructing the INSERT > statement you have to place the (escaped) data into it, and that may > exceed the memory_limit. With LOB objects this is not a problem thanks to > streaming (you may open the LOB as a stream and read / write). So, there is no way to push binary directly into BYTEA without the need to escape ? I assume that you're talking about memory_limit parameter of PHP on the web server. Can we still use streaming through a stored procedure ? > > On the other side, LOB columns do not have some interesting features > (foreign keys, etc.) > > > Due to the high security level of my web application, all the datas must > > be > > accessed through stored procedures. Does it change something to the > > choice ? > > > > Where can I find samples of files stored through PHP & stored procedures > > ? > > Sorry, but you have some custom layer of stored procedures, so we may only > guess. But LOBs are identified using an OID, so the procedures might > return OIDs. The web server user cannot read (or write) any table directly, so it's forced to use a stored procedure to read or write data. So, all the PHP fuctions on LOB are useless I guess. > > regards > Tomas Regards, -- François Delpierre
>> Choice between BYTEA and LOB is mostly about memory requirements - when >> loading / storing BYTEA columns, you need enough memory to keep the >> whole >> object (data stored in the column). So when constructing the INSERT >> statement you have to place the (escaped) data into it, and that may >> exceed the memory_limit. With LOB objects this is not a problem thanks >> to >> streaming (you may open the LOB as a stream and read / write). > So, there is no way to push binary directly into BYTEA without the need to > escape ? AFAIK no (at least in PHP) - you have to escape the data before using them in SQL statement. > I assume that you're talking about memory_limit parameter of PHP on the > web > server. > Can we still use streaming through a stored procedure ? The LOB principle is quite simple - each LOB is identified by OID, split into multiple segments and stored in a table pg_largeobject. I am just guessing but there are functions used to read / store the LOBs, so the streaming might work (unless you want to use the tickets). But I am just guessing - about a month ago I wrote an article about ways to store files in a PostgreSQL database: http://www.fuzzy.cz/en/articles/storing-files-in-a-postgresql-database/ The site is still under construction, so a lot of stuff is missing there (sorry), but the last section might give you a hint how to mimic LOBs, get rid of some of the BYTEA / LOB disadvantages. >> Sorry, but you have some custom layer of stored procedures, so we may >> only >> guess. But LOBs are identified using an OID, so the procedures might >> return OIDs. > The web server user cannot read (or write) any table directly, so it's > forced > to use a stored procedure to read or write data. So, all the PHP fuctions > on > LOB are useless I guess.
On Monday 02 February 2009 14:17:38 tv@fuzzy.cz wrote: > > But I am just guessing - about a month ago I wrote an article about ways > to store files in a PostgreSQL database: > > http://www.fuzzy.cz/en/articles/storing-files-in-a-postgresql-database/ > > The site is still under construction, so a lot of stuff is missing there > (sorry), but the last section might give you a hint how to mimic LOBs, get > rid of some of the BYTEA / LOB disadvantages. Thanks for this address, this clearly expose the problem. It looks like there is no perfect solution. As I can live with a file size limit of 15Mo for a memory_limit of 64Mo in PHP, I'll certainly go to BYTEA. Anyway, if I use stored procedures to enforce security, the PHP pg_lo_xxx functions will be useless. Thanks a lot, François Delpierre
On Mon, 2009-02-02 at 13:55 +0100, François Delpierre wrote: > On Monday 02 February 2009 12:55:04 tv@fuzzy.cz wrote: > > > > If the web server is compromised, the attacker will have full access to > > the database just as the web application (the connection details are > > stored in some configuration file, and he may load all the data he wants > > using a custom PHP script uploaded to the server). > > > > You may increase the security by encrypting the files, but the password > > must not be saved in the database. For example if the files are not shared > > by multiple users, each file may be encrypted using the user password. > I do not agree, this is in fact the reason why stored procedures are mandatory > here. I don't see that this changes things. Whether you use stored procedures, authenticate against the database, or whatever, your web application layer has access to the information on the way through and any compromise of your webserver will necessarily involve having a 'man in the middle' possibility. > If you use stored procedures for security, the credentials that are stored on > your web server are restricted to execute procedures, and eventually have a > read only access to some tables with no critical informations (ie > translations), all other tables are simply forbidden for the web server user. > For this security level to work, it means that you use a "login" stored > procedure that returns a "ticket" to the web server and/or web browser, valid > for a certain amount of time and for the source IP. Every other procedures > will require a valid "ticket" in the arguments before it returns data.) So an attacker would (e.g.) log the user's credentials as they pass through and then happily generate their own tickets to use to extract the data. > This way, the sessions are handled within the database, and the user/password > stored in the front end web server cannot read any data from the database > without going through a stored procedure that enforce the security. > > An other option maybe would be to authenticate users on the database directly, > and only use a stored procedure (with security definer) to create new users. I > do not know exactly the pro/cons to work this way. I would prefer to keep my > users within an application table instead of the DB (and in a system table), > but this may be the easiest solutions as the stored procedures would no more > be required. So in this case an attacker would (e.g.) log the user's credentials on the way through and use them for subsequent connections to the database... That's not to say that you aren't adding some security: you are. Making an intruder understand how your application works before being able to steal your data will naturally give you a longer window of opportunity to catch them. Probably you would have raised the bar sufficiently above the interest level of an opportunistic script kiddy. Odds are, though, that the level of difference between someone who would be deliberately cracking your server to pull the data from your database, and someone who would be deliberately cracking your server, understanding your application and then pulling the data from your database isn't that great. If you take your paranoia one step further, do the user authentication in the database using some kind of one-time password device then *maybe* you will be getting a more repeatable level of security. In that case someone would only be able to sniff the files as they were accessed - and would not be able to extract them from the database. Cheers, Andrew McMillan. ------------------------------------------------------------------------ andrew (AT) morphoss (DOT) com +64(272)DEBIAN Open Source: the difference between trust and antitrust ------------------------------------------------------------------------
Hi Andrew, > I don't see that this changes things. Whether you use stored > procedures, authenticate against the database, or whatever, your web > application layer has access to the information on the way through and > any compromise of your webserver will necessarily involve having a 'man > in the middle' possibility. You're right, authenticating against the DB will not change anything, my mistake. As far as the user can read a table, he can read all records. > So an attacker would (e.g.) log the user's credentials as they pass > through and then happily generate their own tickets to use to extract > the data. Totally agree, the attacker will be able to access the files of the users that are connecting from the time he put the sniffer in place BUT NOT dump the whole content with thousands of documents from the previous months from users that did not connect recently. So, this limit the impact. To go back to the initial subject of this post, I'm now able to store/read files from the DB up to 20MB without problem. Without using stored procedures yet. (Maybe I can post the code here.) Only an annoying warning about escaping that I can't figure out yet. François.