Thread: Storing videos or video files in the DB
Hello,
I am using PostgreSQL 9.3.5, Django 1.7.5, and Python 3.3.

I am working on an application where I will get video files from a mobile app, and I have to store them on the server. So my question is: how do I store videos or video files in the DB, or do I need to store only the link to that video file? If yes, then where do I have to store the file, and how do I access that file to display it in an HTML page?

If anyone is helping me out, please give me all the details related to storing videos in the DB, not just hints (what data type do I have to use? how to encode and decode? how to accept files through HTTP POST?).

Regards,
Venktesh Guttedar.

VENKTESH GUTTEDAR wrote:
> I am using PostgreSQL 9.3.5, Django 1.7.5, and Python 3.3.
>
> I am working on an application where I will get video files from a mobile app, and I have to store them on the server.
> So my question is: how do I store videos or video files in the DB, or do I need to store only the link
> to that video file? If yes, then where do I have to store the file, and how do I access that file to display
> it in an HTML page?
>
> If anyone is helping me out, please give me all the details related to storing videos in the DB, not just
> hints (what data type do I have to use? how to encode and decode? how to accept files through HTTP POST?).

I cannot help you with the web end of things (wrong forum), but on the database side you have, as you said, the choice between binary data in the database or files on the file system with links to them in the database. I'll tell you the pros and cons.

If you store data in the database, you don't have to worry about consistency, which will make development simpler. The downside is that you might end up with a huge database that you will have to back up and maintain, and the performance will be worse than reading files from the file system.

I'd consider:
- how big the files are
- how many of them there are
- whether performance is critical
- how much time you can invest in development
- how critical consistency is

The data type to use would be "bytea", unless the files are very large or you need support for streaming in the database, in which case "large objects" would have advantages. The big downsides of large objects are that they are not as simple as bytea and you'll have to maintain consistency between the large objects and the references to them in tables.

Yours,
Laurenz Albe
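A minimal sketch of the bytea approach Laurenz describes, using psycopg2 from Python (the OP's stack). The `videos` table layout is illustrative, not something from the thread, and the connection object is assumed to come from `psycopg2.connect(...)`:

```python
# Sketch: store and fetch a whole video as a bytea column.
# Table and column names are assumptions for illustration.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS videos (
    id      serial PRIMARY KEY,
    name    text NOT NULL,
    content bytea NOT NULL   -- the whole file as binary
)
"""

def store_video(conn, name, path):
    """Read the file and insert it as bytea; returns the new row id.
    Note: the whole file is held in memory during the insert."""
    with open(path, "rb") as f:
        data = f.read()
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO videos (name, content) VALUES (%s, %s) RETURNING id",
            (name, data),   # psycopg2 adapts Python bytes to bytea
        )
        return cur.fetchone()[0]

def fetch_video(conn, video_id):
    """Return the raw bytes for one video, or None if not found."""
    with conn.cursor() as cur:
        cur.execute("SELECT content FROM videos WHERE id = %s", (video_id,))
        row = cur.fetchone()
        return bytes(row[0]) if row else None
```

Both functions materialize the entire file in memory, which is the main cost of bytea for large videos; that trade-off comes up repeatedly later in the thread.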
On Wed, 17 Dec 2014 15:09:40 +0530 VENKTESH GUTTEDAR <venkteshguttedar@gmail.com> wrote:
> So my question is: how do I store videos or video files in the DB, or do I
> need to store only the link to that video file? If yes, then where do I have to
> store the file, and how do I access that file to display it in an HTML page?

If you store your file in the file system, all you need is to store the path to it. Display a link to the file in your HTML, and the server will deliver the file to the user's web browser, which in turn will open it with the proper program on the user's machine. I do it with all sorts of extensions (docx, xlsx, pdf...), and it works very well.

--
Salutations, Vincent Veyron
https://marica.fr/
Litigation, insurance claim and contract management software for legal departments
Albe Laurenz wrote on 17.12.2014 at 11:07:
> and the performance will be worse than reading files from the file system.

There is a Microsoft research paper [1] (from 2006) which tested this "myth" using SQL Server. It showed that the database might actually be faster than the file system.

As this topic comes up at my workplace every now and then as well, I created a little web application (Java/JDBC) to test this on Postgres and possibly other DBMS.

It turns out that Postgres likewise isn't really slower at this than the file system.

For small files around 50k both perform similarly: the average time to read the blob from a bytea column was around 2ms, whereas the average time to read the blob from the filesystem was around 1ms. The test uses 50 threads to read the blobs using the PK of the table.

"Reading from the filesystem" means looking up the path for the file in the database table and then reading the file from the filesystem.

For larger files around 250k Postgres was actually faster in my tests: 130ms reading the bytea column vs. 260ms reading the file from disk.

The tests were done locally on my Windows laptop. I didn't have time yet to do this on a Linux server. I expect the filesystem to have some impact on the figures, and NTFS is not known for being blazingly fast. So maybe those figures will change.

My tests however do not take into account the actual time it takes to send the binary data from the server to the client (= browser). It might well be possible that serving the file through an Apache web server directly is faster than serving the file through a JEE servlet. My intention was to measure the raw read speed of the binary data from the medium where it is stored.

> The downside is that you might end up with a huge database
> that you will have to backup and maintain

I don't really buy the argument about the backup: the amount of data to be backed up is essentially the same. With both solutions you can have incremental backups.
Another downside you didn't mention is the fact that you have to distribute the files in the filesystem properly. Having thousands or even millions of files in a single directory is not going to be maintenance-friendly either.

Regards,
Thomas

[1] http://research.microsoft.com/apps/pubs/default.aspx?id=64525
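Thomas's measurement was done in Java/JDBC; a rough Python equivalent of the same idea is a timing harness where the caller supplies the two read paths. The example callables in the comments (a psycopg2 SELECT on a bytea column vs. `open().read()`) are illustrative wiring, not code from the thread:

```python
# Sketch: compare average read latency of "blob from DB" vs "file from disk".
import time

def avg_read_ms(read_once, iterations=100):
    """Call read_once() repeatedly and return the mean latency in milliseconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        read_once()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / iterations

# Example wiring (illustrative; needs a live psycopg2 cursor and a real path):
# db_ms   = avg_read_ms(lambda: (cur.execute(
#               "SELECT content FROM videos WHERE id = 1"), cur.fetchone()))
# file_ms = avg_read_ms(lambda: open("/srv/media/videos/1.mp4", "rb").read())
```

As Thomas notes, this measures raw read speed only; it says nothing about the cost of pushing the bytes out to a browser.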
On 17 December 2014 at 13:55, Thomas Kellerer <spam_eater@gmx.net> wrote:
> Albe Laurenz wrote on 17.12.2014 at 11:07:
>> and the performance will be worse than reading files from the file system.
>
> There is a Microsoft research paper [1] (from 2006) which tested this "myth" using SQL Server.
> It showed that the database might actually be faster than the file system.
>
> As this topic comes up at my workplace every now and then as well, I created a little web application (Java/JDBC) to test this on Postgres and possibly other DBMS.
>
> It turns out that Postgres likewise isn't really slower at this than the file system.
>
> For small files around 50k both perform similarly: the average time to read the blob from a bytea column was around 2ms, whereas the average time to read the blob from the filesystem was around 1ms. The test uses 50 threads to read the blobs using the PK of the table.
>
> "Reading from the filesystem" means looking up the path for the file in the database table and then reading the file from the filesystem.

With how many blobs/files did you test this? I'm asking because PG stores all blobs in a single table. On a file system, if all files are stored in a single directory, the situation is similar. However, a file system has the ability to store files in several directories instead of just one, which is often claimed to improve file-locating performance. Seeing as the read performance of a file (once it's been located) from the file system versus a blob appears similar, the difference in time for locating the file might well be relevant here.

Interesting to see this was tested with MS SQL and therefore limited to NTFS. It's probably useful to test this with other file systems, such as ZFS or UFS (with DIRHASH!), etc.

Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
Thomas Kellerer wrote:
> Albe Laurenz wrote on 17.12.2014 at 11:07:
>> and the performance will be worse than reading files from the file system.
>
> There is a Microsoft research paper [1] (from 2006) which tested this "myth" using SQL Server.
> It showed that the database might actually be faster than the file system.
>
> As this topic comes up at my workplace every now and then as well, I created a little web application
> (Java/JDBC) to test this on Postgres and possibly other DBMS.
>
> It turns out that Postgres likewise isn't really slower at this than the file system.
>
> For small files around 50k both perform similarly: the average time to read the blob from a bytea column
> was around 2ms, whereas the average time to read the blob from the filesystem was around 1ms. The test
> uses 50 threads to read the blobs using the PK of the table.
>
> "Reading from the filesystem" means looking up the path for the file in the database table and then
> reading the file from the filesystem.
>
> For larger files around 250k Postgres was actually faster in my tests: 130ms reading the bytea column
> vs. 260ms reading the file from disk.
>
> The tests were done locally on my Windows laptop.
> I didn't have time yet to do this on a Linux server. I expect the filesystem to have some impact on
> the figures, and NTFS is not known for being blazingly fast. So maybe those figures will change.

That must be some strangeness of the web application, no? PostgreSQL must do everything that a direct file access does, right? Plus some extra processing (load the data into shared_buffers, ...). Given that, do you have any explanation for what you observed?

> My tests however do not take into account the actual time it takes to send the binary data from the
> server to the client (= browser). It might well be possible that serving the file through an Apache web
> server directly is faster than serving the file through a JEE servlet. My intention was to measure the
> raw read speed of the binary data from the medium where it is stored.

Why not compare 'SELECT ...' via psql (or libpq) with 'cat ...' to measure the difference?

>> The downside is that you might end up with a huge database
>> that you will have to backup and maintain
>
> I don't really buy the argument about the backup: the amount of data to be backed up is essentially the
> same. With both solutions you can have incremental backups.

The amount to back up will stay roughly the same, granted. But isn't backup/restore of a large database more cumbersome than backup/restore of a file system? And a major upgrade of a large database is more painful, right?

> Another downside you didn't mention is the fact that you have to distribute the files in the
> filesystem properly.
> Having thousands or even millions of files in a single directory is not going to be maintenance-friendly either.

That's right; you have to spend some thought on how to store the files as well.

Yours,
Laurenz Albe
On Wed, 17 Dec 2014 13:55:46 +0100 Thomas Kellerer <spam_eater@gmx.net> wrote:
> Another downside you didn't mention is the fact that you have to distribute the files in the filesystem properly.
> Having thousands or even millions of files in a single directory is not going to be maintenance-friendly either.

That's not a very hard problem to solve. Just write a script that splits the directory into 1000-file chunks; I did it with Perl from scratch in a few hours. Now my database dumps in seconds, and rsync takes care of new files just as fast.

--
Salutations, Vincent Veyron
https://libremen.com/
Legal case, contract and insurance claim management software
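Vincent's script was Perl; the same directory-splitting idea can be sketched in Python by hashing the file's database id and using the leading hex digits as subdirectories. The two-level, 256×256 fan-out is an arbitrary choice for the example:

```python
# Sketch: derive a sharded path so no single directory accumulates
# millions of files. The database stores only the file id; the path
# is recomputed from it deterministically.
import hashlib
import os

def shard_path(file_id, levels=2):
    """Map an id like '12345' to a path like 'a3/9f/12345'."""
    digest = hashlib.sha1(str(file_id).encode()).hexdigest()
    parts = [digest[i * 2:i * 2 + 2] for i in range(levels)]
    return os.path.join(*parts, str(file_id))
```

Because the path is a pure function of the id, nothing extra needs to be stored in the database, and rsync-style incremental backups see each shard directory stay a manageable size.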
On Wed, 17 Dec 2014 16:51:10 +0530 VENKTESH GUTTEDAR <venkteshguttedar@gmail.com> wrote:

Hi Venktesh,

[you should not reply directly to me; hit reply-all to post your messages to the list]

> Can you help me in achieving this? I mean, how do I store it in a file
> system and how do I fetch it from there?

Nope. It's not PostgreSQL related; it really depends on what tools you use to build your web site (in your case Django, which I know nothing about).

> And will storing it in a file system and giving the path be secure?

That's part of the configuration of your web server; you'll have to study their documentation/lists.

--
Salutations, Vincent Veyron
https://libremen.com/
Legal case, contract and insurance claim management software
On Wednesday, December 17, 2014 01:55:46 PM Thomas Kellerer wrote:
> Albe Laurenz wrote on 17.12.2014 at 11:07:
> > and the performance will be worse than reading files from the file system.
>
> There is a Microsoft research paper [1] (from 2006) which tested this "myth"
> using SQL Server. It showed that the database might actually be faster than
> the file system.
>
> As this topic comes up at my workplace every now and then as well, I created
> a little web application (Java/JDBC) to test this on Postgres and possibly
> other DBMS.
>
> It turns out that Postgres likewise isn't really slower at this than the file
> system.
>
> For small files around 50k both perform similarly: the average time to read
> the blob from a bytea column was around 2ms, whereas the average time to
> read the blob from the filesystem was around 1ms. The test uses 50 threads
> to read the blobs using the PK of the table.
>
> "Reading from the filesystem" means looking up the path for the file in the
> database table and then reading the file from the filesystem.
>
> For larger files around 250k Postgres was actually faster in my tests: 130ms
> reading the bytea column vs. 260ms reading the file from disk.
>
> The tests were done locally on my Windows laptop.
> I didn't have time yet to do this on a Linux server. I expect the filesystem
> to have some impact on the figures, and NTFS is not known for being
> blazingly fast. So maybe those figures will change.
>
> My tests however do not take into account the actual time it takes to send
> the binary data from the server to the client (= browser). It might well be
> possible that serving the file through an Apache web server directly is
> faster than serving the file through a JEE servlet. My intention was to
> measure the raw read speed of the binary data from the medium where it is
> stored.

You can get the data from disk about as fast, but actually serving it results in a large CPU hit that isn't present when serving files.
And if you're using bytea, your app server has to allocate memory to hold at least one full copy of the file (I seem to recall that it works out to 2 copies, actually, but it's been a while since I tried it). Most languages aren't good about releasing that memory, so that hit stays around until the process gets recycled.

For a low-volume app, both might be acceptable: any modern CPU can swamp most outbound bandwidth even while decoding bytea. But it is a large amount of overhead compared to a web server just dumping files into a network buffer straight from disk cache.

Also, maintaining large tables still sucks. You can partition them to make things friendlier. pg_upgrade makes things nicer, but it can't always be used, so major version upgrades can still be a problem.

On the plus side, all your data is in one place, which makes it cluster-friendly and easy to delete files when needed, and makes taking consistent backups much simpler.
On 12/17/2014 4:55 AM, Thomas Kellerer wrote:
> It turns out that Postgres likewise isn't really slower at this than the file system.
>
> For small files around 50k both perform similarly: the average time to read the blob from a bytea column was around 2ms, whereas the average time to read the blob from the filesystem was around 1ms. The test uses 50 threads to read the blobs using the PK of the table.
>
> "Reading from the filesystem" means looking up the path for the file in the database table and then reading the file from the filesystem.
>
> For larger files around 250k Postgres was actually faster in my tests: 130ms reading the bytea column vs. 260ms reading the file from disk.

With videos, we're likely looking at file sizes in the 100MB to multi-gigabyte range, unless these are just short snippets.

I'm not very familiar with Django and Python. Does it have facilities to stream a very large record, or does it always transfer the whole thing as a chunk in memory? Does it have PostgreSQL large object support?

Also, serving video via a web server is generally done with an HTML5 or Flash streaming server, where the web application generates the embedded link to the video, but the video itself comes from said streaming thing.... those streaming things are less likely to be able to read an object out of Postgres than they are to stream from the file system.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast
On Wed, Dec 17, 2014 at 3:39 AM, VENKTESH GUTTEDAR <venkteshguttedar@gmail.com> wrote:
> Hello,
>
> I am using PostgreSQL 9.3.5, Django 1.7.5, and Python 3.3.
>
> I am working on an application where I will get video files from a mobile
> app, and I have to store them on the server.
> So my question is: how do I store videos or video files in the DB, or do I
> need to store only the link to that video file? If yes, then where do I have to
> store the file, and how do I access that file to display it in an HTML page?
>
> If anyone is helping me out, please give me all the details related to storing
> videos in the DB, not just hints (what data type do I have to use? how to
> encode and decode? how to accept files through HTTP POST?).

Storing files in the db:
*) 1 GB limit for bytea (large objects: 2 GB before PostgreSQL 9.3, 4 TB from 9.3 on)
*) how you read and write the data really matters; to do this seriously, writing the client in C and using the binary wire format is suggested
*) you can manage binary transfers in the context of a transaction, which is nice (no db/fs sync issues)
*) backups will become a real headache using standard pg_dump methods
*) the database will do some compression for you; however, be advised pg compression is a CPU cruncher
*) memory consumption can be a multiple of the largest file size

Storing files in the fs:
*) you have to manage database records as pointers; synchronization issues suck
*) backups will still suck
*) performance will be somewhat faster, depending on how exactly the data is read
*) memory consumption is minimal

Personally, having done it both ways, I don't recommend storing files in the database if they are going to be very large, which I would estimate at around 10 MB or so, assuming you have lots of data to store. There are just too many cases where pg will open up the whole file in memory, whereas a well-written streaming interface against the O/S will send it right off the disk. Also, in my opinion, to have anything approximating good performance in pg with binary transfers, we are definitely talking a C client.
File systems have been aggressively enhanced over the years to try to keep fragmentation low in the face of high write activity. The database doesn't do this, and if you are dealing with a large number of files in the face of concurrent updating and deleting, I expect you could start running into severe fragmentation issues over time.

merlin
I wouldn't even store it on the filesystem if I could avoid that. Most people I know will assign the video a unique identifier (which is stored in the database) and then store the video file with a 3rd party (e.g. Amazon S3).

1. This is often cheaper. Videos take up a lot of disk space. Having to ensure 2-3 copies of a file as a failover is not fun.
2. It offloads work from internal servers. Why deal with connections that are serving a static file if you can avoid it?

In terms of FS vs DB (aside from the open-vs-streaming issue which was already brought up), I think the big issue with storing large files in the database is the input/output connection.

Postgres has a specified number of max connections available, and each one has some overhead to operate. Meanwhile, a server like nginx can handle 10k connections easily, and with little or no overhead. While the speed is comparable to the OS, you end up using a resource from a limited database connection pool. And you run the risk of a slow/dropped client tying up the connection.

Why allocate a resource to these operations when there are more lightweight alternatives that won't tie up a database connection?
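Jonathan's scheme, sketched in Python: the database keeps only a unique identifier, and the object store holds the bytes. The key layout is made up for the example, and the upload call (boto3) plus the bucket name are shown as comments because they are assumptions requiring AWS credentials:

```python
# Sketch: store a unique identifier in PostgreSQL; the video itself lives
# in an object store keyed by that identifier.
import uuid

def new_object_key(prefix="videos"):
    """Generate the identifier stored in the DB and used as the S3 key."""
    return "%s/%s" % (prefix, uuid.uuid4().hex)

# Upload (illustrative; boto3 and the bucket name are assumptions):
# import boto3
# key = new_object_key()
# boto3.client("s3").upload_file("/tmp/clip.mp4", "my-video-bucket", key)
# cur.execute("INSERT INTO videos (s3_key) VALUES (%s)", (key,))
```

The page then embeds an S3 URL (possibly pre-signed) instead of proxying the bytes through the application or a database connection, which is exactly the connection-pool relief Jonathan describes.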
This! I'm surprised it took so long for somebody to suggest an object store.
On Dec 17, 2014 9:22 PM, "Jonathan Vanasco" <postgres@2xlp.com> wrote:
> I wouldn't even store it on the filesystem if I could avoid that.
> Most people I know will assign the video a unique identifier (which is stored in the database) and then store the video file with a 3rd party (e.g. Amazon S3).
>
> 1. This is often cheaper. Videos take up a lot of disk space. Having to ensure 2-3 copies of a file as a failover is not fun.
> 2. It offloads work from internal servers. Why deal with connections that are serving a static file if you can avoid it?
>
> In terms of FS vs DB (aside from the open-vs-streaming issue which was already brought up),
> I think the big issue with storing large files in the database is the input/output connection.
> Postgres has a specified number of max connections available, and each one has some overhead to operate. Meanwhile, a server like nginx can handle 10k connections easily, and with little or no overhead. While the speed is comparable to the OS, you end up using a resource from a limited database connection pool. And you run the risk of a slow/dropped client tying up the connection.
> Why allocate a resource to these operations when there are more lightweight alternatives that won't tie up a database connection?
On 12/17/2014 07:37 PM, Arthur Silva wrote:
> This! I'm surprised it took so long for somebody to suggest an object store.

I thought they did, a file system :)

> On Dec 17, 2014 9:22 PM, "Jonathan Vanasco" <postgres@2xlp.com
> <mailto:postgres@2xlp.com>> wrote:
>
>     I wouldn't even store it on the filesystem if I could avoid that.
>     Most people I know will assign the video a unique identifier (which
>     is stored in the database) and then store the video file with a 3rd
>     party (e.g. Amazon S3).
>
>     1. This is often cheaper. Videos take up a lot of disk space.
>     Having to ensure 2-3 copies of a file as a failover is not fun.
>     2. It offloads work from internal servers. Why deal with
>     connections that are serving a static file if you can avoid it?
>
>     In terms of FS vs DB (aside from the open-vs-streaming issue which
>     was already brought up),
>
>     I think the big issue with storing large files in the database is
>     the input/output connection.
>     Postgres has a specified number of max connections available, and
>     each one has some overhead to operate. Meanwhile, a server like
>     nginx can handle 10k connections easily, and with little or no
>     overhead. While the speed is comparable to the OS, you end up using
>     a resource from a limited database connection pool. And you run the
>     risk of a slow/dropped client tying up the connection.
>     Why allocate a resource to these operations when there are more
>     lightweight alternatives that won't tie up a database connection?

--
Adrian Klaver
adrian.klaver@aklaver.com