Thread: easy way to acquire height / width from images (PNG, JPEG) stored as bytea?
easy way to acquire height / width from images (PNG, JPEG) stored as bytea?
From
Achilleas Mantzios
Date:
Hello Dear List,

we have a table holding email attachments as bytea, and we would like to filter out images of small dimensions, which are not of any value to our logic.

I took a look at the pg_image extension, tested it, and it proved problematic: it killed my 200+ days uptime FreeBSD box :( . I dropped the extension and uninstalled it as soon as fsck finally finished.

So I would like to ask you: basically we have PNGs and JPEGs, is there an easy way of parsing their headers and getting info about their dimensions? I could write a C function for that. For PNG it is quite easy, but for JPEG it gets a little bit complicated, albeit doable. I am just asking for something out of the box.

Currently we load the images (in our Java enterprise system) and filter them in Java, but this brings wildfly to its knees pretty easily and quickly.

Thank you and happy Easter.
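For illustration, a minimal sketch of the header parsing described above, written in Python rather than C. It reads only the PNG IHDR chunk and walks the JPEG segment markers until a start-of-frame (SOF) segment, so no pixel data is ever decompressed. The function names are hypothetical, and it assumes well-formed files; anything it cannot parse yields None.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# SOF markers carry the frame dimensions: 0xC0..0xCF except
# DHT (0xC4), JPG (0xC8) and DAC (0xCC), which are not frame headers.
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def png_size(data):
    """Return (width, height) from the IHDR chunk, or None."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte type "IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def jpeg_size(data):
    """Return (width, height) from the first SOF segment, or None."""
    if data[:2] != b"\xff\xd8":          # SOI marker
        return None
    pos, n = 2, len(data)
    while pos + 4 <= n:
        if data[pos] != 0xFF:
            return None                   # lost marker sync
        marker = data[pos + 1]
        if marker == 0xFF:                # fill byte before a marker
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2                      # standalone markers, no length
            continue
        if marker in (0xD9, 0xDA):
            return None                   # EOI or scan data reached first
        seg_len = struct.unpack(">H", data[pos + 2:pos + 4])[0]
        if seg_len < 2:
            return None
        if marker in SOF_MARKERS:
            if pos + 9 > n:
                return None
            # After the length: 1 byte precision, 2 bytes height, 2 bytes width.
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        pos += 2 + seg_len                # skip to the next segment
    return None


def image_size(data):
    """Try PNG first, then JPEG; None if neither header is recognised."""
    return png_size(data) or jpeg_size(data)
```

The same logic could in principle be ported to a C function or wrapped in a procedural-language function that receives the bytea value; that wiring is not shown here.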
Re: easy way to acquire height / width from images (PNG, JPEG) stored as bytea?
From
Adam Brusselback
Date:
Why not extract and store that metadata with the image rather than trying to extract it to filter on at query time? That way you can index your height and width columns to speed up that filtering if necessary.
You may be able to write a wrapper for a command line tool like ImageMagick or something, so you can call that from a function to determine the size, if you did want to stick with extracting that at query time.
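A rough sketch of such a wrapper in Python, assuming ImageMagick's `identify` is on the PATH (on ImageMagick 7 installs the command may be `magick identify` instead). The function name and the default timeout are hypothetical. The image bytes are piped to the tool on stdin, so nothing has to exist as a file.

```python
import subprocess


def size_via_identify(data, cmd=("identify", "-format", "%w %h\n", "-"), timeout=10):
    """Pipe image bytes to an external tool and parse 'WIDTH HEIGHT' from stdout.

    Returns (width, height), or None if the tool is missing, fails,
    times out, or prints something unexpected.
    """
    try:
        result = subprocess.run(
            list(cmd),
            input=data,             # image bytes go to the tool's stdin
            capture_output=True,
            timeout=timeout,        # do not let a bad image run forever
            check=True,             # non-zero exit raises CalledProcessError
        )
    except (OSError, subprocess.SubprocessError):
        return None
    lines = result.stdout.decode("ascii", "replace").split("\n")
    parts = lines[0].split() if lines else []
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        return None
    return int(parts[0]), int(parts[1])
```

The timeout bounds wall-clock time only; it does not cap the memory the external process may use, so any resource limits would have to be applied separately.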
Re: easy way to acquire height / width from images (PNG, JPEG) stored as bytea?
From
Achilleas Mantzios
Date:
On 17/4/20 4:09 PM, Adam Brusselback wrote:
> Why not extract and store that metadata with the image rather than trying to extract it to filter on at query time? That way you can index your height and width columns to speed up that filtering if necessary.
Yes, I thought of that, but those are coming automatically from our mail server (via synonym); we have written an alias: a program that parses and stores emails. This is generic, and I wouldn't like to add specific code (or specific columns) just for image attachments. However, I dig the idea of the indexes.
> You may be able to write a wrapper for a command line tool like imagemagic or something so you can call that from a function to determine the size if you did want to stick with extracting that at query time.
As I describe above, those attachments exist nowhere as files. They are email attachments. Also, we have about half a TB of them.
Re: easy way to acquire height / width from images (PNG, JPEG) stored as bytea?
From
Steve Atkins
Date:
On 17/04/2020 13:37, Achilleas Mantzios wrote:
> Hello Dear List,
>
> we have a table holding email attachments as bytea, and we would like
> to filter out images of small dimensions, which are not of any value
> to our logic.
>
> I took a look at pg_image extension, tested it, and it proved
> problematic, it killed my 200+ days uptime FreeBSD box :( . I dropped
> the extension and uninstalled this as soon as fsck finally finished.

If running an extension crashed your server you should look at how / why, especially if it corrupted your filesystem. That shouldn't happen on a correctly configured system, so the underlying issue might cause you other problems. Crashing postgresql, sure, but not anything that impacts the rest of the server.

Cheers,
Steve
Re: easy way to acquire height / width from images (PNG, JPEG) stored as bytea?
From
Achilleas Mantzios
Date:
On 17/4/20 5:47 PM, Steve Atkins wrote:
> If running an extension crashed your server you should look at how /
> why, especially if it corrupted your filesystem.
> That shouldn't happen on a correctly configured system, so the
> underlying issue might cause you other problems. Crashing postgresql,
> sure, but not anything that impacts the rest of the server.

Hello,

This machine runs several extensions with no issues (even pljava for Christ's sake, our heavily modified version of DBMirror, and lots of our own C functions, among others), two bhyve VMs running ubuntu, and one jail. It also functions as my workstation (wildfly, eclipse, etc). And it can run for years without a reboot.

Apparently lousy memory management by pg_image (it consumed all 32GB of RAM + 8GB swap) didn't crash postgresql but brought the system to its knees. Plus, this extension was last touched in 2013, go figure.
> it killed my 200+ days uptime FreeBSD box :( .
> As I describe above, those attachments are nowhere as files.
> They are email attachments. Also we got about half TB of them.
Is it possible that some image is a "decompression bomb"?
"Because of the efficient compression method used in Portable Network Graphics (PNG) files, a small PNG file can expand tremendously, acting as a "decompression bomb". Malformed PNG chunks can consume a large amount of CPU and wall-clock time and large amounts of memory, up to all memory available on a system, causing a Denial of Service (DoS). Libpng-1.4.1 has been revised to use less CPU time and memory, and provides functions that applications can use to further defend against such files."
Regards,
Imre
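One inexpensive precaution along these lines, sketched in Python under stated assumptions: read the dimensions a PNG declares in its IHDR chunk and refuse to hand the image to any decoder when the pixel count exceeds a cap. The function names and the 50-megapixel limit are hypothetical, and the check covers only oversized declared dimensions, not other malformed chunks.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_PIXELS = 50_000_000  # hypothetical cap; tune to the memory you can spare


def png_declared_pixels(data):
    """Return width * height as declared in IHDR, or None if not a PNG header."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", data[16:24])
    return width * height


def is_safe_to_decode(data, max_pixels=MAX_PIXELS):
    """True only if the header parses and the declared size is within the cap."""
    pixels = png_declared_pixels(data)
    return pixels is not None and 0 < pixels <= max_pixels
```

Because only the first 24 bytes are inspected, the check itself cannot be made to allocate large buffers, whatever the file claims.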
Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote (on Fri, 17 Apr 2020, 16:39):
> On 17/4/20 4:09 PM, Adam Brusselback wrote:
>> Why not extract and store that metadata with the image rather than trying to extract it to filter on at query time? That way you can index your height and width columns to speed up that filtering if necessary.
> Yes I thought of that, but those are coming automatically from our mail server (via synonym), we have written an alias : a program that parses and stores emails. This is generic, I wouldn't like to add specific code (or specific columns) just for image attachments. However I dig the idea of the indexes.
>> You may be able to write a wrapper for a command line tool like imagemagic or something so you can call that from a function to determine the size if you did want to stick with extracting that at query time.
> As I describe above, those attachments are nowhere as files. They are email attachments. Also we got about half TB of them.
Re: easy way to acquire height / width from images (PNG, JPEG) stored as bytea?
From
Achilleas Mantzios
Date:
On 17/4/20 6:14 PM, Imre Samu wrote:
>> it killed my 200+ days uptime FreeBSD box :( .
>> As I describe above, those attachments are nowhere as files.
>> They are email attachments. Also we got about half TB of them.
> Is it possible that some image is a "decompression bomb"?
> "Because of the efficient compression method used in Portable Network Graphics (PNG) files, a small PNG file can expand tremendously, acting as a "decompression bomb". Malformed PNG chunks can consume a large amount of CPU and wall-clock time and large amounts of memory, up to all memory available on a system, causing a Denial of Service (DoS). Libpng-1.4.1 has been revised to use less CPU time and memory, and provides functions that applications can use to further defend against such files."
> Regards,
> Imre
Thank you a lot Imre. Great info.