Thread: How to get an md5/sha256 hash of a really large object in psql?
Hello,
In my DB I have a large object over 4GB in size.
I need to get its MD5 or SHA256 from within a psql query, i.e. without exporting it to the FS first.
“SELECT md5(lo_get(loid));” doesn't work — “large object is too large”.
Is there any other way to do it?
Regards,
Al
> On 29/07/2023 08:42 CEST Alex Shan <3341018@gmail.com> wrote:
>
> In my DB I have a large object over 4GB in size.
> I need to get its MD5 or SHA256 from within a psql query, i.e. without
> exporting it to the FS first.
>
> “SELECT md5(lo_get(loid));” doesn't work — “large object is too large”.
>
> Is there any other way to do it?

Is plpython3u [1] an option for you?  In that case you can use Python's
hashlib in a custom function and feed every page from pg_largeobject to a
selected hash function.  For example:

BEGIN;

CREATE EXTENSION plpython3u;

CREATE OR REPLACE FUNCTION lo_hash(loid oid, name text)
  RETURNS bytea
  LANGUAGE plpython3u
AS $$
  import hashlib

  hash = hashlib.new(name)

  # Check if large object exists.
  plan = plpy.prepare("""
    SELECT
    FROM pg_largeobject_metadata
    WHERE oid = $1
  """, ['oid'])
  rv = plpy.execute(plan, [loid])
  if rv.nrows() == 0:
    raise ValueError(f"large object {loid} does not exist")

  # Get all pages (possibly zero).
  plan = plpy.prepare("""
    SELECT data
    FROM pg_largeobject
    WHERE loid = $1
    ORDER BY pageno
  """, ['oid'])
  pages = plpy.cursor(plan, [loid])
  for page in pages:
    hash.update(page['data'])

  return hash.digest()
$$;

COMMIT;

Testing with 64 KiB of null bytes:

BEGIN;

SELECT lo_from_bytea(0, decode(repeat('00', 1 << 16), 'hex')) AS test_loid \gset

SELECT loid, count(*) AS n_pages, sum(length(data)) AS n_bytes
FROM pg_largeobject
WHERE loid = :test_loid
GROUP BY loid;

  loid  | n_pages | n_bytes
--------+---------+---------
 365958 |      32 |   65536
(1 row)

SELECT :test_loid AS loid, lo_hash(:test_loid, 'md5') AS md5;

  loid  |                md5
--------+------------------------------------
 365958 | \xfcd6bcb56c1689fcef28b57c22475bad
(1 row)

SELECT :test_loid AS loid, lo_hash(:test_loid, 'sha256') AS sha256;

  loid  |                              sha256
--------+--------------------------------------------------------------------
 365958 | \xde2f256064a0af797747c2b97505dc0b9f3df0de4f489eac731c23ae9ca9cc31
(1 row)

ROLLBACK;

Verifying the hashes:

$ head -c65536 /dev/zero | md5sum
fcd6bcb56c1689fcef28b57c22475bad  -
$ head -c65536 /dev/zero | sha256sum
de2f256064a0af797747c2b97505dc0b9f3df0de4f489eac731c23ae9ca9cc31  -

[1] https://www.postgresql.org/docs/15/plpython.html

--
Erik
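[Editor's note: the key point in the function above is that hashlib objects accept incremental updates, so the digest of page-sized chunks equals the digest of the whole value and the large object never has to fit in memory. A minimal standalone Python sketch of that equivalence, using PostgreSQL's default large-object page size (LOBLKSIZE = 2048 bytes) and the same 65536 null bytes as the test above:]

```python
import hashlib

LOBLKSIZE = 2048        # PostgreSQL's default large-object page size (BLCKSZ/4)
data = bytes(1 << 16)   # 65536 null bytes, as in the test above

# Hash incrementally, one "page" at a time, as lo_hash() does
# with the rows of pg_largeobject.
h = hashlib.new('md5')
for off in range(0, len(data), LOBLKSIZE):
    h.update(data[off:off + LOBLKSIZE])

# Same digest as hashing the whole value in one call.
assert h.hexdigest() == hashlib.md5(data).hexdigest()
print(h.hexdigest())  # fcd6bcb56c1689fcef28b57c22475bad
```

[The printed digest matches the md5sum of 65536 null bytes shown in the verification step.]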
Great, thanks a lot!
I will test it on my system.
I tried to do it myself in C with libpq, but got stuck reading the LO...
On Sat, 29 Jul 2023 at 19:57, Erik Wienhold <ewie@ewie.name> wrote:
> “SELECT md5(lo_get(loid));” doesnt work — “large object is too large”.
>
> Is there any other way to do it?
Is plpython3u [1] an option for you? In that case you can use Python's hashlib
in a custom function and feed every page from pg_largeobject to a selected hash
function.
...
--
Erik