[MASSMAIL]Storing and comparing columns of cryptographic hashes? - Mailing list pgsql-general

From Josh Triplett
Subject [MASSMAIL]Storing and comparing columns of cryptographic hashes?
Date
Msg-id ZhP6OmFXob6W3uLK@localhost
Whole thread Raw
Responses Re: Storing and comparing columns of cryptographic hashes?  (Greg Sabino Mullane <htamfids@gmail.com>)
Re: Storing and comparing columns of cryptographic hashes?  (Justin <zzzzz.graf@gmail.com>)
List pgsql-general
I'm planning to store cryptographic hashes (BLAKE3) in a column of a
postgresql table. I'm going to be doing a large number of operations of
roughly this form:

- Receive a large number of hashes via a web API call.
- Check which hashes aren't already in the database.
- Send back a bitmap to the user of which hashes they need to send.
- Receive data from the user corresponding to those hashes.
- Store the data (not in postgresql).
- Add the new hashes to the database, along with a tiny value indicating
  what group of objects they were stored in.

A few questions:

- Is there a way to tell postgresql "this column contains cryptographic
  hashes, so you can do hash joins using any subset of the bits, without
  having to hash them again"? If not, should there be?
- Is `bit(256)` the right type to use to store 32-byte hash values with
  no additional overhead?
- What would be the simplest way, given an input array of hashes (which
  I may have to pass in as an array and use `unnest`), to filter out all
  the values that already exist, *and* generate a corresponding bitmap
  in the same order for present/not-present for the entire array (to
  send back to the user)? Filtering seems easy enough, but generating
  the bitmap less so.
- Does it make more sense to store the values as one row per value, or
  as one row per group of values? I know that postgresql can store an
  entire array in one column; could that efficiently support operations
  like "tell me which of these objects don't exist in any array in this
  column" or "for all of these objects, tell me all the group-id values
  for rows containing them"?

Thank you,
Josh Triplett



pgsql-general by date:

Previous
From: Ron Johnson
Date:
Subject: Re: What is referential_action?
Next
From: mark bradley
Date:
Subject: [MASSMAIL]PEM install error