Greenmask v0.2.0b1 Release - Mailing list pgsql-announce

From Greenmask.io via PostgreSQL Announce
Subject Greenmask v0.2.0b1 Release
Date
Msg-id 171981733221.699.12562411608736373211@wrigleys.postgresql.org
List pgsql-announce
 

Greenmask v0.2.0b1 Release

PostgreSQL Logical Dump and Anonymization Tool

This major beta release introduces new features and refactored transformers, significantly enhancing Greenmask's flexibility to better meet business needs. Help us improve Greenmask and tailor it to the community's needs. We welcome your feedback in the release discussion on GitHub.

Greenmask Overview

Greenmask is a versatile open-source tool for database backup, anonymization, and restoration. Written in pure Go with ported PostgreSQL libraries, it is platform-independent and stateless, requiring no schema modifications. It is customizable and compatible with existing PostgreSQL utilities.

Greenmask is ideally suited for:

  • Routine backup and restoration tasks, ensuring data integrity and availability.
  • Anonymization and data masking for staging environments and analytics, protecting sensitive information while maintaining data utility.

Key features

  • Deterministic transformers - a deterministic approach to data transformation based on hash functions. This ensures that the same input data always produces the same output data. Almost every transformer supports either the random or the hash engine, making it suitable for any use case.
  • Dynamic parameters - almost every transformer supports dynamic parameters, allowing the transformer to be parametrized from a table column value. This helps resolve functional dependencies between columns and satisfy constraints.
  • Database type safe - Ensures data integrity by validating data and utilizing the database driver for encoding and decoding operations. This approach guarantees the preservation of data formats.
  • Transformation validation and easy maintenance - During obfuscation development, Greenmask provides validation warnings and a transformation diff feature, allowing you to monitor and maintain transformations effectively throughout the software lifecycle.
  • Partitioned tables transformation inheritance - Define transformation configurations once and apply them to all partitions within partitioned tables, simplifying the obfuscation process.
  • Stateless - Greenmask operates as a logical dump and does not impact your existing database schema.
  • Backward compatible - It fully supports the same features and protocols as existing vanilla PostgreSQL utilities. Dumps created by Greenmask can be successfully restored using the pg_restore utility.
  • Extensible - Users have the flexibility to implement domain-based transformations in any programming language or use predefined templates.
  • Multiple storage options - Greenmask supports local and remote data storage, including directories and S3-like storage solutions.

Playground usage for the beta version

If you want to run a Greenmask playground for the beta version, execute:

git checkout tags/v0.2.0b1 -b v0.2.0b1
docker-compose run greenmask-from-source

Changes overview

  • Introduced dynamic parameters in the transformers
    • Most transformers now support dynamic parameters where applicable.
    • Dynamic parameters are strictly typed. If you need to cast values to another type, Greenmask provides templates and predefined cast functions accessible via cast_to. These functions cover common conversions such as UnixTimestampToDate and IntToBool.
  • The transformation logic has been significantly refactored, making transformers more customizable and flexible than before.
  • Introduced transformation engines
    • random - generates transformer values based on pseudo-random algorithms.
    • hash - generates transformer values using hash functions. It currently uses sha3 hash functions, which are secure but relatively slow. The stable release will add an option to choose between sha3 and SipHash.
  • Introduced templates for static parameter values

Notable changes

Core

  • Introduced the Parametrizer interface, now implemented for both dynamic and static parameters.
  • Renamed most of the toolkit types for enhanced clarity and comprehensive documentation coverage.
  • Refactored the Driver initialization logic.
  • Added validation warnings for overridden types in the Driver.
  • Migrated existing built-in transformers to utilize the new Parametrizer interface.
  • Implemented a new abstraction, TransformationContext, as a first step toward the upcoming transformation conditions feature (#34).
  • Optimized most transformers for performance in both dynamic and static modes. While dynamic mode offers flexibility, static mode ensures performance remains high. Using only the necessary transformation features helps keep transformation time predictable.
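A rough Go sketch of how one interface can serve both static and dynamic parameters. The Parametrizer, StaticParam, DynamicParam and Row types below are hypothetical simplifications and do not reproduce the real Greenmask toolkit signatures.

```go
package main

import (
	"fmt"
	"strconv"
)

// Row is a hypothetical view of the current record: column name to raw text.
type Row map[string]string

// Parametrizer resolves a transformer parameter, possibly per row.
// Mirrors the idea of Greenmask's Parametrizer, not its actual API.
type Parametrizer interface {
	Value(row Row) (int64, error)
}

// StaticParam is parsed once from configuration and reused for every row.
type StaticParam struct{ v int64 }

func NewStaticParam(raw string) (*StaticParam, error) {
	v, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return nil, fmt.Errorf("static parameter: %w", err)
	}
	return &StaticParam{v: v}, nil
}

func (p *StaticParam) Value(Row) (int64, error) { return p.v, nil }

// DynamicParam reads and parses a column value for each row.
type DynamicParam struct{ column string }

func (p *DynamicParam) Value(row Row) (int64, error) {
	raw, ok := row[p.column]
	if !ok {
		return 0, fmt.Errorf("column %q not found", p.column)
	}
	return strconv.ParseInt(raw, 10, 64)
}

func main() {
	row := Row{"min_salary": "3000"}

	var minP Parametrizer = &DynamicParam{column: "min_salary"}
	maxP, _ := NewStaticParam("10000")

	lo, _ := minP.Value(row)
	hi, _ := maxP.Value(row)
	fmt.Println(lo, hi) // 3000 10000
}
```

Parsing the static value once at construction, rather than on every row, reflects the performance point above: rows only pay the per-row lookup cost when a dynamic parameter is actually configured.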

Documentation

Documentation has been significantly refactored. New information about features and updates to transformer descriptions have been added.

Transformers

  • RandomEmail - Introduces a new transformer that supports both random and deterministic engines. It allows for flexible email value generation; you can use column values in the template and choose to keep the original domain or select any from the domains parameter.

  • NoiseDate, NoiseFloat, NoiseInt - These transformers support both random and deterministic engines and offer dynamic-mode parameters that control the noise thresholds within the min and max range. Unlike previous implementations, which used a single ratio parameter, the new release provides min_ratio and max_ratio parameters to define noise values more precisely. Using the hash engine in these transformers makes statistical analysis harder for attackers, especially when the same salt is used consistently over long periods.

  • NoiseNumeric - A newly implemented transformer, sharing features with NoiseInt and NoiseFloat, but specifically designed for numeric values (large integers or floats). It provides a decimal parameter to handle values with fractions.

  • RandomChoice - Now supports the hash engine

  • RandomDate, RandomFloat, RandomInt - Now enhanced with hash engine support. Threshold parameters min and max have been updated to support dynamic mode, allowing for more flexible configurations.

  • RandomNumeric - A new transformer specifically designed for numeric types (large integers or floats), sharing similar features with RandomInt and RandomFloat, but tailored for handling huge numeric values.

  • RandomString - Now supports hash engine mode

  • RandomUnixTimestamp - This new transformer generates Unix timestamps with selectable units (second, millisecond, microsecond, nanosecond). Similar in function to RandomDate, it supports the hash engine and dynamic parameters for min and max thresholds, with the ability to override these units using min_unit and max_unit parameters.

  • RandomUuid - Added hash engine support

  • RandomPerson - Implemented a new transformer that replaces RandomName, RandomLastName, RandomFirstName, RandomFirstNameMale, RandomFirstNameFemale, RandomTitleMale, and RandomTitleFemale. This new transformer offers enhanced customizability while providing functionality similar to the previous versions. It generates personal data such as FirstName, LastName, and Title based on the provided gender parameter, which now supports dynamic mode. Future minor versions will allow overriding the default names database.

  • Added tsModify - a new template function for modifying time.Time objects

  • Introduced a new RandomIp transformer capable of generating a random IP address based on the specified netmask.

  • Added a new RandomMac transformer for generating random MAC addresses.

  • Deleted transformers include RandomMacAddress, RandomIPv4, RandomIPv6, RandomUnixTime, RandomTitleMale, RandomTitleFemale, RandomFirstName, RandomFirstNameMale, RandomFirstNameFemale, RandomLastName, and RandomName due to the introduction of more flexible and unified options.
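The min_ratio/max_ratio noise combined with the hash engine can be sketched as follows. The noiseInt and hashUnit helpers, the hash-chosen sign, the clamping to [lo, hi], and the use of SHA-256 instead of sha3 are all assumptions for illustration, not Greenmask's actual implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math"
	"strconv"
)

// hashUnit maps salt, input and a label to a deterministic float64 in [0, 1).
// SHA-256 stands in for the sha3 function Greenmask uses.
func hashUnit(salt, input []byte, label string) float64 {
	h := sha256.New()
	h.Write(salt)
	h.Write(input)
	h.Write([]byte(label))
	sum := h.Sum(nil)
	// Keep the top 53 bits so the value is exactly representable as float64.
	return float64(binary.BigEndian.Uint64(sum[:8])>>11) / (1 << 53)
}

// noiseInt perturbs v by a fraction drawn from [minRatio, maxRatio) with a
// hash-chosen sign, then clamps the result into [lo, hi].
// Parameter semantics are assumed for this sketch.
func noiseInt(v int64, salt []byte, minRatio, maxRatio float64, lo, hi int64) int64 {
	input := []byte(strconv.FormatInt(v, 10))
	ratio := minRatio + (maxRatio-minRatio)*hashUnit(salt, input, "ratio")
	delta := float64(v) * ratio
	if hashUnit(salt, input, "sign") < 0.5 {
		delta = -delta
	}
	res := int64(math.Round(float64(v) + delta))
	if res < lo {
		res = lo
	}
	if res > hi {
		res = hi
	}
	return res
}

func main() {
	salt := []byte("my-salt")
	a := noiseInt(1000, salt, 0.1, 0.2, 0, 5000)
	b := noiseInt(1000, salt, 0.1, 0.2, 0, 5000)
	fmt.Println(a == b) // true: same input and salt give the same noise
	fmt.Println(a)
}
```

Because the noise for a given value and salt never changes, repeated dumps do not let an observer average the noise away, which is the statistical-analysis benefit described for the hash engine above.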

