Re: Varlena with recursive data structures? - Mailing list pgsql-general

From George Neuner
Subject Re: Varlena with recursive data structures?
Date
Msg-id f8ov3etf9jt1jc1pktqnbegj3loiovgm37@4ax.com
Whole thread Raw
In response to Varlena with recursive data structures?  (Sam Patterson <katoriasdev@gmail.com>)
List pgsql-general
On Wed, 16 Jan 2019 22:08:35 +0000, Sam Patterson
<katoriasdev@gmail.com> wrote:

>Hi all,
>
>I've recently started developing an extension for Postgres for which I'll
>need to create a new variable-length base type. The type will require a
>tree-like structure in order to parse sufficiently, which of course
>probably means having some sort of recursive data structure, like a struct
>that has members which are pointers to itself for child nodes. After doing
>some research, specifically looking at how other variable-length data types
>store their data, it seems almost all of them store the data in a binary
>representation, using bit masks and offsets etc in order to store/access
>the data whilst having an in-memory representation that's used to
>manipulate the data.
>
>I presume the purpose for using this approach is because all the data in a
>varlena type has to be contiguous, and the moment you start using pointers
>this is no longer possible. So my question is, given a structure that looks
>something like this,
>
>typedef struct Node
>{
>    char *data;
>    Node *left;
>    Node *right;
>} Node;
>
>am I right in saying that I wouldn't be able to store that representation
>on-disk, but instead I'd have to transform it into some binary
>representation and back again when writing/reading respectively, are there
>any alternatives?
>
>Regards,
>
>Karl


You might want to consider using a linear tree representation that can
be searched directly.  See:
https://en.wikipedia.org/w/index.php?title=Binary_tree§ion=9#Arrays


E.g.,

Make the tree nodes:  { unsigned data, left, right }
Store the nodes contiguously as an array, and represent the left/right
branches using simple array indexing.

Write the node data contiguously into a separate array/stream, keeping
an offset [or index] to each data element in its correponding tree
node.

And then write the node array and the data array/stream contiguously
onto the disk.


For reading/searching this structure can be used directly: read in the
homogenous node array, and suck the variable length data stream into a
separate buffer.

If it isn't important that the tree be optimally ordered, then insert
is simple: just append a new node to the array, append its data to the
data stream, and update branch indexes to link the new node at the
correct spot in the tree.

If you want the data to be as small as possible on disk, then update
or delete of a node requires repacking the data and adjusting the data
offsets of the "unchanged" nodes.
https://en.wikipedia.org/wiki/Mark-compact_algorithm

[Obviously, you can delete without repacking by just unlinking the
tree node and rewriting the node array.  The "deleted" node data is
left in place on disk but it becomes unreachable.]

If you can afford to (temporarily) sort the tree nodes by their data
offset, repacking the data can be done in the same buffer by sliding
each element forward to be contiguous with the previous.  If you need
to take the nodes in whatever order they occur, then to pack the data
you need to copy it to a new buffer.


Hope this helps.
George



pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Varlena with recursive data structures?
Next
From: Brent Wood
Date:
Subject: Re: Can anyone please provide me list of customers using postgreSQL