Re: Pluggable toaster - Mailing list pgsql-hackers

From Nikita Malakhov
Subject Re: Pluggable toaster
Date
Msg-id CAN-LCVNOFTyCKKqNSOKwdOt4+CDo+KEy+mvNf5zw_oWNMjXNvg@mail.gmail.com
Whole thread Raw
In response to Re: Pluggable toaster  (Aleksander Alekseev <aleksander@timescale.com>)
Responses Re: Pluggable toaster
List pgsql-hackers
Hi!

Thank you very much for the feedback.
Answering accordingly to questions/remarks order:
 
>I'm in the process of testing and reviewing the v23 patchset. So far I
>just wanted to report that there seems to be certain issues with the
>formatting and/or trailing whitespaces in the patches:

Accepted, will be done.

>There are also some compiler warnings, please see the attachment.

This too. There warnings showed up recently due to fresh changes in macros.

>1.1. Could you please clarify what is the motivation behind such a
>verbose syntax?

Toaster is set for the table column. Each TOASTable column could have
a different Toaster, so column option is the most obvious place to add it.

>1.2. That's odd. TOAST should work for EXTENDED and MAIN storage
>strategies as well. On top of that, why should custom TOASTers have
>any knowledge of the default four-stage algorithm and the storage
>strategies? If the storage strategy is actually ignored, it shouldn't
>be used in the syntax.

EXTENDED storage strategy means that TOASTed value is compressed
before being TOASTed, so no knowledge of its internals could be of any
use. EXTERNAL strategy means that value is being TOASTed in original
form.  Storage strategy is the thing internal to AM used, and TOAST
mechanics is not meant to interfere with it. Again, STORAGE EXTERNAL
explicitly shows that value will be stored out-of-line.

>2. Although it's possible to implement some encryption in a TOASTer I
>don't think the documentation should advertise this.

It is a good example of what could the Toaster be responsible for, because
we proposed moving compression into Toasters as a TOAST option -
for example, being set while initializing the Toaster.

>3.1. I believe we should rename this to something like `struct
>ToastImpl`. The `Tsr` abbreviation only creates confusion, and this is
>not a routine.

It was done similar to Table AM Routine (please check Pluggable
Storage API), along with some other decisions.

>3.2. The names of the fields should be made consistent - e.g. if you
>have update_toast and copy_toast then del_toast should be renamed to
>delete_toast.
>3.2. Additionally, in some parts of the path del_toast is used, while
>in others - deltoast.

Accepted, will be fixed in the next rebase.

>4. The user documentation should have clear answers on the following questions:
>4.1. What will happen if the user tries to delete a TOASTer while
>still having data that was TOASTed using this TOASTer? Or this is not
>supported and the TOASTers should exist in the database indefinitely
>after creation?
>4.2. Is it possible to delete and/or alter the default TOASTer?
>4.3. Please make sure the previously discussed clarification regarding
>"N TOASTers vs M TableAMs" and the validation function is explicitly
>present.

Thank you very much for checking the documentation package. These are
very good comments, I'll include these topics in the next patchset.

>5.1. The documentation should clarify how many times init() is called
>- is it done once for the TOASTer, once per relation, etc.
>5.2. Why there is no free() method?

These too. About free() method - Toasters are not meant to be deleted,
we mentioned this several times. They exist once they are created as long
as the DB itself. Have I answered your question?

>6. It's not clear for what reason update_toast() will be executed to
>begin with. This should be clarified in the documentation. How does it
>differ from copy_toast()?

It is not clear because current TOAST mechanics does not have UPDATE
functionality - it doesn't actually update TOASTed value, it marks this value
"dead" and inserts a new one. This is the cause of TOAST tables bloating
that is being complained about by many users. Update method is provided
for implementation of UPDATE operation.

>7. It should be explicitly stated when validate() is called and what
>happens depending on the return value.

Accepted, will correct this topic.

>8. IMO this section does a very poor job in explaining what this is
>for and when I may want to use this; what particular problem are we
>addressing here?

I already answered this question, maybe the answer was not very clear.
This is just an extension entry point, because for some more advanced
functionality existing pre-defined set of methods would be not enough, i.e.
special Toasters for complex datatypes like JSONb, that have complex
internal structure and may require additional ways to interact with it along
toast/detoast/update/delete.

Also, I'll check README and documentation for typos.

On Thu, Nov 3, 2022 at 1:39 PM Aleksander Alekseev <aleksander@timescale.com> wrote:
Hi Nikita,

> I'm going to submit a more detailed code review soon.

As promised, here is my feedback.

```
+      Toaster could be assigned to toastable column with expression
+      <literal>STORAGE external TOASTER
<replaceable>toaster_name</replaceable></literal>
```

1.1. Could you please clarify what is the motivation behind such a
verbose syntax?

```
+Please note that custom Toasters could be assigned only to External
+storage type.
```

1.2. That's odd. TOAST should work for EXTENDED and MAIN storage
strategies as well. On top of that, why should custom TOASTers have
any knowledge of the default four-stage algorithm and the storage
strategies? If the storage strategy is actually ignored, it shouldn't
be used in the syntax.

```
+Toasters could use any storage, advanced compression, encryption, and
```

2. Although it's possible to implement some encryption in a TOASTer I
don't think the documentation should advertise this.

```
+typedef struct TsrRoutine
+{
+    NodeTag        type;
+
+    /* interface functions */
+    toast_init init;
+    toast_function toast;
+    update_toast_function update_toast;
+    copy_toast_function copy_toast;
+    detoast_function detoast;
+    del_toast_function deltoast;
+    get_vtable_function get_vtable;
+    toastervalidate_function toastervalidate;
+} TsrRoutine;
```

3.1. I believe we should rename this to something like `struct
ToastImpl`. The `Tsr` abbreviation only creates confusion, and this is
not a routine.
3.2. The names of the fields should be made consistent - e.g. if you
have update_toast and copy_toast then del_toast should be renamed to
delete_toast.
3.2. Additionally, in some parts of the path del_toast is used, while
in others - deltoast.

4. The user documentation should have clear answers on the following questions:
4.1. What will happen if the user tries to delete a TOASTer while
still having data that was TOASTed using this TOASTer? Or this is not
supported and the TOASTers should exist in the database indefinitely
after creation?
4.2. Is it possible to delete and/or alter the default TOASTer?
4.3. Please make sure the previously discussed clarification regarding
"N TOASTers vs M TableAMs" and the validation function is explicitly
present.

```
Toaster initialization. Initial TOAST table creation and other startup
preparations.
```

5.1. The documentation should clarify how many times init() is called
- is it done once for the TOASTer, once per relation, etc.
5.2. Why there is no free() method?

```
Updates TOASTed value. Returns new TOAST pointer.
```

6. It's not clear for what reason update_toast() will be executed to
begin with. This should be clarified in the documentation. How does it
differ from copy_toast()?

```
Validates Toaster for given data type, storage and compression options.
```

7. It should be explicitly stated when validate() is called and what
happens depending on the return value.

```
+Virtual Functions Table API Extension
```

8. IMO this section does a very poor job in explaining what this is
for and when I may want to use this; what particular problem are we
addressing here?

9. There are typos in the comments and the documentation,
s/Vaitual/Virtual/, s/vurtual/virtual/ etc. Also there are missing
articles. Please run your patches through a spellchecker.

I suggest we address this piece of feedback before proceeding further.

--
Best regards,
Aleksander Alekseev


--
Regards,
Nikita Malakhov
Postgres Professional 

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: confirmed_flush_lsn shows LSN of the data that has not yet been received by the logical subscriber.
Next
From: David Geier
Date:
Subject: Add explicit casts in four places to simplehash.h