Thread: pgarchives: Bug report + Patches: loader can't handle message in multiple lists


From: Célestin Matte
The messages loader from pgarchives may crash when importing 2 mailing lists with messages in common. In other words:
the importing script will fail when importing list1 and list2 if a message to list1 has list2 in CC:.
 

I attach a patch (0001-loader-attempt-to-handle-message-in-multiple-lists.patch) that starts addressing the issue, but
does not fully fix it, as the script can later crash (in storage.py line 234) because a message cannot be imported
twice. Fixing this would require changing the way messages are stored in the database, using (messageid, listid) as a
primary key instead of messageid, or allowing a message to belong to several threads using the message_thread table
instead of a threadid column in messages.
 
This patch is only for discussion.

I also attach patch 0001-load_message-catch-postgres-UniqueViolation-errors-w.patch as a workaround to this issue, to
catch and log such errors and keep importing an mbox without crashing when it happens (the message will then only
appear in the first imported list).
 
This patch can be used/applied.
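
The gist of the workaround is roughly this (an illustrative sketch, not the literal patch; ap, conn, listid and
opt are as in load_message.py):

import logging

import psycopg2.errors

try:
    ap.store(conn, listid, opt.overwrite, opt.overwrite)
    conn.commit()
except psycopg2.errors.UniqueViolation as e:
    # The message was already imported from an earlier list: log it
    # and keep going with the rest of the mbox instead of crashing.
    logging.warning("duplicate message, skipping: %s", e)
    conn.rollback()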
-- 
Célestin Matte
On Wed, Mar 22, 2023 at 4:13 PM Célestin Matte <celestin.matte@cmatte.me> wrote:
>
> The messages loader from pgarchives may crash when importing 2 mailing lists with messages in common. In other
> words: the importing script will fail when importing list1 and list2 if a message to list1 has list2 in CC:.

We certainly imported many thousands of such messages when we
initially loaded the postgres archives, so at *some* point it worked
-- it must be some change that came in later that broke it.


> I attach a patch (0001-loader-attempt-to-handle-message-in-multiple-lists.patch) that starts addressing the issue,
> but does not fully fix it, as the script can later crash (in storage.py line 234) because a message cannot be imported
> twice. Fixing this would require changing the way messages are stored in the database, using (messageid, listid) as a
> primary key instead of messageid, or allowing a message to belong to several threads using the message_thread table
> instead of a threadid column in messages.
> This patch is only for discussion.

How does it even get that far?  The code on line 32 of storage.py will
check if the message is already in the archives. How does it even get
to the point of trying to insert the message again?

I don't see why you'd need a different storage -- the entire point in
this case is that the message is stored only once, but it's tagged
with being on both lists (in list_threads). That is, a message belongs
to a thread, and the thread can belong to one or more lists.
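
As a rough illustration of that model (hypothetical ids; the ON CONFLICT clause is just one way to make re-tagging
idempotent, not necessarily what the loader does today):

def tag_thread_on_list(curs, threadid, listid):
    # The message and its thread exist exactly once; appearing on a
    # second list only adds a (threadid, listid) row to list_threads.
    curs.execute(
        "INSERT INTO list_threads (threadid, listid)"
        " VALUES (%(threadid)s, %(listid)s)"
        " ON CONFLICT DO NOTHING",
        {'threadid': threadid, 'listid': listid})

# e.g. the same thread visible on two lists:
# tag_thread_on_list(curs, 21, 1)
# tag_thread_on_list(curs, 21, 2)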


> I also attach patch 0001-load_message-catch-postgres-UniqueViolation-errors-w.patch as a workaround to this issue, to
> catch and log such errors and keep importing an mbox without crashing when it happens (the message will then only
> appear in the first imported list).
> This patch can be used/applied.

I'm not sure I like the idea of committing after every message and
then rolling back in the event of an error. If nothing else, this
should be done using a savepoint instead of a complete transaction.
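
Something along these lines, for example (a sketch of the savepoint approach; store_message is a hypothetical
stand-in for the real per-message code):

import psycopg2.errors

curs = conn.cursor()
for msg in mbox:
    curs.execute("SAVEPOINT one_message")
    try:
        store_message(curs, msg, listid)  # hypothetical per-message store
        curs.execute("RELEASE SAVEPOINT one_message")
    except psycopg2.errors.UniqueViolation:
        # Undo only this message; everything imported so far in the
        # surrounding transaction is kept.
        curs.execute("ROLLBACK TO SAVEPOINT one_message")
conn.commit()  # one commit for the whole mbox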


--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



> How does it even get that far?  The code on line 32 of storage.py will
> check if the message is already in the archives. How does it even get
> to the point of trying to insert the message again?

Because my patch bypasses this by checking whether the message exists *in the given list*.

This was the beginning of an attempt to fix a crash I had. As far as I remember, the crash was that the message already
existed.

But as said in the previous email, this was not sufficient to solve the issue.

> I'm not sure I like the idea of committing after every message and
> then rolling back in the event of an error. If nothing else, this
> should be done using a savepoint instead of a complete transaction.

Attached
-- 
Célestin Matte


On Mon, Jun 5, 2023 at 3:39 PM Célestin Matte <celestin.matte@cmatte.me> wrote:
>
> > How does it even get that far?  The code on line 32 of storage.py will
> > check if the message is already in the archives. How does it even get
> > to the point of trying to insert the message again?
>
> Because my patch bypasses this by checking whether the message exists *in the given list*.
>
> This was the beginning of an attempt to fix a crash I had. As far as I remember, the crash was that the message
> already existed.
>
> But as said in the previous email, this was not sufficient to solve the issue.

OK, I think I need to take a step back to try to figure out what's
actually wrong here.

What is it you are actually trying to accomplish? Are you trying to
remove the single-store functionality for emails? As mentioned, the
whole design of the system is done around that a single email should
only be stored once - but it sounds like you're removing that for some
reason? Or am I misunderstanding?

It sounds to me like your patch that bypasses the check is either
broken or misguided, since that's what actually causes the problem?


> > I'm not sure I like the idea of committing after every message and
> > then rolling back in the event of an error. If nothing else, this
> > should be done using a savepoint instead of a complete transaction.
>
> Attached

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



> OK, I think I need to take a step back to try to figure out what's
> actually wrong here.
> 
> What is it you are actually trying to accomplish? Are you trying to
> remove the single-store functionality for emails? As mentioned, the
> whole design of the system is done around that a single email should
> only be stored once - but it sounds like you're removing that for some
> reason? Or am I misunderstanding?

Yes, because during my tests, the import script crashed when trying to import a message that belongs to two different
lists at once.
 
I'm having trouble reproducing the crash, as properly cleaning the database would require some work. Not sure it's the
crash I had back then, but I now have something like this:
 
Traceback (most recent call last):
   File "/path/pgarchives/local//loader/load_message.py", line 158, in <module>
     ap.store(conn, listid, opt.overwrite, opt.overwrite)
   File "/path/pgarchives/local/loader/lib/storage.py", line 216, in store
     'listid': listid,
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "list_threads_pkey"
DETAIL:  Key (threadid)=(21) already exists.

After a bit of digging, the workaround I came up with was to store each email once *for each list* instead of just
once (which did not work).
 

I just performed more tests, and this does not seem to impact regular import of incoming messages. I guess the impact
is limited.

> It sounds to me like your patch that bypasses the check is either
> broken or misguided, since that's what actually causes the problem?
> 
> 
>>> I'm not sure I like the idea of committing after every message and
>>> then rolling back in the event of an error. If nothing else, this
>>> should be done using a savepoint instead of a complete transaction.
>>
>> Attached
> 

-- 
Célestin Matte




On Mon, Jun 12, 2023 at 5:30 PM Célestin Matte <celestin.matte@cmatte.me> wrote:
>
> > OK, I think I need to take a step back to try to figure out what's
> > actually wrong here.
> >
> > What is it you are actually trying to accomplish? Are you trying to
> > remove the single-store functionality for emails? As mentioned, the
> > whole design of the system is done around that a single email should
> > only be stored once - but it sounds like you're removing that for some
> > reason? Or am I misunderstanding?
>
> Yes, because during my tests, the import script crashed when trying to import a message that belongs to two different
> lists at once.
> I'm having trouble reproducing the crash, as properly cleaning the database would require some work. Not sure it's the
> crash I had back then, but I now have something like this:
> Traceback (most recent call last):
>    File "/path/pgarchives/local//loader/load_message.py", line 158, in <module>
>      ap.store(conn, listid, opt.overwrite, opt.overwrite)
>    File "/path/pgarchives/local/loader/lib/storage.py", line 216, in store
>      'listid': listid,
> psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "list_threads_pkey"
> DETAIL:  Key (threadid)=(21) already exists.

That looks like your primary key on list_threads is wrong. The primary
key should be on (threadid, listid) but based on that error message it
looks like it's on (threadid).

The SQL script for generating the db has:
CREATE TABLE list_threads(
   threadid int NOT NULL, /* comes from threadid_seq */
   listid int NOT NULL REFERENCES lists(listid),
   CONSTRAINT pg_list_threads PRIMARY KEY (threadid, listid)
);

But it sounds like you somehow ended up with the wrong index there?
(There should be no index named list_threads_pkey -- the name of that
primary key constraint in the schema is strange and probably a typo,
but in any case it's different.)

Do you still have the wrong index there, or has it been fixed at some
other time? If it has, that could explain the inability to
reproduce...
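
(For what it's worth, a quick way to check from a psycopg2 session, querying the standard pg_indexes view:)

curs = conn.cursor()
curs.execute(
    "SELECT indexname, indexdef FROM pg_indexes"
    " WHERE tablename = 'list_threads'")
for indexname, indexdef in curs.fetchall():
    print(indexname, indexdef)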


> After a bit of digging, the workaround I came up with was to store each email once *for each list* instead of
> just once (which did not work).

Yeah, that is clearly an incorrect fix :/

//Magnus



>> Traceback (most recent call last):
>>     File "/path/pgarchives/local//loader/load_message.py", line 158, in <module>
>>       ap.store(conn, listid, opt.overwrite, opt.overwrite)
>>     File "/path/pgarchives/local/loader/lib/storage.py", line 216, in store
>>       'listid': listid,
>> psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "list_threads_pkey"
>> DETAIL:  Key (threadid)=(21) already exists.
> 
> That looks like your primary key on list_threads is wrong. The primary
> key should be on (threadid, listid) but based on that error message it
> looks like it's on (threadid).

Hmm. This is due to my deployment relying on my patch to merge schema.sql into a Django model (review still pending!
[1]).
As Django does not support composite primary keys [2], I had to use unique_together instead:

migrations.CreateModel(
             name='ListThreads',
             fields=[
                 ('threadid', models.IntegerField(primary_key=True, serialize=False)),
                 ('listid', models.ForeignKey(db_column='listid', on_delete=django.db.models.deletion.CASCADE,
                                              to='mailarchives.List')),
             ],
             options={
                 'db_table': 'list_threads',
                 'unique_together': {('threadid', 'listid')},
             },
         ),

Which leads to the following indexes being created:
Indexes:
     "list_threads_pkey" PRIMARY KEY, btree (threadid)
     "list_threads_listid_44de2a94" btree (listid)
     "list_threads_listid_idx" btree (listid)
     "list_threads_threadid_listid_226e84bf_uniq" UNIQUE CONSTRAINT, btree (threadid, listid)


Not sure how to fix this. Inputs on previously mentioned patch would be helpful!


[1] https://www.postgresql.org/message-id/12eb75f0-3fc2-14f3-0931-4f29e145f182%40cmatte.me
[2] https://code.djangoproject.com/ticket/373
-- 
Célestin Matte




On Thu, Jul 13, 2023 at 3:36 PM Célestin Matte <celestin.matte@cmatte.me> wrote:
>
> >> Traceback (most recent call last):
> >>     File "/path/pgarchives/local//loader/load_message.py", line 158, in <module>
> >>       ap.store(conn, listid, opt.overwrite, opt.overwrite)
> >>     File "/path/pgarchives/local/loader/lib/storage.py", line 216, in store
> >>       'listid': listid,
> >> psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "list_threads_pkey"
> >> DETAIL:  Key (threadid)=(21) already exists.
> >
> > That looks like your primary key on list_threads is wrong. The primary
> > key should be on (threadid, listid) but based on that error message it
> > looks like it's on (threadid).
>
> Hmm. This is due to my deployment relying on my patch to merge schema.sql into a Django model (review still
> pending! [1]).
> As Django does not support composite primary keys [2], I had to use unique_together instead:
>
> migrations.CreateModel(
>              name='ListThreads',
>              fields=[
>                  ('threadid', models.IntegerField(primary_key=True, serialize=False)),
>                  ('listid', models.ForeignKey(db_column='listid', on_delete=django.db.models.deletion.CASCADE,
>                                               to='mailarchives.List')),
>              ],
>              options={
>                  'db_table': 'list_threads',
>                  'unique_together': {('threadid', 'listid')},
>              },
>          ),
>
> Which leads to the following indexes being created:
> Indexes:
>      "list_threads_pkey" PRIMARY KEY, btree (threadid)
>      "list_threads_listid_44de2a94" btree (listid)
>      "list_threads_listid_idx" btree (listid)
>      "list_threads_threadid_listid_226e84bf_uniq" UNIQUE CONSTRAINT, btree (threadid, listid)
>
>
> Not sure how to fix this. Inputs on previously mentioned patch would be helpful!

Yeah, that patch is then clearly broken :/ The unique_together part is
correct. And I'm not sure why that should lead to *two* indexes on
listid - that looks like a django bug, if that creation is purely from
that model.

You can't set threadid as the primary key. You have to either:

1. Add another column that's autonumbering and use that as primary key.
2. Use migrations.RunSQL() to create the table instead of
migrations.CreateModel().

I would recommend #2, as #1 is an ugly workaround for the limitations
of the django ORM - limitations that we don't hit anywhere else
since we don't use it that way.
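
A minimal sketch of #2 (the migration dependencies are hypothetical; the CREATE TABLE itself is from schema.sql):

from django.db import migrations


class Migration(migrations.Migration):
    dependencies = [
        ('mailarchives', '0001_initial'),  # hypothetical dependency
    ]

    operations = [
        migrations.RunSQL(
            sql="""
                CREATE TABLE list_threads(
                   threadid int NOT NULL, /* comes from threadid_seq */
                   listid int NOT NULL REFERENCES lists(listid),
                   CONSTRAINT pg_list_threads PRIMARY KEY (threadid, listid)
                )
            """,
            reverse_sql="DROP TABLE list_threads",
        ),
    ]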

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



> Yeah, that patch is then clearly broken :/ The unique_together part is
> correct. And I'm not sure why that should lead to *two* indexes on
> listid - that looks like a django bug, if that creation is purely from
> that model.

Indeed, this comes from the patch. Sending a fix on the original thread.
I'd appreciate a more complete review of the patch so that we can move on with it.

-- 
Célestin Matte