On Mon, Feb 9, 2015 at 5:38 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Mon, Feb 9, 2015 at 11:09 AM, Marco Nenciarini
> <marco.nenciarini@2ndquadrant.it> wrote:
>>
>> On 08/02/15 17:04, Magnus Hagander wrote:
>> >
>> > Filenames are now shown for attachments, including a direct link to the
>> > attachment itself. I've also run a job to populate all old threads.
>> >
>>
>> I wonder what is the algorithm to detect when an attachment is a patch.
>>
>> If you look at https://commitfest.postgresql.org/4/94/ all the
>> attachments are marked as "Patch: no", but many of them are
>> clearly a patch.
>
> It uses the "magic" module, same as the "file" command. And that one claims:
>
> mha@mha-laptop:/tmp$ file 0003-File-based-incremental-backup-v9.patch
> 0003-File-based-incremental-backup-v9.patch: ASCII English text, with very
> long lines
>
> I think it doesn't consider it a patch because it's not actually a patch -
> it looks like a git-format actual email message that *contains* a patch. It
> even includes the unix From separator line. So if anything it should have
> detected that it's an email message, which it apparently doesn't.
>
> Picking from the very top patch on the cf, an actual patch looks like this:
>
> mha@mha-laptop:/tmp$ file psql_fix_uri_service_004.patch
> psql_fix_uri_service_004.patch: unified diff output, ASCII text, with very
> long lines
Can we make it smarter, so that the CF app recognizes as patches the kinds of files that people produce intending them to be patches?
Doing it wouldn't be too hard, as the code right now is simply:
    # Attempt to identify the file using magic information
    mtype = mag.buffer(contents)
    if mtype.startswith('text/x-diff'):
        a.ispatch = True
    else:
        a.ispatch = False
(where mag is our handle into the magic module)
So we could easily add, for example, our own regexp parsing. The question is whether we want to, because we'll have to maintain it as well. But I guess if we keep the set of rules restricted enough, we can probably live with that.
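
To make the "restricted set of rules" idea concrete, here is a rough sketch of what such a fallback could look like. This is not what the CF app does today: the function name looks_like_patch and both regexps are made up for illustration, and it assumes the attachment contents are available as bytes. It only looks for a unified diff file header followed by a hunk header, or a context diff header, anywhere in the file, so a git format-patch email that contains a diff would also be accepted.

```python
import re

# Hypothetical rules, for illustration only (not the CF app's actual code).

# Unified diff: a "--- old" line, a "+++ new" line, then a hunk header
# such as "@@ -1,3 +1,4 @@". The bare "---" separator in a git
# format-patch mail does not match, since a file name must follow.
_UNIFIED_HEADER = re.compile(
    rb'^--- \S.*\n\+\+\+ \S.*\n@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@',
    re.MULTILINE)

# Context diff: a "*** old" line, a "--- new" line, then a row of 15 stars.
_CONTEXT_HEADER = re.compile(
    rb'^\*\*\* \S.*\n--- \S.*\n\*{15}',
    re.MULTILINE)


def looks_like_patch(contents: bytes) -> bool:
    """Return True if the data contains something shaped like a diff."""
    # Normalize Windows line endings so the regexps only deal with "\n".
    data = contents.replace(b'\r\n', b'\n')
    return bool(_UNIFIED_HEADER.search(data) or _CONTEXT_HEADER.search(data))
```

It would presumably be used only as a fallback, i.e. an attachment would count as a patch if magic reports text/x-diff or if looks_like_patch(contents) returns True. Keeping it to two header patterns is deliberate, so there is little to maintain.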