On 2/6/25 11:25, Robert Leach wrote:
>> Have you considered a validation app? Have it read the inputs and look in db for conflicts, rather than attempt the
insert.Zero transactions necessary
>
>
> I did consider that about a year or two ago when I first conceived the data validation interface. Doing that now
wouldsolve the problem of concurrent validations blocking one another, but selects would still get blocked if their
resultis pending the outcome of a load running in the background. If I mitigate that issue by running those loads over
nighton a schedule, I still lose out on the benefits of having the loading code do the validation for me...
>
> I would have to explicitly find and report on problems that the load exceptions currently do for me, without any
extracode.
>
> So what I'm saying is that the data being validated is inter-dependent. There are about a dozen different loading
scripts(one for each sheet of the uploaded excel file) whose runs are orchestrated by a master load script that ensures
theyare run in the right order so that the interdependent data can be checked. For example, these are some relative
ordersof what needs to be loaded so that data can be validated:
>
> Study > Animals
> Tracers > Infusates > Animals
> Treatments > Animals
> Tissues > Samples
>
> The Animal load script would fail if the new data in (the tables) Study, Tracers, Infusates, and Treatments aren't
inserted,because it links to those newly created records. And there's no way to detect problems in those new
relationshipsin the unchanged database if they aren't inserted. That's what doing this all in a transaction, and
actuallydoing the inserts (for which I use Django `get_or_create` method calls) provides.
>
> In other words, I would have to save and explicitly check the inter-related sheet data in data structures independent
ofthe database in order to find the equivalent of (for example) `ObjectDoesNotExist` errors that originate from the
database. Right now, I get those errors caught "for free". All I have to do is tell the user what sheet/row/column is
relatedto that error. And it saves me the overhead of having to maintain synchronicity between separate validation
codeand loading code when the loading code changes.
Seems to me this could be dealt with using a schema named validate that
contains 'shadow' tables of those in the live schema(s). Import into
their and see what fails.
>
> Robert William Leach
> Research Software Engineer
> 133 Carl C. Icahn Lab
> Lewis-Sigler Institute for Integrative Genomics
> Princeton University
> Princeton, NJ 08544
>
--
Adrian Klaver
adrian.klaver@aklaver.com