Opened 6 years ago

Closed 6 years ago

#1056 closed defect (fixed)

Naming of child items

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: blocker
Milestone: Reggie v4.19
Component: net.sf.basedb.reggie
Keywords:
Cc:

Description

There seems to be a problem with the naming of new child items when this is mixed with deleting child items created earlier. For example, if we have a MergedSequences item and then run both the legacy pipeline and the Hisat pipeline, we get two MaskedSequences child items with suffix .k and .k2. Then, if there is a problem with the legacy pipeline that requires the .k subtree to be deleted, the next MaskedSequences item will get a .k2 suffix and now we have two items with the same name.

The problem seems to be that the suffix is created by a simple count of existing child items. This will only work if items are never deleted. I think the same issue is present in most of the places where new child items are created. We need to check this and create a stable fix.

It would also be a good idea to dump out the names of all items from the biosource down to the rawbioassay level and check if there are any duplicates.
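A minimal sketch of such a duplicate check, assuming the item names have already been exported to a flat list. The class `DuplicateNameCheck`, the export step and the sample names are hypothetical and not taken from Reggie.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Reggie): reports names that occur more than once.
final class DuplicateNameCheck {

    /** Returns each duplicated name mapped to the number of times it occurs. */
    static Map<String, Integer> findDuplicates(List<String> names) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : names) {
            counts.merge(name, 1, Integer::sum);
        }
        // Keep only the names that were seen at least twice.
        counts.values().removeIf(n -> n < 2);
        return counts;
    }

    public static void main(String[] args) {
        // Made-up names that mimic the .k/.k2 situation from the description.
        List<String> names = List.of("Parent.k2", "Parent.k2", "Parent.k3");
        System.out.println(findDuplicates(names));
    }
}
```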

Change History (23)

comment:1 by Nicklas Nordborg, 6 years ago

(In [4880]) References #1056: Naming of child items

Marking up all places in the code (with @Deprecated and // #1056) that set the name of items that may be affected by the naming problem. There seem to be two basic approaches:

  • Counting number of child items and adding +1 to get the next suffix. This is definitely not going to work if items are deleted. This approach is typically used for items that have a SCAN-B id prefix.
  • Finding the item with the highest suffix and adding +1 to get the next suffix. This will probably work in most cases. This approach is typically used for items which have their own naming strategy such as Pools, SequencingRuns, etc.

Both variants are implemented in multiple places and there may be some minor differences (for example, generating more than one name in a single batch).
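The difference between the two approaches can be sketched as follows. This is an illustration only: the class `SuffixStrategies` and its methods are hypothetical, and the only convention taken from the ticket is that the first child has no number after the suffix while later ones get 2, 3, and so on.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the two naming approaches (not Reggie code).
final class SuffixStrategies {

    // Approach 1: count the existing child items and add 1.
    static String nextByCount(String parent, String suffix, List<String> children) {
        int next = children.size() + 1;
        return parent + suffix + (next == 1 ? "" : String.valueOf(next));
    }

    // Approach 2: find the highest existing numeric suffix and add 1.
    static String nextByHighest(String parent, String suffix, List<String> children) {
        Pattern p = Pattern.compile(Pattern.quote(parent + suffix) + "(\\d{0,9})");
        int highest = 0;
        for (String child : children) {
            Matcher m = p.matcher(child);
            if (!m.matches()) continue; // not a child of this kind
            // A name without a number counts as index 1.
            int index = m.group(1).isEmpty() ? 1 : Integer.parseInt(m.group(1));
            highest = Math.max(highest, index);
        }
        int next = highest + 1;
        return parent + suffix + (next == 1 ? "" : String.valueOf(next));
    }
}
```

With children "P.k" and "P.k2" and the "P.k" item deleted, `nextByCount("P", ".k", List.of("P.k2"))` returns "P.k2" again (the duplicate from the description), while `nextByHighest` returns "P.k3".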

In some cases, the name is generated and used (saved to the database) immediately. In other cases, the name is passed to the wizard via JSON and not saved until later (this could lead to issues if the same wizard is used by more than one user at the same time).

There are also some other variants where the name seems to be generated by javascript in the browser and then submitted in the final registration step.

comment:2 by Nicklas Nordborg, 6 years ago

(In [4881]) References #1056: Naming of child items

Implemented a new algorithm for generating child item names. The ReggieItem.getNextChildItemName() method will check existing child item names and use the highest numeric suffix it can find to create the name for the next child item. It will also use the ReservedItems class to prevent multiple threads/transactions from creating the same name if they happen to run at the same time.
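The idea can be sketched roughly as below. This is not the actual ReggieItem.getNextChildItemName() or ReservedItems code: the class `ChildNameAllocator`, its method signatures and the in-memory set of reserved names are assumptions made for illustration.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: highest-suffix naming combined with name reservation.
final class ChildNameAllocator {

    // Names that have been handed out but may not be saved to the database yet.
    private final Set<String> reserved = new HashSet<>();

    // synchronized so that two threads can never be given the same name.
    synchronized String nextChildName(String parentName, String suffix, Collection<String> existing) {
        String base = parentName + suffix;
        int highest = 0;
        for (String name : existing) highest = Math.max(highest, indexOf(base, name));
        for (String name : reserved) highest = Math.max(highest, indexOf(base, name));
        int next = highest + 1;
        String candidate = base + (next == 1 ? "" : String.valueOf(next));
        reserved.add(candidate);
        return candidate;
    }

    // Returns the numeric index of 'name' relative to 'base', or 0 if it does not match.
    private static int indexOf(String base, String name) {
        if (!name.startsWith(base)) return 0;
        String tail = name.substring(base.length());
        if (tail.isEmpty()) return 1;              // "P.k" is the first item
        if (!tail.matches("\\d{1,9}")) return 0;   // e.g. "P.kx" is something else
        return Integer.parseInt(tail);
    }
}
```

Two calls that see the same list of existing children still get different names, because the second call also takes the first call's reservation into account.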

The new implementation is used in the Hisat/StringTie and Tophat/Cufflinks pipelines and seems to work as expected.

comment:3 by Nicklas Nordborg, 6 years ago

(In [4882]) References #1056: Naming of child items

The demux step now also uses the new child naming algorithm for MergedSequences and DemuxedSequences.

comment:4 by Nicklas Nordborg, 6 years ago

(In [4883]) References #1056: Naming of child items

Implemented a new algorithm for generating names for item types that have their own index sequence (eg. Pool, SequencingRun).

It is similar to the child name algorithm. A database query is used to find the item with the highest index which is then incremented. The generated names are then checked against reserved names in order to avoid duplicates if more than one thread is running at the same time.
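A rough sketch of this variant, including the batch case where more than one name is needed at once. The class `IndexedNameAllocator` is hypothetical, the `highestStoredIndex` parameter stands in for the result of the database query, and the name format (prefix followed by a plain number) is an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of index-based naming with reserved names (not Reggie code).
final class IndexedNameAllocator {

    // Names handed out to other threads/wizards that may not be stored yet.
    private final Set<String> reserved = new HashSet<>();

    /**
     * Generates 'count' new names. 'highestStoredIndex' is assumed to come
     * from a database query for the item with the highest index.
     */
    synchronized List<String> nextNames(String prefix, int highestStoredIndex, int count) {
        List<String> names = new ArrayList<>(count);
        int index = highestStoredIndex;
        while (names.size() < count) {
            String candidate = prefix + (++index);
            // Skip candidates that somebody else has already reserved.
            if (reserved.add(candidate)) names.add(candidate);
        }
        return names;
    }
}
```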

comment:5 by Nicklas Nordborg, 6 years ago

(In [4884]) References #1056: Naming of child items

Found more code that should be checked.

comment:6 by Nicklas Nordborg, 6 years ago

(In [4885]) References #1056: Naming of child items

Fixed the implementation for mRNA, cDNA and Library child items.

comment:7 by Nicklas Nordborg, 6 years ago

(In [4886]) References #1056: Naming of child items

Fixed the implementation for temporary *.dil items used when pooling libraries. An unexpected problem was that calculations for the lab protocol and final registration only checked for .dil and stopped working when we got .dil2 etc. This has been fixed.

Also, the case summary and retraction servlet had checks that needed to be updated.
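The kind of check that broke on .dil2 can be illustrated like this. The class `DilutionNames` and both methods are hypothetical; the real checks in the lab protocol, registration, case summary and retraction code may look different.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of a suffix check that also accepts numbered suffixes.
final class DilutionNames {

    // Matches names ending in ".dil" optionally followed by digits (.dil, .dil2, ...).
    private static final Pattern DIL = Pattern.compile(".*\\.dil\\d*");

    // Old-style check: only recognizes the un-numbered suffix.
    static boolean isDilutionOld(String name) {
        return name.endsWith(".dil");
    }

    // Updated check: recognizes .dil as well as .dil2, .dil3, etc.
    static boolean isDilution(String name) {
        return DIL.matcher(name).matches();
    }
}
```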

comment:8 by Nicklas Nordborg, 6 years ago

(In [4888]) References #1056: Naming of child items

Fixed the implementation for mRNA, cDNA and Library plates.

comment:9 by Nicklas Nordborg, 6 years ago

(In [4889]) References #1056: Naming of child items

Changing the name reservation strategy to also consider the current session and transaction. This change makes it possible to avoid "holes" in the name sequences if a wizard is aborted and then restarted, since an existing reservation can be replaced if the same name is checked again from the same session but a different transaction.
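A minimal sketch of such a reservation rule. The class `NameReservations`, the use of a string session id and a numeric transaction id, and the decision to refuse a second reservation from the very same transaction are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of reservations that track session and transaction.
final class NameReservations {

    private static final class Owner {
        final String sessionId;
        final long transactionId;

        Owner(String sessionId, long transactionId) {
            this.sessionId = sessionId;
            this.transactionId = transactionId;
        }
    }

    private final Map<String, Owner> reservations = new HashMap<>();

    /** Returns true if the caller now holds the reservation for 'name'. */
    synchronized boolean tryReserve(String name, String sessionId, long transactionId) {
        Owner current = reservations.get(name);
        if (current != null) {
            boolean sameSession = current.sessionId.equals(sessionId);
            boolean sameTransaction = current.transactionId == transactionId;
            // Another session owns the name, or this transaction has already used it.
            if (!sameSession || sameTransaction) return false;
            // Same session, new transaction: the wizard was probably restarted,
            // so the old reservation is replaced below instead of leaving a hole.
        }
        reservations.put(name, new Owner(sessionId, transactionId));
        return true;
    }
}
```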

comment:10 by Nicklas Nordborg, 6 years ago

(In [4890]) References #1056: Naming of child items

Fixed the implementation for Flow cells and the pooled library aliquots used with them.

comment:11 by Nicklas Nordborg, 6 years ago

(In [4891]) References #1056: Naming of child items

Fixed the RNAQC items.

comment:12 by Nicklas Nordborg, 6 years ago

(In [4892]) References #1056: Naming of child items

Fixed for the Histology wizards.

comment:13 by Nicklas Nordborg, 6 years ago

(In [4893]) References #1056: Naming of child items

Fixed for the outtake wizards.

comment:14 by Nicklas Nordborg, 6 years ago

(In [4894]) References #1056: Naming of child items

Fixed retraction and consent wizards. Added a "prefix" setting to item subtypes to make it easier to handle patient-related subtypes (eg. 'No', 'Not asked', 'Retract', etc.). This change should also benefit other subtypes where the prefix is hard-coded into the servlets right now (eg. 'FlowCell', 'PooledLibrary', etc.).

comment:15 by Nicklas Nordborg, 6 years ago

(In [4895]) References #1056: Naming of child items

Use prefix from subtypes.

comment:16 by Nicklas Nordborg, 6 years ago

(In [4896]) References #1056: Naming of child items

Fixed for the partition wizard.

comment:17 by Nicklas Nordborg, 6 years ago

(In [4897]) References #1056: Naming of child items

Adding check for existing items that can be used in the final step of a wizard when the name was generated in an earlier step. This should prevent duplicates from being stored, but may cause an error in the final registration.
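The trade-off can be sketched as follows, with an in-memory set standing in for a database lookup. The class `FinalStepCheck` and the choice of exception are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a last-moment duplicate check in the final wizard step.
final class FinalStepCheck {

    // Stand-in for the names of items already stored in the database.
    private final Set<String> storedNames = new HashSet<>();

    /** Stores the name, or fails if an item with that name already exists. */
    void register(String name) {
        if (!storedNames.add(name)) {
            // The name was generated in an earlier step and has been taken since then.
            throw new IllegalStateException("An item named '" + name + "' already exists");
        }
    }
}
```

No duplicate is ever stored, but the user who finishes second gets an error and has to restart the wizard.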

comment:18 by Nicklas Nordborg, 6 years ago

(In [4898]) References #1056: Naming of child items

Fixed for the specimen tube registration and the referral form registration wizard. Storage boxes are also fixed in this change.

comment:19 by Nicklas Nordborg, 6 years ago

(In [4900]) References #1056: Naming of child items

Fixed for the blood registration wizards.

comment:20 by Nicklas Nordborg, 6 years ago

(In [4902]) References #1056: Naming of child items

External ID generation for Patient, Case, Blood and Specimen/NoSpecimen has been updated to use the same approach as for name generation. Basically the same as before but the common code has been moved to ReggieItem.

comment:21 by Nicklas Nordborg, 6 years ago

(In [4905]) References #1056: Naming of child items

Fixed for the extraction wizards. A special hack was needed for FlowThrough items due to an old mistake. Before the changes made in #908 ([4095]), FlowThrough items were (incorrectly) registered with '.f' as suffix. After that, '.ft' was used. Labels for tubes have always used '.ft'.
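The ticket does not describe the hack itself. One way it could look is to count both the legacy '.f' and the current '.ft' suffix when searching for the highest index, while always generating new names with '.ft'. The class `FlowThroughNames` below is a hypothetical sketch under that assumption.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: treat legacy ".f" names as part of the ".ft" sequence.
final class FlowThroughNames {

    static String next(String parent, List<String> existing) {
        // "\\.ft?" matches both the legacy ".f" and the current ".ft" suffix.
        Pattern p = Pattern.compile(Pattern.quote(parent) + "\\.ft?(\\d{0,9})");
        int highest = 0;
        for (String name : existing) {
            Matcher m = p.matcher(name);
            if (!m.matches()) continue;
            // A name without a number counts as index 1.
            int index = m.group(1).isEmpty() ? 1 : Integer.parseInt(m.group(1));
            highest = Math.max(highest, index);
        }
        int next = highest + 1;
        // New names always use ".ft", never the legacy ".f".
        return parent + ".ft" + (next == 1 ? "" : String.valueOf(next));
    }
}
```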

comment:22 by Nicklas Nordborg, 6 years ago

(In [4906]) References #1056: Naming of child items

Removing call to commit() since that could cause different names to be generated for RNA, DNA and FlowThrough if printing lab protocol or labels from a different computer.

comment:23 by Nicklas Nordborg, 6 years ago

Resolution: fixed
Status: new → closed