Opened 6 years ago
Closed 6 years ago
#1056 closed defect (fixed)
Naming of child items
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | blocker | Milestone: | Reggie v4.19 |
Component: | net.sf.basedb.reggie | Keywords: | |
Cc: |
Description
There seems to be a problem with the naming of new child items when this is mixed with deleting child items created earlier. For example, if we have a MergedSequences item and then run both the legacy pipeline and the Hisat pipeline, we get two MaskedSequences child items with suffix .k and .k2. Then, if there is a problem with the legacy pipeline that requires the .k subtree to be deleted, the next MaskedSequences item will get a .k2 suffix and now we have two items with the same name.
The problem seems to be that the suffix is created by a simple count of existing child items. This will only work if items are never deleted. I think the same issue is present in most of the places where new child items are created. We need to check this and create a stable fix.
It would also be a good idea to dump out the names of all items from the biosource down to the rawbioassay level and check if there are any duplicates.
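The duplicate check suggested above could look roughly like the following. This is only a sketch: it assumes the item names have already been dumped into a plain collection of strings, and the class and method names are made up for illustration and are not part of Reggie.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class DuplicateNameSketch
{
    /**
     * Returns the names that occur more than once in the given collection,
     * sorted alphabetically. Hypothetical helper, not Reggie code.
     */
    static Set<String> findDuplicates(Collection<String> names)
    {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        for (String name : names)
        {
            // Set.add() returns false if the name has been seen before
            if (!seen.add(name)) duplicates.add(name);
        }
        return duplicates;
    }
}
```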
Change History (23)
comment:1 by , 6 years ago
comment:2 by , 6 years ago
(In [4881]) References #1056: Naming of child items
Implemented a new algorithm for generating child item names. The ReggieItem.getNextChildItemName() method will check existing child item names and use the highest numeric suffix it can find to create the name for the next child item. It will also use the ReservedItems class to prevent multiple threads/transactions from creating the same name if they happen to run at the same time.
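A minimal sketch of the "highest numeric suffix" idea, not the actual ReggieItem.getNextChildItemName() code. It assumes that child names have the form parent + suffix + optional number, that a missing number (as in .k) counts as index 1, and it uses a simple in-memory set as a hypothetical stand-in for the ReservedItems class.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChildNameSketch
{
    // Hypothetical stand-in for ReservedItems: names that have been handed out already
    private static final Set<String> reserved = new HashSet<>();

    /** Returns the next free child name, for example "M1.k3" if "M1.k2" exists. */
    static synchronized String nextChildName(String parent, String suffix, List<String> existing)
    {
        Pattern p = Pattern.compile(Pattern.quote(parent + suffix) + "(\\d*)");
        int highest = 0;
        for (String name : existing)
        {
            Matcher m = p.matcher(name);
            if (!m.matches()) continue; // not a direct child with this suffix
            // Assumption: an empty numeric part (".k") counts as index 1
            int index = m.group(1).isEmpty() ? 1 : Integer.parseInt(m.group(1));
            highest = Math.max(highest, index);
        }
        String candidate;
        do
        {
            highest++;
            candidate = parent + suffix + (highest == 1 ? "" : String.valueOf(highest));
        }
        while (!reserved.add(candidate)); // skip names reserved by someone else
        return candidate;
    }
}
```

With only M1.k2 left after deleting the M1.k subtree, a simple count would produce M1.k2 again, while this sketch produces M1.k3 because it continues from the highest existing index.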
The new implementation is used in the Hisat/StringTie and Tophat/Cufflinks pipeline and seems to work as expected.
comment:3 by , 6 years ago
comment:4 by , 6 years ago
(In [4883]) References #1056: Naming of child items
Implemented a new algorithm for generating names for item types that have their own index sequence (eg. Pool, SequencingRun).
It is similar to the child name algorithm. A database query is used to find the item with the highest index which is then incremented. The generated names are then checked against reserved names in order to avoid duplicates if more than one thread is running at the same time.
comment:5 by , 6 years ago
comment:6 by , 6 years ago
comment:7 by , 6 years ago
(In [4886]) References #1056: Naming of child items
Fixed the implementation for temporary *.dil items used when pooling libraries. An unexpected problem was that calculations for the lab protocol and final registration only checked for .dil and stopped working when we got .dil2 etc. This has been fixed.
Also, the case summary and retraction servlet had checks that needed to be updated.
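The kind of check that needs to accept both .dil and numbered variants such as .dil2 can be sketched with a regular expression. The class and method names below are hypothetical and the real checks in the servlets may be written differently.

```java
import java.util.regex.Pattern;

class DilSuffixSketch
{
    // Matches names ending in ".dil" optionally followed by digits (".dil", ".dil2", ...)
    private static final Pattern DIL = Pattern.compile(".*\\.dil\\d*");

    /** Returns true if the name looks like a temporary *.dil item. */
    static boolean isDilution(String name)
    {
        return name != null && DIL.matcher(name).matches();
    }
}
```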
comment:8 by , 6 years ago
comment:9 by , 6 years ago
(In [4889]) References #1056: Naming of child items
Changing the name reservation strategy to also consider the current session and transaction. This change makes it possible to avoid "holes" in the name sequences if a wizard is aborted and then restarted, since an existing reservation can be replaced if the same name is checked again from the same session but a different transaction.
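A rough sketch of a reservation rule that takes session and transaction into account. The class, the string ids and the in-memory map are assumptions made for illustration and do not show how ReservedItems is actually implemented. The sketch also assumes that a name reserved earlier in the same transaction should be treated as taken, which the comment above does not state.

```java
import java.util.HashMap;
import java.util.Map;

class ReservationSketch
{
    // Owner of a reservation, identified by hypothetical session and transaction ids
    private static final class Owner
    {
        final String sessionId;
        final String transactionId;
        Owner(String sessionId, String transactionId)
        {
            this.sessionId = sessionId;
            this.transactionId = transactionId;
        }
    }

    private final Map<String, Owner> reservations = new HashMap<>();

    /** Returns true if the caller now holds the reservation for the given name. */
    synchronized boolean reserve(String name, String sessionId, String transactionId)
    {
        Owner current = reservations.get(name);
        if (current != null)
        {
            // Reserved by another session: the name is not available
            if (!current.sessionId.equals(sessionId)) return false;
            // Assumption: already handed out in this very transaction, so not available
            if (current.transactionId.equals(transactionId)) return false;
            // Same session, different transaction (for example a restarted wizard):
            // fall through and replace the old reservation
        }
        reservations.put(name, new Owner(sessionId, transactionId));
        return true;
    }
}
```

Replacing the reservation for a restarted wizard means the same name can be handed out again instead of skipping to the next index, which is what avoids the "holes" in the sequence.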
comment:10 by , 6 years ago
comment:11 by , 6 years ago
comment:12 by , 6 years ago
comment:13 by , 6 years ago
comment:14 by , 6 years ago
(In [4894]) References #1056: Naming of child items
Fixed retraction and consent wizards. Added a "prefix" setting to item subtypes to make it easier to handle patient-related subtypes (eg. 'No', 'Not asked', 'Retract', etc.). This change should also benefit other subtypes where the prefix is hard-coded into the servlets right now (eg. 'FlowCell', 'PooledLibrary', etc.).
comment:15 by , 6 years ago
comment:16 by , 6 years ago
comment:17 by , 6 years ago
comment:18 by , 6 years ago
comment:19 by , 6 years ago
comment:20 by , 6 years ago
(In [4902]) References #1056: Naming of child items
External ID generation for Patient, Case, Blood and Specimen/NoSpecimen has been updated to use the same approach as for name generation. Basically the same as before but the common code has been moved to ReggieItem.
comment:21 by , 6 years ago
(In [4905]) References #1056: Naming of child items
Fixed for the extraction wizards. A special hack was needed for FlowThrough items due to an old mistake. Before changes made in #908 ([4095]) FlowThrough items were (incorrectly) registered with '.f' as suffix. After that '.ft' was used. Labels for tubes have always used '.ft'.
comment:22 by , 6 years ago
comment:23 by , 6 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
(In [4880]) References #1056: Naming of child items
Marking up all places in the code (with @Deprecated and // #1056) that set the name of items that may be affected by the naming problem. There seem to be two basic approaches:
Both variants are implemented in multiple places and there may be some minor differences (for example, generating more than one name in a single batch).
In some cases, the name is generated and used (saved to the database) immediately. In other cases, the name is passed to the wizard via JSON and not saved until later (this could lead to issues if the same wizard is used by more than one user at the same time).
There are also some other variants where the name seems to be generated by javascript in the browser and then submitted in the final registration step.