-
Notifications
You must be signed in to change notification settings - Fork 23
#328 - Remove annotations in a specified range #329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- added functions for removing annoations from cas - cut_sofa_string_to_range - remove_annotations_in_range - added tests
|
Thanks for the PR. "Normally" (i.e. in particular in Java UIMA), the sofa is considered to be immutable once set. Have you considered generating a new CAS that contains only the cut-out section instead of doing an in-place modification? |
|
I didn't expect such a quick response, thanks :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds two convenience methods to the Cas class for managing annotations based on their position in the sofa string: cut_sofa_string_to_range and remove_in_range.
Key changes:
cut_sofa_string_to_range: Cuts the sofa string to a specified range and removes/adjusts annotations accordinglyremove_in_range: Removes annotations within a specified range, optionally filtered by type- Comprehensive test coverage for both methods with different scenarios
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| cassis/cas.py | Adds two new methods: cut_sofa_string_to_range for cutting sofa string with annotation adjustment, and remove_in_range for removing annotations in a range |
| tests/test_cas.py | Adds four test functions covering different scenarios for both new methods |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| """ | ||
| self.add_all(annotations) | ||
|
|
||
| def cut_sofa_string_to_range(self, sofa_begin:int, sofa_end:int, overlap=True): |
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing space after colon in type annotations. Should be sofa_begin: int and sofa_end: int to comply with PEP 8 style guidelines.
| def cut_sofa_string_to_range(self, sofa_begin:int, sofa_end:int, overlap=True): | |
| def cut_sofa_string_to_range(self, sofa_begin: int, sofa_end: int, overlap=True): |
| self.remove(annotation) | ||
|
|
||
|
|
||
| def remove_in_range(self, cut_begin:int, cut_end:int, type_: Union[Type, str]=None): |
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing spaces around type annotations. Should be cut_begin: int, cut_end: int, and type_: Union[Type, str] = None to comply with PEP 8 style guidelines.
| def remove_in_range(self, cut_begin:int, cut_end:int, type_: Union[Type, str]=None): | |
| def remove_in_range(self, cut_begin: int, cut_end: int, type_: Union[Type, str] = None): |
| annotation.end = annotation.end - sofa_begin | ||
| elif overlap and sofa_begin < annotation.end <= sofa_end: | ||
| annotation.begin = 0 | ||
| annotation.end = sofa_end - sofa_begin |
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect calculation for overlapping annotation end position. When an annotation starts before sofa_begin but ends within the range, the end should be annotation.end - sofa_begin, not sofa_end - sofa_begin. The current code incorrectly extends the annotation to the full cutout range.
| annotation.end = sofa_end - sofa_begin | |
| annotation.end = annotation.end - sofa_begin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this suggestion.
| annotations = self.select_all() if type_ is None else self.select(type_) | ||
| if 0 <= cut_begin < cut_end <= len(self.sofa_string): | ||
| for annotation in annotations: | ||
| if cut_begin <= annotation.begin < annotation.end <= cut_end: | ||
| self.remove(annotation) |
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterating and removing from collection during iteration can cause issues. When using self.select_all(), the result is a list, so this works. However, converting to a list explicitly (e.g., list(annotations)) before iteration would be safer and more explicit to prevent potential issues if the return type changes in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I believe the code should fail if the return types of cas.select_all and cas.select change, because it signifies a major change in functionality. But I am on the fence on this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the Copilot. Just because select_all() returns a List, it does not mean that the List may not be a live list. Removing stuff from a live list would be problematic. When removing stuff from a container such as the CAS, it is a good practice to make a snapshot of the element list over which to iterate before modifying the container.
| for annotation in self.select_all(): | ||
| if sofa_begin <= annotation.begin and annotation.end <= sofa_end: | ||
| annotation.begin = annotation.begin - sofa_begin | ||
| annotation.end = annotation.end - sofa_begin | ||
| elif overlap and sofa_begin < annotation.end <= sofa_end: | ||
| annotation.begin = 0 | ||
| annotation.end = sofa_end - sofa_begin | ||
| elif overlap and sofa_begin <= annotation.begin < sofa_end: | ||
| annotation.begin = annotation.begin - sofa_begin | ||
| annotation.end = len(self.sofa_string) | ||
| elif overlap and annotation.begin <= sofa_begin and sofa_end <= annotation.end: | ||
| annotation.begin = 0 | ||
| annotation.end = len(self.sofa_string) | ||
| else: | ||
| self.remove(annotation) |
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterating over select_all() and removing items during iteration can cause issues. The list returned by select_all() is modified during iteration when self.remove(annotation) is called. Consider iterating over a copy: for annotation in list(self.select_all()):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
select_all returns an independent list of annotations. It should be safe to directly remove an annotation from the CAS while iterating over the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interface only promises to return a List, it does not promise that this list is an independent object. See comment above.
|
|
||
| assert len(cas.select_all()) == len(expected_leftover_annotations) | ||
|
|
||
| print(cas.sofa_string) |
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug print statement left in test code. This should be removed or replaced with an assertion that verifies the expected sofa_string value.
| print(cas.sofa_string) |
|
|
||
| expected_leftover_annotations = [annotation for annotation in cas.select_all() | ||
| if (begin <= annotation.begin < end) | ||
| or (annotation.begin < begin < end <= annotation.end)] |
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expected leftover annotations logic doesn't match the implementation. The condition doesn't account for all overlap cases. For example, annotations where annotation.end overlaps but annotation.begin < begin and annotation.end > end should be included but aren't. The condition should be: if (begin <= annotation.begin < end) or (begin < annotation.end <= end) or (annotation.begin < begin and annotation.end > end)
| or (annotation.begin < begin < end <= annotation.end)] | |
| or (begin < annotation.end <= end) | |
| or (annotation.begin < begin and annotation.end > end)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this suggestion.
|
I agree with most suggestions, with the exception of the issue iteration/remove (384-398). But I am, again, unsure how to proceed from here on :) |
|
Thanks again :) Same as with the other PR:
This change is a tricky one because you can have annotations that are transitively referenced through a feature of an indexed feature structure, not that themselves are not on the index. That means if you use only select_all(), you would not find them. You would have to use |
This PR adds the following convenience functions to the cas object:
The methods can be called on any cas object directly.
Tests for each function was added to test_cas.py