Skip to content

Conversation

@dmnc-grdnr
Copy link

This PR adds the deep copy functionality to the CAS object.

Tests for the function was added to test_cas.py

I have verified the functionality on different CAS files, but due to the complexity of the data structure, there might be something I have overlooked.

Therefore feedback is highly appreciated!

@reckart reckart changed the title feature/330-enable-deep-copy #330 - Enable deep copy Nov 10, 2025
@reckart reckart requested a review from Copilot November 10, 2025 19:36
@reckart reckart added this to the 0.11.0 milestone Nov 10, 2025
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds deep copy functionality to the CAS (Common Analysis Structure) object, allowing users to create independent copies of CAS instances with or without duplicating the associated typesystem.

Key changes:

  • Added a deep_copy() method to the Cas class with optional typesystem copying
  • Implemented comprehensive test coverage for both copying modes (with and without typesystem)

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
cassis/cas.py Implements the deep_copy() method to create deep copies of CAS objects, handling sofas, views, feature structures, and their references
tests/test_cas.py Adds two test cases verifying deep copy functionality with and without typesystem copying

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

cassis/cas.py Outdated
for item in fs[feature.name]:
if hasattr(item, 'xmiID') and item.xmiID is not None:
referenced_list.append(item.xmiID)
if len(referenced_list) > 0:
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing setdefault call for referenced_lists dictionary. This will raise a KeyError when trying to append to a non-existent key. Should be referenced_lists.setdefault(feature.name, []) before the append, similar to how it's done for references on line 896 and referenced_arrays on line 910.

Suggested change
if len(referenced_list) > 0:
if len(referenced_list) > 0:
referenced_lists.setdefault(feature.name, [])

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot is correct. However, lines 877, 903 and 904 are actually dead code. These lines should be removed instead.

cassis/cas.py Outdated
try:
all_copied_fs[current_ID][feature] = all_copied_fs[reference_ID]
except KeyError as e:
print("Reference feature", current_ID, "not found.", feature, e)
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using print for error reporting is inappropriate. This should use proper logging (e.g., warnings.warn() or logger.error()) or re-raise the exception. Silent failures via print statements can hide critical bugs in production code.

Suggested change
print("Reference feature", current_ID, "not found.", feature, e)
warnings.warn(f"Reference feature {current_ID} not found for feature '{feature}': {e}")

Copilot uses AI. Check for mistakes.
cassis/cas.py Outdated
Comment on lines 927 to 928
ts = cas_copy.typesystem
array_copy = ts.get_type("FSArray")()
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable shadowing: ts is already defined in the outer scope (line 841) and used throughout the function. Reusing the same variable name here creates confusion. Consider using a more descriptive name like target_ts or removing this line entirely since cas_copy.typesystem is already the same as ts.

Suggested change
ts = cas_copy.typesystem
array_copy = ts.get_type("FSArray")()
array_copy = cas_copy.typesystem.get_type("FSArray")()

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot is correct. However, line 927 is an unnecessary assignment and should be removed instead.

cassis/cas.py Outdated
for feature in t.all_features:
if ts.is_primitive(feature.rangeType) or ts.is_primitive_collection(feature.rangeType):
fs_copy[feature.name] = fs.get(feature.name)
elif feature.name not in ["FSArray", "sofa"]:
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The condition feature.name not in [\"FSArray\", \"sofa\"] is unclear. Consider using named constants or adding a comment explaining why these specific features are excluded from reference tracking.

Suggested change
elif feature.name not in ["FSArray", "sofa"]:
# Exclude FSArray and sofa features from reference tracking as they are handled specially
elif feature.name not in [TYPE_NAME_FS_ARRAY, TYPE_NAME_SOFA]:

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure if introducing these constants is helpful, but the comment may very well be.

cassis/cas.py Outdated
Comment on lines 838 to 839
Returns a deep copy of the current Cas
:param copy_typesystem: whether to copy the original typesystem or not
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring is incomplete and doesn't follow the existing style in the codebase. It should use the standard format with Args and Returns sections, similar to other methods like to_json(). Also consider adding a note about what 'deep copy' means in this context (e.g., all feature structures, views, and sofas are copied).

Suggested change
Returns a deep copy of the current Cas
:param copy_typesystem: whether to copy the original typesystem or not
Create and return a deep copy of this CAS object.
All feature structures, views, and sofas are copied. If `copy_typesystem` is True, the typesystem is also deep-copied;
otherwise, the original typesystem is shared between the original and the copy.
Args:
copy_typesystem (bool): Whether to copy the original typesystem or not. If True, the typesystem is deep-copied.
Returns:
Cas: A deep copy of this CAS object.

Copilot uses AI. Check for mistakes.
cassis/cas.py Outdated
Comment on lines 906 to 907
referenced_list = []
for item in fs[feature.name].elements:
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The variable name referenced_list is reused for both list-type features (line 899) and array-type features (line 906). While they're in different scopes, using distinct names like referenced_array_elements for the array case would improve code clarity.

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with this argument. However it does not matter, because lines 898-902 are actually dead code (connected to unused variable referenced_lists see comment above). These lines should be removed instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then please remove them in a follow-up commit.

@reckart
Copy link
Member

reckart commented Nov 10, 2025

@dmnc-grdnr Could you please consider the Copilot review. If you believe and of its review remarks are wrong, please comment them.

@reckart
Copy link
Member

reckart commented Nov 11, 2025

Oh, and thank you again for another PR :)

@dmnc-grdnr
Copy link
Author

I reviewed the Copilot suggestions agreed with most, but found that a few lines are actually obsolete and should be removed instead of modified. This is my first pull request and I am unsure how to make the changes without breaking things. Could you please point me to what to do next?

Copy link
Member

@reckart reckart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much. To continue, it would be great if you could simply address the comments by updating the code and then committing the changes into this branch. Once the changes are in, the review will be repeated. Please do not use the "amend" function of git, just add another commit. Avoid force pushes, then you can't really break anything. Thanks!

cassis/cas.py Outdated
if hasattr(fs[feature.name], 'xmiID') and fs[feature.name].xmiID is not None:
references.setdefault(feature.name, [])
references[feature.name].append((fs.xmiID, fs[feature.name].xmiID))
elif ts.is_list(feature.rangeType):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an "if" and an "elif" case here but no "else". Maybe the "else" can never happen in practice, but if it happens, we may be missing to copy some data. So if the "else" case can never, happen, then it should be implemented and thrown an exception at least so that if it happens, we know something is going very wrong.

cassis/cas.py Outdated
Comment on lines 906 to 907
referenced_list = []
for item in fs[feature.name].elements:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then please remove them in a follow-up commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add randomized tests for the deep copy. We have test_multi_type_random_serialization_deserialization and test_multi_feature_random_serialization_deserialization which create randomized CASes for (de)serialization and compare them afterwards. These can be used as templates for randomized deep_copy tests.

@dmnc-grdnr
Copy link
Author

I was finally able to iron out all of the issues that appeared after including the random CAS tests. They were extremely helpful! I think, I made all the changes that you and Copilot required. Could you please have another look? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants