Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rollback of changes with errors during the VM assign #7061

Open
wants to merge 11 commits into
base: 4.20
Choose a base branch
from

Conversation

stephankruggg
Copy link
Contributor

Description

When assigning a VM to another account (assignVirtualMachine API), if a network error happens during the process, no rollback is executed; thus, leaving the DB in an inconsistent state.

This PR fixes this by processing all steps to change the ownership of a VM inside the transaction. It also refactors code related to this procedure.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

How Has This Been Tested?

In a local lab, I created a VM and an account of user type. I then attempted to assign the admin's VM to the user by using a network not accessible to the user.

Before applying the changes, an error was raised, but the VM was listed as belonging to the user.

After applying the changes, an error was raised, and the VM was listed as belonging to the admin - no changes occurred.

Copy link
Member

@weizhouapache weizhouapache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lots of refactoring.
I am not sure if the code is good or not. Maybe others can check.

@stephankruggg
can you point out the key point of your change ?

server/src/main/java/com/cloud/vm/UserVmManagerImpl.java Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Jan 5, 2023

Codecov Report

Attention: Patch coverage is 86.41304% with 50 lines in your changes missing coverage. Please review.

Project coverage is 16.13%. Comparing base (ab76d3c) to head (7d882a9).

Files with missing lines Patch % Lines
.../src/main/java/com/cloud/vm/UserVmManagerImpl.java 87.60% 38 Missing and 7 partials ⚠️
...e/cloudstack/api/command/admin/vm/AssignVMCmd.java 0.00% 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               4.20    #7061      +/-   ##
============================================
+ Coverage     16.07%   16.13%   +0.06%     
- Complexity    12883    12965      +82     
============================================
  Files          5639     5639              
  Lines        494193   494241      +48     
  Branches      59925    59896      -29     
============================================
+ Hits          79419    79753     +334     
+ Misses       405944   405667     -277     
+ Partials       8830     8821       -9     
Flag Coverage Δ
uitests 4.02% <ø> (ø)
unittests 16.98% <86.41%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@stephankruggg
Copy link
Contributor Author

lots of refactoring. I am not sure if the code is good or not. Maybe others can check.

@stephankruggg can you point out the key point of your change ?

@weizhouapache, when assigning a VM to another account (assignVirtualMachine API), if a network error happens during the process, no rollback is executed; thus, leaving the DB in an inconsistent state. This PR fixes this by processing all steps to change the ownership of a VM inside the transaction.

The refactoring is intended to make the code more testable, readable, and easier to maintain.

@weizhouapache
Copy link
Member

lots of refactoring. I am not sure if the code is good or not. Maybe others can check.
@stephankruggg can you point out the key point of your change ?

@weizhouapache, when assigning a VM to another account (assignVirtualMachine API), if a network error happens during the process, no rollback is executed; thus, leaving the DB in an inconsistent state. This PR fixes this by processing all steps to change the ownership of a VM inside the transaction.

The refactoring is intended to make the code more testable, readable, and easier to maintain.

@stephankruggg
thanks.

it seems the key block is

        Transaction.execute(new TransactionCallbackNoReturn() {
            @Override
            public void doInTransactionWithoutResult(TransactionStatus status) {
                executeStepsToChangeOwnershipOfVm(cmd, caller, oldAccount, newAccount, vm, offering, volumes, template, domainId);
            }
        });

@rohityadavcloud
Copy link
Member

@blueorangutan package

@blueorangutan
Copy link

@rohityadavcloud a Jenkins job has been kicked to build packages. It will be bundled with SystemVM template(s). I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 5302

Copy link
Member

@weizhouapache weizhouapache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall code lgtm.
left 2 minor comments.

@stephankruggg
can you ask your team to test this and share the testing result as a comment ?

@github-actions
Copy link

github-actions bot commented Feb 8, 2023

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Feb 9, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 5 Code Smells

75.3% 75.3% Coverage
0.0% 0.0% Duplication

@github-actions
Copy link

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@DaanHoogland DaanHoogland added this to the 4.19.0.0 milestone Jun 22, 2023
@DaanHoogland
Copy link
Contributor

@stephankruggg can you look at the conflicts? (cc @BryanMLima )

@DaanHoogland
Copy link
Contributor

@stephankruggg is anybody looking at this (or should we postpone)?

@stephankruggg
Copy link
Contributor Author

@stephankruggg is anybody looking at this (or should we postpone)?

@GaOrtiga can you take at this one, please?

@stephankruggg stephankruggg removed their assignment Nov 15, 2023
@GaOrtiga
Copy link
Contributor

Hi guys, sorry for the delay.

@stephankruggg is anybody looking at this (or should we postpone)?

@GaOrtiga can you take at this one, please?

Yes.

@stephankruggg is anybody looking at this (or should we postpone)?

I have fixed the conflicts and will run some tests in the next few days before pushing the changes.

@kiranchavala
Copy link
Contributor

@blueorangutan package

Copy link
Collaborator

@winterhazel winterhazel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good. I compared the original code and the refactored code line by line, and verified that their behavior is overall the same.

Also, I did some basic VM assign tests and verified that, before the patch, if an error occurred when updating the VM's network, the VM would be left in an inconsistent state (to result in an error, I passed the ID of a network that does not exist to the networkids parameter). After the patch, a rollback is performed.

@stephankruggg @BryanMLima @DaanHoogland could someone change this PR to ready?

@winterhazel
Copy link
Collaborator

@blueorangutan package

@blueorangutan
Copy link

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@BryanMLima BryanMLima marked this pull request as ready for review November 29, 2024 20:08
@blueorangutan
Copy link

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 11663

@winterhazel
Copy link
Collaborator

@blueorangutan package

@blueorangutan
Copy link

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11679

@DaanHoogland
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan
Copy link

[SF] Trillian test result (tid-11839)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 54892 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr7061-t11839-kvm-ol8.zip
Smoke tests completed. 141 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

Copy link
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clgtm, though a lot of changes there are unit tests to support those. I don't think there is an integration test supporting this scenario, but as it is a bit niche I think we can risk merging. Do you agree @weizhouapache ?

Copy link

github-actions bot commented Jan 6, 2025

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@winterhazel
Copy link
Collaborator

@blueorangutan package

@blueorangutan
Copy link

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 12014

@winterhazel
Copy link
Collaborator

@blueorangutan package

@blueorangutan
Copy link

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12016

@DaanHoogland
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan
Copy link

[SF] Trillian test result (tid-12067)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 53801 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr7061-t12067-kvm-ol8.zip
Smoke tests completed. 140 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_11_isolated_network_with_dynamic_routed_mode Error 2.27 test_ipv4_routing.py
test_12_vpc_and_tier_with_dynamic_routed_mode Error 3.42 test_ipv4_routing.py
test_12_vpc_and_tier_with_dynamic_routed_mode Error 3.42 test_ipv4_routing.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.