Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move from crossbar/autobahn to gRPC #1469

Merged
merged 38 commits into from
Aug 16, 2024
Merged

Move from crossbar/autobahn to gRPC #1469

merged 38 commits into from
Aug 16, 2024

Conversation

Bastian-Krause
Copy link
Member

Description
As announced in #1417 and in the v24.0 release, labgrid will move from crossbar/autobahn to gRPC for communication between client/exporter and coordinator.

Crossbar/autobahn are unfortunately not very well maintained anymore. The crossbar component was moved to its own virtualenv to cope with the high number of dependencies leading to conflicts. Support for Python 3.13 is still not
available in a crossbar release on PyPI.

That's why labgrid moves to gRPC with this release. gRPC is a well maintained RPC framework with a lot of users. As a side effect, the message transfer is more performant and the import times are shorter.

Maintaining support for both crossbar/autobahn as well as gRPC in labgrid would be a lot of effort due to the different architectures of those frameworks. Therefore, a hard migration to gRPC is deemed the lesser issue.

Due to the migration, this PR introduces the following breaking changes:

  • The labgrid environment config option crossbar_url was renamed to coordinator_address. The environment variable LG_CROSSBAR was renamed to LG_COORDINATOR.
  • The labgrid environment config option crossbar_realm is now obsolete as well as the environment variable LG_CROSSBAR_REALM.
  • The coordinator is available as labgrid-coordinator (instead of crossbar start). No additional configuration file is required.
  • The systemd services in contrib/systemd/ were updated.

Our plan is to merge this before the next release v24.1.

Checklist

  • Documentation for the feature
  • Tests for the feature
  • PR has been tested (We've tested the grpc branch successfully during the last 7 weeks.)
  • Man pages have been regenerated

Closes #1059
Closes #1260
Fixes #1429

jluebbe and others added 26 commits August 13, 2024 14:26
Crossbar and autobahn are used to communicate between client and
coordinator as well as between exporter and coordinator. Both are
unfortunately not very well maintained anymore. The crossbar component
was moved to its own virtualenv for quite a while to cope with
conflicting dependencies. python3.13 support for crossbar is still not
available (at least not in a version releases on PyPi).

That's why labgrid will now move to gRPC. It's a well maintained RPC
framework. As a side effect, the message transfer is more performant and
the import times are shorter.

gRPC relies on the protocol buffers compiler (protoc) to generate code.
Use grpcio-tools 1.62.2 to do that. It's used in Yocto scarthgap and is
incompatible with generated code from newer grpcio-tools.

Co-developed-by: Rouven Czerwinski <[email protected]>
Co-developed-by: Bastian Krause <[email protected]>
Signed-off-by: Jan Luebbe <[email protected]>
The coordinator binds to port 20408 by default. The client tries to
connect to that port if no port is specified via --cordinator/-x,
LG_COORDINATOR or via coordinator_address option in the environment
config.

Only the exporter does not yet default to 20408. Change that by
appending the default port 20408 to the address given via
--coordinator/-c or LG_COORDINATOR if no port was specified.

Closes #1429

Signed-off-by: Bastian Krause <[email protected]>
If one of these fails, the others should not be skipped.

Signed-off-by: Jan Luebbe <[email protected]>
_update_acquired_places() is only called when resources are added by the
exporter or when they are removed after it has disconnected. So we
never want to call back to the exporter for removed resources.

Signed-off-by: Jan Luebbe <[email protected]>
In future commits, more code will call the newly introduced
_acquire_resource() method.

Signed-off-by: Jan Luebbe <[email protected]>
In cases when an exporter used by an acquired place disconnects, the
place still references those orphaned resources. If the exporter
reconnects, they are reaquired by the coordinator automatically.

Signed-off-by: Jan Luebbe <[email protected]>
If an exporter disconnects, corresponding resources that are acquired
become orphaned. If the exporter reconnects, the resources are not
locked. Requiring the user to unlock and lock the corresponding place
again is inconvenient and would change the previous behavior.

Prior to this commit, reacquiring did not work due to a logic error:
ExporterCommand.complete() und ExporterCommand.wait() are both called in
ExporterStream.request_task(). The blocking wait() prevents further
processing of exporter messages. That also means responses for
ExporterSetAcquiredRequest are not handled anymore. This leads to a
state where resources cannot be acquired/released by their place anymore.
Ultimatively, this leads to an inconsistent state requiring a
coordinator restart.

Refactor handling of these orphaned resources to solve this.

Signed-off-by: Jan Luebbe <[email protected]>
It's not really clear how keepalive_timeout_ms and the ping_timeout_ms
experiment should interact, so we set them both.

Signed-off-by: Jan Luebbe <[email protected]>
We only want to run one instance of the coordinator, so enabling
so_reuseport can lead to confusing situations where multiple coordinator
are running in parallel.

Signed-off-by: Jan Luebbe <[email protected]>
Instead of using asyncio.get_event_loop() in various places in the
Coordinator class, store it in an attribute called loop.

Signed-off-by: Bastian Krause <[email protected]>
These checks are already performed by get_idle_place().

Signed-off-by: Bastian Krause <[email protected]>
…n allow()/release_from()

Signed-off-by: Bastian Krause <[email protected]>
Calling asyncio.get_event_loop() with no current event loop is deprecated
since Python 3.10 and will be an error in some future Python release
[1].

Whenever we don't expect to run with an event loop, create one
explicitly. In coroutine and callbacks from asynchronous code, use the
more explicit asyncio.get_running_loop() to get the loop.

Note that this does not work in
labgrid.resources.ethernetport.EthernetPortManager: This code is usually
not called in coroutines and callbacks from asynchronous code, so
asyncio.get_running_loop() does not work there. So stick to
asyncio.get_event_loop() there and just expect to be called with a
running event loop (which is the non-deprecated use case for this
function).
Users that do not have an event loop running will see a justified
DeprecationWarning with Python >= 3.12 and an error in some future Python
version.

[1] https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.get_event_loop

Signed-off-by: Bastian Krause <[email protected]>
Calling asyncio.get_event_loop() with no current event loop is deprecated
since Python 3.10 and will be an error in some future Python release
[1]. Using it in labgrid.remote.client.start_session() causes errors in
IPython when using a RemotePlace:

  In [1]: from labgrid.resource.remote import RemotePlace
     ...: from labgrid import Target
     ...:
     ...: target = Target("example")
     ...: RemotePlace(target, name="example-place")
  [...]
  RuntimeError: There is no current event loop in thread 'MainThread'.

For labgrid.remote.client.start_session() there is no reliable way of
retrieving the thread's event loop without being called from an async
context (which we cannot assume here). Instead of using
asyncio.get_event_loop(), use a new helper function ensure_event_loop()
that returns the first available loop instance from:

- externally provided event loop
- stashed event loop
- OS thread's running event loop (when called from async code)
- new event loop

The returned loop is stashed for future calls. See also [2] for a
similar approach.

start_session() now accepts a new optional argument "loop" for providing
an external event loop.

[1] https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.get_event_loop
[2] jupyter/jupyter_core#387

Signed-off-by: Bastian Krause <[email protected]>
The Environment was always optional. Before users of ClientSession had
to pass an explicit None for this attribute. While we're changing the
ClientSession for gRPC anyway, let's make env really optional.

This again allows us to make start_session()'s extra argument optional,
too. It is used to pass extra arguments to the ClientSession, which
means it can be an empty dictionary now.

Signed-off-by: Bastian Krause <[email protected]>
During the move to gRPC, start_session()'s arguments changed. Since this
is one of the functions used from outside of labgrid, add typing hints
and force the kwargs to be passed with names. This should make users
aware of the changes, so their code can be adapted.

Signed-off-by: Bastian Krause <[email protected]>
Calling asyncio.get_event_loop() with no current event loop is deprecated
since Python 3.10 and will be an error in some future Python release
[1].

SNMPEthernetPort expects a running event loop. So create and set one.

[1] https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.get_event_loop

Signed-off-by: Bastian Krause <[email protected]>
This allows stopping and starting the exporter during a test. More
functionality will be moved into the class in future commits.

Signed-off-by: Bastian Krause <[email protected]>
…idComponent

Most functionality of this class is not exporter- or
coordinator-specific, so move the common parts into a common class.
Both Exporter (and a future) Coordinator class will inherhit it.

Signed-off-by: Bastian Krause <[email protected]>
The same was previously implemented for the exporter. Since we need the
same functionality also for the coordinator (along with
suspend_tree()/resume_tree() functionality to be moved in a future
commit), let's refactor it now.

Signed-off-by: Bastian Krause <[email protected]>
Previously only tests/test_client.py could use the suspend_tree() and
resume_tree() functionality. Since this is also useful for
exporter + coordinator tests, move it into the generic LabgridComponent
class.

Signed-off-by: Bastian Krause <[email protected]>
Previously the exporter had blocking issues when the coordinator was not
available. Add a test to prevent future regressions.

Signed-off-by: Bastian Krause <[email protected]>
A previous commit added a test for exporter startup with an unreachable
coordinator. Now also add a test simulating a dissappearing coordinator
during operation. The exporter should notice the coordinator
disappearing and should exit with exitcode 100. This way systemd can
try restarting the exporter regularly until the coordinator is available
again.

Signed-off-by: Bastian Krause <[email protected]>
Signed-off-by: Bastian Krause <[email protected]>
Signed-off-by: Bastian Krause <[email protected]>
Signed-off-by: Bastian Krause <[email protected]>
Signed-off-by: Rouven Czerwinski <[email protected]>
Signed-off-by: Bastian Krause <[email protected]>
Signed-off-by: Rouven Czerwinski <[email protected]>
pylint finds false positives for Python 3.8:

  labgrid/remote/coordinator.py:188:21: E1136: Value 'dict' is unsubscriptable (unsubscriptable-object)
  labgrid/remote/coordinator.py:194:24: E1136: Value 'dict' is unsubscriptable (unsubscriptable-object)
  labgrid/remote/coordinator.py:195:22: E1136: Value 'dict' is unsubscriptable (unsubscriptable-object)

Since Python 3.8 reaches EOL in October 2024 anyways, let's just skip
pylint for this version until then.

Signed-off-by: Bastian Krause <[email protected]>
The latest crossbar release was not compatible with python3.12. Now that
the crossbar dependency is gone, we can finally advertise python3.12
compatibility.

Closes #1260

Signed-off-by: Bastian Krause <[email protected]>
@Bastian-Krause
Copy link
Member Author

Rebased on latest master (due to conflicts with #1470) .

With the autobahn/crossbar dependencies removed, numpy is no longer an
indirect dependency of labgrid.

Signed-off-by: Bastian Krause <[email protected]>
Copy link
Member

@jluebbe jluebbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although there are probably still some issues in this PR, I think it's better to fix those with smaller PRs on master instead of iterating here. Merging now would also make testing easier and hopefully lead to more feedback.

Copy link
Contributor

@sjg20 sjg20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested on my lab setup

@Emantor
Copy link
Member

Emantor commented Aug 16, 2024

Let's go.

@Emantor Emantor merged commit 1ab0abd into master Aug 16, 2024
15 of 16 checks passed
hnez pushed a commit to hnez/meta-lxatac that referenced this pull request Oct 22, 2024
This marks a breaking change compared to the previous release version of
labgrid, as this new development version now uses grpc instead of crossbar
to communicate between the client, coordinator and exporters.

This means all of these components need to be updated to the grpc-based
versions at the same time.

See labgrid-project/labgrid#1469 for more
information about this change.

Signed-off-by: Leonard Göhrs <[email protected]>
hnez pushed a commit to hnez/meta-lxatac that referenced this pull request Oct 22, 2024
This marks a breaking change compared to the previous release version of
labgrid, as this new development version now uses grpc instead of crossbar
to communicate between the client, coordinator and exporters.

This means all of these components need to be updated to the grpc-based
versions at the same time.

See labgrid-project/labgrid#1469 for more
information about this change.

Signed-off-by: Leonard Göhrs <[email protected]>
hnez pushed a commit to hnez/meta-lxatac that referenced this pull request Oct 22, 2024
This marks a breaking change compared to the previous release version of
labgrid, as this new development version now uses grpc instead of crossbar
to communicate between the client, coordinator and exporters.

This means all of these components need to be updated to the grpc-based
versions at the same time.

See labgrid-project/labgrid#1469 for more
information about this change.

Signed-off-by: Leonard Göhrs <[email protected]>
a3f pushed a commit to a3f/meta-lxatac that referenced this pull request Nov 12, 2024
This marks a breaking change compared to the previous release version of
labgrid, as this new development version now uses grpc instead of crossbar
to communicate between the client, coordinator and exporters.

This means all of these components need to be updated to the grpc-based
versions at the same time.

See labgrid-project/labgrid#1469 for more
information about this change.

Signed-off-by: Leonard Göhrs <[email protected]>
a3f pushed a commit to a3f/meta-lxatac that referenced this pull request Nov 12, 2024
This marks a breaking change compared to the previous release version of
labgrid, as this new development version now uses grpc instead of crossbar
to communicate between the client, coordinator and exporters.

This means all of these components need to be updated to the grpc-based
versions at the same time.

See labgrid-project/labgrid#1469 for more
information about this change.

Signed-off-by: Leonard Göhrs <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Assume port 20408 for GRPC exporter
4 participants