Local network daemons seem to be connected but are not communicating #751

Open
KyloRen1 opened this issue Jan 11, 2025 · 9 comments
Labels
bug Something isn't working cli CLI coordinator daemon python Python API

Comments

@KyloRen1

Describe the bug
Local network daemons seem to be connected but are not communicating, except for the stop command. I am running Dora CLI version 0.3.8 on both machines: one is armv7l, and the other is x86_64. I followed the example in release v0.3.5 as provided in my previous issue.

To Reproduce
Steps to reproduce the behavior:

  1. On machine A (armv7l) I run:
dora coordinator
dora daemon --machine-id A
  2. On machine B (x86_64) I run:
dora daemon --machine-id B --coordinator-addr 192.168.1.86 --local-listen-port 53290
dora start dataflow_small.yml --coordinator-addr 192.168.1.86 --coordinator-port 6012
  3. My dataflow_small.yml is supposed to read frames on machine A (armv7l) and transfer them to machine B. It looks like this:
nodes:
  - id: webcam
    _unstable_deploy:
      machine: A
    path: /ABSOLUTE_PATH/dora-main/node-hub/opencv-video-capture/opencv_video_capture/main.py
    inputs:
      tick: dora/timer/millis/16 # try to capture at 60fps
    outputs:
      - image # the captured image
    env:
      PATH: 0 # optional, default is 0
      IMAGE_WIDTH: 640 # optional, default is video capture width
      IMAGE_HEIGHT: 480 # optional, default is video capture height

  - id: plot
    _unstable_deploy:
      machine: B
    path: /ABSOLUTE_PATH/dora-main/node-hub/opencv-plot/opencv_plot/main.py
    inputs:
      image: webcam/image # image: Arrow array of size 1 containing the base image
    env:
      PLOT_WIDTH: 640 # optional, default is image input width
      PLOT_HEIGHT: 480 # optional, default is image input height
  4. After I run all these commands, there is no error, and the logs show nothing related to reading from the camera or plotting. Here are the dora-daemon-B logs:
2025-01-11T14:56:46.701581Z  INFO dora_daemon::coordinator: Connected to dora-coordinator at 192.168.1.86:53290
2025-01-11T14:56:50.217336Z  INFO run_inner{self.machine_id=B}: dora_daemon::spawn: spawning: "/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora_computer/venv/bin/python" /Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora-main/node-hub/opencv-plot/opencv_plot/main.py
2025-01-11T14:56:52.263876Z  INFO run_inner{self.machine_id=B}: dora_daemon: node `plot` is ready
2025-01-11T14:56:52.263944Z  INFO run_inner{self.machine_id=B}: dora_daemon::pending: all local nodes are ready (exit before subscribe: []), waiting for remote nodes

I verified that the camera is working. Additionally, I tried running the armv7l Dora command both within and outside a Python virtual environment with OpenCV installed, but the behavior remained the same.
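For readers following along, it may help to see which link in this dataflow has to cross the network. The sketch below restates the structure of dataflow_small.yml as a plain Python dict (the env sections are omitted) and lists every input whose producer is deployed on a different machine. The helper name `cross_machine_edges` is hypothetical and not part of dora.

```python
# Structure copied from dataflow_small.yml in this issue (env sections omitted).
DATAFLOW = {
    "nodes": [
        {
            "id": "webcam",
            "_unstable_deploy": {"machine": "A"},
            "inputs": {"tick": "dora/timer/millis/16"},
            "outputs": ["image"],
        },
        {
            "id": "plot",
            "_unstable_deploy": {"machine": "B"},
            "inputs": {"image": "webcam/image"},
        },
    ]
}


def cross_machine_edges(dataflow):
    """Hypothetical helper: list inputs whose producer runs on another machine.

    Each entry is (source_node, output, source_machine, target_node, target_machine).
    """
    machine_of = {
        node["id"]: node.get("_unstable_deploy", {}).get("machine")
        for node in dataflow["nodes"]
    }
    edges = []
    for node in dataflow["nodes"]:
        for mapping in node.get("inputs", {}).values():
            source, _, output = mapping.partition("/")
            # Sources such as dora/timer/... are not user-defined nodes; skip them.
            if source not in machine_of:
                continue
            if machine_of[source] != machine_of[node["id"]]:
                edges.append(
                    (source, output, machine_of[source], node["id"], machine_of[node["id"]])
                )
    return edges


for edge in cross_machine_edges(DATAFLOW):
    print(edge)
```

In this dataflow the only such edge is `webcam/image`, going from machine A to machine B, so that is the one path that depends on the two daemons talking to each other.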

Environments (please complete the following information):

  • System info: armv7l and x86_64
  • Dora version: 0.3.8
@github-actions github-actions bot added bug Something isn't working cli CLI coordinator daemon python Python API labels Jan 11, 2025
@haixuanTao
Collaborator

haixuanTao commented Jan 11, 2025

Yes, there is a problem with remote nodes: you need to pass the coordinator address to the daemon even if it runs on the same machine as the coordinator, as long as it needs to connect to a remote daemon:

  1. On machine A (armv7l), run:
dora coordinator
dora daemon --machine-id A --coordinator-addr 192.168.1.86
  2. On machine B (x86_64), run:
dora daemon --machine-id B --coordinator-addr 192.168.1.86 --local-listen-port 53290
dora start dataflow_small.yml --coordinator-addr 192.168.1.86 --coordinator-port 6012
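Before digging into dora itself, it can be worth confirming that the coordinator address is reachable over plain TCP from each machine. The following is a minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper, and it only shows that something accepts connections on the port, not that it is the coordinator.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

With the values from this thread, one would call `can_connect("192.168.1.86", 53290)` and `can_connect("192.168.1.86", 6012)` on machine B while the coordinator is running on machine A.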

@KyloRen1
Author

Thank you for the reply. It seems that machines A and B are now connected, but nothing is happening yet.

Here are the Dora coordinator logs:

2025-01-12T12:09:46.631559Z  INFO spawn_dataflow{dataflow=Descriptor { communication: CommunicationConfig { local: Tcp, remote: Tcp }, deploy: Deploy { machine: None }, nodes: [Node { id: NodeId("webcam"), name: None, description: None, env: Some({"IMAGE_HEIGHT": Integer(480), "IMAGE_WIDTH": Integer(640), "PATH": Integer(0)}), deploy: Deploy { machine: Some("A") }, operators: None, custom: None, operator: None, path: Some("/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora-main/node-hub/opencv-video-capture/opencv_video_capture/main.py"), args: None, build: None, send_stdout_as: None, inputs: {DataId("tick"): Input { mapping: Timer { interval: 16ms }, queue_size: None }}, outputs: {DataId("image")} }, Node { id: NodeId("plot"), name: None, description: None, env: Some({"PLOT_HEIGHT": Integer(480), "PLOT_WIDTH": Integer(640)}), deploy: Deploy { machine: Some("B") }, operators: None, custom: None, operator: None, path: Some("/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora-main/node-hub/opencv-plot/opencv_plot/main.py"), args: None, build: None, send_stdout_as: None, inputs: {DataId("image"): Input { mapping: User(UserInputMapping { source: NodeId("webcam"), output: DataId("image") }), queue_size: None }}, outputs: {} }] } working_dir="/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora_computer"}: dora_core::descriptor::validate: skipping path check for remote node `webcam`
2025-01-12T12:09:46.631816Z  INFO spawn_dataflow{dataflow=Descriptor { communication: CommunicationConfig { local: Tcp, remote: Tcp }, deploy: Deploy { machine: None }, nodes: [Node { id: NodeId("webcam"), name: None, description: None, env: Some({"IMAGE_HEIGHT": Integer(480), "IMAGE_WIDTH": Integer(640), "PATH": Integer(0)}), deploy: Deploy { machine: Some("A") }, operators: None, custom: None, operator: None, path: Some("/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora-main/node-hub/opencv-video-capture/opencv_video_capture/main.py"), args: None, build: None, send_stdout_as: None, inputs: {DataId("tick"): Input { mapping: Timer { interval: 16ms }, queue_size: None }}, outputs: {DataId("image")} }, Node { id: NodeId("plot"), name: None, description: None, env: Some({"PLOT_HEIGHT": Integer(480), "PLOT_WIDTH": Integer(640)}), deploy: Deploy { machine: Some("B") }, operators: None, custom: None, operator: None, path: Some("/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora-main/node-hub/opencv-plot/opencv_plot/main.py"), args: None, build: None, send_stdout_as: None, inputs: {DataId("image"): Input { mapping: User(UserInputMapping { source: NodeId("webcam"), output: DataId("image") }), queue_size: None }}, outputs: {} }] } working_dir="/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora_computer"}: dora_core::descriptor::validate: skipping path check for remote node `plot`
2025-01-12T12:09:46.673544Z  INFO spawn_dataflow{dataflow=Descriptor { communication: CommunicationConfig { local: Tcp, remote: Tcp }, deploy: Deploy { machine: None }, nodes: [Node { id: NodeId("webcam"), name: None, description: None, env: Some({"IMAGE_HEIGHT": Integer(480), "IMAGE_WIDTH": Integer(640), "PATH": Integer(0)}), deploy: Deploy { machine: Some("A") }, operators: None, custom: None, operator: None, path: Some("/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora-main/node-hub/opencv-video-capture/opencv_video_capture/main.py"), args: None, build: None, send_stdout_as: None, inputs: {DataId("tick"): Input { mapping: Timer { interval: 16ms }, queue_size: None }}, outputs: {DataId("image")} }, Node { id: NodeId("plot"), name: None, description: None, env: Some({"PLOT_HEIGHT": Integer(480), "PLOT_WIDTH": Integer(640)}), deploy: Deploy { machine: Some("B") }, operators: None, custom: None, operator: None, path: Some("/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora-main/node-hub/opencv-plot/opencv_plot/main.py"), args: None, build: None, send_stdout_as: None, inputs: {DataId("image"): Input { mapping: User(UserInputMapping { source: NodeId("webcam"), output: DataId("image") }), queue_size: None }}, outputs: {} }] } working_dir="/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora_computer"}: dora_coordinator::run: successfully spawned dataflow `01945a6b-9187-78a9-8c2e-6ec5d9d3a892`
2025-01-12T12:12:27.945535Z  INFO dora_coordinator: received ctrlc signal
2025-01-12T12:12:27.945983Z  INFO dora_coordinator: Destroying coordinator after receiving Ctrl-C signal
2025-01-12T12:12:27.987770Z  INFO dora_coordinator: successfully send stop dataflow `01945a6b-9187-78a9-8c2e-6ec5d9d3a892` to all daemons
2025-01-12T12:12:27.989049Z  INFO dora_coordinator: successfully destroyed daemon `A`
2025-01-12T12:12:28.000062Z  INFO dora_coordinator: successfully destroyed daemon `B`
2025-01-12T12:12:28.006910Z  INFO dora_coordinator: stopped

And machine B logs

2025-01-12T12:09:43.905836Z  INFO dora_daemon::coordinator: Connected to dora-coordinator at 192.168.1.86:53290
2025-01-12T12:09:46.629597Z  INFO run_inner{self.machine_id=B}: dora_daemon::spawn: spawning: "/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora_computer/venv/bin/python" /Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora-main/node-hub/opencv-plot/opencv_plot/main.py
2025-01-12T12:09:48.200304Z  INFO run_inner{self.machine_id=B}: dora_daemon: node `plot` is ready
2025-01-12T12:09:48.200819Z  INFO run_inner{self.machine_id=B}: dora_daemon::pending: all local nodes are ready (exit before subscribe: []), waiting for remote nodes
2025-01-12T12:12:27.974299Z  INFO run_inner{self.machine_id=B}: dora_daemon: received destroy command -> exiting
2025-01-12T12:12:27.992008Z  WARN dora_daemon::node_communication: failed to receive reply from daemon

Location:
    /Users/runner/work/dora/dora/binaries/daemon/src/node_communication/mod.rs:508:30
2025-01-12T12:12:28.023829Z  WARN dora_daemon: process 67780 was killed on drop because it was still running

@haixuanTao
Collaborator

OK, sorry. The right commands should be:

  1. On machine A (armv7l), run:
dora coordinator
dora daemon --machine-id A --coordinator-addr 192.168.1.86 --inter-daemon-addr 0.0.0.0:20001
  2. On machine B (x86_64), run:
dora daemon --machine-id B --coordinator-addr 192.168.1.86 --local-listen-port 53290 --inter-daemon-addr 0.0.0.0:20002
dora start dataflow_small.yml --coordinator-addr 192.168.1.86 --coordinator-port 6012
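Since the two daemon invocations differ only in a few values, a small script can make the per-machine differences explicit and reduce copy-paste mistakes. This is only a sketch: `daemon_command` is a hypothetical helper, and it uses only the flags and values quoted in this thread.

```python
import shlex


def daemon_command(machine_id, coordinator_addr, inter_daemon_addr, local_listen_port=None):
    """Build a `dora daemon` argument list from the flags quoted in this thread."""
    args = [
        "dora", "daemon",
        "--machine-id", machine_id,
        "--coordinator-addr", coordinator_addr,
    ]
    if local_listen_port is not None:
        args += ["--local-listen-port", str(local_listen_port)]
    args += ["--inter-daemon-addr", inter_daemon_addr]
    return args


# Values taken from the commands above.
COMMANDS = {
    "A": daemon_command("A", "192.168.1.86", "0.0.0.0:20001"),
    "B": daemon_command("B", "192.168.1.86", "0.0.0.0:20002", local_listen_port=53290),
}

for machine, args in COMMANDS.items():
    print(f"machine {machine}: {shlex.join(args)}")
```

The argument lists can be passed directly to `subprocess.run` on the respective machine, or printed and pasted into a terminal.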

@KyloRen1
Author

It is still not working, but it seems that machine A has now received the dataflow.yml info.

dora coordinator logs

2025-01-13T20:53:53.373982Z  INFO spawn_dataflow{dataflow=Descriptor { communication: CommunicationConfig { local: Tcp, remote: Tcp }, deploy: Deploy { machine: None }, nodes: [Node { id: NodeId("webcam"), name: None, description: None, env: Some({"IMAGE_HEIGHT": Integer(480), "IMAGE_WIDTH": Integer(640), "PATH": Integer(0)}), deploy: Deploy { machine: Some("A") }, operators: None, custom: None, operator: None, path: Some("/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora-main/node-hub/opencv-video-capture/opencv_video_capture/main.py"), args: None, build: None, send_stdout_as: None, inputs: {DataId("tick"): Input { mapping: Timer { interval: 16ms }, queue_size: None }}, outputs: {DataId("image")} }, Node { id: NodeId("plot"), name: None, description: None, env: Some({"PLOT_HEIGHT": Integer(480), "PLOT_WIDTH": Integer(640)}), deploy: Deploy { machine: Some("B") }, operators: None, custom: None, operator: None, path: Some("/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora-main/node-hub/opencv-plot/opencv_plot/main.py"), args: None, build: None, send_stdout_as: None, inputs: {DataId("image"): Input { mapping: User(UserInputMapping { source: NodeId("webcam"), output: DataId("image") }), queue_size: None }}, outputs: {} }] } working_dir="/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora_computer"}: dora_core::descriptor::validate: skipping path check for remote node `webcam`
2025-01-13T20:53:53.374253Z  INFO spawn_dataflow{dataflow=Descriptor { communication: CommunicationConfig { local: Tcp, remote: Tcp }, deploy: Deploy { machine: None }, nodes: [Node { id: NodeId("webcam"), name: None, description: None, env: Some({"IMAGE_HEIGHT": Integer(480), "IMAGE_WIDTH": Integer(640), "PATH": Integer(0)}), deploy: Deploy { machine: Some("A") }, operators: None, custom: None, operator: None, path: Some("/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora-main/node-hub/opencv-video-capture/opencv_video_capture/main.py"), args: None, build: None, send_stdout_as: None, inputs: {DataId("tick"): Input { mapping: Timer { interval: 16ms }, queue_size: None }}, outputs: {DataId("image")} }, Node { id: NodeId("plot"), name: None, description: None, env: Some({"PLOT_HEIGHT": Integer(480), "PLOT_WIDTH": Integer(640)}), deploy: Deploy { machine: Some("B") }, operators: None, custom: None, operator: None, path: Some("/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora-main/node-hub/opencv-plot/opencv_plot/main.py"), args: None, build: None, send_stdout_as: None, inputs: {DataId("image"): Input { mapping: User(UserInputMapping { source: NodeId("webcam"), output: DataId("image") }), queue_size: None }}, outputs: {} }] } working_dir="/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora_computer"}: dora_core::descriptor::validate: skipping path check for remote node `plot`
2025-01-13T20:53:53.414140Z  INFO spawn_dataflow{dataflow=Descriptor { communication: CommunicationConfig { local: Tcp, remote: Tcp }, deploy: Deploy { machine: None }, nodes: [Node { id: NodeId("webcam"), name: None, description: None, env: Some({"IMAGE_HEIGHT": Integer(480), "IMAGE_WIDTH": Integer(640), "PATH": Integer(0)}), deploy: Deploy { machine: Some("A") }, operators: None, custom: None, operator: None, path: Some("/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora-main/node-hub/opencv-video-capture/opencv_video_capture/main.py"), args: None, build: None, send_stdout_as: None, inputs: {DataId("tick"): Input { mapping: Timer { interval: 16ms }, queue_size: None }}, outputs: {DataId("image")} }, Node { id: NodeId("plot"), name: None, description: None, env: Some({"PLOT_HEIGHT": Integer(480), "PLOT_WIDTH": Integer(640)}), deploy: Deploy { machine: Some("B") }, operators: None, custom: None, operator: None, path: Some("/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora-main/node-hub/opencv-plot/opencv_plot/main.py"), args: None, build: None, send_stdout_as: None, inputs: {DataId("image"): Input { mapping: User(UserInputMapping { source: NodeId("webcam"), output: DataId("image") }), queue_size: None }}, outputs: {} }] } working_dir="/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora_computer"}: dora_coordinator::run: successfully spawned dataflow `01946171-c45e-7842-8aa9-027bba032e73`
2025-01-13T20:54:38.046292Z  INFO dora_coordinator: successfully send stop dataflow `01946171-c45e-7842-8aa9-027bba032e73` to all daemons
2025-01-13T20:54:46.336396Z  WARN dora_coordinator: failed to send heartbeat message to daemon at `A`

Caused by:
   0: failed to send heartbeat message to daemon
   1: Broken pipe (os error 32)

Location:
    /home/bogdan/.cargo/registry/src/index.crates.io-1cd66030c949c28d/dora-coordinator-0.3.8/src/lib.rs:702:10
2025-01-13T20:54:46.336586Z  WARN dora_coordinator: failed to send heartbeat message to daemon at `B`

Caused by:
   0: failed to send heartbeat message to daemon
   1: Broken pipe (os error 32)

Location:
    /home/bogdan/.cargo/registry/src/index.crates.io-1cd66030c949c28d/dora-coordinator-0.3.8/src/lib.rs:702:10
2025-01-13T20:54:46.336667Z ERROR dora_coordinator: Disconnecting daemons that failed watchdog: {"A", "B"}
2025-01-13T20:54:47.269406Z  INFO dora_coordinator: received ctrlc signal
2025-01-13T20:54:47.269722Z  INFO dora_coordinator: Destroying coordinator after receiving Ctrl-C signal
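A note on the `Broken pipe (os error 32)` warnings: on Linux, error 32 is EPIPE, which a writer gets when the other end of the connection has already closed. The timestamps in the daemon logs suggest that both daemons had already been stopped with Ctrl-C shortly before these warnings, so they may be a consequence of the shutdown and not the original cause. The sketch below reproduces the same error with a local socket pair; it assumes a POSIX system, and other platforms may report a different error.

```python
import errno
import socket


def send_after_peer_closed():
    """Write to a socket whose peer has already closed; return the errno, or None."""
    writer, peer = socket.socketpair()
    peer.close()  # the receiving side goes away first
    try:
        writer.send(b"heartbeat")
    except OSError as exc:
        return exc.errno
    finally:
        writer.close()
    return None


if send_after_peer_closed() == errno.EPIPE:
    print("peer closed first: the write failed with EPIPE (Broken pipe)")
```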

Machine A logs

2025-01-13T20:53:43.087092Z  INFO dora_daemon::coordinator: Connected to dora-coordinator at 192.168.1.86:53290
2025-01-13T20:54:44.495946Z  INFO run_inner{self.machine_id=A}: dora_daemon: received ctrlc signal -> stopping all dataflows
2025-01-13T20:54:46.041600Z  WARN run_inner{self.machine_id=A}: dora_daemon: received second ctrlc signal -> exit immediately

Machine B logs

2025-01-13T20:53:50.484363Z  INFO dora_daemon::coordinator: Connected to dora-coordinator at 192.168.1.86:53290
2025-01-13T20:53:53.350424Z  INFO run_inner{self.machine_id=B}: dora_daemon::spawn: spawning: "/Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora_computer/venv/bin/python" /Users/bogdanivanyuk/Desktop/RaspberryPi-car/dora-main/node-hub/opencv-plot/opencv_plot/main.py
2025-01-13T20:53:54.415500Z  INFO run_inner{self.machine_id=B}: dora_daemon: node `plot` is ready
2025-01-13T20:53:54.415558Z  INFO run_inner{self.machine_id=B}: dora_daemon::pending: all local nodes are ready (exit before subscribe: []), waiting for remote nodes
2025-01-13T20:54:40.379641Z  INFO run_inner{self.machine_id=B}: dora_daemon: received ctrlc signal -> stopping all dataflows
2025-01-13T20:54:40.621653Z  WARN run_inner{self.machine_id=B}: dora_daemon: received second ctrlc signal -> exit immediately
2025-01-13T20:54:40.622900Z  WARN dora_daemon::node_communication: failed to receive reply from daemon

Location:
    /Users/runner/work/dora/dora/binaries/daemon/src/node_communication/mod.rs:508:30
2025-01-13T20:54:40.646787Z  WARN dora_daemon: process 83441 was killed on drop because it was still running
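Comparing the two daemon logs above, machine B logs a spawn line and a ready line, while machine A logs neither, which suggests that the `webcam` node was never started on A. A quick way to check this on longer logs is to scan for those milestones. In the sketch below the log excerpts are copied from this thread, with the spawn command abridged, and the `milestones` helper and its patterns are hypothetical.

```python
import re

# Excerpts from the daemon logs in this thread (spawn command abridged).
LOG_A = """\
2025-01-13T20:53:43.087092Z  INFO dora_daemon::coordinator: Connected to dora-coordinator at 192.168.1.86:53290
2025-01-13T20:54:44.495946Z  INFO run_inner{self.machine_id=A}: dora_daemon: received ctrlc signal -> stopping all dataflows
"""

LOG_B = """\
2025-01-13T20:53:50.484363Z  INFO dora_daemon::coordinator: Connected to dora-coordinator at 192.168.1.86:53290
2025-01-13T20:53:53.350424Z  INFO run_inner{self.machine_id=B}: dora_daemon::spawn: spawning: (command abridged)
2025-01-13T20:53:54.415500Z  INFO run_inner{self.machine_id=B}: dora_daemon: node `plot` is ready
"""

# Patterns match message text that appears in the logs above.
MILESTONES = {
    "connected": re.compile(r"Connected to dora-coordinator"),
    "spawned": re.compile(r"dora_daemon::spawn: spawning"),
    "ready": re.compile(r"node `[^`]+` is ready"),
}


def milestones(log_text):
    """Report which startup milestones appear anywhere in a daemon log."""
    return {name: bool(pattern.search(log_text)) for name, pattern in MILESTONES.items()}


for machine, log_text in (("A", LOG_A), ("B", LOG_B)):
    print(machine, milestones(log_text))
```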

@haixuanTao
Collaborator

Okay, then I don't know.
@LyonRust @phil-opp, maybe you could help?

@LeonRust
Collaborator

LeonRust commented Jan 15, 2025

@KyloRen1 Hey, what kind of development board is the armv7l machine, and what OS version are you running? I can grab the same version, install it, and try out your steps to see if I can reproduce your results.

@LeonRust
Collaborator

From your log, I see you're using a Raspberry Pi. I've got a Raspberry Pi 5 now, and I'm gonna test it out on that first to see how it goes.

@KyloRen1
Author

@LyonRust thank you! Yes, I am using a Raspberry Pi 3 Model B with Raspbian GNU/Linux 11 (Bullseye). I am not certain, but I believe this Raspberry Pi is still running a 32-bit OS. I had quite a few complications installing dora there, but I eventually managed to do it.

@LeonRust
Collaborator

I don't have a Raspberry Pi 3B right now. I'm planning to buy one first, install a 32-bit system, and set up the same environment as yours to debug. This might take some time.
