I am getting a runtime error when trying to run on a MacBook Pro with just the CPU. It ends up being something like this:
/opt/anaconda3/envs/relighting_video_capture/lib/python3.7/site-packages/torch/utils/checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
[000] v n l2:0.0318 l1:0.1383 etap:0:0:0.00 eta:1:12:36.55 4 0
Process Process-4:8 l1:0.1383 etap:0:0:0.00 eta:1:12:36.55 4 1
Traceback (most recent call last):
File "/opt/anaconda3/envs/relighting_video_capture/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/opt/anaconda3/envs/relighting_video_capture/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "./train_and_evaluate_tasks.py", line 32, in run_trainer
t.train()
File "/Users/randallsmith/projects/git/relighting/trainers/gan_trainer.py", line 318, in train
self.task_cfg)
File "/Users/randallsmith/projects/git/relighting/trainers/gan_trainer.py", line 99, in perform_gan_step
loss_generator.backward()
File "/opt/anaconda3/envs/relighting_video_capture/lib/python3.7/site-packages/torch/tensor.py", line 195, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/opt/anaconda3/envs/relighting_video_capture/lib/python3.7/site-packages/torch/autograd/__init__.py", line 99, in backward
allow_unreachable=True) # allow_unreachable flag
File "/opt/anaconda3/envs/relighting_video_capture/lib/python3.7/site-packages/torch/autograd/function.py", line 77, in apply
return self._forward_cls.backward(self, *args)
File "/opt/anaconda3/envs/relighting_video_capture/lib/python3.7/site-packages/torch/autograd/function.py", line 189, in wrapper
outputs = fn(ctx, *args)
File "/opt/anaconda3/envs/relighting_video_capture/lib/python3.7/site-packages/inplace_abn/functions.py", line 112, in backward
y_act, dy_act, weight, bias, ctx.eps, ctx.activation, ctx.activation_param)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead. (view at /Users/distiller/project/conda/conda-bld/pytorch_1579022061893/work/aten/src/ATen/native/TensorShape.cpp:1175)
frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 135 (0x10d279267 in libc10.dylib)
frame #1: at::native::view(at::Tensor const&, c10::ArrayRef<long long>) + 827 (0x11862462b in libtorch.dylib)
frame #2: at::CPUType::(anonymous namespace)::view(at::Tensor const&, c10::ArrayRef<long long>) + 61 (0x1188457ed in libtorch.dylib)
frame #3: c10::detail::wrap_kernel_functor_unboxed_<c10::detail::WrapRuntimeKernelFunctor_<at::Tensor (*)(at::Tensor const&, c10::ArrayRef<long long>), at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, c10::ArrayRef<long long> > >, at::Tensor (at::Tensor const&, c10::ArrayRef<long long>)>::call(c10::OperatorKernel*, at::Tensor const&, c10::ArrayRef<long long>) + 24 (0x11883cef8 in libtorch.dylib)
frame #4: at::Tensor c10::KernelFunction::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<long long> >(at::Tensor const&, c10::ArrayRef<long long>) const + 63 (0x1182a607f in libtorch.dylib)
frame #5: std::__1::result_of<at::Tensor (ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::__1::hash<c10::TensorTypeId>, std::__1::equal_to<c10::TensorTypeId>, std::__1::allocator<std::__1::pair<c10::TensorTypeId, c10::KernelFunction> > > const&)>::type c10::LeftRight<ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::__1::hash<c10::TensorTypeId>, std::__1::equal_to<c10::TensorTypeId>, std::__1::allocator<std::__1::pair<c10::TensorTypeId, c10::KernelFunction> > > >::read<at::Tensor c10::Dispatcher::doCallUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<long long> >(c10::DispatchTable const&, c10::LeftRight<ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::__1::hash<c10::TensorTypeId>, std::__1::equal_to<c10::TensorTypeId>, std::__1::allocator<std::__1::pair<c10::TensorTypeId, c10::KernelFunction> > > > const&, at::Tensor const&, c10::ArrayRef<long long>) const::'lambda'(ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::__1::hash<c10::TensorTypeId>, std::__1::equal_to<c10::TensorTypeId>, std::__1::allocator<std::__1::pair<c10::TensorTypeId, c10::KernelFunction> > > const&)>(at::Tensor&&) const + 175 (0x1182a5fcf in libtorch.dylib)
frame #6: std::__1::result_of<at::Tensor (c10::DispatchTable const&)>::type c10::LeftRight<c10::DispatchTable>::read<at::Tensor c10::Dispatcher::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<long long> >(c10::OperatorHandle const&, at::Tensor const&, c10::ArrayRef<long long>) const::'lambda'(c10::DispatchTable const&)>(at::Tensor&&) const + 115 (0x1182a5eb3 in libtorch.dylib)
frame #7: at::Tensor::view(c10::ArrayRef<long long>) const + 341 (0x1182a2ad5 in libtorch.dylib)
frame #8: torch::autograd::VariableType::(anonymous namespace)::view(at::Tensor const&, c10::ArrayRef<long long>) + 1566 (0x11ad0fdae in libtorch.dylib)
frame #9: c10::detail::wrap_kernel_functor_unboxed_<c10::detail::WrapRuntimeKernelFunctor_<at::Tensor (*)(at::Tensor const&, c10::ArrayRef<long long>), at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, c10::ArrayRef<long long> > >, at::Tensor (at::Tensor const&, c10::ArrayRef<long long>)>::call(c10::OperatorKernel*, at::Tensor const&, c10::ArrayRef<long long>) + 24 (0x11883cef8 in libtorch.dylib)
frame #10: at::Tensor c10::KernelFunction::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<long long> >(at::Tensor const&, c10::ArrayRef<long long>) const + 63 (0x11f28decf in _backend.cpython-37m-darwin.so)
frame #11: std::__1::result_of<at::Tensor (ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::__1::hash<c10::TensorTypeId>, std::__1::equal_to<c10::TensorTypeId>, std::__1::allocator<std::__1::pair<c10::TensorTypeId, c10::KernelFunction> > > const&)>::type c10::LeftRight<ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::__1::hash<c10::TensorTypeId>, std::__1::equal_to<c10::TensorTypeId>, std::__1::allocator<std::__1::pair<c10::TensorTypeId, c10::KernelFunction> > > >::read<at::Tensor c10::Dispatcher::doCallUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<long long> >(c10::DispatchTable const&, c10::LeftRight<ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::__1::hash<c10::TensorTypeId>, std::__1::equal_to<c10::TensorTypeId>, std::__1::allocator<std::__1::pair<c10::TensorTypeId, c10::KernelFunction> > > > const&, at::Tensor const&, c10::ArrayRef<long long>) const::'lambda'(ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::__1::hash<c10::TensorTypeId>, std::__1::equal_to<c10::TensorTypeId>, std::__1::allocator<std::__1::pair<c10::TensorTypeId, c10::KernelFunction> > > const&)>(at::Tensor&&) const + 168 (0x11f28de18 in _backend.cpython-37m-darwin.so)
frame #12: std::__1::result_of<at::Tensor (c10::DispatchTable const&)>::type c10::LeftRight<c10::DispatchTable>::read<at::Tensor c10::Dispatcher::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<long long> >(c10::OperatorHandle const&, at::Tensor const&, c10::ArrayRef<long long>) const::'lambda'(c10::DispatchTable const&)>(at::Tensor&&) const + 118 (0x11f28dd06 in _backend.cpython-37m-darwin.so)
frame #13: at::Tensor::view(c10::ArrayRef<long long>) const + 97 (0x11f28dac1 in _backend.cpython-37m-darwin.so)
frame #14: normalize_shape(at::Tensor const&) + 145 (0x11f28da31 in _backend.cpython-37m-darwin.so)
frame #15: std::__1::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> backward_reduce_impl<float, (Activation)0>(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::optional<at::Tensor> const&, float, float) + 576 (0x11f28a1e0 in _backend.cpython-37m-darwin.so)
frame #16: backward_reduce_cpu(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::optional<at::Tensor> const&, float, Activation, float) + 194 (0x11f275d62 in _backend.cpython-37m-darwin.so)
frame #17: backward_reduce(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::optional<at::Tensor> const&, float, Activation, float) + 783 (0x11f28f73f in _backend.cpython-37m-darwin.so)
frame #18: void pybind11::cpp_function::initialize<std::__1::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (*&)(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::optional<at::Tensor> const&, float, Activation, float), std::__1::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::optional<at::Tensor> const&, float, Activation, float, pybind11::name, pybind11::scope, pybind11::sibling, char [32]>(std::__1::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (*&)(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::optional<at::Tensor> const&, float, Activation, float), std::__1::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (*)(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::optional<at::Tensor> const&, float, Activation, float), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [32])::'lambda'(pybind11::detail::function_call&)::operator()(pybind11::detail::function_call&) const + 109 (0x11f2ad70d in _backend.cpython-37m-darwin.so)
frame #19: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) + 3088 (0x11f29d960 in _backend.cpython-37m-darwin.so)
<omitting python frames>
frame #32: torch::autograd::PyNode::apply(std::__1::vector<at::Tensor, std::__1::allocator<at::Tensor> >&&) + 578 (0x1176fb5f2 in libtorch_python.dylib)
frame #33: torch::autograd::Node::operator()(std::__1::vector<at::Tensor, std::__1::allocator<at::Tensor> >&&) + 464 (0x11aed48e0 in libtorch.dylib)
frame #34: torch::autograd::Engine::evaluate_function(std::__1::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 1381 (0x11aecc495 in libtorch.dylib)
frame #35: torch::autograd::Engine::thread_main(std::__1::shared_ptr<torch::autograd::GraphTask> const&, bool) + 532 (0x11aecb364 in libtorch.dylib)
frame #36: torch::autograd::Engine::thread_init(int) + 152 (0x11aecb118 in libtorch.dylib)
frame #37: torch::autograd::python::PythonEngine::thread_init(int) + 44 (0x1176f5cec in libtorch_python.dylib)
frame #38: void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (torch::autograd::Engine::*)(int), torch::autograd::Engine*, int> >(void*) + 66 (0x11aed8a72 in libtorch.dylib)
frame #39: _pthread_start + 148 (0x7fff67ec7e65 in libsystem_pthread.dylib)
frame #40: thread_start + 15 (0x7fff67ec383b in libsystem_pthread.dylib)
I removed any calls to view in my own code and used contiguous() to try to force the tensors to be contiguous, but it still reaches backward_reduce() and crashes there. Is there any known issue with running this on a CPU-only machine? I'm doing this because I want to do some trial runs on my local machine before trying it out on the remote machine.
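For reference, the error class in the trace can be reproduced in isolation: calling .view() on a non-contiguous tensor raises exactly this RuntimeError, while .reshape() (or .contiguous().view()) succeeds. Note that the failing view in the trace sits inside inplace_abn's compiled backend (normalize_shape in _backend), so removing view calls from the training code alone cannot reach it. A minimal sketch of the error itself:

```python
import torch

x = torch.randn(2, 3, 4)
y = x.permute(0, 2, 1)  # non-contiguous view of x

try:
    y.view(2, 12)       # raises: "view size is not compatible with input tensor's size and stride"
except RuntimeError as e:
    print("view failed:", e)

z = y.reshape(2, 12)              # reshape copies when necessary, so this works
w = y.contiguous().view(2, 12)    # equivalent workaround
```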
I tried reproducing the issue on my machine with no success (I just ran a forward/backward sequence on CPU). Can you please come up with a minimal reproducing example for me to debug?
PS: we don't have access to macOS machines, so if this is something macOS-specific I'm afraid I won't be able to help you.
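Something along these lines would be a good starting point: a single CPU forward/backward pass through one layer, which is the code path that fails in the traceback above (backward_reduce -> normalize_shape -> view). This is only a hypothetical sketch, assuming the top-level InPlaceABN module exported by the package; adapt the shapes and activation to match your configuration.

```python
# Hypothetical minimal-repro sketch: one CPU forward/backward pass through a
# single InPlaceABN layer.
import torch
from inplace_abn import InPlaceABN  # assumes the package exports InPlaceABN

torch.manual_seed(0)

layer = InPlaceABN(8)                               # 8 feature channels
x = torch.randn(2, 8, 16, 16, requires_grad=True)   # NCHW input

y = layer(x)
loss = y.mean()
loss.backward()  # the reported crash happens during this backward pass

print("grad norm:", x.grad.norm().item())
```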