WARNING:torch.distributed.elastic.agent.server.api:Received 1 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4156332 closing signal SIGHUP
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4156333 closing signal SIGHUP
Traceback (most recent call last):
  File "/home/user2/miniconda/envs/matting/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/user2/miniconda/envs/matting/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/user2/miniconda/envs/matting/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/user2/miniconda/envs/matting/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/user2/miniconda/envs/matting/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/user2/miniconda/envs/matting/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
    )(*cmd_args)
  File "/home/user2/miniconda/envs/matting/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/user2/miniconda/envs/matting/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 252, in launch_agent
    result = agent.run()
  File "/home/user2/miniconda/envs/matting/lib/python3.7/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper
    result = f(*args, **kwargs)
  File "/home/user2/miniconda/envs/matting/lib/python3.7/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run
    result = self._invoke_run(role)
  File "/home/user2/miniconda/envs/matting/lib/python3.7/site-packages/torch/distributed/elastic/agent/server/api.py", line 843, in _invoke_run
    time.sleep(monitor_interval)
  File "/home/user2/miniconda/envs/matting/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler
    raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
torch.distributed.elastic.multiprocessing.api.SignalException: Process 1100295 got signal: 1