Deep learning dev environment

I wonder if I’m the only one who struggles so much with ML development environments.

There is no way I could do any development locally, even if I had a GPU in my laptop: the models are too big, and reproducing the environment and data is a pain in the ass. So we work on servers and clusters.

My work environment is pretty much one sshfs mount, one ssh+screen terminal, and a Jupyter notebook for short experiments. It’s a mess!

If I’m working in a notebook, the code only runs inside that notebook. There’s no class structure or anything, just a giant pile of global variables and functions.

If I’m working on an actual project, especially someone else’s, I have no decent way of debugging. The best I could come up with is putting pdb.set_trace() in the code and running the script in Jupyter through %run. That’s fun when the code takes 20-ish minutes to load.
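For reference, here is roughly what that pdb-plus-%run workflow looks like. This is a minimal sketch: `load_model`, `forward`, the file name `train.py`, and the `DEBUG_HERE` environment variable are all hypothetical, and the env-var switch is an assumption added so the breakpoint can be turned off without editing the code.

```python
import os
import pdb


def load_model():
    # Hypothetical stand-in for the slow (~20 min) model/data loading step.
    return {"weights": [0.1, 0.2, 0.3]}


def forward(model, x):
    out = [w * x for w in model["weights"]]
    # Hypothetical switch: only stop in the debugger when DEBUG_HERE=1 is set.
    if os.environ.get("DEBUG_HERE") == "1":
        pdb.set_trace()  # inspect `model`, `x`, `out` interactively
    return out


if __name__ == "__main__":
    # %run executes the file with __name__ == "__main__", so this block runs.
    model = load_model()
    print(forward(model, 2.0))
```

In a notebook cell, `%env DEBUG_HERE=1` followed by `%run train.py` drops into pdb at the breakpoint. The catch is the one described above: every rerun pays for `load_model()` again.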

At some point I found a way to debug remotely through PyCharm over ssh, but it only works with a direct ssh connection: no jump hosts, no containers. And now it has started complaining about a case-sensitivity mismatch with my remote filesystem, gah.

Don’t even get me started on the restrictions some servers put on sshfs and notebooks.

There should be a better way. Or am I just dumb, and real devs debug models in their heads and commit correct code on the first try?