Deep learning dev environment

I wonder if it is just me who struggles so much with the ML development environment.

There is no way I can do any development locally even if I had a GPU on my laptop: the models are too giant and reproducing the environment and data is a pain in the ass. So we work on servers and clusters.

My work environment is pretty much one sshfs mapping, one ssh+screen terminal and a Jupyter notebook for short experiments. It is a mess!

If I’m working in a notebook then I only have code that can be run in that notebook. There is no class structure or anything, just a giant pile of global variables and functions.

If I’m working with an actual project, especially someone else’s, I have no decent way of debugging. The best I could come up with is adding pdb.set_trace() and then running the code in Jupyter through %run. So that’s fun when the code takes 20-ish minutes to load.
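For anyone who has not tried the pdb.set_trace() trick, here is a minimal sketch of what it looks like. The `forward` function and its `debug` flag are made up for illustration; they are not from any real project.

```python
import pdb


def forward(batch, debug=False):
    """Hypothetical stand-in for a model's forward pass."""
    total = sum(batch)
    if debug:
        # Execution pauses here and opens the interactive pdb prompt,
        # where local variables such as `total` can be inspected.
        pdb.set_trace()
    return total / len(batch)
```

Calling `forward(batch, debug=True)` from a script launched with `%run` makes the pdb prompt show up in the notebook cell output. IPython also has `%run -d`, which starts the script under the debugger, and `%debug`, which opens a post-mortem session after an exception. None of this helps with the 20-minute load time, though.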

At some point I found a way to debug remotely through PyCharm and ssh, but it only works if there is a direct ssh connection: no jump hosts and no containers. And now the thing has started to complain about a case-sensitivity mismatch with my remote filesystem, gah.

Don’t even get me started on the restrictions on sshfs and notebooks on some servers.

There should be a better way. Or am I just dumb, and true devs debug models in their heads and commit correct code on the first try?
