Screen

Intro

Screen is a tool that keeps a session open on a server while you are not logged in. This lets you run long-running jobs in the background without fear of them exiting when you log off, are kicked off due to a network issue, etc.

Running/Detaching a Screen Session

To open a screen session, all you have to do is type ‘screen’ on the command line. This will open a new interactive session.

$ screen

Now start your job inside the session. Once it is running, detach the screen so that the job keeps running in the background uninterrupted. To do this, press ctrl+a d (that is, press ctrl and the letter ‘a’ together, let go of both, then press ‘d’). This will return you to your previous non-screen session. Now you can exit the server and the screen session will still be running.
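
As a rough sketch of the workflow above, the helper below starts a job in a session that is already detached, so there is no need to press ctrl+a d by hand. It relies on screen's standard -d -m (start detached) and -S (name the session) options; the function name start_detached_job, and the session name and script in the usage comment, are hypothetical.

```shell
# Hypothetical helper: run a command inside a named screen session that
# starts detached, so the job keeps running after you log off.
#   $1      session name (shown later in 'screen -ls')
#   $2...   the command to run and its arguments
start_detached_job() {
    name="$1"
    shift
    screen -dmS "$name" "$@"
}

# Example usage (session name and script are placeholders):
#   start_detached_job training ./train.sh
#   screen -r training      # reattach later by name
```

Naming the session means you can reattach by name rather than looking up its number first.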

Managing a Screen Session

To manage your screen sessions, type ‘screen -ls’:

$ screen -ls
There is a screen on:
2431.pts-15.host (Detached)
1 Socket in /var/run/screen/S-user.

This lists all the Screen sessions you currently have open. To reattach to a session, type ‘screen -r’ if there is only one screen session open, or be more specific (helpful when you have multiple screen sessions open) with ‘screen -r 2431’.

$ screen -ls
There are screens on:
3273.pts-15.host (Detached)
2431.pts-15.host (Detached)
2 Sockets in /var/run/screen/S-user.

$ screen -r 2431
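
When several sessions are open, it can help to pull out just the session numbers. The sketch below is one way to do that, assuming the ‘screen -ls’ output looks like the listings shown here; the function name detached_session_ids is hypothetical.

```shell
# Hypothetical helper: read 'screen -ls' output on stdin and print the
# leading number of every session marked (Detached), one per line.
detached_session_ids() {
    awk '/\(Detached\)/ { split($1, parts, "."); print parts[1] }'
}

# Example usage:
#   screen -ls | detached_session_ids
```

Each number printed can be passed to ‘screen -r’ to reattach to that session.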

Killing a Screen Session

To kill a screen session, you can either exit from within the screen session itself or type ‘kill 2431’, where 2431 is the process ID of the screen session (the number at the start of its name in the ‘screen -ls’ output).

$ screen -ls
There are screens on:
3273.pts-15.host (Detached)
2431.pts-15.host (Detached)
2 Sockets in /var/run/screen/S-user.

$ kill 2431
$ screen -ls
There is a screen on:
3273.pts-15.host (Detached)
1 Socket in /var/run/screen/S-user.
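
As an alternative to kill, screen can be asked to shut a session down itself. The sketch below uses screen's -S option to select the session and -X quit to send it the quit command; the function name quit_session is hypothetical.

```shell
# Hypothetical helper: ask a screen session to terminate itself.
#   $1   session number or name as shown by 'screen -ls' (e.g. 2431)
quit_session() {
    screen -S "$1" -X quit
}

# Example usage:
#   quit_session 2431
#   screen -ls          # 2431 should no longer be listed
```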

Disclaimer

Always make sure to kill screen sessions that are no longer in use, otherwise they affect the server’s performance. Do not run screen on the jump servers.

Scratch space

We offer scratch space (temporary disk space) for EECS MSc students when they need to store a large amount of data for a short period of time. For example, in Deep Learning projects a large dataset might not fit in the student’s home directory (please see quotas).

The scratch space is NFS-based, exported over the network to all student desktops and student Compute servers. That means that if you add some data from a student desktop into your folder in the scratch space, it will be available when you SSH into a server.

The scratch space IS NOT backed up. Use it only as temporary space for non-critical data.

Available scratch spaces (Students only)

Name                Access          Status
/import/scratch-01  MSc, Academics  ONLINE
/import/scratch-02  MSc, Academics  ONLINE
/import/scratch-03  MSc, Academics  ONLINE
/import/scratch     MSc, Academics  OFFLINE (moved to /import/scratch-01)

How to request it

MSc students can request scratch space from EECS Systems by raising a Helpdesk Ticket, providing the following information:

  • Your Supervisor’s details
  • Your EECS username (your ITL login)
  • How much disk space you will require.

How to use it

The scratch disks are available under the ‘/import/’ folder from any EECS desktop/server. After you request scratch space, you will find a folder named after your EECS username, and only you have access to that folder. The names and status of the available scratch spaces are listed in the table under ‘Available scratch spaces’.
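
A minimal sketch of staging a dataset follows. It assumes your personal folder sits directly under the scratch space, i.e. /import/<scratch-name>/<username>; that layout, the function names scratch_dir and stage_dataset, and the dataset path in the usage comment are assumptions for illustration, so check the actual folder location once your request is granted.

```shell
# Assumption: your personal folder is /import/<scratch-name>/<username>.
# Prints that path; the username defaults to the current login.
scratch_dir() {
    printf '/import/%s/%s\n' "$1" "${2:-$(id -un)}"
}

# Hypothetical helper: copy a dataset into your scratch folder, then
# report how much space the folder uses.
#   $1   dataset directory to copy
#   $2   scratch name, e.g. scratch-01
stage_dataset() {
    dest="$(scratch_dir "$2")"
    if [ ! -d "$dest" ]; then
        echo "no scratch folder at $dest" >&2
        return 1
    fi
    cp -r "$1" "$dest"/ && du -sh "$dest"
}

# Example usage (dataset path is a placeholder):
#   stage_dataset ~/datasets/mnist scratch-01
```

Because the scratch space is not backed up, keep the original copy of anything you cannot regenerate.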

 

 

CNTK

What is the Cognitive Toolkit?

The Cognitive Toolkit, formerly known as CNTK, is a unified deep-learning toolkit that describes neural networks as a series of computational steps via a directed graph. In this directed graph, leaf nodes represent input values or network parameters, while other nodes represent matrix operations upon their inputs.

Running the Cognitive Toolkit

  • Run the container image. A typical command to launch the container is:
    nvidia-docker run -it --rm -v local_dir:container_dir nvcr.io/nvidia/cntk:<xx.xx>
    
    

    Where:

    • -it means run in interactive mode
    • --rm will delete the container when finished
    • -v mounts a host directory (or file) inside the container
    • local_dir is the directory or file from your host system (absolute path) that you want to access from inside your container. For example, the local_dir in the following path is /home/jsmith/data/mnist.
      -v /home/jsmith/data/mnist:/data/mnist
      
      

      If you run, for example, ls /data/mnist inside the container, you will see the same files as if you had issued the ls /home/jsmith/data/mnist command from outside the container.

    • container_dir is the target directory when you are inside your container. For example, /data/mnist is the target directory in the example:
      -v /home/jsmith/data/mnist:/data/mnist
      
      
    • <xx.xx> is the tag. For example, 17.06.

    a. When running on a single GPU, the Cognitive Toolkit can be invoked using a command similar to the following: cntk configFile=myscript.cntk ...

    
    

    b. When running on multiple GPUs, run the Cognitive Toolkit through MPI. The following example uses four GPUs, numbered 0..3, for training:

    export OMP_NUM_THREADS=10
    export CUDA_DEVICE_ORDER=PCI_BUS_ID
    export CUDA_VISIBLE_DEVICES=0,1,2,3
    mpirun --allow-run-as-root --oversubscribe --npernode 4 \
           -x OMP_NUM_THREADS -x CUDA_DEVICE_ORDER -x CUDA_VISIBLE_DEVICES \
           cntk configFile=myscript.cntk ...
    
    

    c. Running on all eight GPUs of a DGX-1 together is even simpler:

    export OMP_NUM_THREADS=10
    mpirun --allow-run-as-root --oversubscribe --npernode 8 \
           -x OMP_NUM_THREADS cntk configFile=myscript.cntk ...
    
    

    When running the Cognitive Toolkit containers, it is important to include at least the following options:

    nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ... nvcr.io/nvidia/cntk:17.02 ...
    
    

    You might want to pull in data and model descriptions from locations outside the container for use by the Cognitive Toolkit. To accomplish this, the easiest method is to mount one or more host directories as Docker data volumes. You have pulled the latest files and run the container image.
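
Putting the options above together, one possible wrapper looks like the following. It is a sketch rather than an official launcher: the function name run_cntk and the DOCKER_CMD override (handy for a dry run with echo) are hypothetical, while the flags are the ones described in this section.

```shell
# Hypothetical wrapper around the container launch described above.
#   $1   image tag, e.g. 17.06
#   $2   absolute host path to mount (local_dir)
#   $3   path inside the container (container_dir)
# Set DOCKER_CMD=echo to print the command instead of running it.
run_cntk() {
    "${DOCKER_CMD:-nvidia-docker}" run -it --rm \
        --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
        -v "${2}:${3}" \
        "nvcr.io/nvidia/cntk:${1}"
}

# Example usage:
#   run_cntk 17.06 /home/jsmith/data/mnist /data/mnist
```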

Note: In order to share data between ranks, NCCL may require shared system memory for IPC and pinned (page-locked) system memory resources. The operating system’s limits on these resources may need to be increased accordingly. Refer to your system’s documentation for details. In particular, Docker containers default to limited shared and pinned memory resources. When using NCCL inside a container, it is recommended that you increase these resources by issuing:

   --shm-size=1g --ulimit memlock=-1

in the command line to nvidia-docker run.
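
In the multi-GPU examples earlier in this section, the value given to --npernode equals the number of GPUs in use. The sketch below derives that number from the device list so the two cannot drift apart. The function names count_gpus and launch_cntk_mpi and the MPIRUN_CMD override are hypothetical; the mpirun flags are copied from the example above.

```shell
# Count the entries in a comma-separated device list, e.g. "0,1,2,3" -> 4.
count_gpus() {
    printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

# Hypothetical wrapper around the mpirun command shown earlier.
#   $1      device list for CUDA_VISIBLE_DEVICES, e.g. 0,1,2,3
#   $2...   arguments passed on to cntk
# Set MPIRUN_CMD=echo to print the arguments instead of launching.
launch_cntk_mpi() {
    devices="$1"
    shift
    export OMP_NUM_THREADS=10
    export CUDA_DEVICE_ORDER=PCI_BUS_ID
    export CUDA_VISIBLE_DEVICES="$devices"
    "${MPIRUN_CMD:-mpirun}" --allow-run-as-root --oversubscribe \
        --npernode "$(count_gpus "$devices")" \
        -x OMP_NUM_THREADS -x CUDA_DEVICE_ORDER -x CUDA_VISIBLE_DEVICES \
        cntk "$@"
}

# Example usage (inside the container):
#   launch_cntk_mpi 0,1,2,3 configFile=myscript.cntk
```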

  1. See /workspace/README.md inside the container for information on customizing your Cognitive Toolkit image.

Suggested Reading

For the latest Release Notes, see the Cognitive Toolkit Release Notes Documentation website.

For more information about the Cognitive Toolkit, including tutorials, documentation, and examples, see the CNTK wiki.