Preface

Why Linux

Linux is the most common platform for scientific computing and deployment of data science tools.

Distributions of Linux


Linux shells

Shells

  • A shell translates commands to OS instructions.

  • Most commonly used shells include bash, csh, tcsh, zsh, etc.

  • The default shell in macOS changed from bash to zsh starting with macOS 10.15 (Catalina).

  • Sometimes a command or a script fails to run simply because it was written for another shell.

  • We mostly use bash shell commands in this class.

  • Determine your login shell (usually, but not always, the shell you are currently running):

    echo $SHELL
    /bin/bash
  • List available shells:

    cat /etc/shells
    /bin/sh
    /bin/bash
    /usr/bin/sh
    /usr/bin/bash
  • Change to another shell:

    exec bash -l

    The -l option indicates it should be a login shell.

  • Change your login shell permanently:

    chsh -s /bin/bash [USERNAME]

    Then log out and log in.
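
A script can declare which shell it was written for on its first line (the shebang), which avoids the "written for another shell" problem mentioned above. The following is a minimal sketch, assuming bash is installed and reachable through /usr/bin/env; the temporary script and its contents are made up for illustration.

```shell
# Create a throwaway script whose first line (the shebang) names its interpreter.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/usr/bin/env bash
# Arrays are a bash feature that plain POSIX sh does not provide.
shells=(bash zsh tcsh)
echo "number of shells: ${#shells[@]}"
EOF
chmod +x "$script"

# Run it directly: the shebang line makes the system start bash,
# no matter which shell you are typing in.
result=$("$script")
echo "$result"

# SHELL holds your login shell, which may differ from the shell running this code.
echo "login shell: ${SHELL:-unset}"

rm -f "$script"
```

Running the same file with sh instead would likely fail on the array syntax, which is the kind of error the bullet above warns about.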

Command history and bash completion

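We can navigate to previous/next commands with the up and down arrow keys, and the history command lists previously entered commands. The pushd and popd commands are unrelated to command history: they maintain a stack of directories, which is handy for jumping between working directories.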

Bash provides the following standard completions by default. They save time and prevent many typing errors!

  • Pathname completion.

  • Filename completion.

  • Variablename completion: echo $[TAB][TAB].

  • Username completion: cd ~[TAB][TAB].

  • Hostname completion: ssh huazhou@[TAB][TAB].

  • Completion can also be customized to auto-complete other things, such as a command’s options and arguments. Google bash completion for more information.
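
As a small illustration of the directory stack kept by pushd and popd, here is a sketch that assumes bash (both are bash builtins) and that /tmp exists; the variable names are only for illustration.

```shell
start=$(pwd)              # remember where we started
pushd /tmp > /dev/null    # push the current directory on the stack, then cd to /tmp
inside=$(pwd)
popd > /dev/null          # pop the stack and return to the saved directory
back=$(pwd)
echo "$start -> $inside -> $back"
```

The output redirection to /dev/null only hides the stack listing that pushd and popd print each time.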

man is man’s best friend

Online help for shell commands: man [COMMANDNAME].

# display documentation for the ls command
man ls
LS(1)                            User Commands                           LS(1)



NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]... [FILE]...

DESCRIPTION
       List  information  about  the FILEs (the current directory by default).
       Sort entries alphabetically if none of -cftuvSUX nor --sort  is  speci‐
       fied.

       Mandatory  arguments  to  long  options are mandatory for short options
       too.

       -a, --all
              do not ignore entries starting with .

       -A, --almost-all
              do not list implied . and ..

       --author
              with -l, print the author of each file

       -b, --escape
              print C-style escapes for nongraphic characters

       --block-size=SIZE
              scale sizes by SIZE before printing them; e.g., '--block-size=M'
              prints sizes in units of 1,048,576 bytes; see SIZE format below

       -B, --ignore-backups
              do not list implied entries ending with ~

       -c     with -lt: sort by, and show, ctime (time of last modification of
              file status information); with -l: show ctime and sort by  name;
              otherwise: sort by ctime, newest first

       -C     list entries by columns

       --color[=WHEN]
              colorize  the  output;  WHEN can be 'never', 'auto', or 'always'
              (the default); more info below

       -d, --directory
              list directories themselves, not their contents

       -D, --dired
              generate output designed for Emacs' dired mode

       -f     do not sort, enable -aU, disable -ls --color

       -F, --classify
              append indicator (one of */=>@|) to entries

       --file-type
              likewise, except do not append '*'

       --format=WORD
              across -x, commas -m, horizontal -x, long -l, single-column  -1,
              verbose -l, vertical -C

       --full-time
              like -l --time-style=full-iso

       -g     like -l, but do not list owner

       --group-directories-first
              group directories before files;

              can   be  augmented  with  a  --sort  option,  but  any  use  of
              --sort=none (-U) disables grouping

       -G, --no-group
              in a long listing, don't print group names

       -h, --human-readable
              with -l, print sizes in human readable format (e.g., 1K 234M 2G)

       --si   likewise, but use powers of 1000 not 1024

       -H, --dereference-command-line
              follow symbolic links listed on the command line

       --dereference-command-line-symlink-to-dir
              follow each command line symbolic link

              that points to a directory

       --hide=PATTERN
              do not list implied entries matching shell  PATTERN  (overridden
              by -a or -A)

       --indicator-style=WORD
              append indicator with style WORD to entry names: none (default),
              slash (-p), file-type (--file-type), classify (-F)

       -i, --inode
              print the index number of each file

       -I, --ignore=PATTERN
              do not list implied entries matching shell PATTERN

       -k, --kibibytes
              default to 1024-byte blocks for disk usage

       -l     use a long listing format

       -L, --dereference
              when showing file information for a symbolic link, show informa‐
              tion  for  the file the link references rather than for the link
              itself

       -m     fill width with a comma separated list of entries

       -n, --numeric-uid-gid
              like -l, but list numeric user and group IDs

       -N, --literal
              print raw entry names (don't treat e.g. control characters  spe‐
              cially)

       -o     like -l, but do not list group information

       -p, --indicator-style=slash
              append / indicator to directories

       -q, --hide-control-chars
              print ? instead of nongraphic characters

       --show-control-chars
              show nongraphic characters as-is (the default, unless program is
              'ls' and output is a terminal)

       -Q, --quote-name
              enclose entry names in double quotes

       --quoting-style=WORD
              use quoting style WORD for entry names: literal, locale,  shell,
              shell-always, c, escape

       -r, --reverse
              reverse order while sorting

       -R, --recursive
              list subdirectories recursively

       -s, --size
              print the allocated size of each file, in blocks

       -S     sort by file size

       --sort=WORD
              sort  by  WORD instead of name: none (-U), size (-S), time (-t),
              version (-v), extension (-X)

       --time=WORD
              with -l, show time as WORD instead of default modification time:
              atime or access or use (-u) ctime or status (-c); also use spec‐
              ified time as sort key if --sort=time

       --time-style=STYLE
              with -l, show times using style STYLE: full-iso, long-iso,  iso,
              locale,  or  +FORMAT;  FORMAT  is interpreted like in 'date'; if
              FORMAT  is  FORMAT1<newline>FORMAT2,  then  FORMAT1  applies  to
              non-recent  files  and FORMAT2 to recent files; if STYLE is pre‐
              fixed with 'posix-', STYLE takes effect only outside  the  POSIX
              locale

       -t     sort by modification time, newest first

       -T, --tabsize=COLS
              assume tab stops at each COLS instead of 8

       -u     with  -lt:  sort by, and show, access time; with -l: show access
              time and sort by name; otherwise: sort by access time

       -U     do not sort; list entries in directory order

       -v     natural sort of (version) numbers within text

       -w, --width=COLS
              assume screen width instead of current value

       -x     list entries by lines instead of by columns

       -X     sort alphabetically by entry extension

       -1     list one file per line

       SELinux options:

       --lcontext
              Display security context.   Enable -l. Lines  will  probably  be
              too wide for most displays.

       -Z, --context
              Display  security context so it fits on most displays.  Displays
              only mode, user, group, security context and file name.

       --scontext
              Display only security context and file name.

       --help display this help and exit

       --version
              output version information and exit

       SIZE is an integer and optional unit (example:  10M  is  10*1024*1024).
       Units  are K, M, G, T, P, E, Z, Y (powers of 1024) or KB, MB, ... (pow‐
       ers of 1000).

       Using color to distinguish file types is disabled both by  default  and
       with  --color=never.  With --color=auto, ls emits color codes only when
       standard output is connected to a terminal.  The LS_COLORS  environment
       variable can change the settings.  Use the dircolors command to set it.

   Exit status:
       0      if OK,

       1      if minor problems (e.g., cannot access subdirectory),

       2      if serious trouble (e.g., cannot access command-line argument).

       GNU  coreutils  online  help:  <http://www.gnu.org/software/coreutils/>
       Report ls translation bugs to <http://translationproject.org/team/>

AUTHOR
       Written by Richard M. Stallman and David MacKenzie.

COPYRIGHT
       Copyright © 2013 Free Software Foundation, Inc.   License  GPLv3+:  GNU
       GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
       This  is  free  software:  you  are free to change and redistribute it.
       There is NO WARRANTY, to the extent permitted by law.

SEE ALSO
       The full documentation for ls is maintained as a  Texinfo  manual.   If
       the  info and ls programs are properly installed at your site, the com‐
       mand

              info coreutils 'ls invocation'

       should give you access to the complete manual.



GNU coreutils 8.22               November 2020                           LS(1)

Work with text files

View/peek text files

  • cat prints the contents of a file:

    cat runSim.R
    ## parsing command arguments
    for (arg in commandArgs(TRUE)) {
      eval(parse(text=arg))
    }
    
    ## check if a given integer is prime
    isPrime = function(n) {
      if (n <= 3) {
        return (TRUE)
      }
      if (any((n %% 2:floor(sqrt(n))) == 0)) {
        return (FALSE)
      }
      return (TRUE)
    }
    
    ## estimate mean only using observation with prime indices
    estMeanPrimes = function (x) {
      n = length(x)
      ind = sapply(1:n, isPrime)
      return (mean(x[ind]))
    }
    
    # simulate data
    x = rnorm(n)
    
    # estimate mean
    estMeanPrimes(x)

  • head prints the first 10 lines of a file:

    head runSim.R
    ## parsing command arguments
    for (arg in commandArgs(TRUE)) {
      eval(parse(text=arg))
    }
    
    ## check if a given integer is prime
    isPrime = function(n) {
      if (n <= 3) {
        return (TRUE)
      }

    head -n l (or the older shorthand head -l, where l is a number) prints the first l lines of a file:

    head -15 runSim.R
    ## parsing command arguments
    for (arg in commandArgs(TRUE)) {
      eval(parse(text=arg))
    }
    
    ## check if a given integer is prime
    isPrime = function(n) {
      if (n <= 3) {
        return (TRUE)
      }
      if (any((n %% 2:floor(sqrt(n))) == 0)) {
        return (FALSE)
      }
      return (TRUE)
    }
  • tail prints the last 10 lines of a file:

    tail runSim.R
      n = length(x)
      ind = sapply(1:n, isPrime)
      return (mean(x[ind]))
    }
    
    # simulate data
    x = rnorm(n)
    
    # estimate mean
    estMeanPrimes(x)

    tail -n l (or the older shorthand tail -l, where l is a number) prints the last l lines of a file:

    tail -15 runSim.R
      return (TRUE)
    }
    
    ## estimate mean only using observation with prime indices
    estMeanPrimes = function (x) {
      n = length(x)
      ind = sapply(1:n, isPrime)
      return (mean(x[ind]))
    }
    
    # simulate data
    x = rnorm(n)
    
    # estimate mean
    estMeanPrimes(x)

  • Questions:
    • How to see the 11th line of the file and nothing else?
    • What about the 11th to the last line?
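
One possible way to approach the two questions, sketched on a hypothetical 20-line stand-in file rather than runSim.R (here "the 11th to the last line" is read as "from line 11 through the end of the file"):

```shell
f=$(mktemp)
seq 1 20 > "$f"    # stand-in file: line i contains the number i

# Only the 11th line: keep the first 11 lines, then the last one of those.
line11=$(head -n 11 "$f" | tail -n 1)

# The same with sed: -n suppresses default printing, 11p prints line 11 only.
line11_sed=$(sed -n '11p' "$f")

# From the 11th line to the end: tail -n +K starts output at line K.
rest=$(tail -n +11 "$f")

echo "$line11"
echo "$rest"
rm -f "$f"
```

The head | tail combination uses only commands introduced above; the sed form is shorter once sed is familiar.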

Piping and redirection

  • | sends output from one command as input of another command.

    ls -l | head -5
    total 5904
    -rw-r--r--. 1 huazhou huazhou     258 Jan  7 00:55 autoSim.R
    -rw-r--r--. 1 huazhou huazhou  110345 Jan  7 00:55 Emacs_Reference_Card.pdf
    -rw-r--r--. 1 huazhou huazhou  157353 Jan  7 00:55 IDRE_Winter_2019_Workshops.pdf
    -rw-r--r--. 1 huazhou huazhou  321281 Jan  7 00:55 key_authentication_1.png
  • > directs output from one command to a file.

  • >> appends output from one command to a file.

  • < reads input from a file.

  • Combinations of shell commands (grep, sed, awk, …), piping and redirection, and regular expressions allow us to pre-process and reformat huge text files efficiently.

  • See HW1.
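
A minimal sketch of the three redirection operators and a pipe working together; the temporary file and the fruit names are made up for illustration.

```shell
out=$(mktemp)

printf 'banana\napple\n' > "$out"    # > creates the file or overwrites its contents
printf 'cherry\n' >> "$out"          # >> appends to the existing contents

sorted=$(sort < "$out")              # < feeds the file to sort's standard input
count=$(sort < "$out" | wc -l | tr -d ' ')   # | passes sort's output on to wc

echo "$sorted"
echo "lines: $count"
rm -f "$out"
```

Note that > silently discards whatever the target file contained before, so >> is the safer choice when accumulating results.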

less is more; more is less

  • more browses a text file screen by screen (only downwards). Scroll down one page (paging) by pressing the spacebar; exit by pressing the q key.

  • less is also a pager, but has more functionalities, e.g., scroll upwards and downwards through the input.

  • less doesn’t need to read the whole file before displaying it, so it opens large files faster than more.

grep

grep prints lines that match an expression:

  • Show lines that contain string CentOS:

    # quotes not necessary if not a regular expression
    grep 'CentOS' linux.Rmd
    - RHEL/CentOS is popular on servers. (In December 2020, Red Hat terminated the development of CentOS Linux distribution.)
    - The teaching server for this class runs CentOS 7. UCLA Hoffman2 cluster runs CentOS 7.9.2009 (as of 2022-01-01).
    - Show lines that contain string `CentOS`:
        grep 'CentOS' linux.Rmd
        grep 'CentOS' *.Rmd
        grep -n 'CentOS' linux.Rmd
    - Replace `CentOS` by `RHEL` in a text file:
        sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
  • Search multiple text files:

    grep 'CentOS' *.Rmd
    - RHEL/CentOS is popular on servers. (In December 2020, Red Hat terminated the development of CentOS Linux distribution.)
    - The teaching server for this class runs CentOS 7. UCLA Hoffman2 cluster runs CentOS 7.9.2009 (as of 2022-01-01).
    - Show lines that contain string `CentOS`:
        grep 'CentOS' linux.Rmd
        grep 'CentOS' *.Rmd
        grep -n 'CentOS' linux.Rmd
    - Replace `CentOS` by `RHEL` in a text file:
        sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
  • Show matching line numbers:

    grep -n 'CentOS' linux.Rmd
    47:- RHEL/CentOS is popular on servers. (In December 2020, Red Hat terminated the development of CentOS Linux distribution.)
    53:- The teaching server for this class runs CentOS 7. UCLA Hoffman2 cluster runs CentOS 7.9.2009 (as of 2022-01-01).
    345:- Show lines that contain string `CentOS`:
    348:    grep 'CentOS' linux.Rmd
    353:    grep 'CentOS' *.Rmd
    358:    grep -n 'CentOS' linux.Rmd
    375:- Replace `CentOS` by `RHEL` in a text file:
    377:    sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
  • Find all files in current directory with .png extension:

    ls | grep '\.png$'
    key_authentication_1.png
    key_authentication_2.png
    linux_directory_structure.png
    linux_filepermission_oct.png
    linux_filepermission.png
    redhat_kills_centos.png
    Richard_Stallman_2013.png
    screenshot_top.png
  • Find all directories in the current directory:

    ls -al | grep '^d'
    drwxr-xr-x. 2 huazhou huazhou    4096 Jan 11 22:22 .
    drwxr-xr-x. 6 huazhou huazhou    4096 Jan  7 00:55 ..
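
In a regular expression an unescaped . matches any single character, so patterns for file extensions usually escape it as \. to match a literal dot. A small sketch with made-up file names shows the difference:

```shell
# Three hypothetical names; only the first really has a .png extension.
names=$(printf 'plot.png\nxpng\nnotes.txt\n')

# Unescaped dot: "any character followed by png at the end of the line".
loose=$(printf '%s\n' "$names" | grep '.png$')

# Escaped dot: "a literal dot followed by png at the end of the line".
strict=$(printf '%s\n' "$names" | grep '\.png$')

echo "loose:";  echo "$loose"     # plot.png and xpng
echo "strict:"; echo "$strict"    # plot.png only
```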

sed

  • sed is a stream editor.

  • Replace CentOS by RHEL in a text file:

    sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
    - RHEL/RHEL is popular on servers. (In December 2020, Red Hat terminated the development of CentOS Linux distribution.)
    - The teaching server for this class runs RHEL 7. UCLA Hoffman2 cluster runs CentOS 7.9.2009 (as of 2022-01-01).
    - Show lines that contain string `RHEL`:
        grep 'RHEL' linux.Rmd
        grep 'RHEL' *.Rmd
        grep -n 'RHEL' linux.Rmd
    - Replace `RHEL` by `RHEL` in a text file:
        sed 's/RHEL/RHEL/' linux.Rmd | grep RHEL
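
Note in the output above that only the first CentOS on each line was replaced. By default the s command substitutes the first match per line; adding the g flag replaces every match. A minimal sketch on a made-up sample line:

```shell
line='RHEL/CentOS is popular; CentOS 7 runs here.'

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/CentOS/RHEL/')

# With g: every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/CentOS/RHEL/g')

echo "$first_only"    # RHEL/RHEL is popular; CentOS 7 runs here.
echo "$all"           # RHEL/RHEL is popular; RHEL 7 runs here.
```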

awk

  • awk is a filter and report writer.

  • First let’s display the content of the file /etc/passwd:

    cat /etc/passwd
    root:x:0:0:root:/root:/bin/bash
    bin:x:1:1:bin:/bin:/sbin/nologin
    daemon:x:2:2:daemon:/sbin:/sbin/nologin
    adm:x:3:4:adm:/var/adm:/sbin/nologin
    lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
    sync:x:5:0:sync:/sbin:/bin/sync
    shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
    halt:x:7:0:halt:/sbin:/sbin/halt
    mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
    operator:x:11:0:operator:/root:/sbin/nologin
    games:x:12:100:games:/usr/games:/sbin/nologin
    ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
    nobody:x:99:99:Nobody:/:/sbin/nologin
    systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
    dbus:x:81:81:System message bus:/:/sbin/nologin
    polkitd:x:999:998:User for polkitd:/:/sbin/nologin
    sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
    postfix:x:89:89::/var/spool/postfix:/sbin/nologin
    chrony:x:998:996::/var/lib/chrony:/sbin/nologin
    huazhou:x:1000:1001::/home/huazhou:/bin/bash
    tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
    rstudio-server:x:997:995::/home/rstudio-server:/bin/bash
    shiny:x:996:1002::/home/shiny:/bin/sh
    maschepps:x:1001:1003::/home/maschepps:/bin/bash
    aarbolante:x:1002:1004::/home/aarbolante:/bin/bash
    rozeta:x:1003:1005::/home/rozeta:/bin/bash
    yanlongbai975:x:1004:1006::/home/yanlongbai975:/bin/bash
    ritazxcai:x:1005:1007::/home/ritazxcai:/bin/bash
    may.lyn.cheah.phd:x:1006:1008::/home/may.lyn.cheah.phd:/bin/bash
    yifan00:x:1007:1009::/home/yifan00:/bin/bash
    yuruidong99:x:1008:1010::/home/yuruidong99:/bin/bash
    fangyi:x:1009:1011::/home/fangyi:/bin/bash
    sharonfeng:x:1010:1012::/home/sharonfeng:/bin/bash
    rfisher2022:x:1011:1013::/home/rfisher2022:/bin/bash
    seamusgallivan:x:1012:1014::/home/seamusgallivan:/bin/bash
    ehodzic:x:1013:1015::/home/ehodzic:/bin/bash
    ionahu08:x:1014:1016::/home/ionahu08:/bin/bash
    lillyhuang25:x:1015:1017::/home/lillyhuang25:/bin/bash
    jamshidian:x:1016:1018::/home/jamshidian:/bin/bash
    jonathanking192:x:1017:1019::/home/jonathanking192:/bin/bash
    yllai:x:1018:1020::/home/yllai:/bin/bash
    djlavine:x:1019:1021::/home/djlavine:/bin/bash
    blei001:x:1020:1022::/home/blei001:/bin/bash
    lifengxue2000:x:1021:1023::/home/lifengxue2000:/bin/bash
    yuyuanlin:x:1022:1024::/home/yuyuanlin:/bin/bash
    javim013:x:1023:1025::/home/javim013:/bin/bash
    tokramm:x:1024:1026::/home/tokramm:/bin/bash
    tomokiokuno0528:x:1025:1027::/home/tomokiokuno0528:/bin/bash
    qny2021:x:1026:1028::/home/qny2021:/bin/bash
    yuhang886688:x:1027:1029::/home/yuhang886688:/bin/bash
    jqin10:x:1028:1030::/home/jqin10:/bin/bash
    mimi327:x:1029:1031::/home/mimi327:/bin/bash
    nataliesisto:x:1030:1032::/home/nataliesisto:/bin/bash
    hangsun:x:1031:1033::/home/hangsun:/bin/bash
    jiahaotian0702:x:1033:1035::/home/jiahaotian0702:/bin/bash
    wanghw:x:1034:1036::/home/wanghw:/bin/bash
    wongj1721:x:1035:1037::/home/wongj1721:/bin/bash
    lsyang:x:1036:1038::/home/lsyang:/bin/bash
    khyeh0816:x:1037:1039::/home/khyeh0816:/bin/bash
    younghograd:x:1038:1040::/home/younghograd:/bin/bash
    yueyu99:x:1039:1041::/home/yueyu99:/bin/bash
    qzhang42:x:1040:1042::/home/qzhang42:/bin/bash
    zixiz:x:1041:1043::/home/zixiz:/bin/bash
    naying:x:1042:1044::/home/naying:/bin/bash
    capj245:x:1043:1045::/home/capj245:/bin/bash
    rsuseno:x:1044:1046::/home/rsuseno:/bin/bash
    inclassdemo:x:1045:1047::/home/inclassdemo:/bin/bash

    Each line contains fields (1) user name, (2) password, (3) user ID, (4) group ID, (5) user ID info, (6) home directory, and (7) command shell, separated by :.

  • Print sorted list of login names:

    awk -F: '{ print $1 }' /etc/passwd | sort | head -10
    aarbolante
    adm
    bin
    blei001
    capj245
    chrony
    daemon
    dbus
    djlavine
    ehodzic
  • Print the number of lines in a file (NR stands for Number of Records; by default each line is one record):

    awk 'END { print NR }' /etc/passwd
    67

    or

    wc -l /etc/passwd
    67 /etc/passwd

    or (not displaying file name)

    wc -l < /etc/passwd
    67
  • Print the entries of users with UID in the range 1000-1047:

    awk -F: '{if ($3 >= 1000 && $3 <= 1047) print}' /etc/passwd
    huazhou:x:1000:1001::/home/huazhou:/bin/bash
    maschepps:x:1001:1003::/home/maschepps:/bin/bash
    aarbolante:x:1002:1004::/home/aarbolante:/bin/bash
    rozeta:x:1003:1005::/home/rozeta:/bin/bash
    yanlongbai975:x:1004:1006::/home/yanlongbai975:/bin/bash
    ritazxcai:x:1005:1007::/home/ritazxcai:/bin/bash
    may.lyn.cheah.phd:x:1006:1008::/home/may.lyn.cheah.phd:/bin/bash
    yifan00:x:1007:1009::/home/yifan00:/bin/bash
    yuruidong99:x:1008:1010::/home/yuruidong99:/bin/bash
    fangyi:x:1009:1011::/home/fangyi:/bin/bash
    sharonfeng:x:1010:1012::/home/sharonfeng:/bin/bash
    rfisher2022:x:1011:1013::/home/rfisher2022:/bin/bash
    seamusgallivan:x:1012:1014::/home/seamusgallivan:/bin/bash
    ehodzic:x:1013:1015::/home/ehodzic:/bin/bash
    ionahu08:x:1014:1016::/home/ionahu08:/bin/bash
    lillyhuang25:x:1015:1017::/home/lillyhuang25:/bin/bash
    jamshidian:x:1016:1018::/home/jamshidian:/bin/bash
    jonathanking192:x:1017:1019::/home/jonathanking192:/bin/bash
    yllai:x:1018:1020::/home/yllai:/bin/bash
    djlavine:x:1019:1021::/home/djlavine:/bin/bash
    blei001:x:1020:1022::/home/blei001:/bin/bash
    lifengxue2000:x:1021:1023::/home/lifengxue2000:/bin/bash
    yuyuanlin:x:1022:1024::/home/yuyuanlin:/bin/bash
    javim013:x:1023:1025::/home/javim013:/bin/bash
    tokramm:x:1024:1026::/home/tokramm:/bin/bash
    tomokiokuno0528:x:1025:1027::/home/tomokiokuno0528:/bin/bash
    qny2021:x:1026:1028::/home/qny2021:/bin/bash
    yuhang886688:x:1027:1029::/home/yuhang886688:/bin/bash
    jqin10:x:1028:1030::/home/jqin10:/bin/bash
    mimi327:x:1029:1031::/home/mimi327:/bin/bash
    nataliesisto:x:1030:1032::/home/nataliesisto:/bin/bash
    hangsun:x:1031:1033::/home/hangsun:/bin/bash
    jiahaotian0702:x:1033:1035::/home/jiahaotian0702:/bin/bash
    wanghw:x:1034:1036::/home/wanghw:/bin/bash
    wongj1721:x:1035:1037::/home/wongj1721:/bin/bash
    lsyang:x:1036:1038::/home/lsyang:/bin/bash
    khyeh0816:x:1037:1039::/home/khyeh0816:/bin/bash
    younghograd:x:1038:1040::/home/younghograd:/bin/bash
    yueyu99:x:1039:1041::/home/yueyu99:/bin/bash
    qzhang42:x:1040:1042::/home/qzhang42:/bin/bash
    zixiz:x:1041:1043::/home/zixiz:/bin/bash
    naying:x:1042:1044::/home/naying:/bin/bash
    capj245:x:1043:1045::/home/capj245:/bin/bash
    rsuseno:x:1044:1046::/home/rsuseno:/bin/bash
    inclassdemo:x:1045:1047::/home/inclassdemo:/bin/bash
  • Print login names and log-in shells in comma-separated format:

    awk -F: '{OFS = ","} {print $1, $7}' /etc/passwd
    root,/bin/bash
    bin,/sbin/nologin
    daemon,/sbin/nologin
    adm,/sbin/nologin
    lp,/sbin/nologin
    sync,/bin/sync
    shutdown,/sbin/shutdown
    halt,/sbin/halt
    mail,/sbin/nologin
    operator,/sbin/nologin
    games,/sbin/nologin
    ftp,/sbin/nologin
    nobody,/sbin/nologin
    systemd-network,/sbin/nologin
    dbus,/sbin/nologin
    polkitd,/sbin/nologin
    sshd,/sbin/nologin
    postfix,/sbin/nologin
    chrony,/sbin/nologin
    huazhou,/bin/bash
    tss,/sbin/nologin
    rstudio-server,/bin/bash
    shiny,/bin/sh
    maschepps,/bin/bash
    aarbolante,/bin/bash
    rozeta,/bin/bash
    yanlongbai975,/bin/bash
    ritazxcai,/bin/bash
    may.lyn.cheah.phd,/bin/bash
    yifan00,/bin/bash
    yuruidong99,/bin/bash
    fangyi,/bin/bash
    sharonfeng,/bin/bash
    rfisher2022,/bin/bash
    seamusgallivan,/bin/bash
    ehodzic,/bin/bash
    ionahu08,/bin/bash
    lillyhuang25,/bin/bash
    jamshidian,/bin/bash
    jonathanking192,/bin/bash
    yllai,/bin/bash
    djlavine,/bin/bash
    blei001,/bin/bash
    lifengxue2000,/bin/bash
    yuyuanlin,/bin/bash
    javim013,/bin/bash
    tokramm,/bin/bash
    tomokiokuno0528,/bin/bash
    qny2021,/bin/bash
    yuhang886688,/bin/bash
    jqin10,/bin/bash
    mimi327,/bin/bash
    nataliesisto,/bin/bash
    hangsun,/bin/bash
    jiahaotian0702,/bin/bash
    wanghw,/bin/bash
    wongj1721,/bin/bash
    lsyang,/bin/bash
    khyeh0816,/bin/bash
    younghograd,/bin/bash
    yueyu99,/bin/bash
    qzhang42,/bin/bash
    zixiz,/bin/bash
    naying,/bin/bash
    capj245,/bin/bash
    rsuseno,/bin/bash
    inclassdemo,/bin/bash
  • Print login names and mark those with UID >= 1000 as vip:

    awk -F: -v status="" '{OFS = ","} 
    {if ($3 >= 1000) status="vip"; else status="regular"} 
    {print $1, status}' /etc/passwd
    root,regular
    bin,regular
    daemon,regular
    adm,regular
    lp,regular
    sync,regular
    shutdown,regular
    halt,regular
    mail,regular
    operator,regular
    games,regular
    ftp,regular
    nobody,regular
    systemd-network,regular
    dbus,regular
    polkitd,regular
    sshd,regular
    postfix,regular
    chrony,regular
    huazhou,vip
    tss,regular
    rstudio-server,regular
    shiny,regular
    maschepps,vip
    aarbolante,vip
    rozeta,vip
    yanlongbai975,vip
    ritazxcai,vip
    may.lyn.cheah.phd,vip
    yifan00,vip
    yuruidong99,vip
    fangyi,vip
    sharonfeng,vip
    rfisher2022,vip
    seamusgallivan,vip
    ehodzic,vip
    ionahu08,vip
    lillyhuang25,vip
    jamshidian,vip
    jonathanking192,vip
    yllai,vip
    djlavine,vip
    blei001,vip
    lifengxue2000,vip
    yuyuanlin,vip
    javim013,vip
    tokramm,vip
    tomokiokuno0528,vip
    qny2021,vip
    yuhang886688,vip
    jqin10,vip
    mimi327,vip
    nataliesisto,vip
    hangsun,vip
    jiahaotian0702,vip
    wanghw,vip
    wongj1721,vip
    lsyang,vip
    khyeh0816,vip
    younghograd,vip
    yueyu99,vip
    qzhang42,vip
    zixiz,vip
    naying,vip
    capj245,vip
    rsuseno,vip
    inclassdemo,vip
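
The awk patterns above can be combined in other ways. The sketch below uses a small made-up file in /etc/passwd format (the users alice and bob are hypothetical) to show a BEGIN block, a pattern in front of an action, and an associative array used for counting.

```shell
data=$(mktemp)
cat > "$data" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
alice:x:1000:1001::/home/alice:/bin/bash
bob:x:1001:1002::/home/bob:/bin/sh
EOF

# BEGIN runs once before any input is read, a natural place to set FS and OFS.
# The pattern $3 >= 1000 selects the records the action is applied to.
pairs=$(awk 'BEGIN { FS = ":"; OFS = "," } $3 >= 1000 { print $1, $7 }' "$data")

# Count accounts per login shell with an array indexed by field 7;
# END runs once after the last record. Array order is unspecified, hence sort.
counts=$(awk -F: '{ n[$7]++ } END { for (s in n) print s, n[s] }' "$data" | LC_ALL=C sort)

echo "$pairs"
echo "$counts"
rm -f "$data"
```

Setting OFS in BEGIN rather than in a per-record action does the assignment once instead of on every line, though both give the same output here.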

Text editors

Source: Editor War on Wikipedia.

Emacs

  • Emacs is a powerful text editor with extensive support for many languages including R, LaTeX, Python, and C/C++; however, it’s not installed by default on many Linux distributions.

  • Basic survival commands:

    • emacs filename to open a file with emacs.
    • CTRL-x CTRL-f to open an existing or new file.
    • CTRL-x CTRL-s to save.
    • CTRL-x CTRL-w to save as.
    • CTRL-x CTRL-c to quit.
  • Google emacs cheatsheet

C-<key> means hold the Control key and press <key>.
M-<key> means hold the Meta key (usually Alt) and press <key>, or press the Esc key once and then press <key>.

Vi

  • Vi is ubiquitous (POSIX standard). Learn at least its basics; otherwise you may be unable to edit files on clusters where no other editor is installed.

  • Basic survival commands:

    • vi filename to start editing a file.
    • vi is a modal editor with an insert mode and a normal mode. Pressing i switches from normal mode to insert mode. Pressing ESC switches from insert mode back to normal mode.
    • :x<Return> quits vi and saves changes.
    • :q!<Return> quits vi without saving latest changes.
    • :w<Return> saves changes.
    • :wq<Return> quits vi and saves changes.
  • Google vi cheatsheet

IDE (Integrated Development Environment)

  • Statisticians/data scientists write a lot of code. It is critical to adopt a good IDE that goes beyond code editing: syntax highlighting, executing code within the editor, debugging, profiling, version control, etc.

  • RStudio, Eclipse, Emacs, Matlab, Visual Studio, etc.

Processes

Cancel a non-responding program

  • Press Ctrl+C to cancel a non-responding or long-running program.
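
Ctrl+C only reaches the program running in the foreground of your terminal. For a program running in the background, the kill command with its process ID does a similar job. A minimal sketch, using sleep 60 as a stand-in for a long-running program:

```shell
sleep 60 &                  # start a long-running job in the background
pid=$!                      # $! holds the PID of the most recent background job

kill "$pid"                 # send the default signal, SIGTERM, to ask it to stop
wait "$pid" 2>/dev/null     # collect its exit status
status=$?

# A process ended by a signal reports 128 + signal number; SIGTERM is 15.
echo "exit status: $status"
```

If a program ignores SIGTERM, kill -9 sends SIGKILL, which cannot be caught; use it as a last resort since the program gets no chance to clean up.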

Processes

  • The OS runs processes on behalf of users.

  • Each process has a process ID (PID), an owning user (UID), a parent process ID (PPID), the time or date it started (STIME), accumulated CPU time (TIME), etc.

    ps
      PID TTY          TIME CMD
    15080 ?        00:00:06 rsession
    16542 ?        00:00:01 R
    16661 ?        00:00:00 sh
    16662 ?        00:00:00 ps
  • All currently running processes:

    ps -eaf
    UID        PID  PPID  C STIME TTY          TIME CMD
    root         1     0  0 Jan03 ?        00:01:34 /usr/lib/systemd/systemd --switched-root --system --deserialize 22
    root         2     0  0 Jan03 ?        00:00:00 [kthreadd]
    root         4     2  0 Jan03 ?        00:00:00 [kworker/0:0H]
    root         6     2  0 Jan03 ?        00:00:02 [ksoftirqd/0]
    root         7     2  0 Jan03 ?        00:00:00 [migration/0]
    root         8     2  0 Jan03 ?        00:00:00 [rcu_bh]
    root         9     2  0 Jan03 ?        00:05:10 [rcu_sched]
    root        10     2  0 Jan03 ?        00:00:00 [lru-add-drain]
    root        11     2  0 Jan03 ?        00:00:03 [watchdog/0]
    root        12     2  0 Jan03 ?        00:00:02 [watchdog/1]
    root        13     2  0 Jan03 ?        00:00:00 [migration/1]
    root        14     2  0 Jan03 ?        00:00:02 [ksoftirqd/1]
    root        16     2  0 Jan03 ?        00:00:00 [kworker/1:0H]
    root        17     2  0 Jan03 ?        00:00:02 [watchdog/2]
    root        18     2  0 Jan03 ?        00:00:00 [migration/2]
    root        19     2  0 Jan03 ?        00:00:02 [ksoftirqd/2]
    root        21     2  0 Jan03 ?        00:00:00 [kworker/2:0H]
    root        22     2  0 Jan03 ?        00:00:02 [watchdog/3]
    root        23     2  0 Jan03 ?        00:00:00 [migration/3]
    root        24     2  0 Jan03 ?        00:00:02 [ksoftirqd/3]
    root        26     2  0 Jan03 ?        00:00:00 [kworker/3:0H]
    root        28     2  0 Jan03 ?        00:00:00 [kdevtmpfs]
    root        29     2  0 Jan03 ?        00:00:00 [netns]
    root        30     2  0 Jan03 ?        00:00:00 [khungtaskd]
    root        31     2  0 Jan03 ?        00:00:00 [writeback]
    root        32     2  0 Jan03 ?        00:00:00 [kintegrityd]
    root        33     2  0 Jan03 ?        00:00:00 [bioset]
    root        34     2  0 Jan03 ?        00:00:00 [bioset]
    root        35     2  0 Jan03 ?        00:00:00 [bioset]
    root        36     2  0 Jan03 ?        00:00:00 [kblockd]
    root        37     2  0 Jan03 ?        00:00:00 [md]
    root        38     2  0 Jan03 ?        00:00:00 [edac-poller]
    root        39     2  0 Jan03 ?        00:00:00 [watchdogd]
    root        49     2  0 Jan03 ?        00:00:17 [kswapd0]
    root        50     2  0 Jan03 ?        00:00:00 [ksmd]
    root        51     2  0 Jan03 ?        00:00:07 [khugepaged]
    root        52     2  0 Jan03 ?        00:00:00 [crypto]
    root        60     2  0 Jan03 ?        00:00:00 [kthrotld]
    root        61     2  0 Jan03 ?        00:00:00 [kmpath_rdacd]
    root        62     2  0 Jan03 ?        00:00:00 [kaluad]
    root        63     2  0 Jan03 ?        00:00:00 [kpsmoused]
    root        65     2  0 Jan03 ?        00:00:00 [ipv6_addrconf]
    root        78     2  0 Jan03 ?        00:00:00 [deferwq]
    root       133     2  0 Jan03 ?        00:00:03 [kauditd]
    root       199     2  0 Jan03 ?        00:00:00 [virtscsi-scan]
    root       200     2  0 Jan03 ?        00:00:00 [scsi_eh_0]
    root       201     2  0 Jan03 ?        00:00:00 [scsi_tmf_0]
    root       212     2  0 Jan03 ?        00:00:06 [kworker/0:1H]
    root       233     2  0 Jan03 ?        00:00:00 [kworker/2:1H]
    root       251     2  0 Jan03 ?        00:00:00 [bioset]
    root       252     2  0 Jan03 ?        00:00:00 [xfsalloc]
    root       253     2  0 Jan03 ?        00:00:00 [xfs_mru_cache]
    root       254     2  0 Jan03 ?        00:00:00 [xfs-buf/sda2]
    root       255     2  0 Jan03 ?        00:00:00 [xfs-data/sda2]
    root       256     2  0 Jan03 ?        00:00:00 [xfs-conv/sda2]
    root       257     2  0 Jan03 ?        00:00:00 [xfs-cil/sda2]
    root       258     2  0 Jan03 ?        00:00:00 [xfs-reclaim/sda]
    root       259     2  0 Jan03 ?        00:00:00 [xfs-log/sda2]
    root       260     2  0 Jan03 ?        00:00:00 [xfs-eofblocks/s]
    root       261     2  0 Jan03 ?        00:01:11 [xfsaild/sda2]
    root       262     2  0 Jan03 ?        00:00:00 [kworker/1:1H]
    root       263     2  0 Jan03 ?        00:00:00 [kworker/3:1H]
    root       326     1  0 Jan03 ?        00:00:25 /usr/lib/systemd/systemd-journald
    root       354     1  0 Jan03 ?        00:00:00 /usr/lib/systemd/systemd-udevd
    root       382     2  0 Jan03 ?        00:00:00 [hwrng]
    root       457     1  0 Jan03 ?        00:00:06 /sbin/auditd
    polkitd    498     1  0 Jan03 ?        00:00:01 /usr/lib/polkit-1/polkitd --no-debug
    dbus       501     1  0 Jan03 ?        00:00:05 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
    chrony     507     1  0 Jan03 ?        00:00:00 /usr/sbin/chronyd
    root       508     1  0 Jan03 ?        00:00:00 /usr/sbin/acpid
    root       525     1  0 Jan03 ?        00:00:01 /usr/bin/python2 -Es /usr/sbin/firewalld --nofork --nopid
    root       526     1  0 Jan03 tty1     00:00:00 /sbin/agetty --noclear tty1 linux
    root       527     1  0 Jan03 ttyS0    00:00:00 /sbin/agetty --keep-baud 115200,38400,9600 ttyS0 vt220
    root       547     1  0 Jan03 ?        00:00:13 /usr/sbin/NetworkManager --no-daemon
    root       670   547  0 Jan03 ?        00:00:00 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-eth0.pid -lf /var/lib/NetworkManager/dhclient-2f272bb6-80c6-470e-9878-f080f7860f33-eth0.lease -cf /var/lib/NetworkManager/dhclient-eth0.conf eth0
    root       934     1  0 Jan03 ?        00:01:20 /usr/bin/python2 -Es /usr/sbin/tuned -l -P
    root       936     1  0 Jan03 ?        00:07:58 /usr/bin/google_osconfig_agent
    root       937     1  0 Jan03 ?        00:00:50 /usr/sbin/rsyslogd -n
    root       938     1  0 Jan03 ?        00:01:55 /usr/bin/google_guest_agent
    tomokio+  1182  4789  0 18:22 ?        00:01:50 /usr/lib/rstudio-server/bin/rsession -u tomokiokuno0528 --session-use-secure-cookies 0 --session-root-path / --session-same-site 0 --launcher-token 0FE0B264 --r-restore-workspace 2 --r-run-rprofile 2
    root      1239     1  0 Jan03 ?        00:00:04 /usr/lib/systemd/systemd-logind
    root      1295     1  0 Jan03 ?        00:00:02 /usr/sbin/crond -n
    root      1322     1  0 Jan03 ?        00:00:04 /usr/libexec/postfix/master -w
    postfix   1326  1322  0 Jan03 ?        00:00:00 qmgr -l -t unix -u
    root      1654     2  0 Jan03 ?        00:00:09 [jbd2/sdb-8]
    root      1655     2  0 Jan03 ?        00:00:00 [ext4-rsv-conver]
    root      1944     2  0 08:26 ?        00:00:00 [kworker/u8:2]
    root      3749     2  0 Jan10 ?        00:00:09 [kworker/0:2]
    root      3789     2  0 19:01 ?        00:00:00 [kworker/2:1]
    tomokio+  3963     1  0 Jan10 ?        00:00:00 ssh tomokiokuno0528@server.ucla-biostat-203b.com
    root      3964 32148  0 Jan10 ?        00:00:00 sshd: tomokiokuno0528 [priv]
    tomokio+  3999  3964  0 Jan10 ?        00:00:00 sshd: tomokiokuno0528@pts/3
    tomokio+  4000  3999  0 Jan10 pts/3    00:00:00 -bash
    rstudio+  4789     1  0 Jan03 ?        00:39:54 /usr/lib/rstudio-server/bin/rserver
    root      7255     1  0 Jan03 ?        00:00:00 /opt/shiny-server/ext/node/bin/shiny-server /opt/shiny-server/lib/main.js
    maschep+  7276     1  0 Jan08 ?        00:00:00 ssh-agent -s
    root      8371 32148  0 Jan08 ?        00:00:00 sshd: maschepps [priv]
    root      8384     2  0 20:12 ?        00:00:00 [kworker/0:1]
    maschep+  8385  8371  0 Jan08 ?        00:00:00 sshd: maschepps@pts/0
    maschep+  8386  8385  0 Jan08 pts/0    00:00:00 -bash
    maschep+  8727     1  0 Jan08 ?        00:00:00 ssh-agent -s
    tomokio+  8960  4000  0 01:13 pts/3    00:00:00 ssh -i /home/tomokiokuno0528/.ssh/id_rsa tomokiokuno0528@server.ucla-biostat-203b.com
    root      8961 32148  0 01:13 ?        00:00:00 sshd: tomokiokuno0528 [priv]
    tomokio+  8966  8961  0 01:13 ?        00:00:00 sshd: tomokiokuno0528@pts/1
    tomokio+  8967  8966  0 01:13 pts/1    00:00:00 -bash
    may.lyn+ 11314  4789  0 Jan06 ?        00:47:44 /usr/lib/rstudio-server/bin/rsession -u may.lyn.cheah.phd --session-use-secure-cookies 0 --session-root-path / --session-same-site 0 --launcher-token 0FE0B264 --r-restore-workspace 2 --r-run-rprofile 2
    rfisher+ 13556  4789  0 21:48 ?        00:00:17 /usr/lib/rstudio-server/bin/rsession -u rfisher2022 --session-use-secure-cookies 0 --session-root-path / --session-same-site 0 --launcher-token 0FE0B264 --r-restore-workspace 2 --r-run-rprofile 2
    rfisher+ 13646 13556  0 21:48 pts/5    00:00:00 bash -l
    postfix  13888  1322  0 21:52 ?        00:00:00 pickup -l -t unix -u
    tomokio+ 13901  1182  0 21:52 ?        00:00:00 bash /tmp/RtmpWFSsSs/chunk-code-49e78c9829d.txt
    tomokio+ 13903 13901  0 21:52 ?        00:00:00 vi pg42671.txt M
    inclass+ 14172  4789  0 21:57 ?        00:00:05 /usr/lib/rstudio-server/bin/rsession -u inclassdemo --session-use-secure-cookies 0 --session-root-path / --session-same-site 0 --launcher-token 0FE0B264 --r-restore-workspace 2 --r-run-rprofile 2
    root     14222     2  0 21:58 ?        00:00:00 [kworker/3:0]
    may.lyn+ 14763 11314  0 Jan06 pts/27   00:00:00 bash -l
    root     15053     2  0 22:13 ?        00:00:00 [kworker/1:0]
    huazhou  15080  4789  0 22:13 ?        00:00:06 /usr/lib/rstudio-server/bin/rsession -u huazhou --session-use-secure-cookies 0 --session-root-path / --session-same-site 0 --launcher-token 0FE0B264 --r-restore-workspace 2 --r-run-rprofile 2
    root     15220     2  0 22:14 ?        00:00:00 [kworker/3:2]
    may.lyn+ 15227 14763  0 Jan06 pts/27   00:00:00 /usr/lib64/R/bin/exec/R
    may.lyn+ 15282 15227  0 Jan06 pts/27   00:00:00 sh -c '/usr/lib64/R/bin/pager' < '/tmp/RtmpkXUttn/3b7b1e86c95'
    may.lyn+ 15283 15282  0 Jan06 pts/27   00:00:00 /usr/bin/less
    tomokio+ 15720  1182  0 22:16 pts/6    00:00:00 bash -l
    huazhou  15792 15080  0 22:17 pts/7    00:00:00 bash -l
    root     15852     2  0 Jan04 ?        00:00:02 [jbd2/sdc-8]
    root     15853     2  0 Jan04 ?        00:00:00 [ext4-rsv-conver]
    root     15964     2  0 22:20 ?        00:00:00 [kworker/2:0]
    root     16503     2  0 22:24 ?        00:00:00 [kworker/1:1]
    root     16538     2  0 22:24 ?        00:00:00 [kworker/3:1]
    huazhou  16542 15080 50 22:24 ?        00:00:01 /usr/lib64/R/bin/exec/R --no-save --no-restore -s -e rmarkdown::render('/home/huazhou/203b-2022winter/slides/02-linux/linux.Rmd',~+~~+~encoding~+~=~+~'UTF-8');
    huazhou  16663 16542  0 22:24 ?        00:00:00 sh -c 'bash'  -c 'ps -eaf' 2>&1
    huazhou  16664 16663  0 22:24 ?        00:00:00 ps -eaf
    root     17349     2  0 Jan08 ?        00:00:06 [kworker/2:2]
    maschep+ 17544     1  0 Jan04 ?        00:00:00 ssh-agent
    maschep+ 17970     1  0 Jan04 ?        00:00:00 ssh-agent
    maschep+ 18118     1  0 Jan04 ?        00:00:00 ssh-agent -s
    ehodzic  25860     1  0 Jan10 ?        00:00:00 ssh-agent -s
    root     29948 32148  0 Jan08 ?        00:00:00 sshd: maschepps [priv]
    maschep+ 29955 29948  0 Jan08 ?        00:00:00 sshd: maschepps@pts/2
    maschep+ 29956 29955  0 Jan08 pts/2    00:00:00 -bash
    root     30621     2  0 Jan07 ?        00:00:00 [kworker/u8:1]
    root     31188     2  0 Jan08 ?        00:00:05 [kworker/1:2]
    root     32148     1  0 Jan03 ?        00:00:05 /usr/sbin/sshd -D
  • All Python processes:

    ps -eaf | grep python
    root       525     1  0 Jan03 ?        00:00:01 /usr/bin/python2 -Es /usr/sbin/firewalld --nofork --nopid
    root       934     1  0 Jan03 ?        00:01:20 /usr/bin/python2 -Es /usr/sbin/tuned -l -P
    huazhou  16665 16542  0 22:24 ?        00:00:00 sh -c 'bash'  -c 'ps -eaf | grep python' 2>&1
    huazhou  16666 16665  0 22:24 ?        00:00:00 bash -c ps -eaf | grep python
    huazhou  16668 16666  0 22:24 ?        00:00:00 grep python
  • Process with PID=1:

    ps -fp 1
    UID        PID  PPID  C STIME TTY          TIME CMD
    root         1     0  0 Jan03 ?        00:01:34 /usr/lib/systemd/systemd --switched-root --system --deserialize 22
  • All processes owned by a user:

    ps -fu huazhou
    UID        PID  PPID  C STIME TTY          TIME CMD
    huazhou  15080  4789  0 22:13 ?        00:00:06 /usr/lib/rstudio-server/bin/rsession -u huazhou --session-use-secure-cookies 0 --session-root-path / --session-same-site 0 --launcher-token 0FE0B264 --r-restore-workspace 2 --r-run-rprofile 2
    huazhou  15792 15080  0 22:17 pts/7    00:00:00 bash -l
    huazhou  16542 15080 51 22:24 ?        00:00:01 /usr/lib64/R/bin/exec/R --no-save --no-restore -s -e rmarkdown::render('/home/huazhou/203b-2022winter/slides/02-linux/linux.Rmd',~+~~+~encoding~+~=~+~'UTF-8');
    huazhou  16671 16542  0 22:24 ?        00:00:00 sh -c 'bash'  -c 'ps -fu huazhou' 2>&1
    huazhou  16672 16671  0 22:24 ?        00:00:00 ps -fu huazhou

Kill processes

  • Kill process with PID=1001:

    kill 1001
  • Kill all R processes.

    killall -r R
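
To see what kill actually does to a process, here is a minimal sketch. It assumes a throwaway sleep process as a stand-in for a real job; the variable names are illustrative and not part of the course material.

```shell
# Start a disposable background job; sleep stands in for a real long-running program.
sleep 300 &
pid=$!                      # $! holds the PID of the most recent background job
echo "started sleep with PID $pid"

# kill sends SIGTERM (signal 15) by default, asking the process to terminate.
kill "$pid"

# wait collects the job's exit status; 128 + 15 = 143 indicates death by SIGTERM.
status=0
wait "$pid" 2>/dev/null || status=$?
echo "exit status: $status"

# kill -0 sends no signal; it only tests whether the PID still exists.
if kill -0 "$pid" 2>/dev/null; then
  alive=yes
else
  alive=no
fi
echo "still running: $alive"
```

A process that ignores SIGTERM can be forced to stop with kill -9 [PID] (SIGKILL), at the cost of giving it no chance to clean up.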

top

  • top prints realtime process information (very useful).

    top

  • Exit the top program by pressing the q key.

Secure shell (SSH)

SSH

SSH (secure shell) is the dominant cryptographic network protocol for establishing a secure connection over an insecure network.

  • In a Linux or Mac terminal, access the teaching server by

    ssh [USERNAME]@server.ucla-biostat-203b.com

    Replace [USERNAME] above with your account user name on the teaching server.

  • For Windows users, there are at least three ways: (1) (highly recommended) Git Bash, which is included in Git for Windows, (2) (not recommended) the PuTTY program (free), or (3) (may be overkill for this class) use WSL for Windows to install a full-fledged Linux system within Windows.

Advantages of keys over password

  • Key authentication is more secure than password authentication. Most passwords are weak.

  • A script or program may need to systematically SSH into other machines.

  • Log into multiple machines using the same key.

  • Seamless use of many services: Git/GitHub, AWS or Google cloud service, parallel computing on multiple hosts, Travis CI (continuous integration) etc.

  • Many servers only allow key authentication and do not accept password authentication.

Key authentication


  • Public key. Put it on the machine(s) you want to log in to.

  • Private key. Put it on your own computer. Consider this as the actual key in your pocket; never give your private key to others. For fun: https://www.youtube.com/watch?v=S8K464ImU0c

  • Messages from the server to your computer are encrypted with your public key. They can only be decrypted using your private key.

  • Messages from your computer to the server are signed with your private key (digital signatures) and can be verified by anyone who has your public key (authentication).
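
The sign-then-verify idea can be tried out by hand. The sketch below is an illustration only: it assumes the openssl command-line tool is installed and uses it as a stand-in (this is not what ssh runs internally), and all file names are hypothetical. If openssl is missing, the demo is skipped.

```shell
# Illustration only: openssl stands in for the signing step; file names are hypothetical.
genuine=skipped
tampered=skipped

if command -v openssl >/dev/null 2>&1; then
  workdir=$(mktemp -d)

  # Generate a private key and derive the matching public key from it.
  openssl genrsa -out "$workdir/demo_private.pem" 2048 2>/dev/null
  openssl rsa -in "$workdir/demo_private.pem" -pubout \
    -out "$workdir/demo_public.pem" 2>/dev/null

  # Sign a message with the private key.
  printf 'hello from my laptop\n' > "$workdir/message.txt"
  openssl dgst -sha256 -sign "$workdir/demo_private.pem" \
    -out "$workdir/message.sig" "$workdir/message.txt"

  # Anyone holding the public key can check the signature.
  if openssl dgst -sha256 -verify "$workdir/demo_public.pem" \
       -signature "$workdir/message.sig" "$workdir/message.txt" >/dev/null 2>&1; then
    genuine=accepted
  else
    genuine=rejected
  fi

  # A tampered message no longer matches the signature.
  printf 'hello from an impostor\n' > "$workdir/tampered.txt"
  if openssl dgst -sha256 -verify "$workdir/demo_public.pem" \
       -signature "$workdir/message.sig" "$workdir/tampered.txt" >/dev/null 2>&1; then
    tampered=accepted
  else
    tampered=rejected
  fi

  rm -rf "$workdir"
fi

echo "original message: $genuine"
echo "tampered message: $tampered"
```

Only the holder of the private key can produce a signature that verifies, which is why the private key must never leave your computer.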

Steps to generate keys

  • On Linux, Mac, or Windows Git Bash, to generate a key pair:

    ssh-keygen -t rsa -f ~/.ssh/[KEY_FILENAME] -C [USERNAME]
    • [KEY_FILENAME] is the name that you want to use for your SSH key files. For example, a filename of id_rsa generates a private key file named id_rsa and a public key file named id_rsa.pub.

    • [USERNAME] is the user for whom you will apply this SSH key.

    • Optionally set a passphrase, different from your login password, to protect the private key.

  • Set correct permissions on the .ssh folder and key files.

    • The permission for the ~/.ssh folder should be 700 (drwx------).
    • The permission of the private key ~/.ssh/id_rsa should be 600 (-rw-------).
    • The permission of the public key ~/.ssh/id_rsa.pub should be 644 (-rw-r--r--).
    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/[KEY_FILENAME]
    chmod 644 ~/.ssh/[KEY_FILENAME].pub

    Note that Windows is different; it doesn’t allow changing these permissions.


  • Append the public key to the ~/.ssh/authorized_keys file of any Linux machine we want to SSH to, e.g.,

    ssh-copy-id -i ~/.ssh/[KEY_FILENAME] [USERNAME]@server.ucla-biostat-203b.com

    Make sure the permission of the authorized_keys file is 600 (-rw-------).

  • Test your new key.

    ssh -i ~/.ssh/[KEY_FILENAME] [USERNAME]@server.ucla-biostat-203b.com
  • From now on, you don’t need a password each time you connect from your machine to the teaching server.


  • If you set a passphrase when generating keys, you’ll be prompted for the passphrase each time the private key is used. Avoid repeatedly entering the passphrase by using ssh-agent on Linux/Mac or Pageant on Windows.

  • The same key pair can be used between any two machines. We don’t need to regenerate keys for each new connection.
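
The permission settings from the steps above can be rehearsed safely before touching a real ~/.ssh folder. This is a minimal sketch under stated assumptions: it works in a temporary sandbox directory, and the empty files are hypothetical stand-ins for real keys.

```shell
# Hypothetical sandbox so the real ~/.ssh folder is never touched.
sandbox=$(mktemp -d)
mkdir "$sandbox/.ssh"
touch "$sandbox/.ssh/id_rsa" "$sandbox/.ssh/id_rsa.pub"   # empty stand-ins for real keys

# Apply the permissions recommended above.
chmod 700 "$sandbox/.ssh"              # owner only: read, write, enter
chmod 600 "$sandbox/.ssh/id_rsa"       # owner only: read, write
chmod 644 "$sandbox/.ssh/id_rsa.pub"   # owner read/write, everyone else read

# The first 10 characters of a long listing are the file type and permission bits.
dir_mode=$(ls -ld "$sandbox/.ssh" | cut -c1-10)
private_mode=$(ls -l "$sandbox/.ssh/id_rsa" | cut -c1-10)
public_mode=$(ls -l "$sandbox/.ssh/id_rsa.pub" | cut -c1-10)

echo ".ssh folder: $dir_mode"
echo "private key: $private_mode"
echo "public key:  $public_mode"

rm -rf "$sandbox"
```

Each octal digit is the sum of read (4), write (2), and execute (1) for owner, group, and others, so 600 reads as rw- for the owner and nothing for anyone else.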

Transfer files between machines

  • scp securely transfers files between machines using SSH.

    ## copy file from local to remote
    scp [LOCALFILE] [USERNAME]@server.ucla-biostat-203b.com:/[PATH_TO_FOLDER]
    ## copy file from remote to local
    scp [USERNAME]@server.ucla-biostat-203b.com:/[PATH_TO_FILE] [PATH_TO_LOCAL_FOLDER]
  • sftp is FTP via SSH.

  • Globus is a GUI program for securely transferring files between machines. To use Globus you will have to go to https://www.globus.org/ and log in through UCLA by selecting your existing organizational login as UCLA. Then you will need to download their Globus Connect Personal software and set your laptop as an endpoint. Very detailed instructions can be found at https://www.hoffman2.idre.ucla.edu/file-transfer/globus/.

  • GUIs for Windows (WinSCP) or Mac (Cyberduck).

  • You can even use RStudio to upload files to a remote machine with RStudio Server installed.

  • (Preferred way) Use a version control system (git, svn, cvs, …) to sync project files between different machines and systems.

Line breaks in text files

  • Windows uses a pair of CR and LF for line breaks.

  • Linux/Unix uses an LF character only.

  • MacOS X also uses a single LF character. But old Mac OS used a single CR character for line breaks.

  • If transferred in binary mode (bit by bit) between OSs, a text file could look like a mess because the line breaks are not converted.

  • Most transfer programs automatically switch to text mode when transferring text files and perform conversion of line breaks between different OSs; but I used to run into problems using WinSCP. Sometimes you have to tell WinSCP explicitly a text file is being transferred.
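
When a transferred text file looks wrong, the line endings can be inspected and repaired by hand. The sketch below is one possible approach using only standard tools (printf, od, tr, wc); the file names are hypothetical and the file is created in a temporary directory.

```shell
workdir=$(mktemp -d)

# Write a two-line file with Windows-style CR LF line endings.
printf 'line one\r\nline two\r\n' > "$workdir/windows.txt"

# od -c prints each byte; CR shows up as \r and LF as \n.
od -c "$workdir/windows.txt"

# Count the CR characters before conversion.
cr_before=$(tr -cd '\r' < "$workdir/windows.txt" | wc -c | tr -d ' ')

# Strip every CR to obtain Unix-style LF-only line endings.
tr -d '\r' < "$workdir/windows.txt" > "$workdir/unix.txt"

# Count again and compare file sizes in bytes.
cr_after=$(tr -cd '\r' < "$workdir/unix.txt" | wc -c | tr -d ' ')
size_before=$(wc -c < "$workdir/windows.txt" | tr -d ' ')
size_after=$(wc -c < "$workdir/unix.txt" | tr -d ' ')

echo "CR characters: $cr_before before, $cr_after after"
echo "file size:     $size_before bytes before, $size_after bytes after"

rm -rf "$workdir"
```

Dedicated converters such as dos2unix do the same job where they are installed.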

Run R in Linux

Interactive mode

  • Start R in the interactive mode by typing R in shell.

  • Then run R script by

    source("script.R")

Batch mode

  • Demo script meanEst.R implements a (terrible) estimator of the mean \[ {\widehat \mu}_n = \frac{\sum_{i=1}^n x_i 1_{i \text{ is prime}}}{\sum_{i=1}^n 1_{i \text{ is prime}}}. \]

    ## check if a given integer is prime
    isPrime = function(n) {
      if (n < 2) {
        return (FALSE)  # 0 and 1 are not prime
      }
      if (n <= 3) {
        return (TRUE)
      }
      if (any((n %% 2:floor(sqrt(n))) == 0)) {
        return (FALSE)
      }
      return (TRUE)
    }
    
    ## estimate mean only using observation with prime indices
    estMeanPrimes = function (x) {
      n = length(x)
      ind = sapply(1:n, isPrime)
      return (mean(x[ind]))
    }
    
    print(estMeanPrimes(rnorm(100000)))

  • To run your R code non-interactively aka in batch mode, we have at least two options:

    # default output to meanEst.Rout
    R CMD BATCH meanEst.R

    or

    # output to stdout
    Rscript meanEst.R
  • We typically automate batch calls using a scripting language, e.g., Python, Perl, or a shell script.

Pass arguments to R scripts

  • Specify arguments in R CMD BATCH:

    R CMD BATCH '--args mu=1 sig=2 kap=3' script.R
  • Specify arguments in Rscript:

    Rscript script.R mu=1 sig=2 kap=3
  • Parse command line arguments using magic formula

    for (arg in commandArgs(TRUE)) {
      eval(parse(text=arg))
    }

    in the R script. After running the above code, each command line argument of the form name=value is available as a variable in the global environment.


  • To understand the magic formula commandArgs, run R by:

    R '--args mu=1 sig=2 kap=3'

    and then issue commands in R

    commandArgs()
    commandArgs(TRUE)

  • Understand the magic formula parse and eval:

    rm(list = ls())
    print(x)
    Error in print(x): object 'x' not found
    parse(text = "x=3")
    expression(x = 3)
    eval(parse(text = "x=3"))
    print(x)
    [1] 3

  • runSim.R has components: (1) command argument parser, (2) method implementation, (3) data generator with unspecified parameter n, and (4) estimation based on generated data.

    ## parsing command arguments
    for (arg in commandArgs(TRUE)) {
      eval(parse(text=arg))
    }
    
    ## check if a given integer is prime
    isPrime = function(n) {
      if (n < 2) {
        return (FALSE)  # 0 and 1 are not prime
      }
      if (n <= 3) {
        return (TRUE)
      }
      if (any((n %% 2:floor(sqrt(n))) == 0)) {
        return (FALSE)
      }
      return (TRUE)
    }
    
    ## estimate mean only using observation with prime indices
    estMeanPrimes = function (x) {
      n = length(x)
      ind = sapply(1:n, isPrime)
      return (mean(x[ind]))
    }
    
    # simulate data
    x = rnorm(n)
    
    # estimate mean
    estMeanPrimes(x)

  • Call runSim.R with sample size n=100:

    R CMD BATCH '--args n=100' runSim.R

    or

    Rscript runSim.R n=100
    [1] -0.01570209

Run long jobs

  • Many statistical computing tasks take a long time: simulation, MCMC, etc. If we exit Linux when the job is unfinished, the job is killed.

  • nohup command in Linux runs program(s) immune to hangups and writes output to nohup.out by default. Logging out will not kill the process; we can log in later to check status and results.

  • nohup is part of the POSIX standard and thus available on Linux and MacOS.

  • Run runSim.R in the background and write output to nohup.out:

    nohup Rscript runSim.R n=100 &
    [1] -0.07564119

    The & at the end of the command instructs Linux to run this command in background, so we gain control of the terminal immediately.
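
In practice it is often convenient to send the output to a named log file and keep the job's PID for later checks. The following is a minimal sketch of that pattern; it assumes a short sh command as a stand-in for Rscript runSim.R, and the log file name is hypothetical.

```shell
workdir=$(mktemp -d)

# Launch a stand-in job immune to hangups; stdout and stderr both go to job.log,
# so nohup does not create nohup.out in this case.
nohup sh -c 'sleep 1; echo "simulation finished"' > "$workdir/job.log" 2>&1 &
job_pid=$!                  # remember the PID to check on or kill the job later
echo "started background job with PID $job_pid"

# The terminal is free immediately. Here we simply block until the job ends;
# after logging out and back in, you would inspect the log file instead.
wait "$job_pid"

result=$(cat "$workdir/job.log")
echo "$result"

rm -rf "$workdir"
```

For a real run, replace the sh -c '...' stand-in with Rscript runSim.R n=100 and check on the job later with ps -fp [PID].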

screen

  • screen is another popular utility, but not installed by default.

  • Typical workflow using screen.

    1. Access remote server using ssh.

    2. Start jobs in batch mode.

    3. Detach jobs.

    4. Exit from server, wait for jobs to finish.

    5. Access remote server using ssh.

    6. Re-attach jobs, check on progress, get results, etc.
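
The workflow above maps onto a handful of screen commands. The sketch below is a hedged illustration: the session name sim_demo is hypothetical, sleep stands in for a batch job, and the demo is skipped when screen is not installed.

```shell
# Hypothetical session name; sleep stands in for a long-running batch job.
session="sim_demo"

if ! command -v screen >/dev/null 2>&1; then
  screen_demo="skipped"
  echo "screen is not installed on this machine"
elif screen -dmS "$session" sh -c 'sleep 30'; then
  # Steps 2-3: -d -m starts the job in a new session that is already detached.

  # Step 6: list sessions; the exit status of -ls varies across versions, hence || true.
  screen -ls || true

  # Interactively you would re-attach with:  screen -r sim_demo
  # and detach again by pressing Ctrl-a followed by d.

  # Clean up the demo session.
  screen -S "$session" -X quit || true
  screen_demo="ran"
else
  screen_demo="failed"
fi

echo "screen demo: $screen_demo"
```

Unlike nohup, a screen session keeps the job's terminal alive, so after re-attaching you can still see its output and interact with it.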

Use R to call R

R in conjunction with nohup (or screen) can be used to orchestrate a large simulation study.

  • It can be more elegant, transparent, and robust to parallelize jobs corresponding to different scenarios (e.g., different generative models) outside of the code used to do statistical computation.

  • We consider a simulation study in R but the same approach could be used with code written in Julia, Matlab, Python, etc.

  • Python in many ways makes a better glue.

  • Suppose we have

    • runSim.R which runs a simulation based on command line argument n.
    • A large collection of n values that we want to use in our simulation study.
    • Access to a server with 128 cores.

    How do we parallelize the job?
  • Option 1: manually call runSim.R for each setting.

  • Option 2 (smarter): automate calls using R and nohup.

  • Let’s demonstrate using the script autoSim.R

    cat autoSim.R
    # autoSim.R
    
    nVals <- seq(100, 1000, by=100)
    for (n in nVals) {
      oFile <- paste("n", n, ".txt", sep="")
      sysCall <- paste("nohup Rscript runSim.R n=", n, " > ", oFile, sep="")
      system(sysCall, wait = FALSE)
      print(paste("sysCall=", sysCall, sep=""))
    }

    Note that when we call a bash command using the system function in R, we set the optional argument wait=FALSE so that the jobs run in parallel.

  • Run autoSim.R:

    Rscript autoSim.R
    [1] "sysCall=nohup Rscript runSim.R n=100 > n100.txt"
    [1] "sysCall=nohup Rscript runSim.R n=200 > n200.txt"
    [1] "sysCall=nohup Rscript runSim.R n=300 > n300.txt"
    [1] "sysCall=nohup Rscript runSim.R n=400 > n400.txt"
    [1] "sysCall=nohup Rscript runSim.R n=500 > n500.txt"
    [1] "sysCall=nohup Rscript runSim.R n=600 > n600.txt"
    [1] "sysCall=nohup Rscript runSim.R n=700 > n700.txt"
    [1] "sysCall=nohup Rscript runSim.R n=800 > n800.txt"
    [1] "sysCall=nohup Rscript runSim.R n=900 > n900.txt"
    [1] "sysCall=nohup Rscript runSim.R n=1000 > n1000.txt"
  • Now we just need to write a script to collect results from the output files.

  • Later we will learn how to coordinate large scale computation on UCLA Hoffman2 cluster, using Linux and R scripting.
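
The same fan-out can also be written directly as a shell script instead of R. This is a sketch, not the course's method: the DRY_RUN switch and the variable names are assumptions added for illustration, and by default the script only prints the commands so it can be tried without R installed.

```shell
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to launch them.
DRY_RUN=${DRY_RUN:-1}

launched=0
n=100
while [ "$n" -le 1000 ]; do
  out_file="n${n}.txt"      # one output file per sample size, as in autoSim.R
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "nohup Rscript runSim.R n=$n > $out_file 2>&1 &"
  else
    # The trailing & puts each job in the background so the jobs run in parallel.
    nohup Rscript runSim.R n="$n" > "$out_file" 2>&1 &
  fi
  launched=$((launched + 1))
  n=$((n + 100))
done

echo "prepared $launched jobs"
```

Launching one job per setting works well while the number of settings stays below the number of cores; beyond that, the jobs compete for CPU time.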

Some other Linux commands

  • Log out of Linux: exit or logout or ctrl+d.

  • Clear screen: clear.