This HTML is knitted from R Markdown on the teaching server.
To practice the Linux commands, we can log on to the teaching server, git clone the course material, and try the Linux commands in the Terminal.
Log in to RStudio on the teaching server: http://server.ucla-biostat-203b.com:8787
In RStudio, File -> New Project... -> Version Control -> Git
-> put https://github.com/ucla-biostat-203b/2022winter.git in Repository URL:
-> put 203b-2022winter in Project directory name:
-> choose your home directory for Create project as subdirectory of:
-> click the Create Project button.
In the Terminal tab, navigate to the folder of the current slides:
cd ~/203b-2022winter/slides/02-linux
Linux is the most common platform for scientific computing and deployment of data science tools.
Open source and community support.
Things break; when they break on Linux, they are relatively easy to fix.
Scalability: portable devices (Android), laptops, servers, clusters, and supercomputers.
Cost: it’s free!
Debian/Ubuntu is a popular choice for personal computers.
RHEL/CentOS is popular on servers. (In December 2020, Red Hat announced the end of the CentOS Linux distribution, shifting its focus to CentOS Stream.)
The teaching server for this class runs CentOS 7. The UCLA Hoffman2 cluster runs CentOS 7.9.2009 (as of 2022-01-01).
macOS is derived from Unix rather than Linux (its Darwin core draws on BSD). It is POSIX compliant, so most shell commands we review here also work in the macOS Terminal. Windows/DOS, unfortunately, is a totally different breed.
Show distribution/version on Linux:
cat /etc/*-release
CentOS Linux release 7.9.2009 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
CentOS Linux release 7.9.2009 (Core)
CentOS Linux release 7.9.2009 (Core)
Show distribution/version on MacOS:
# only on Mac terminal
sw_vers -productVersion
or
# only on Mac terminal
system_profiler SPSoftwareDataType
A shell translates commands to OS instructions.
Most commonly used shells include bash, csh, tcsh, zsh, etc.
The default shell on macOS changed from bash to zsh in macOS v10.15 (Catalina).
Sometimes a command or a script does not run simply because it was written for another shell.
We mostly use bash shell commands in this class.
Determine your login shell (stored in the SHELL environment variable):
echo $SHELL
/bin/bash
List available shells:
cat /etc/shells
/bin/sh
/bin/bash
/usr/bin/sh
/usr/bin/bash
Change to another shell:
exec bash -l
The -l option indicates it should be a login shell.
Change your login shell permanently:
chsh -s /bin/bash [USERNAME]
Then log out and log in.
We can navigate to previous/next commands with the up and down arrow keys. The pushd and popd commands maintain a stack of directories, so we can jump to another directory and later return to where we were.
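A minimal sketch of the bash directory stack, assuming a scratch directory created with mktemp; the data and code folder names are hypothetical. pushd changes directory and remembers where we were, dirs -v lists the stack, and popd returns to the remembered directory.

```shell
# Build a throwaway directory tree so nothing real is touched
tmp=$(mktemp -d)
mkdir -p "$tmp/data" "$tmp/code"

cd "$tmp/data" || exit 1
pushd "$tmp/code" > /dev/null   # go to code/ and push data/ onto the stack
pwd                             # prints .../code
dirs -v                         # show the directory stack, top entry first
popd > /dev/null                # pop the stack: back in .../data
pwd                             # prints .../data
```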
By default, Bash provides the following standard completions (press TAB to trigger them). Fewer typing errors and less time!
Pathname completion.
Filename completion.
Variable name completion: echo $[TAB][TAB].
Username completion: cd ~[TAB][TAB].
Hostname completion: ssh huazhou@[TAB][TAB].
Completion can also be customized to auto-complete other things, such as a command's options and arguments. Google bash completion for more information.
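As a small illustration of customized completion, assuming bash and a hypothetical command named runsim: complete -W attaches a fixed word list to the command, and compgen shows which of those words would be offered for a given prefix. Real completion scripts are usually shell functions registered with complete -F.

```shell
# 'runsim' is a hypothetical command used only for illustration
runsim() { echo "runsim called with: $*"; }

# Register a word list: TAB after 'runsim' will now offer these options
complete -W "--seed --reps --nsim" runsim

# Show the completion rule bash stored for runsim
complete -p runsim

# compgen runs the same matching non-interactively:
# which registered words start with "--r"?
compgen -W "--seed --reps --nsim" -- --r
```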
man is man's best friend. Online help for shell commands: man [COMMANDNAME].
# display documentation for the ls command
man ls
LS(1) User Commands LS(1)
NAME
ls - list directory contents
SYNOPSIS
ls [OPTION]... [FILE]...
DESCRIPTION
List information about the FILEs (the current directory by default).
Sort entries alphabetically if none of -cftuvSUX nor --sort is speci‐
fied.
Mandatory arguments to long options are mandatory for short options
too.
-a, --all
do not ignore entries starting with .
-A, --almost-all
do not list implied . and ..
--author
with -l, print the author of each file
-b, --escape
print C-style escapes for nongraphic characters
--block-size=SIZE
scale sizes by SIZE before printing them; e.g., '--block-size=M'
prints sizes in units of 1,048,576 bytes; see SIZE format below
-B, --ignore-backups
do not list implied entries ending with ~
-c with -lt: sort by, and show, ctime (time of last modification of
file status information); with -l: show ctime and sort by name;
otherwise: sort by ctime, newest first
-C list entries by columns
--color[=WHEN]
colorize the output; WHEN can be 'never', 'auto', or 'always'
(the default); more info below
-d, --directory
list directories themselves, not their contents
-D, --dired
generate output designed for Emacs' dired mode
-f do not sort, enable -aU, disable -ls --color
-F, --classify
append indicator (one of */=>@|) to entries
--file-type
likewise, except do not append '*'
--format=WORD
across -x, commas -m, horizontal -x, long -l, single-column -1,
verbose -l, vertical -C
--full-time
like -l --time-style=full-iso
-g like -l, but do not list owner
--group-directories-first
group directories before files;
can be augmented with a --sort option, but any use of
--sort=none (-U) disables grouping
-G, --no-group
in a long listing, don't print group names
-h, --human-readable
with -l, print sizes in human readable format (e.g., 1K 234M 2G)
--si likewise, but use powers of 1000 not 1024
-H, --dereference-command-line
follow symbolic links listed on the command line
--dereference-command-line-symlink-to-dir
follow each command line symbolic link
that points to a directory
--hide=PATTERN
do not list implied entries matching shell PATTERN (overridden
by -a or -A)
--indicator-style=WORD
append indicator with style WORD to entry names: none (default),
slash (-p), file-type (--file-type), classify (-F)
-i, --inode
print the index number of each file
-I, --ignore=PATTERN
do not list implied entries matching shell PATTERN
-k, --kibibytes
default to 1024-byte blocks for disk usage
-l use a long listing format
-L, --dereference
when showing file information for a symbolic link, show informa‐
tion for the file the link references rather than for the link
itself
-m fill width with a comma separated list of entries
-n, --numeric-uid-gid
like -l, but list numeric user and group IDs
-N, --literal
print raw entry names (don't treat e.g. control characters spe‐
cially)
-o like -l, but do not list group information
-p, --indicator-style=slash
append / indicator to directories
-q, --hide-control-chars
print ? instead of nongraphic characters
--show-control-chars
show nongraphic characters as-is (the default, unless program is
'ls' and output is a terminal)
-Q, --quote-name
enclose entry names in double quotes
--quoting-style=WORD
use quoting style WORD for entry names: literal, locale, shell,
shell-always, c, escape
-r, --reverse
reverse order while sorting
-R, --recursive
list subdirectories recursively
-s, --size
print the allocated size of each file, in blocks
-S sort by file size
--sort=WORD
sort by WORD instead of name: none (-U), size (-S), time (-t),
version (-v), extension (-X)
--time=WORD
with -l, show time as WORD instead of default modification time:
atime or access or use (-u) ctime or status (-c); also use spec‐
ified time as sort key if --sort=time
--time-style=STYLE
with -l, show times using style STYLE: full-iso, long-iso, iso,
locale, or +FORMAT; FORMAT is interpreted like in 'date'; if
FORMAT is FORMAT1<newline>FORMAT2, then FORMAT1 applies to
non-recent files and FORMAT2 to recent files; if STYLE is pre‐
fixed with 'posix-', STYLE takes effect only outside the POSIX
locale
-t sort by modification time, newest first
-T, --tabsize=COLS
assume tab stops at each COLS instead of 8
-u with -lt: sort by, and show, access time; with -l: show access
time and sort by name; otherwise: sort by access time
-U do not sort; list entries in directory order
-v natural sort of (version) numbers within text
-w, --width=COLS
assume screen width instead of current value
-x list entries by lines instead of by columns
-X sort alphabetically by entry extension
-1 list one file per line
SELinux options:
--lcontext
Display security context. Enable -l. Lines will probably be
too wide for most displays.
-Z, --context
Display security context so it fits on most displays. Displays
only mode, user, group, security context and file name.
--scontext
Display only security context and file name.
--help display this help and exit
--version
output version information and exit
SIZE is an integer and optional unit (example: 10M is 10*1024*1024).
Units are K, M, G, T, P, E, Z, Y (powers of 1024) or KB, MB, ... (pow‐
ers of 1000).
Using color to distinguish file types is disabled both by default and
with --color=never. With --color=auto, ls emits color codes only when
standard output is connected to a terminal. The LS_COLORS environment
variable can change the settings. Use the dircolors command to set it.
Exit status:
0 if OK,
1 if minor problems (e.g., cannot access subdirectory),
2 if serious trouble (e.g., cannot access command-line argument).
GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Report ls translation bugs to <http://translationproject.org/team/>
AUTHOR
Written by Richard M. Stallman and David MacKenzie.
COPYRIGHT
Copyright © 2013 Free Software Foundation, Inc. License GPLv3+: GNU
GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
SEE ALSO
The full documentation for ls is maintained as a Texinfo manual. If
the info and ls programs are properly installed at your site, the com‐
mand
info coreutils 'ls invocation'
should give you access to the complete manual.
GNU coreutils 8.22 November 2020 LS(1)
cat prints the contents of a file:
cat runSim.R
## parsing command arguments
for (arg in commandArgs(TRUE)) {
eval(parse(text=arg))
}
## check if a given integer is prime
isPrime = function(n) {
if (n <= 3) {
return (TRUE)
}
if (any((n %% 2:floor(sqrt(n))) == 0)) {
return (FALSE)
}
return (TRUE)
}
## estimate mean only using observation with prime indices
estMeanPrimes = function (x) {
n = length(x)
ind = sapply(1:n, isPrime)
return (mean(x[ind]))
}
# simulate data
x = rnorm(n)
# estimate mean
estMeanPrimes(x)
head prints the first 10 lines of a file:
head runSim.R
## parsing command arguments
for (arg in commandArgs(TRUE)) {
eval(parse(text=arg))
}
## check if a given integer is prime
isPrime = function(n) {
if (n <= 3) {
return (TRUE)
}
head -l prints the first \(l\) lines of a file (equivalently, head -n l):
head -15 runSim.R
## parsing command arguments
for (arg in commandArgs(TRUE)) {
eval(parse(text=arg))
}
## check if a given integer is prime
isPrime = function(n) {
if (n <= 3) {
return (TRUE)
}
if (any((n %% 2:floor(sqrt(n))) == 0)) {
return (FALSE)
}
return (TRUE)
}
tail prints the last 10 lines of a file:
tail runSim.R
n = length(x)
ind = sapply(1:n, isPrime)
return (mean(x[ind]))
}
# simulate data
x = rnorm(n)
# estimate mean
estMeanPrimes(x)
tail -l prints the last \(l\) lines of a file (equivalently, tail -n l):
tail -15 runSim.R
return (TRUE)
}
## estimate mean only using observation with prime indices
estMeanPrimes = function (x) {
n = length(x)
ind = sapply(1:n, isPrime)
return (mean(x[ind]))
}
# simulate data
x = rnorm(n)
# estimate mean
estMeanPrimes(x)
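head and tail can be chained with a pipe (|) to pull out a range of lines. A minimal sketch on a throwaway 20-line file generated with seq (the file name lines.txt is made up):

```shell
# Make a 20-line sample file (seq prints the numbers 1 to 20, one per line)
tmp=$(mktemp -d)
seq 1 20 > "$tmp/lines.txt"

# Lines 11-15: keep the first 15 lines, then the last 5 of those
head -n 15 "$tmp/lines.txt" | tail -n 5

# tail -n +K starts output at line K instead of counting from the end
tail -n +18 "$tmp/lines.txt"    # lines 18, 19, 20
```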
The pipe | sends the output of one command as input to another command.
ls -l | head -5
total 5904
-rw-r--r--. 1 huazhou huazhou 258 Jan 7 00:55 autoSim.R
-rw-r--r--. 1 huazhou huazhou 110345 Jan 7 00:55 Emacs_Reference_Card.pdf
-rw-r--r--. 1 huazhou huazhou 157353 Jan 7 00:55 IDRE_Winter_2019_Workshops.pdf
-rw-r--r--. 1 huazhou huazhou 321281 Jan 7 00:55 key_authentication_1.png
> directs the output of a command to a file (overwriting the file if it exists).
>> appends the output of a command to a file.
< reads input from a file.
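A minimal sketch of the three redirections on scratch files (log.txt and runs.txt are hypothetical names), ending with a pipe whose output is redirected to a file:

```shell
# Scratch directory so we don't overwrite real files
tmp=$(mktemp -d)

echo "first run"  >  "$tmp/log.txt"   # > creates log.txt (or truncates it)
echo "second run" >> "$tmp/log.txt"   # >> appends a line at the end
cat "$tmp/log.txt"                    # shows both lines

echo "fresh start" > "$tmp/log.txt"   # > again: the two earlier lines are gone
wc -l < "$tmp/log.txt"                # < feeds the file to wc as standard input

# Pipes and redirection combine: keep only lines containing "run"
printf 'run 1\nskip\nrun 2\n' | grep run > "$tmp/runs.txt"
cat "$tmp/runs.txt"
```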
Combinations of shell commands (grep, sed, awk, …), piping and redirection, and regular expressions allow us to pre-process and reformat huge text files efficiently.
See HW1.
less is more; more is less.
more browses a text file screen by screen (only downwards). Scroll down one page (paging) by pressing the spacebar; exit by pressing the q key.
less is also a pager, but has more functionality, e.g., scrolling both upwards and downwards through the input.
less doesn't need to read the whole file before displaying it, so it opens large files faster than more.
grep
grep prints lines that match an expression:
Show lines that contain the string CentOS:
# quotes are optional here, but needed if the pattern contains spaces or shell special characters
grep 'CentOS' linux.Rmd
- RHEL/CentOS is popular on servers. (In December 2020, Red Hat terminated the development of CentOS Linux distribution.)
- The teaching server for this class runs CentOS 7. UCLA Hoffman2 cluster runs CentOS 7.9.2009 (as of 2022-01-01).
- Show lines that contain string `CentOS`:
grep 'CentOS' linux.Rmd
grep 'CentOS' *.Rmd
grep -n 'CentOS' linux.Rmd
- Replace `CentOS` by `RHEL` in a text file:
sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
Search multiple text files:
grep 'CentOS' *.Rmd
- RHEL/CentOS is popular on servers. (In December 2020, Red Hat terminated the development of CentOS Linux distribution.)
- The teaching server for this class runs CentOS 7. UCLA Hoffman2 cluster runs CentOS 7.9.2009 (as of 2022-01-01).
- Show lines that contain string `CentOS`:
grep 'CentOS' linux.Rmd
grep 'CentOS' *.Rmd
grep -n 'CentOS' linux.Rmd
- Replace `CentOS` by `RHEL` in a text file:
sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
Show matching line numbers:
grep -n 'CentOS' linux.Rmd
47:- RHEL/CentOS is popular on servers. (In December 2020, Red Hat terminated the development of CentOS Linux distribution.)
53:- The teaching server for this class runs CentOS 7. UCLA Hoffman2 cluster runs CentOS 7.9.2009 (as of 2022-01-01).
345:- Show lines that contain string `CentOS`:
348: grep 'CentOS' linux.Rmd
353: grep 'CentOS' *.Rmd
358: grep -n 'CentOS' linux.Rmd
375:- Replace `CentOS` by `RHEL` in a text file:
377: sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
Find all files in the current directory with the .png extension (the backslash makes the dot literal; an unescaped . matches any character):
ls | grep '\.png$'
key_authentication_1.png
key_authentication_2.png
linux_directory_structure.png
linux_filepermission_oct.png
linux_filepermission.png
redhat_kills_centos.png
Richard_Stallman_2013.png
screenshot_top.png
Find all directories in the current directory:
ls -al | grep '^d'
drwxr-xr-x. 2 huazhou huazhou 4096 Jan 11 22:22 .
drwxr-xr-x. 6 huazhou huazhou 4096 Jan 7 00:55 ..
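In the .png example, a dot in a grep pattern is a regular-expression metacharacter that matches any single character, so a name such as notes_png would also match '.png$'. A short sketch with made-up file names shows the difference a backslash makes:

```shell
tmp=$(mktemp -d)
# Hypothetical file names, one per line
printf '%s\n' plot.png notes_png slides.pdf > "$tmp/files.txt"

grep '.png$'  "$tmp/files.txt"   # '.' is a wildcard: matches plot.png AND notes_png
grep '\.png$' "$tmp/files.txt"   # '\.' is a literal dot: matches plot.png only
```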
sed
sed is a stream editor.
Replace CentOS by RHEL in a text file:
sed 's/CentOS/RHEL/' linux.Rmd | grep RHEL
- RHEL/RHEL is popular on servers. (In December 2020, Red Hat terminated the development of CentOS Linux distribution.)
- The teaching server for this class runs RHEL 7. UCLA Hoffman2 cluster runs CentOS 7.9.2009 (as of 2022-01-01).
- Show lines that contain string `RHEL`:
grep 'RHEL' linux.Rmd
grep 'RHEL' *.Rmd
grep -n 'RHEL' linux.Rmd
- Replace `RHEL` by `RHEL` in a text file:
sed 's/RHEL/RHEL/' linux.Rmd | grep RHEL
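By default, s/old/new/ replaces only the first match on each line, which is why some lines in the output above still contain CentOS. Adding the g (global) flag replaces every match. A minimal sketch on a made-up line of text:

```shell
line="CentOS 7 on the server, CentOS 7.9 on the cluster"

echo "$line" | sed 's/CentOS/RHEL/'    # RHEL 7 on the server, CentOS 7.9 on the cluster
echo "$line" | sed 's/CentOS/RHEL/g'   # RHEL 7 on the server, RHEL 7.9 on the cluster
```

Either way, sed writes to standard output and leaves the input file untouched. GNU sed can edit a file in place with sed -i 's/CentOS/RHEL/g' file; the BSD sed on macOS needs an explicit (possibly empty) backup suffix, as in sed -i '' 's/CentOS/RHEL/g' file.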
awk
awk is a filter and report writer.
First let's display the content of the file /etc/passwd:
cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
chrony:x:998:996::/var/lib/chrony:/sbin/nologin
huazhou:x:1000:1001::/home/huazhou:/bin/bash
tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
rstudio-server:x:997:995::/home/rstudio-server:/bin/bash
shiny:x:996:1002::/home/shiny:/bin/sh
maschepps:x:1001:1003::/home/maschepps:/bin/bash
aarbolante:x:1002:1004::/home/aarbolante:/bin/bash
rozeta:x:1003:1005::/home/rozeta:/bin/bash
yanlongbai975:x:1004:1006::/home/yanlongbai975:/bin/bash
ritazxcai:x:1005:1007::/home/ritazxcai:/bin/bash
may.lyn.cheah.phd:x:1006:1008::/home/may.lyn.cheah.phd:/bin/bash
yifan00:x:1007:1009::/home/yifan00:/bin/bash
yuruidong99:x:1008:1010::/home/yuruidong99:/bin/bash
fangyi:x:1009:1011::/home/fangyi:/bin/bash
sharonfeng:x:1010:1012::/home/sharonfeng:/bin/bash
rfisher2022:x:1011:1013::/home/rfisher2022:/bin/bash
seamusgallivan:x:1012:1014::/home/seamusgallivan:/bin/bash
ehodzic:x:1013:1015::/home/ehodzic:/bin/bash
ionahu08:x:1014:1016::/home/ionahu08:/bin/bash
lillyhuang25:x:1015:1017::/home/lillyhuang25:/bin/bash
jamshidian:x:1016:1018::/home/jamshidian:/bin/bash
jonathanking192:x:1017:1019::/home/jonathanking192:/bin/bash
yllai:x:1018:1020::/home/yllai:/bin/bash
djlavine:x:1019:1021::/home/djlavine:/bin/bash
blei001:x:1020:1022::/home/blei001:/bin/bash
lifengxue2000:x:1021:1023::/home/lifengxue2000:/bin/bash
yuyuanlin:x:1022:1024::/home/yuyuanlin:/bin/bash
javim013:x:1023:1025::/home/javim013:/bin/bash
tokramm:x:1024:1026::/home/tokramm:/bin/bash
tomokiokuno0528:x:1025:1027::/home/tomokiokuno0528:/bin/bash
qny2021:x:1026:1028::/home/qny2021:/bin/bash
yuhang886688:x:1027:1029::/home/yuhang886688:/bin/bash
jqin10:x:1028:1030::/home/jqin10:/bin/bash
mimi327:x:1029:1031::/home/mimi327:/bin/bash
nataliesisto:x:1030:1032::/home/nataliesisto:/bin/bash
hangsun:x:1031:1033::/home/hangsun:/bin/bash
jiahaotian0702:x:1033:1035::/home/jiahaotian0702:/bin/bash
wanghw:x:1034:1036::/home/wanghw:/bin/bash
wongj1721:x:1035:1037::/home/wongj1721:/bin/bash
lsyang:x:1036:1038::/home/lsyang:/bin/bash
khyeh0816:x:1037:1039::/home/khyeh0816:/bin/bash
younghograd:x:1038:1040::/home/younghograd:/bin/bash
yueyu99:x:1039:1041::/home/yueyu99:/bin/bash
qzhang42:x:1040:1042::/home/qzhang42:/bin/bash
zixiz:x:1041:1043::/home/zixiz:/bin/bash
naying:x:1042:1044::/home/naying:/bin/bash
capj245:x:1043:1045::/home/capj245:/bin/bash
rsuseno:x:1044:1046::/home/rsuseno:/bin/bash
inclassdemo:x:1045:1047::/home/inclassdemo:/bin/bash
Each line contains seven fields separated by colons (:): (1) user name, (2) password, (3) user ID, (4) group ID, (5) user ID info, (6) home directory, and (7) command shell.
Print sorted list of login names:
awk -F: '{ print $1 }' /etc/passwd | sort | head -10
aarbolante
adm
bin
blei001
capj245
chrony
daemon
dbus
djlavine
ehodzic
Print the number of lines in a file (NR stands for Number of Records; by default each line is one record):
awk 'END { print NR }' /etc/passwd
67
or
wc -l /etc/passwd
67 /etc/passwd
or (not displaying file name)
wc -l < /etc/passwd
67
Print the entries of users with UID in the range 1000-1047:
awk -F: '{if ($3 >= 1000 && $3 <= 1047) print}' /etc/passwd
huazhou:x:1000:1001::/home/huazhou:/bin/bash
maschepps:x:1001:1003::/home/maschepps:/bin/bash
aarbolante:x:1002:1004::/home/aarbolante:/bin/bash
rozeta:x:1003:1005::/home/rozeta:/bin/bash
yanlongbai975:x:1004:1006::/home/yanlongbai975:/bin/bash
ritazxcai:x:1005:1007::/home/ritazxcai:/bin/bash
may.lyn.cheah.phd:x:1006:1008::/home/may.lyn.cheah.phd:/bin/bash
yifan00:x:1007:1009::/home/yifan00:/bin/bash
yuruidong99:x:1008:1010::/home/yuruidong99:/bin/bash
fangyi:x:1009:1011::/home/fangyi:/bin/bash
sharonfeng:x:1010:1012::/home/sharonfeng:/bin/bash
rfisher2022:x:1011:1013::/home/rfisher2022:/bin/bash
seamusgallivan:x:1012:1014::/home/seamusgallivan:/bin/bash
ehodzic:x:1013:1015::/home/ehodzic:/bin/bash
ionahu08:x:1014:1016::/home/ionahu08:/bin/bash
lillyhuang25:x:1015:1017::/home/lillyhuang25:/bin/bash
jamshidian:x:1016:1018::/home/jamshidian:/bin/bash
jonathanking192:x:1017:1019::/home/jonathanking192:/bin/bash
yllai:x:1018:1020::/home/yllai:/bin/bash
djlavine:x:1019:1021::/home/djlavine:/bin/bash
blei001:x:1020:1022::/home/blei001:/bin/bash
lifengxue2000:x:1021:1023::/home/lifengxue2000:/bin/bash
yuyuanlin:x:1022:1024::/home/yuyuanlin:/bin/bash
javim013:x:1023:1025::/home/javim013:/bin/bash
tokramm:x:1024:1026::/home/tokramm:/bin/bash
tomokiokuno0528:x:1025:1027::/home/tomokiokuno0528:/bin/bash
qny2021:x:1026:1028::/home/qny2021:/bin/bash
yuhang886688:x:1027:1029::/home/yuhang886688:/bin/bash
jqin10:x:1028:1030::/home/jqin10:/bin/bash
mimi327:x:1029:1031::/home/mimi327:/bin/bash
nataliesisto:x:1030:1032::/home/nataliesisto:/bin/bash
hangsun:x:1031:1033::/home/hangsun:/bin/bash
jiahaotian0702:x:1033:1035::/home/jiahaotian0702:/bin/bash
wanghw:x:1034:1036::/home/wanghw:/bin/bash
wongj1721:x:1035:1037::/home/wongj1721:/bin/bash
lsyang:x:1036:1038::/home/lsyang:/bin/bash
khyeh0816:x:1037:1039::/home/khyeh0816:/bin/bash
younghograd:x:1038:1040::/home/younghograd:/bin/bash
yueyu99:x:1039:1041::/home/yueyu99:/bin/bash
qzhang42:x:1040:1042::/home/qzhang42:/bin/bash
zixiz:x:1041:1043::/home/zixiz:/bin/bash
naying:x:1042:1044::/home/naying:/bin/bash
capj245:x:1043:1045::/home/capj245:/bin/bash
rsuseno:x:1044:1046::/home/rsuseno:/bin/bash
inclassdemo:x:1045:1047::/home/inclassdemo:/bin/bash
Print login names and login shells in comma-separated format:
awk -F: '{OFS = ","} {print $1, $7}' /etc/passwd
root,/bin/bash
bin,/sbin/nologin
daemon,/sbin/nologin
adm,/sbin/nologin
lp,/sbin/nologin
sync,/bin/sync
shutdown,/sbin/shutdown
halt,/sbin/halt
mail,/sbin/nologin
operator,/sbin/nologin
games,/sbin/nologin
ftp,/sbin/nologin
nobody,/sbin/nologin
systemd-network,/sbin/nologin
dbus,/sbin/nologin
polkitd,/sbin/nologin
sshd,/sbin/nologin
postfix,/sbin/nologin
chrony,/sbin/nologin
huazhou,/bin/bash
tss,/sbin/nologin
rstudio-server,/bin/bash
shiny,/bin/sh
maschepps,/bin/bash
aarbolante,/bin/bash
rozeta,/bin/bash
yanlongbai975,/bin/bash
ritazxcai,/bin/bash
may.lyn.cheah.phd,/bin/bash
yifan00,/bin/bash
yuruidong99,/bin/bash
fangyi,/bin/bash
sharonfeng,/bin/bash
rfisher2022,/bin/bash
seamusgallivan,/bin/bash
ehodzic,/bin/bash
ionahu08,/bin/bash
lillyhuang25,/bin/bash
jamshidian,/bin/bash
jonathanking192,/bin/bash
yllai,/bin/bash
djlavine,/bin/bash
blei001,/bin/bash
lifengxue2000,/bin/bash
yuyuanlin,/bin/bash
javim013,/bin/bash
tokramm,/bin/bash
tomokiokuno0528,/bin/bash
qny2021,/bin/bash
yuhang886688,/bin/bash
jqin10,/bin/bash
mimi327,/bin/bash
nataliesisto,/bin/bash
hangsun,/bin/bash
jiahaotian0702,/bin/bash
wanghw,/bin/bash
wongj1721,/bin/bash
lsyang,/bin/bash
khyeh0816,/bin/bash
younghograd,/bin/bash
yueyu99,/bin/bash
qzhang42,/bin/bash
zixiz,/bin/bash
naying,/bin/bash
capj245,/bin/bash
rsuseno,/bin/bash
inclassdemo,/bin/bash
Print login names and label those with UID >= 1000 as vip (all others as regular):
awk -F: -v status="" '{OFS = ","}
{if ($3 >= 1000) status="vip"; else status="regular"}
{print $1, status}' /etc/passwd
root,regular
bin,regular
daemon,regular
adm,regular
lp,regular
sync,regular
shutdown,regular
halt,regular
mail,regular
operator,regular
games,regular
ftp,regular
nobody,regular
systemd-network,regular
dbus,regular
polkitd,regular
sshd,regular
postfix,regular
chrony,regular
huazhou,vip
tss,regular
rstudio-server,regular
shiny,regular
maschepps,vip
aarbolante,vip
rozeta,vip
yanlongbai975,vip
ritazxcai,vip
may.lyn.cheah.phd,vip
yifan00,vip
yuruidong99,vip
fangyi,vip
sharonfeng,vip
rfisher2022,vip
seamusgallivan,vip
ehodzic,vip
ionahu08,vip
lillyhuang25,vip
jamshidian,vip
jonathanking192,vip
yllai,vip
djlavine,vip
blei001,vip
lifengxue2000,vip
yuyuanlin,vip
javim013,vip
tokramm,vip
tomokiokuno0528,vip
qny2021,vip
yuhang886688,vip
jqin10,vip
mimi327,vip
nataliesisto,vip
hangsun,vip
jiahaotian0702,vip
wanghw,vip
wongj1721,vip
lsyang,vip
khyeh0816,vip
younghograd,vip
yueyu99,vip
qzhang42,vip
zixiz,vip
naying,vip
capj245,vip
rsuseno,vip
inclassdemo,vip
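The two awk examples above set OFS inside the main rule, so it is reassigned for every input line. A BEGIN block runs once before any input is read, which is the more idiomatic place for such settings. A minimal sketch on a few made-up lines in /etc/passwd format (the users alice and bob are hypothetical), using awk's conditional operator for the vip label:

```shell
tmp=$(mktemp -d)
# Made-up lines in /etc/passwd format (name:pw:UID:GID:info:home:shell)
cat > "$tmp/passwd_sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin
alice:x:1000:1000::/home/alice:/bin/bash
bob:x:1046:1046::/home/bob:/bin/zsh
EOF

# BEGIN runs once before any input is read, so OFS is set a single time
awk -F: 'BEGIN { OFS = "," }
         { status = ($3 >= 1000) ? "vip" : "regular"; print $1, status }' \
    "$tmp/passwd_sample"
```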
Emacs is a powerful text editor with extensive support for many languages, including R, \(\LaTeX\), Python, and C/C++; however, it's not installed by default on many Linux distributions.
Basic survival commands:
emacs filename to open a file with emacs.
CTRL-x CTRL-f to open an existing or new file.
CTRL-x CTRL-s to save.
CTRL-x CTRL-w to save as.
CTRL-x CTRL-c to quit.
Google emacs cheatsheet.
C-<key> means hold the control key, and press <key>.
M-<key> means press the Esc key once, and press <key>.
Vi is ubiquitous (POSIX standard). Learn at least its basics; otherwise you may be unable to edit anything on some clusters.
Basic survival commands:
vi filename to start editing a file.
vi is a modal editor: insert mode and normal mode. Pressing i switches from normal mode to insert mode. Pressing ESC switches from insert mode to normal mode.
:x<Return> quits vi and saves changes.
:q!<Return> quits vi without saving the latest changes.
:w<Return> saves changes.
:wq<Return> quits vi and saves changes.
Google vi cheatsheet.
Statisticians/data scientists write a lot of code. It is critical to adopt a good IDE that goes beyond code editing: syntax highlighting, executing code within the editor, debugging, profiling, version control, etc.
RStudio, Eclipse, Emacs, Matlab, Visual Studio, etc.
Press Ctrl+C to cancel a non-responding or long-running program.
The OS runs processes on behalf of the user.
Each process has a process ID (PID), username (UID), parent process ID (PPID), time and date the process started (STIME), cumulative CPU time (TIME), etc.
ps
PID TTY TIME CMD
15080 ? 00:00:06 rsession
16542 ? 00:00:01 R
16661 ? 00:00:00 sh
16662 ? 00:00:00 ps
All currently running processes:
ps -eaf
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Jan03 ? 00:01:34 /usr/lib/systemd/systemd --switched-root --system --deserialize 22
root 2 0 0 Jan03 ? 00:00:00 [kthreadd]
root 4 2 0 Jan03 ? 00:00:00 [kworker/0:0H]
root 6 2 0 Jan03 ? 00:00:02 [ksoftirqd/0]
root 7 2 0 Jan03 ? 00:00:00 [migration/0]
root 8 2 0 Jan03 ? 00:00:00 [rcu_bh]
root 9 2 0 Jan03 ? 00:05:10 [rcu_sched]
root 10 2 0 Jan03 ? 00:00:00 [lru-add-drain]
root 11 2 0 Jan03 ? 00:00:03 [watchdog/0]
root 12 2 0 Jan03 ? 00:00:02 [watchdog/1]
root 13 2 0 Jan03 ? 00:00:00 [migration/1]
root 14 2 0 Jan03 ? 00:00:02 [ksoftirqd/1]
root 16 2 0 Jan03 ? 00:00:00 [kworker/1:0H]
root 17 2 0 Jan03 ? 00:00:02 [watchdog/2]
root 18 2 0 Jan03 ? 00:00:00 [migration/2]
root 19 2 0 Jan03 ? 00:00:02 [ksoftirqd/2]
root 21 2 0 Jan03 ? 00:00:00 [kworker/2:0H]
root 22 2 0 Jan03 ? 00:00:02 [watchdog/3]
root 23 2 0 Jan03 ? 00:00:00 [migration/3]
root 24 2 0 Jan03 ? 00:00:02 [ksoftirqd/3]
root 26 2 0 Jan03 ? 00:00:00 [kworker/3:0H]
root 28 2 0 Jan03 ? 00:00:00 [kdevtmpfs]
root 29 2 0 Jan03 ? 00:00:00 [netns]
root 30 2 0 Jan03 ? 00:00:00 [khungtaskd]
root 31 2 0 Jan03 ? 00:00:00 [writeback]
root 32 2 0 Jan03 ? 00:00:00 [kintegrityd]
root 33 2 0 Jan03 ? 00:00:00 [bioset]
root 34 2 0 Jan03 ? 00:00:00 [bioset]
root 35 2 0 Jan03 ? 00:00:00 [bioset]
root 36 2 0 Jan03 ? 00:00:00 [kblockd]
root 37 2 0 Jan03 ? 00:00:00 [md]
root 38 2 0 Jan03 ? 00:00:00 [edac-poller]
root 39 2 0 Jan03 ? 00:00:00 [watchdogd]
root 49 2 0 Jan03 ? 00:00:17 [kswapd0]
root 50 2 0 Jan03 ? 00:00:00 [ksmd]
root 51 2 0 Jan03 ? 00:00:07 [khugepaged]
root 52 2 0 Jan03 ? 00:00:00 [crypto]
root 60 2 0 Jan03 ? 00:00:00 [kthrotld]
root 61 2 0 Jan03 ? 00:00:00 [kmpath_rdacd]
root 62 2 0 Jan03 ? 00:00:00 [kaluad]
root 63 2 0 Jan03 ? 00:00:00 [kpsmoused]
root 65 2 0 Jan03 ? 00:00:00 [ipv6_addrconf]
root 78 2 0 Jan03 ? 00:00:00 [deferwq]
root 133 2 0 Jan03 ? 00:00:03 [kauditd]
root 199 2 0 Jan03 ? 00:00:00 [virtscsi-scan]
root 200 2 0 Jan03 ? 00:00:00 [scsi_eh_0]
root 201 2 0 Jan03 ? 00:00:00 [scsi_tmf_0]
root 212 2 0 Jan03 ? 00:00:06 [kworker/0:1H]
root 233 2 0 Jan03 ? 00:00:00 [kworker/2:1H]
root 251 2 0 Jan03 ? 00:00:00 [bioset]
root 252 2 0 Jan03 ? 00:00:00 [xfsalloc]
root 253 2 0 Jan03 ? 00:00:00 [xfs_mru_cache]
root 254 2 0 Jan03 ? 00:00:00 [xfs-buf/sda2]
root 255 2 0 Jan03 ? 00:00:00 [xfs-data/sda2]
root 256 2 0 Jan03 ? 00:00:00 [xfs-conv/sda2]
root 257 2 0 Jan03 ? 00:00:00 [xfs-cil/sda2]
root 258 2 0 Jan03 ? 00:00:00 [xfs-reclaim/sda]
root 259 2 0 Jan03 ? 00:00:00 [xfs-log/sda2]
root 260 2 0 Jan03 ? 00:00:00 [xfs-eofblocks/s]
root 261 2 0 Jan03 ? 00:01:11 [xfsaild/sda2]
root 262 2 0 Jan03 ? 00:00:00 [kworker/1:1H]
root 263 2 0 Jan03 ? 00:00:00 [kworker/3:1H]
root 326 1 0 Jan03 ? 00:00:25 /usr/lib/systemd/systemd-journald
root 354 1 0 Jan03 ? 00:00:00 /usr/lib/systemd/systemd-udevd
root 382 2 0 Jan03 ? 00:00:00 [hwrng]
root 457 1 0 Jan03 ? 00:00:06 /sbin/auditd
polkitd 498 1 0 Jan03 ? 00:00:01 /usr/lib/polkit-1/polkitd --no-debug
dbus 501 1 0 Jan03 ? 00:00:05 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
chrony 507 1 0 Jan03 ? 00:00:00 /usr/sbin/chronyd
root 508 1 0 Jan03 ? 00:00:00 /usr/sbin/acpid
root 525 1 0 Jan03 ? 00:00:01 /usr/bin/python2 -Es /usr/sbin/firewalld --nofork --nopid
root 526 1 0 Jan03 tty1 00:00:00 /sbin/agetty --noclear tty1 linux
root 527 1 0 Jan03 ttyS0 00:00:00 /sbin/agetty --keep-baud 115200,38400,9600 ttyS0 vt220
root 547 1 0 Jan03 ? 00:00:13 /usr/sbin/NetworkManager --no-daemon
root 670 547 0 Jan03 ? 00:00:00 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-eth0.pid -lf /var/lib/NetworkManager/dhclient-2f272bb6-80c6-470e-9878-f080f7860f33-eth0.lease -cf /var/lib/NetworkManager/dhclient-eth0.conf eth0
root 934 1 0 Jan03 ? 00:01:20 /usr/bin/python2 -Es /usr/sbin/tuned -l -P
root 936 1 0 Jan03 ? 00:07:58 /usr/bin/google_osconfig_agent
root 937 1 0 Jan03 ? 00:00:50 /usr/sbin/rsyslogd -n
root 938 1 0 Jan03 ? 00:01:55 /usr/bin/google_guest_agent
tomokio+ 1182 4789 0 18:22 ? 00:01:50 /usr/lib/rstudio-server/bin/rsession -u tomokiokuno0528 --session-use-secure-cookies 0 --session-root-path / --session-same-site 0 --launcher-token 0FE0B264 --r-restore-workspace 2 --r-run-rprofile 2
root 1239 1 0 Jan03 ? 00:00:04 /usr/lib/systemd/systemd-logind
root 1295 1 0 Jan03 ? 00:00:02 /usr/sbin/crond -n
root 1322 1 0 Jan03 ? 00:00:04 /usr/libexec/postfix/master -w
postfix 1326 1322 0 Jan03 ? 00:00:00 qmgr -l -t unix -u
root 1654 2 0 Jan03 ? 00:00:09 [jbd2/sdb-8]
root 1655 2 0 Jan03 ? 00:00:00 [ext4-rsv-conver]
root 1944 2 0 08:26 ? 00:00:00 [kworker/u8:2]
root 3749 2 0 Jan10 ? 00:00:09 [kworker/0:2]
root 3789 2 0 19:01 ? 00:00:00 [kworker/2:1]
tomokio+ 3963 1 0 Jan10 ? 00:00:00 ssh tomokiokuno0528@server.ucla-biostat-203b.com
root 3964 32148 0 Jan10 ? 00:00:00 sshd: tomokiokuno0528 [priv]
tomokio+ 3999 3964 0 Jan10 ? 00:00:00 sshd: tomokiokuno0528@pts/3
tomokio+ 4000 3999 0 Jan10 pts/3 00:00:00 -bash
rstudio+ 4789 1 0 Jan03 ? 00:39:54 /usr/lib/rstudio-server/bin/rserver
root 7255 1 0 Jan03 ? 00:00:00 /opt/shiny-server/ext/node/bin/shiny-server /opt/shiny-server/lib/main.js
maschep+ 7276 1 0 Jan08 ? 00:00:00 ssh-agent -s
root 8371 32148 0 Jan08 ? 00:00:00 sshd: maschepps [priv]
root 8384 2 0 20:12 ? 00:00:00 [kworker/0:1]
maschep+ 8385 8371 0 Jan08 ? 00:00:00 sshd: maschepps@pts/0
maschep+ 8386 8385 0 Jan08 pts/0 00:00:00 -bash
maschep+ 8727 1 0 Jan08 ? 00:00:00 ssh-agent -s
tomokio+ 8960 4000 0 01:13 pts/3 00:00:00 ssh -i /home/tomokiokuno0528/.ssh/id_rsa tomokiokuno0528@server.ucla-biostat-203b.com
root 8961 32148 0 01:13 ? 00:00:00 sshd: tomokiokuno0528 [priv]
tomokio+ 8966 8961 0 01:13 ? 00:00:00 sshd: tomokiokuno0528@pts/1
tomokio+ 8967 8966 0 01:13 pts/1 00:00:00 -bash
may.lyn+ 11314 4789 0 Jan06 ? 00:47:44 /usr/lib/rstudio-server/bin/rsession -u may.lyn.cheah.phd --session-use-secure-cookies 0 --session-root-path / --session-same-site 0 --launcher-token 0FE0B264 --r-restore-workspace 2 --r-run-rprofile 2
rfisher+ 13556 4789 0 21:48 ? 00:00:17 /usr/lib/rstudio-server/bin/rsession -u rfisher2022 --session-use-secure-cookies 0 --session-root-path / --session-same-site 0 --launcher-token 0FE0B264 --r-restore-workspace 2 --r-run-rprofile 2
rfisher+ 13646 13556 0 21:48 pts/5 00:00:00 bash -l
postfix 13888 1322 0 21:52 ? 00:00:00 pickup -l -t unix -u
tomokio+ 13901 1182 0 21:52 ? 00:00:00 bash /tmp/RtmpWFSsSs/chunk-code-49e78c9829d.txt
tomokio+ 13903 13901 0 21:52 ? 00:00:00 vi pg42671.txt M
inclass+ 14172 4789 0 21:57 ? 00:00:05 /usr/lib/rstudio-server/bin/rsession -u inclassdemo --session-use-secure-cookies 0 --session-root-path / --session-same-site 0 --launcher-token 0FE0B264 --r-restore-workspace 2 --r-run-rprofile 2
root 14222 2 0 21:58 ? 00:00:00 [kworker/3:0]
may.lyn+ 14763 11314 0 Jan06 pts/27 00:00:00 bash -l
root 15053 2 0 22:13 ? 00:00:00 [kworker/1:0]
huazhou 15080 4789 0 22:13 ? 00:00:06 /usr/lib/rstudio-server/bin/rsession -u huazhou --session-use-secure-cookies 0 --session-root-path / --session-same-site 0 --launcher-token 0FE0B264 --r-restore-workspace 2 --r-run-rprofile 2
root 15220 2 0 22:14 ? 00:00:00 [kworker/3:2]
may.lyn+ 15227 14763 0 Jan06 pts/27 00:00:00 /usr/lib64/R/bin/exec/R
may.lyn+ 15282 15227 0 Jan06 pts/27 00:00:00 sh -c '/usr/lib64/R/bin/pager' < '/tmp/RtmpkXUttn/3b7b1e86c95'
may.lyn+ 15283 15282 0 Jan06 pts/27 00:00:00 /usr/bin/less
tomokio+ 15720 1182 0 22:16 pts/6 00:00:00 bash -l
huazhou 15792 15080 0 22:17 pts/7 00:00:00 bash -l
root 15852 2 0 Jan04 ? 00:00:02 [jbd2/sdc-8]
root 15853 2 0 Jan04 ? 00:00:00 [ext4-rsv-conver]
root 15964 2 0 22:20 ? 00:00:00 [kworker/2:0]
root 16503 2 0 22:24 ? 00:00:00 [kworker/1:1]
root 16538 2 0 22:24 ? 00:00:00 [kworker/3:1]
huazhou 16542 15080 50 22:24 ? 00:00:01 /usr/lib64/R/bin/exec/R --no-save --no-restore -s -e rmarkdown::render('/home/huazhou/203b-2022winter/slides/02-linux/linux.Rmd',~+~~+~encoding~+~=~+~'UTF-8');
huazhou 16663 16542 0 22:24 ? 00:00:00 sh -c 'bash' -c 'ps -eaf' 2>&1
huazhou 16664 16663 0 22:24 ? 00:00:00 ps -eaf
root 17349 2 0 Jan08 ? 00:00:06 [kworker/2:2]
maschep+ 17544 1 0 Jan04 ? 00:00:00 ssh-agent
maschep+ 17970 1 0 Jan04 ? 00:00:00 ssh-agent
maschep+ 18118 1 0 Jan04 ? 00:00:00 ssh-agent -s
ehodzic 25860 1 0 Jan10 ? 00:00:00 ssh-agent -s
root 29948 32148 0 Jan08 ? 00:00:00 sshd: maschepps [priv]
maschep+ 29955 29948 0 Jan08 ? 00:00:00 sshd: maschepps@pts/2
maschep+ 29956 29955 0 Jan08 pts/2 00:00:00 -bash
root 30621 2 0 Jan07 ? 00:00:00 [kworker/u8:1]
root 31188 2 0 Jan08 ? 00:00:05 [kworker/1:2]
root 32148 1 0 Jan03 ? 00:00:05 /usr/sbin/sshd -D
All Python processes:
ps -eaf | grep python
root 525 1 0 Jan03 ? 00:00:01 /usr/bin/python2 -Es /usr/sbin/firewalld --nofork --nopid
root 934 1 0 Jan03 ? 00:01:20 /usr/bin/python2 -Es /usr/sbin/tuned -l -P
huazhou 16665 16542 0 22:24 ? 00:00:00 sh -c 'bash' -c 'ps -eaf | grep python' 2>&1
huazhou 16666 16665 0 22:24 ? 00:00:00 bash -c ps -eaf | grep python
huazhou 16668 16666 0 22:24 ? 00:00:00 grep python
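In the output above, grep python also lists the grep process itself (its command line contains the word "python"). A common idiom avoids this by bracketing one letter of the pattern. The regular expression [p]ython still matches "python", but the grep process's own command line now contains the literal text "[p]ython", which does not match. The sketch below shows this on hypothetical sample lines modeled on the output above, so it runs anywhere. Another option is pgrep -f python, which prints the matching PIDs and never reports itself.

```shell
#!/usr/bin/env bash
# A real python process, copied from the `ps -eaf` output above.
python_line='root 525 1 0 Jan03 ? 00:00:01 /usr/bin/python2 -Es /usr/sbin/firewalld --nofork --nopid'

# What ps shows while `grep python` runs: the grep line matches too.
ps_plain="$python_line
huazhou 16668 16666 0 22:24 ? 00:00:00 grep python"

# What ps shows while `grep '[p]ython'` runs: the grep line no longer matches.
ps_bracket="$python_line
huazhou 16668 16666 0 22:24 ? 00:00:00 grep [p]ython"

# Count lines of text $1 that match pattern $2.
count_matches() {
  printf '%s\n' "$1" | grep -c -- "$2"
}

count_matches "$ps_plain" 'python'      # 2
count_matches "$ps_bracket" '[p]ython'  # 1

# On a live system:
#   ps -eaf | grep '[p]ython'
```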
Process with PID=1:
ps -fp 1
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Jan03 ? 00:01:34 /usr/lib/systemd/systemd --switched-root --system --deserialize 22
All processes owned by a user:
ps -fu huazhou
UID PID PPID C STIME TTY TIME CMD
huazhou 15080 4789 0 22:13 ? 00:00:06 /usr/lib/rstudio-server/bin/rsession -u huazhou --session-use-secure-cookies 0 --session-root-path / --session-same-site 0 --launcher-token 0FE0B264 --r-restore-workspace 2 --r-run-rprofile 2
huazhou 15792 15080 0 22:17 pts/7 00:00:00 bash -l
huazhou 16542 15080 51 22:24 ? 00:00:01 /usr/lib64/R/bin/exec/R --no-save --no-restore -s -e rmarkdown::render('/home/huazhou/203b-2022winter/slides/02-linux/linux.Rmd',~+~~+~encoding~+~=~+~'UTF-8');
huazhou 16671 16542 0 22:24 ? 00:00:00 sh -c 'bash' -c 'ps -fu huazhou' 2>&1
huazhou 16672 16671 0 22:24 ? 00:00:00 ps -fu huazhou
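The listings above are in PID order. To see at a glance which processes use the most CPU, you can choose columns with -o and order the rows with --sort. The sketch below wraps this in a small function. The name top_cpu is hypothetical. It assumes the procps version of ps found on Linux, because macOS ps does not accept --sort. For a one-shot copy of what top displays, top -b -n 1 (batch mode, one iteration) is similar.

```shell
#!/usr/bin/env bash
# Print the header plus the N processes (default 5) using the most CPU.
# -e: all processes; -o: columns to show; --sort=-%cpu: descending CPU usage.
top_cpu() {
  ps -eo pid,user,%cpu,%mem,comm --sort=-%cpu | head -n "$(( ${1:-5} + 1 ))"
}

top_cpu 5
```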
Kill process with PID=1001:
kill 1001
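Kill all of your processes whose names match the regular expression R (that is, any name containing a capital R, such as R or Rscript). Only root can kill other users’ processes.
killall -r R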
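Here is a self-contained sketch of the start, inspect, and kill cycle. It uses sleep as a harmless stand-in for a runaway job. $! holds the PID of the most recent background job, and kill -0 sends no signal at all; it only tests whether the PID still exists. Use kill -9 (SIGKILL) only as a last resort, because the process gets no chance to clean up.

```shell
#!/usr/bin/env bash
# A harmless stand-in for a runaway job (e.g., a stuck R session).
sleep 300 &
pid=$!                       # PID of the most recent background job
echo "started sleep with PID $pid"

ps -fp "$pid"                # same form as `ps -fp 1` above

kill "$pid"                  # default signal is SIGTERM: ask the process to exit
wait "$pid" 2>/dev/null || true   # reap it; the nonzero status just reflects the signal

# kill -0 sends no signal; it only checks whether the PID still exists.
if kill -0 "$pid" 2>/dev/null; then
  echo "PID $pid still running; kill -9 $pid (SIGKILL) would force it"
else
  echo "PID $pid is gone"
fi
```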
top prints real-time process information (very useful).
top
Exit the top program by pressing the q key.
SSH (secure shell) is the dominant cryptographic network protocol for secure network connections over an insecure network.
On Linux or Mac terminal, access the teaching server by
ssh [USERNAME]@server.ucla-biostat-203b.com
Replace [USERNAME] above with your user name on the teaching server.
For Windows users, there are at least three ways: (1) (highly recommended) Git Bash, which is included in Git for Windows, (2) (not recommended) the PuTTY program (free), or (3) (may be overkill for this class) WSL (Windows Subsystem for Linux), which installs a full-fledged Linux system within Windows.
Key authentication is more secure than passwords. Most passwords are weak.
A script or program may need to SSH into other machines automatically.
Log into multiple machines using the same key.
Seamless use of many services: Git/GitHub, AWS or Google cloud service, parallel computing on multiple hosts, Travis CI (continuous integration) etc.
Many servers only allow key authentication and do not accept password authentication.
Public key. Put it on the machine(s) you want to log in to.
Private key. Keep it on your own computer. Think of it as the actual key in your pocket; never give your private key to others. For fun: https://www.youtube.com/watch?v=S8K464ImU0c
Messages encrypted with your public key can only be decrypted using your private key.
Messages signed with your private key (digital signatures) can be verified by anyone who has your public key (authentication). In SSH, this signature is what proves your identity to the server; the session traffic itself is encrypted with symmetric keys negotiated when the connection is set up.
On Linux, Mac, or Windows Git Bash, to generate a key pair:
ssh-keygen -t rsa -f ~/.ssh/[KEY_FILENAME] -C [USERNAME]
[KEY_FILENAME] is the name that you want to use for your SSH key files. For example, a filename of id_rsa generates a private key file named id_rsa and a public key file named id_rsa.pub.
[USERNAME] is the user for whom you will apply this SSH key. ssh-keygen stores it in the key’s comment field (-C), which helps identify the key later.
Use an (optional) passphrase that is different from your password.
Set correct permissions on the .ssh folder and key files.
~/.ssh folder should be 700 (drwx------).
~/.ssh/id_rsa should be 600 (-rw-------).
~/.ssh/id_rsa.pub should be 644 (-rw-r--r--).
chmod 700 ~/.ssh
chmod 600 ~/.ssh/[KEY_FILENAME]
chmod 644 ~/.ssh/[KEY_FILENAME].pub
Note that Windows is different: it does not use these Unix-style permissions, so they cannot be changed this way.
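The permission settings above can be wrapped in a small helper that also reports the resulting modes. This is a sketch only: fix_ssh_perms is a hypothetical name, and the reporting step uses GNU stat (Linux); on macOS the equivalent is stat -f '%Lp %N'. It also covers the authorized_keys file, which should be 600 as well. The helper is not called here, so it does not touch your real ~/.ssh until you run it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the recommended permissions to an SSH folder
# and print the resulting octal modes.
# Usage: fix_ssh_perms [DIR] [KEY_FILENAME]   (defaults: ~/.ssh and id_rsa)
fix_ssh_perms() {
  local dir=${1:-$HOME/.ssh} key=${2:-id_rsa} f
  chmod 700 "$dir" || return 1
  if [ -f "$dir/$key" ]; then chmod 600 "$dir/$key"; fi
  if [ -f "$dir/$key.pub" ]; then chmod 644 "$dir/$key.pub"; fi
  if [ -f "$dir/authorized_keys" ]; then chmod 600 "$dir/authorized_keys"; fi
  # GNU stat: %a = octal mode, %n = file name.
  for f in "$dir" "$dir/$key" "$dir/$key.pub" "$dir/authorized_keys"; do
    if [ -e "$f" ]; then stat -c '%a %n' "$f"; fi
  done
}
```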
Append the public key to the ~/.ssh/authorized_keys file of any Linux machine we want to SSH to, e.g.,
ssh-copy-id -i ~/.ssh/[KEY_FILENAME] [USERNAME]@server.ucla-biostat-203b.com
Make sure the permission of the authorized_keys file is 600 (-rw-------).
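If ssh-copy-id is not available, you can append the key by hand. The sketch below shows the idea as a hypothetical helper, add_authorized_key. It adds a public key only if the identical line is not already present, then sets the permission to 600. Run it on the remote machine itself. The comment at the end shows the usual one-liner for doing the same from your own computer, which is roughly what ssh-copy-id automates.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a public key to an authorized_keys file unless
# an identical line is already there, then set the permission to 600.
# Usage: add_authorized_key PUBKEY_FILE [AUTH_FILE]
add_authorized_key() {
  local pub=$1 auth=${2:-$HOME/.ssh/authorized_keys} key
  key=$(cat "$pub") || return 1
  mkdir -p "$(dirname "$auth")"
  touch "$auth"
  # -q: quiet, -x: whole-line match, -F: fixed string (not a regex)
  if ! grep -qxF -- "$key" "$auth"; then
    printf '%s\n' "$key" >> "$auth"
  fi
  chmod 600 "$auth"
}

# From your own computer, pipe the key over SSH instead:
#   cat ~/.ssh/[KEY_FILENAME].pub | ssh [USERNAME]@server.ucla-biostat-203b.com \
#     'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'
```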
Test your new key.
ssh -i ~/.ssh/[KEY_FILENAME] [USERNAME]@server.ucla-biostat-203b.com
From now on, you won’t need a password each time you connect from your machine to the teaching server.
If you set a passphrase when generating keys, you’ll be prompted for it each time the private key is used. To avoid entering the passphrase repeatedly, use ssh-agent on Linux/Mac or Pageant on Windows.
Same key pair can be used between any two machines. We don’t need to regenerate keys for each new connection.
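You can avoid typing the user name, host name, and -i option each time by storing per-host settings in the OpenSSH client config file ~/.ssh/config. Host, HostName, User, and IdentityFile are standard ssh_config keywords. The helper below, add_ssh_host, is a hypothetical sketch, and the alias 203b in the usage comment is an arbitrary name. ssh refuses a config file that other users can write to, so the helper sets the file to 600.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a Host entry to an OpenSSH client config.
# Usage: add_ssh_host ALIAS USERNAME HOSTNAME KEYFILE [CONFIG_FILE]
add_ssh_host() {
  local alias=$1 user=$2 host=$3 key=$4 config=${5:-$HOME/.ssh/config}
  cat >> "$config" <<EOF

Host $alias
    HostName $host
    User $user
    IdentityFile $key
EOF
  chmod 600 "$config"   # ssh rejects a config writable by others
}

# Example (alias "203b" is arbitrary):
#   add_ssh_host 203b [USERNAME] server.ucla-biostat-203b.com '~/.ssh/[KEY_FILENAME]'
# Afterwards:  ssh 203b    and    scp myfile.txt 203b:~/
```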
scp securely transfers files between machines using SSH.
## copy file from local to remote
scp [LOCALFILE] [USERNAME]@server.ucla-biostat-203b.com:/[PATH_TO_FOLDER]
## copy file from remote to local
scp [USERNAME]@server.ucla-biostat-203b.com:/[PATH_TO_FILE] [PATH_TO_LOCAL_FOLDER]
sftp is an interactive, FTP-like file transfer program that runs over SSH.
Globus is a GUI program for securely transferring files between machines. To use Globus, go to https://www.globus.org/ and log in through UCLA by selecting UCLA as your existing organizational login. Then download the Globus Connect Personal software and set up your laptop as an endpoint. Detailed instructions can be found at https://www.hoffman2.idre.ucla.edu/file-transfer/globus/.
GUIs for Windows (WinSCP) or Mac (Cyberduck).
You can even use RStudio to upload files to a remote machine with RStudio Server installed.
(Preferred way) Use a version control system (git, svn, cvs, …) to sync project files between different machines and systems.
Windows uses a pair of CR and LF characters for line breaks.
Linux/Unix uses an LF character only.
MacOS X also uses a single LF character, but the old Mac OS used a single CR character for line breaks.
If a text file is transferred between OSs in binary mode (bit by bit), it can look like a mess.
Most transfer programs automatically switch to text mode when transferring text files and convert line breaks between OSs, but I used to run into problems with WinSCP. Sometimes you have to tell WinSCP explicitly that a text file is being transferred.
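To check whether a file arrived with Windows (CRLF) line breaks and fix it on Linux, standard tools are enough. The sketch below uses grep to count lines ending in a carriage return and tr to delete carriage returns. The helper names are hypothetical. Note that tr -d '\r' removes every CR byte, not only those at line ends. Where it is installed, dos2unix does the same conversion.

```shell
#!/usr/bin/env bash
# Count lines that end in CR+LF (Windows style).
count_crlf() {
  grep -c $'\r$' "$1" || true   # grep exits 1 when the count is 0
}

# Convert CRLF to LF by deleting every CR byte, writing a new file.
crlf_to_lf() {
  tr -d '\r' < "$1" > "$2"
}

# Demo on a small file written with Windows line breaks.
tmp=$(mktemp -d)
printf 'x,y\r\n1,2\r\n' > "$tmp/windows.csv"
count_crlf "$tmp/windows.csv"              # 2
crlf_to_lf "$tmp/windows.csv" "$tmp/unix.csv"
count_crlf "$tmp/unix.csv"                 # 0
```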
Start R in interactive mode by typing R in the shell.
Then run an R script by
source("script.R")
Demo script meanEst.R implements a (terrible) estimator of the mean
\[
{\widehat \mu}_n = \frac{\sum_{i=1}^n x_i 1_{i \text{ is prime}}}{\sum_{i=1}^n 1_{i \text{ is prime}}}.
\]
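Here is the formula worked out for n = 10. The prime indices are 2, 3, 5, and 7 (1 is not prime), so only four of the ten observations enter the estimate:
\[
{\widehat \mu}_{10} = \frac{x_2 + x_3 + x_5 + x_7}{4}.
\]
More generally, by the prime number theorem only about n / \ln n of the indices up to n are prime. For n = 100000, as in the script, that is 9592 of the 100000 observations, so the estimator throws away over 90% of the data.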
## check if a given integer is prime
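isPrime = function(n) {
## 0, 1, and negative integers are not prime
if (n < 2) {
return (FALSE)
}
## 2 and 3 are prime
if (n <= 3) {
return (TRUE)
}
## n is composite if divisible by any of 2, ..., floor(sqrt(n))
if (any((n %% 2:floor(sqrt(n))) == 0)) {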
return (FALSE)
}
return (TRUE)
}
## estimate mean only using observation with prime indices
estMeanPrimes = function (x) {
n = length(x)
ind = sapply(1:n, isPrime)
return (mean(x[ind]))
}
print(estMeanPrimes(rnorm(100000)))
To run your R code non-interactively, a.k.a. in batch mode, we have at least two options:
# default output to meanEst.Rout
R CMD BATCH meanEst.R
or
# output to stdout
Rscript meanEst.R
Batch calls are typically automated using a scripting language, e.g., Python, Perl, or shell script.
Specify arguments in R CMD BATCH:
R CMD BATCH '--args mu=1 sig=2 kap=3' script.R
Specify arguments in Rscript:
Rscript script.R mu=1 sig=2 kap=3
Parse command-line arguments in the R script using the magic formula
for (arg in commandArgs(TRUE)) {
eval(parse(text=arg))
}
After this code runs, each argument such as mu=1 has been evaluated as an R assignment, so mu, sig, and kap are available as variables in the global environment.
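The magic formula runs each argument through eval, so any R code passed on the command line gets executed. The same NAME=VALUE convention can also be parsed without evaluating the values. The sketch below shows this for a shell wrapper script, using a hypothetical parse_kv function rather than anything from the course scripts. It splits each argument at the first "=", checks that the name is a valid variable name, and assigns the value as plain text with bash's printf -v.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn NAME=VALUE arguments into shell variables
# without eval, mirroring the mu=1 sig=2 kap=3 convention above.
parse_kv() {
  local arg name value
  for arg in "$@"; do
    name=${arg%%=*}     # text before the first "="
    value=${arg#*=}     # text after the first "="
    # Skip anything that is not NAME=VALUE with a valid variable name.
    if [[ $arg != *=* || ! $name =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
      echo "skipping malformed argument: $arg" >&2
      continue
    fi
    printf -v "$name" '%s' "$value"   # assign the value as plain text
  done
}

parse_kv mu=1 sig=2 kap=3
echo "mu=$mu sig=$sig kap=$kap"   # mu=1 sig=2 kap=3
```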
To understand commandArgs in the magic formula, run R by:
R '--args mu=1 sig=2 kap=3'
and then issue commands in R
commandArgs()
commandArgs(TRUE)
To understand parse and eval in the magic formula:
rm(list = ls())
print(x)
Error in print(x): object 'x' not found
parse(text = "x=3")
expression(x = 3)
eval(parse(text = "x=3"))
print(x)
[1] 3
runSim.R has four components: (1) a command argument parser, (2) the method implementation, (3) a data generator with unspecified parameter n, and (4) estimation based on the generated data.
## parsing command arguments
for (arg in commandArgs(TRUE)) {
eval(parse(text=arg))
}
## check if a given integer is prime
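isPrime = function(n) {
## 0, 1, and negative integers are not prime
if (n < 2) {
return (FALSE)
}
## 2 and 3 are prime
if (n <= 3) {
return (TRUE)
}
## n is composite if divisible by any of 2, ..., floor(sqrt(n))
if (any((n %% 2:floor(sqrt(n))) == 0)) {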
return (FALSE)
}
return (TRUE)
}
## estimate mean only using observation with prime indices
estMeanPrimes = function (x) {
n = length(x)
ind = sapply(1:n, isPrime)
return (mean(x[ind]))
}
# simulate data
x = rnorm(n)
# estimate mean
estMeanPrimes(x)
Call runSim.R with sample size n=100:
R CMD BATCH '--args n=100' runSim.R
or
Rscript runSim.R n=100
[1] -0.01570209
Many statistical computing tasks take a long time: simulation, MCMC, etc. If we log out of Linux while a job is unfinished, the job is killed.
The nohup command in Linux runs program(s) immune to hangups and writes output to nohup.out by default. Logging out will not kill the process; we can log in later to check status and results.
nohup is a POSIX standard and is thus available on Linux and MacOS.
Run runSim.R in the background and write output to nohup.out:
nohup Rscript runSim.R n=100 &
[1] -0.07564119
The & at the end of the command instructs Linux to run this command in the background, so we regain control of the terminal immediately.
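When several jobs run at once, it often helps to give each one its own log file instead of the shared nohup.out and to note its PID. The sketch below uses bash -c 'sleep 2; echo done' as a hypothetical stand-in for a long job such as Rscript runSim.R n=100. Redirecting stdin from /dev/null and stdout/stderr to a file keeps nohup from printing its own notices. Note that wait only works in the same shell that started the job. After logging back in, use ps -p PID (or kill -0 PID) instead.

```shell
#!/usr/bin/env bash
# Stand-in for a long job such as `Rscript runSim.R n=100`.
log=$(mktemp)                      # a per-job log file instead of nohup.out
nohup bash -c 'sleep 2; echo done' < /dev/null > "$log" 2>&1 &
pid=$!                             # PID of the job just started
echo "job PID: $pid, log: $log"

# From any later login session: is the job still alive?
if ps -p "$pid" > /dev/null; then echo "PID $pid still running"; fi

# In the shell that started the job, we can also block until it ends.
wait "$pid"
cat "$log"                         # prints: done
```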
screen is another popular utility, but it is not installed by default.
Typical workflow using screen:
Access the remote server using ssh.
Start jobs in batch mode.
Detach jobs.
Exit from the server and wait for jobs to finish.
Access the remote server using ssh.
Re-attach jobs, check on progress, get results, etc.
R in conjunction with nohup (or screen) can be used to orchestrate a large simulation study.
It can be more elegant, transparent, and robust to parallelize jobs corresponding to different scenarios (e.g., different generative models) outside of the code used to do statistical computation.
We consider a simulation study in R but the same approach could be used with code written in Julia, Matlab, Python, etc.
Python in many ways makes a better glue.
Suppose we have
runSim.R, which runs a simulation based on the command-line argument n, and
the n values that we want to use in our simulation study.
Option 1: manually call runSim.R for each setting.
Option 2 (smarter): automate calls using R and nohup.
Let’s demonstrate using the script autoSim.R:
cat autoSim.R
# autoSim.R
nVals <- seq(100, 1000, by=100)
for (n in nVals) {
oFile <- paste("n", n, ".txt", sep="")
sysCall <- paste("nohup Rscript runSim.R n=", n, " > ", oFile, sep="")
system(sysCall, wait = FALSE)
print(paste("sysCall=", sysCall, sep=""))
}
Note that when we call a bash command using the system function in R, we set the optional argument wait = FALSE so that the jobs run in parallel.
Rscript autoSim.R
[1] "sysCall=nohup Rscript runSim.R n=100 > n100.txt"
[1] "sysCall=nohup Rscript runSim.R n=200 > n200.txt"
[1] "sysCall=nohup Rscript runSim.R n=300 > n300.txt"
[1] "sysCall=nohup Rscript runSim.R n=400 > n400.txt"
[1] "sysCall=nohup Rscript runSim.R n=500 > n500.txt"
[1] "sysCall=nohup Rscript runSim.R n=600 > n600.txt"
[1] "sysCall=nohup Rscript runSim.R n=700 > n700.txt"
[1] "sysCall=nohup Rscript runSim.R n=800 > n800.txt"
[1] "sysCall=nohup Rscript runSim.R n=900 > n900.txt"
[1] "sysCall=nohup Rscript runSim.R n=1000 > n1000.txt"
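autoSim.R returns as soon as the jobs are launched, because of wait = FALSE. The same fan-out can be written directly in the shell, and a final wait then blocks until every job has finished, so results can be collected right afterwards. The sketch below uses a hypothetical run_sims function. Unlike autoSim.R, it also sends each job's error messages into its nN.txt file (2>&1).

```shell
#!/usr/bin/env bash
# Hypothetical shell counterpart of autoSim.R: start one background job per
# value of n, each writing to its own file nN.txt, then wait for all of them.
# Usage: run_sims "100 200 300" Rscript runSim.R
run_sims() {
  local n_values=$1 n
  shift                                  # remaining args: the command to run
  for n in $n_values; do                 # unquoted on purpose: split on spaces
    nohup "$@" "n=$n" < /dev/null > "n${n}.txt" 2>&1 &
    echo "started: $* n=$n > n${n}.txt (PID $!)"
  done
  wait                                   # block until every job has finished
}
```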
Now we just need to write a script to collect results from the output files.
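Here is a minimal collector sketch. It assumes each nN.txt file contains R’s printed result on a line such as "[1] -0.01570209", as in the runSim.R output shown earlier, and prints a tab-separated table of n and the estimate, sorted by n. The function name collect_results is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical collector for the nN.txt output files.
# Usage: collect_results [DIR] > results.tsv
collect_results() {
  local dir=${1:-.} f n est
  for f in "$dir"/n[0-9]*.txt; do
    [ -e "$f" ] || continue              # the glob matched nothing
    n=$(basename "$f" .txt)              # "n100.txt" -> "n100"
    n=${n#n}                             # "n100" -> "100"
    est=$(awk '$1 == "[1]" { print $2; exit }' "$f")
    printf '%s\t%s\n' "$n" "$est"        # tab-separated: n, estimate
  done | sort -n                         # numeric order: 100, 200, ..., 1000
}
```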
Later we will learn how to coordinate large scale computation on UCLA Hoffman2 cluster, using Linux and R scripting.
Log out of Linux: exit or logout or ctrl+d.
Clear screen: clear.