- Explain how the shell relates to the operating system and users’ programs.
- Explain when and why command-line interfaces should be used instead of graphical interfaces.
A shell is a program like any other. What’s special about it is that its job is to run other programs rather than to do calculations itself. The most popular Unix shell is Bash, the Bourne Again SHell (so-called because it’s derived from a shell written by Stephen Bourne). Bash is the default shell on most modern implementations of Unix and in most packages that provide Unix-like tools for Windows.
The shell prompt allows you to interact with your computer and the files and folders present on it.
campus-009-192:~ eoziolor$
You can enter commands in the shell to execute actions.
campus-009-192:~ eoziolor$ ls
2018-setacna-rnaseq Library awscli-bundle popgen_class
Applications Movies fgfh_post power
Desktop Music jobs training
Documents Pictures learn_python
Downloads Public phgenome_post
GIGAIII_bioinformatics_workshop angus_private_key
You can modify the behavior of commands (like ls) by using flags, also known as options.
campus-009-192:~ eoziolor$ ls -lh /
total 13
drwxrwxr-x+ 67 root admin 2.1K Oct 16 14:59 Applications
drwxr-xr-x+ 64 root wheel 2.0K Oct 1 16:21 Library
drwxr-xr-x 2 root wheel 64B Oct 1 16:17 Network
drwxr-xr-x@ 5 root wheel 160B Sep 20 21:05 System
drwxr-xr-x 6 root admin 192B Oct 1 16:17 Users
drwxr-xr-x+ 3 root wheel 96B Oct 18 14:12 Volumes
drwxr-xr-x 22 eoziolor staff 704B Aug 31 11:39 anaconda3
drwxr-xr-x@ 37 root wheel 1.2K Sep 20 21:17 bin
drwxrwxr-t 2 root admin 64B Oct 1 16:17 cores
dr-xr-xr-x 3 root wheel 4.2K Oct 15 17:58 dev
lrwxr-xr-x@ 1 root wheel 11B Oct 1 16:16 etc -> private/etc
dr-xr-xr-x 2 root wheel 1B Oct 18 12:52 home
-rw-r--r-- 1 root wheel 313B Aug 17 17:55 installer.failurerequests
dr-xr-xr-x 2 root wheel 1B Oct 18 12:52 net
drwxr-xr-x 6 root wheel 192B Oct 1 16:17 private
drwxr-xr-x@ 63 root wheel 2.0K Oct 1 16:16 sbin
lrwxr-xr-x@ 1 root wheel 11B Oct 1 16:16 tmp -> private/tmp
drwxr-xr-x@ 9 root wheel 288B Sep 20 21:01 usr
lrwxr-xr-x@ 1 root wheel 11B Oct 1 16:16 var -> private/var
drwxr-xr-x 3 root wheel 96B Oct 1 16:19 vm
In this case I am using the flags -lh to make ls (list) print the contents of the directory / in the (l)ong format, which shows permissions, owner, size and modification date, with file sizes in (h)uman-readable units.
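Here is a small sketch you can paste into your terminal to see what each flag adds. The scratch directory and the file name notes.txt are made up for illustration, and nothing here touches your real files.

```shell
# Create a throwaway directory holding one small file (hypothetical names).
demo_dir=$(mktemp -d)
printf 'hello\n' > "$demo_dir/notes.txt"

# Plain ls: names only.
plain=$(ls "$demo_dir")
echo "$plain"

# -l switches to the long format (permissions, owner, size, date);
# -h prints large sizes with unit suffixes (K, M, G) instead of raw byte counts.
long=$(ls -lh "$demo_dir")
echo "$long"

# Clean up the scratch directory.
rm -r "$demo_dir"
```

Flags can be combined behind a single dash, so ls -lh and ls -l -h do the same thing.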
In order to find out how to use a command, you can always type man followed by the name of the command you'd like to use, for example man ls. That will give you the (man)ual for that command.
LS(1) BSD General Commands Manual LS(1)
NAME
ls -- list directory contents
SYNOPSIS
ls [-ABCFGHLOPRSTUW@abcdefghiklmnopqrstuwx1] [file ...]
DESCRIPTION
For each operand that names a file of a type other than directory, ls displays its name as well as any requested, associated information.
For each operand that names a file of type directory, ls displays the names of files contained within that directory, as well as any
requested, associated information.
If no operands are given, the contents of the current directory are displayed. If more than one operand is given, non-directory operands
are displayed first; directory and non-directory operands are sorted separately and in lexicographical order.
The following options are available:
-@ Display extended attribute keys and sizes in long (-l) output.
-1 (The numeric digit ``one''.) Force output to be one entry per line. This is the default when output is not to a terminal.
-A List all entries except for . and ... Always set for the super-user.
-a Include directory entries whose names begin with a dot (.).
-B Force printing of non-printable characters (as defined by ctype(3) and current locale settings) in file names as \xxx, where xxx
is the numeric value of the character in octal.
-b As -B, but use C escape codes whenever possible.
-C Force multi-column output; this is the default when output is to a terminal.
-c Use time when file status was last changed for sorting (-t) or long printing (-l).
-d Directories are listed as plain files (not searched recursively).
-e Print the Access Control List (ACL) associated with the file, if present, in long (-l) output.
-F Display a slash (`/') immediately after each pathname that is a directory, an asterisk (`*') after each that is executable, an at
sign (`@') after each symbolic link, an equals sign (`=') after each socket, a percent sign (`%') after each whiteout, and a ver-
tical bar (`|') after each that is a FIFO.
-f Output is not sorted. This option turns on the -a option.
-G Enable colorized output. This option is equivalent to defining CLICOLOR in the environment. (See below.)
That way you can experiment with commands and try out their various options.
P.S.: To quit the man page you can just type "q".
Command prompts are simple and are a language of their own. That means that unless you are precise and accurate about what you want to do, it will get done poorly!
Let's say that you forget a space between your command and the modifier you are using.
campus-009-192:~ eoziolor$ ls-F
-bash: ls-F: command not found
Bash has no idea what you mean and is likely not going to try to guess what it is. This is why you should always double, triple, quadruple check scripts that you write. One letter off can throw out a whole pipeline and sometimes you might not even know that it's happening.
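If you want to check this from inside a script, one option is to look at the exit status of the command. This is a sketch and uses $?, which has not been introduced above: it holds the exit status of the last command, where 0 means success and 127 is what the shell reports when it cannot find a command. The variable names are made up.

```shell
# Without the space, the shell looks for a program literally named "ls-F".
status_without_space=0
ls-F 2>/dev/null || status_without_space=$?

# With the space, ls is the command and -F is its option.
status_with_space=0
ls -F / >/dev/null || status_with_space=$?

echo "without space: $status_without_space, with space: $status_with_space"
```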
The biggest advantage of using the command line is that you know exactly what you are doing. This is not a black box in which a program magically analyzes your data and spits out results. If you're here, you usually know what your data looks like and what format it needs to take to be accepted by the various unix command-line programs.
Another big advantage of unix is that it uses pipes (|). A pipe is the way for unix to pass the output from one command directly into another, so that you don't have to save any intermediate files. We will chat more about this later, but overall that simplifies analyses immensely.
When you open the terminal you are immediately placed in a location on your computer. You can find where you are by running the command pwd (print working directory).
campus-009-192:~ eoziolor$ pwd
/Users/eoziolor
The home directory path will look different on different operating systems. On Linux it may look like /home/nelle, and on Windows it will be similar to C:\Documents and Settings\nelle or C:\Users\nelle.
You can navigate between the folders on your system using the command cd (change directory).
Try using it to go into the Desktop folder.
campus-009-192:~ eoziolor$ cd Desktop/
campus-009-192:Desktop eoziolor$ pwd
/Users/eoziolor/Desktop
campus-009-192:Desktop eoziolor$
Now in order to get out of a directory you can type cd followed by two periods. That will put you back in the parent directory, the one that contains the directory you are in.
campus-009-192:Desktop eoziolor$ cd ..
campus-009-192:~ eoziolor$ pwd
/Users/eoziolor
campus-009-192:~ eoziolor$
Let's say you wanted to move between directories that are further apart. You can chain several steps into one path, separating them with slashes.
campus-009-192:~ eoziolor$ cd Documents/UCD/
campus-009-192:UCD eoziolor$ pwd
/Users/eoziolor/Documents/UCD
campus-009-192:UCD eoziolor$ cd ../../
campus-009-192:~ eoziolor$ pwd
/Users/eoziolor
campus-009-192:~ eoziolor$
Viewing hidden files
There are also files on your system that do not appear in a normal listing. They are called hidden files (their names start with a dot) and you can reveal them by adding the -a flag to ls.
campus-009-192:~ eoziolor$ ls -a
. .gitconfig .zoomus angus_private_key
.. .ipynb_checkpoints 2018-setacna-rnaseq awscli-bundle
.CFUserTextEncoding .ipython Applications fgfh_post
.DS_Store .jupyter Desktop jobs
.Rhistory .matplotlib Documents learn_python
.Trash .oracle_jre_usage Downloads phgenome_post
.atom .python_history GIGAIII_bioinformatics_workshop phpopg
.bash_history .rstudio-desktop Library popgen_class
.bash_profile .ssh Movies power
.bash_sessions .subversion Music training
.conda .swp Pictures
.cups .viminfo
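To see the difference on something smaller than a whole home directory, here is a sketch that makes a scratch directory with one regular file and one hidden file. The file names visible.txt and .hidden_config are invented for the example.

```shell
# Scratch directory with one regular file and one dot file (hypothetical names).
demo_dir=$(mktemp -d)
touch "$demo_dir/visible.txt" "$demo_dir/.hidden_config"

# The default listing skips names that start with a dot.
default_listing=$(ls "$demo_dir")

# -a includes dot files, plus the special entries . and ..
all_listing=$(ls -a "$demo_dir")

echo "ls:"
echo "$default_listing"
echo "ls -a:"
echo "$all_listing"

rm -r "$demo_dir"
```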
There are two ways of getting to a folder. One way is to specify the absolute location of the folder, starting from the root directory /:
campus-009-192:~ eoziolor$ cd /Users/eoziolor/Desktop/
campus-009-192:Desktop eoziolor$ pwd
/Users/eoziolor/Desktop
This will get you to the folder location no matter your starting location.
On the other hand, if you don't want to type the full absolute path every time, you can specify files/folders by their location relative to the directory you are currently in.
campus-009-192:~ eoziolor$ pwd
/Users/eoziolor
campus-009-192:~ eoziolor$ ls
2018-setacna-rnaseq Library awscli-bundle popgen_class
Applications Movies fgfh_post power
Desktop Music jobs training
Documents Pictures learn_python
Downloads Public phgenome_post
GIGAIII_bioinformatics_workshop angus_private_key phpopg
campus-009-192:~ eoziolor$ cd Desktop/
campus-009-192:Desktop eoziolor$ pwd
/Users/eoziolor/Desktop
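As a quick check that both kinds of path lead to the same place, the sketch below builds a tiny made-up directory tree (project/data inside a scratch directory) and reaches it once with an absolute path and once with relative ones.

```shell
# Build a small hypothetical tree: <scratch>/project/data
scratch=$(mktemp -d)
mkdir -p "$scratch/project/data"
start_dir=$(pwd)

# Absolute path: starts with / and works from any starting location.
cd "$scratch/project/data"
via_absolute=$(pwd)

# Relative paths: interpreted from the directory you are currently in.
cd ../..            # up two levels, back to the scratch directory
cd project/data     # down again without typing the full path
via_relative=$(pwd)

echo "$via_absolute"
echo "$via_relative"

# Return to where we started and clean up.
cd "$start_dir"
rm -r "$scratch"
```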
You can use the command mkdir (make directory) to make a folder in the directory you are in.
campus-009-192:~ eoziolor$ mkdir blabla
campus-009-192:~ eoziolor$ ls
2018-setacna-rnaseq Library awscli-bundle phpopg
Applications Movies blabla popgen_class
Desktop Music fgfh_post power
Documents Pictures jobs training
Downloads Public learn_python
GIGAIII_bioinformatics_workshop angus_private_key phgenome_post
If you would like to delete a folder, you can use the command rmdir (remove directory), but that only works if there is nothing in the folder.
To delete a folder with contents, type rm -rf followed by the directory name. The -r flag removes the folder and everything inside it, and -f does so without asking for confirmation. BE VERY CAREFUL with this! You cannot undo this! It does not go into trash, it just disappears FOREVER!
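A safe way to get a feel for these commands is to try them inside a scratch directory first. This sketch uses made-up folder names (empty_dir, full_dir) and only ever deletes things it created itself.

```shell
# Everything happens inside a throwaway directory.
scratch=$(mktemp -d)
mkdir "$scratch/empty_dir" "$scratch/full_dir"
touch "$scratch/full_dir/file.txt"

# rmdir succeeds on an empty directory...
rmdir "$scratch/empty_dir"

# ...but refuses to remove one that still has contents.
if rmdir "$scratch/full_dir" 2>/dev/null; then
    rmdir_result="removed"
else
    rmdir_result="refused"
fi
echo "rmdir on a non-empty directory: $rmdir_result"

# Listing the target first is a cheap check before a recursive delete.
ls "$scratch/full_dir"
rm -r "$scratch/full_dir"

# Clean up the scratch directory itself.
rm -r "$scratch"
```

If you want rm to ask before each deletion, you can add the -i flag (rm -ri) instead of -f.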
Here are some rules of thumb for creating new folders:
- don't put spaces in the name - bash treats a space as a separator between arguments
- name them something simple that you can remember
- avoid capital letters - bash is case-sensitive, so they are easy to mistype
- create structure in your directories
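To see why spaces cause trouble, here is a sketch with invented folder names. Without quotes, bash hands mkdir two separate names. With quotes, the space becomes part of a single name that you then have to quote every time you use it.

```shell
start_dir=$(pwd)
scratch=$(mktemp -d)
cd "$scratch"

# Unquoted: bash splits on the space, so this makes TWO folders, "my" and "data".
mkdir my data

# Quoted: one folder whose name contains a space.
mkdir "my results"

# An underscore avoids the problem entirely.
mkdir my_results

# Four folders now exist: data, my, "my results", my_results.
ls
dir_count=$(ls | wc -l | tr -d ' ')   # tr -d ' ' strips padding some systems add
echo "$dir_count"

cd "$start_dir"
rm -r "$scratch"
```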
In a similar manner you can create a new, empty file with the command touch.
campus-030-034:~ eoziolor$ touch document.txt
campus-030-034:~ eoziolor$ ls
2018-setacna-rnaseq Library awscli-bundle phgenome_post
Applications Movies blabla phpopg
Desktop Music document.txt popgen_class
Documents Pictures fgfh_post power
Downloads Public jobs setac_private_key
GIGAIII_bioinformatics_workshop angus_private_key learn_python training
campus-030-034:~ eoziolor$
Do not open vim unless you're ready
Let's start by opening the document we created with the text editor nano
nano document.txt
Now you can type anything that you want within that document.
hello
everyone
what
is
going
on
I will close the document by pressing Ctrl+X and following the prompt to save the changes.
Now we have a couple of options for looking at this document. We can use the command less, often employed for bigger documents.
Advantages of less:
- only opens a chunk of the document to fill a page
- you can scroll up and down the document
Disadvantages of less:
- can't do much more than that in terms of interacting with the text
The other option we have is cat. Advantages of cat:
- prints the whole text, so its output can be passed to many other programs
- can read gzip-compressed text with its sister program zcat
- just cool AF
Disadvantages of cat:
- your terminal will go crazy if you try to open a large document - press Ctrl+C to stop it if it gets out of hand
- you don't get GUI-like interaction with the text in your document
Let's start by using cat here:
cat document.txt
hello
everyone
what
is
going
on
Piping is one of the most useful things in bash scripting. Unix tools are designed so that each one does a small job on its input, and these small commands can be chained together into a pipeline.
The pipe means that I will do something to a file and pass the output of that to another program, which then does its own work on it. The pipe symbol is | and works as follows:
cat document.txt | head -n 3
hello
everyone
what
In this case what I'm doing is printing our file and piping the output into the head command, which prints only the first 3 lines.
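Pipes are not limited to two commands. As a sketch, the snippet below recreates the six lines of our document with printf (so it runs even if document.txt is not around) and then chains a second pipe onto the first. The variable names are made up.

```shell
# printf writes six lines; head keeps only the first three.
first_three=$(printf 'hello\neveryone\nwhat\nis\ngoing\non\n' | head -n 3)
echo "$first_three"

# Pipes chain: keep the first three lines, then count them with wc -l.
# (tr -d ' ' strips the padding that some systems add around the number.)
line_count=$(printf 'hello\neveryone\nwhat\nis\ngoing\non\n' | head -n 3 | wc -l | tr -d ' ')
echo "$line_count"
```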
I can use an abundance of commands to now manipulate this text. Let's look at a slightly more complicated document and see what we can do.
Let's quickly download a small fasta file containing the Fundulus heteroclitus transcriptome and play with it:
curl -O ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/826/765/GCF_000826765.1_Fundulus_heteroclitus-3.0.2/GCF_000826765.1_Fundulus_heteroclitus-3.0.2_rna.fna.gz
If you want to rename a file you can use mv (move)
mv GCF_000826765.1_Fundulus_heteroclitus-3.0.2_rna.fna.gz fhet.tr.fna.gz
Now let's look at the top of the file
zcat < fhet.tr.fna.gz | head
>NM_001309911.1 Fundulus heteroclitus low choriolytic enzyme-like (LOC105918466), mRNA
GGAAGGAAAAAATGGATCTCCAAGCACGAGCCTTGCTTCTGCTCCTGCTGCTTTCAGCCGTCTGTAATGCTTACCCCACA
GATAATTACAAAGCAGATGACGAAAACTCAGAGAAGGAGGACATCACAACCACTATCCTCAGAATGAACAATGGATCTGC
CGATATGCTGTTTGAAGGAGACGTTTTTGTTCCAAGATCCCGGACTGCCAAGAAGTGCCTTGATCCACGTTACAGCTGTT
TCTGGCCAAAGTCTTCAAATGGGAATGTGGAAATCCCTTTTGTTTTAAGTGACGAATATGATCACAACGAGAAGAATCAG
ATTCTCAAAGCCATGAAGGGCTTTGAGGGTAGAACCTGCATCCGCTTTGTTCGTCATAGAGGAGAGAGGGCGTACCTGAG
CATTGAGTCCAAATTTGGCTGTTTCTCTTTGATGGGTCGTTCTGGAGAAAGGCAGCTTGTGTCTCTGCAGAGACCCGGTT
GTTTAAATAATGGCATCATCCAGCATGAGCTGCTCCACGCTATGGGTTTCTACCACGAACACACTCGCAGCGACCGTGAC
AAATATGTCAAAATCAACTGGGATAACATACAAGAATATTATTATAAAAACTTCAAAAAAATGGACACAGACAATCTCAC
CCCATATGACTACTCCTCTGTGATGCAATATGGAAAAACTGCCTTTGGAAAGAACAGGGCAGAATCCATCACTCCTATCC
Here's your normal fasta output. Now let's try to get some quick stats out of this.
zcat < fhet.tr.fna.gz | grep -c "^>"
41170
What I told the terminal to do is run grep, which finds lines matching a certain pattern in the document, and count (-c) how many lines match.
In this case I have counted the number of lines that begin with >, which is the header line of each record in a fasta document (the sequence itself can span many lines after it). What that tells me is that there are 41170 transcripts represented in this file.
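The same idea on a toy file makes it easier to check the answer by eye. This sketch writes a made-up two-record fasta file (seq1 and seq2 are invented names) and counts header lines and sequence lines separately.

```shell
# A tiny hypothetical fasta file: two records, three sequence lines in total.
fasta=$(mktemp)
printf '>seq1 first record\nACGT\nACGT\n>seq2 second record\nGGCC\n' > "$fasta"

# ^ anchors the pattern to the start of the line, so only header lines match.
header_count=$(grep -c '^>' "$fasta")
echo "$header_count"

# Adding -v inverts the match, so this counts the lines that are NOT headers.
sequence_lines=$(grep -vc '^>' "$fasta")
echo "$sequence_lines"

rm "$fasta"
```

Keep the quotes around the pattern: an unquoted > would be read by bash as an output redirection.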
Notice that the > line shown above starts with NM_. What if we want to get rid of the underscores and replace them with dots? We can do this with tr, which swaps one character for another everywhere in its input.
zcat < fhet.tr.fna.gz | tr "_" "." | head
>NM.001309911.1 Fundulus heteroclitus low choriolytic enzyme-like (LOC105918466), mRNA
GGAAGGAAAAAATGGATCTCCAAGCACGAGCCTTGCTTCTGCTCCTGCTGCTTTCAGCCGTCTGTAATGCTTACCCCACA
GATAATTACAAAGCAGATGACGAAAACTCAGAGAAGGAGGACATCACAACCACTATCCTCAGAATGAACAATGGATCTGC
CGATATGCTGTTTGAAGGAGACGTTTTTGTTCCAAGATCCCGGACTGCCAAGAAGTGCCTTGATCCACGTTACAGCTGTT
TCTGGCCAAAGTCTTCAAATGGGAATGTGGAAATCCCTTTTGTTTTAAGTGACGAATATGATCACAACGAGAAGAATCAG
ATTCTCAAAGCCATGAAGGGCTTTGAGGGTAGAACCTGCATCCGCTTTGTTCGTCATAGAGGAGAGAGGGCGTACCTGAG
CATTGAGTCCAAATTTGGCTGTTTCTCTTTGATGGGTCGTTCTGGAGAAAGGCAGCTTGTGTCTCTGCAGAGACCCGGTT
GTTTAAATAATGGCATCATCCAGCATGAGCTGCTCCACGCTATGGGTTTCTACCACGAACACACTCGCAGCGACCGTGAC
AAATATGTCAAAATCAACTGGGATAACATACAAGAATATTATTATAAAAACTTCAAAAAAATGGACACAGACAATCTCAC
CCCATATGACTACTCCTCTGTGATGCAATATGGAAAAACTGCCTTTGGAAAGAACAGGGCAGAATCCATCACTCCTATCC
This becomes very useful, for example, if you have a comma separated value document and want to turn it into a tab separated value document. In that case you can pipe the document through the command:
tr "," "\t"
and now you have a tab separated value document.
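Since tr only reads from standard input, you need to feed it the file and send the result somewhere. Here is a sketch of the full round trip using temporary files and made-up table contents.

```shell
# A small hypothetical CSV file.
csv_file=$(mktemp)
tsv_file=$(mktemp)
printf 'gene,length\nabc1,1200\n' > "$csv_file"

# < feeds the file to tr, > saves what tr prints into a new file.
tr ',' '\t' < "$csv_file" > "$tsv_file"

cat "$tsv_file"
converted=$(cat "$tsv_file")

rm "$csv_file" "$tsv_file"
```

Note that tr replaces every comma it sees, so this is only safe for simple files where commas never appear inside a field.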
Word count (wc) is a useful tool and we can use it in a variety of ways.
zcat < fhet.tr.fna.gz | grep -v "^>" | wc -m
131885065
What I've done above is use grep -v to remove any line that begins with > (the ^ in the pattern means "start of line"), leaving only the lines that have genetic code in them and not the names of each entry. Then I've told wc to give me a count of all the characters (-m) present in those lines, which gives the approximate size of the transcriptome (~131Mb). This is a slight overestimate, because wc -m also counts the newline character at the end of each line.
I suggest using man or --help to explore these commands further!
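If you want the count without the newlines, one option is to delete them with tr -d before counting. The sketch below shows the difference on a made-up fasta file with six bases spread over two lines.

```shell
# Hypothetical record: 4 + 2 = 6 bases on two sequence lines.
fasta=$(mktemp)
printf '>seq1\nACGT\nAC\n' > "$fasta"

# wc -m counts every character, including the newline that ends each line.
with_newlines=$(grep -v '^>' "$fasta" | wc -m | tr -d ' ')

# Deleting the newlines first leaves only the sequence characters.
bases_only=$(grep -v '^>' "$fasta" | tr -d '\n' | wc -m | tr -d ' ')

echo "with newlines: $with_newlines, bases only: $bases_only"

rm "$fasta"
```

The same extra tr -d "\n" step could be added to the pipeline on the real transcriptome file.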