STAT 29000: Project 3 — Fall 2020
Motivation: The ability to navigate a shell, like bash
, and use some of its powerful tools, is very useful. The number of disciplines utilizing data in new ways is ever-growing, and as such, it is very likely that many of you will eventually encounter a scenario where knowing your way around a terminal will be useful. We want to expose you to some of the most useful bash
tools, help you navigate a filesystem, and even run bash
tools from within an RMarkdown file in RStudio.
Context: At this point in time, you will each have varying levels of familiarity with Scholar. In this project we will learn how to use the terminal to navigate a UNIX-like system, experiment with various useful commands, and learn how to execute bash commands from within RStudio in an RMarkdown file.
Scope: bash, RStudio
There are a variety of ways to connect to Scholar. In this class, we will primarily connect to RStudio Server by opening a browser and navigating to rstudio.scholar.rcac.purdue.edu/, entering credentials, and using the excellent RStudio interface.
Here is a video to remind you about some of the basic tools you can use in UNIX/Linux:
This is the easiest book for learning this stuff; it is short and gets right to the point:
You just log in and you can see it all; we suggest Chapters 1, 3, 4, 5, 7 (you can basically skip chapters 2 and 6 the first time through).
It is a very short read (maybe, say, 2 or 3 hours altogether?), just a thin book that gets right to the details.
Questions
Question 1
Navigate to rstudio.scholar.rcac.purdue.edu/ and login. Take some time to click around and explore this tool. We will be writing and running Python, R, SQL, and bash
all from within this interface. Navigate to Tools > Global Options …
. Explore this interface and make at least 2 modifications. List what you changed.
Here are some changes Kevin likes:
-
Uncheck "Restore .Rdata into workspace at startup".
-
Change tab width 4.
-
Check "Soft-wrap R source files".
-
Check "Highlight selected line".
-
Check "Strip trailing horizontal whitespace when saving".
-
Uncheck "Show margin".
(Dr Ward does not like to customize his own environment, but he does use the emacs key bindings: Tools > Global Options > Code > Keybindings, but this is only recommended if you already know emacs.)
-
List of modifications you made to your Global Options.
Question 2
There are four primary panes, each with various tabs. In one of the panes there will be a tab labeled "Terminal". Click on that tab. This terminal by default will run a bash
shell right within Scholar, the same as if you connected to Scholar using ThinLinc, and opened a terminal. Very convenient!
What is the default directory of your bash shell?
Start by reading the section on |
# read the manual for the `man` command
# use "k" or the up arrow to scroll up, "j" or the down arrow to scroll down
man man
-
The full filepath of default directory (home directory). Ex: Kevin’s is:
/home/kamstut
-
The
bash
code used to show your home directory or current directory (also known as the working directory) when thebash
shell is first launched.
Question 3
Learning to navigate away from our home directory to other folders, and back again, is vital. Perform the following actions, in order:
-
Write a single command to navigate to the folder containing our full datasets:
/class/datamine/data
. -
Write a command to confirm you are in the correct folder.
-
Write a command to list the files and directories within the data directory. (You do not need to recursively list subdirectories and files contained therein.) What are the names of the files and directories?
-
Write another command to return back to your home directory.
-
Write a command to confirm you are in the correct folder.
Note: /
is commonly referred to as the root directory in a linux/unix filesystem. Think of it as a folder that contains every other folder in the computer. /home
is a folder within the root directory. /home/kamstut
is the full filepath of Kevin’s home directory. There is a folder home
inside the root directory. Inside home
is another folder named kamstut
which is Kevin’s home directory.
-
Command used to navigate to the data directory.
-
Command used to confirm you are in the data directory.
-
Command used to list files and folders.
-
List of files and folders in the data directory.
-
Command used to navigate back to the home directory.
-
Command used to confirm you are in the home directory.
Question 4
Let’s learn about two more important concepts. .
refers to the current working directory, or the directory displayed when you run pwd
. Unlike pwd
you can use this when navigating the filesystem! So, for example, if you wanted to see the contents of a file called my_file.txt
that lives in /home/kamstut
(so, a full path of /home/kamstut/my_file.txt
), and you are currently in /home/kamstut
, you could run: cat ./my_file.txt
.
..
represents the parent folder or the folder in which your current folder is contained. So let’s say I was in /home/kamstut/projects/
and I wanted to get the contents of the file /home/kamstut/my_file.txt
. You could do: cat ../my_file.txt
.
When you navigate a directory tree using .
and ..
you create paths that are called relative paths because they are relative to your current directory. Alternatively, a full path or (absolute path) is the path starting from the root directory. So /home/kamstut/my_file.txt
is the absolute path for my_file.txt
and ../my_file.txt
is a relative path. Perform the following actions, in order:
-
Write a single command to navigate to the data directory.
-
Write a single command to navigate back to your home directory using a relative path. Do not use
~
or thecd
command without a path argument.
-
Command used to navigate to the data directory.
-
Command used to navigate back to your home directory that uses a relative path.
Question 5
In Scholar, when you want to deal with really large amounts of data, you want to access scratch (you can read more here). Your scratch directory on Scholar is located here: /scratch/scholar/$USER
. $USER
is an environment variable containing your username. Test it out: echo /scratch/scholar/$USER
. Perform the following actions:
-
Navigate to your scratch directory.
-
Confirm you are in the correct location.
-
Execute
myquota
. -
Find the location of the
myquota
bash script. -
Output the first 5 and last 5 lines of the bash script.
-
Count the number of lines in the bash script.
-
How many kilobytes is the script?
You could use each of the commands in the relevant topics once. |
When you type |
Commands often have options. Options are features of the program that you can trigger specifically. You can see the options of a command in the |
# using the default wc command. "/class/datamine/data/flights/1987.csv" is the first "argument" given to the command.
wc /class/datamine/data/flights/1987.csv
# to count the lines, use the -l option
wc -l /class/datamine/data/flights/1987.csv
# to count the words, use the -w option
wc -w /class/datamine/data/flights/1987.csv
# you can combine options as well
wc -w -l /class/datamine/data/flights/1987.csv
# some people like to use a single tack `-`
wc -wl /class/datamine/data/flights/1987.csv
# order doesn't matter
wc -lw /class/datamine/data/flights/1987.csv
The |
-
Command used to navigate to your scratch directory.
-
Command used to confirm your location.
-
Output of
myquota
. -
Command used to find the location of the
myquota
script. -
Absolute path of the
myquota
script. -
Command used to output the first 5 lines of the
myquota
script. -
Command used to output the last 5 lines of the
myquota
script. -
Command used to find the number of lines in the
myquota
script. -
Number of lines in the script.
-
Command used to find out how many kilobytes the script is.
-
Number of kilobytes that the script takes up.
Question 6
Perform the following operations:
-
Navigate to your scratch directory.
-
Copy and paste the file:
/class/datamine/data/flights/1987.csv
to your current directory (scratch). -
Create a new directory called
my_test_dir
in your scratch folder. -
Move the file you copied to your scratch directory, into your new folder.
-
Use
touch
to create an empty file namedim_empty.txt
in your scratch folder. -
Remove the directory
my_test_dir
and the contents of the directory. -
Remove the
im_empty.txt
file.
|
-
Command used to navigate to your scratch directory.
-
Command used to copy the file,
/class/datamine/data/flights/1987.csv
to your current directory (scratch). -
Command used to create a new directory called
my_test_dir
in your scratch folder. -
Command used to move the file you copied earlier
1987.csv
into your newmy_test_dir
folder. -
Command used to create an empty file named
im_empty.txt
in your scratch folder. -
Command used to remove the directory and the contents of the directory
my_test_dir
. -
Command used to remove the
im_empty.txt
file.
Question 7
Please include a statement in Project 3 that says, "I acknowledge that the STAT 19000/29000/39000 1-credit Data Mine seminar will be recorded and posted on Piazza, for participants in this course." or if you disagree with this statement, please consult with us at datamine@purdue.edu for an alternative plan.