Engineering productivity tips study guide


Working in groups with Git

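Overview Git is a version control system (VCS) that tracks changes to the files of a given repository. In particular, it is useful for keeping track of file versions, working in parallel thanks to branches, and backing up files on a remote server.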


Getting started The table below summarizes the commands to start a new project, depending on whether or not the repository already exists:

Case                       Action                                   Command
No existing repository     Initialize repository from local folder  git init
Repository already exists  Copy repository from remote to local     git clone path/to/address.git
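Both cases can be tried safely in a throwaway folder. This is a sketch that assumes git is installed. The folder names are hypothetical, and a local path stands in for the remote address path/to/address.git.

```shell
# Work in a temporary folder so that no real project is touched
workdir=$(mktemp -d)

# Case 1: no existing repository, so initialize one from a local folder
mkdir "$workdir/my_project"
cd "$workdir/my_project"
git init -q                      # creates the hidden .git/ folder

# Case 2: the repository already exists, so copy it with git clone
# (cloning a repository that has no commits yet still succeeds)
git clone -q "$workdir/my_project" "$workdir/my_copy"
```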


File check-in Modifications to the repository, whether a file is modified, added or deleted, are tracked through the following steps:

Step                                                   Command
1. Add modified, new, or deleted file to staging area  git add file
2. Save snapshot along with descriptive message        git commit -m 'description'

Remark 1: git add . stages all changes (modified, new and deleted files) in the current directory and its subdirectories.
Remark 2: files that we do not want to track can be listed in the .gitignore file.
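The two steps and both remarks fit in a few lines. This sketch assumes git is installed. The file names and the committer identity are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
# Hypothetical identity, stored in this repository's config only
git config user.name "Example User"
git config user.email "user@example.com"

# Files we do not want to track are listed in .gitignore
printf '%s\n' '*.log' > .gitignore
echo "print('hello')" > main.py
echo "debug output" > run.log

# Step 1: stage every change that is not ignored
git add .
# Step 2: save the snapshot with a descriptive message
git commit -q -m 'Add main script and .gitignore'

git ls-files    # tracked files: .gitignore and main.py (run.log is ignored)
```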


Sync with remote The following commands enable changes to be synchronized between remote and local machines:

Action                                                  Command
Fetch latest changes from remote branch and merge them  git pull origin name_of_branch
Push latest local changes to remote branch              git push origin name_of_branch
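A local bare repository can play the role of origin, which makes the push/pull round trip easy to try. In this sketch, git is assumed to be installed and two hypothetical clones, alice and bob, share the remote. The branch name is read from git rather than assumed, since the default (master or main) depends on the git configuration.

```shell
workdir=$(mktemp -d)
git init -q --bare "$workdir/remote.git"       # stands in for the remote server

# Alice clones the remote, commits and pushes
git clone -q "$workdir/remote.git" "$workdir/alice"
cd "$workdir/alice"
git config user.name "Alice"
git config user.email "alice@example.com"
branch=$(git symbolic-ref --short HEAD)        # name of the current branch
echo "v1" > notes.txt
git add notes.txt
git commit -q -m 'Add notes'
git push -q origin "$branch"

# Bob clones the now non-empty remote
git clone -q "$workdir/remote.git" "$workdir/bob"

# Alice pushes a second change
echo "v2" >> notes.txt
git commit -q -am 'Update notes'
git push -q origin "$branch"

# Bob fetches the change and merges it into his copy
cd "$workdir/bob"
git pull -q origin "$branch"
```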


Parallel workstreams In order to make changes that do not interfere with the current branch, we can create another branch name_of_new_branch as follows:

git checkout -b name_of_new_branch   # Create the branch and switch to it

Depending on whether we want to incorporate or discard the branch, we have the following commands:

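Action                                    Command
Merge name_of_branch into current branch  git merge name_of_branch
Remove name_of_branch                     git branch -D name_of_branch

Remark: -D deletes the branch even if its changes have not been merged, while git branch -d only deletes branches that are already merged.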
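Here is the full branch life cycle on a toy repository: create a branch, commit on it, merge it back and delete it. This sketch assumes git is installed and uses a hypothetical identity and a hypothetical branch name, feature.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "base" > file.txt
git add file.txt
git commit -q -m 'Initial commit'
main=$(git symbolic-ref --short HEAD)    # master or main, depending on config

# Work on a separate branch without affecting the current one
git checkout -q -b feature
echo "feature work" >> file.txt
git commit -q -am 'Add feature work'

# Switch back, incorporate the branch, then remove it
git checkout -q "$main"
git merge -q feature
git branch -q -d feature                 # -d is enough: feature is merged
```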


Tracking status We can inspect the current state and the history of the repository with the following commands:

Action                                    Command
Check status of modified file(s)          git status
View last commits                         git log --oneline
Compare changes made between two commits  git diff commit_1 commit_2
View list of local branches               git branch
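The following sketch runs the four commands on a toy repository with two commits and one pending edit. It assumes git is installed and uses a hypothetical identity.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "a" > file.txt
git add file.txt
git commit -q -m 'First commit'
echo "b" >> file.txt
git commit -q -am 'Second commit'
echo "c" >> file.txt          # modified, not staged yet

git status --short            # " M file.txt": modified in the working tree
git log --oneline             # one line per commit, most recent first
git diff HEAD~1 HEAD          # what the second commit changed ("+b")
git branch                    # local branches, the current one marked with *
```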


Canceling changes Canceling changes is done differently depending on the situation that we are in. The table below sums up the most common cases:

Case       Action                               Command
Unstaged   Revert file to state in last commit  git checkout -- file
Staged     Remove file from staging area        git reset HEAD file
Committed  Go back to a previous commit         git reset --hard prev_commit

Remark: git reset --hard also discards every uncommitted change in the working tree, so it should be used with care.
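The three cases can be replayed on a toy repository. This sketch assumes git is installed, and the identity, file name and contents are hypothetical. The last step rewrites history, so it is shown on a throwaway repository only.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m 'v1'
first=$(git rev-parse HEAD)           # hash of the commit to go back to
echo "v2" > file.txt
git commit -q -am 'v2'

# Unstaged: revert the file to its last committed state ("v2")
echo "oops" > file.txt
git checkout -- file.txt

# Staged: remove the file from the staging area, keeping the edit
echo "draft" > file.txt
git add file.txt
git reset -q HEAD file.txt
after_unstage=$(git status --short)   # " M file.txt": modified but unstaged
git checkout -- file.txt

# Committed: go back to the first commit, discarding later work
git reset -q --hard "$first"
```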


Project structure Keeping a consistent and logical project structure makes a project easier to navigate. One possible structure for a data science project is as follows:

my_project/
├── analysis/
│   ├── graphs/
│   └── notebooks/
├── data/
│   ├── query/
│   ├── raw/
│   └── processed/
├── modeling/
│   ├── methods/
│   ├── results/
│   └── tests/
└── README.md
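The folders above can be created in one go. In this minimal sketch, mkdir -p creates any missing parent folders, and the project is placed in a temporary folder for illustration.

```shell
root="$(mktemp -d)/my_project"

# One entry per leaf folder of the layout; parents are created automatically
for dir in analysis/graphs analysis/notebooks \
           data/query data/raw data/processed \
           modeling/methods modeling/results modeling/tests; do
  mkdir -p "$root/$dir"
done
touch "$root/README.md"
```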


Working with Bash

Basic terminal commands The table below sums up the most useful terminal commands:

Category         Action                                            Command
Exploration      Display list of files (including hidden ones)     ls (-a)
                 Show path to current directory                    pwd
                 Show content of file                              cat path_to_file
                 Show statistics of file (lines/words/bytes)       wc path_to_file
File management  Create new folder                                 mkdir folder_name
                 Change directory to folder                        cd path_to_folder
                 Create new empty file                             touch filename
                 Copy file (folder) from origin to destination     cp (-R) origin destination
                 Move file/folder from origin to destination       mv origin destination
                 Remove file (folder)                              rm (-R) path
Compression      Compress folder into file                         tar -czvf compressed.tar.gz folder
                 Uncompress file                                   tar -xzvf compressed.tar.gz
Miscellaneous    Display message                                   echo "message"
                 Overwrite / append file with command output       command > file.txt / command >> file.txt
                 Execute a given command with elevated privileges  sudo command
                 Connect to a remote machine                       ssh remote_machine_address
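A short session in a scratch folder chains several of these commands. The folder and file names are hypothetical, and the -v flag of tar is dropped here only to keep the output quiet.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir notes                              # create a new folder
echo "first line" > notes/todo.txt       # > creates or overwrites the file
echo "second line" >> notes/todo.txt     # >> appends to it
wc -l notes/todo.txt                     # line count of the file (2)
cp -R notes notes_backup                 # -R copies a folder recursively
mv notes_backup/todo.txt notes_backup/done.txt   # mv also renames
tar -czf notes.tar.gz notes              # compress the folder
rm -R notes                              # remove the original folder
tar -xzf notes.tar.gz                    # uncompress: notes/ is back
```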


Chaining The pipe | operator passes the output of one command as the input of the next one, which lets simple commands be chained into readable one-liners. A few common examples are summed up in the table below:

Action                             Command
Count number of files in a folder  ls path_to_folder | wc -l
Count number of lines in file      cat path_to_file | wc -l
Show last n commands executed      history | tail -n n
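Each pipe hands the output on its left to the command on its right, so chains can grow beyond two steps. This sketch uses a scratch folder with hypothetical file names. The history command is left out because it is mainly useful in an interactive shell.

```shell
workdir=$(mktemp -d)
touch "$workdir/b.txt" "$workdir/c.csv"
printf 'x\ny\nz\n' > "$workdir/a.txt"

n_files=$(ls "$workdir" | wc -l)                   # 3 entries
n_lines=$(cat "$workdir/a.txt" | wc -l)            # 3 lines
n_tail=$(cat "$workdir/a.txt" | tail -n 2 | wc -l) # three-step chain: 2 lines
echo "$n_files files, $n_lines lines, $n_tail lines kept by tail"
```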


Advanced search The find command searches for files that satisfy given conditions and can act on them if necessary. The general structure of the command is as follows:

find path_to_folder [conditions] [actions]

The possible conditions and actions are summarized in the table below:

Category  Action                                       Command
Filters   Certain names, wildcards accepted            -name 'certain_name'
          Certain file types (d/f for directory/file)  -type certain_type
          Certain file sizes (c/k/M/G for B/kB/MB/GB)  -size file_size
          Opposite of a given condition                -not [condition]
Actions   Delete selected files                        -delete
          Print selected files                         -print

Remark: the flags above can be combined to make a multi-condition search.
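Conditions placed one after the other must all hold, which is how a multi-condition search works. This sketch uses a scratch folder with hypothetical file names. The patterns are quoted so that the shell passes * to find instead of expanding it.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/logs/old"
echo "keep" > "$workdir/report.csv"
echo "tmp" > "$workdir/logs/run.log"
echo "tmp" > "$workdir/logs/old/run_1.log"

# Print the regular files whose name ends in .log (searched recursively)
find "$workdir" -type f -name '*.log' -print

# Delete the regular files that are NOT .csv files
find "$workdir" -type f -not -name '*.csv' -delete
```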


Changing permissions The following command changes the permissions of a given file (or, with -R, of a folder and its content):

chmod (-R) three_digits file

with three_digits being three digits that set the permissions of the owner of the file, of its group and of all other users, respectively.

Each digit is the sum of read (4), write (2) and execution (1) permissions. The most common values have the following meaning:

Representation  Binary  Digit  Explanation
---             000     0      No permission
r--             100     4      Only read permission
r-x             101     5      Both read and execution permissions
rw-             110     6      Both read and write permissions
rwx             111     7      Read, write and execution permissions

For instance, giving read, write and execution permissions to everyone for given_file is done by running the following command:

chmod 777 given_file

Remark: in order to change ownership of a file to a given user and group, we use the command chown user:group file.
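The digits are read in the order owner, group, others, so 754 gives rwx to the owner, r-x to the group and r-- to everyone else. The sketch below uses a hypothetical file name and checks the result with ls -l, whose first column shows the same letters.

```shell
workdir=$(mktemp -d)
touch "$workdir/run_job.sh"

# owner: 7 = rwx, group: 5 = r-x, others: 4 = r--
chmod 754 "$workdir/run_job.sh"

# First 10 characters of ls -l: file type, then the three permission triplets
perms=$(ls -l "$workdir/run_job.sh" | cut -c 1-10)
echo "$perms"    # -rwxr-xr--
```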


Terminal shortcuts The table below summarizes the main shortcuts when working with the terminal:

Action                               Shortcut
Search previous commands             Ctrl + R
Go to beginning / end of line        Ctrl + A / Ctrl + E
Remove everything after the cursor   Ctrl + K
Remove everything before the cursor  Ctrl + U
Clear terminal window                Ctrl + L


Automating tasks

Create aliases Shortcuts can be added to the ~/.bash_profile file with a line of the following form, which takes effect in new shells or after running source ~/.bash_profile:

alias shortcut="command"
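For example, a hypothetical shortcut ll for a detailed listing that includes hidden files could be defined as below. Running alias with only a name prints the stored definition, which is a quick way to check it. Aliases are expanded in interactive shells, which is where they are meant to be used.

```shell
# Hypothetical shortcut: "ll" lists all files, including hidden ones, in detail
alias ll="ls -la"

# Show what ll stands for
alias ll

# Once the line is in ~/.bash_profile, reload the file in the current shell with:
#   source ~/.bash_profile
```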


Bash scripts A Bash script is a file, conventionally named with the .sh extension, that is structured as follows:

#!/bin/bash

... [bash script] ...
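A minimal complete script looks as follows. The sketch writes a hypothetical greet.sh into a scratch folder, makes it executable with chmod +x and runs it. The first line (shebang) tells the system to run the file with /bin/bash, which is assumed to exist.

```shell
workdir=$(mktemp -d)

# Write the script; the quoted 'EOF' keeps $1 and $name from being expanded now
cat > "$workdir/greet.sh" <<'EOF'
#!/bin/bash
# Greets the name passed as first argument, or "world" by default
name="${1:-world}"
echo "Hello, $name!"
EOF

chmod +x "$workdir/greet.sh"           # make the script executable
output=$("$workdir/greet.sh" team)     # run it like any other command
echo "$output"                         # Hello, team!
```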


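Crontabs A crontab entry runs a command on a schedule described by five time fields: the minute (0-59), the hour (0-23), the day of the month (1-31), the month (1-12) and the day of the week (0-6, Sunday to Saturday), where * stands for any value. An entry has the following format:

  *         *         *         *         *       command
minute    hour       day      month      day
                   of month            of week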
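To make the field order concrete, the sketch below splits a hypothetical entry into its parts. The entry would run /home/user/backup.sh at 02:30 every Monday. The sketch only parses the line; actually scheduling it would be done by adding the entry with crontab -e.

```shell
# Hypothetical entry: minute 30, hour 2, any day of month, any month, Monday (1)
entry='30 2 * * 1 /home/user/backup.sh'

# read splits on whitespace; the last variable receives the rest (the command)
read -r minute hour day_of_month month day_of_week cmd <<EOF
$entry
EOF

echo "minute=$minute hour=$hour day_of_month=$day_of_month"
echo "month=$month day_of_week=$day_of_week cmd=$cmd"
```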


tmux The terminal multiplexer tmux runs tasks in the background and in parallel, inside sessions that keep running after we detach from them. The table below summarizes the main commands:

Category            Action                              Command
Session management  Open a new / last existing session  tmux / tmux attach
                    Leave current session               tmux detach
                    List all open sessions              tmux ls
                    Remove session_name                 tmux kill-session -t session_name
Window management   Open / close a window               Ctrl + B, then c / Ctrl + B, then &
                    Move to $n^{\textrm{th}}$ window    Ctrl + B, then digit n


Mastering editors

Vim Vim is a popular terminal editor enabling quick and easy file editing, which is particularly useful when connected to a server. The main commands to have in mind are summarized in the table below:

Category       Action                                            Command
File handling  Go to beginning / end of line                     0 / $
               Go to first / last line / $i^{\textrm{th}}$ line  gg / G / iG
               Go to previous / next word                        b / w
               Exit file with / without saving changes           Esc + :wq / Esc + :q!
Text editing   Copy n line(s), where $n\in\mathbb{N}$            nyy
               Paste line(s) previously copied                   p
Searching      Search for expression containing name_of_pattern  /name_of_pattern
               Next / previous occurrence of name_of_pattern     n / N
Replacing      Replace old with new, confirming each change      Esc + :%s/old/new/gc


Jupyter notebook Jupyter notebooks make it easy to write and run code interactively. The main shortcuts to have in mind are summarized in the table below:

Category             Action                                    Shortcut
Cell transformation  Transform selected cell to text / code    Click cell + m / y
                     Delete selected cell                      Click cell + dd
                     Add new cell below / above selected cell  Click cell + b / a
                     Undo cell deletion                        Click cell + z