Some Handy Unix Commands

Post date: Jun 25, 2013 6:01:56 PM

That's why I need this blog

I was tired of searching the interwebs for these common commands again and again. So, I thought this would be a good way to initiate my blog and keep track of these handy commands.

Opening a `tarball'

a.k.a. Uncompress and unarchive a File/Folder

tar xvzf filename.tar.gz

Creating a `tarball'

a.k.a. Compress and archive a File/Folder

tar czf new_archive_name.tar.gz /path/to/file_or_folder
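
To see the two tar commands working together, here is a small round trip in a scratch directory. The names (demo, notes.txt, demo.tar.gz) are made up for illustration. `-C' tells tar which directory to work in, and `tzf' lists an archive without extracting it.

```shell
# Scratch area; every name below is hypothetical.
work=$(mktemp -d)
mkdir -p "$work/demo"
echo "hello" > "$work/demo/notes.txt"

# Create: c = create, z = gzip, f = archive file name.
tar czf "$work/demo.tar.gz" -C "$work" demo

# List the contents without extracting (t = list).
tar tzf "$work/demo.tar.gz"

# Extract into a separate directory (x = extract).
mkdir "$work/out"
tar xzf "$work/demo.tar.gz" -C "$work/out"
cat "$work/out/demo/notes.txt"
```

Listing first is a cheap way to check what an unfamiliar tarball will unpack before it lands in your current directory.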

Finding a file

find <path> -name <filename>
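  • This command is recursive (it also looks in subfolders).
  • Shell wildcards such as * and ? may be used in the filename; quote the pattern so the shell does not expand it first. `-name' does not understand regular expressions; GNU and BSD find offer `-regex' for that.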
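
A quick sketch of the difference between a wildcard match and a regex match, using a throwaway directory. The file names are invented for the example, and `-regex' is assumed to be available (it is in GNU and BSD find, but it is not part of POSIX).

```shell
work=$(mktemp -d)
mkdir -p "$work/sub"
touch "$work/a.txt" "$work/sub/b.txt" "$work/sub/c.log"

# Quote the pattern so the shell hands it to find unexpanded.
find "$work" -name '*.txt' | sort

# -regex must match the whole path, not just the file name.
find "$work" -regex '.*/[bc]\.[a-z]*' | sort
```

The first command finds a.txt and sub/b.txt; the second finds sub/b.txt and sub/c.log.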

Symbolic Links

Create Symbolic Links

Navigate to the location where you wish to have the link created.

ln -s path/to/file/or/folder link-name

Create symbolic links for multiple files at a time

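cp -s /path/to/files/* /path/to/link/folder/
  • `-s' is a GNU cp option. The source paths must be absolute unless the links are created in the current directory.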

Delete Symbolic Links

unlink link-name
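
Here is the whole life cycle of a link in a scratch directory, as a rough illustration. The names target.txt and my-link are placeholders I picked; `readlink' prints where a link points.

```shell
work=$(mktemp -d)
cd "$work"
echo "data" > target.txt

ln -s target.txt my-link      # create the link
readlink my-link              # print where the link points
cat my-link                   # reading the link reads the target

unlink my-link                # remove the link only
ls                            # target.txt is still there
```

Removing the link never touches the file it pointed to.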

Find the number of processors in your Linux system

grep -c "^processor" /proc/cpuinfo
  • This counts logical processors. If your system uses hyper-threading, the result will be some multiple of the number of physical cores available.
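
If you want both numbers, something like the following may help. It is only a sketch: it assumes a Linux /proc/cpuinfo that carries the `physical id' and `core id' fields, which is typical on x86 but not guaranteed on other architectures (where the second command may simply print 0).

```shell
# Logical processors (hardware threads).
grep -c '^processor' /proc/cpuinfo

# Physical cores: count the unique (physical id, core id) pairs.
# Assumes both fields are present for every processor entry.
grep -E '^(physical id|core id)' /proc/cpuinfo | paste - - | sort -u | wc -l
```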

Find and Replace one liner

sed -i 's/findThis/replaceWith/g' filename
  • `-i' means in place (on the spot).
  • The command tells sed to find all instances (/g) of "findThis" and substitute (s/) them with "replaceWith" on the spot (-i) in "filename".
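
A tiny before-and-after on a scratch file, in case it helps to see it run. The file name and words are made up. This uses the GNU sed form of `-i'; BSD/macOS sed expects a suffix argument, as in sed -i '' 's/.../.../g' file.

```shell
work=$(mktemp -d)
printf 'red apple\nred car, red door\n' > "$work/colors.txt"

# Replace every "red" with "blue", editing the file directly (GNU sed).
sed -i 's/red/blue/g' "$work/colors.txt"
cat "$work/colors.txt"
```

Without the trailing `g', only the first match on each line would be replaced, so the second line would keep its "red door".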

Sort a file in ascending order

sort -g -k # filename
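  • `-k' means column number, starting from 1, i.e. there is no column `0'. Replace # with the column you want. The key runs from that column to the end of the line; write `-k #,#' to sort on that column alone.
  • `-g' means compare the column by general numeric value (decimals and scientific notation are understood), not alphanumerically.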

Sort a File in descending order

sort -g -r -k # filename
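  • `-k' means column number, starting from 1, i.e. there is no column `0'. Replace # with the column you want. The key runs from that column to the end of the line; write `-k #,#' to sort on that column alone.
  • `-g' means compare the column by general numeric value (decimals and scientific notation are understood), not alphanumerically.
  • `-r' reverses the order (descending).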
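
Both sort directions on a small made-up file, sorting on the second column only. The data is invented; the `1e2' entry is there to show that `-g' reads scientific notation as 100.

```shell
work=$(mktemp -d)
printf 'pear 10\napple 9\nfig 1e2\nkiwi 2.5\n' > "$work/fruit.txt"

# Ascending by column 2; -k2,2 limits the key to that column.
sort -g -k2,2 "$work/fruit.txt"

# Descending by column 2.
sort -g -r -k2,2 "$work/fruit.txt"
```

The ascending run starts with "kiwi 2.5" and ends with "fig 1e2"; the descending run is the reverse.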

Get unique values in a column

sort -u -k # filename
  • `-u' means keep only one line for each distinct value of the sort key.
  • `-k' means column number, starting from 1, i.e. there is no column `0'. Write `-k #,#' to compare that column alone.
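
Note that `sort -u' keeps whole lines. If you only want the distinct values themselves, one option is to cut the column out first. A small sketch with invented data, assuming the columns are separated by single spaces:

```shell
work=$(mktemp -d)
printf 'alice paris\nbob rome\ncarol paris\n' > "$work/people.txt"

# One full line is kept for each distinct value in column 2.
sort -u -k2,2 "$work/people.txt"

# Only the distinct values of column 2.
cut -d' ' -f2 "$work/people.txt" | sort -u
```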

Calculating the sum of a column

awk '{ total += $3 } END {print total}' filename
  • $3 = column 3 (start counting from 1); change 3 to whichever column number you want the sum for.
  • total = any variable

Calculating the average of a column

awk '{ total += $3 } END {print total/NR}' filename
  • $3 = column 3 (start counting from 1); change 3 to whichever column number you want the average for.
  • total = any variable
  • NR = number of records.
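
Both one-liners on a three-line sample file, so the numbers are easy to check by hand. The file and its contents are made up. The `NR > 0' guard is my addition; it avoids a division by zero on an empty file.

```shell
work=$(mktemp -d)
printf 'a x 10\nb y 20\nc z 30\n' > "$work/data.txt"

# Sum of column 3: prints 60.
awk '{ total += $3 } END { print total }' "$work/data.txt"

# Average of column 3: prints 20.
awk '{ total += $3 } END { if (NR > 0) print total / NR }' "$work/data.txt"
```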

Sync your folders/directories across servers

From your local machine:

rsync -chavzrP --stats <user>@<remote-host>:/path/to/remote/folder /path/to/local/storage
  • -c, --checksum skip based on checksum, not mod-time & size
  • -h, --human-readable output numbers in a human-readable format
  • -a, --archive archive mode; equals -rlptgoD (no -H,-A,-X)
  • -v, --verbose increase verbosity
  • -z, --compress compress file data during the transfer
  • -r, --recursive recurse into directories
  • -P same as --partial --progress
  • --partial keep partially transferred files
  • --progress show progress during transfer
  • --stats give some file-transfer stats
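
To try the flags without touching a server, rsync also works between two local directories. The sketch below uses made-up paths and adds `-n' (--dry-run) for a preview pass, which is worth doing before any large transfer.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/dst"
echo "one" > "$work/src/a.txt"
echo "two" > "$work/src/sub/b.txt"

# Preview only: -n (--dry-run) shows what would be transferred.
rsync -chavzP --stats -n "$work/src/" "$work/dst/"

# Real run. The trailing slash on src/ copies its contents,
# not the src directory itself.
rsync -chavzP --stats "$work/src/" "$work/dst/"
```

I left out `-r' here because `-a' already implies it.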

Download the contents of a webpage to a file

cURL is a software package consisting of a command-line tool and a library for transferring data using URL syntax.

curl -O <URL>
  • -o (lowercase o): the result is saved under the filename you provide on the command line.
  • -O (uppercase O): the filename is taken from the URL and used to store the result (for example, a URL ending in "somePage.htm" is saved as a file called "somePage.htm").
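
The difference between `-o' and `-O' can be tried offline, since curl also accepts file:// URLs (assuming your build has that protocol enabled, which is the usual default). The file names here are invented for the example.

```shell
work=$(mktemp -d)
cd "$work"
echo "<html>hi</html>" > source.htm

# -o: choose the output file name yourself.
curl -s -o saved.htm "file://$work/source.htm"

# -O: reuse the last part of the URL as the file name.
mkdir downloads && cd downloads
curl -s -O "file://$work/source.htm"
ls
```

The first call writes saved.htm; the second writes source.htm into the downloads folder. `-s' just silences the progress meter.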

Recursively md5sum(checksum) all files in a directory

find /path/to/dir/ -name '*.fastq' -type f -exec md5sum {} \; >> fastq_checksum.md5
  • The snippet above says, "find all the files (-type f) with the extension fastq (-name '*.fastq') in the `/path/to/dir/' directory, execute (-exec) the md5sum command on each file and append (>>) the checksum to the `fastq_checksum.md5' file."
  • `{}' is a placeholder for the file that is found. When a file is found, its name replaces this placeholder and the command is executed with that file name.
  • The `\' escapes the `;', which indicates the end of the command to be executed.

To check the files against the saved checksums

md5sum -c fastq_checksum.md5
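
A full round trip on two throwaway files, to show what a pass and a failure look like. Everything here is made up for illustration, and it assumes GNU `md5sum' is installed (macOS ships `md5' instead).

```shell
work=$(mktemp -d)
mkdir -p "$work/run1" "$work/run2"
echo "ACGT" > "$work/run1/a.fastq"
echo "TTGA" > "$work/run2/b.fastq"

cd "$work"
# Relative paths (./run1/a.fastq) get stored, so verify from this same directory.
find . -name '*.fastq' -type f -exec md5sum {} \; > fastq_checksum.md5

# Every file should report OK.
md5sum -c fastq_checksum.md5

# Change one file; it now reports FAILED and md5sum exits non-zero.
echo "changed" >> run1/a.fastq
md5sum -c fastq_checksum.md5 || echo "checksum mismatch detected"
```

Because the checksum file records paths as find printed them, run the check from the directory where you ran find, or use absolute paths when generating it.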