Find duplicate files

So you know you have a bunch of files, all with the same content but different file names, strewn all over a directory tree. How would you identify the duplicates so that you can work towards eliminating them? Here is some CLI magic that helps you do so (sourced from CommandLineFu). find -not -empty -type f […]

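The full one-liner is truncated above, but the general approach is to hash file contents and group identical checksums. A minimal sketch along those lines, assuming GNU md5sum and uniq are available (this is not necessarily the exact CommandLineFu command from the post):

    # Hash every non-empty regular file, then group lines whose first
    # 32 characters (the MD5 checksum) repeat.
    find . -not -empty -type f -print0 \
      | xargs -0 md5sum \
      | sort \
      | uniq -w32 --all-repeated=separate

Grouping candidates by file size before hashing is a common refinement, since files of different sizes cannot possibly be duplicates.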

Vi – Disable wrapping

So your neatly formatted CSV file looks all ugly in vi because your screen isn’t wide enough? Type [Esc]:set nowrap[Enter]. There, that will disable wrapping and your file will make sense once again.

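To make this the default rather than a per-session setting, the same option can go into your vimrc, for example:

    # Persist the setting across sessions by appending it to your vimrc
    echo 'set nowrap' >> ~/.vimrc

And :set wrap turns wrapping back on if you change your mind mid-session.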

Use binary diff tools like xdelta, bsdiff/bspatch to transfer large files

What do you do if you have to frequently transfer large files over a slow connection as part of your work? Perhaps new builds produced by your corporate build server need to be deployed at a different location for the QA team? I have seen such scenarios cause huge amounts of wasted time for multiple […]

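The idea is to send only a binary delta between the old and new build instead of the whole file. A rough sketch with xdelta3 (the file names here are just placeholders, not the setup described in the post):

    # Sender: encode a delta from the previous build to the new one
    xdelta3 -e -s old_build.tar new_build.tar build.delta

    # Transfer only build.delta, then reconstruct the new build at the other end
    xdelta3 -d -s old_build.tar build.delta new_build.tar

bsdiff/bspatch follow the same pattern: bsdiff oldfile newfile patchfile creates the patch, and bspatch oldfile newfile patchfile applies it.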

/dev/random vs /dev/urandom

If you want random data in a Linux/Unix type OS, the standard way to get it is to read from /dev/random or /dev/urandom. These devices are special files: they can be read like normal files, and the data they return is generated from multiple sources of entropy in the system, which provide the randomness. /dev/random will block […]

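A quick way to see the two devices in action (an illustration, not taken from the post itself):

    # 16 bytes from the non-blocking device, dumped as hex
    head -c 16 /dev/urandom | od -An -tx1

    # The same read from /dev/random may block until enough entropy is available
    head -c 16 /dev/random | od -An -tx1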

How to create a file of arbitrary size with shell script commands

A few days ago I was working on writing Java code to transfer files via SFTP and FTPS. As part of the test cases, I wanted to try and transfer files of small, medium and large sizes like 0 bytes, 1 byte, 100 bytes, 1000 bytes, 1 MB and 10 MB. How would one go about creating files […]

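A couple of standard ways to do it, sketched with placeholder file names (dd is shown with a GNU-style size suffix, and truncate is from GNU coreutils):

    # Exact sizes with dd, reading zero bytes from /dev/zero
    dd if=/dev/zero of=test_0b.bin  bs=1  count=0
    dd if=/dev/zero of=test_1b.bin  bs=1  count=1
    dd if=/dev/zero of=test_1mb.bin bs=1M count=1

    # Or allocate a (sparse) file of the requested size in one step
    truncate -s 10M test_10mb.bin

Substituting /dev/urandom for /dev/zero fills the files with random bytes instead of zeroes, which can be closer to a realistic transfer payload.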

How to split a file, process the pieces in multiple threads and combine results using a shell script

Say you are in a situation where you have a file with a huge number of records to be processed, and the processing of one record does not need data from the processing of previous records (i.e. a perfectly parallelizable situation). What can you do to speed things up? Well, here’s what I did when […]

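The post walks through one way to do it; as a rough outline of the shape such a script takes (process.sh and the file names are hypothetical, and split -n is GNU-specific), the steps are: split, fan out to background jobs, wait, then concatenate.

    # Split the input into 4 pieces without breaking lines, using numeric suffixes
    split -n l/4 -d records.txt part_

    # Process each piece in its own background job
    for piece in part_[0-9][0-9]; do
        ./process.sh "$piece" > "$piece.out" &
    done
    wait    # block until every background job has finished

    # Stitch the per-piece results back together in order
    cat part_[0-9][0-9].out > results.txt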