Sometimes small hacks save a lot of time. The regexes used here can be adapted as needed. If a command does not work in your shell, try swapping the single and double quotes.
Count the number of times a specific character appears in each line
- This counts the number of quotation marks in each line and prints it
perl -ne '$cnt = tr/"//; print "$cnt\n"' inputFileName.txt
Add string to beginning of each line
- Prepends the string slc to each line.
perl -pe "s/(.*)/slc\$1/" in.txt > out.txt
Add string to end of each line
- Append a string(us.o.com) to each line
perl -pe "s/(.*)/\$1.us.o.com/" in.txt > out.txt
Print only alternate values in a list
- Sometimes we only need every other value from a list. Perl's special variable $| can only hold 0 or 1, so decrementing it with --$| flips it between the two on each call. The command below prints every other letter from the list a..z, starting with a.
perl -E 'say grep --$|, a..z'
Print only some columns of a file
- Columns separated by a space
cut -d" " -f 1,2,3,4 fileWithLotsOfColumns.txt > fileWithOnlyFirst4Cols.txt
Print all columns except the first
- Columns separated by a space. The --complement option is a GNU cut extension.
cut -d" " -f 1 --complement filename > filename.
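Where the cut on your system lacks --complement, an open-ended field range should give the same result. A small sketch on invented sample rows (the variable name rest is just for illustration):

```shell
# Hypothetical sample input: two rows of three space-separated columns.
# -f2- selects field 2 through the last field, i.e. everything except column 1.
rest=$(printf 'a b c\nd e f\n' | cut -d' ' -f2-)
echo "$rest"
```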
Replace a pattern with another one inside the file with backup
- Replace all occurrences of pattern1 (e.g. [0-9]) with pattern2
perl -p -i.bak -w -e 's/pattern1/pattern2/g' inputFile
Print only words without uppercase letters
- Go through the file and print only the lines (here, one word per line) that contain no uppercase letters.
perl -ne 'print unless m/[A-Z]/' allWords.txt > allWordsOnlyLowercase.txt
Print one word per line
- Go through the file, split each line at every space, and print the words one per line. The -l switch strips the newline before splitting and adds it back when printing.
perl -lne 'print join("\n", split(/ /, $_))' someText.txt > wordsPerLine.txt
Kill all screen sessions (no remorse)
- Since there's no screen command that kills all screen sessions regardless of what they're doing, here's a Perl one-liner that really kills ALL screen sessions without remorse. It prints one PID per line, taken only from lines that start with a session ID.
screen -ls | perl -ne 'print "$1\n" if /^\s*(\d+)\./' | xargs -L 1 kill -9
- The killall command may also do the job…
Return all unique words in a text document (divided by spaces), sorted by their counts (how often they appear)
- assuming no punctuation marks:
perl -ne 'print join("\n", split(/\s+/, $_)); print("\n")' documents.txt > wordsOnePerLine.txt
cat wordsOnePerLine.txt | sort | uniq -c | sort -n > wordCountsSorted.txt
Delete all special characters
- delete every character that is not a letter, white space or line end (replace with nothing)
perl -pe 's/[^a-zA-Z\s]+//g' text_withSpecial.txt > text_lettersOnly.txt
Lower case everything
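- Convert all ASCII uppercase letters to lowercase. Note that tr takes plain character ranges, not regex-style bracket classes.
perl -pe 'tr/A-Z/a-z/' textWithUpperCase.txt > textwithoutuppercase.txt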
Combine lower-casing with word counting and sorting
- Lower-case the text, split it into one word per line, then count and sort the words.
perl -pe 'tr/A-Z/a-z/' sentences.txt | perl -lne 'print join("\n", split(/ /, $_))' | sort | uniq -c | sort -n
Print only one column
- Print only the second column of the data when tabs are used as the separator.
perl -lne '@F = split("\t", $_); print $F[1]' columnFileWithTabs.txt > justSecondColumn.txt
Print only text between tags
- Print the text of every <a>...</a> pair found on each line.
perl -ne 'print "$1\n" while m/<a>(.*?)<\/a>/g' textFile
- Extracting multiple multiline patterns between a start and an end tag
- Here, we want to extract everything between <parse> and </parse>.
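#!/usr/bin/perl -w
# Undefine the input record separator so the whole file is read at once.
local $/;
open(DAT, "yourFile.xml") || die("Could not open file!");
my $content = <DAT>;
close(DAT);
# /s lets . match newlines, /g finds every <parse>...</parse> block.
while ($content =~ m/<parse>(.*?)<\/parse>/sg) {
    print "$1\n";
}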
Sort lines by their length
- All lines are read from the file and printed shortest first.
perl -e 'print sort { length($a) <=> length($b) } <>' textFile
Print second column, unless it contains a number
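- With -a, each line is split on whitespace into @F; the second field is printed unless it contains a digit.
perl -lane 'print $F[1] unless $F[1] =~ m/[0-9]/' wordCounts.txt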
Trim/ Collapse white spaces and replace new lines by something else
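- Each newline is replaced by a space followed by a semicolon, then runs of white space are collapsed into a single space.
echo "The cat sat on the mat
asd sad das " | perl -ne 's/\n/ /; print $_; print(";")' | perl -ne 's/\s+/ /g; print $_'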
Get the average of one column from certain lines
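- Take the lines that match a criterion, extract the 30th comma-separated field ($F[29]), and average it.
grep "another criterion" thisDataFile.txt | perl -ne '@F = split(",", $_); print "$F[29]\n";' | awk '{sum+=$1} END { print "Average = ",sum/NR}'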
How to sort a file by a column
- Columns are separated by a space; we sort numerically (-n) by the 10th column (-k10).
sort -t' ' -n -k10 eSet1_both.txt
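A small sketch on invented two-column rows shows why -n matters: a plain text sort would put 100 before 9. The key is written as -k2,2 here, a variant that limits the sort key to that one column; the data and the variable name sorted_rows are illustrative.

```shell
# Hypothetical sample input: a name and a number per row, separated by one space.
sorted_rows=$(printf 'b 10\na 9\nc 100\n' | sort -t' ' -n -k2,2)
echo "$sorted_rows"
```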
Replace specific space but also copy a group of matches
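- Matches a group of digits at the beginning of a line, followed by a space and a double quote. The digits are captured and copied back as $1, and the space is replaced with a tab.
perl -p -i.bak -w -e 's/^([0-9]+) "/$1\t"/g' someFile.txt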