Unit 3: File manipulation


Untar output.tgz and change to the directory output.

Remark: You can loop over several strings in bash using a for loop, e.g. here program1 is run with the arguments bli, bla, ble and blu:
for key in bli bla ble blu; do program1 $key; echo $key; done
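
The loop above can be tried out safely as a dry run. In this minimal sketch, echo stands in for program1 (which is only a placeholder name here), and "$key" is quoted so that each argument stays intact even if it contains spaces:

```shell
# Dry run of the for loop: echo stands in for the placeholder program1.
collected=""
for key in bli bla ble blu; do
    echo "would run: program1 $key"
    collected="$collected$key,"      # remember every key the loop visited
done
echo "keys seen: $collected"
```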



You want to re-run the same set of calculations and maybe compare the outcome afterwards.
Q: How would you rename all files from out-XX-anneal to original-out-XX-anneal?

A:
for file in out-*-anneal; do mv -i "$file" "original-$file"; done # use echo instead of mv before you really run the loop
If it is installed, the program prename makes this much simpler (on some systems it is just called rename; note that the util-linux rename uses a different syntax):
prename 's/^/original-/' out-*-anneal
More often, you would just move the files into a sub-directory, e.g.: mkdir old-2018_03_01; mv calculation*out old-2018_03_01

Q: Now rename the files back. Use a for loop if you have extra time. Otherwise just unpack bashcourse.tgz again.

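A: The most efficient way uses bash parameter expansion: ${file#original-} expands to the value of $file with the prefix original- removed.
for file in original-*; do mv "$file" "${file#original-}"; done # use echo instead of mv before you really run the loop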
The simplest way is to use prename (sometimes under the name rename) if it is available:
prename 's/original-//' original-*
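
To see what the parameter expansion does before touching real data, the renaming can be tried in a scratch directory. This is a minimal sketch; the two file names are made up to follow the out-XX-anneal pattern, and mktemp -d provides a throwaway directory:

```shell
# ${var#pattern} removes the shortest match of pattern from the front of $var.
file="original-out-18-anneal"          # hypothetical file name
stripped="${file#original-}"
echo "$file -> $stripped"

# Try the rename loop on empty test files in a scratch directory.
workdir=$(mktemp -d)
touch "$workdir/original-out-17-anneal" "$workdir/original-out-18-anneal"
for f in "$workdir"/original-*; do
    name=${f##*/}                      # strip the directory part
    mv -- "$f" "$workdir/${name#original-}"
done
renamed=$(cd "$workdir" && echo *)     # names that exist after the loop
echo "$renamed"
rm -rf "$workdir"
```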


All right! Good news: you still get the same results from those calculations, so now you can proceed. The files came from a calculation you ran with a colleague's program. From the output, you are interested in the total energies in each file.

Q: Pick one of the files. Use a command to print all lines which contain the string "Etotal"

A: grep Etotal filename

Q1: Use awk to do the same thing (Section awk in the cheat sheet).

A1: awk '/Etotal/{print}' filename

Q1: Use "awk" to print, for those lines, only the column that contains the energy value. Awk puts the content of each column into the variables $1, $2, $3, etc.
awk '/Etotal/{print $3}' out-18-anneal

Q2: There is an annoying = in front of the numbers. You can remove it with gsub in awk (search for gsub in man awk), or you can make = a field separator with the -F option. This option takes a regular expression that describes the delimiter between columns (search for -F in man awk). What do you have to supply after -F to get the column without the "="?
To do this, you can use the option -F'[ =]+'.
Remark on the term '[ =]+': the ' are quotes that protect the string from the shell. Inside them is a regular expression. [] contains a list of characters, here a space and a =. The "+" means that one or more of those characters have to occur for a match.

Now use the option together with the full command to answer Q2. If you have extra time, look up gsub in the awk manual and use it to remove the =.
awk -F'[ =]+' '/Etotal/{print $4}' out-18-anneal # you now have to adjust the column number, because the -F option causes an additional column $1.
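
The effect of the field separator can be checked on a single line. The sample line below is an assumption made up for illustration (the real lines in out-XX-anneal may look different); it only mimics the layout implied above, with leading spaces and a = glued to the number. The sketch compares the default splitting, the -F variant, and one possible gsub variant:

```shell
# Hypothetical sample line; the real output format may differ.
line='  12 Etotal: =-1234.567'

# Default splitting: leading blanks are ignored, the = stays attached.
default_col=$(printf '%s\n' "$line" | awk '/Etotal/{print $3}')

# Regex separator: the leading blanks now count as a separator, so $1 is empty
# and the number moves from $3 to $4.
regex_col=$(printf '%s\n' "$line" | awk -F'[ =]+' '/Etotal/{print $4}')
first_field=$(printf '%s\n' "$line" | awk -F'[ =]+' '{print "[" $1 "]"}')

# gsub variant: delete every = inside column 3, then print it.
gsub_col=$(printf '%s\n' "$line" | awk '/Etotal/{gsub(/=/, "", $3); print $3}')

echo "default: $default_col  -F: $regex_col  first field: $first_field  gsub: $gsub_col"
```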

If you have extra time: print all lines in which the second column contains the string "Etotal:" (often the same keyword appears in a different context, but in a different column).
(awk)
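
One possible approach is to match the pattern against a single column with the ~ operator instead of against the whole line. In this sketch the two sample lines are invented for illustration: both contain "Etotal:", but only the first has it in the second column.

```shell
# Two hypothetical lines: the keyword is in column 2 of the first line only.
sample=$(mktemp)
printf '%s\n' \
    '  12 Etotal: =-1234.567' \
    '  note about Etotal: ignored' > "$sample"

# grep matches the keyword anywhere in the line ...
grep_count=$(grep -c 'Etotal:' "$sample")

# ... while awk can restrict the match to the second column.
matched=$(awk '$2 ~ /Etotal:/' "$sample")

echo "grep finds $grep_count lines, awk prints: $matched"
rm -f "$sample"
```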

Q: Pick one of the output files, extract the numbers and write them into a new file with a prefix "energies-" to the filename.

awk -F'[ =]+' '/Etotal/{print $4}' out-18-anneal > energies-out-18-anneal

Q: Repeat the writing of energies from the last exercise for all output files using a for loop.

for file in out*; do awk -F'[ =]+' '/Etotal/{print $4}' "$file" > "energies-$file"; done

Q: Now you can plot those energies using gnuplot. In the simplest case, you just do:
gnuplot
gnuplot> plot 'energies-out-18-anneal'

If this is on a terminal, where you cannot get a graphical window, first do
gnuplot> set term dumb

You can also create a PNG graphics file by doing:

gnuplot> set term png
gnuplot> set output "energies-out-18.png"
gnuplot> replot
gnuplot> exit # note that the file is only written when it is closed. The exit closes it in this case.
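
The same gnuplot commands can also be run without the interactive prompt by writing them to a file and passing that file to gnuplot. This is a minimal sketch; it assumes gnuplot is installed with PNG support and that energies-out-18-anneal exists in the current directory, and it skips the plot otherwise:

```shell
datafile="energies-out-18-anneal"
plotscript=$(mktemp)

# Write the gnuplot commands to a temporary script file.
cat > "$plotscript" <<EOF
set term png
set output "energies-out-18.png"
plot '$datafile'
EOF

# Run gnuplot only if it is available and the data file is there.
if command -v gnuplot >/dev/null 2>&1 && [ -f "$datafile" ]; then
    gnuplot "$plotscript"    # the PNG is closed and written when gnuplot exits
else
    echo "skipping plot: gnuplot or $datafile not found" >&2
fi
```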


Q: You want to make the file with the extracted energies available to your colleagues on the cluster, so they can read it from your home directory. How do you do this?

A: Using traditional permissions, you have to run chmod a+x $HOME if the file stays in $HOME. Here a is "all" and x is "execute"; on a directory, execute permission allows anyone to change into it (but not to list the files inside it). You have to do this for all directories "on the way" to your file. Additionally, you have to give read permission with "chmod a+r energies-out-18-anneal". Anyone who knows the filename can now read that file!
If you want to give permissions to a single user only, you have to use ACLs. Read "man setfacl" and allow some other user to read a file. You can also use ACLs to give execute permission on the directory only to certain people.
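
The effect of the two chmod calls can be inspected on a scratch directory first. In this sketch the directory from mktemp -d is only a stand-in for $HOME, and the user name alice in the commented setfacl lines is hypothetical; the ACL commands are left as comments because ACL support depends on the file system.

```shell
share=$(mktemp -d)                         # stand-in for $HOME
printf '%s\n' '-1234.567' > "$share/energies-out-18-anneal"

# Start from private permissions so the result does not depend on the umask.
chmod 700 "$share"
chmod 600 "$share/energies-out-18-anneal"

chmod a+x "$share"                         # everyone may enter, but not list
chmod a+r "$share/energies-out-18-anneal"  # everyone may read this one file

dir_mode=$(ls -ld "$share" | cut -c1-10)
file_mode=$(ls -l "$share/energies-out-18-anneal" | cut -c1-10)
echo "directory: $dir_mode  file: $file_mode"

# ACL variant for a single user (hypothetical user name alice):
#   setfacl -m u:alice:x "$HOME"
#   setfacl -m u:alice:r "$HOME/energies-out-18-anneal"
rm -rf "$share"
```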

Q: It is half a year later and you have run many, many of those calculations. You have stored them all in sub-directories and sub-sub-directories of different depth. There was one very strange one that you want to look up again. It had an energy of 9998.657, which you wrote down in your notes. It must be in an output file starting with "out", not in one of the other, large files. How do you find that output file? (find)

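A: find . -name 'out*' -type f -exec grep -lF '9998.657' '{}' + # -l prints only the names of matching files; -F treats 9998.657 as a fixed string, so the . matches only a literal dot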
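
The search can be rehearsed on a small scratch tree. All directory names, file names and file contents below are invented for illustration. Here grep -F treats the number as a fixed string, so a line such as 9998x657 does not match by accident, and the file that does not start with "out" is never opened.

```shell
tree=$(mktemp -d)
mkdir -p "$tree/run1/a" "$tree/run2"

# Hypothetical files: only the first should be reported.
printf '%s\n' '  12 Etotal: =9998.657' > "$tree/run1/a/out-03-anneal"
printf '%s\n' '  12 Etotal: =9998x657' > "$tree/run2/out-04-anneal"
printf '%s\n' '  12 Etotal: =9998.657' > "$tree/run2/big-trajectory"

# Search all regular files named out* at any depth for the literal number.
found=$(cd "$tree" && find . -name 'out*' -type f -exec grep -lF '9998.657' '{}' +)
echo "$found"
rm -rf "$tree"
```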


If there is still time: Write a bash script that will automate the process of what you have just done to plot the energies from the annealing files producing one png plot file for each output produced by the calculation.
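
One possible shape for such a script is sketched below; it is not the only solution. The function name plot_energies is made up, the sample data is invented, and the sketch assumes that the energy is in column 4 when splitting with -F'[ =]+' as in the exercises above. The plot step is skipped when gnuplot is not installed.

```shell
# Extract the energies from every out-*-anneal file in a directory and
# write one PNG plot per file (plot_energies is a hypothetical name).
plot_energies() {
    dir=$1
    for file in "$dir"/out-*-anneal; do
        [ -f "$file" ] || continue          # no matching files: nothing to do
        name=${file##*/}
        energies="$dir/energies-$name"
        awk -F'[ =]+' '/Etotal/{print $4}' "$file" > "$energies"
        if command -v gnuplot >/dev/null 2>&1; then
            gnuplot -e "set term png; set output '$energies.png'; plot '$energies'" \
                || echo "gnuplot failed for $name" >&2
        fi
    done
}

# Demo on invented data in a scratch directory.
demo=$(mktemp -d)
printf '%s\n' '  1 Etotal: =-1234.567' '  2 Etotal: =-1230.001' > "$demo/out-18-anneal"
plot_energies "$demo"
result=$(cat "$demo/energies-out-18-anneal")
echo "$result"
rm -rf "$demo"
```

Saved as a script, the demo part would be replaced by a single call such as plot_energies . to process the current directory.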