Merge CSV files keeping one header

May 19, 2015 09:36

Some situations in life require lots of engineering to be addressed. Others don't. Merging CSV files is one of the latter: it takes a single command you may want to learn by heart and, for those who can't, that's what blogs are for.

Suppose you have collected statistics over several days and now have lots of files with the same structure, e.g.:

cola,colb,colc
1,2,3
1,3,5
4,3,1
...

and you want to create one big file that keeps only the header of the first file and appends the data rows of all the files, so that you can quickly load it into your favourite tool for further analysis.
Here is the trick:

awk 'FNR==1 && NR>1 {next;} {print}' *.csv > merged.csv

Explanation:
FNR is the record (line) number within the current input file, while NR is the number of records read so far across all files. Therefore, the condition FNR==1 && NR>1 is true only if the current line is the first line of a file and at least one line has already been read overall, i.e. it is the header of the second or a later file. What happens when it is true? {next;} tells awk to skip the remaining rules and move on to the next line, so that extra header is dropped.
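To make the counter logic concrete, here is a minimal Python sketch that replays awk's NR and FNR bookkeeping on in-memory files. It only illustrates the condition described above and is not the awk command itself; the function name keep_first_header is an assumption for this sketch, and the sample rows come from the example earlier in the post.

```python
def keep_first_header(files):
    """Emulate awk's `FNR==1 && NR>1 {next;} {print}` on in-memory files.

    `files` is a list of files, each given as a list of lines (no newlines).
    """
    kept = []
    nr = 0  # awk's NR: records read so far across all files
    for lines in files:
        for fnr, line in enumerate(lines, start=1):  # awk's FNR: restarts per file
            nr += 1
            if fnr == 1 and nr > 1:
                continue  # first line of a later file: the {next;} branch
            kept.append(line)
    return kept


day1 = ["cola,colb,colc", "1,2,3", "1,3,5"]
day2 = ["cola,colb,colc", "4,3,1"]
print(keep_first_header([day1, day2]))
# → ['cola,colb,colc', '1,2,3', '1,3,5', '4,3,1']
```

Note that with an empty first file, keep_first_header([[], day2]) keeps day2's header, which matches the awk condition: NR is still 1 when that line is read.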

Warning
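I am pretty sure that some of you just came here, copied and pasted the line above, tested it and found out that the command never ends. Well, if that is the case, the file you generated is probably eating up all of your free disk space. Yummy. Why? Most likely because the output file already existed and therefore matches the wildcard you passed to awk: the shell expands the wildcard before the redirection creates (or truncates) the output file, so awk ends up reading the very file it is appending to, and that input keeps growing while awk reads it. So, make sure that the output file doesn't exist when you run the command or, at least, doesn't match your shell's wildcard expansion (for example, by writing it to another directory).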
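If you prefer to sidestep the problem in a script, the sketch below expands the wildcard itself and drops the output file from the list of inputs before writing, so the output is never read back. The function name merge_csv_files is hypothetical; it applies the same FNR==1 && NR>1 rule, and it compares paths with os.path.abspath only, so a symlink or hard link to the output file would not be detected.

```python
import glob
import os


def merge_csv_files(pattern, output_path):
    """Merge every file matching `pattern` into `output_path`, keeping one header.

    The output file is removed from the match list, so it is never read while
    it is being written (the runaway case described in the warning).
    """
    out_abs = os.path.abspath(output_path)
    # sorted() for a deterministic order (POSIX shells also sort glob results)
    inputs = [p for p in sorted(glob.glob(pattern)) if os.path.abspath(p) != out_abs]
    seen_any = False  # True once any line was read: plays the role of NR > 1
    with open(output_path, "w", newline="") as out:
        for path in inputs:
            with open(path, newline="") as fh:
                for fnr, line in enumerate(fh, start=1):
                    if fnr == 1 and seen_any:
                        continue  # header of a later file
                    seen_any = True
                    # like awk's print, terminate every record with a newline
                    out.write(line if line.endswith("\n") else line + "\n")
    return inputs
```

For example, merge_csv_files("*.csv", "merged.csv") could be rerun in the same directory without the runaway growth, because merged.csv is excluded from its own inputs (both names are illustrative).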

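  • I just did the Python alternative to this:

      with open("file.txt") as fp:  # plain text mode; "rU" was removed in Python 3.11
          lines = fp.readlines()
      f_f = " "  # first field of the last line printed in full
      for line in lines:
          f = line.split(",")
          if f[0] == f_f:
              # same first field as before: print only the remaining fields
              print("," + ",".join(f[1:]).rstrip())
          else:
              f_f = f[0]
              print(line.rstrip())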