Common awk Tips for Daily Work

xiaoxiao2021-02-28 92

This post summarizes the awk scenarios and techniques I use most often at work:

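awk -F '[. ]': specifying multiple separators, summing a value column by key, and printing only rows with a unique column

Printing lines at a fixed step (for example, only even-numbered lines, or lines whose number is a multiple of 3)

Passing parameters with awk -v

Using next in awk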

awk -F '[. ]': specifying multiple separators, summing a value column by key, and printing only rows with a unique column

awk '{print $1;}' awk_test.txt    # default whitespace separator, so the whole line is $1
10,15-10-2014,abc
20,12-10-2014,bcd
10,09-10-2014,def

awk -F'[,]' '{print $1, $2, $3;}' awk_test.txt    # -F'[...]' takes a set of separator characters
awk -F, '{print $1, $2, $3;}' awk_test.txt    # equivalent to the above
10 15-10-2014 abc
20 12-10-2014 bcd
10 09-10-2014 def

Now suppose awk_test.txt contains:

2.gu Qxy 23
4.gui Qxr 21
1.guT QWS 18

awk '{print $1, $2}' awk_test.txt
2.gu Qxy
4.gui Qxr
1.guT QWS

# use both '.' and ' ' (space) as separators
awk -F'[. ]' '{print $1, $2}' awk_test.txt
2 gu
4 gui
1 guT

Print only the first line seen for each distinct value of the first column

awk -F, '!seen[$1]++' awk_test.txt
10,15-10-2014,abc
20,12-10-2014,bcd

The sort command can do the same thing:

sort -u -t, -k1,1 awk_test.txt
10,15-10-2014,abc
20,12-10-2014,bcd
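The `!seen[$1]++` one-liner is terse, so here is a minimal sketch that spells out what it does, plus a variant that keeps the last line per key instead of the first. The array name `last` and the temporary directory are only for illustration, and the data is the same three-line awk_test.txt used above.

```shell
tmp=$(mktemp -d)
cat > "$tmp/awk_test.txt" <<'EOF'
10,15-10-2014,abc
20,12-10-2014,bcd
10,09-10-2014,def
EOF

# Long form of '!seen[$1]++': print a line only the first time its key appears.
first_seen=$(awk -F, '{ if (!($1 in seen)) print; seen[$1] = 1 }' "$tmp/awk_test.txt")
printf '%s\n' "$first_seen"

# Variant: remember the most recent line per key and print them at the end.
# "for (k in last)" visits keys in an unspecified order, so sort the result.
last_seen=$(awk -F, '{ last[$1] = $0 } END { for (k in last) print last[k] }' \
    "$tmp/awk_test.txt" | sort -t, -k1,1n)
printf '%s\n' "$last_seen"

rm -rf "$tmp"
```

The first form preserves the input order, which `sort -u` does not.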

Sum one column (the value) grouped by another column (the key). For example, suppose awk_test.txt contains:

smiths-Login-2
olivert-Login-10
denniss-Payroll-100
smiths-Time-200
smiths-Logout-10

awk -F '-' '$1 ~ /smiths/ {sum += $3} END {print sum}' awk_test.txt
awk -F '-' '$1 == "smiths" {sum += $3} END {print sum}' awk_test.txt

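The first command matches any first field that contains "smiths". To require an exact match with a regex, anchor it:

awk 'BEGIN {FS = "-"} ; $1 ~ /^smiths$/ {sum+=$3} END {print sum}' awk_test.txt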

We can also compute the totals for everyone at once:

awk -F '-' '{a[$1] += $3} END{for (i in a) print i, a[i]}' filename.txt

Output:

smiths 212
denniss 100
olivert 10

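For example, when working on deep-learning image classification, the same pattern is handy for counting how many samples each class has in train.txt.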
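The post does not show what train.txt looks like, so the following is only a sketch under an assumed format: one sample per line, with the image path first and the class label as the last whitespace-separated field. The file contents and the `count` array name are made up for illustration.

```shell
tmp=$(mktemp -d)
# Hypothetical train.txt: "<image path> <class label>" on each line (assumed format).
cat > "$tmp/train.txt" <<'EOF'
img/0001.jpg cat
img/0002.jpg dog
img/0003.jpg cat
img/0004.jpg bird
img/0005.jpg cat
EOF

# Count lines per label ($NF is the last field), then sort by count, largest first.
class_counts=$(awk '{ count[$NF]++ } END { for (c in count) print c, count[c] }' \
    "$tmp/train.txt" | sort -k2,2nr -k1,1)
printf '%s\n' "$class_counts"

rm -rf "$tmp"
```

Piping through sort matters because `for (c in count)` does not guarantee any particular order.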

Passing parameters with awk -v

The for loop below passes each file name into awk as a variable:

real_dir# for f in *.txt; do awk -v f="$f" '{line_num+=1} END{print f": line->"line_num}' "$f"; done
1.txt: line->9
2.txt: line->10
3.txt: line->10
4.txt: line->3
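The outline at the top also lists printing lines at a fixed step, which combines naturally with -v. A minimal sketch, where `step` is a variable name chosen here for illustration and `seq 1 10` stands in for a real input file:

```shell
# Print every line whose line number (NR) is a multiple of step.
every_third=$(seq 1 10 | awk -v step=3 'NR % step == 0')
printf '%s\n' "$every_third"

# step=2 gives the even-numbered lines.
even_lines=$(seq 1 10 | awk -v step=2 'NR % step == 0')
printf '%s\n' "$even_lines"
```

A pattern with no action prints the matching line, so no explicit `{print}` is needed.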

Using next in awk

Here is how someone on Stack Overflow describes it:

This keyword is often useful when you want to iterate over 2 files; sometimes it’s the same file that you want to process twice. You’ll see the following idiom:

awk '
FNR==NR {
    < stuff that works on file 1 only >
    next
}
{
    < stuff that works on file 2 only >
}' ./infile1 ./infile2
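The "same file twice" case from the quote can be sketched as a two-pass computation: the first pass accumulates a total, the second pass prints each value as a share of it. The file name scores.txt and its contents are hypothetical.

```shell
tmp=$(mktemp -d)
# Hypothetical input: a name and a number on each line.
cat > "$tmp/scores.txt" <<'EOF'
a 1
b 3
c 4
EOF

# Pass 1 (FNR == NR): sum column 2. Pass 2: print each row's percentage of the total.
shares=$(awk 'FNR == NR { total += $2; next }
              { printf "%s %.1f%%\n", $1, 100 * $2 / total }' \
    "$tmp/scores.txt" "$tmp/scores.txt")
printf '%s\n' "$shares"

rm -rf "$tmp"
```

Naming the same file twice on the command line is what makes awk read it twice.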

Here is an example showing how NR and FNR differ across two files:

root@ubuntu:/data/services# cat out1.txt
ID Name Telephone
1 John 011
2 Sam 013
root@ubuntu:/data/services# cat out2.txt
1 Test1 Test2
2 Test3 Test4
3 Test5 Test6
root@ubuntu:/data/services# awk '{print $0, "NR: "NR, "FNR: "FNR}' out1.txt out2.txt
ID Name Telephone NR: 1 FNR: 1
1 John 011 NR: 2 FNR: 2
2 Sam 013 NR: 3 FNR: 3
1 Test1 Test2 NR: 4 FNR: 1
2 Test3 Test4 NR: 5 FNR: 2
3 Test5 Test6 NR: 6 FNR: 3

Suppose, for example, that we want to merge these two files by matching ID.
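The post does not include the merge command itself, so the following is one possible sketch using the FNR==NR idiom with the out1.txt and out2.txt contents shown above. The array name `info` is chosen here for illustration, and rows whose ID appears in only one file are simply dropped, which may or may not be what you want.

```shell
tmp=$(mktemp -d)
cat > "$tmp/out1.txt" <<'EOF'
ID Name Telephone
1 John 011
2 Sam 013
EOF
cat > "$tmp/out2.txt" <<'EOF'
1 Test1 Test2
2 Test3 Test4
3 Test5 Test6
EOF

# While reading the first file (FNR == NR), store "Name Telephone" under its ID
# and skip the header line. For the second file, print only IDs seen in the first.
merged=$(awk 'FNR == NR { if (FNR > 1) info[$1] = $2 " " $3; next }
              ($1 in info) { print $1, info[$1], $2, $3 }' \
    "$tmp/out1.txt" "$tmp/out2.txt")
printf '%s\n' "$merged"

rm -rf "$tmp"
```

One caveat: if the first file is empty, FNR == NR is also true while reading the second file, so the idiom silently misbehaves.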

Here is another simple example, taken from a blog post:

cat food_list.txt
No Item_Name Price Quantity
1 Mangoes $3.45 5
2 Apples $2.45 25
3 Pineapples $4.45 55
4 Tomatoes $3.45 25
5 Onions $1.45 15
6 Bananas $3.45 30

We want to append a * to every line whose Quantity is <= 20. We can use next for this: once the first rule has matched, next skips the remaining rule for that line, which is slightly more efficient.

# awk '$4 <= 20 { printf "%s\t%s\n", $0,"*" ; next; } $4 > 20 { print $0 ;} ' food_list.txt
No Item_Name Price Quantity
1 Mangoes $3.45 5 *
2 Apples $2.45 25
3 Pineapples $4.45 55
4 Tomatoes $3.45 25
5 Onions $1.45 15 *
6 Bananas $3.45 30

When reposting, please credit the original URL: https://www.6miu.com/read-66140.html
