xargs on retaining filename for batch html to text conversion

Developer https://www.devze.com 2023-02-15 12:36 Source: Internet
I'm converting some HTML files to text using html2text and want to keep each file's name, so that charliesheenwinning.html becomes charliesheenwinning.txt, or even charliesheenwinning.html.txt.

find ./ -not -regex ".*\(png\|jpg\|gif\)$" -print0 | xargs -0 -L10 {} max-process=0 html2text {} -o ../potistotallywinning/{}.txt

Of course, the last part (the -o argument) is wrong. How do I reuse the filename beyond the first argument to html2text? I could use a for loop or -exec, but how can I do it with xargs?

Update

I ended up doing:

# -L10 dropped: xargs treats -L and -I as mutually exclusive, and -I already runs one command per file
find path/to/dir -type f -not -regex ".*\(gif\|png\|jpg\|jpeg\|mov\|pdf\|txt\)$" -print0 | xargs -0 --max-procs=0 -I {} html2text -o {}.txt {}
mkdir -p dir/w/textfiles
cp -r path/to/dir dir/w/textfiles
find dir/w/textfiles -type f -not -regex ".*txt$" -print0 | xargs -0 --max-procs=0 -I {} rm {}

Not the best, but whatever. [Just in case you were wondering why it isn't just a simple -name '*html' in the find arguments: this was a wget of a mediawiki ..]


You should try to use basename:

$ man basename
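
As a rough sketch of how basename could be combined with xargs here: the file path is handed to a small sh -c script as a positional argument, and basename derives the output name from it. The scratch directories demo_in and demo_out and the variable result are hypothetical names used only for illustration, and the html2text command is echoed rather than executed, so the snippet runs even where html2text is not installed.

```shell
# Hypothetical scratch directories, for illustration only.
demo_in=$(mktemp -d)
demo_out=$(mktemp -d)
touch "$demo_in/charliesheenwinning.html"

# For each file, sh receives the path as $1 and the output directory as $2.
# basename strips the directory and the .html suffix. The command is only
# echoed here; remove "echo" to run html2text for real.
result=$(
  find "$demo_in" -type f -name '*.html' -print0 |
    xargs -0 -I {} sh -c 'echo html2text -o "$2/$(basename "$1" .html).txt" "$1"' _ {} "$demo_out"
)
echo "$result"
```

Passing the path as "$1" instead of pasting {} into the script text keeps filenames with spaces or quotes intact. If a file has no .html suffix, basename leaves the name unchanged and .txt is simply appended.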


I was facing the same problem. For the record, here's what I came up with to get substitution into xargs:

seq 100 | xargs -I % -n 1 -P 16 bash -c 'echo % `sed "s/1/X/" <<< %`'

It will print something like this:

10 X0
3 3
12 X2
4 4
11 X1
1 X
15 X5
0
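
A possible variation on the same idea, assuming bash is available: the item is passed to the script as "$1" instead of being substituted into the script text, and bash's own ${1/1/X} expansion stands in for the sed call. The variable name result2 and the three sample inputs are made up for the example, and -P is left out so the output order stays predictable.

```shell
# Each input line becomes $1 of a small bash script.
# ${1/1/X} replaces the first "1" in $1 with "X", like sed "s/1/X/".
result2=$(printf '%s\n' 1 10 23 | xargs -I % bash -c 'echo "$1 ${1/1/X}"' _ %)
echo "$result2"
# prints:
# 1 X
# 10 X0
# 23 23
```

Adding -P 16 as in the original command should parallelize this in the same way, at the cost of nondeterministic output order.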
