text processing - How to do this in a single command on Ubuntu 16.04? - Ask Ubuntu

March 15, 2015

I have a file of URLs in the format shown below:

com.blendtuts/s °= com.blengineering.www/:http ±= com.blenheimgang.www/le-porsche-museum-en-details/porsche-museum-3 ²= com.blenheimsi ³= com.blenkov.www/page/media/18/34/376 ´= com.blentwell.www/bookmarks.php/jackroldan/sp ¸= com.blentwell.www/tags.php/i

The file is very large, around 250 GB.

I am trying to reverse the words in the file and extract the domains from the text. I tried to do this using terminal commands on Ubuntu. Here is what I have tried:

First, I removed the data after the “/” using the following command:

~$ ex -sc '%s/\(\/\).*/\1/ | x' newfile.txt > ddm.txt

And the result was:

com.blendtuts/  °= com.blengineering.www/ ±= com.blenheimgang.www/ ²= com.blenheimsi ³= com.blenkov.www/ ´= com.blentwell.www/ ¸= com.blentwell.www/
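The same trimming step can also be run as a stream, which avoids opening a 250 GB file in an editor. What follows is a minimal sketch, not a tested solution for the real data: it assumes one record per line, and `strip_path` is a hypothetical helper name that does not appear in the original post.

```shell
#!/bin/sh
# strip_path is a hypothetical helper name, not part of the original post.
# It reads records on stdin and deletes everything after the first "/",
# keeping the slash itself, like the ex substitution above.
strip_path() {
    sed 's|/.*|/|'
}

# Toy input: one record per line (an assumption about the real file layout).
printf '%s\n' \
    'com.blendtuts/s' \
    'com.blenkov.www/page/media/18/34/376' \
    'com.blenheimsi' | strip_path
```

On the real file this would presumably be run as `strip_path < newfile.txt > ddm.txt`, reusing the file names from the command above.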

Then I reversed the complete text in the file using this solution: https://stackoverflow.com/questions/40467918/how-to-reverse-the-word-in-ubuntu

And I got the following result:

    /blendtuts.com     °= /www.blengineering.com     ±= /www.blenheimgang.com     ²= blenheimsi.com     ³= /www.blenkov.com     µ=  /www.blentwell.com     ¶=  /www.blentwell.com     •=  /www.blentwell.com  /www.blentwell.com

But the problem is still not solved. How can I extract the URLs and put them in a file using Ubuntu? As you can see, the above output still is not just the domain; it has a slash in it.

If there is a solution to this problem using another operating system, let me know, but I would prefer to stay with Ubuntu.

I want to extract the domains from the file and put them in a separate file, in the proper format.

If the resulting domains are unique, that would be an excellent solution to my query. Otherwise, I am using a command such as:

$ sort filename.txt | uniq > save_to_file.txt

Please try not to give me a solution using the awk command; it does not work on my system.

Sample data:

com.blendschutzrollo.www/d_chefsessel6_maxx_chefsessel_mit_kopfstutze_chefdrehsessel___munchen__374 ¯=  com.blendtuts/s °=  com.blengineering.www/:http ±=  com.blenheimgang.www/le-porsche-museum-en-details/porsche-museum-3 ²=  com.blenheimsi ³=  com.blenkov.www/page/media/18/34/376 ´=  com.blenoir.www/lat µ=  com.blentwell.www/bookmarks.php/bashment%20jack/re ¶=  com.blentwell.www/bookmarks.php/djcable/rt ·=  com.blentwell.www/bookmarks.php/jackroldan/sp ¸=  com.blentwell.www/tags.php/i ¹=  com.blentwell.www/tags.php/eurot º=  com.blentwell.www/tags.php/mitarbeiters »=  com.blentwell.www/tags.php/verw ¼=  com.blenzblog/tag/olympic-w ½=  com.blepharoplastyusa.www/albany-n ¾=

A Perl solution, adapting one of the string reversal solutions:

$ perl -F/ -anle 'print reverse(split("([^.]*)", $F[0])) if /\./' input
www.blendschutzrollo.com
blendtuts.com
www.blengineering.com
www.blenheimgang.com
blenheimsi.com
www.blenkov.com
www.blenoir.com
www.blentwell.com
www.blentwell.com
www.blentwell.com
www.blentwell.com
www.blentwell.com
www.blentwell.com
www.blentwell.com
blenzblog.com
www.blepharoplastyusa.com

The arguments:

-F/ -a creates an array F out of each line of input, splitting on /.
-nle runs the expression (-e <expr>) on each line of input, without automatically printing (-n), while handling the newline at the end of each line (-l).
The line is split on /, and we only need the part before the first /, which is the first element of the array F: $F[0]. That part is split into its dot-separated pieces, the pieces are reversed, and the result is printed if the line contains a dot.

Now you can pipe this through sort -u.
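A minimal sketch of that deduplication step, under the assumption that plain byte-order sorting is acceptable for a list of domains. `unique_lines` is a hypothetical helper name; setting `LC_ALL=C` is a common way to skip locale-aware collation, which tends to speed up large sorts.

```shell
#!/bin/sh
# unique_lines is a hypothetical helper name, not part of the original answer.
# It sorts stdin in byte order (C locale) and keeps one copy of each line,
# which is what "sort file | uniq" does in two steps.
unique_lines() {
    LC_ALL=C sort -u
}

# Toy input with a repeated domain.
printf '%s\n' \
    'www.blentwell.com' \
    'blenzblog.com' \
    'www.blentwell.com' | unique_lines
```

On the real data, the output of the Perl command would be piped into this step and redirected to a file of your choosing. GNU sort spills to temporary files when the input does not fit in memory, so for an input of this size its `-T` (temporary directory) and `-S` (buffer size) options may be worth setting.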
