command line - Intergenic Regions? - Ask Ubuntu


i have bed file genes on plus , minus strands. want intergenic regions between genes on positive , negative strands. there way awk command? much!

edit: second , third columns gene start , stop sites. want subtract number in third column , number in preceding second row if there + sign starting bottom. if last column has - sign want start bottom subtract number in second column , number in preceding third row.

desired output

chr1  820983    (860759-820983)    ensg00000269308 591    + chr1   818542     (818542-369634)   ensg00000235249 587   + ch1 738637     (738637-623056)      ensg00000185097 589   - ch1 621546     (621546-140379)      ensg00000237683  586 - 

input

chr1    13885   140379  ensg00000237683 586 - chr1    36854   369634  ensg00000235249 587 +     chr1    621546  623056  ensg00000185097 589 -     chr1    738637  740137  ensg00000269831 590 -     chr1    818542  820983  ensg00000269308 591 +    chr1    860759  875671  ensg00000187634 591 + 

i still don't understand explanation , desired output (why column 1 values chr1 , ch1? why column 2 in first row 820983 rather 860759?), may enough started.

if don't care order of output, simplest thing seem to reverse order of lines in file, keep rolling record of recent column 2 values + , - entries, , print/subtract them when next + or - encountered:

$ tac file.bed | awk ' >   $nf ~ /+/ {if(last2p) print $1, last2p, "(" last2p "-" $3 ")", $4, $5, $6; last2p = $2} >   $nf ~ /-/ {if(last2m) print $1, last2m, "(" last2m "-" $3 ")", $4, $5, $6; last2m = $2} > ' chr1 860759 (860759-820983) ensg00000269308 591 + chr1 738637 (738637-623056) ensg00000185097 589 - chr1 818542 (818542-369634) ensg00000235249 587 + chr1 621546 (621546-140379) ensg00000237683 586 - 

if do care output order, can construct pair of arrays plus , minus say, iterate on them in reverse looking "up" next matching + or -. big one-liner presented here executable awk script:

$ cat chr.awk #!/usr/bin/gawk -f  function fooprint(a,i, j,p,q) {   split(a[i], p);   for(j=i-1;j>0;j--) {     if(j in a) {       split(a[j], q);       print q[1], p[2], "(" p[2] "-" q[3] ")", q[4], q[5], q[6];       break;     }   } }  $nf ~ /+/ {plus[fnr] = $0} $nf ~ /-/ {minus[fnr] = $0}  end {   for(i=nr; i>1; i--) {     if (i in plus)       fooprint(plus,i);     else if (i in minus)       fooprint(minus,i);   } } 

then

$ ./chr.awk file.bed chr1 860759 (860759-820983) ensg00000269308 591 + chr1 818542 (818542-369634) ensg00000235249 587 + chr1 738637 (738637-623056) ensg00000185097 589 - chr1 621546 (621546-140379) ensg00000237683 586 - 

Comments

Popular posts from this blog

download - Firefox cannot save files (most of the time), how to solve? - Super User

windows - "-2146893807 NTE_NOT_FOUND" when repair certificate store - Super User

sql server - "Configuration file does not exist", Event ID 274 - Super User