command line - How does uniq work? - Ask Ubuntu



Do not confuse this question with a duplicate of "what is the difference between sort -u and sort | uniq".

This is, in essence, a word count program.

My confusion was raised by the following command, which is the reason for asking this question:

    root@sanctum:~/datascience# cat data
    this is a file that is supposed to be a file

This gives incorrect output:

    root@sanctum:~/datascience# cat data | sed 's/ /\n/g' | uniq -c
          1 this
          1 is
          1 a
          1 file
          1 that
          1 is
          1 supposed
          1 to
          1 be
          1 a
          1 file

Piping the output through sort and then uniq gives the perfect answer:

    root@sanctum:~/datascience# cat data | sed 's/ /\n/g' | sort |uniq -c
          2 a
          1 be
          2 file
          2 is
          1 supposed
          1 that
          1 this
          1 to

The output when piped to sort alone:

    root@sanctum:~/datascience# cat data | sed 's/ /\n/g' | sort
    a
    a
    be
    file
    file
    is
    is
    supposed
    that
    this
    to

How does the line number at which a line appears have an effect on the count of its occurrences in the file? I don't know how to phrase it, but you get the point.

Basically, why can't cat data | sed 's/ /\n/g' | uniq -c give the required result?

This is not random behavior. From man uniq:

Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'. Also, comparisons honor the rules specified by 'LC_COLLATE'.

Essentially, uniq only collapses duplicates that sit next to each other, so by default it expects sorted input. In other words, this is by design.
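To make the adjacency rule visible, here is a minimal sketch. It uses a hypothetical three-line input (x, y, x) rather than the file from the question, and the `sample` helper is made up for illustration only:

```shell
#!/bin/sh
# Hypothetical input: "x" occurs twice, but the two copies are not adjacent.
sample() { printf 'x\ny\nx\n'; }

# Unsorted: uniq only merges neighbouring duplicates, so no lines are merged
# and three separate runs are reported.
echo "unsorted:"
sample | uniq -c

# Sorted: both "x" lines now sit next to each other and are counted as one run.
echo "sorted:"
sample | sort | uniq -c
```

The exact padding in front of each count depends on the uniq implementation, so only the counts themselves should be compared.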

Your main question is:

How does the line number at which a line appears have an effect on the count of its occurrences in the file?

To answer that question, you'd have to look at the source code:

    while (!feof (stdin))
      {
        char *thisfield;
        size_t thislen;
        /* Read the next line; stop at end of input. */
        if (readlinebuffer_delim (thisline, stdin, delimiter) == 0)
          break;
        thisfield = find_field (thisline);
        thislen = thisline->length - 1 - (thisfield - thisline->buffer);
        /* Print only if this is the first line or it differs from the
           previous one. */
        if (prevline->length == 0
            || different (thisfield, prevfield, thislen, prevlen))
          {
            fwrite (thisline->buffer, sizeof (char),
                    thisline->length, stdout);

            /* The current line becomes the new "previous" line. */
            SWAP_LINES (prevline, thisline);
            prevfield = thisfield;
            prevlen = thislen;
          }
      }

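The key here is that the file is read line by line, and the comparison is only ever done between the current and the previous line. That happens in the function different(), which returns true if the lines are not the same and false if they are the same. The reason is that if you were to compare against all lines, you'd need a large amount of memory whenever there's a large number of lines. That isn't practical, and it would slow down uniq considerably.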
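The trade-off described above can be sketched in shell. This is an illustration, not how coreutils implements it: `adjacent_count` and `hash_count` are hypothetical helper names, and their output format ("count line", no padding) is simplified compared with real uniq -c. The first remembers only the previous line and a counter; the second counts every distinct line regardless of position, but has to keep one table entry per distinct line in memory.

```shell
#!/bin/sh
# adjacent_count (hypothetical): one pass, constant memory.
# Only the previous line and the length of the current run are remembered.
adjacent_count() {
    prev=''
    count=0
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "$count" -gt 0 ] && [ "$line" = "$prev" ]; then
            # Same as the previous line: extend the current run.
            count=$((count + 1))
        else
            # A new run starts: report the finished run, if there was one.
            if [ "$count" -gt 0 ]; then
                printf '%d %s\n' "$count" "$prev"
            fi
            prev=$line
            count=1
        fi
    done
    # Flush the final run.
    if [ "$count" -gt 0 ]; then
        printf '%d %s\n' "$count" "$prev"
    fi
}

# hash_count (hypothetical): position-independent counting with an awk
# associative array. Memory grows with the number of distinct lines, and
# the output order is unspecified.
hash_count() {
    awk '{ seen[$0]++ } END { for (l in seen) print seen[l], l }'
}

# Usage on a made-up input where "x" is repeated but not adjacent:
printf 'x\ny\nx\n' | adjacent_count          # three runs of length 1
printf 'x\ny\nx\n' | sort | adjacent_count   # "x" is now a single run of 2
printf 'x\ny\nx\n' | hash_count | sort       # same totals without sorting the input first
```

Sorting first is what lets the constant-memory approach produce global counts: sort does the heavy lifting, and it is free to spill to temporary files when the input is large.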

