command line - How does uniq work?


June 15, 2010

This question already has an answer here:

uniq command not working properly? (3 answers)

Do not confuse this question with a duplicate of "What is the difference between sort -u and sort | uniq".

This is, in essence, a word count program.

The confusion raised by the following command is the reason for asking this question:

    root@sanctum:~/datascience# cat data
    this is a file that is supposed to be a file

This gives incorrect output:

    root@sanctum:~/datascience# cat data | sed 's/ /\n/g' | uniq -c
          1 this
          1 is
          1 a
          1 file
          1 that
          1 is
          1 supposed
          1 to
          1 be
          1 a
          1 file

Piping the output through sort and then uniq gives the correct answer:

    root@sanctum:~/datascience# cat data | sed 's/ /\n/g' | sort |uniq -c
          2 a
          1 be
          2 file
          2 is
          1 supposed
          1 that
          1 this
          1 to

This is the output when piped only through sort:

    root@sanctum:~/datascience# cat data | sed 's/ /\n/g' | sort
    a
    a
    be
    file
    file
    is
    is
    supposed
    that
    this
    to

How does the line number at which a line appears affect the count of its occurrences in the file? I don't know how to phrase it, but you get the point.

Basically, why can't cat data | sed 's/ /\n/g' | uniq -c give the required result?

This is not random behavior. From man uniq:

Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'. Also, comparisons honor the rules specified by 'LC_COLLATE'.

Essentially, uniq is meant to work on sorted input: it only collapses and counts runs of identical adjacent lines. In other words, this is by design.
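A minimal illustration of the adjacency rule. The three-line input (a, b, a) is made up for this example and is not the data file from the question:

```shell
# Three lines where the duplicates ("a") are NOT adjacent.
unsorted=$(printf 'a\nb\na\n' | uniq -c)

# The same lines after sorting, so the duplicates become adjacent.
sorted=$(printf 'a\nb\na\n' | sort | uniq -c)

printf 'without sort:\n%s\n' "$unsorted"
printf 'with sort:\n%s\n' "$sorted"
```

Without sort, every line is reported with a count of 1, because no two identical lines sit next to each other. With sort, the two a lines become neighbours and are counted together.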

Your main question is:

How does the line number at which a line appears affect the count of its occurrences in the file?

To answer that question, you'd have to look at the source code:

    while (!feof (stdin))
      {
        char *thisfield;
        size_t thislen;
        if (readlinebuffer_delim (thisline, stdin, delimiter) == 0)
          break;
        thisfield = find_field (thisline);
        thislen = thisline->length - 1 - (thisfield - thisline->buffer);
        if (prevline->length == 0
            || different (thisfield, prevfield, thislen, prevlen))
          {
            fwrite (thisline->buffer, sizeof (char),
                    thisline->length, stdout);

            SWAP_LINES (prevline, thisline);
            prevfield = thisfield;
            prevlen = thislen;
          }
      }

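The key here is that the file is read line by line, and the comparison is only ever made between the current line and the previous one. That happens in the function different(), which returns true if the two lines are not the same and false if they are. The reason is that comparing each line against all the others would require keeping them in memory, which takes a large amount of memory when there is a large number of lines. That isn't practical, and it would slow uniq down considerably.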
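To make the trade-off concrete, here is a rough sketch of both approaches, written in awk for brevity. Neither function is how uniq is actually implemented: adjacent_count and table_count are hypothetical helper names, and the first is only a simplified model of the loop quoted above, with counting added.

```shell
# Sketch 1: adjacent-only counting, a rough model of what uniq -c does.
# Only the previous line and a counter are kept, so memory use stays constant.
adjacent_count() {
    awk '
        # Appending "" forces a string comparison even for numeric-looking lines.
        NR > 1 && ($0 "") != (prev "") { print n, prev; n = 0 }
        { prev = $0; n++ }
        END { if (NR > 0) print n, prev }
    '
}

# Sketch 2: count every line in a table keyed by its content.
# Input order no longer matters, but the table grows with each distinct line.
table_count() {
    awk '{ count[$0]++ } END { for (line in count) print count[line], line }'
}

adjacent=$(printf 'file\nsupposed\nfile\n' | adjacent_count)
table=$(printf 'file\nsupposed\nfile\n' | table_count | sort)

printf 'adjacent-only:\n%s\n' "$adjacent"
printf 'table-based:\n%s\n' "$table"
```

On this unsorted input, the adjacent-only version reports file twice with a count of 1 each, while the table-based version reports it once with a count of 2. The price of the second approach is that it has to remember every distinct line it has seen.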
