command line - How does uniq work? - Ask Ubuntu
This question already has an answer here:
- uniq command not working properly? (3 answers)
Please don't mistake this question for a duplicate of "What is the difference between `sort -u` and `sort | uniq`?"

This is, in essence, a word count program. The confusion raised by the following commands is the reason I'm asking this question.
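The data file is a single line of space-separated words in which `file` appears twice:

```
root@sanctum:~/datascience# cat data
```

Splitting it into one word per line and piping it straight into `uniq -c` gives incorrect output. Every word gets a count of 1, and `file` is listed twice, each time as `1 file`:

```
root@sanctum:~/datascience# cat data | sed 's/ /\n/g' | uniq -c
```

Piping the output through `sort` before `uniq` gives the correct answer, with `file` counted as 2:

```
root@sanctum:~/datascience# cat data | sed 's/ /\n/g' | sort | uniq -c
```

This is the word list after `sort`; the repeated words now sit on neighbouring lines (`file` directly followed by `file`):

```
root@sanctum:~/datascience# cat data | sed 's/ /\n/g' | sort
```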
How does the line number at which a line appears affect the count of occurrences in the file? I don't know how to phrase it better, but you get my point.

Basically, why can't `cat data | sed 's/ /\n/g' | uniq -c` give the required result?
This is not random behavior. From `man uniq`:

> Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'. Also, comparisons honor the rules specified by 'LC_COLLATE'.

Essentially, `uniq` expects sorted input: it only merges duplicate lines that sit next to each other. In other words, it works this way by design.
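To make the adjacency rule concrete, here is a small C sketch (not the coreutils implementation) that counts runs of equal neighbouring words the way `uniq -c` counts lines. The names `count_adjacent_runs`, `count_sorted_runs` and `cmp_words` are hypothetical, and the comparison is plain byte-wise `strcmp`, so `LC_COLLATE` is ignored. Sorting first with `qsort` plays the role of `sort` in the pipeline.

```c
#include <stdlib.h>
#include <string.h>

/* Count runs of adjacent equal words, the way `uniq -c` treats lines:
   each word is compared only with the word right before it.
   Stores each run's word and length in run_word[] / run_count[]
   (both must hold at least n entries) and returns the number of runs. */
size_t count_adjacent_runs(const char **words, size_t n,
                           const char **run_word, size_t *run_count)
{
    size_t runs = 0;
    for (size_t i = 0; i < n; i++) {
        if (runs > 0 && strcmp(words[i], run_word[runs - 1]) == 0) {
            run_count[runs - 1]++;        /* same as previous word: extend the run */
        } else {
            run_word[runs] = words[i];    /* differs: start a new run */
            run_count[runs] = 1;
            runs++;
        }
    }
    return runs;
}

/* qsort comparator for an array of C strings (plain byte order). */
int cmp_words(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort the words in place, then count runs: the analogue of `sort | uniq -c`. */
size_t count_sorted_runs(const char **words, size_t n,
                         const char **run_word, size_t *run_count)
{
    qsort(words, n, sizeof words[0], cmp_words);
    return count_adjacent_runs(words, n, run_word, run_count);
}
```

For the word list `{"file", "supposed", "file"}`, `count_adjacent_runs` returns 3 runs, each with a count of 1, so `file` is reported twice. `count_sorted_runs` on the same list returns 2 runs: `file` with 2 and `supposed` with 1.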
Your main question is:

> How does the line number at which a line appears affect the count of occurrences in the file?

To answer that, you'd have to look at the source code of `uniq` (GNU coreutils, `src/uniq.c`):
```c
while (!feof (stdin))
  {
    char *thisfield;
    size_t thislen;
    if (readlinebuffer_delim (thisline, stdin, delimiter) == 0)
      break;
    thisfield = find_field (thisline);
    thislen = thisline->length - 1 - (thisfield - thisline->buffer);
    if (prevline->length == 0
        || different (thisfield, prevfield, thislen, prevlen))
      {
        fwrite (thisline->buffer, sizeof (char),
                thisline->length, stdout);

        SWAP_LINES (prevline, thisline);
        prevfield = thisfield;
        prevlen = thislen;
      }
  }
```
The key here is that the file is read line by line, and each line is compared only with the previous one, in the function `different()`, which returns true if the two lines are not the same and false if they are. The reason it doesn't compare against all lines is that doing so would mean keeping every line seen so far in memory. With a large number of lines that isn't practical, and it would slow `uniq` down considerably.
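A simplified streaming version of that idea might look like the following C sketch. It is an illustration under stated assumptions, not the coreutils code: `uniq_count_stream` and `MAX_LINE` are hypothetical names, every line is assumed to be shorter than `MAX_LINE` and newline-terminated, and whole lines are compared with `strcmp` (no `-f`, `-s` or `-i` handling, no locale collation). What it shows is that memory use stays at two line buffers no matter how long the input is.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 4096  /* hypothetical limit: lines are assumed shorter than this */

/* Streaming sketch of `uniq -c`: read `in` line by line, keeping only the
   previous line in memory (never the whole file). Writes "count line"
   pairs to `out` and returns the number of groups written.
   Assumes every line, including the last one, ends with '\n'. */
size_t uniq_count_stream(FILE *in, FILE *out)
{
    char prev[MAX_LINE], cur[MAX_LINE];
    size_t count = 0, groups = 0;

    while (fgets(cur, sizeof cur, in) != NULL) {
        if (count > 0 && strcmp(cur, prev) == 0) {
            count++;                      /* same as the previous line */
            continue;
        }
        if (count > 0) {                  /* the previous group has ended */
            fprintf(out, "%zu %s", count, prev);
            groups++;
        }
        strcpy(prev, cur);                /* remember only this one line */
        count = 1;
    }
    if (count > 0) {                      /* flush the final group */
        fprintf(out, "%zu %s", count, prev);
        groups++;
    }
    return groups;
}
```

Fed the lines `file`, `supposed`, `file`, it writes `1 file`, `1 supposed`, `1 file`, which is the unsorted behaviour from the question. Fed `file`, `file`, `supposed`, it writes `2 file` and `1 supposed`.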