sort is a handy Linux command-line tool for sorting text files. It can sort fairly large files without consuming too much memory. Its sorting behaviour depends on the locale, which can be set using the LC_ALL environment variable.
Find/change sorting rule on Linux
To find the sorting rule in effect on your Linux system, run the following:
$ echo 1 | sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
1
_
In debug mode, sort tells us which sorting rule is being used. The result above may vary on your system depending on the environment variables set.
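Which environment variables matter can be sketched as follows. For collation, LC_ALL takes priority over LC_COLLATE, which in turn takes priority over LANG; an empty value counts as unset. The helper name `effective_collation` is hypothetical and only illustrates this lookup order; it does not check whether the named locale is actually installed.

```shell
# Hypothetical helper: print the locale name that governs sorting.
# Lookup order: LC_ALL, then LC_COLLATE, then LANG, else the POSIX default.
effective_collation() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'POSIX\n'
    fi
}

# Report the setting for the current shell (the result depends on your system).
effective_collation
```

The `locale` command, where available, prints the same variables together with their current values.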
Now run the following to set LC_ALL=C and then see the sorting rule used by sort:
$ echo 1 | LC_ALL=C sort --debug
sort: using simple byte comparison
1
_
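How far the setting reaches depends on how it is written. A plain `LC_ALL=C;` assignment followed by a separate command reaches sort only if the variable is already exported, and it also stays set for the rest of the shell session. The sketch below shows two narrower alternatives; the sample input is made up for illustration.

```shell
# Prefix form: LC_ALL=C is passed to this one sort process only.
prefix_result=$(printf 'b\nA\n' | LC_ALL=C sort)

# Subshell form: the export lasts only until the command substitution ends.
subshell_result=$(
    export LC_ALL=C
    printf 'b\nA\n' | sort
)

printf 'prefix form:\n%s\n' "$prefix_result"
printf 'subshell form:\n%s\n' "$subshell_result"
```

Both forms should print "A" before "b", because uppercase letters have lower byte values than lowercase ones, and neither form changes the locale of the calling shell.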
Here are some examples with different LC_ALL localisation values:
Traditional sort using byte values
To do a traditional sort, set LC_ALL=C before sort. Here is an example:
$ printf "a 4\na3\n" | LC_ALL=C sort
a 4
a3
Here, since a space comes before the digit 3 in simple byte comparison, the output has “a 4” before “a3”.
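The byte values behind this ordering can be checked directly. This is a small sketch that relies on the standard printf behaviour of printing the numeric code of a character when the argument starts with a quote; the variable names are illustrative.

```shell
# A leading quote makes printf emit the numeric code of the next character.
space_code=$(printf '%d' "' ")   # 32
three_code=$(printf '%d' "'3")   # 51
echo "space=$space_code three=$three_code"

# Because 32 < 51, "a 4" sorts before "a3" under byte comparison,
# regardless of the order of the input lines.
first_line=$(printf 'a3\na 4\n' | LC_ALL=C sort | head -n 1)
echo "first line: $first_line"
```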
sort using en_US.UTF-8
This is usually the default locale on Ubuntu Linux (typically configured through LANG rather than LC_ALL), so you may end up sorting with it without realising. Here we set it explicitly.
$ printf "a 4\na3\n" | LC_ALL=en_US.UTF-8 sort
a3
a 4
Notice the different outcome this time. Using en_US.UTF-8 causes sort to effectively ignore spaces when comparing. For this reason sort treats “a 4” as “a4”, which sorts after “a3”, hence the above outcome.
It is a good idea to check which sorting rule sort is using on your system before relying on it, to avoid surprises.
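One possible way to do such a check in a script is to sort a small sample under the current settings and again under LC_ALL=C, then compare the two results. This is only a sketch: the sample lines are the ones used above, and a match on this sample does not guarantee identical ordering for all input.

```shell
# Sample whose order differs between byte comparison and en_US.UTF-8.
sample=$(printf 'a 4\na3\n')

# Sort once with whatever locale is currently in effect, once with bytes.
current_order=$(printf '%s\n' "$sample" | sort)
byte_order=$(printf '%s\n' "$sample" | LC_ALL=C sort)

if [ "$current_order" = "$byte_order" ]; then
    echo "current locale sorts this sample like LC_ALL=C"
else
    echo "current locale differs from LC_ALL=C for this sample"
fi
```

When a script needs the same order on every machine, prefixing the sort call with LC_ALL=C is a simple way to get it.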