Natural Order Numerical Sorting

I came across an interesting problem today in the CodeIgniter forums. Suppose you have a dataset like this:

1
1.2
1.2.1
1.2.3
1.2.4
2
5
5
5.1
5.7
11
11.1.2

Obviously, the example cited above is sorted, and sorted correctly. Now lets say these values exist in a random order in a database and we want to retrieve them in the same order as above–sorted ascending. The simple SELECT version FROM test ORDER BY version ASC doesn’t quite do the job.

+---------+
| version |
+---------+
| 1       |
| 1.2     |
| 1.2.1   |
| 1.2.3   |
| 1.2.4   |
| 11      |
| 11.1.2  |
| 2       |
| 5       |
| 5       |
| 5.1     |
| 5.7     |
+---------+

This is because our data is stored as a string, and the database will sort it as such lexicographically. The thinking human, however, wants it sorted numerically. But we cant, it’s a string. “So convert it to a decimal”, you say. Well, that partly works, except, we loose everything after the second period. For example, SELECT version, (version + 0) FROM test ORDER BY (version + 0) ASC will implicitly cast our field to a decimal. But the results will look something like this:

+---------+---------------+
| version | (version + 0) |
+---------+---------------+
| 1       |             1 |
| 1.2.1   |           1.2 |
| 1.2     |           1.2 |
| 1.2.4   |           1.2 |
| 1.2.3   |           1.2 |
| 2       |             2 |
| 5       |             5 |
| 5       |             5 |
| 5.1     |           5.1 |
| 5.7     |           5.7 |
| 11      |            11 |
| 11.1.2  |          11.1 |
+---------+---------------+

The second column is there to show what we’re actually sorting by. As you can see we’re close, but not there. The secret is to sort both numerically AND lexicographically (in that order). The new SQL query will be SELECT version FROM test ORDER BY (version + 0), version ASC. And the results are just what we wanted:

+---------+
| version |
+---------+
| 1       |
| 1.2     |
| 1.2.1   |
| 1.2.3   |
| 1.2.4   |
| 2       |
| 5       |
| 5       |
| 5.1     |
| 5.7     |
| 11      |
| 11.1.2  |
+---------+

Comments

7 Responses to “Natural Order Numerical Sorting”

  1. Ben Van Aerde on May 15th, 2008 3:23 am

    Many thanks for this tip!
    In addition to the howto: it also works when using other characters than a dot. For example: I used dashes, and it still sorted correctly.

  2. Robert on June 27th, 2008 1:58 pm

    Its very interesting but it didnt work for me.

    | 2.1.0.135 | 2.1 |
    | 2.1.0.134 | 2.1 |
    | 2.1.0.133 | 2.1 |
    | 2.1.0.132 | 2.1 |
    | 2.1.0.131 | 2.1 |
    | 2.1.0.130 | 2.1 |
    | 2.1.0.13 | 2.1 | <<<<<<<
    | 2.1.0.129 | 2.1 |
    | 2.1.0.128 | 2.1 |
    | 2.1.0.127 | 2.1 |
    | 2.1.0.126 | 2.1 |
    | 2.1.0.126 | 2.1 |
    | 2.1.0.126 | 2.1 |
    | 2.1.0.126 | 2.1 |

  3. Sean Murphy on June 27th, 2008 5:11 pm

    Robert, unfortunately the technique I outlined only works for two decimals. For each decimal you add the ORDER BY clause must get more and more complex.

  4. Tony on July 2nd, 2008 11:27 pm

    Thank you.

  5. Robbie on July 18th, 2008 2:33 am

    Try this:

    SELECT version FROM test ORDER BY (REPLACE(version , ‘.’, ”) ASC

  6. Robbie on July 18th, 2008 5:37 am

    The previous doesn’t work, sorry.

    I tested this, and it is work fine:

    $c = ($m + 1); //$m = max of level -> $query = “SELECT max(version) FROM test”

    $queryStr = “Round(SUBSTRING_INDEX(version,’.',1))”;

    for ($i = 2; $i < ($c + 1); $i++) {
    $queryStr .= “, Round(RIGHT(SUBSTRING_INDEX(version,’.',”.$i.”),((LENGTH(SUBSTRING_INDEX(version,’.',”.$i.”))-1)-LENGTH(SUBSTRING_INDEX(version,’.',”.($i-1).”)))))”;
    }

    $query = “SELECT version FROM test ORDER BY $queryStr ASC”;

  7. Robbie on July 18th, 2008 6:18 am

    The right maximum level query is:

    “SELECT max(level) FROM test”

    Where:

    1 -> level = 0
    1.1 -> level = 1
    1.1.1 -> level = 2

    and so on

Leave a Reply