credmp

Ramblings of a code junkie

January 16, 2007

C++ lexicographical_compare and locales

Written by
core

While researching how to sort strings in C++ (a seemingly easy task), I ran into various issues. For one: locales. If a string contains a character such as '\334' (Ü in human-readable form, assuming ISO 8859-1), the usual case-insensitive comparison techniques do not take it into account.
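To make the problem concrete, here is a minimal sketch of a naive comparison. It assumes the strings hold ISO 8859-1 bytes and that the program never calls setlocale, so the C library stays in its default "C" locale; the name naive_ci_less is made up for this illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper, for illustration only.
// Naive case-insensitive "less than": upper-cases each byte with the C
// library's toupper, which only consults the global C locale.
bool naive_ci_less(const std::string &x, const std::string &y) {
    return std::lexicographical_compare(
        x.begin(), x.end(), y.begin(), y.end(),
        [](char a, char b) {
            // Go through unsigned char: passing a negative char value
            // to toupper is undefined behaviour.
            return std::toupper(static_cast<unsigned char>(a)) <
                   std::toupper(static_cast<unsigned char>(b));
        });
}
```

In the default "C" locale, toupper only maps a-z, so '\374' (ü) is left alone: naive_ci_less("Zebra", "\374ber") is true, and "\334ber" and "\374ber" are treated as different strings rather than as the same word in a different case.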

I found a paper, "How to do case-insensitive string comparison", by Matt Austern. It gives a good explanation and a sound solution to the problem. However, I prefer to use C++ data encapsulation and have rewritten the code to be more readable and usable (in my opinion, anyway ;). Each class now lives in its own header file, private members are actually private, and because the headers have include guards you can include them from several places in your code.

The code is spread over four files:

lt_char.hpp

#ifndef LT_CHAR_H
#define LT_CHAR_H

class lt_char {
public:
    lt_char(const char *table);

    bool operator()(char x, char y) const;
private:
    const char *char_table;
};

#endif // LT_CHAR_H

lt_char.cxx

#include "lt_char.hpp"
#include <climits>  // CHAR_MIN

lt_char::lt_char(const char *table) : char_table(table) {
}

bool lt_char::operator()(char x, char y) const {
    return char_table[x - CHAR_MIN] < char_table[y - CHAR_MIN];
}

lt_string.hpp

#ifndef LT_STRING_H
#define LT_STRING_H

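#include <climits>     // CHAR_MIN, CHAR_MAX
#include <functional>  // std::binary_function
#include <locale>
#include <string>

// Note: std::binary_function was deprecated in C++11 and removed in C++17.
class lt_string : public std::binary_function<std::string, std::string, bool> {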
public:
    lt_string(const std::locale &locale = std::locale::classic());

    bool operator()(const std::string &x, const std::string &y) const;
private:
    char char_table[CHAR_MAX - CHAR_MIN + 1];
};

#endif // LT_STRING_H

lt_string.cxx

#include "lt_string.hpp"
#include "lt_char.hpp"
#include <algorithm>  // std::lexicographical_compare
#include <climits>    // CHAR_MIN, CHAR_MAX

lt_string::lt_string(const std::locale &locale) {
    const std::ctype<char> &ctype_char = std::use_facet< std::ctype<char> >(locale);

    for (int i = CHAR_MIN; i <= CHAR_MAX; ++i) {
        char_table[i - CHAR_MIN] = (char) i;
    }

    ctype_char.toupper(char_table, char_table + (CHAR_MAX - CHAR_MIN + 1));
}

bool lt_string::operator()(const std::string &x, const std::string &y) const {
    return std::lexicographical_compare(x.begin(), x.end(),
                                        y.begin(), y.end(),
                                        lt_char(char_table));
}
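
For a feel of how the comparator might be used, here is a small sketch that sorts a vector of strings. So that it compiles on its own, it uses a condensed single-file stand-in called ci_less (a made-up name, as is sorted_ignoring_case) that follows the same table-based idea as lt_string above; treat it as an illustration, not a replacement for the four files.

```cpp
#include <algorithm>
#include <climits>
#include <locale>
#include <string>
#include <vector>

// Hypothetical single-file stand-in for lt_string, condensed so that
// this sketch is self-contained.
class ci_less {
public:
    explicit ci_less(const std::locale &loc = std::locale::classic()) {
        // Identity table: entry (c - CHAR_MIN) initially holds c itself.
        for (int i = CHAR_MIN; i <= CHAR_MAX; ++i) {
            table_[i - CHAR_MIN] = static_cast<char>(i);
        }
        // Upper-case the whole table in place with the locale's ctype facet.
        const std::ctype<char> &ct = std::use_facet< std::ctype<char> >(loc);
        ct.toupper(table_, table_ + (CHAR_MAX - CHAR_MIN + 1));
    }

    bool operator()(const std::string &x, const std::string &y) const {
        const char *t = table_;
        // Compare the table entries (upper-cased forms), not the raw chars.
        return std::lexicographical_compare(
            x.begin(), x.end(), y.begin(), y.end(),
            [t](char a, char b) { return t[a - CHAR_MIN] < t[b - CHAR_MIN]; });
    }

private:
    char table_[CHAR_MAX - CHAR_MIN + 1];
};

// Hypothetical helper: returns a copy of words sorted without regard to case.
std::vector<std::string> sorted_ignoring_case(
        std::vector<std::string> words,
        const std::locale &loc = std::locale::classic()) {
    std::sort(words.begin(), words.end(), ci_less(loc));
    return words;
}
```

With the classic locale, sorted_ignoring_case({"banana", "Apple", "cherry"}) yields Apple, banana, cherry. To have ü and Ü compare as equal you would pass a locale whose ctype facet knows about them, for example something like std::locale("de_DE.ISO-8859-1"). Locale names are platform-dependent, and the std::locale constructor throws std::runtime_error if the name is not available.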