-Поиск по дневнику

Поиск сообщений в rss_thedaily_wtf

 -Подписка по e-mail

 

 -Постоянные читатели

 -Статистика

Статистика LiveInternet.ru: показано количество хитов и посетителей
Создан: 06.04.2008
Записей:
Комментариев:
Написано: 0


CodeSOD: ImAlNumb?

Среда, 11 Сентября 2019 г. 09:30 + в цитатник

I think it’s fair to say that C, as a language, has never had a particularly great story for working with text. Individual characters are okay, but strings are a nightmare. The need to support unicode has only made that story a little more fraught, especially as older code now suddenly needs to support extended characters. And by “older” I mean, “wchar was added in 1995, which is practically yesterday in C time”.

Lexie inherited some older code. It was not designed to support unicode, which is certainly a problem in 2019, and it’s the problem Lexie was tasked with fixing. But it had an… interesting approach to deciding if a character was alphanumeric.

Now, if we limit ourselves to ASCII, there are a variety of ways we could do this check. We could convert it to a number and do a simple check- characters 48–57 are numeric, 65–90 and 97–122 cover the alphabetic characters. But that’s a conditional expression- six comparison operations! So maybe we should be more clever. There is a built-in library function, isalnum, which might be more optimized, and is available on Lexie’s platform. But we’re dedicated to really doing some serious premature optimization, so there has to be a better way.

bool isalnumCache[256] =
{false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false,
false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true, false, false, false, false, false, false,
false,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true, false, false, false, false, false,
false,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true,  true, true, false, false, false, false,
false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false,
false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false,
false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false,
false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false};

This is a lookup table. Convert your character to an integer, and then use it to index the array. This is fast. It’s also error prone, and this block does incorrectly identify a non-alphanumeric as an alphanumeric. It also 100% fails if you are dealing with wchar_t, which is how Lexie ended up looking at this block in the first place.

[Advertisement] Utilize BuildMaster to release your software with confidence, at the pace your business demands. Download today!

https://thedailywtf.com/articles/imalnumb

Метки:  

 

Добавить комментарий:
Текст комментария: смайлики

Проверка орфографии: (найти ошибки)

Прикрепить картинку:

 Переводить URL в ссылку
 Подписаться на комментарии
 Подписать картинку