-Поиск по дневнику

Поиск сообщений в rss_planet_mozilla

 -Подписка по e-mail

 

 -Постоянные читатели

 -Статистика

Статистика LiveInternet.ru: показано количество хитов и посетителей
Создан: 19.06.2007
Записей:
Комментариев:
Написано: 7


Seif Lotfy: Skizze - A probabilistic data-structures service and storage (Alpha)

Пятница, 25 Декабря 2015 г. 12:53 + в цитатник
Skizze - A probabilistic data-structures service and storage (Alpha)

At my day job we deal with a lot of incoming data for our product, which requires us to be able to calculate histograms and other statistics on the data-stream as fast as possible.

One of the best tools for this is Redis, which will give you 100% accuracy in O(1) (except for its HyperLogLog implementation which is a probabilistic data-structure). All in all Redis does a great job.
The problem with Redis for me personally is that, when using it for 100 of millions of counters, I could end up with Gigabytes of memory.

I also tend to use Top-K, which is not implemented in Redis but via Lua scripting can be built on top of the ZSet data-structure. The Top-K data-structure is used to keep track of the top "k" heavy hitters in a stream without having to keep track all "n" flows (k < n), with a O(1) complexity.

Anyhow, dealing with a massive amount of data the interest is most of the time in heavy hitters, that could be estimated while using less memory with an O(1) complexity for reading and writing (that is if you don't care about a count being 124352435 or 124352011 because on the UI of an app you will be showing "over 124 Million").

There are a lot of algorithms floating around and used to solve counting, frequency, membership and top-k problems, which in practice are implemented and used as part of a data-stream pipeline where stuff is counted, merged then stored.

I couldn't find a one-stop-shop service to fire & forget my data at.

Basically in need of a solution where I can set up sketches to answer cardinality, frequency, membership as well as ranking queries about my data-stream (without having to reimplement the algorithm in a pipeline embedded in storm, spark, etc...) led to the development of Skizze (which is in alpha state).

What is Skizze?

Skizze (['sk

http://geekyogre.com/skizze-a-probabilistic-data-structures-service-and-storage/


 

Добавить комментарий:
Текст комментария: смайлики

Проверка орфографии: (найти ошибки)

Прикрепить картинку:

 Переводить URL в ссылку
 Подписаться на комментарии
 Подписать картинку