The June meeting of the Prague Czech Java User Group
will take place on June 26 at 7 pm in lecture hall S5
at the Faculty of Mathematics and Physics of Charles
University, Malostranské náměstí 25, Praha 1. The
program features two presentations: Engineering
Machine Learning Algorithms at Scale (prof. Jan Vitek)
and Real-time stream data processing (Zbyněk
Šlajchrt). This meeting is sponsored by AVAST
Software. Admission to CZJUG events is free and no
advance registration is needed. If you plan to come,
please let us know by voting in the poll on the main
page of the java.cz portal.

Engineering Machine Learning
Algorithms at Scale

The talk will describe how to engineer a scalable
implementation of a popular supervised machine
learning algorithm, Random Forest, so that it can
scale to terabyte data sets. To that end, I will
describe how to leverage the H2O analytics engine to
write distributed Fork/Join code in Java that is
massively scalable and efficient. H2O has an API for
Big Data Math that uses a simple giant-vector
programming style and runs in parallel across a
cluster. H2O can run on top of infrastructures such as
Hadoop or standalone, and has been shown to scale to
hundreds of nodes. The H2O project is an open source
effort, and so is our implementation of Random
Forest.
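
To give a feel for the Fork/Join style mentioned in the
abstract, here is a minimal sketch that sums one large
vector by recursive splitting. It is an assumption-laden
illustration only: it uses the plain JDK Fork/Join
framework on a single JVM, not the H2O API, and the
class name VectorSum and the THRESHOLD value are
hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical illustration: plain JDK Fork/Join on one JVM, not the H2O API.
class VectorSum extends RecursiveTask<Double> {
    // Assumed cut-off: ranges at or below this size are summed sequentially.
    private static final int THRESHOLD = 10_000;

    private final double[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    VectorSum(double[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Double compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: do the work directly.
            double sum = 0.0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        // Split the range in half, run the left half asynchronously,
        // compute the right half in this thread, then combine.
        int mid = (from + to) >>> 1;
        VectorSum left = new VectorSum(data, from, mid);
        VectorSum right = new VectorSum(data, mid, to);
        left.fork();
        double rightSum = right.compute();
        return left.join() + rightSum;
    }

    // Convenience entry point that runs the task on the common pool.
    static double sum(double[] data) {
        return ForkJoinPool.commonPool().invoke(new VectorSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        double[] v = new double[1_000_000];
        Arrays.fill(v, 1.0);
        System.out.println(sum(v));
    }
}
```

The same split, compute, and combine shape is what a
distributed variant would have to preserve; how H2O
spreads such tasks across cluster nodes is not shown
here.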

Jan Vitek

Jan Vitek is a Professor of Computer Science at
Purdue University, USA. His research career
encompasses work on all aspects of programming
language design and implementation. He led the
development of the first real-time Java virtual
machine and has worked on language-based security,
concurrency and transactional memory. On the academic
side of his life he chairs SIGPLAN, the ACM Special
Interest Group on Programming Languages, and has
chaired conferences such as ECOOP, PLDI, COORDINATION
and TOOLS. He was an academic visitor for several years
at IBM and Oracle. He cofounded Fiji Systems to sell
real-time technology and he is currently an advisor
at 0xdata where he works on big data. His most recent
research interests include JavaScript and the R
programming language.

Real-time stream data processing

This presentation deals with the concept of
coroutines and their applicability in the world of
stream data processing. Although they are rarely used
in today's applications, coroutines have been around
since the early days of digital computing.
Surprisingly, coroutines combine nicely with the
map-reduce paradigm that is used frequently in the
world of cloud computing and big data processing.
In contrast to the traditional map-reduce concept,
which is designed for offline job processing, the
hybrid of coroutines and map-reduce is primarily
targeted at real-time event processing. Clockwork, an
open-source library developed at Avast, combines
these two concepts and allows a programmer to write a
real-time stream analysis as if it were a traditional
map-reduce job, for instance for Hadoop.
The presentation focuses mainly on coding and
samples and will show how to program applications
ranging from simple real-time statistics to more
advanced tasks.
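
The contrast between offline and per-event map-reduce
can be sketched as follows. This is a hypothetical
illustration, not the Clockwork API: the names
StreamingMapReduce, onEvent, snapshot and WordCountDemo
are made up for this example. Java has no built-in
coroutines, so the sketch keeps the running reduce state
in an object between events, which is roughly the role
a coroutine would play.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical illustration of per-event (incremental) map-reduce.
// Not the Clockwork API; all names here are assumptions.
class StreamingMapReduce<E, K, V> {
    private final Function<E, Map<K, V>> mapper; // event -> key/value pairs
    private final BinaryOperator<V> reducer;     // combines two values of one key
    private final Map<K, V> state = new HashMap<>();

    StreamingMapReduce(Function<E, Map<K, V>> mapper, BinaryOperator<V> reducer) {
        this.mapper = mapper;
        this.reducer = reducer;
    }

    // Map the event and fold its pairs into the running state right away,
    // instead of waiting for a complete input set as an offline job would.
    void onEvent(E event) {
        mapper.apply(event).forEach((k, v) -> state.merge(k, v, reducer));
    }

    // Results can be read at any time while events keep arriving.
    Map<K, V> snapshot() {
        return new HashMap<>(state);
    }
}

class WordCountDemo {
    // Map step: count the words of a single line.
    static Map<String, Integer> wordsOf(String line) {
        Map<String, Integer> out = new HashMap<>();
        for (String w : line.trim().toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) {
                out.merge(w, 1, Integer::sum);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        StreamingMapReduce<String, String, Integer> job =
                new StreamingMapReduce<String, String, Integer>(
                        WordCountDemo::wordsOf, Integer::sum);
        job.onEvent("error timeout");
        job.onEvent("error disk");
        System.out.println(job.snapshot().get("error")); // prints 2
    }
}
```

The mapper and reducer are written as they would be for
a batch word count; only the driver differs, since it
updates the result after every event.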

Zbyněk Šlajchrt

After finishing his studies at the Faculty of
Mathematics and Physics of Charles University, he began
to work as a Java EE developer and architect at several
Czech and international companies. He now works at
Avast a.s. and, alongside his main job, lectures on
Java EE programming at the University of Economics,
Prague. In his current position he is responsible for
designing and developing a private cloud platform and
applications built on the Java platform at AVAST
Software.