Ask me anything.

Categories: Events

Cloudera is eight; Apache Hadoop is ten. Big data has gone from zero to how-did-that-happen huge. The bestiary is bigger than ever, too: new projects like Apache Kudu, Apache Impala (incubating), Apache Kafka and Apache Spark define the future of big data and analytics, extending the core Hadoop platform to handle streaming, real-time and advanced…


Amplifying Data Sharing as a Force Multiplier for Government Agencies

Categories: Cybersecurity

As I mentioned in my last post, Cloudera has made significant progress in recent years in ensuring data safety and security for government agencies, no matter the location. One aspect of securing data that is less commonly addressed than crypto or guns/gates/guards is the problem of data movement. Every time data is moved, copied, or replicated a…


Enhanced Streaming and Machine Learning with Apache Spark 2.0

Categories: Spark

Apache Spark has risen to be the taster’s choice of high-scale distributed computation and solidified itself as the de facto processing engine in the Apache Hadoop ecosystem. In fact, Curt Monash of DBMS2 recently wrote, “The greatest use for Spark seems to be the same as the canonical first use for MapReduce: data transformation.” But the…
