Project Metamorphosis: Unveiling the next-gen event streaming platform.Learn More

ApacheCon 2015

I was at ApacheCon 2015 in Austin, Texas a couple of weeks ago. The following is a short summary of some of the trends that I observed at the conference.

There is a lot of usage and interest in Apache Kafka.

I gave a talk on “Introduction to Apache Kafka Tuesday afternoon. About a quarter of the attendees went to the talk and the room was full. We also hosted a Kafka meetup on Tuesday evening for conference attendees. I provided an overview of the Confluent Platform 1.0 release. In particular, I described Schema Registry and how it can be used with Kafka to captured structured feeds of stream data. Most participants seem to understand the value of having schemas and a well defined policy for schema evolution. On Wednesday morning, Todd Palino from LinkedIn gave a well received talk on Kafka at Scale: a multi-tiered architecture, in which he shared lots of valuable experience of running Kafka in production.

Tony Ng from Ebay gave a talk on Pulsar: Realtime Analytics at Scale leveraging Kafka, Hadoop and Kylin. Pulsar is an open source real time analytics and stream processing framework. Its typical usage includes reporting/dashboards, business activity/monitoring, personalization, marketing/advertising, fraud and bot detection. Currently, it’s processing about 0.5 million events/sec at Ebay. Similar to other stream processing systems, Pulsar supports getting data from different channels including Kafka, file, http, etc. It also has a pluggable processing layer. Pulsar supports SQL on stream data through Esper. One key distinguishing feature I found is that Pulsar integrates with Druid and Apache Kylin, a Hadoop based OLAP engine to build cubes.

There is a growing awareness and interest in the space of realtime data ingestion.

Joe Will gave a talk on Apache NiFi: Better Analytics Demands Better Dataflow. Apache Nifi is a new incubator project and was originally developed at the NSA. In short, it is a data flow management system similar to Apache Camel and Flume. It’s mostly intended for getting data from a source to a sync. It can do light weight processing such as enrichment and conversion, but not heavy duty ETL. One unique feature of Nifi is its built-in UI, which makes the management and the monitoring of the data flow convenient. The whole data flow pipeline can be drawn on a panel. The UI shows statistics such as in/out byte rates, failures and latency in each of the edges in the flow. One can pause and resume the flow in real time in the UI. Nifi’s Architecture is also a bit different from Camel and Flume. There is a master node and many slave nodes. The slaves are running the actual data flow and the master is for monitoring the slaves. Each slave has a web server, a flow controller (thread pool) layer, and a storage layer. All events are persisted to a local content repository. It also stores the lineage information in a separate governance repository, which allows it to trace at the event level. Currently, the fault tolerance story in Nifi is a bit weak. The master is a single point of failure. There is also no redundancy across the slaves. So, if a slave dies, the flow stops until the slave is brought up again.

There are lots of activities in container-based resource management.

Christos Kozyrakis gave a talk on Apache Mesos in which he outlined some of the new development in Mesos. One can now do resource allocation based on the framework roles. For example, one can configure to only give SSDs to database services. It seems that there is a Mesos DNS for service discovery. Christos also showed the results of an interesting analysis. Even with the usage of Mesos, in a data center, typical server cpu utilization is only about 30% most of the time. The reason is due to the curse of over-provisioning. Mesos is trying to improve the server utilization by reporting unused resources to the master as “best effort” resources. Those resources can then be used for low priority tasks. If not done carefully, such an approach may have negative performance impact. For example, the low priority task can pollute L3 cache and cause the latency to increase in realtime applications. Mesos tries to address this problem by using isolators at different levels.

In summary, Kafka had great presence in ApacheCon this year. It’s great to see a lot of interest in Kafka and what Confluent is building. If you like working on Kafka and are interested in helping us build realtime stream processing technologies at Confluent, please let us know.

Did you like this blog post? Share it now

Subscribe to the Confluent blog

More Articles Like This

Getting Started with Protobuf in Confluent Cloud

Confluent Cloud supports Schema Registry as a fully managed service that allows you to easily manage schemas used across topics, with Apache Kafka® as a central nervous system that connects […]

Walmart’s Real-Time Inventory System Powered by Apache Kafka

Consumer shopping patterns have changed drastically in the last few years. Shopping in a physical store is no longer the only way. Retail shopping experiences have evolved to include multiple […]

Broadcom Modernizes Machine Learning and Anomaly Detection with ksqlDB

Mainframes are still ubiquitous, used for almost every financial transaction around the world—credit card transactions, billing, payroll, etc. You might think that working on mainframe software would be dull, requiring […]

Jetzt registrieren

Erhalten Sie in den ersten drei Monaten einen Rabatt von bis zu 50 USD pro Kalendermonat auf Ihre Rechnung.

Nur neue Registrierungen.

By clicking “sign up” above you understand we will process your personal information in accordance with our und bin damit einverstanden.

Indem Sie oben auf „Registrieren“ klicken, akzeptieren Sie die Nutzungsbedingungen und den gelegentlichen Erhalt von Marketing-E-Mails von Confluent. Zudem ist Ihnen bekannt, dass wir Ihre personenbezogenen Daten gemäß unserer und bin damit einverstanden.

Auf einem einzigen Kafka Broker unbegrenzt kostenlos verfügbar

Die Software ermöglicht die unbegrenzte Nutzung der kommerziellen Funktionen auf einem einzelnen Kafka Broker. Nach dem Hinzufügen eines zweiten Brokers startet automatisch ein 30-tägiger Timer für die kommerziellen Funktionen, der auch durch ein erneutes Herunterstufen auf einen einzigen Broker nicht zurückgesetzt werden kann.

Wählen Sie den Implementierungstyp aus
Manual Deployment
  • tar
  • zip
  • deb
  • rpm
  • docker
Automatische Implementierung
  • kubernetes
  • ansible

By clicking "download free" above you understand we will process your personal information in accordance with our Datenschutzerklärung zu.

Indem Sie oben auf „kostenlos herunterladen“ klicken, akzeptieren Sie die Confluent-Lizenzvertrag und den gelegentlichen Erhalt von Marketing-E-Mails von Confluent. Zudem erklären Sie sich damit einverstanden, dass wir Ihre personenbezogenen Daten gemäß unserer Datenschutzerklärung zu.

Diese Website verwendet Cookies zwecks Verbesserung der Benutzererfahrung sowie zur Analyse der Leistung und des Datenverkehrs auf unserer Website. Des Weiteren teilen wir Informationen über Ihre Nutzung unserer Website mit unseren Social-Media-, Werbe- und Analytics-Partnern.