Kafka vs. JMS: Which One Should You Be Using?

News, IT process methods and organization

Two of the most popular kinds of message broker in use today are Kafka and those built around JMS. JMS is a long-standing Java API for developing messaging applications; its primary function is to send messages between two or more clients. Kafka, on the other hand, is a distributed streaming platform that scales well and is useful for real-time data processing.

While both offer their own advantages and are highly useful in their own right, which of the two should you actually be using?

Article source: DZone

July 6, 2022 / by Service comm.

Querying Kafka Topics Using Presto

News, IT process methods and organization

Presto is a distributed query engine that allows querying different data sources such as Kafka, MySQL, MongoDB, Oracle, Cassandra, Hive, etc. using SQL. It has the ability to analyze big data and query multiple data sources together.

In this article, we will discuss how Presto can be used to query Kafka topics. Below is the step-by-step process to set up Presto and Kafka and connect them. Here, I have used macOS, but a similar setup can be done on any other system.

Article source: DZone

May 6, 2022 / by Service comm.

SaaS Galore: Integrating CockroachDB With Confluent Kafka, Fivetran, and Snowflake

News, IT process methods and organization

Motivation

The problem this tutorial is trying to solve is the lack of a native Fivetran connector for CockroachDB. My customer has built their analytics pipeline based on Fivetran. Given there is no native integration, their next best guess was to set up a Postgres connector:

CockroachDB is PostgreSQL wire compatible, but it is not correct to assume that compatibility is 1:1. Let’s attempt to configure the connector:

Article source: DZone

April 8, 2022 / by Service comm.

Next-Gen Data Pipes With Spark, Kafka, and K8s: Part 2

News, IT process methods and organization

Introduction

In our previous article, we discussed two emerging options for building new-age data pipes using stream processing. One option leverages Apache Spark for stream processing, and the other uses a Kafka-Kubernetes combination on any cloud platform for distributed computing. The first approach is reasonably popular, and a lot has already been written about it. However, the second option is catching up in the market because it is far less complex to set up and easier to maintain. Also, data-on-the-cloud is a natural outcome of the technological drivers that prevail in the market. So, this article will focus on the second approach to see how it can be implemented in different cloud environments.

Kafka-K8s Streaming Approach in Cloud

In this approach, if the number of partitions in the Kafka topic matches the replication factor of the pods in the Kubernetes cluster, the pods together form a consumer group and deliver all the advantages of distributed computing. This can be depicted by the equation below:
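Independently of that equation, the pairing of partitions and pods can be illustrated with a small Python sketch. It is a simplified round-robin model, not Kafka's actual partition assignor; the function name `assign_partitions` and the pod names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def assign_partitions(num_partitions, pods):
    """Spread partition ids over pods round-robin (simplified model)."""
    assignment = defaultdict(list)
    for partition in range(num_partitions):
        pod = pods[partition % len(pods)]
        assignment[pod].append(partition)
    # Pods that received no partition still appear, with an empty list.
    return {pod: assignment[pod] for pod in pods}


# Hypothetical pod names for a three-replica deployment.
pods = ["pod-0", "pod-1", "pod-2"]

# Three partitions and three pods: each pod owns exactly one partition.
print(assign_partitions(3, pods))

# Two partitions and three pods: one pod is left without work.
print(assign_partitions(2, pods))
```

When the counts match, every pod in the consumer group has one partition to read. With more pods than partitions, the extra pods sit idle, which is why the two numbers are usually kept aligned.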

Article source: DZone

March 29, 2022 / by Service comm.

Three easy ways to run Kafka without Zookeeper

News, IT process methods and organization

It has been a couple of years since the announcement that Apache Zookeeper would be removed as a dependency for managing Apache Kafka metadata. Since version 2.8, we can run a Kafka cluster without Zookeeper. This article will go over three easy ways to get started with a single-node cluster using containers.

Control and data planes

Apache Kafka implements independent control and data planes for its clusters. The control plane manages the cluster, keeps track of which brokers are alive, and takes action when that set changes. Meanwhile, the data plane consists of the features required to handle producers, consumers, and their records. In previous versions, Zookeeper was the cluster component that held most of the control plane implementation.
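To make the two planes concrete, here is a hedged Python sketch that renders a minimal configuration for a single node running without Zookeeper, where one process takes both the controller role (control plane) and the broker role (data plane). The property keys follow the KRaft sample configuration shipped with Kafka; the helper name `kraft_single_node_properties`, the ports, and the log directory are assumptions for illustration, not values from the article.

```python
def kraft_single_node_properties(node_id=1, broker_port=9092, controller_port=9093):
    """Render server.properties text for one combined broker/controller node."""
    settings = {
        # One process plays both roles: data plane and control plane.
        "process.roles": "broker,controller",
        "node.id": str(node_id),
        # The controller quorum has a single voter: this node.
        "controller.quorum.voters": f"{node_id}@localhost:{controller_port}",
        # Separate listeners for client traffic and controller traffic.
        "listeners": f"PLAINTEXT://:{broker_port},CONTROLLER://:{controller_port}",
        "advertised.listeners": f"PLAINTEXT://localhost:{broker_port}",
        "controller.listener.names": "CONTROLLER",
        "inter.broker.listener.name": "PLAINTEXT",
        "listener.security.protocol.map": "CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT",
        # Assumed scratch directory; pick a persistent path for real use.
        "log.dirs": "/tmp/kraft-combined-logs",
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items())


print(kraft_single_node_properties())
```

The `CONTROLLER` listener carries control plane traffic, while the `PLAINTEXT` listener serves producers and consumers, so the separation described above shows up directly in the configuration.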

Article source: DZone

January 4, 2022 / by Service comm.

Stopping Cybersecurity Threats: Why Databases Matter

News, IT process methods and organization

From intrusion detection to threat analysis to endpoint security, the effectiveness of cybersecurity efforts often boils down to how much data can be processed in real-time with the most advanced algorithms and models.

Many factors are obviously involved in stopping cybersecurity threats effectively. However, the databases responsible for processing the billions or trillions of events per day (from millions of endpoints) play a particularly crucial role. High throughput and low latency directly correlate with better insights as well as more threats discovered and mitigated in near real-time. Data-intensive cybersecurity systems are incredibly complex: many span 4+ data centers with database clusters exceeding 1000 nodes and petabytes of heterogeneous data under active management.

Article source: DZone

December 29, 2021 / by Service comm.

Apache Kafka in Cybersecurity for SIEM/SOAR Modernization

News, IT process methods and organization

Apache Kafka has become the de facto standard for processing data in motion across enterprises and industries. Cybersecurity is a key success factor across all use cases. Kafka is not just used as a backbone and source of truth for data. It also monitors, correlates, and proactively acts on events from real-time and batch data sources to detect anomalies and respond to incidents. This blog series explores use cases and architectures for Kafka in the cybersecurity space, including situational awareness, threat intelligence, forensics, air-gapped and zero trust environments, and SIEM/SOAR modernization. This post is part six: SIEM/SOAR Modernization.

Blog Series: Apache Kafka for Cybersecurity

This blog series explores why security features such as RBAC, encryption, and audit logs are only the foundation of a secure event streaming infrastructure. Learn about use cases, architectures, and reference deployments for Kafka in the cybersecurity space:

Article source: DZone

September 25, 2021 / by Service comm.

Apache Kafka in the Gaming Industry: Use Cases + Architectures

News, IT process methods and organization

This blog post explores how event streaming with Apache Kafka provides a scalable, reliable, and efficient infrastructure to make gamers happy and gaming companies successful. Various use cases and architectures in the gaming industry are discussed, including online and mobile games, betting, gambling, and video streaming.

Learn about:

Article source: DZone

August 19, 2021 / by Service comm.

Transformations of Varying JSON Payloads Using Spark-Streaming

News, IT process methods and organization

Spark-Streaming can be used to read data from a source in a streaming fashion. We just have to create a read-stream from the data source, and then we can create a write-stream to load the data into a target data source.

For this demo, I will assume that we have different JSON payloads coming into a Kafka topic that we need to transform and write to another Kafka topic.
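The per-record step of such a pipeline can be sketched in plain Python. This is not the article's Spark code: it only shows one way to map differently shaped JSON payloads onto a common output shape before they are written to the target topic. The field names `order_id` and `user_id`, the output layout, and the function name `transform` are all hypothetical.

```python
import json


def transform(raw):
    """Parse one JSON message and map it to a common output shape."""
    payload = json.loads(raw)
    # Decide the payload type from the fields it carries (assumed fields).
    if "order_id" in payload:
        kind, key = "order", payload["order_id"]
    elif "user_id" in payload:
        kind, key = "user", payload["user_id"]
    else:
        kind, key = "unknown", None
    # Keep the original body so nothing is lost downstream.
    return json.dumps({"kind": kind, "key": key, "body": payload}, sort_keys=True)


# Two differently shaped messages, as they might arrive on the source topic.
messages = [
    '{"order_id": 7, "total": 3.5}',
    '{"user_id": "u-42", "email": "a@example.com"}',
]
for message in messages:
    print(transform(message))
```

In a streaming job, a function like this would be applied to each record between the read-stream and the write-stream; keeping it free of Spark dependencies makes it easy to unit test on its own.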

Article source: DZone

August 18, 2021 / by Service comm.

Apache Kafka in the Insurance Industry

News, IT process methods and organization

The rise of data in motion in the insurance industry is visible across all lines of business, including life, healthcare, travel, vehicle, and others. Apache Kafka changes how enterprises rethink data. This blog post explores use cases and architectures for event streaming. Real-world examples from Generali, Centene, Humana, and Tesla show innovative insurance-related data integration and stream processing in real-time.

Digital Transformation in the Insurance Industry

Most insurance companies have similar challenges:

Article source: DZone

August 7, 2021 / by Service comm.
