Articles

Two of the most popular message brokers used today are Kafka and those based around JMS. JMS is a long-standing Java API used generally for developing messaging applications, with its primary function of being able to send messages between two or more clients. Kafka, on the other hand, is a distributed streaming platform that provides a lot of scalabilities and is useful for real-time data processing. 

While both offer their own advantages and are highly useful in their own right, which of the two should you be actually using?

Source de l’article sur DZONE

Presto is a distributed query engine that allows querying different data sources such as Kafka, MySQL, MongoDB, Oracle, Cassandra, Hive, etc. using SQL. It has the ability to analyze big data and query multiple data sources together.

In this article, we will discuss how Presto can be used to query Kafka topics. Below is the step-by-step process to set up Presto and Kafka, and connect them together. Here, I have considered MacOS, but similar setups can be done on any other system.

Source de l’article sur DZONE


Motivation

The problem this tutorial is trying to solve is the lack of a native Fivetran connector for CockroachDB. My customer has built their analytics pipeline based on Fivetran. Given there is no native integration, their next best guess was to set up a Postgres connector:

CockroachDB is PostgreSQL wire compatible, but it is not correct to assume it is 1:1. Let’s attempt to configure the connector:

Source de l’article sur DZONE


Introduction 

In our previous article, we discussed two emerging options for building new-age data pipes using stream processing. One option leverages Apache Spark for stream processing and the other makes use of a Kafka-Kubernetes combination of any cloud platform for distributed computing. The first approach is reasonably popular, and a lot has already been written about it. However, the second option is catching up in the market as that is far less complex to set up and easier to maintain. Also, data-on-the-cloud is a natural outcome of the technological drivers that are prevailing in the market. So, this article will focus on the second approach to see how it can be implemented in different cloud environments.

Kafka-K8s Streaming Approach in Cloud

In this approach, if the number of partitions in the Kafka topic matches with the replication factor of the pods in the Kubernetes cluster, then the pods together form a consumer group and ensure all the advantages of distributed computing. It can be well depicted through the below equation:

Source de l’article sur DZONE

From intrusion detection to threat analysis to endpoint security, the effectiveness of cybersecurity efforts often boils down to how much data can be processed in real-time with the most advanced algorithms and models.

Many factors are obviously involved in stopping cybersecurity threats effectively. However, the databases responsible for processing the billions or trillions of events per day (from millions of endpoints) play a particularly crucial role. High throughput and low latency directly correlate with better insights as well as more threats discovered and mitigated in near real-time. Cybersecurity data-intensive systems are incredibly complex: many span 4+ data centers with database clusters exceeding 1000 nodes and petabytes of heterogeneous data under active management.

Source de l’article sur DZONE

Apache Kafka became the de facto standard for processing data in motion across enterprises and industries. Cybersecurity is a key success factor across all use cases. Kafka is not just used as a backbone and source of truth for data. It also monitors, correlates, and proactively acts on events from real-time and batch data sources to detect anomalies and respond to incidents. This blog series explores use cases and architectures for Kafka in the cybersecurity space, including situational awareness, threat intelligence, forensics, air-gapped and zero trust environments, and SIEM/SOAR modernization. This post is part six: SIEM/SOAR Modernization.

Blog Series: Apache Kafka for Cybersecurity

This blog series explores why security features such as RBAC, encryption, and audit logs are only the foundation of a secure event streaming infrastructure. Learn about use cases,  architectures, and reference deployments for Kafka in the cybersecurity space:

Source de l’article sur DZONE

This blog post explores how event streaming with Apache Kafka provides a scalable, reliable, and efficient infrastructure to make gamers happy and gaming companies successful. Various use cases and architectures in the gaming industry are discussed, including online and mobile games, betting, gambling, and video streaming.

Learn about:

Source de l’article sur DZONE

Spark-streaming can be used to read the data from a source in a streaming fashion. We just have to create a read-stream from the data source and then we can create the write-stream to load the data into a target datasource. 

For this demo, I will assume that we have different JSON payloads coming into a kafka topic that we need to transform and write it to another kafka topic.

Source de l’article sur DZONE

The rise of data in motion in the insurance industry is visible across all lines of business, including life, healthcare, travel, vehicle, and others. Apache Kafka changes how enterprises rethink data. This blog post explores use cases and architectures for event streaming. Real-world examples from Generali, Centene, Humana, and Tesla show innovative insurance-related data integration and stream processing in real-time.

Digital Transformation in the Insurance Industry

Most insurance companies have similar challenges:

Source de l’article sur DZONE