Producing and Consuming Avro Messages with Redpanda Schema Registry

News, IT process methods and organization

Producing and consuming Avro messages with Redpanda Schema Registry is an essential task for modern applications. Find out how to do it easily!

If you are familiar with Apache Kafka®, you may have come across a Kafka-compatible schema registry: a separate component that you deploy outside your Kafka cluster, because Kafka does not have one built in.

In essence, a schema is a logical description of how your data is organized, and a schema registry provides a central repository for those schemas, letting producers and consumers exchange data seamlessly. In event-driven architectures, this can become complex and hard to manage as you scale, because data schemas change and evolve over time (potentially breaking things later on).

## Using an Apache Kafka®-compatible schema registry

A schema registry is therefore a practical way to manage this kind of architecture. It gives producers and consumers easy access to the data schemas, so they can make sure that the data they send and receive is consistent and conforms to the expected structure. The registry also keeps the version history of each schema, which is useful for debugging and development.
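To make the idea of a central repository with version history more concrete, here is a minimal in-memory sketch in Python. It is a toy stand-in, not Redpanda's or any real registry's API: the class `InMemorySchemaRegistry`, its method names, and the `clicks-value` subject are all hypothetical and chosen for illustration only.

```python
import json


class InMemorySchemaRegistry:
    """Toy stand-in for a schema registry (hypothetical, not a real API)."""

    def __init__(self):
        self._by_id = {}      # schema id -> serialized schema
        self._ids = {}        # serialized schema -> schema id
        self._subjects = {}   # subject -> list of schema ids, oldest first

    def register(self, subject, schema):
        """Store a schema under a subject and return its id."""
        serialized = json.dumps(schema, sort_keys=True)
        schema_id = self._ids.get(serialized)
        if schema_id is None:
            schema_id = len(self._by_id) + 1
            self._by_id[schema_id] = serialized
            self._ids[serialized] = schema_id
        versions = self._subjects.setdefault(subject, [])
        if schema_id not in versions:
            versions.append(schema_id)
        return schema_id

    def versions(self, subject):
        """Return the 1-based version numbers known for a subject."""
        return list(range(1, len(self._subjects.get(subject, [])) + 1))

    def get_version(self, subject, version):
        """Return the schema stored as the given version of a subject."""
        schema_id = self._subjects[subject][version - 1]
        return json.loads(self._by_id[schema_id])


# Usage: two versions of an Avro-style record schema under one subject.
v1 = {"type": "record", "name": "Click",
      "fields": [{"name": "user_id", "type": "string"}]}
v2 = {"type": "record", "name": "Click",
      "fields": [{"name": "user_id", "type": "string"},
                 {"name": "page", "type": "string", "default": ""}]}

registry = InMemorySchemaRegistry()
id1 = registry.register("clicks-value", v1)
id2 = registry.register("clicks-value", v2)
```

A real registry exposes the same ideas over HTTP and adds compatibility checks between versions, which this sketch leaves out.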

Finally, the schema registry can also help with data validation. Producers can check their data against the schema held in the registry before sending it to Kafka, which ensures that the data conforms to the expected schema. Likewise, consumers can validate the data they receive before processing it, which protects data quality and makes downstream processing more efficient.
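As a rough illustration of the validation step, the sketch below checks a record against an Avro-style record schema in plain Python. It is a deliberate simplification: `validate_record` and `AVRO_PRIMITIVES` are hypothetical names, only a handful of primitive types are handled, and a real application would normally rely on an Avro library for this.

```python
# Python types accepted for a few Avro primitive types (illustrative subset).
AVRO_PRIMITIVES = {
    "string": str,
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "boolean": bool,
}


def validate_record(schema, record):
    """Return a list of problems; an empty list means the record conforms.

    Complex types (unions, arrays, nested records) are not handled here.
    """
    problems = []
    for field in schema["fields"]:
        name, avro_type = field["name"], field["type"]
        if name not in record:
            # A field with a default may be omitted by the producer.
            if "default" not in field:
                problems.append(f"missing field: {name}")
            continue
        expected = AVRO_PRIMITIVES[avro_type]
        value = record[name]
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if is_stray_bool or not isinstance(value, expected):
            problems.append(f"field {name}: expected {avro_type}")
    return problems


# Usage with a hypothetical schema.
user_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}
good = validate_record(user_schema, {"user_id": "u1", "age": 3})
bad = validate_record(user_schema, {"user_id": 7})
```

Returning a list of problems rather than raising on the first one lets a producer log everything that is wrong with a record in one pass.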

In short, the schema registry is a practical tool for managing event-driven architectures: it gives producers and consumers shared access to data schemas and supports data validation, which improves both data quality and process efficiency.

Source: DZone

21 April 2023 / by Service comm.

Next-Gen Data Pipes With Spark, Kafka, and K8s: Part 2

Introduction

In our previous article, we discussed two emerging options for building new-age data pipes using stream processing. One option uses Apache Spark for stream processing; the other uses a combination of Kafka and Kubernetes on any cloud platform for distributed computing. The first approach is reasonably popular, and a lot has already been written about it. The second option, however, is catching up in the market because it is far less complex to set up and easier to maintain. Data on the cloud is also a natural outcome of the technological drivers prevailing in the market. So this article focuses on the second approach and how it can be implemented in different cloud environments.

Kafka-K8s Streaming Approach in Cloud

In this approach, if the number of partitions in the Kafka topic matches the number of pod replicas in the Kubernetes cluster, the pods together form a consumer group and deliver all the advantages of distributed computing. This relationship can be depicted by the following equation:
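The equation itself is not reproduced in this excerpt. As a rough illustration of the condition described here, the Python sketch below spreads topic partitions over the pods of one consumer group in a simplified round-robin fashion. The function `assign_partitions` and the pod names are hypothetical, and Kafka's real partition assignors differ in detail.

```python
def assign_partitions(num_partitions, pod_names):
    """Spread partitions over the pods of one consumer group, round-robin.

    Simplified illustration only; not Kafka's actual assignment logic.
    """
    if not pod_names:
        raise ValueError("at least one pod is required")
    assignment = {pod: [] for pod in pod_names}
    for partition in range(num_partitions):
        pod = pod_names[partition % len(pod_names)]
        assignment[pod].append(partition)
    return assignment


# Partitions == replicas: every pod owns exactly one partition.
matched = assign_partitions(3, ["pod-0", "pod-1", "pod-2"])

# More replicas than partitions: the extra pod has nothing to consume.
over_scaled = assign_partitions(3, ["pod-0", "pod-1", "pod-2", "pod-3"])

# Fewer replicas than partitions: each pod handles several partitions.
under_scaled = assign_partitions(6, ["pod-0", "pod-1", "pod-2"])
```

When the two numbers match, each pod reads from exactly one partition, so no pod sits idle and none carries a double load.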

Source: DZone

29 March 2022 / by Service comm.

Three easy ways to run Kafka without Zookeeper

It has been a couple of years since the announcement that Apache Zookeeper would be removed as a dependency for managing Apache Kafka metadata. Since version 2.8, we can run a Kafka cluster without Zookeeper. This article goes over three easy ways to get started with a single-node cluster using containers.

Control and data planes

Apache Kafka implements independent control and data planes for its clusters. The control plane manages the cluster, keeps track of which brokers are alive, and takes action when that set changes. The data plane, meanwhile, consists of the features required to handle producers, consumers, and their records. In previous versions, Zookeeper was the cluster component that held most of the control plane implementation.
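To illustrate what "keeping track of which brokers are alive" can mean in practice, here is a small Python sketch of a heartbeat-based liveness tracker. It is purely conceptual: `ToyController`, `session_timeout`, and the integer timestamps are hypothetical, and this is not how Kafka or Zookeeper implement the control plane.

```python
class ToyController:
    """Hypothetical control-plane sketch: tracks broker liveness by heartbeat."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self._last_seen = {}   # broker id -> time of the last heartbeat
        self.alive = set()     # brokers considered alive at the last tick

    def heartbeat(self, broker_id, now):
        """Record that a broker reported in at time `now`."""
        self._last_seen[broker_id] = now

    def tick(self, now):
        """Recompute the live set; return (joined, left) since the last tick."""
        current = {
            broker
            for broker, seen in self._last_seen.items()
            if now - seen <= self.session_timeout
        }
        joined = current - self.alive
        left = self.alive - current
        self.alive = current
        return joined, left


# Usage: broker 2 stops sending heartbeats and is eventually dropped.
controller = ToyController(session_timeout=10)
controller.heartbeat(1, now=0)
controller.heartbeat(2, now=0)
first_change = controller.tick(now=1)
controller.heartbeat(1, now=8)
second_change = controller.tick(now=15)
```

The `(joined, left)` pair is the "set changed" signal; a real controller would react to it, for example by moving partition leadership away from a broker that left.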

Source: DZone

4 January 2022 / by Service comm.

Migrate Data Across Kafka Cluster Using mirrormaker2 in Strimzi

In this article, we will discuss a use case where data from one Kafka cluster has to be migrated to another Kafka cluster. Here the target is Strimzi and the source is a standalone Kafka cluster. The target is where the data is copied to, and the source is where the data is copied or migrated from. I have a separate article on using MirrorMaker version 1 with Apache Kafka clusters. This article is about MirrorMaker 2, which has more features than MirrorMaker 1.

At the time of writing this article, the latest version of Strimzi is 0.22.1 and can be downloaded from here.

Source: DZone

18 May 2021 / by Service comm.

Streaming Data From Files Into Multi-Broker Kafka Clusters

There are multiple ways to ingest data streams into an Apache Kafka topic and subsequently deliver them to the various types of consumers hooked to that topic. The stream of data that consumers continuously collect from the topic passes through multiple data pipelines and then through stream processing engines such as Apache Spark, Apache Flink, and Amazon Kinesis, and eventually lands in real-time applications that deliver a final data-driven decision. In finance, manufacturing, insurance, telecom, healthcare, commerce, and more, real-time applications are becoming the best way for organizations to take immediate action and gain insights from up-to-date data. Today, Apache Kafka acts as the central nervous system that brings data from all parts of the business to the large operational information hubs where decisions are made.

Text files contain unformatted ASCII text and are commonly used to store information. Each line of the file represents a data record, and the file can be updated continuously as new records arrive. Every insertion of one or more new lines can be considered a new data insertion into the file. Hence, when lines are continuously appended to the text file, by humans or by applications, without modifying the lines already inserted, and those new lines are then moved or sent to a different location, this can be considered data streaming from the file. Each newly added line can be analyzed continuously by exporting it to a Kafka topic, from which the consumers hooked to the topic import it.
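The append-only reading described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the article's implementation: `read_new_lines` is a hypothetical helper that remembers a byte offset between polls, and actually publishing each returned line to a Kafka topic is left out.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (lines, new_offset) for complete lines appended after `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Stop at the last newline so a half-written line waits for the next poll.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end


# Usage: simulate an application appending to a file between two polls.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "events.txt")
    with open(path, "wb") as f:
        f.write(b"a\nb\npart")      # the last line is not finished yet
    first_batch, offset = read_new_lines(path, 0)
    with open(path, "ab") as f:
        f.write(b"ial\n")           # the writer completes the line
    second_batch, offset = read_new_lines(path, offset)
```

Each line in a returned batch would become one record sent to the topic. Tracking the offset relies on the assumption stated above that lines already written are never modified.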

Source: DZone

16 January 2021 / by Service comm.
