Articles

In this article, we will discuss a use case where data from one Kafka cluster has to be migrated to another Kafka cluster. Here the target is Strimzi and the source is a standalone Kafka cluster. The target is where the data has to be copied, and the source is where we want to copy or migrate the data from. I have an article on how to use MirrorMaker version 1 with Apache Kafka clusters. This article is about MirrorMaker 2, which has more features than MirrorMaker 1.

At the time of writing this article, the latest version of Strimzi is 0.22.1 and can be downloaded from here.
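To give a rough idea of what the Strimzi side of such a migration looks like, here is a minimal KafkaMirrorMaker2 custom resource sketch. The cluster aliases, bootstrap addresses, Kafka version, and topic pattern are placeholders I chose for illustration, not values from the original setup.

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: mm2-migration
spec:
  version: 2.7.0                    # Kafka version assumed for Strimzi 0.22.x
  replicas: 1
  connectCluster: "target"          # MirrorMaker 2 runs its Connect workers against the target cluster
  clusters:
    - alias: "source"               # the standalone Kafka cluster we migrate from
      bootstrapServers: source-kafka.example.com:9092
    - alias: "target"               # the Strimzi-managed cluster we migrate to
      bootstrapServers: my-cluster-kafka-bootstrap:9092
  mirrors:
    - sourceCluster: "source"
      targetCluster: "target"
      sourceConnector:
        config:
          replication.factor: 1
      topicsPattern: ".*"           # mirror every topic; narrow this for a real migration
      groupsPattern: ".*"           # also mirror consumer group offsets
```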

Source of the article on DZONE

You can expose your app to the public by setting up a Kubernetes LoadBalancer service in your IBM Cloud Kubernetes Service cluster. When you expose your app, a Load Balancer for VPC that routes requests to your app is automatically created for you in your VPC outside of your cluster.

In this post, you will provision an IBM Cloud Kubernetes Service cluster spanning two private subnets (each subnet in a different zone), deploy an application using a container image stored in an IBM Cloud Container Registry and expose the app via a VPC load balancer deployed to a public subnet in a different zone. Sound complex? Don’t worry, you will provision and deploy the app using Terraform scripts.
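The Kubernetes side of that exposure is simply a Service of type LoadBalancer. The sketch below is a generic, minimal example; the service name, ports, and selector are illustrative, and the IBM-specific annotations that fine-tune the VPC load balancer are not reproduced here.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-lb              # hypothetical service name
  # IBM Cloud supports additional service annotations (e.g. to influence the
  # VPC load balancer's subnets or zone); see the IBM Cloud Kubernetes Service
  # docs for the exact keys.
spec:
  type: LoadBalancer           # triggers creation of a Load Balancer for VPC
  selector:
    app: my-app                # must match the labels on the deployed pods
  ports:
    - port: 80                 # port exposed by the load balancer
      targetPort: 8080         # port the app container listens on
```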

Source of the article on DZONE


Introduction

While many of us are accustomed to executing Spark applications using the ‘spark-submit’ command, with the popularity of Databricks this seemingly simple activity is being relegated to the background. Databricks has made it very easy to provision Spark-enabled VMs on the two most popular cloud platforms, namely AWS and Azure. A couple of weeks ago, Databricks announced their availability on GCP as well. The beauty of the Databricks platform is that they have made it very easy to become a part of it. While Spark application development will continue to have its challenges, depending on the problem being addressed, the Databricks platform has taken away the pain of having to establish and manage your own Spark cluster.

Using Databricks

Once registered on the platform, Databricks allows us to define a cluster of one or more VMs, with configurable RAM and executor specifications. We can also define a cluster that launches a minimum number of VMs at startup and then scales to a maximum number of VMs as required. After defining the cluster, we have to define jobs and notebooks. Notebooks contain the actual code executed on the cluster. We need to assign notebooks to jobs, as the Databricks cluster executes jobs (and not notebooks). Databricks also allows us to set up the cluster so that it downloads additional JARs and/or Python packages during cluster startup. We can also upload and install our own packages (I used a Python wheel).
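To make the cluster/job/notebook relationship concrete, here is a sketch of a job definition as it could be submitted through the Databricks Jobs API. The notebook path, runtime version, node type, autoscaling bounds, and wheel location are all hypothetical, and the exact fields depend on the API version and cloud provider.

```json
{
  "name": "nightly-etl",
  "tasks": [
    {
      "task_key": "run-notebook",
      "notebook_task": { "notebook_path": "/Users/me/etl_notebook" },
      "new_cluster": {
        "spark_version": "8.2.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "autoscale": { "min_workers": 1, "max_workers": 4 }
      },
      "libraries": [
        { "whl": "dbfs:/FileStore/wheels/my_package-0.1.0-py3-none-any.whl" }
      ]
    }
  ]
}
```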

Source of the article on DZONE

In the previous blog post about Kubernetes autoscaling, we looked at different concepts and terminology related to autoscaling, such as the HPA, the Cluster Autoscaler, etc. In this post, we’ll do a walkthrough of how Kubernetes autoscaling can be implemented for custom metrics generated by the application.

Why Custom Metrics?

The CPU or RAM consumption of an application may not always be the right metric for scaling. For example, suppose you have a message queue consumer that can handle 500 messages per second without crashing. Once a single instance of this consumer is handling close to 500 messages per second, you may want to scale the application to two instances so that the load is distributed across them. Measuring CPU or RAM is a fundamentally flawed approach for scaling such an application; you would have to look at a metric that relates more closely to the application’s nature. The number of messages that an instance is processing at a given point in time is a better indicator of the actual load on that application. Similarly, there might be applications where other metrics make more sense, and these can be defined using custom metrics in Kubernetes.
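As a sketch of what this looks like in practice, the HorizontalPodAutoscaler below scales on a hypothetical per-pod custom metric (messages_processed_per_second), assuming that metric is exposed through a custom metrics adapter such as the Prometheus adapter. Names and thresholds are illustrative only.

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-consumer-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-consumer                      # hypothetical consumer deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: messages_processed_per_second # hypothetical custom metric
        target:
          type: AverageValue
          averageValue: "450"                 # scale out before hitting the ~500 msg/s limit
```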

Source of the article on DZONE

Now we have everything prepared and ready to go to a Kubernetes cluster in a cloud provider. Creating a cluster in any cloud provider manually is a difficult task, so if we want to automate this deployment, we need something that helps us with this tedious work. In this article, we will see how to create a Kubernetes cluster and all of its required objects, deploying our Alexa Skill with Terraform using Google Kubernetes Engine.
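For orientation, a minimal Terraform sketch of the GKE cluster itself could look like the following. The project ID, region, names, and machine type are placeholders, and the real scripts in the article also create the surrounding objects.

```hcl
provider "google" {
  project = "my-gcp-project"        # hypothetical GCP project ID
  region  = "europe-west1"
}

# A small GKE cluster with a single-node default pool
resource "google_container_cluster" "alexa_skill" {
  name               = "alexa-skill-cluster"
  location           = "europe-west1"
  initial_node_count = 1

  node_config {
    machine_type = "e2-standard-2"
  }
}
```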

Pre-Requisites

Here are the technologies used in this project:

Source of the article on DZONE


Set Up Kubernetes Cluster

First, we need a Kubernetes cluster. You may use an existing one or set up a new one. For this tutorial, we chose to use GKE (Google Kubernetes Engine).

Just follow the quick start to create a cluster. To save money, the default pool with only one node is sufficient for our testing. For the node image type, use the default Container-Optimized OS; for the machine type, please select one with at least 8 GB of memory. After creating the cluster, configure kubectl to connect to it by following this guide.
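If you prefer the command line over the console quick start, the equivalent gcloud steps look roughly like this. The cluster name and zone are examples, and e2-standard-2 is just one machine type that meets the 8 GB memory requirement.

```shell
# Create a single-node GKE cluster using the default Container-Optimized OS image
gcloud container clusters create demo-cluster \
  --zone us-central1-a \
  --num-nodes 1 \
  --machine-type e2-standard-2

# Configure kubectl to talk to the new cluster
gcloud container clusters get-credentials demo-cluster --zone us-central1-a
```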

Source of the article on DZONE

Using the Prometheus Operator has become a common choice when it comes to running Prometheus in a Kubernetes cluster. It can manage Prometheus and Alertmanager for us with the help of CRDs in Kubernetes. The kube-prometheus-stack Helm chart (formerly known as prometheus-operator) comes with Grafana, node_exporter, and more out of the box.

In a previous blog post about Prometheus, we took a look at setting up Prometheus and Grafana using manifest files. We also explored a few of the metrics exposed by YugabyteDB. In this post, we will be setting up Prometheus and Grafana using the kube-prometheus-stack chart. And we will configure Prometheus to scrape YugabyteDB pods. At the end, we will take a look at the YugabyteDB Grafana dashboard that can be used to visualize all the collected metrics.
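The installation itself comes down to a couple of Helm commands. The release name and namespace below are arbitrary, and the scrape configuration for the YugabyteDB pods would be supplied through the chart's values, shown here only as a placeholder file.

```shell
# Add the chart repository and install the kube-prometheus-stack
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --values yugabytedb-scrape-values.yaml   # hypothetical values file holding the scrape config
```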

Source of the article on DZONE

In this article, we will see how to implement a data pipeline from an application to a MongoDB database and from there into Elasticsearch, keeping the same document ID, using Kafka Connect in a microservice architecture. In recent years, most microservice architectures have become asynchronous in nature and very loosely coupled. At the same time, the prime goals are minimal code (minimum maintenance and cost), no batch systems (real-time data), and solid performance without fear of data loss. Keeping all these requirements in mind, Kafka and Kafka Connect are the best solution so far for integrating different sources and sinks in one architecture with very robust and reliable results.

We will deep dive and implement such a solution using the Debezium Kafka Connect connector to achieve a very robust pipeline of data from one application into MongoDB and then into an Elasticsearch cluster.
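At a high level, the pipeline comes down to two connector configurations: a Debezium MongoDB source that streams change events into Kafka, and an Elasticsearch sink that writes them out while reusing the Kafka record key as the Elasticsearch document ID. The sketch below is indicative only; the connector classes are the standard Debezium and Confluent ones, but hosts, names, topics, and several property names vary between connector versions.

```properties
# --- Debezium MongoDB source (mongo-source.properties) ---
name=mongo-source
connector.class=io.debezium.connector.mongodb.MongoDbConnector
mongodb.hosts=rs0/mongodb:27017          # hypothetical replica set address
mongodb.name=app                         # logical name, becomes the topic prefix
collection.include.list=appdb.orders     # hypothetical database.collection
# In practice an unwrap SMT (e.g. Debezium's ExtractNewDocumentState) is usually
# added so the sink receives plain documents instead of the change-event envelope.

# --- Elasticsearch sink (elastic-sink.properties) ---
name=elastic-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
connection.url=http://elasticsearch:9200
topics=app.appdb.orders                  # topic produced by the source above
key.ignore=false                         # use the Kafka record key as the document ID
```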

Source of the article on DZONE

TiDB, an open-source, distributed SQL database, provides detailed monitoring metrics through Prometheus and Grafana. These metrics are often the key to troubleshooting performance problems in the cluster.

However, for novice TiDB users, understanding hundreds of monitoring metrics can be overwhelming. You may wonder:

Source of the article on DZONE