Articles

PostgreSQL is a versatile, open-source database system and one of the most popular in the world. However, it does not ship with a built-in solution for automatic failover and high availability.

Enter Patroni. Patroni is a cluster manager used to customize and automate the deployment and maintenance of high-availability PostgreSQL clusters. It is written in Python and uses a distributed configuration store such as etcd, Consul, or ZooKeeper. In addition, Patroni can handle database replication, backup, and restoration configurations.

Source of the article: DZone

In this article, we will see how to implement a data pipeline from an application to a MongoDB database, and from there into Elasticsearch, keeping the same document ID, using Kafka Connect in a microservice architecture. Most modern microservice architectures are asynchronous and loosely coupled. At the same time, the preferred approach is to keep code to a minimum (lowering maintenance and cost), avoid batch systems (so that data flows in real time), and achieve good performance without fear of data loss. With these requirements in mind, Kafka and Kafka Connect are the best solution so far for integrating different sources and sinks in one architecture with robust and reliable results.

We will take a deep dive and implement such a solution using Debezium and Kafka Connect to build a robust data pipeline from one application into MongoDB and then into an Elasticsearch cluster.

Source of the article: DZone

"Change Data Capture Architecture Using Debezium, Postgres, and Kafka" was a tutorial on how to use Debezium for change data capture from Azure PostgreSQL and send the change events to Azure Event Hubs for Kafka. It used the wal2json output plugin.

What About the pgoutput Plugin?

This blog provides a quick walkthrough of how to use the pgoutput plugin. I will not repeat a lot of the details, and I will use containerized versions (via Docker Compose) of Kafka Connect, Kafka, and ZooKeeper to keep things simple. So the only thing you need is Azure PostgreSQL, which you can set up using a variety of options, including the Azure Portal, Azure CLI, Azure PowerShell, or an ARM template.
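To make the plugin switch concrete, here is a minimal sketch of the connector definition that would be registered with Kafka Connect. It is an illustration and not the article's own configuration: the connector name, the server name, and the helper function `build_connector_config` are hypothetical, and the property keys follow the Debezium PostgreSQL connector as commonly documented (newer Debezium releases use `topic.prefix` in place of `database.server.name`).

```python
import json
from typing import Dict


def build_connector_config(hostname: str, user: str, password: str, dbname: str) -> Dict[str, object]:
    """Build a Debezium PostgreSQL connector definition that uses pgoutput.

    All values passed in are placeholders supplied by the caller.
    """
    return {
        "name": "pg-pgoutput-connector",  # hypothetical connector name
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            # The key line: select the built-in logical decoding plugin
            # instead of wal2json.
            "plugin.name": "pgoutput",
            "database.hostname": hostname,
            "database.port": "5432",
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            # Hypothetical logical server name; newer Debezium versions
            # expect "topic.prefix" instead of this property.
            "database.server.name": "myserver",
        },
    }


if __name__ == "__main__":
    config = build_connector_config(
        hostname="<your-server>.postgres.database.azure.com",  # placeholder
        user="<user>",
        password="<password>",
        dbname="postgres",
    )
    # The JSON printed here is what would be sent to the Kafka Connect REST API.
    print(json.dumps(config, indent=2))
```

The resulting JSON would typically be posted to the Kafka Connect REST endpoint (for a local Docker Compose setup, usually port 8083 under `/connectors`); the exact host and port depend on your environment.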

Source of the article: DZone

Google autocomplete functionality

Whenever you start typing your search on Google, you get a list of recommendations, and the more letters you type, the more accurate the recommendations get. If you’re like me, you’ve always wondered how this works — is the inverted index being stored, or is it something else?

The data structure that would be apt here is a Trie.
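As a rough illustration of why a trie fits prefix lookups, here is a minimal sketch. It is not how Google implements autocomplete: the class names, the `suggest` method, and the `limit` parameter are assumptions made for this example, and suggestions are returned in alphabetical order with no popularity ranking.

```python
from typing import Dict, List


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        """Add a word, creating one node per character as needed."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def suggest(self, prefix: str, limit: int = 5) -> List[str]:
        """Return up to `limit` stored words that start with `prefix`."""
        # Walk down to the node that represents the prefix.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no stored word has this prefix

        # Depth-first traversal below the prefix node.
        results: List[str] = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            # Push children in reverse order so the smallest letter is popped first.
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], text + ch))
        return results


if __name__ == "__main__":
    trie = Trie()
    for w in ["car", "card", "care", "cat", "dog"]:
        trie.insert(w)
    print(trie.suggest("car"))
```

Each additional typed letter moves one level deeper in the trie, so the set of candidate completions shrinks with every keystroke, which matches the behavior described above.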

Source of the article: DZone

Elasticsearch (ES) is the most common open-source distributed search engine. It’s based on Lucene, an information-retrieval library, and provides powerful search and query capabilities. To learn its search principles, you must understand Lucene. To learn the ES architecture, you must know how to implement a distributed system. Consistency is at the core of distributed systems.

This article describes the ES cluster composition, node discovery, master election, error detection, and scaling. In terms of node discovery and master election, ES uses its own implementation instead of external components such as ZooKeeper. We will describe how this mechanism works, and the problems with it. This series covers:

Source of the article: DZone