Articles


Previous Articles

I’ve written about Presto and CockroachDB in the past; you can find that article below:

Data federation with CockroachDB and Presto

Article source: DZONE

With the amount of data produced every day continuing to rise, so too does the number of data points that companies collect. Apache Iceberg was developed as an open table format to help sift through large analytical datasets.

This Refcard introduces Apache Iceberg by walking you through the history of its inception, diving into key methods and techniques, and providing hands-on examples to help you get acquainted with the Iceberg community.
Article source: DZONE

Presto is a distributed query engine that lets you use SQL to query a variety of data sources, such as Kafka, MySQL, MongoDB, Oracle, Cassandra, and Hive. It can analyze big data and query multiple data sources together.

In this article, we will discuss how Presto can be used to query Kafka topics. Below is the step-by-step process to set up Presto and Kafka and connect them. I have used macOS here, but a similar setup can be done on any other system.
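Once Presto and Kafka are connected, clients typically submit SQL to the Presto coordinator over its HTTP protocol: a `POST` to `/v1/statement`, then repeated `GET`s on the returned `nextUri` until it disappears, collecting `columns` and `data` along the way. The sketch below shows that loop using only the Python standard library. It assumes a coordinator at `http://localhost:8080`, a Kafka catalog named `kafka` (configured with `connector.name=kafka`, `kafka.nodes` and `kafka.table-names`) and the `default` schema; the function names `run_query` and `http_transport` are hypothetical, and the transport is injectable so the paging logic can be exercised without a live server. It is a minimal illustration, not a replacement for an official Presto client.

```python
import json
import urllib.request
from typing import Callable, List, Optional, Tuple

# A transport takes (method, url, body, headers) and returns the decoded JSON response.
Transport = Callable[[str, str, Optional[bytes], dict], dict]


def http_transport(method: str, url: str, body: Optional[bytes], headers: dict) -> dict:
    """Send one HTTP request to the coordinator and decode the JSON reply."""
    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_query(
    sql: str,
    server: str = "http://localhost:8080",  # assumed coordinator address
    user: str = "demo",                     # hypothetical user name
    catalog: str = "kafka",                 # assumed Kafka catalog name
    schema: str = "default",
    transport: Transport = http_transport,
) -> Tuple[List[str], List[list]]:
    """Submit SQL to Presto and follow nextUri until all result pages are read."""
    headers = {
        "X-Presto-User": user,
        "X-Presto-Catalog": catalog,
        "X-Presto-Schema": schema,
    }
    page = transport("POST", f"{server}/v1/statement", sql.encode("utf-8"), headers)
    columns: List[str] = []
    rows: List[list] = []
    while True:
        error = page.get("error")
        if error:
            raise RuntimeError(error.get("message", "query failed"))
        # Column metadata may arrive on a later page, so record it once it appears.
        if not columns and page.get("columns"):
            columns = [col["name"] for col in page["columns"]]
        rows.extend(page.get("data", []))
        next_uri = page.get("nextUri")
        if next_uri is None:  # no nextUri means the query has finished
            return columns, rows
        page = transport("GET", next_uri, None, {"X-Presto-User": user})
```

For example, `run_query('SELECT _partition_id, _message FROM "my-topic" LIMIT 10')` would read raw messages from a hypothetical topic `my-topic` through the Kafka connector's internal columns, which can be selected explicitly even when hidden from `SELECT *`. Because catalogs are addressed by name, the same call can run a federated join using fully qualified names such as `kafka.default.orders` and `mysql.shop.customers` (both hypothetical).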

Article source: DZONE

This guide was developed on a laptop running Windows with Docker installed.

Implementation Steps

Article source: DZONE

The need for data engineers and analysts to run interactive, ad hoc analytics on large amounts of data continues to grow explosively. Data platform teams are increasingly using the federated SQL query engine PrestoDB to run such analytics for a variety of use cases across a wide range of data lakes and databases in-place, without the need to move data. PrestoDB is hosted by the Linux Foundation’s Presto Foundation and is the same project running at massive scale at Facebook, Uber and Twitter.

Let’s look at some important characteristics of Presto that account for its growing adoption.  

Article source: DZONE

Just like a real wormhole, this tool is all about speed.

This blog introduces Wormhole, an open-source, Dockerized solution for deploying Presto and Alluxio clusters for blazing-fast analytics on file systems (we use S3, GCS, and OSS). When it comes to analytics, people are generally comfortable writing SQL queries and like to analyze data that resides in a warehouse (e.g., a MySQL database). But as data grows, these stores start to fail, and there is a need to get results faster, in the same or a shorter time frame. Distributed computing solves this, and Presto is designed for it. Paired with Alluxio, it runs even faster. That’s what Wormhole is all about.

Here is the high-level architecture diagram of the solution:

Article source: DZONE