Articles

Enterprises of all sizes are embracing the rapid modernization of user-facing applications as part of their broader digital transformation strategy. The relational database (RDBMS) infrastructure that such applications rely on suddenly needs to support much larger data sizes and transaction volumes. However, a monolithic RDBMS tends to quickly get overloaded in such scenarios. One of the most common architectures to get more performance and scalability in an RDBMS is to “shard” the data. In this blog, we will learn what sharding is and how it can be used to scale a database. We will also review the pros and cons of common sharding architectures, plus explore how sharding is implemented in distributed SQL-based RDBMS like YugaByte DB.

What Is Data Sharding?

Sharding is the process of breaking up large tables into smaller chunks called shards that are spread across multiple servers. A shard is essentially a horizontal data partition that contains a subset of the total data set, and hence is responsible for serving a portion of the overall workload. The idea is to distribute data that can’t fit on a single node onto a cluster of database nodes. Sharding is also referred to as horizontal partitioning. The distinction between horizontal and vertical comes from the traditional tabular view of a database. A database can be split vertically — storing different table columns in a separate database, or horizontally — storing rows of the same table in multiple database nodes.
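 
To make the idea concrete, here is a minimal sketch of hash-based shard routing in Java. The shard count, the CRC32 hash, and the key format are illustrative assumptions for this example, not how any particular database (YugaByte DB included) implements its sharding.

    import java.nio.charset.StandardCharsets;
    import java.util.zip.CRC32;

    // Minimal sketch: route a row to one of N shards by hashing its key.
    // The shard count and the CRC32 hash are illustrative choices only.
    public class ShardRouter {
        private final int shardCount;

        public ShardRouter(int shardCount) {
            this.shardCount = shardCount;
        }

        // Returns the index of the shard (i.e. the database node) that owns this key.
        public int shardFor(String rowKey) {
            CRC32 crc = new CRC32();
            crc.update(rowKey.getBytes(StandardCharsets.UTF_8));
            return (int) (crc.getValue() % shardCount);
        }

        public static void main(String[] args) {
            ShardRouter router = new ShardRouter(4);
            // Rows of the same table end up on different nodes (horizontal partitioning).
            System.out.println("user:1001 -> shard " + router.shardFor("user:1001"));
            System.out.println("user:1002 -> shard " + router.shardFor("user:1002"));
        }
    }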

Source of the article on DZONE

See how a Spring Boot Batch application saves an XML file to the database, moves error and success files to their respective error/success folders, and archives them.

This example covers multiple Spring Batch concepts that come up in most day-to-day batch job implementations.
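 
As a rough illustration of the file-moving part, here is a hedged sketch of a Spring Batch JobExecutionListener that relocates the input file to a success or error folder once the job finishes. The "input.file" job parameter name and the target folder paths are assumptions made up for this sketch, not taken from the article.

    import java.io.IOException;
    import java.nio.file.*;

    import org.springframework.batch.core.BatchStatus;
    import org.springframework.batch.core.JobExecution;
    import org.springframework.batch.core.JobExecutionListener;

    // Sketch: after the batch job ends, move the processed XML file to a
    // success/ or error/ folder depending on the job outcome.
    // The "input.file" parameter name and target folders are illustrative.
    public class FileRelocationListener implements JobExecutionListener {

        @Override
        public void beforeJob(JobExecution jobExecution) {
            // nothing to do before the job starts
        }

        @Override
        public void afterJob(JobExecution jobExecution) {
            String inputFile = jobExecution.getJobParameters().getString("input.file");
            Path source = Paths.get(inputFile);
            Path targetDir = jobExecution.getStatus() == BatchStatus.COMPLETED
                    ? Paths.get("archive/success")
                    : Paths.get("archive/error");
            try {
                Files.createDirectories(targetDir);
                Files.move(source, targetDir.resolve(source.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                throw new IllegalStateException("Could not archive " + source, e);
            }
        }
    }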

Source of the article on DZONE

Table inheritance is one of the most misunderstood — and powerful — features of PostgreSQL. With it, certain kinds of hard problems become easy. While many folks who have been bitten by table inheritance tend to avoid the feature, this blog post is intended to provide a framework for reasoning about when table inheritance is actually the right tool for the job.

Table inheritance is, to be sure, a power tool and thus something to use only when it brings an overall reduction in complexity to the design. Moreover, the current documentation doesn’t provide much guidance on what the tool actually helps with or where the performance costs lie, and because inheritance sits orthogonal to relational design, working this out on your own is very difficult.
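 
To show the basic mechanics under discussion, here is a minimal sketch that creates an inherited table over JDBC. The connection URL, credentials, and table names are made-up assumptions; the INHERITS clause itself is standard PostgreSQL DDL.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    // Minimal sketch of PostgreSQL table inheritance driven from Java over JDBC.
    // The connection URL, credentials, and table names are illustrative assumptions.
    public class TableInheritanceDemo {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/demo", "demo", "demo");
                 Statement st = conn.createStatement()) {

                // Parent table: the columns every kind of event shares.
                st.execute("CREATE TABLE IF NOT EXISTS events (" +
                           " id bigserial, occurred_at timestamptz NOT NULL)");

                // Child table: adds its own columns and INHERITS the parent's.
                st.execute("CREATE TABLE IF NOT EXISTS login_events (" +
                           " user_name text NOT NULL) INHERITS (events)");

                // Rows inserted into the child are also visible when the parent
                // is queried (SELECT * FROM events will see this row).
                st.execute("INSERT INTO login_events (occurred_at, user_name) " +
                           "VALUES (now(), 'alice')");
            }
        }
    }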

Source of the article on DZONE

We got a few requests for some guidance on how to optimize the RavenDB insert rate. Our current benchmark is standing at 135,000 inserts/sec on a sustained basis on a machine that costs less than $1,000. However, some users tried to write their own benchmarks and got far less (about 50,000 writes/sec). Therefore, in this post, I’m going to do a bunch of things and see if I can make RavenDB write really fast.

I’m going to be writing this post as I’m building the benchmark and testing things out. So, you’ll get a stream of consciousness. Hopefully it will make sense.
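 
For readers writing their own benchmarks, here is a minimal sketch of a write-throughput harness in Java. The insertBatch callback is a hypothetical stand-in for whatever client call you are measuring (a bulk-insert session, for example), and the document counts and batch size are arbitrary.

    import java.util.function.IntConsumer;

    // Generic write-throughput harness. The actual insert call is passed in as
    // a callback, so the harness says nothing about any specific client API.
    public class InsertBenchmark {

        // Runs the callback over `total` documents in batches and reports docs/sec.
        public static double measure(int total, int batchSize, IntConsumer insertBatch) {
            long start = System.nanoTime();
            for (int written = 0; written < total; written += batchSize) {
                insertBatch.accept(Math.min(batchSize, total - written));
            }
            double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
            return total / seconds;
        }

        public static void main(String[] args) {
            // Hypothetical no-op insert to show the shape of the harness;
            // replace the lambda body with real client writes.
            double rate = measure(1_000_000, 1_000, batch -> {
                // e.g. open a bulk-insert session and store `batch` documents here
            });
            System.out.printf("Sustained rate: %.0f docs/sec%n", rate);
        }
    }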

Source of the article on DZONE

It’s easy for modern, distributed, high-scale applications to hide database performance and efficiency problems. Optimizing performance of such complex systems at scale requires some skill, but more importantly it requires a sound strategy and good observability, because you can’t optimize what you can’t measure. This session explains a performance measurement and optimization process anyone can use to deliver results predictably, optimizing customer experience while freeing up compute resources and saving money.

The session begins with what to measure and how; how to analyze it; how to categorize problems into one of three types; and three matching strategies to use in optimization as a result. It is a recursive method that can be used at any scale, from a data center with many types of databases cooperating as one, to a single server and drilling down to a single query. Along the way, we’ll discuss related concepts such as internally- and externally-focused golden signals of performance and resource sufficiency, workload quality of service, and more.
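 
As a small illustration of the "what to measure" part, the sketch below derives a few externally-focused signals (request count, error rate, p99 latency) from a sample of request timings. The Request record, the sample values, and the percentile choice are assumptions made up for the example.

    import java.util.List;

    // Sketch: derive a few externally-focused signals (throughput, error rate,
    // p99 latency) from sampled requests. The Request record and sample values
    // are illustrative only.
    public class GoldenSignals {

        record Request(double latencyMs, boolean error) {}

        public static void main(String[] args) {
            List<Request> window = List.of(
                    new Request(12.0, false), new Request(7.5, false),
                    new Request(310.0, true), new Request(9.1, false));

            double[] latencies = window.stream()
                    .mapToDouble(Request::latencyMs).sorted().toArray();
            long errors = window.stream().filter(Request::error).count();

            // p99 latency: value at the 99th-percentile position of the sorted sample.
            double p99 = latencies[(int) Math.ceil(0.99 * latencies.length) - 1];

            System.out.printf("requests=%d, errorRate=%.1f%%, p99=%.1f ms%n",
                    window.size(), 100.0 * errors / window.size(), p99);
        }
    }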

Source of the article on DZONE

If you are developing an event-based application that handles many requests from different users, you most likely want to count distinct user actions within a sliding window or a specified time range.

One of the quickest ways to count distinct users is to run a SQL query like SELECT count(distinct user) from ACTION_TABLE. But this can become expensive when millions of records are produced in real time.
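 
One common alternative to re-running that query is to keep per-interval buckets of user IDs in the application and merge only the buckets inside the current window, as in the sketch below. The one-minute bucket granularity and the plain HashSet are assumptions for illustration; a probabilistic structure such as HyperLogLog can replace the sets if memory becomes the constraint.

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import java.util.TreeMap;

    // Sketch: count distinct users over a sliding window by keeping one set of
    // user IDs per minute bucket and merging only the recent buckets.
    // Bucket granularity (1 minute) and window length are illustrative.
    public class SlidingDistinctUsers {

        private final TreeMap<Long, Set<String>> buckets = new TreeMap<>();

        public void record(String userId, long epochMillis) {
            long minute = epochMillis / 60_000;
            buckets.computeIfAbsent(minute, m -> new HashSet<>()).add(userId);
        }

        // Distinct users seen in the last `windowMinutes`, ending at `nowMillis`.
        public int distinctInWindow(long nowMillis, int windowMinutes) {
            long from = nowMillis / 60_000 - windowMinutes + 1;
            Set<String> merged = new HashSet<>();
            for (Map.Entry<Long, Set<String>> e : buckets.tailMap(from).entrySet()) {
                merged.addAll(e.getValue());
            }
            return merged.size();
        }

        public static void main(String[] args) {
            SlidingDistinctUsers counter = new SlidingDistinctUsers();
            long now = System.currentTimeMillis();
            counter.record("alice", now - 120_000);
            counter.record("bob", now);
            counter.record("alice", now);
            System.out.println(counter.distinctInWindow(now, 5)); // prints 2
        }
    }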

Source of the article on DZONE

See the basics of how to automate database builds into a Linux SQL Server container running on Windows and then back up the containerized database and restore it into dedicated containerized development copies for each developer and tester.

An obvious use for Docker images of SQL Server is to run up a working database from a backup quickly, maybe to test it or possibly to mask the data. We’ll start by doing that in this article. We’ll then use SQL Change Automation (SCA) to synchronize an empty copy of a development database in a Docker container with the latest build in source control and fill it with data ready for testing. Finally, we’ll do a backup of the containerized database so we can restore it into each developer’s local container. These techniques, combined with ‘glue scripts,’ can be used for supporting continuous delivery of databases.
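 
The article itself drives these steps with SQL Change Automation and PowerShell. Purely as an illustration of the backup step, here is a hedged Java/JDBC sketch that issues a BACKUP DATABASE command against a SQL Server container; the connection string, credentials, database name, and in-container backup path are all assumptions.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    // Sketch: back up a database running in a SQL Server container over JDBC.
    // Host/port, credentials, database name, and the in-container backup path
    // are illustrative assumptions; the article uses SCA and PowerShell instead.
    public class ContainerBackup {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:sqlserver://localhost:1433;databaseName=master;encrypt=false";
            try (Connection conn = DriverManager.getConnection(url, "sa", "Secr3tPassw0rd");
                 Statement st = conn.createStatement()) {
                // T-SQL backup; the file lands inside the container's filesystem
                // and can then be copied out (e.g. with docker cp) for reuse.
                st.execute("BACKUP DATABASE [DevDB] " +
                           "TO DISK = N'/var/opt/mssql/backup/DevDB.bak' " +
                           "WITH INIT, COMPRESSION");
            }
        }
    }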

Source of the article on DZONE

While there has been a sharp focus and tremendous acceleration in the velocity of application software releases, updates to the underlying database have remained manual and are increasingly a bottleneck to the overall software delivery pipeline. Download this new Refcard to get started with Database Release Automation and eliminate bottlenecks. Learn the key best practices that your DevOps database solution should meet in order for you to get the most out of your investment.
Source of the article on DZONE

Prometheus is an open-source infrastructure and services monitoring system popular for Kubernetes and cloud-native services and apps. It can help make metric collection easier, correlate events and alerts, provide security, and do troubleshooting and tracing at scale. This Refcard will teach you how to pave the path for Prometheus adoption, what observability looks like beyond Prometheus, and how Prometheus helps provide scalability, high availability, and long-term storage.
Source of the article on DZONE

ArangoDB is a multi-model NoSQL database. NoSQL databases come in four types: key-value, column, document, and graph, each with specific persistence structures to solve particular problems. ArangoDB covers three of these types: key-value, document, and graph. An earlier post covers the key-value and document models; this post explains how to connect with Java and Jakarta EE technology.

The graph model has a structure that makes deep relationships more natural to express than they are in relational database technology. NoSQL graph databases have success stories in recommendation systems, such as those used by social media platforms and Netflix. This post looks at the graph structure in more depth.
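 
As a rough sketch of the connection side, the snippet below uses the ArangoDB Java driver (com.arangodb:arangodb-java-driver). The host, credentials, database, and collection names are assumptions, the collections are assumed to already exist, and the exact builder methods can differ between driver versions.

    import com.arangodb.ArangoDB;
    import com.arangodb.ArangoDatabase;
    import com.arangodb.entity.BaseDocument;
    import com.arangodb.entity.BaseEdgeDocument;

    // Rough sketch using the ArangoDB Java driver. Host, credentials, and the
    // database/collection names are assumptions; the "users" document collection
    // and "follows" edge collection are assumed to exist already.
    public class ArangoGraphDemo {
        public static void main(String[] args) {
            ArangoDB arango = new ArangoDB.Builder()
                    .host("localhost", 8529)
                    .user("root")
                    .password("changeMe")
                    .build();
            ArangoDatabase db = arango.db("social");

            // A vertex: stored like any document in a document collection.
            BaseDocument alice = new BaseDocument("alice");
            alice.addAttribute("name", "Alice");
            db.collection("users").insertDocument(alice);

            // An edge: a document whose _from/_to fields link two vertices,
            // which is what makes deep relationship traversals natural.
            BaseEdgeDocument follows = new BaseEdgeDocument("users/alice", "users/bob");
            db.collection("follows").insertDocument(follows);

            arango.shutdown();
        }
    }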

Source of the article on DZONE