
As the amount of data produced on a daily basis continues to rise, so too does the number of data points that companies collect. Apache Iceberg was developed as an open table format to help sift through large analytical datasets.

This Refcard introduces you to Apache Iceberg by taking you through the history of its inception, diving into key methods and techniques, and providing hands-on examples to help you get started with the Iceberg community.
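To make the table-format idea concrete, here is a minimal sketch of creating and querying an Iceberg table from PySpark, assuming the matching iceberg-spark-runtime package is available to Spark; the catalog name, warehouse path, and table are illustrative placeholders rather than anything prescribed by the Refcard.

```python
from pyspark.sql import SparkSession

# Assumes the iceberg-spark-runtime JAR matching your Spark version is on the classpath.
spark = (
    SparkSession.builder
    .appName("iceberg-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "/tmp/iceberg-warehouse")  # placeholder path
    .getOrCreate()
)

# Create an Iceberg table and work with it through plain SQL.
spark.sql("""
    CREATE TABLE IF NOT EXISTS local.db.events (
        id      BIGINT,
        ts      TIMESTAMP,
        payload STRING
    ) USING iceberg
""")
spark.sql("INSERT INTO local.db.events VALUES (1, current_timestamp(), 'hello')")
spark.sql("SELECT * FROM local.db.events").show()
```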
Source of the article on DZONE

Gartner predicts that by 2023, over 50% of medium to large enterprises will have adopted a Low-Code/No-Code application as part of their platform development.
The proliferation of Low-Code/No-Code tooling can be partially attributed to the COVID-19 pandemic, which has put pressure on businesses around the world to rapidly implement digital solutions. However, adoption of these tools, while indeed accelerated by the pandemic, would have occurred either way.
Even before the pandemic, the largest, richest companies had already formed an oligopsony around the best tech talent and most advanced development tools. Low-Code/No-Code is therefore an attractive solution for small and mid-sized organizations to level the playing field, and it does so by giving these smaller players the power to do more with their existing resources.
While these benefits are often realized in the short term, the long-term effect of these tools is often shockingly different. The promise of faster and cheaper delivery is the bait inside this organizational mousetrap, while backlogs, vendor contracts, technical debt, and constant updates are the hammer.
So, what exactly is the No-Code trap, and how can we avoid it?

What is a No-Code Tool?

First, let's make sure we clear up any confusion regarding naming. So far I have referred to Low-Code and No-Code as if they were one term. It's certainly easy to confuse them (even large analyst firms seem to have a hard time differentiating between the two), and in the broader context of this article, both can lead to the same set of development pitfalls.
Under the magnifying glass, however, there are lots of small details and capabilities that differentiate Low-Code and No-Code solutions. Most of them aren't apparent at the UI level, which is where much of the confusion between the two comes from.
In this section, I will spend a little time exploring the important differences between the two, but only to show that, when it comes to the central premise of this article, they are virtually equivalent.

Low-Code vs. No-Code Tools

The goal behind Low-Code is to minimize the amount of coding necessary for complex tasks through a visual interface (such as drag-and-drop) that integrates existing blocks of code into a workflow.
Skilled professionals have the potential to work smarter and faster with Low-Code tools because repetitive coding and duplicated work are streamlined. Through this, they can spend less time on the 80% of work that builds the foundation and focus more on optimizing the 20% that makes it different. Low-Code, therefore, takes on the role of an entry-level employee doing the grunt work for more senior developers and engineers.
No-Code has a very similar look and feel to Low-Code but differs in one very important dimension. Where Low-Code is meant to optimize the productivity of developers or engineers who already know how to code (even if just a little), No-Code is built for business and product managers who may not know any actual programming languages. It is meant to equip non-technical workers with the tools they need to create applications without formal development training.
No-Code applications need to be self-contained: everything the No-Code vendor thinks the user may need is already built into the tool.
As a result, No-Code applications create a lot of restrictions in the long term in exchange for quick results in the short term. This is a great example of a 'deliberate-prudent' scenario in the context of the Technical Debt Quadrant, but more on this later.

Advantages of No-Code Solutions

The appeal of both Low-Code and No-Code is pretty obvious. By removing code, organizations can remove the people who write it: developers, who are expensive, in short supply, and fundamentally don't produce things quickly.
At their best, the benefits of these two forms of application can be pretty substantial:
  • Resources: Human capital is becoming increasingly scarce, and therefore expensive. This can stop a lot of ambitious projects dead in their tracks. Low-Code and No-Code tools minimize the amount of specialized technical skill needed to get an application off the ground, which means things can get done more quickly and at a lower cost.
  • Low Risk/High ROI: Security processes, data integrations, and cross-platform support are all built into Low-Code and No-Code tools, meaning less risk and more time to focus on your business goals.
  • Moving to Production: Similarly, for both types of tools, a single click is all it takes to deploy a model or application you built to production.
Looking at these advantages, it is no wonder that both Low-Code and No-Code have been taking industries by storm recently. While distinctly different in terms of users, they serve the same goal: faster, safer, and cheaper deployment. Given these similarities, both terms will be grouped together under 'No-Code' for the rest of this article unless otherwise specified.

List of No-Code Data Tools

So far, we have covered the applications of No-Code in a very general way, but for the rest of this article, I would like to focus on data modeling. No-Code tools are prevalent in software development but have also started to take hold in this space in particular, and some applications even claim to be an alternative to SQL and other querying languages (crazy, right?!). My reasons for focusing on this are twofold:
Firstly, there is a lot of existing analysis around this problem for software development and very little for data modeling. Secondly, this is also the area in which I have the most expertise.
Now let's take a look at some of the vendors that provide No-Code solutions in this space. This is in no way a complete list, and these tools are, for the most part, not exclusively built for data modeling.

1. No-Code Data Modeling in Power BI

Power BI was created by Microsoft and aims to provide interactive visualizations and business intelligence capabilities to all types of business users. Its simple interface is meant to allow end-users to create their own reports and dashboards through a number of features, including data mapping, transformation, and dashboard visualization. Power BI does support some R coding capabilities for visualization, but when it comes to data modeling, it is a true No-Code tool.

2. Alteryx as a Low-Code Alternative

Alteryx is meant to make advanced analytics accessible to any data worker, and to achieve this, it offers several data analytics solutions. Alteryx specializes in self-service analytics with an intuitive UI, and its offerings can be used as Extract, Transform, Load (ETL) tools within its own framework. Alteryx allows data workers to organize their data pipelines through custom features and SQL code blocks. As such, it is easily identified as a Low-Code solution.

3. Is Tableau a No-Code Data Modeling Solution?

Tableau is a visual analytics platform and a direct competitor to Power BI. It was recently acquired by Salesforce, which is now hoping to 'transform the way we use data to solve problems—empowering people and organizations to make the most of their data.' It is also a fairly obvious No-Code platform that is meant to appeal to all types of end-users. As of now, it offers fewer tools for data modeling than Power BI, but that is likely to change in the future.

4. Looker is a No-Code Alternative to SQL

Looker is a business intelligence and big data analytics platform that promises to help you explore, analyze, and share real-time business analytics easily. Very much in line with Tableau and Power BI, it aims to make non-technical end-users proficient in a variety of data tasks such as transformation, modeling, and visualization.

You might be wondering why I am including so many BI/Visualization platforms when talking about potential alternatives to SQL. After all, these tools are only set up to address an organization’s reporting needs, which constitute only one of the use cases for data queries and SQL. This is certainly a valid point, so allow me to clarify my reasoning a bit more.

While it is true that reporting is only one of many potential uses for SQL, it is nevertheless an extremely important one. There is a good reason why there are so many No-Code BI tools on the market: they address growing demand from enterprises around the world. It is therefore worth taking a closer look at their almost inevitable shortcomings.

Source of the article on DZONE

The need for data engineers and analysts to run interactive, ad hoc analytics on large amounts of data continues to grow explosively. Data platform teams are increasingly using the federated SQL query engine PrestoDB to run such analytics for a variety of use cases across a wide range of data lakes and databases in place, without the need to move data. PrestoDB is hosted by the Linux Foundation's Presto Foundation and is the same project running at massive scale at Facebook, Uber, and Twitter.

Let’s look at some important characteristics of Presto that account for its growing adoption.  
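As a rough illustration of running federated SQL in place, here is a minimal sketch using the presto-python-client package; the coordinator host, catalog, schema, and the orders table are hypothetical placeholders for whatever your cluster actually exposes.

```python
import prestodb  # pip install presto-python-client

# Placeholder connection details for a hypothetical Presto coordinator.
conn = prestodb.dbapi.connect(
    host="presto-coordinator.example.com",
    port=8080,
    user="analyst",
    catalog="hive",    # e.g., a data lake catalog
    schema="default",
)

cur = conn.cursor()
# An ad hoc, interactive query run where the data lives -- no data movement required.
cur.execute("SELECT order_status, count(*) AS cnt FROM orders GROUP BY order_status")
for row in cur.fetchall():
    print(row)
```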

Source of the article on DZONE

In the digital era, Big Data has drastically changed the landscape of business and risk management. With access to vast amounts of information about potential customers and user behavior, companies are using analytics to improve their risk management practices in more advanced ways than ever before.


Big Data Analytics

Techwave's Big Data analytics consulting services help you maximize revenue opportunities and win loyal, happy customers.

Why Big Data Is Important

Big data has been around a long time, but it has taken a while for organizations to see its usefulness. Big data doesn't just track the consumer when they are online – it provides a history of behaviors that big data services can analyze and extrapolate from. If consumers use smart devices, make purchases with credit cards or checks, or visit establishments that use smart devices, they leave a data trail that big data consultants can analyze to determine possible trends. These trends help businesses understand what drives their customers to make certain purchases over others.

Source of the article on DZONE

Advocates of data-driven transformation firmly believe its tentacles will extend into every walk of life, but the delivery of public services has traditionally been slightly slower on the uptake of new technologies than many other domains. A recent report from Cardiff University explores whether that is also the case with data analytics.

The researchers examined the various data systems that underpin government services in the UK, with a specific emphasis on the number of decisions that are underpinned by data and algorithms. Rather surprisingly, they suggest that the collection and sharing of data across local and national governments is now pretty widespread.

Source of the article on DZONE

To understand the current and future state of big data, we spoke to 31 IT executives from 28 organizations. We asked them, "What’s the future of big data ingestion, management, and analysis from your perspective – where do the greatest opportunities lie?" Here’s what they told us:

AI/ML

  • We’ll see the transition from on-prem to the cloud and subsequently see traditional Hadoop make the transition to the cloud. This will lead to higher adoption of AI/ML. 
  • Just drive the digitization agenda of the company. You have sufficient compute power and data – what could you do? Take advantage of the capability. Use AI/ML to filter through the data. Enable more people to get involved. 
  • Leverage big data and ML anomaly detection with more sensors entering the world. Cameras checking on safety helmets, ML models from city sensors early warning indicators. The entire economy becomes information driven. Understand why anomalies might happen. 
  • 1) AI/ML becoming less hype and more of a trend. ML needs big data to work. Any ML requires big data. Big data is not useful by itself. Ability to have an engine automatically see trends and make suggestions on what to look at is valuable. 2) Expect more tools for visualizing and reporting on big data. Salesforce has Einstein. Tableau has a tool. Expect thousands more we haven’t seen yet. AI/ML will become more prevalent. 
  • AI protected systems. Maintain and keep the data safer. Create ethical and moral dilemmas for humans. Protect the data because at some point it will be turned over to machines which is terrifying because you don’t know what the machine may do with it and you cannot recover. 
  • The use of AI and ML technologies, like TensorFlow, provides the greatest possible future opportunities for big data applications. With AI, the computer uncovers patterns that a human is unable to see. 
  • We’re going to suffer a talent problem in organizations. Ability to make the value of the data visible to people who are not data scientists is an important factor to deal with. AI/ML will focus on making sense of data to provide answers to people. Context is also important — how can we create context and get it out of people’s heads?
  • Maturing past the Hadoop data lake world. Hadoop is a workload engine that is good for some things and not good for everything. Everyone is taking a deep breath and recognizing that Hadoop is good for these things; the same is true for the data lake. You have to go through the growing pains to figure it out. Opportunity increases as we get deeper into the world of AI and the system finds things on its own; that's the reality, and we'll get there as an industry. There is a huge opportunity across data and workloads, but you have to scope it to specific use cases and workloads.

Streaming

  • Streaming for more real-time, faster ingestion, and analysis. We are still in the early days of automated actions with the data.
  • How to connect real-time data to get the most out of big data. Connect data and dots to explore and predict relationships.

Tools

  • Recognition about the variety of data. Rationalize across all the different kinds of data. Aggregation of variables – credit bureau, core banking system, data on Hadoop. There’s an opportunity with the proliferation of tools you can put in the hands of the data analysts or business users rather than relying on data governance or DBAs. Give people access to the data and the tools to manipulate. 
  • More maturity and more tools with the ability to interpret. 1) More data, more types, streaming more quickly. 2) Analytical methods used to process the data. 3) Automation of an insight. 
  • The trend is to make the common denominator across systems more SQL-centric through an API; SQL is how devs interact with data across different systems. There is a move to more open-source and lower-cost tooling at the visualization step. The difference between Power BI and Tableau is shrinking, and data-as-a-service makes visualization tools less critical. There is an increasing role for the data steward as a bridge between the analyst and the data consumer, helping them be more self-sufficient. 
  • There is a continued drive for standardization of data ingestion, with many companies looking to Kafka for on-premises or private cloud deployments, or Kinesis for the AWS cloud. Data management and analytic tools then become sinks and sources for those data movement frameworks, which creates a sort of data utility grid for these companies, much like the electrical system in a house. If you need electricity, you just need an appliance with a standard plug, and you plug in. The same is occurring with data access (and is already in place at some companies): if you need to use data or provide data to someone else, you just plug your application into the data grid using its standard “plug” (or interface); a minimal sketch of this pattern follows this list. This will also allow for more “best of breed” components to be used, like the best BI or analytics tool or the best database for a particular workload, rather than having to compromise on an inferior all-in-one product, since the data integration will be more standard than custom. Localization of data is a great opportunity, too: that is, having data located in the part of the world where it is needed rather than needing to traverse long networks in order to retrieve it, process it, change it, or analyze it. That means more master-master, active-active architectures, which can create application challenges for any enterprise, so the right choice of components will be important.
  • Leading companies are increasingly standardizing on mature open source technologies like Apache Kafka, Apache Ignite, and Apache Spark to ingest, manage and analyze their big data. All of these projects have experienced major adoption growth in the past few years and it appears likely this will continue for the foreseeable future. As these technologies mature and become increasingly easier to install and use, they will create opportunities for those who know how to use and implement distributed computing technologies for an increasingly real-time world.
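To ground the “standard plug” analogy from the ingestion comment above, here is a minimal sketch of the producer/consumer pattern using the kafka-python package; the broker address, topic name, and message shape are hypothetical placeholders, and Kinesis or another data movement framework could play the same role.

```python
import json

from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

BROKERS = ["kafka-1.example.com:9092"]  # placeholder broker list
TOPIC = "events.raw"                    # hypothetical shared "plug" topic

# Any producing application serializes to the agreed format and writes to the shared topic.
producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"user_id": 42, "action": "page_view"})
producer.flush()

# Any downstream sink (BI tool, database loader, ML feature job) plugs in as a consumer.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.value)
    break  # read one record for the sake of the example
```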

Other

  • Look at tagging, get the proper metadata models, and ensure the context of the information. Tags and metadata provide context, so ensure proper metadata is wrapped around the data. Have traceability for reliability.
  • Focus on operationalization driven by the continued emergence of streaming always-on technology. Complete understanding of what’s going on. The cloud drives this home where cloud-based application architectures are always on and being updated. The same needs to happen with data architectures with automation. Customers see themselves going down a data operations path.
  • All three parts of big data can lead to a considerably successful project in terms of ROI as well as data governance. I would order them hierarchically. First, we need to be able to collect data in large amounts from many different sources. Once the data becomes available, proper management, like the proper creation of informative KPIs, might already lead to some unexpected discoveries. Finally, after the data have been so transformed, their analysis produces even further insights that are vital for the company business. So, as you see, you already get information from step 2, but you cannot get to step 2 without having completed step 1 first.
  • This will all become easier. Things that are challenging today will become second nature and automated in the future. Accessing big data will become just as easy as anything else we do on a computer. Handling, moving, and connecting data will have far less friction. Using big data to identify the value proposition within the data is where the opportunities lie for each business.
  • Augmented analytics pulling together natural language, data, and analytics to drive answers. How do we get to analyzing based on identifying what you don’t know to query?
  • Data analysts and scientists don't care where the data is; they just want the data and the tools they need to analyze it. Catalog the data and know where it is. The next step is getting the data where I want it. Build a virtual catalog to handle access and delivery. There's a logical progression to what we're doing.
  • Regardless of on-prem or cloud, companies need to ensure the engine keeps working so you can get value. An as-a-service model doesn't automatically solve the problems. You need to know about and manage performance problems, bring performance transparency, and think through security from end to end.
  • The future is big data analytical platforms that provide proven capabilities for ingestion, management, and analysis at the speed and scale that enables businesses to compete and win. The greatest opportunities are for businesses to no longer be constrained by the imagination of the business in getting accurate insight so that they can act on all opportunities – understand exactly which customers are likely to churn and grow your business, establish entirely new business models based on the revenue-generating capabilities of data (think pay-as-you-drive insurance, as an example). Every single industry can differentiate itself based on the valuable insight of the data. Make an investment in a proven data analytical platform that offers no compromises and prepares you for whatever the future holds in terms of deployment models (clouds, on-premises, on Hadoop, etc.) or seamless integration with emerging technologies in the data ecosystem.
  • The greatest opportunities lie in delivering true agile data engineering processes that allow companies to quickly create data pipelines to answer new business questions without requiring business people to depend on IT. This requires the automation of the end-to-end development, operationalization, and ongoing governance of big data environments in an integrated fashion. The key to success is automating away the complexity so organizations can use people with basic SQL and data management skills to fully leverage big data for competitive advantage.
  • There is a very bright future ahead for all of these. One area of great opportunity is in the IoT arena. There are over 9 billion devices deployed and the rate of deployment is speeding up as the cost of devices decreases and the sophistication of devices increases. This device data requires very high-speed ingestion and robust management. It is also ripe for advanced analytics such as machine learning for outlier detection.
  • We see three mission-critical opportunities in the future of data-driven marketing and sales. 1) Cord-Cutters: Our clients' customers are more mobile and digital than ever. Traditional data elements and IDs such as home phone, home address, business extension, etc. have to be complemented with digital IDs such as mobile phone number, GPS coordinates, cookie ID, device ID, MAIDs, etc. 2) Predictive World: Artificial intelligence is woven throughout our everyday lives and experiences. Our phones predict the next few words in the sentence we are texting. Our thermostats predict what temperature is optimal for personal warmth and cost savings. Our cars brake for us before an accident happens. Consumers now expect marketing and sales experiences will also be predictive, using data and intelligence to improve their brand experiences in real time. 3) B2B2C Life: There is a blending of our business and consumer selves. Research shows that approximately 43% of consumers work remotely and that the number of people who spend more than 50% of their time working at home has grown 115% over the past 10 years. Therefore, marketers must be able to connect the data IDs, attributes, and behaviors of individuals rather than relying on siloed B2B or B2C targeting. 

Here’s who we spoke to:

Source of the article on DZONE