Articles

Elasticsearch (ES) is the most widely used open-source distributed search engine. It is based on Lucene, an information-retrieval library, and provides powerful search and query capabilities. To understand its search principles, you need to understand Lucene; to understand the ES architecture, you need to know how a distributed system is built. Consistency is at the core of any distributed system.

This article describes ES cluster composition, node discovery, master election, error detection, and scaling. For node discovery and master election, ES uses its own implementation rather than an external component such as ZooKeeper. We will describe how this mechanism works and the problems it entails. This series covers:

Article source: DZONE

Facial recognition technology has become an increasingly common tool in law enforcement, whether checking people as they move through airports or even during stop and search. A recent report from Cardiff University explores how effective these technologies have been.

The research evaluated the use of automated facial recognition (AFR) at numerous sport and entertainment events by the South Wales Police force over the course of a year. The analysis found that while the technology can help the police identify suspects they might otherwise have missed, it requires a large investment and considerable changes to standard operating procedures to achieve consistent results.

Article source: DZONE

Advocates of data-driven transformation firmly believe its tentacles will extend into every walk of life, but the delivery of public services has traditionally been slower to adopt new technologies than many other domains. A recent report from Cardiff University explores whether that is also the case with data analytics.

The researchers examined the various data systems that underpin government services in the UK, with a specific emphasis on the number of decisions that are underpinned by data and algorithms. Rather surprisingly, they suggest that the collection and sharing of data across local and national governments is now pretty widespread.

Article source: DZONE

To understand the current and future state of big data, we spoke to 31 IT executives from 28 organizations. We asked them, "What’s the future of big data ingestion, management, and analysis from your perspective – where do the greatest opportunities lie?" Here’s what they told us:

AI/ML

  • We’ll see the transition from on-prem to the cloud, and with it traditional Hadoop will make the move to the cloud as well. This will lead to higher adoption of AI/ML. 
  • Just drive the digitization agenda of the company. You have sufficient compute power and data – what could you do? Take advantage of the capability. Use AI/ML to filter through the data. Enable more people to get involved. 
  • Leverage big data and ML for anomaly detection as more sensors enter the world: cameras checking for safety helmets, ML models turning city-sensor data into early-warning indicators. The entire economy becomes information driven. Understand why anomalies might happen. 
  • 1) AI/ML becoming less hype and more of a trend. ML needs big data to work, and big data is not useful by itself. The ability to have an engine automatically spot trends and suggest what to look at is valuable. 2) Expect more tools for visualizing and reporting on big data. Salesforce has Einstein. Tableau has a tool. Expect thousands more we haven’t seen yet. AI/ML will become more prevalent. 
  • AI-protected systems that maintain the data and keep it safer. This creates ethical and moral dilemmas for humans. Protect the data, because at some point it will be turned over to machines, which is terrifying: you don’t know what the machine may do with it, and you cannot recover. 
  • The use of AI and ML technologies, like TensorFlow, provides the greatest future opportunities for big data applications. With AI, the computer uncovers patterns that a human is unable to see. 
  • We’re going to suffer a talent problem in organizations. The ability to make the value of the data visible to people who are not data scientists is an important factor to deal with. AI/ML will focus on making sense of data to provide answers to people. Context is also important — how can we create context and get it out of people’s heads?
  • Maturing past the Hadoop data lake world. Hadoop is good for some workloads and not for everything; everyone is taking a deep breath and recognizing what Hadoop is actually good for. The same is true for the data lake — you have to go through the growing pains to figure it out. The opportunity increases as we move further into the world of AI and the system starts finding things on its own; that’s the reality, and we’ll get there as an industry. There is a huge opportunity to do this across data and workloads, but you have to scope it to specific use cases and workloads.

Streaming

  • Streaming for more real-time, faster ingestion and analysis. We are still in the early days of taking automated action on the data.
  • How to connect real-time data to get the most out of big data. Connect data and dots to explore and predict relationships.

Tools

  • Recognition of the variety of data. Rationalize across all the different kinds of data. Aggregation of variables – credit bureau, core banking system, data on Hadoop. There’s an opportunity with the proliferation of tools you can put in the hands of data analysts or business users rather than relying on data governance or DBAs. Give people access to the data and the tools to manipulate it. 
  • More maturity and more tools with the ability to interpret. 1) More data, more types, streaming more quickly. 2) Analytical methods used to process the data. 3) Automation of an insight. 
  • The trend to make the common denominator across systems more SQL-centric through an API. SQL is how devs interact with data across different systems. Move to more open source and lower-cost tooling as a visualization step. The difference between Power BI and Tableau is shrinking. Data-as-a-service makes tools for visualization less critical. An increasing role for the data steward as a bridge between the analyst and the data consumer, helping consumers become more self-sufficient. 
  • There is a continued drive for standardization of data ingestion, with many companies looking to Kafka for on-premises or private cloud or Kinesis for AWS cloud. Data management and analytic tools then become sinks and sources for data for those data movement frameworks, which creates a sort of data utility grid for these companies, sort of like the electrical system in a house. If you need electricity, you just need an appliance with a standard plug and you plug in. The same is occurring with data access — and is already in place at some companies — if you need to get use of data or provide data to someone else, you just plug your application into the data grid using their standard “plug” (or interface). This will also allow for more “best of breed” components to be used, like the best BI or analytics tool or best database for a particular workload rather than having to compromise on an inferior all-in-one product since the data integration will be more standard than custom. Localization of data is a great opportunity, too. That is, having data located in the world where it is needed rather than needing to traverse long networks in order to retrieve it, process it, change it, or analyze it. That means more master-master, active-active architectures which can create application challenges for any enterprise, so the right choice of components will be important.  
  • Leading companies are increasingly standardizing on mature open source technologies like Apache Kafka, Apache Ignite, and Apache Spark to ingest, manage and analyze their big data. All of these projects have experienced major adoption growth in the past few years and it appears likely this will continue for the foreseeable future. As these technologies mature and become increasingly easier to install and use, they will create opportunities for those who know how to use and implement distributed computing technologies for an increasingly real-time world.

Other

  • Look at tagging, get the proper metadata models, and ensure the context of the information. Tags and metadata provide context. Ensure proper metadata is wrapped around the data. Have traceability for reliability.
  • Focus on operationalization driven by the continued emergence of streaming always-on technology. Complete understanding of what’s going on. The cloud drives this home where cloud-based application architectures are always on and being updated. The same needs to happen with data architectures with automation. Customers see themselves going down a data operations path.
  • All three parts of big data can lead to a considerably successful project in terms of ROI as well as data governance. I would order them hierarchically. First, we need to be able to collect data in large amounts from many different sources. Once the data becomes available, the proper management, like the proper creation of informative KPIs, might already lead to some unexpected discoveries. Finally, after the data have been so transformed, their analysis produces even further insights that are vital for the company business. So, as you see, you already get information at step 2. But you cannot get to step 2 without having completed step 1 first.
  • This will all become easier. Things that are challenging today will become second nature and automated in the future. Accessing big data will become just as easy as anything else we do on a computer. Handling, moving, and connecting data will have far less friction. Using big data to identify the value proposition within the data is where the opportunities lie for each business.
  • Augmented analytics pulling together natural language, data, and analytics to drive answers. How do we get to analysis that surfaces the things you didn’t know to query for?
  • Data analysts and scientists don’t care where the data is; they just want the data and the tools they need to analyze it. Catalog the data and know where it is. The next step is getting the data where you want it. Build a virtual catalog to go from access to delivery. There’s a logical progression to what we’re doing.
  • Whether on-prem or in the cloud, companies need to ensure the engine keeps working so they can get value. An as-a-service model doesn’t automatically solve the problems. You need to know about and manage performance problems, bring performance transparency, and think through security from end to end.
  • The future is big data analytical platforms that provide proven capabilities for ingestion, management, and analysis at the speed and scale that enables businesses to compete and win. The greatest opportunities are for businesses to no longer be constrained by the imagination of the business in getting accurate insight so that they can act on all opportunities – understand exactly which customers are likely to churn and grow your business, establish entirely new business models based on the revenue-generating capabilities of data (think pay-as-you-drive insurance, as an example). Every single industry can differentiate itself based on the valuable insight of the data. Make an investment in a proven data analytical platform that offers no compromises and prepares you for whatever the future holds in terms of deployment models (clouds, on-premises, on Hadoop, etc.) or seamless integration with emerging technologies in the data ecosystem.
  • The greatest opportunities lie in delivering true agile data engineering processes that allow companies to quickly create data pipelines to answer new business questions without requiring business people to depend on IT. This requires the automation of the end-to-end development, operationalization, and ongoing governance of big data environments in an integrated fashion. The key to success is automating away the complexity so organizations can use people with basic SQL and data management skills to fully leverage big data for competitive advantage.
  • There is a very bright future ahead for all of these. One area of great opportunity is in the IoT arena. There are over 9 billion devices deployed and the rate of deployment is speeding up as the cost of devices decreases and the sophistication of devices increases. This device data requires very high-speed ingestion and robust management. It is also ripe for advanced analytics such as machine learning for outlier detection.
  • We see three mission-critical opportunities in the future of data-driven marketing and sales. 1) Cord-Cutters — Our clients’ customers are more mobile and digital than ever. Traditional data elements and IDs such as home phone, home address, business extension, etc. have to be complemented with digital IDs such as mobile phone number, GPS coordinates, cookie ID, device ID, MAIDs, etc. 2) Predictive World — Artificial intelligence is woven throughout our everyday lives and experiences. Our phones predict the next few words in the sentence we are texting. Our thermostats predict what temperature is optimal for personal warmth and cost savings. Our cars brake for us before an accident happens. Consumers now expect marketing and sales experiences will also be predictive, using data and intelligence to improve their brand experiences in real-time. 3) B2B2C Life — There is a blending of our business and consumer selves. Research shows that approximately 43% of consumers work remotely and the number of people that spend > 50% of their time working at home has grown 115% over the past 10 years. Therefore, marketers must be able to connect the data IDs, attributes and behaviors of individuals versus siloed B2B or B2C targeting. 

Here’s who we spoke to:

Article source: DZONE

After a decade of stop-and-go development, Artificial Intelligence has now begun to provide real, tangible value to the business world. McKinsey published an 80-page report titled "Artificial Intelligence: The Next Digital Frontier?" which provides a comprehensive analysis of the value that Artificial Intelligence (AI) creates for businesses.

The report points out that "wide application of Artificial Intelligence technology will bring great returns to businesses." This means that the disruptive nature of AI will continue to become more apparent in the future. Governments, enterprises, and developers should all be clear on this point. Moreover, the report raises some interesting points (all of which we will discuss later in this article):


Article source: DZONE (AI)

Kyvos Version 5 was released with new enhancements to scale growing workloads.

The update, which was designed specifically for the cloud, allows businesses to scale growing workloads and draw intelligence from them, which in turn enables them to run real-time queries on data before it enters the cube.

Article source: DZONE

While Artificial Intelligence and Machine Learning provide ample possibilities for businesses to improve their operations and maximize their revenues, there is no such thing as a “free lunch.”

The “no free lunch” problem is the AI/ML industry’s adaptation of the age-old “no one-size-fits-all” problem. The array of problems businesses face is huge, and the variety of ML models used to solve them is quite wide, as some algorithms are better at dealing with certain types of problems than others. That said, one needs a clear understanding of what each type of ML model is good for, and today we list the 10 most popular AI algorithms:


Article source: DZONE (AI)


What Is AdaBoost?

AdaBoost is short for Adaptive Boosting. It was the first really successful boosting algorithm developed for binary classification and is the best starting point for understanding boosting. Modern boosting methods build on AdaBoost, most notably stochastic gradient boosting machines.

AdaBoost is generally used with short decision trees. After the first tree is created, its performance on each training instance is used to weight how much attention the next tree should pay to each training instance. Training data that is hard to predict is given more weight, whereas instances that are easy to predict are given less weight.
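
As a minimal sketch of this idea, the following example uses scikit-learn's AdaBoostClassifier with decision stumps as the weak learners; the dataset and parameter values are purely illustrative and are not taken from the original article:

    # Minimal AdaBoost sketch: decision stumps (max_depth=1) as the weak learners.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Illustrative binary-classification data.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Each new stump pays more attention to the instances the previous stumps
    # got wrong (older scikit-learn versions call this parameter base_estimator).
    model = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                               n_estimators=50, learning_rate=1.0)
    model.fit(X_train, y_train)
    print("Test accuracy:", model.score(X_test, y_test))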


Article source: DZONE (AI)


What Is NumPy?

NumPy is a powerful Python library that is primarily used for performing computations on multidimensional arrays. The word NumPy has been derived from two words — Numerical Python. NumPy provides a large set of library functions and operations that help programmers in easily performing numerical computations. These kinds of numerical computations are widely used in tasks like:

  • Machine Learning Models: while writing Machine Learning algorithms, one is supposed to perform various numerical computations on matrices. For instance, matrix multiplication, transposition, addition, etc. NumPy provides an excellent library for easy (in terms of writing code) and fast (in terms of speed) computations. NumPy arrays are used to store both the training data as well as the parameters of the Machine Learning models.
  • Image Processing and Computer Graphics: Images in the computer are represented as multidimensional arrays of numbers. NumPy becomes the most natural choice for the same. NumPy, in fact, provides some excellent library functions for fast manipulation of images. Some examples are mirroring an image, rotating an image by a certain angle, etc.
  • Mathematical tasks: NumPy is quite useful for performing various mathematical tasks like numerical integration, differentiation, interpolation, extrapolation, and many others. As such, it forms a quick Python-based replacement for MATLAB when it comes to mathematical tasks.

NumPy Installation

The fastest and the easiest way to install NumPy on your machine is to use the following command on the shell: pip install numpy.
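
Once NumPy is installed, here is a short sketch of the kinds of array operations described above; the array values are purely illustrative:

    import numpy as np

    # Small matrices; the values are purely illustrative.
    a = np.array([[1.0, 2.0], [3.0, 4.0]])
    b = np.array([[5.0, 6.0], [7.0, 8.0]])

    print(a + b)  # element-wise addition
    print(a @ b)  # matrix multiplication
    print(a.T)    # transposition

    # Images are just multidimensional arrays: mirror one left to right.
    image = np.arange(12).reshape(3, 4)
    print(np.fliplr(image))

    # Simple interpolation: estimate y at x = 2.5 from known (x, y) points.
    print(np.interp(2.5, [1, 2, 3], [10, 20, 30]))  # -> 25.0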

Article source: DZONE

Analytics forms a major part of the conceptual design of an app. Data tracking and collection for the purpose of analytics allow us to better update our app for consumer use. Data tracking is akin to getting feedback from a user. By collecting the data in a way that makes sense to us, we can add features or upgrade the existing elements of our app to meet consumer demand. While this allows for a certain level of automatic feedback, it does not mean that we can outright ignore our app users’ comments either. A firm balance of both is the middle ground we should be chasing.

Relevant Data Tracking

For a developer, data tracking is simply one more SDK that needs to be built into our existing application framework. We obviously don’t need every bit of information the app can collect – most of it is useless for determining whether the app functions as expected. We can, however, produce separate use cases to test whether users find a certain button layout more conducive to their app use. We can also collect information such as uninstall/reinstall data, the orientation of the device, the loading time of the application and, by extension, its performance on a number of different handsets (very useful in benchmarking the processing friendliness of the application), account information, and, of course, crash and exception data that can help improve our user experience. David Cearley of Gartner Inc. is noted as saying that every app now needs to be an analytics app, and we can only do this by tracking the data relevant to our app.
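
As a purely hypothetical sketch of what such an event-tracking call might look like, the endpoint, field names, and helper below are invented for illustration and do not belong to any particular analytics SDK:

    import json
    import time
    import urllib.request

    ANALYTICS_ENDPOINT = "https://example.com/collect"  # hypothetical endpoint

    def track_event(name, **properties):
        """Send one analytics event, e.g. a screen load time or a crash report."""
        payload = {"event": name, "timestamp": time.time(), "properties": properties}
        request = urllib.request.Request(
            ANALYTICS_ENDPOINT,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(request)  # fire-and-forget for this sketch

    # The kinds of data points mentioned above, sent as events.
    track_event("app_loaded", load_time_ms=843, device_orientation="portrait")
    track_event("crash", exception="NullPointerException", screen="checkout")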

Article source: DZONE