Abstract: The previous article introduced the data acquisition and pre-processing layer. Once clean data has been acquired, the next task is to analyze it. Data analysis is a vital part of running a successful business: used effectively, data helps businesses make better decisions about their future activities. This article introduces several types of data analysis, including statistical analysis, correlation analysis, descriptive analysis, distribution analysis, diagnostic analysis, predictive analysis, and prescriptive analysis. They are linked together and build upon each other, ranging from the simplest type to the more complex. In general, the more complex the analysis, the more valuable the insight it adds.
Statistical analysis is the science of collecting, exploring, and presenting large amounts of data to discover underlying patterns and trends. Note that the keyword here is “statistic”. Statistics are applied every day, in research, industry, and study, to put the decisions that need to be made on a more scientific footing. Some applications:
Manufacturers use statistics to weave quality into beautiful fabrics, bring a lift to the airline industry, and help guitarists make beautiful music.
Researchers keep children healthy by using statistics to analyze data from the production of viral vaccines, which ensures consistency and safety.
Communication companies use statistics to optimize network resources, improve service and reduce customer churn by gaining greater insight into subscriber requirements.
Government agencies around the world rely on statistics for a clear understanding of their countries, their businesses and their people.
Traditional statistical methods have been used for a long time. However, Internet of Things (IoT) data volumes make statistics more valuable and powerful. Statistical computing has become more and more essential for today’s statisticians.
Correlation analysis is a statistical method used to evaluate the strength of the relationship between two quantitative variables. A high correlation means that the two variables have a strong relationship with each other, while a weak correlation means that they are hardly related. In other words, it is the process of studying the strength of that relationship using the available data. Correlations are useful because, once a relationship between two variables is detected, it can help to predict future behaviour. A correlation coefficient is a number that expresses how strong the relationship between the variables is.
The correlation coefficient takes a value between -1 and 1. A value of 0 indicates no correlation between the variables, while -1 and 1 indicate a perfect negative or a perfect positive correlation. The direction of the relationship is given by the sign of the coefficient: a + sign indicates a positive relationship and a - sign indicates a negative relationship. Correlations can be presented as a plot or as a matrix, as in the example below.
Example of Correlation analysis of Chiller analyzed by Daviteq research and development team
The matrix above shows an example of the correlation between the features of a chiller machine in a factory. It indicates which features are most strongly associated with the electrical energy consumption of the chiller. Based on that result, a better optimization method for the chiller can be considered and recommended, which leads to a reduction in the electrical energy used in manufacturing. In general, detecting the correlations among all features helps the manufacturer to optimize resources.
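The correlation coefficient described above can be sketched in a few lines. This is a minimal illustration that assumes the Pearson coefficient (the article does not say which coefficient was used for the chiller matrix). The feature names and readings are made up for the example and are not Daviteq data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-variation of the two series and variation of each series on its own.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(features):
    """Return {name_a: {name_b: r}} for every pair of named feature columns."""
    names = list(features)
    return {a: {b: pearson(features[a], features[b]) for b in names} for a in names}


# Hypothetical chiller readings (made-up values, for illustration only).
chiller = {
    "outdoor_temp_c": [24.0, 26.5, 29.0, 31.5, 33.0, 35.5],
    "supply_water_temp_c": [7.5, 7.2, 7.0, 6.8, 6.9, 6.5],
    "energy_kwh": [110.0, 121.0, 135.0, 149.0, 155.0, 171.0],
}

matrix = correlation_matrix(chiller)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Reading the `energy_kwh` row of the result shows which of the hypothetical features moves most closely with energy consumption and in which direction. A strong coefficient only signals association; it does not by itself show that one feature causes the other.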
Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries of the sample and the measures. Together with simple graphical analysis, they form the basis of virtually every quantitative analysis of data. With descriptive statistics, the data can be described along various dimensions. Descriptive statistics simply describe what is going on in our data.
Descriptive statistics are used to present quantitative descriptions in a manageable form. In manufacturing, we may have a lot of measurements, and descriptive statistics help to simplify large amounts of data sensibly. In other words, they reduce lots of data to a simpler summary. Although descriptive statistics have limitations, they provide a powerful summary that may enable comparisons across units. There are three common types of descriptive analysis: distribution, central tendency, and dispersion.
The distribution is a summary of the frequency of individual values or ranges of values for a variable. One of the most common ways to describe a single variable is with a frequency distribution. Depending on the particular variable, all of the data values may be represented, or the data may be grouped into categories first. Frequency distributions can be depicted in two ways: as a table or as a graph. The graph form, shown in the figure below, is often referred to as a histogram.
Example of distribution of chiller machine presented by histogram plot
It is a good idea to create a histogram to get an idea of the shape of the distribution. Such a plot also helps to identify outlying values and to double-check for data entry errors.
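A frequency distribution of the kind described above can be built by grouping values into equal-width bins and counting them. The sketch below assumes a bin width and starting value chosen by hand, and the power readings are made up for illustration. It prints the table form and a rough text histogram.

```python
from collections import Counter


def frequency_distribution(values, bin_width, start):
    """Count how many values fall in each bin [lo, lo + bin_width)."""
    if not values:
        return []
    counts = Counter()
    for v in values:
        # Index of the bin this value belongs to, counted from `start`.
        counts[int((v - start) // bin_width)] += 1
    # Include empty bins between the lowest and highest occupied bin.
    return [
        (start + i * bin_width, start + (i + 1) * bin_width, counts[i])
        for i in range(min(counts), max(counts) + 1)
    ]


# Hypothetical chiller power readings in kW (made-up values).
power_kw = [41.2, 43.8, 44.1, 45.0, 45.6, 46.3, 46.9, 47.4, 48.8, 49.5, 52.1, 71.0]

table = frequency_distribution(power_kw, bin_width=5.0, start=40.0)
for lo, hi, count in table:
    print(f"{lo:5.1f} - {hi:5.1f} | {'#' * count} ({count})")
```

In this made-up data the single reading of 71.0 kW ends up in a bin of its own, separated from the rest by empty bins. That is the kind of outlying value a histogram makes easy to spot and double-check.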
The central tendency of a distribution is an estimate of the “centre” of a distribution of values. Central tendency aims to provide an accurate description of the entire data set with a single value that is the most typical or representative of the collected data. The term “number crunching” is used to illustrate this aspect of data description. There are three major types of estimates of central tendency: the mean (the arithmetic average), the median (the middle value of the sorted data), and the mode (the most frequent value).
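As a small illustration, the three estimates can be computed with Python's standard `statistics` module. The cycle times below are made-up values, with one unusually slow cycle included to show how differently the three estimates react to it.

```python
import statistics

# Hypothetical machine cycle times in seconds (made-up values).
cycle_times_s = [30, 31, 31, 32, 33, 31, 34, 30, 95]

mean = statistics.mean(cycle_times_s)      # arithmetic average
median = statistics.median(cycle_times_s)  # middle value of the sorted data
mode = statistics.mode(cycle_times_s)      # most frequent value

print(f"mean={mean:.2f} median={median} mode={mode}")
```

Here the single slow cycle pulls the mean well above the median and the mode, which is one reason to look at more than one estimate of the centre.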
In statistics, a measure of central tendency gives a single value that represents the whole data set; however, central tendency alone cannot describe the observations fully. A measure of dispersion helps us to study the variability of the items. In a statistical sense, dispersion has two meanings: first, it measures the variation of the items among themselves, and second, it measures the variation around the average. If the differences between the values and the average are large, dispersion is high; otherwise, it is low. Researchers use this technique because it indicates how reliable the average is. Dispersion also helps researchers to compare two or more series. There are two common measures of dispersion: the range and the standard deviation. The range is simply the highest value minus the lowest value. The standard deviation is a more accurate and detailed estimate of dispersion, because the range depends only on the two extreme values and a single outlier can greatly exaggerate it.
(Source: Governance Analytics)
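The two measures of dispersion can be sketched as follows. The example compares two made-up series that share the same mean, to illustrate how dispersion indicates how reliable the average is when comparing series. The line names and values are assumptions for illustration only.

```python
import statistics


def value_range(values):
    """Range: the highest value minus the lowest value."""
    return max(values) - min(values)


# Hypothetical hourly output of two production lines (made-up values).
line_a = [49, 50, 51, 50, 50]
line_b = [40, 60, 45, 55, 50]

for name, data in (("line A", line_a), ("line B", line_b)):
    print(
        f"{name}: mean={statistics.mean(data)} "
        f"range={value_range(data)} "
        f"stdev={statistics.stdev(data):.2f}"  # sample standard deviation
    )

stdev_a = statistics.stdev(line_a)
stdev_b = statistics.stdev(line_b)
```

Both lines average 50 units, but the values of line B vary far more around that average, so its mean is a less dependable description of a typical hour. Note that `statistics.stdev` computes the sample standard deviation; `statistics.pstdev` would be the choice if the data were the whole population.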
Diagnostic analysis is the next step up in complexity in data analytics. Diagnostic analytics covers the techniques used to answer the question “why did this happen?”. Building on the descriptive data, diagnostic tools enable an analyst to drill down and, in doing so, isolate the root cause of a problem. It helps to get value out of the collected data by asking the right questions and making deep dives for the answers. This includes processes such as data discovery, data mining, correlation, drill down, and drill through.
Diagnostic analysis is one of the ways we uncover insights from our data and make the data valuable to us. There are infinitely many questions that can be asked about data, so concentrate on the questions that are most critical to manufacturing. The goal of all analytics should be more relevant information, which leads to more valuable decisions and a more complete understanding of the factory.
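The drill-down idea mentioned above can be illustrated with a small sketch: summarize a problem at a coarse level, pick the worst group, then repeat the summary one level deeper. The record layout, the line and machine names, and the downtime figures are all hypothetical.

```python
from collections import defaultdict


def total_by(records, key):
    """Sum downtime minutes per distinct value of `key`."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key]] += rec["minutes"]
    return dict(totals)


def worst(totals):
    """Return the group with the largest total."""
    return max(totals, key=totals.get)


# Hypothetical downtime log (made-up values).
records = [
    {"line": "A", "machine": "press-1", "cause": "jam", "minutes": 12},
    {"line": "A", "machine": "press-2", "cause": "changeover", "minutes": 20},
    {"line": "B", "machine": "chiller-1", "cause": "overheating", "minutes": 45},
    {"line": "B", "machine": "chiller-1", "cause": "sensor fault", "minutes": 10},
    {"line": "B", "machine": "chiller-1", "cause": "overheating", "minutes": 30},
    {"line": "B", "machine": "conveyor-3", "cause": "jam", "minutes": 15},
]

# Level 1: which line loses the most time?
worst_line = worst(total_by(records, "line"))
line_records = [r for r in records if r["line"] == worst_line]

# Level 2: which machine on that line?
worst_machine = worst(total_by(line_records, "machine"))
machine_records = [r for r in line_records if r["machine"] == worst_machine]

# Level 3: which cause on that machine?
worst_cause = worst(total_by(machine_records, "cause"))

print(worst_line, worst_machine, worst_cause)
```

Each level narrows the question from "where is time being lost?" towards "why?". In practice the final step usually needs domain knowledge as well, since the recorded cause is only as good as the data entry behind it.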
Predictive analysis is a branch of advanced analytics that is all about forecasting. Predictive models can estimate a quantifiable amount or a point in time at which something might happen. They typically use a variety of variables to make the prediction.
Many industries are turning to predictive analytics to help solve difficult problems and uncover new opportunities. First, predictive analytics can help to improve operations, because a predictive model can be used to manage resources. For example, it can forecast how much coal a steam boiler will use, which enables machines and equipment to run more efficiently. Second, it helps the manufacturer to reduce risk by forecasting when a part of a machine is likely to break, so that engineers can carry out maintenance in time. Predictive maintenance helps to reduce downtime for the factory.
For manufacturers, it’s very important to identify factors leading to reduced quality and production failures, as well as to optimize parts, service resources and distribution. Lenovo is just one manufacturer that has used predictive analytics to better understand warranty claims - an initiative that led to a 10-15% reduction in warranty costs.
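As a deliberately simple illustration of forecasting "a point in time at which something might happen", the sketch below fits a straight-line trend to a sensor reading by least squares and estimates when it would reach an alarm threshold. The linear trend, the vibration values, and the 4.5 mm/s threshold are all assumptions made for the example. Real predictive-maintenance models are usually richer than this.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def day_threshold_reached(slope, intercept, threshold):
    """Day on which the fitted trend reaches `threshold`, or None if it never rises."""
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


# Hypothetical weekly vibration measurements of a motor bearing (made-up values).
days = [0, 7, 14, 21, 28, 35]
vibration_mm_s = [2.0, 2.3, 2.5, 2.9, 3.1, 3.4]

slope, intercept = fit_line(days, vibration_mm_s)
eta_days = day_threshold_reached(slope, intercept, threshold=4.5)

print(f"trend: {slope:.3f} mm/s per day, starting at {intercept:.2f} mm/s")
print(f"estimated day the threshold is reached: {eta_days:.1f}")
```

An estimate like this gives the maintenance team a date to plan around instead of waiting for the failure. How far ahead the trend can be trusted depends on how well a straight line actually describes the wear process.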
Prescriptive analysis is a type of data analytics, the use of technology to help businesses make better decisions through the analysis of data. Specifically, prescriptive analytics takes into account information about possible situations or scenarios, available resources, past performance, and current performance, and suggests a course of action or strategy. It can be used to make decisions on any time horizon, from immediate to long term. The opposite of prescriptive analytics is descriptive analytics, which examines decisions and outcomes after the fact.
Example of Prescriptive analysis of chiller machine
Prescriptive analytics relies on artificial intelligence techniques, such as machine learning, to understand the acquired data. Machine learning makes it possible to process the large amounts of data collected by IoT devices. When new data is added, the computer program adjusts automatically to make use of it. Prescriptive analytics works together with another type of data analytics, predictive analytics, which was introduced above. However, it goes further: using the predictive model's estimate of what is likely to happen, it recommends what future course to take. For example, for the chiller machine, whenever new input arrives, the computer program can calculate what the energy consumption should be.
In this article, several types of data analysis were introduced. These types are connected and rely on each other to a certain degree. Each of them is used for a different purpose and provides different insight. Therefore, it is important to understand these types of analysis and to use them correctly. In the next article, we will investigate how to apply machine learning and build a model to gain benefits. If you have any questions, feel free to contact us at email: firstname.lastname@example.org
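The step from prediction to recommendation can be sketched as follows: a predictive model estimates the outcome of each candidate action, and the prescriptive step picks the best action that satisfies the operating constraints. The energy formula here is a made-up stand-in for a trained model, and the setpoints and the limit are hypothetical. None of this is the model behind the chiller example in this article.

```python
def predicted_energy_kwh(setpoint_c, outdoor_temp_c):
    """Hypothetical stand-in for a trained model of chiller energy use."""
    return 80.0 + 4.0 * (outdoor_temp_c - setpoint_c)


def recommend_setpoint(candidates, outdoor_temp_c, max_setpoint_c):
    """Pick the feasible setpoint with the lowest predicted energy use."""
    # Constraint: the process cannot tolerate water warmer than max_setpoint_c.
    feasible = [s for s in candidates if s <= max_setpoint_c]
    if not feasible:
        raise ValueError("no candidate setpoint satisfies the constraint")
    return min(feasible, key=lambda s: predicted_energy_kwh(s, outdoor_temp_c))


# Hypothetical chilled-water setpoints in degrees Celsius.
candidates = [6.0, 6.5, 7.0, 7.5, 8.0, 8.5]

best = recommend_setpoint(candidates, outdoor_temp_c=32.0, max_setpoint_c=7.5)
print(f"recommended setpoint: {best} C, "
      f"predicted energy: {predicted_energy_kwh(best, 32.0):.1f} kWh")
```

The prediction alone only says what each setting would cost; the recommendation comes from combining those predictions with the constraint. With a real model, the same pattern applies, although the search over actions may need a proper optimizer instead of a short candidate list.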