Predictive Analytics Model Methodology
Machine learning is a sub-field of computer science that grew out of computational learning theory and pattern recognition in artificial intelligence. It is the practice of building analytical models that automatically search data for previously unknown patterns: associations, anomalies (outliers), sequences, classifications, and clusters and segments. These patterns reveal hidden insight into why an event happened.
Businesses and organizations can take advantage of machine learning in various ways:
- Segmentation: finding sets of clients who have the same or similar purchase patterns, for targeted marketing
- Classification: making a prediction based on a set of attributes
- Forecasting: projecting purchases based on time series
- Pattern detection: associating one product with another to reveal cross-sell sequences and opportunities
- Anomaly detection: fraud detection, for example
Predictive analytics model methodology
The Cross-Industry Standard Process for Data Mining (CRISP-DM), the most widely used methodology, is used to develop predictive analytical models. It includes six phases:
- business understanding
- data understanding
- data preparation
- model development using supervised or unsupervised learning
- model evaluation
- model deployment
Business understanding
The business understanding phase involves understanding and defining the use case or business problem, the business objective, and the business questions that need to be answered. It also includes defining success criteria. Project-related tasks then follow: defining resource needs such as constraints, technology, people, and budget; creating a project plan; gathering requirements; assessing risks; and creating a contingency plan.
Data understanding
The data understanding phase covers data needs: internal and external data sources, data origin, and data characteristics (features and quality), including the three Vs (volume, variety, velocity), formats, and whether the data resides in a relational database, flat files, or a Hadoop Distributed File System (HDFS), or arrives as live, streaming data. This phase also includes data exploration and investigation, using statistical analysis to examine the data. In addition, a data quality assessment includes understanding the degree to which data is missing, has errors, is duplicated, and is inconsistent.
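The data quality checks named above can be sketched in a few lines. This is a minimal illustration over invented customer records; the field names and the "negative age" consistency rule are assumptions for the example, not part of any real pipeline.

```python
# Minimal data-quality assessment over a list of records (hypothetical
# customer data): count missing values, duplicate rows, and
# inconsistent values (here, a negative age).

records = [
    {"id": 1, "name": "Alice", "age": 34},
    {"id": 2, "name": "Bob",   "age": None},   # missing value
    {"id": 3, "name": "Carol", "age": -5},     # inconsistent value
    {"id": 1, "name": "Alice", "age": 34},     # duplicate of the first row
]

# missing: any field whose value is None
missing = sum(1 for r in records for v in r.values() if v is None)

# duplicates: rows identical to one seen earlier
seen, duplicates = set(), 0
for r in records:
    key = tuple(sorted(r.items()))
    if key in seen:
        duplicates += 1
    seen.add(key)

# inconsistent: ages outside a plausible range
inconsistent = sum(1 for r in records
                   if r["age"] is not None and r["age"] < 0)

print(f"missing={missing} duplicates={duplicates} inconsistent={inconsistent}")
```

In practice these counts would feed a decision about how much cleaning the data preparation phase needs.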
Data preparation
The objective of the data preparation phase is to produce a data set that can be fed into machine learning algorithms. This process involves a number of tasks, including filtering and cleaning; data conversion; data transformation; data enrichment; and variable identification, which is also known as dimensionality reduction or feature selection. The objective of variable identification is to create a data set of the most relevant variables to be used as model input for optimal results. The intention is also to remove variables that are not useful as model input without compromising the model's accuracy, for example, the accuracy of the predictions it makes.
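One common feature-selection heuristic is to drop near-constant variables, since they carry little predictive signal. The sketch below assumes invented feature names and an arbitrary variance threshold; it is one simple instance of the variable identification idea, not a full feature-selection method.

```python
# Variance-based feature selection: drop features whose values barely
# vary across rows. Feature names and the threshold are illustrative.

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# rows of feature -> value; "region_code" is nearly constant
data = [
    {"age": 25, "income": 40_000, "region_code": 1},
    {"age": 47, "income": 82_000, "region_code": 1},
    {"age": 33, "income": 55_000, "region_code": 1},
    {"age": 61, "income": 91_000, "region_code": 2},
]

threshold = 1.0
selected = [f for f in data[0]
            if variance([row[f] for row in data]) > threshold]
print(selected)  # "region_code" is dropped as near-constant
```

The surviving variables would then form the model's input data set.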
Model development
The model development phase covers the building of a machine learning model. Models can be built to predict, forecast, or analyze information to find patterns such as sets, groups, and associations.
Two types of machine learning can be used in model development:
- supervised learning
- unsupervised learning
Typically, predictive models are built using supervised learning. For example, to develop a model for equipment failure prediction, we can use data that describes equipment that has actually failed. We use that data to train the model to recognize the profile of a piece of equipment that is probably going to fail. To accomplish this profile recognition, we divide the data, including the failed-equipment records, into a training data set and a test data set. We then train the model by feeding the training data set into an algorithm, several of which can be used for prediction, and finally we test the model against the test data set.
Unsupervised learning is a method of analyzing data to discover hidden patterns that indicate product associations and groupings, for example, customer segmentation. Grouping is based on maximizing similarity within a group and minimizing it between groups. The k-means clustering algorithm is the most widely used algorithm for this approach. Predictive and descriptive analytical models can be built using advanced data mining tools, analytics clouds, data science interactive workbooks with procedural or declarative programming languages, and automated model development tools.
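A minimal k-means sketch makes the alternating assign/update loop concrete. The one-dimensional "annual spend" figures are invented for the example, and real implementations add refinements (better initialization, convergence checks) that are omitted here.

```python
import random

# Minimal k-means (k = 2): repeat two steps until stable,
# 1) assign each point to its nearest center,
# 2) move each center to the mean of its assigned points.

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # pick k points as initial centers
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                     # assignment step
            idx = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        centers = [sum(c) / len(c) if c else centers[i]   # update step
                   for i, c in enumerate(clusters)]
    return centers, clusters

# hypothetical annual spend per customer: two natural segments
spend = [120, 150, 130, 900, 950, 880, 140, 920]
centers, clusters = kmeans(spend, k=2)
print(sorted(centers))  # one low-spend center, one high-spend center
```

With these well-separated values the loop settles on a low-spend segment (mean near 135) and a high-spend segment (mean near 912), which is exactly the kind of customer segmentation described above.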
Model evaluation
Once the model has been developed, the next phase is to evaluate the accuracy of its predictions. For predictions, this assessment means understanding how many predictions were correct and how many were incorrect. Various techniques can achieve this evaluation. Key measures in model evaluation are the numbers of true positives, true negatives, false positives, and false negatives. The bottom line is that we need to make sure the model is accurate; otherwise, it could generate many false positives that may result in incorrect actions and decisions.
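The four counts named above come straight from comparing predicted labels with actual ones. The labels below are hypothetical; the derived measures (accuracy, precision, recall) are standard ones built from the same counts.

```python
# Model evaluation sketch: tally true/false positives and negatives
# from hypothetical predicted vs. actual labels (True = positive class).

actual    = [True, True, True, False, False, False, True, False]
predicted = [True, False, True, False, True, False, True, False]

tp = sum(a and p for a, p in zip(actual, predicted))          # correct alarm
tn = sum(not a and not p for a, p in zip(actual, predicted))  # correct all-clear
fp = sum(not a and p for a, p in zip(actual, predicted))      # false alarm
fn = sum(a and not p for a, p in zip(actual, predicted))      # missed event

accuracy  = (tp + tn) / len(actual)
precision = tp / (tp + fp)   # of positive predictions, how many were right
recall    = tp / (tp + fn)   # of actual positives, how many were caught

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

A low precision is the "many false positives" failure mode the text warns about: the model keeps raising alarms that turn out to be wrong.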
Model deployment
Once we are happy with the model we've developed, the final phase involves deploying it to run in one of many environments. These environments include spreadsheets, analytics servers, database management systems (DBMSs), applications, analytical relational database management systems, Apache Hadoop, Apache Spark, and streaming analytics platforms.