It’s time I thought of talking to you again. I recently had some exposure to IBM SPSS Modeler and wanted to share some of it with you. There are plenty of articles, blogs, demos and product manuals out there to guide (or confuse) you, so I will just share a gist. Here is how I will go about it: first a quick tour of the IBM stack in this space, then a few titbits on the Modeler itself. A caveat: my view of the stack is based on my own perception and might not align with the formal product positioning. Most of the products are licensed individually, so we will have to mix and match when offering them to a customer, based on their operational and analytical decision management needs. For now I am limiting myself to the structured-data space and not touching Social Network Analysis, Text Analytics, or products like BigInsights and Watson, which take AI to a crescendo and have the capability to make many highly paid business consultants redundant.
· SPSS Modeler desktop: A cool GUI-based tool, with blocks available for different tasks, where you can define your data mining and predictive analytics workflows. The desktop version can really commoditize data mining with its drag-and-drop features. It suits data scientists as well as business users who have a flair for the algorithms.
· SPSS Modeler Server edition: Required for scheduling of jobs, batch-mode execution and the like.
· CADS (Collaboration and Deployment Services): Required for collaboration, deployment and scoring. Collaboration lets multiple people participate in model building, and scoring allows real-time integration by exposing the models as web services. So for a real-time analytical application like fraud detection, or day-to-day web experiences like association-based cross-sell and up-sell, it is surely needed. Its champion/challenger approach sounds fresh: a challenger model becomes the new champion when it delivers better results.
· SPSS Statistics: Offers much more flexibility with scripting and customization. More suited to techies and statisticians, and it can complement Modeler for advanced statistical tasks.
· Analytical Decision Management: This one is for the business users. They can lay out the skeleton of the models here, and the tech team can work in the background to put flesh on the bones. It allows combining business rules defined in Analytical Decision Management with rules coming out of SPSS Modeler. Under the hood it uses CPLEX, which is a constraint-based optimizer.
· Entity Analytics: As the name suggests, this is aimed at identifying logical duplicates and de-duplicating them. Its results can significantly improve the Modeler’s accuracy.
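To make the champion/challenger idea from CADS concrete, here is a minimal sketch in Python. Everything below is invented for illustration: the scoring rule, the toy "models" (simple threshold rules) and the holdout data have nothing to do with how CADS actually evaluates models.

```python
# Hypothetical sketch of champion/challenger promotion: the challenger
# replaces the champion only when it scores better on held-out data.

def evaluate(model, holdout):
    """Toy scorer: fraction of records the model labels correctly."""
    return sum(1 for x, y in holdout if model(x) == y) / len(holdout)

def promote(champion, challenger, holdout):
    """Return whichever model scores higher on the holdout set."""
    if evaluate(challenger, holdout) > evaluate(champion, holdout):
        return challenger
    return champion

# Two toy "models": threshold rules on a single numeric feature.
champion = lambda x: x > 50      # current production rule
challenger = lambda x: x > 40    # candidate rule

holdout = [(35, False), (45, True), (55, True), (60, True)]

best = promote(champion, challenger, holdout)
print(best is challenger)  # True: the challenger also catches (45, True)
```

The point is simply that promotion is data-driven: nothing changes in production until the challenger demonstrably wins on the same holdout set.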
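The de-duplication idea behind Entity Analytics can be illustrated with a tiny fuzzy-matching sketch. Real entity resolution is far more sophisticated; this only shows the flavor, using the standard-library `difflib` and a similarity threshold I picked arbitrarily.

```python
# Toy illustration of spotting logical duplicates: two records are
# flagged as the same entity when their normalized names are
# near-identical by string similarity.
from difflib import SequenceMatcher

def normalized(name):
    """Lowercase, drop punctuation dots, collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())

def likely_same(a, b, threshold=0.85):
    """True when the two names look like the same entity."""
    return SequenceMatcher(None, normalized(a), normalized(b)).ratio() >= threshold

print(likely_same("John A. Smith", "john a smith"))  # True after normalization
print(likely_same("John Smith", "Mary Jones"))       # False
```

Feeding the model one merged record instead of two near-duplicates is exactly the kind of cleanup that lifts downstream model accuracy.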
Now to the Modeler itself. After walking through the node types, I will only touch on some of the features I liked.
- It provides an interface to follow the CRISP-DM methodology, a cross-industry standard consisting of stages like Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation and Deployment.
- The data mining workflow is defined in terms of various nodes:
o Source node: For reading the data.
o Record node: Affects the number of records; it can be as simple as filtering or aggregation.
o Field node: Used for data transformations, cleaning and preparation. Automated Data Preparation (ADP) is a very handy node here, allowing many easy and custom transformations with a significant probability of improving accuracy. The Anonymize node lets you suppress or mask private information, which is very relevant given the many prevalent compliance norms.
o Graph node: Allows many types of visualization as well as evaluation of the models.
o Modeling node: This is the cream; it holds the data mining models. Within it there is a group of statistics nodes, which again offers useful functionality like PCA, factor analysis, discriminant analysis and so on.
o Output node: Required for analyzing the results.
o Export node: Allows data to be exported to other software like Excel, SAS etc.
o Super node: Allows grouping of multiple nodes in a more reusable and modular fashion.
- Offers quite a few standard algorithms for common tasks like classification, regression, clustering, time series and association.
- Auto Classifier and Auto Cluster really make evaluation of models easy. To clarify a little: say we want a classifier to detect a risky loan, and we are in two minds about which algorithm to pick — a neural net, a decision tree or maybe logistic regression. Auto Classifier can evaluate all of them and pick the best.
- SQL Pushback: Allows some of the computations to be pushed back to the database.
- In-Database Mining: Lets SPSS Modeler leverage the native data mining algorithms of database vendors such as IBM Netezza, IBM DB2 InfoSphere Warehouse, Oracle Data Miner and Microsoft Analysis Services.
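The Anonymize idea mentioned under the Field node can be sketched very simply: replace direct identifiers with a one-way hash so records stay joinable without exposing raw values. The field names and truncation length below are my own invention, not anything SPSS-specific.

```python
# Hypothetical sketch of field anonymization: sensitive fields are
# replaced with a truncated SHA-256 digest; other fields pass through.
import hashlib

def anonymize(record, fields=("name", "email")):
    """Return a copy of record with the given fields masked."""
    out = dict(record)
    for f in fields:
        if f in out:
            out[f] = hashlib.sha256(out[f].encode()).hexdigest()[:12]
    return out

row = {"name": "Alice", "email": "alice@example.com", "balance": 1200}
masked = anonymize(row)
print(masked["balance"])          # non-sensitive field passes through: 1200
print(masked["name"] != "Alice")  # identifier is masked: True
```

Because the hash is deterministic, the same person still maps to the same masked token across records, which is what keeps the data usable for modeling.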
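The Auto Classifier behavior described above — try several candidates, keep the best — boils down to a model-selection loop. Here is a toy version for the risky-loan example; the candidate "models" are hand-written rules I made up, standing in for the real neural net, decision tree and logistic regression.

```python
# Toy illustration of the auto-classifier idea: score every candidate
# model on the same holdout data and keep the winner.

candidates = {
    "always_safe": lambda income, debt: False,
    "high_debt_rule": lambda income, debt: debt > income,
    "ratio_rule": lambda income, debt: debt > 0.5 * income,
}

# (income, debt, actually_defaulted) — invented records
holdout = [(50, 40, True), (80, 20, False), (30, 35, True), (90, 30, False)]

def accuracy(model):
    """Fraction of holdout records the model classifies correctly."""
    return sum(1 for inc, d, y in holdout if model(inc, d) == y) / len(holdout)

best_name = max(candidates, key=lambda n: accuracy(candidates[n]))
print(best_name)  # ratio_rule — it gets all four records right
```

The tool does exactly this at scale, but with real algorithms and proper evaluation metrics instead of a four-row holdout set.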
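What SQL Pushback buys you can be shown with an in-memory SQLite table: instead of pulling every row into the tool and aggregating there, the aggregation is shipped to the database as SQL and only the summary comes back. The table and data are made up for illustration.

```python
# Sketch of the SQL pushback idea: the GROUP BY runs inside the
# database, so only two summary rows cross the wire instead of
# every raw sales record.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100), ("east", 150), ("west", 200)])

rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 250.0), ('west', 200.0)]
```

With millions of rows, doing this in the database rather than in the client is the difference between moving megabytes and moving a handful of numbers.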
Overall, Modeler is a great tool — easy to use and intuitive — and IBM has a rich portfolio of advanced analytics and decision management products. However, such a wide range may confuse the end customer, and industry-specific packaged solutions combining these products can demystify things. Also, the packaging and readily available blocks are all well and good, but the need for deep domain knowledge and statistical understanding is here to stay if you want superior results.
I intentionally wanted to keep it short and just tickle your curiosity. The festival of lights is nearing. Wish you and your family a safe and joyous time!
Will meet you again real soon! I hope you enjoyed.