From Data to Decisions – Journey of an Enterprise

September 25, 2018 | Debiprasad Banerjee

With so much data available from all parts of the business, it is often hard to determine which data sets are helpful for which decisions. It is also widely accepted that deeper insights come when we combine data from various aspects of the business and take a more holistic view of the problem. We raised precisely this question in our previous discussion. Once we have framed the right questions, we should determine the data sets needed to feed into the AI system to derive the correct answers.

Workflow for Training an AI Model

Any large enterprise will typically have its data distributed across multiple records in systems residing in various departments with strict ownership, access, and usage rights. The map below broadly shows the steps involved in the workflow to identify and prepare the data for training an AI model.

Data Identification >> Data Collation >> Data Analysis >> Data Transformation >> Data Validation >> Data Tagging >> Data Splitting (creating training and validation sets for the AI model)
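The workflow above can be sketched as a chain of simple stages. The function names, record shape, and toy cleaning rules below are illustrative assumptions for clarity, not a prescribed implementation:

```python
# Illustrative sketch of the data-preparation workflow as composable stages.
# Records are plain dicts; the "amount" field and labeling rule are invented
# examples standing in for real enterprise data and domain rules.

def collate(sources):
    """Pull records from every identified source into one central list."""
    return [record for source in sources for record in source]

def analyze(records):
    """Flag and drop records with missing fields (in practice, these would be re-sourced)."""
    return [r for r in records if r.get("amount") is not None]

def transform(records):
    """Normalize formats into what the model expects (here, strings to floats)."""
    return [{**r, "amount": float(r["amount"])} for r in records]

def validate(records):
    """Reject records that fail basic business rules (here, negative amounts)."""
    return [r for r in records if r["amount"] >= 0]

def tag(records):
    """Attach a label for supervised learning (here, a trivial threshold rule)."""
    return [{**r, "label": "high" if r["amount"] > 100 else "low"} for r in records]

# Two hypothetical departmental sources with slightly messy data.
crm = [{"amount": "250.0"}, {"amount": None}]
billing = [{"amount": "40"}, {"amount": "-5"}]

ready = tag(validate(transform(analyze(collate([crm, billing])))))
print(len(ready))  # 2 records survive cleaning and validation
```

In a real pipeline each stage would be far more involved and iterative, but keeping the stages as distinct, composable steps makes it easier to loop back when the analysis uncovers missing or unusable data.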

The first challenge is locating the appropriate data sources and getting access to them. Once the data is sourced and collated into a central location, the detailed analysis begins. This is probably the most critical step and typically takes the most time. The analysis should reveal the gaps between the present data format and the desired one that can be fed to the AI model. It is crucial to take a business (domain) specific view of the data at this stage, involving experts who understand the business context and the outcome desired from the AI model. Any missing data identified in this step sends us back to the first step of locating and sourcing it, either from within or (sometimes) outside the enterprise.

The data transformation and cleaning exercise that follows should continue as a joint effort between the data scientists and the domain experts, concluding with a thorough data validation exercise. Once the desired data set is identified and cleansed, it is tagged for supervised learning of the AI model. While the tagging exercise can be detailed and time-consuming and can be performed by relatively lower-skilled resources, it is imperative to implement strict QC measures with the oversight of domain experts. This ensures that the tagged data is of the highest quality, so the AI model learns from it appropriately and produces the desired results.

Finally, the data scientists split the tagged data set into training and validation sets before training the AI model.
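The final splitting step can be sketched in a few lines of plain Python. The record format and the 80/20 ratio are illustrative assumptions; in practice the split is often stratified by label and done with a library such as scikit-learn:

```python
import random

def split_dataset(records, train_fraction=0.8, seed=42):
    """Shuffle tagged records and split them into training and validation sets.

    A fixed seed makes the split reproducible, so the same validation set
    can be reused across training runs.
    """
    shuffled = records[:]  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical tagged records: (input text, label) pairs.
tagged = [(f"document {i}", "invoice" if i % 2 else "receipt") for i in range(100)]
train_set, validation_set = split_dataset(tagged)
print(len(train_set), len(validation_set))  # 80 20
```

Keeping the validation set strictly separate from the training set is what makes it a fair measure of how the model will behave on data it has never seen.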

While the overall data flow remains the same across AI models, the individual steps can become exceedingly complex and time-consuming when dealing with deep learning models that take images, videos, or audio files as inputs. Without going into the processes themselves, it is easy to imagine the difference between analyzing 500 rows of text data and analyzing 500 video or audio files. Real-world training data sets are typically orders of magnitude larger than that.


As discussed in the last two threads on this topic, enterprises often underestimate the time, effort, and cost when planning to implement AI to improve their business outcomes. There are several documented examples where attempting to implement AI applications within the enterprise ecosystem turned into a 3- to 5-year data consolidation and transformation program. While some will go for the big-bang approach, it is not the only way to get there. A focused and well-thought-out AI strategy can target specific business outcomes and ring-fence the data requirements after a detailed analysis. We can also execute several such smaller projects in parallel, provided they have relatively independent data requirements, spread across various divisions of the enterprise. These can be coupled with data transformation programs already underway within specific areas of the enterprise. A further benefit of such a strategy is that the smaller projects help improve the overall data quality of the enterprise, enabling it to execute bigger, cross-division transformation programs.

There are different ways an enterprise can successfully transition into the world of data-driven decision-making powered by AI – one size certainly does not fit all! However, no matter which path one chooses, it is imperative that such programs are seen as part of the larger data ecosystem of the enterprise and that the right resources are allocated at the appropriate stages.
