Test Data in ERP Projects

Or: how to provide meaningful data for tests while preparing a data migration.

Introduction

Very often in multi-functional projects, such as an ERP implementation, different teams evaluate their solution independently of each other until the later stages of testing. The data migration workstream often works the same way: it identifies and validates data transformations in parallel with the rest of the project.

The risk is that the new solution and the new processes are tested only with “positive” data, meaning data chosen to produce an expected result, and not with “negative” data, meaning data that is not meant to produce a negative result but does cause one. In today’s integrated ERP solutions, values entered in one functional area can negatively affect another for unforeseen reasons.

To reduce that risk, and also to accelerate the design of the data transformation, it is useful to work with real data and to sample it in a smart way. Real data is extracted from the operational databases, transformed for the new data model as necessary, and injected into the test environment. Let’s look at this process in more detail.

Sampling real data

There are two prerequisites.

First, you must know what data, or what kind of data, is needed. Do we need products and customers to test an order-management solution? Do we need bills of materials? Do we need raw materials, engineering documents, demand forecasts, historical data?
Second, you must obtain agreement from all parties involved. Once the scope is understood, we can start picking pieces of data, but it is essential to explain the sampling process and obtain alignment on it. Every group will have different requirements, from high level to very detailed, and will demand data to satisfy all of them. In my experience, the less detailed the requirements are, the larger the volume of data requested. I call this the “just-in-case (I need more)” syndrome.

A lack of alignment is a sure path to scope creep and change requests.

To make a good sample, we must also bear in mind that this is a dynamic exercise. Most tests can start with a basic sample, but the sample needs to grow in breadth and volume as testing progresses: we can work it up from a few records to hundreds of thousands.

Building a basic sample

In most circumstances, the business people will know best. No matter what the project is about, they know what the key business cases are and they know the data used for executing them. For example, in goods manufacturing: your business people will know what the different goods are, what makes them different from each other, how they are made or sold. That enables you to propose a sample based on types or categories, production methods, sites, trade channels, transportation means, accounting methods, warehousing methods, and so on. In asset management, it might be the classes of assets, their depreciation rules, the different kinds of charges they cause.

A basic sample like this will usually do fine for the early test stages, where variety in the data matters more than volume. As testing progresses, more data is required: to repeat test scenarios with an increasing number of small variations, and to provide data for training purposes. For now, we have a basic sample and, because of its small size, we can manipulate it manually, updating records one by one if needed, and transform it as required for the target solution. Beware, however, of hand-picking the data. You want to define a set of rules that selects the sample, not to pick records directly from a list or from a database. Rules can be explained, agreed on, and re-applied when the sample has to grow; a hand-picked list cannot.
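To illustrate the difference between a selection rule and hand-picking, here is a minimal sketch in Python. The product records, the field names and the function select_sample are all invented for the example; in a real project the rule fields would be whatever the business people identify as meaningful (categories, sites, trade channels and so on).

```python
from collections import defaultdict

# Hypothetical extract of a product master; IDs, fields and values are invented.
products = [
    {"id": "P001", "category": "finished", "site": "Lyon", "channel": "retail"},
    {"id": "P002", "category": "finished", "site": "Lyon", "channel": "wholesale"},
    {"id": "P003", "category": "finished", "site": "Porto", "channel": "retail"},
    {"id": "P004", "category": "raw", "site": "Lyon", "channel": "wholesale"},
    {"id": "P005", "category": "raw", "site": "Lyon", "channel": "retail"},
    {"id": "P006", "category": "semi-finished", "site": "Porto", "channel": "wholesale"},
]


def select_sample(records, rule_fields, per_group=1):
    """Keep up to `per_group` records for each distinct combination of `rule_fields`."""
    groups = defaultdict(list)
    # Sorting by ID makes the selection repeatable from one run to the next.
    for record in sorted(records, key=lambda r: r["id"]):
        key = tuple(record[field] for field in rule_fields)
        if len(groups[key]) < per_group:
            groups[key].append(record)
    return [record for key in sorted(groups) for record in groups[key]]


# The agreed rule: one product per (category, site) combination.
sample = select_sample(products, ["category", "site"])
sample_ids = [record["id"] for record in sample]
print(sample_ids)
```

Because the rule is explicit, growing the sample later is a matter of raising per_group or adding a field to the rule, not of picking more records by hand.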

Why should we prepare the initial sample with business people? Because as we try to transform the data, we challenge data quality and transformation rules early on, using data our business people can relate to. We are not working with artificial data created from assumptions, but with actual, operational, current information. That helps build commitment: people are working with their own data, not with just any data. It also lets us get into action before blueprinting is fully complete. Not all teams complete their blueprinting at the same time, and those that finish first can rarely sit idle until everyone else catches up. They will want to start testing, if only to validate some assumptions. With this approach, we have the key data elements and business cases covered, so we can leap into action without any of that effort being lost to the main data migration.

To summarise, we enable:

Commitment from business people
Data for prototyping and early tests
Data that is consistent across the scope of the project, and not for a single team alone
An early move into action

What if you are challenged, and some people want a lot more data or different data? In my experience, challenging back with one short question does wonders: why? Why is the initial sample not working for them? Why do they need more or different data? Very often, the additional data would not be used anyway; it is requested only just in case. With the answers in hand, you can push back with facts, or accept, depending on time and resources.

Profiling and business intelligence

Ideally, profiling happens early in the project, before or during the blueprinting phase. At the very least, it should be done immediately after creating the initial sample.

Roughly speaking, profiling is about identifying patterns in the data. It is often used with data quality in mind: it helps define data quality rules, and it is easy to stop there. But profiling can achieve more. We can use the vast amount of information collected to understand how the business is structured. Looking into the operational or transactional data, we can gather:

Master data, such as customers, materials or assets
Transactional data, such as sales orders or bills of lading
Reference data, such as order types, plant codes or terms of payment

This data will help answer questions such as: “what kind of orders are used by what sort of customers, in which countries, and from which warehouse are they supplied?”
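A question like this one comes down to counting how often each combination of attributes occurs. The sketch below shows the idea with a handful of invented order records; the field names (order_type, customer_group, country, warehouse) are assumptions for illustration, and a real profiling exercise would run against the extracted operational data, most likely with a dedicated tool or SQL.

```python
from collections import Counter

# Hypothetical order extract; all field names and values are invented.
orders = [
    {"order_type": "standard", "customer_group": "retail", "country": "FR", "warehouse": "W1"},
    {"order_type": "standard", "customer_group": "retail", "country": "FR", "warehouse": "W1"},
    {"order_type": "standard", "customer_group": "wholesale", "country": "DE", "warehouse": "W2"},
    {"order_type": "rush", "customer_group": "retail", "country": "FR", "warehouse": "W2"},
    {"order_type": "consignment", "customer_group": "wholesale", "country": "DE", "warehouse": "W2"},
]


def profile(records, fields):
    """Count how often each combination of `fields` occurs in `records`."""
    return Counter(tuple(record[field] for field in fields) for record in records)


pattern_counts = profile(
    orders, ["order_type", "customer_group", "country", "warehouse"]
)

# Most frequent business patterns first.
for pattern, count in pattern_counts.most_common():
    print(pattern, count)
```

Each distinct pattern in the result is a candidate business scenario, and its count says how common that scenario is in daily operations.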

We cannot go into more detail here, as business cases vary widely, but the logic is simple. With all this information, we have a proper grasp of the data and of the business logic that ties it together. We are ready to discuss the new samples.

Re-sampling

We are now able to break the data down according to the business scenarios that we need to test. We can still limit the data with additional filters such as a period of time, for example by selecting the data used in the last 6 months of operations. Obviously, there will be scenarios that do not fit within such a window. They are easily identified by comparing the profiling results with the selected data, and the related data can then be added to the sample.
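The following sketch shows one way to combine a time window with the gap check described above. It is only an illustration: the order records, the order_type field used to stand for a business scenario, and the function resample are all invented, and picking the most recent record for a missing scenario is one possible choice among others.

```python
from datetime import date

# Hypothetical order history; IDs, types and dates are invented.
orders = [
    {"id": 1, "order_type": "standard", "date": date(2024, 5, 10)},
    {"id": 2, "order_type": "rush", "date": date(2024, 4, 2)},
    {"id": 3, "order_type": "annual_contract", "date": date(2023, 9, 15)},
    {"id": 4, "order_type": "standard", "date": date(2023, 11, 20)},
]


def resample(records, scenario_field, window_start, window_end):
    """Select records inside the window, then fill in scenarios the window missed."""
    in_window = [r for r in records if window_start <= r["date"] <= window_end]
    covered = {r[scenario_field] for r in in_window}
    # Scenarios known from profiling the full history but absent from the window.
    missing = {r[scenario_field] for r in records} - covered
    extras = []
    for scenario in sorted(missing):
        candidates = [r for r in records if r[scenario_field] == scenario]
        # Assumption: the most recent occurrence is the most representative one.
        extras.append(max(candidates, key=lambda r: r["date"]))
    return in_window + extras, missing


sample, missing = resample(
    orders, "order_type", date(2024, 1, 1), date(2024, 6, 30)
)
print(sorted(r["id"] for r in sample), missing)
```

Here the yearly contract order falls outside the six-month window, is reported as missing, and is added back to the sample.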

Let’s take our sales orders again: we might have identified the types of orders used by the different customers, the products they typically purchase, the shipping conditions agreed, and so on. To help us with sampling the data, we can also make rankings and select, for example, the top 10 customers by sales volume or by number of transactions. We can also identify the products that account for 80% of the sales. By combining the identified business cases with these rankings, we can build a very strong test data set.
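Both rankings are simple to compute once sales are totalled per customer or per product. The sketch below uses a few invented sales lines of the form (customer, product, amount); the function names totals_by, top_n and pareto are hypothetical, and the 80% threshold is just the figure quoted above.

```python
from collections import defaultdict

# Hypothetical sales lines: (customer, product, amount). All values are invented.
sales = [
    ("C1", "P1", 500),
    ("C1", "P2", 100),
    ("C2", "P1", 200),
    ("C3", "P3", 150),
    ("C2", "P4", 50),
]


def totals_by(lines, index):
    """Sum the amount (position 2) per customer (index 0) or per product (index 1)."""
    totals = defaultdict(int)
    for line in lines:
        totals[line[index]] += line[2]
    return dict(totals)


def ranked(totals):
    """Keys ordered by decreasing total; ties are broken by key for repeatability."""
    return sorted(totals, key=lambda key: (-totals[key], key))


def top_n(totals, n):
    return ranked(totals)[:n]


def pareto(totals, share=0.8):
    """Smallest set of top-ranked keys whose combined total reaches `share` of the whole."""
    threshold = share * sum(totals.values())
    selected, running = [], 0
    for key in ranked(totals):
        if running >= threshold:
            break
        selected.append(key)
        running += totals[key]
    return selected


top_customers = top_n(totals_by(sales, 0), 2)
core_products = pareto(totals_by(sales, 1), 0.8)
print(top_customers, core_products)
```

With this data, P1 alone accounts for 70% of sales, so a second product is needed to reach the 80% mark.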

Supporting the project even further

With this new set of information, we are now able to assist the general project with deeper insight into some business scenarios. We have an inventory of transactions and trends that we can rank, and in which we can also pinpoint meaningful exceptions (a once-a-year order that amounts to a fair share of the turnover, for example) and oddities. All of these facts can help improve the blueprint by including overlooked cases and removing noise.
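One possible way to separate a meaningful exception from plain noise is to look for cases that are rare in number but large in value. The sketch below is an assumption about how that could be done, not a fixed method: the order data is invented, and the thresholds (at most one order, at least 10% of turnover) are arbitrary values that would need to be agreed with the business.

```python
from collections import defaultdict

# Hypothetical orders: C1 orders often, C2 and C3 only once. Values are invented.
orders = (
    [{"customer": "C1", "amount": 50} for _ in range(10)]
    + [{"customer": "C2", "amount": 300}, {"customer": "C3", "amount": 20}]
)


def meaningful_exceptions(records, key_field, max_count=1, min_share=0.10):
    """Return keys that occur rarely but still carry a large share of the turnover."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for record in records:
        counts[record[key_field]] += 1
        totals[record[key_field]] += record["amount"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return sorted(
        key
        for key in totals
        if counts[key] <= max_count and totals[key] / grand_total >= min_share
    )


exceptions = meaningful_exceptions(orders, "customer")
print(exceptions)
```

In this example C2 places a single order worth more than a third of the turnover and is flagged, while the equally rare but small order from C3 is treated as noise.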

Conclusion

I have shown the value of building samples and running data profiling for business intelligence early in the project. This new set of information can support both the data migration and the general project efficiently, in several ways:

Involve business people early and constructively
Heighten the understanding of the business for all parties involved
Provide meaningful data for testing with manageable volumes

Sometimes it is difficult to follow this approach. There are customers who will reject data profiling for fear of what it might reveal, and people who will challenge the findings. There is no “one size fits all”. In such situations, I try to educate my stakeholders with a few examples, or I keep the results within the project team.