Phase 2: Hadoop Migration Business Requirements Definitions
The purpose of this phase is to reach agreement on the business requirements. This process can be difficult if the expected output is not clearly defined. This blog post is based on a large Hadoop migration that I did at Microsoft, so I will use that project as an example.
The goal of this project was to move from one technology stack to another: every target table created in the original system should be duplicated in the new system. At least, that was the goal. For a large migration, this becomes non-trivial.
What makes defining business requirements difficult?
- Owners do not exist for business definitions
For example, consider a situation where someone defined a metric, documented it poorly, and then left the company. These problems are difficult to solve.
- Business definitions do not exist
For example, most engineers at small companies are happy to say “the documentation is in the code.” This is unfortunate, but it is often overlooked at small companies because code is less likely to be shared. In large corporations, it becomes an issue because you’re more likely to work in teams and share code. As a result, you’re more likely to find good documentation at large companies. My solution to this problem was to build consensus on ambiguous definitions and make educated guesses.
- The business logic in the new system is different from the original system
For example, what happens when you find a bug in the original system? Can you fix the bug there? If not, can you fix it in the new system? If you do fix it in the new system, how will you show that it has been fixed, and how will you prepare analysts for the new data? My solution was to provide an analysis of why the original data was wrong, fix the bug in the new system, and show how much the data changed.
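The last step above, showing how much the data changed after a bug fix, can be sketched in Python. This is a minimal sketch, not the tooling used on the original project: it assumes each system's output for the same date has already been reduced to a dictionary mapping metric name to value, and the function names `compare_metrics` and `unmatched`, as well as the metric names, are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class MetricDiff(NamedTuple):
    """Comparison of one metric between the original and the new system."""
    metric: str
    old_value: float
    new_value: float
    abs_change: float
    pct_change: Optional[float]  # None when the original value is 0


def compare_metrics(old: dict[str, float], new: dict[str, float]) -> list[MetricDiff]:
    """Quantify how much each metric present in both systems changed."""
    diffs = []
    for metric in sorted(old.keys() & new.keys()):
        o, n = old[metric], new[metric]
        delta = n - o
        # Percent change is undefined when the original value is zero.
        pct = None if o == 0 else 100.0 * delta / o
        diffs.append(MetricDiff(metric, o, n, delta, pct))
    return diffs


def unmatched(old: dict[str, float], new: dict[str, float]) -> tuple[list[str], list[str]]:
    """Metrics produced by only one system: (original only, new only)."""
    return sorted(old.keys() - new.keys()), sorted(new.keys() - old.keys())


if __name__ == "__main__":
    # Hypothetical values; real ones would come from querying each system.
    original = {"daily_active_users": 1000.0, "sessions": 5000.0, "refunds": 0.0}
    migrated = {"daily_active_users": 980.0, "sessions": 5000.0, "revenue": 12.5}

    for d in compare_metrics(original, migrated):
        pct = "n/a" if d.pct_change is None else f"{d.pct_change:+.2f}%"
        print(f"{d.metric}: {d.old_value} -> {d.new_value} ({pct})")
    print("unmatched (original only, new only):", unmatched(original, migrated))
```

Pairing the percent change with the written explanation of the original bug gives analysts a concrete number to expect, and the `unmatched` check flags metrics that one system produces but the other does not.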
To illustrate how much the business logic changed, consider the two rough ER diagrams below. The diagram on the left shows the reporting tables and their sources in the original system. The diagram on the right shows the new system, which was distinctly different from the old one because the business logic was drastically refactored. During the business requirements phase of this project, I looked at these two diagrams and told myself that this outcome was technically achievable. What I did not do was accurately estimate how hard it would be to achieve. Underestimating the refactoring effort led to a lot of back and forth with my manager as I tried to convince them that the cost was worth it.
Figure: rough ER diagrams, Before (left, original system) and After (right, new system).
Advice for defining business requirements
- Prioritize which metrics and reports are important
If time becomes a problem, then you will want to deliver the most important reports and metrics first.
- The more refactoring you do, the more you should prepare for scope creep.
This is particularly true for business logic. Refactoring business logic is a tedious process that drags you into the weeds of data and processes, which leads to scope creep and unexpected meetings. Solving complicated problems almost always takes more time than you originally thought.
- Don’t be afraid to advertise expensive costs.
Complex problems should not be undersold. It is better for everyone to know up front when a seemingly easy problem is actually difficult to solve.
Conclusion
Defining business requirements is a necessary evil that may save you some headaches down the road. Do not be afraid to oversell the cost of some problems and undersell your ability to fix them. Most engineers probably hate underselling their abilities, but the people you’re speaking to may not be as technically familiar with the project as you are. If they hold you to unrealistic expectations, you will pay for it later. Complicated migrations generally have land mines and scope creep that you will not predict.