Our Journey
Background: The Heart of Our Platform
Coupons are central to our platform, driving user engagement and retention through credit points and discounts. Historically, this critical system resided within our extensive PHP monolith, deeply integrated with almost all major modules like accounting, payment processing, checkout, listings, and item management. While this architecture initially facilitated rapid iteration, its limitations became apparent as our product and team grew.
Why We Decided to Migrate
The coupon domain’s deep entanglement with critical modules increasingly hampered agility and introduced operational risks. As some parts of our system transitioned to microservices while others remained in the monolith, complexity mounted. Our key motivations for this migration included:
- Technical Debt: The monolith’s increasing complexity hindered development and heightened regression risks.
- Scalability: Coupon processing needed to accommodate a growing user base and higher loads.
- Reliability and Ownership: We needed clearer data ownership boundaries and a straightforward path for performance optimization.
What This Blog Post Covers
In this post, I will detail the approach we adopted for migrating our coupon system to a robust, scalable microservice, including our methodology, the challenges we encountered, and the lessons we learned.
The Challenges
Migrating a critical domain like coupons involves more than just moving code. Here’s what made this journey particularly complex:
- Tight Coupling: The coupon domain’s pervasive connections to accounting, payments, listing, and item screens meant that moving one piece affected many others.
- Always-On Expectations: Our services needed to remain continuously available without significant downtime or interruptions.
- Data Consistency: The migration demanded meticulous management to prevent any coupon, transaction, or reward notification from being missed.
- Scalability: The new architecture had to match or surpass the original system’s scale.
- Performance: Achieving the monolith’s native speed despite additional network hops was a significant hurdle.
- Unclear Ownership: Ambiguities regarding endpoint and database ownership required resolution before effective progress could be made.
Our Approach: Double Writing and Expand-Contract
To mitigate risks, we implemented a double writing strategy, utilizing the proven expand-contract migration pattern. This involved:
Step 1: Double Writing
During the migration, critical operations in the legacy system were configured to write simultaneously to both the original PHP monolith and the new microservice. This ensured data parity and allowed us to validate the new system without affecting end users.
Step 2: Historic Data Copy
Beyond live operations, we needed to transfer vast amounts of historical coupon data from MySQL in the monolith to Google Spanner in the new service. This included active coupons and their associated records as well as past transactions, all requiring careful, phased migration.
Step 3: Gradual Deprecation
Once both systems were synchronized, we began progressively retiring coupon-related APIs and workers from the monolith. This involved shifting responsibility for cron jobs, notifications, and reward payouts to the new service.
We rolled out endpoints incrementally behind a feature flag. If any issue arose, we rolled traffic back to zero, fixed the problem, and rolled out again. This kept the rollout of critical endpoints minimally disruptive.
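A percentage-based flag of this kind can be sketched in a few lines of Go. The `Rollout` type and the user-ID bucketing scheme below are assumptions made for illustration; a real setup would more likely read the percentage from an existing feature-flag system. The idea shown is that each user hashes to a stable bucket, so raising the percentage only adds users, and setting it to zero is the rollback.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync/atomic"
)

// Rollout holds the percentage (0-100) of users routed to the new service.
// Setting it to 0 acts as the rollback switch.
type Rollout struct {
	percent atomic.Int32
}

// SetPercent clamps p to [0, 100] and applies it atomically, so it is safe to
// call while requests are being served.
func (r *Rollout) SetPercent(p int) {
	if p < 0 {
		p = 0
	} else if p > 100 {
		p = 100
	}
	r.percent.Store(int32(p))
}

// bucket maps a user ID to a stable bucket in [0, 100) using FNV-1a.
func bucket(userID string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(userID)) // Write on a hash.Hash never returns an error
	return h.Sum32() % 100
}

// UseNewService reports whether this user's requests go to the new service.
// The same user always lands in the same bucket, so raising the percentage
// only ever adds users and never flips existing ones back.
func (r *Rollout) UseNewService(userID string) bool {
	return bucket(userID) < uint32(r.percent.Load())
}

func main() {
	r := &Rollout{}
	for _, p := range []int{0, 10, 50, 100} {
		r.SetPercent(p)
		onNew := 0
		for i := 0; i < 1000; i++ {
			if r.UseNewService(fmt.Sprintf("user-%d", i)) {
				onNew++
			}
		}
		fmt.Printf("percent=%d -> %d of 1000 users on the new service\n", p, onNew)
	}
}
```

Bucketing by user rather than by request keeps each user's experience consistent during the rollout, which also makes issues easier to trace back to one code path.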
Step 4: Full Cutover
Once all traffic and logic transitioned to the microservice, we eliminated proxy calls from the monolith. `merpay-coupon-jp` became the sole owner of all coupon-related functionality, ready for a new era of agility and ownership.
Digging Deeper: The Old Monolith, the New Stack
While I won’t delve into the generic drawbacks of monoliths here, it’s important to note that our organization has standardized on Go and microservices, which made Go the clear choice for the new coupon service. We leveraged existing boilerplates and infrastructure to facilitate rapid and reliable development.
[Figure: The monolith’s coupon interactions with other modules (old architecture)]
[Figure: How data and API ownership shift in the new architecture]
Key Implementation Challenges
No migration of this magnitude is without its difficulties. Here are some notable issues and our solutions:
- Robust Data Migration: Migrating 150 million records without interruption or corruption required building tools for resumable, chunked migration, alongside extensive validation routines.
- MySQL to Spanner: This transition required new indexing and data modeling strategies to leverage Spanner effectively while maintaining consistency and performance.
- Error Recovery: We ensured that partial migrations could be safely retried or rolled back.
- Monitoring & Consistency: From the outset, we instrumented the system for comprehensive monitoring, using shadow traffic and parallel runs to validate accuracy across both the old and new systems.
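The validation side of the list above can be illustrated with a small consistency check that compares what the old and new systems hold for the same keys. This is a sketch under assumptions: the `Record` fields and the `Report` shape are hypothetical, and a real check over millions of rows would compare per-chunk checksums or sampled ranges rather than loading everything into maps.

```go
package main

import (
	"fmt"
	"sort"
)

// Record is a simplified, hypothetical view of a coupon row that both the
// legacy and the new system can produce.
type Record struct {
	ID     string
	Points int
	Status string
}

// Report lists the IDs that differ between the two systems.
type Report struct {
	MissingInNew []string // present in legacy, absent in the new store
	ExtraInNew   []string // present in the new store, absent in legacy
	Mismatched   []string // present in both, but with different contents
}

// Consistent reports whether no differences were found.
func (r Report) Consistent() bool {
	return len(r.MissingInNew) == 0 && len(r.ExtraInNew) == 0 && len(r.Mismatched) == 0
}

// Compare checks every legacy record against the migrated copy and also
// looks for records that exist only in the new store.
func Compare(legacy, migrated map[string]Record) Report {
	var rep Report
	for id, want := range legacy {
		got, ok := migrated[id]
		switch {
		case !ok:
			rep.MissingInNew = append(rep.MissingInNew, id)
		case got != want:
			rep.Mismatched = append(rep.Mismatched, id)
		}
	}
	for id := range migrated {
		if _, ok := legacy[id]; !ok {
			rep.ExtraInNew = append(rep.ExtraInNew, id)
		}
	}
	// Map iteration order is random, so sort for stable, diffable reports.
	sort.Strings(rep.MissingInNew)
	sort.Strings(rep.ExtraInNew)
	sort.Strings(rep.Mismatched)
	return rep
}

func main() {
	legacy := map[string]Record{
		"c1": {ID: "c1", Points: 100, Status: "active"},
		"c2": {ID: "c2", Points: 50, Status: "used"},
		"c3": {ID: "c3", Points: 20, Status: "active"},
	}
	migrated := map[string]Record{
		"c1": {ID: "c1", Points: 100, Status: "active"},
		"c2": {ID: "c2", Points: 50, Status: "active"}, // status drifted
		"c4": {ID: "c4", Points: 10, Status: "active"}, // unexpected row
	}
	rep := Compare(legacy, migrated)
	fmt.Printf("consistent=%v report=%+v\n", rep.Consistent(), rep)
}
```

Running a check like this repeatedly during dual writing, and again after each backfill phase, turns "the data looks right" into something that can be alerted on.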
Results and Impact
- Performance Gains: The new microservice resulted in reduced API latencies and improved reliability, even under heavy load.
- Reliable Data Ownership: Singular ownership enabled us to optimize and extend the domain with confidence.
- Operational Agility: Adding features, testing improvements, and scaling the system became significantly easier.
- Onwards and Upwards: The deprecation of monolithic coupon logic represents a crucial step towards fully modernizing our stack.
Key Takeaways
- Start Early with Data Validation: Early and frequent validation for both live and migrated data prevented deeply rooted bugs.
- Robust Dual Write: Most of the complexity in a migration like this lies in the dual write. Getting data consistency right at that layer significantly reduces the remaining challenges, potentially by half.
- Plan for Rollback: A robust rollback plan provided the confidence to proceed aggressively.
- Automate Where Possible: Automated monitoring, migration, and consistency checks proved invaluable, saving time and providing peace of mind.
- Incremental Rollout Approach: Feature flags and canary deployments are both workable methods. Rolling out incrementally surfaces issues early and limits the blast radius when something goes wrong.
If we were to undertake this project again, we would prioritize even stricter API contracts from the outset and invest further in migration observability tools.
Conclusion: A New Foundation for Innovation
Migrating our coupon system was more than just adopting new technology; it was about establishing a foundation for rapid innovation, reliability, and scalability. With the monolith behind us, we are now poised to deliver better features faster and meet future challenges head-on.
Stay tuned for technical deep dives and open-sourced tooling from our migration!
Got questions or want to share your own migration story? Drop them in the comments below!