Grab & DataHub Community Case Study | Summary and Q&A

TL;DR
Grab, the leading super app in Southeast Asia, shares its experience improving data discoverability and implementing computational data governance with DataHub.
Key Insights
- 🌟 Grab is the leading super app in Southeast Asia, operating across eight countries and serving nearly 700 million people, generating a huge volume of data that is used by Grabbers on a daily basis.
- 🌐 Metadata management is crucial for Grab to ensure data discoverability and governance, and they have chosen to invest in DataHub for its autonomy, flexibility, and strong community support.
- 👥 DataHub has been highly customizable for Grab, allowing them to streamline their release process and make code changes for data discovery and governance use cases.
- ⚡️ DataHub's metadata scalability has allowed Grab to ingest 3.5 million metadata change proposals daily, enabling the ingestion of metadata for over 100,000 people within just 15 minutes using the Presto on Hive plugin.
- 🛠 DataHub's modular operation and open-source features, such as the Presto on Hive plugin, have greatly improved Grab's metadata ingestion time and efficiency.
- 🔎 DataHub's extensible metadata model has allowed Grab to add new entities and aspects, such as a time series aspect and a generic entity called "others," to cater to unique use cases and democratize data discovery across all entities.
- 👥 Grab has leveraged DataHub to facilitate computational data governance, ensuring that data is treated as something owned and managed by everyone in the company rather than belonging to a single role or department.
- 🔒 With DataHub, Grab has been able to establish data governance processes, such as information classification and ownership validation, to ensure the right level of protection and access control for their valuable data.
Transcript
hi everyone and thanks to Maggie and the rest of the wonderful team at Acryl Data for having us today uh I'm sorry this will be a recording but our team is based in Singapore and we're trying to keep a healthy schedule just kidding obviously working in data engineering we're quite nocturnal but uh we just wanted to save you from our late night pha...
Questions & Answers
Q: What challenges did Grab face in data discoverability and governance?
Grab faced challenges in finding the right data in a timely manner and ensuring proper governance measures were in place to protect and manage their data assets. They needed a solution that could address these challenges effectively.
Q: Why did Grab choose to invest in DataHub?
Grab decided to invest in DataHub for several reasons. Firstly, it offered autonomy and flexibility, allowing them to customize it according to their specific data needs. Secondly, it offered mature technology and an advanced metadata architecture. Lastly, the strong community behind DataHub was a major factor in their decision, with almost 5,000 people contributing to its development.
Q: How did Grab improve data discoverability using DataHub?
Grab scaled their DataHub infrastructure to handle 3.5 million metadata change proposals per day and achieved ingestion of metadata for over 100,000 people within 15 minutes. They also enhanced data discovery by adding new entities and aspects to the metadata graph and by using the Presto on Hive plugin for faster ingestion.
Q: How did Grab implement computational data governance with DataHub?
Grab established data governance by classifying the sensitivity level of their information and applying appropriate protection measures. They modeled this metadata as glossary terms and nodes, added validations for updates made through the user interface, and used the DataHub Actions framework to enforce ownership and access controls. This allowed users to self-serve while staying compliant with governance policies.
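The classification and ownership checks described above can be sketched in plain Python. This is only an illustration under stated assumptions, not Grab's implementation and not the DataHub API: the tier names, the `Dataset` class, and the `validate` function are all hypothetical, and the sketch assumes a simple rule that every dataset needs at least one owner and a recognized classification term.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical sensitivity tiers; the talk summary does not name Grab's actual tiers.
ALLOWED_TIERS = {"public", "internal", "confidential", "restricted"}


@dataclass
class Dataset:
    """Illustrative stand-in for a metadata entity (not a DataHub class)."""
    name: str
    owners: List[str] = field(default_factory=list)
    classification: Optional[str] = None  # would be modeled as a glossary term


def validate(dataset: Dataset) -> List[str]:
    """Return the governance violations for a dataset; an empty list means compliant."""
    problems = []
    if not dataset.owners:
        problems.append("missing owner")
    if dataset.classification is None:
        problems.append("missing classification")
    elif dataset.classification not in ALLOWED_TIERS:
        problems.append(f"unknown classification: {dataset.classification}")
    return problems


if __name__ == "__main__":
    compliant = Dataset("rides.trips", owners=["team-rides"], classification="confidential")
    incomplete = Dataset("tmp.scratch")
    print(validate(compliant))
    print(validate(incomplete))
```

In a setup like the one described, a check of this kind would run whenever metadata changes, so that gaps in ownership or classification are flagged automatically rather than found in manual reviews.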
Summary & Key Takeaways
- Grab, the leading super app in Southeast Asia, operates across eight countries and generates a huge volume of data.
- They recognized the importance of metadata management for data discoverability and governance.
- After evaluating multiple options, they decided to invest in DataHub for its autonomy, flexibility, and the strong community driving its development.