Today, data engineering in modern enterprises has become increasingly more complex and resource-consuming, particularly because (1) the rich amount of organizational data is often distributed across data centers, cloud regions, or even cloud providers, and (2) the complexity of the big data stack has been quickly increasing over the past few years with an explosion in big-data analytics and machine-learning engines (like MapReduce, Hive, Spark, Presto, Tensorflow, PyTorch to name a few).
To address these challenges, it is critical to provide a single and logical namespace to federate different storage services, on-prem or cloud-native, to abstract away the data heterogeneity, while providing data locality to improve the computation performance. [Bin Fan] will share his observation and lessons learned in designing, architecting, and implementing such a system – Alluxio open-source project — since 2015.
This will be a hybrid event. Attendees can join both in-person as well as online.