MODULES
COMCAST (KEB-ATHENA PROJECT)
(KEB)-ATHENA
Build a data lake that serves as the single source of truth for sales and marketing data across all Comcast internal business applications. We pull data from more than 50 sources, merge it with our business and contact data, and build efficient data sets that streamline processes and help the business make important decisions.
Initially built 200+ ETL workflows using shell scripting, Python scripting, and Pentaho, and scheduled them with UC4. Developed a few RESTful endpoints using Spring Boot and deployed them on Pivotal Cloud Foundry. Currently migrating them to AWS and kicking off Athena V2 (an API-based architecture).
MIGRATION TO AWS (POC)
Migrate 200+ ETL workflows and the Oracle DWH to AWS using Glue and Redshift
ATHENA-EBI AUTOMATION
Built a process to read HDFS files, build external Hive tables, run ranking logic, stitch the results with business data, and write to Oracle
ATHENA-EBI INTEGRATION
Build the process to stitch the statistical model data with the business data using reference/driver/metadata tables
ATHENA-EBI ONBOARDING
Obtain access to HDFS files and write shell scripts that create external Hive tables based on the model location and aggregation logic
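As an illustration of the external-table step, here is a minimal sketch that generates a Hive DDL statement for files already sitting in HDFS. The original work used shell scripts; Python is used here only for illustration, and the function name, table name, columns, and HDFS path are all hypothetical.

```python
def external_table_ddl(table, columns, location, delimiter=","):
    """Build a CREATE EXTERNAL TABLE statement for delimited text files.

    table, columns, and location are illustrative inputs, not the
    project's real schema or paths.
    """
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Hypothetical model output location and schema.
ddl = external_table_ddl(
    "ebi.model_scores",
    [("account_id", "STRING"), ("score", "DOUBLE")],
    "/data/models/example",
)
print(ddl)
```

Because the table is external, dropping it in Hive would leave the underlying HDFS files in place, which suits data owned by another team.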
SEARCH BY PLACE API ENDPOINT
Add a search feature that accepts a place ID, exposed as a RESTful endpoint built with Spring and deployed on Pivotal Cloud Foundry
CHANGE DATA CAPTURE/DELTA PROCESS
Built a CDC process for the Athena data lake, including hashing, change versioning, error logging, and performance monitoring, using shell scripting, Python scripting, and Pentaho
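The hashing and versioning idea can be sketched roughly as follows. This is a simplified illustration, not the production process: the function names, the record layout, and the choice of SHA-256 are assumptions, and error logging and performance monitoring are left out.

```python
import hashlib


def row_hash(row, columns):
    """Return a SHA-256 digest over the tracked columns of one record."""
    # A unit-separator character keeps ("ab", "c") distinct from ("a", "bc").
    # Note: None and "" hash the same in this simplified sketch.
    payload = "\x1f".join(
        "" if row.get(c) is None else str(row.get(c)) for c in columns
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_changes(previous, incoming, key, columns):
    """Classify incoming rows against the previously stored hashes.

    previous: dict mapping key -> {"hash": str, "version": int}
    incoming: iterable of dict rows from the latest extract
    """
    result = {"insert": [], "update": [], "unchanged": [], "delete": []}
    seen = set()
    for row in incoming:
        k = row[key]
        seen.add(k)
        h = row_hash(row, columns)
        old = previous.get(k)
        if old is None:
            result["insert"].append({**row, "hash": h, "version": 1})
        elif old["hash"] != h:
            # Changed content: bump the version instead of overwriting history.
            result["update"].append(
                {**row, "hash": h, "version": old["version"] + 1}
            )
        else:
            result["unchanged"].append(k)
    # Keys present before but missing from the new extract.
    result["delete"] = [k for k in previous if k not in seen]
    return result
```

Comparing one hash per row avoids a column-by-column comparison, and the version counter records how many times a record has changed.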
BUILD MATCH AND MERGE FLOW ON DATABRICKS
Run the Match and Merge component for contact matching/deduplication using Python's Record Linkage library and Spark jobs on Databricks
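A rough sketch of the match-and-merge idea is shown below. It does not use the Record Linkage library or Spark: it substitutes the standard library's difflib for the string comparison so that the example stays self-contained. The field names, the 0.85 threshold, and the exact-email rule are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_contacts(contacts, threshold=0.85):
    """Return id pairs whose emails agree and whose names are similar."""
    pairs = []
    for left, right in combinations(contacts, 2):
        same_email = left["email"].lower() == right["email"].lower()
        if same_email and similarity(left["name"], right["name"]) >= threshold:
            pairs.append((left["id"], right["id"]))
    return pairs


def merge_groups(ids, pairs):
    """Group matched ids with a small union-find so chained matches merge."""
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical contact records.
contacts = [
    {"id": 1, "name": "Jon Smith", "email": "js@example.com"},
    {"id": 2, "name": "John Smith", "email": "JS@example.com"},
    {"id": 3, "name": "Mary Jones", "email": "mj@example.com"},
]
pairs = match_contacts(contacts)
groups = merge_groups([c["id"] for c in contacts], pairs)
```

Comparing every pair is quadratic in the number of contacts, so a real pipeline would normally restrict the candidate pairs first (often called blocking) before scoring them.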