MODULES

COMCAST (KEB-ATHENA PROJECT)

(KEB)-ATHENA

Build a data lake that acts as a single source of truth for sales and marketing data across all Comcast internal business applications. We pull data from 50+ sources, merge it with our business and contact data, and build efficient data sets that streamline processes and help the business make important decisions.


Initially built 200+ ETL workflows using shell scripting, Python scripting, and the Pentaho tool, and scheduled them using UC4. Developed a few RESTful endpoints using Spring Boot and deployed them using Pivotal Cloud Foundry. Currently working on migrating them to AWS (Cloud) and kicking off Athena V2 (API-based architecture).

MIGRATION TO AWS (POC)

Migrate 200+ ETL workflows and the Oracle DWH to AWS using Glue and Redshift

ATHENA-EBI AUTOMATION

Built a process to read HDFS files, build external Hive tables, run ranking logic, stitch with business data, and write to Oracle

ATHENA-EBI INTEGRATION

Build the process to stitch the statistical model data with the business data using reference/driver/metadata tables

ATHENA-EBI ONBOARDING

Get access to HDFS files and write shell scripts to build external Hive tables based on the model location and aggregation logic

SEARCH BY PLACE API ENDPOINT

Add a search feature that takes a place ID, exposed as a RESTful endpoint using Spring and Pivotal Cloud Foundry

CHANGE DATA CAPTURE/DELTA PROCESS

Built a CDC process for the Athena data lake, including hashing, versioning of changes, error logging, and performance monitoring, using shell scripting, Python scripting, and Pentaho

BUILD MATCH AND MERGE FLOW ON DATABRICKS

Run the match and merge component for contact matching/deduplication using Python's Record Linkage library and Spark jobs on Databricks