Technical Lead - Data Engineering with Databricks and PySpark

CRISIL Ltd

Mumbai/Bombay

Not disclosed

Work from Office

Full Time

Min. 10 years

Job Details

Job Description

Technical Lead – Databricks & PySpark

We are seeking a highly skilled Technical Lead with strong expertise in Databricks, Python, and PySpark to lead data engineering initiatives. The ideal candidate will drive the design, development, and optimization of scalable data pipelines while mentoring a team of engineers and collaborating with cross-functional stakeholders.

Key Responsibilities

  • Lead the design and development of data pipelines and ETL/ELT workflows using Databricks and PySpark
  • Architect and implement scalable, high-performance data solutions on cloud platforms (AWS/GCP)
  • Collaborate with data architects, analysts, and business teams to translate requirements into technical solutions
  • Optimize data processing jobs for performance, reliability, and cost efficiency
  • Ensure data quality, governance, and security standards are followed
  • Mentor and guide junior engineers; perform code reviews and enforce best practices
  • Drive adoption of CI/CD, DevOps, and automated testing in data engineering workflows
  • Troubleshoot and resolve production issues, ensuring high availability of data systems

Required Skills & Qualifications

  • Strong experience in Python and PySpark development
  • Hands-on expertise with Databricks (workflows, Delta Lake, notebooks, cluster management)
  • Solid understanding of data engineering concepts, distributed computing, and big data processing
  • Experience with SQL and relational/NoSQL databases
  • Expertise in data modeling, partitioning, and performance tuning
  • Proficiency with cloud platforms (AWS, GCP, or equivalents)
  • Familiarity with Delta Lake, streaming (Structured Streaming), and batch workloads
  • Strong knowledge of Git, CI/CD pipelines, and DevOps practices
  • Experience with workflow orchestration tools (Airflow, Temporal, etc.)

Preferred Qualifications

  • Experience with data warehousing and lakehouse architecture
  • Knowledge of ML pipelines or MLOps integration
  • Exposure to data governance tools and frameworks
  • Certification in Databricks is a plus

Leadership & Soft Skills

  • Proven experience in technical leadership and team management
  • Strong problem-solving and analytical abilities
  • Excellent communication and stakeholder management skills
  • Ability to work in an agile environment and handle multiple priorities

Key Deliverables

  • High-quality, scalable data pipelines
  • Optimized data workflows in Databricks
  • Well-documented architecture and processes
  • A well-mentored, productive engineering team

  

 

 

Case Study: Financial Data Engineering Solution on Databricks

Background

A financial services company processes large volumes of data from multiple systems:

  • Trade transactions (Equities, Derivatives, FX)
  • Market data feeds (real-time stock prices, indices)
  • Customer/account data (KYC, portfolios)
  • Risk and compliance data

The existing system suffers from:

  • High latency in risk reporting
  • Data inconsistency across systems
  • Lack of real-time insights
  • Scalability challenges

The company wants to implement a modern lakehouse architecture using Databricks to enable real-time risk analytics, regulatory reporting, and portfolio insights.

 

Objective

Design and build a scalable, secure, and high-performance financial data platform using Databricks and PySpark to support:

  • Near real-time trade and risk analytics
  • Regulatory reporting (e.g., daily reporting, audit trails)
  • Historical analysis for portfolio performance

 

Task Requirements

1. Data Ingestion

  • Ingest data from:
    • Trade data (batch files / APIs)
    • Real-time market feeds (Kafka / Azure Event Hubs)
    • Reference data (customer, instruments)
  • Use:
    • Databricks Auto Loader for incremental ingestion of batch files
    • Structured Streaming for real-time feeds
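
As a rough illustration of the two ingestion paths listed above, the following is a minimal sketch, not a prescribed solution. It assumes `spark` is an existing SparkSession on a Databricks cluster. The function names, paths, topic, and broker address are hypothetical, and CSV is assumed as the file format because the brief does not name one.

```python
def auto_loader_options(schema_location, file_format="csv"):
    """Options for Databricks Auto Loader (the `cloudFiles` source)."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
    }


def kafka_options(bootstrap_servers, topic):
    """Options for the Structured Streaming Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def read_trade_files(spark, landing_path, schema_location):
    # Auto Loader discovers new files incrementally as they land (hypothetical path).
    return (
        spark.readStream.format("cloudFiles")
        .options(**auto_loader_options(schema_location))
        .load(landing_path)
    )


def read_market_feed(spark, bootstrap_servers, topic):
    # Kafka records arrive as binary key/value columns and need parsing downstream.
    return (
        spark.readStream.format("kafka")
        .options(**kafka_options(bootstrap_servers, topic))
        .load()
    )
```

Keeping the options in small pure functions makes them easy to unit test without a cluster; only the two `read_*` functions need a live Spark session.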

 

2. Data Transformation

  • Perform:
    • Data cleansing (nulls, incorrect formats)
    • Trade enrichment (join with instrument & customer data)
    • Currency conversion using FX rates
  • Implement key business logic:
    • Daily P&L calculations
    • Exposure aggregation (by asset class, customer, region)
    • Risk metrics (VaR, notional exposure)
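
The business logic above can be sketched in plain Python to make the arithmetic concrete. This is an illustrative sketch only: the field names, the mark-to-market P&L formula, and the use of historical simulation for VaR are assumptions, since the brief does not specify a methodology. In a real pipeline the same logic would be expressed as PySpark DataFrame operations.

```python
from collections import defaultdict


def daily_pnl(position, prev_close, close):
    """Mark-to-market P&L for one position over one day (assumed formula)."""
    return position * (close - prev_close)


def notional_exposure_by(trades, key, fx_rates):
    """Sum absolute notional per group, converted to the base currency."""
    totals = defaultdict(float)
    for trade in trades:
        notional = abs(trade["quantity"]) * trade["price"]
        # fx_rates maps a currency code to its rate into the base currency.
        totals[trade[key]] += notional * fx_rates[trade["currency"]]
    return dict(totals)


def historical_var(daily_pnls, confidence=0.95):
    """Empirical loss quantile from a P&L history (historical simulation)."""
    if not daily_pnls:
        raise ValueError("need at least one P&L observation")
    losses = sorted(-pnl for pnl in daily_pnls)  # positive values are losses
    index = min(int(confidence * len(losses)), len(losses) - 1)
    return losses[index]
```

Passing `key="asset_class"`, `"customer"`, or `"region"` to `notional_exposure_by` gives the three aggregation views named in the task.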

 

3. Data Storage (Lakehouse Design)

  • Implement Medallion Architecture:
    • Bronze: Raw ingested data
    • Silver: Cleaned & standardized data
    • Gold: Aggregated datasets for reporting
  • Use Delta Lake features:
    • ACID transactions
    • Time travel (for audit and compliance)
    • Schema evolution
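
Two of the Delta Lake features listed above can be sketched as follows. This is a hedged example: the helper names and table names are hypothetical, and `df` is assumed to be a Spark DataFrame. The `VERSION AS OF` / `TIMESTAMP AS OF` clauses and the `mergeSchema` write option are standard Delta Lake features.

```python
def time_travel_query(table, version=None, timestamp=None):
    """Build a Delta time-travel query for audit or compliance lookups."""
    if (version is None) == (timestamp is None):
        raise ValueError("provide exactly one of version or timestamp")
    if version is not None:
        return f"SELECT * FROM {table} VERSION AS OF {int(version)}"
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{timestamp}'"


def append_with_schema_evolution(df, table):
    # mergeSchema lets an append add new columns to the existing table schema.
    (
        df.write.format("delta")
        .mode("append")
        .option("mergeSchema", "true")
        .saveAsTable(table)
    )
```

For example, `spark.sql(time_travel_query("silver.trades", version=12))` would read the table as it stood at version 12, assuming that version is still within the table's retention window.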

 

 

4. Performance Optimization

  • Optimize PySpark pipelines:
    • Partitioning by trade date, asset class
    • Z-ordering on frequently queried columns (e.g., account_id)
    • Caching intermediate datasets
  • Tune cluster configurations (autoscaling, job clusters)
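
A small sketch of how the partitioning and Z-ordering choices above might be applied. The helper names and the `trade_date` / `asset_class` column names are assumptions based on the bullets; `df` is assumed to be a Spark DataFrame. The guard reflects the Delta Lake rule that Z-order columns cannot also be partition columns.

```python
def optimize_statement(table, zorder_columns, partition_columns=()):
    """Build an OPTIMIZE ... ZORDER BY statement for a Delta table."""
    overlap = set(zorder_columns) & set(partition_columns)
    if overlap:
        # Delta Lake does not allow Z-ordering by a partition column.
        raise ValueError(f"cannot Z-order by partition columns: {sorted(overlap)}")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(zorder_columns)})"


def write_partitioned(df, table, partition_columns=("trade_date", "asset_class")):
    # Partition on low-cardinality columns that most queries filter on.
    (
        df.write.format("delta")
        .mode("append")
        .partitionBy(*partition_columns)
        .saveAsTable(table)
    )
```

The generated statement would typically be run on a schedule via `spark.sql(...)` rather than after every write, since OPTIMIZE rewrites data files.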

 

5. Data Quality & Governance

  • Implement:
    • Data validation rules (e.g., missing trade IDs, invalid prices)
    • Reconciliation checks (trade counts vs source)
  • Ensure:
    • Data lineage tracking
    • Role-based access control (RBAC)
    • Sensitive data masking (PII, financial data)
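
The validation, reconciliation, and masking items above could look roughly like this in plain Python. It is a sketch under assumptions: the field names and rules are illustrative, and salted SHA-256 hashing is one possible masking approach, not one the brief mandates.

```python
import hashlib


def validate_trade(trade):
    """Return a list of rule violations; an empty list means the trade passes."""
    errors = []
    if not trade.get("trade_id"):
        errors.append("missing trade_id")
    price = trade.get("price")
    if not isinstance(price, (int, float)) or price <= 0:
        errors.append("invalid price")
    return errors


def reconcile_counts(source_count, loaded_count, rejected_count=0):
    """True when every source record is accounted for as loaded or rejected."""
    return source_count == loaded_count + rejected_count


def mask_pii(value, salt):
    """Replace a sensitive value with a stable, non-reversible token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:16]
```

Because `mask_pii` is deterministic for a given salt, masked account identifiers can still be used as join keys across tables.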

 

6. Streaming & Real-Time Processing

  • Build streaming pipelines for:
    • Real-time market data ingestion
    • Intraday risk calculations
  • Ensure:
    • Low latency processing
    • Fault-tolerant design (checkpointing, retries)
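
For the fault-tolerance point above, a minimal sketch of a checkpointed streaming write. The function name, table name, checkpoint path, and trigger interval are hypothetical; `ticks_df` is assumed to be a streaming DataFrame, for instance one read from Kafka.

```python
def start_market_data_stream(ticks_df, target_table, checkpoint_dir,
                             trigger_interval="10 seconds"):
    """Start a streaming write to a Delta table with checkpointing enabled."""
    return (
        ticks_df.writeStream
        .format("delta")
        .outputMode("append")
        # The checkpoint stores progress so a restarted query resumes where it left off.
        .option("checkpointLocation", checkpoint_dir)
        # A shorter interval lowers latency at the cost of more frequent small batches.
        .trigger(processingTime=trigger_interval)
        .toTable(target_table)
    )
```

Each streaming query should get its own checkpoint directory; sharing one between queries would mix up their progress state.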

 

7. Orchestration

  • Implement pipeline orchestration using:
    • Databricks Workflows / Airflow / Azure Data Factory
  • Handle:
    • Dependencies (e.g., reference data before trade enrichment)
    • Job retries and alerts
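
Whichever orchestrator is chosen, the dependency and retry handling above boils down to logic like the following. This is a plain-Python sketch with hypothetical task names; a real orchestrator would add scheduling, backoff between attempts, and alert delivery.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Order tasks so each runs after the tasks it depends on.

    `dependencies` maps a task name to the set of tasks it needs first.
    """
    return list(TopologicalSorter(dependencies).static_order())


def run_with_retries(task, max_attempts=3, on_failure=None):
    """Run `task`, retrying on any exception up to `max_attempts` times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == max_attempts:
                if on_failure is not None:
                    on_failure(exc)  # hook for alerting, e.g. email or chat
                raise
```

For instance, `execution_order({"enrich_trades": {"load_reference_data"}})` places `load_reference_data` before `enrich_trades`, matching the dependency example in the list.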

 

8. CI/CD & Deployment

  • Use Git-based workflows:
    • Branching strategy
    • Code reviews
  • Implement CI/CD pipelines for:
    • Automated testing
    • Deployment to environments (Dev/Test/Prod)
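
To make the automated-testing item concrete, here is a minimal sketch of a transformation written as a pure function with pytest-style tests that a CI pipeline could run on every pull request. The `clean_trade` function, its field names, and its rules are hypothetical.

```python
def clean_trade(raw):
    """Return a normalized trade dict, or None if the record is unusable."""
    trade_id = (raw.get("trade_id") or "").strip()
    if not trade_id:
        return None
    try:
        price = float(raw.get("price"))
    except (TypeError, ValueError):
        return None
    currency = (raw.get("currency") or "").strip().upper()
    return {"trade_id": trade_id, "currency": currency, "price": price}


def test_clean_trade_normalizes_fields():
    cleaned = clean_trade({"trade_id": " T1 ", "currency": "usd ", "price": "101.50"})
    assert cleaned == {"trade_id": "T1", "currency": "USD", "price": 101.5}


def test_clean_trade_rejects_bad_records():
    assert clean_trade({"trade_id": None, "price": "10"}) is None
    assert clean_trade({"trade_id": "T2", "price": "n/a"}) is None
```

Keeping business rules in functions like this, separate from Spark I/O, lets most tests run in seconds without a cluster.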

Open Positions

1

Mandatory Skills

PySpark, Databricks, Data Engineer, Lead Data Engineer, Python

Education Qualification

Postgraduate or graduate degree in Computers or its equivalent

Experience

10 to 12 years

Job role

Work location

Hyderabad / Mumbai

Department

Data Science & Analytics

Role / Category

Data Science & Machine Learning

Employment type

Full Time

Shift

Day Shift

Job requirements

Experience

Min. 10 years

About company

Name

CRISIL Ltd

Job posted by CRISIL Ltd
