What are the eligibility criteria to apply for Technical Lead - Data Engineering with Databricks and PySpark at CRISIL Ltd?

The candidate should have a graduate or postgraduate degree in Computers or an equivalent field, and 10 to 12 years of experience, to be eligible to apply for this job.

Is there any specific skill required for this job?

The candidate should have PySpark, Databricks, Python, and data engineering skills, along with sound communication skills, for this job.

Who can apply for this job?

Both Male and Female candidates can apply for this job.

Is it a work from home job?

No, it’s not a work from home job and can’t be done online.

Are there any charges or deposits required while applying for the role or while joining?

No work-related deposit needs to be made during your employment with the company.

How can I apply for this job?

Go to the apna app and apply for this job. Click on the apply button and call HR directly to schedule your interview.

Technical Lead - Data Engineering with Databricks and PySpark

CRISIL Ltd

Mumbai/Bombay

Salary: Not disclosed

Work from Office

Full Time

Min. 10 years

Job Details

Job Description

Technical Lead – Databricks & PySpark

We are seeking a highly skilled Technical Lead with strong expertise in Databricks, Python, and PySpark to lead data engineering initiatives. The ideal candidate will drive the design, development, and optimization of scalable data pipelines while mentoring a team of engineers and collaborating with cross-functional stakeholders.

Key Responsibilities

Lead the design and development of data pipelines and ETL/ELT workflows using Databricks and PySpark
Architect and implement scalable, high-performance data solutions on cloud platforms (AWS/GCP)
Collaborate with data architects, analysts, and business teams to translate requirements into technical solutions
Optimize data processing jobs for performance, reliability, and cost efficiency
Ensure data quality, governance, and security standards are followed
Mentor and guide junior engineers; perform code reviews and enforce best practices
Drive adoption of CI/CD, DevOps, and automated testing in data engineering workflows
Troubleshoot and resolve production issues, ensuring high availability of data systems

Required Skills & Qualifications

Strong experience in Python and PySpark development
Hands-on expertise with Databricks (workflows, Delta Lake, notebooks, cluster management)
Solid understanding of data engineering concepts, distributed computing, and big data processing
Experience with SQL and relational/NoSQL databases
Expertise in data modeling, partitioning, and performance tuning
Proficiency with cloud platforms (AWS, GCP, or equivalent)
Familiarity with Delta Lake and with both streaming (Structured Streaming) and batch workloads
Strong knowledge of Git, CI/CD pipelines, and DevOps practices
Experience with workflow orchestration tools (Airflow, Temporal, etc.)

Preferred Qualifications

Experience with data warehousing and lakehouse architecture
Knowledge of ML pipelines or MLOps integration
Exposure to data governance tools and frameworks
Certification in Databricks is a plus

Leadership & Soft Skills

Proven experience in technical leadership and team management
Strong problem-solving and analytical abilities
Excellent communication and stakeholder management skills
Ability to work in an agile environment and handle multiple priorities

Key Deliverables

High-quality, scalable data pipelines
Optimized data workflows in Databricks
Well-documented architecture and processes
Mentored and productive engineering team

Case Study: Financial Data Engineering Solution on Databricks

Background

A financial services company processes large volumes of data from multiple systems:

Trade transactions (Equities, Derivatives, FX)
Market data feeds (real-time stock prices, indices)
Customer/account data (KYC, portfolios)
Risk and compliance data

The existing system suffers from:

High latency in risk reporting
Data inconsistency across systems
Lack of real-time insights
Scalability challenges

The company wants to implement a modern lakehouse architecture using Databricks to enable real-time risk analytics, regulatory reporting, and portfolio insights.

Objective

Design and build a scalable, secure, and high-performance financial data platform using Databricks and PySpark to support:

Near real-time trade and risk analytics
Regulatory reporting (e.g., daily reporting, audit trails)
Historical analysis for portfolio performance

Task Requirements

1. Data Ingestion

Ingest data from:
- Trade data (batch files / APIs)
- Real-time market feeds (Kafka/Event Hub)
- Reference data (customer, instruments)
Use:
- Databricks Auto Loader for incremental ingestion of batch files
- Structured Streaming for real-time feeds
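Auto Loader's key property is that it remembers which files it has already ingested, so each new file lands in Bronze once. The sketch below imitates that idea in plain Python rather than using the Auto Loader API: the landing directory, the `*.csv` pattern, and the JSON checkpoint file are hypothetical stand-ins for the cloud storage path and checkpoint location a real pipeline would use.

```python
import json
from pathlib import Path


def _load_checkpoint(checkpoint: Path) -> set[str]:
    """Read the set of already-ingested file names (empty on the first run)."""
    return set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()


def discover_new_files(landing_dir: Path, checkpoint: Path) -> list[Path]:
    """Return landing-zone CSV files that are not yet recorded in the checkpoint."""
    seen = _load_checkpoint(checkpoint)
    return sorted(p for p in landing_dir.glob("*.csv") if p.name not in seen)


def ingest_to_bronze(landing_dir: Path, checkpoint: Path) -> list[str]:
    """Read raw rows from newly arrived files, then record those files as ingested."""
    new_files = discover_new_files(landing_dir, checkpoint)
    bronze_rows: list[str] = []
    for path in new_files:
        # Bronze keeps rows as raw text; parsing and cleansing belong to Silver.
        bronze_rows.extend(path.read_text().splitlines()[1:])  # drop the header line
    # Update the checkpoint only after every new file was read successfully.
    seen = _load_checkpoint(checkpoint) | {p.name for p in new_files}
    checkpoint.write_text(json.dumps(sorted(seen)))
    return bronze_rows
```

Calling `ingest_to_bronze` twice without new files returns an empty list the second time, which is the behaviour that makes reruns of a daily trade-file job safe.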

2. Data Transformation

Perform:
- Data cleansing (nulls, incorrect formats)
- Trade enrichment (join with instrument & customer data)
- Currency conversion using FX rates
Implement key business logic:
- Daily P&L calculations
- Exposure aggregation (by asset class, customer, region)
- Risk metrics (VaR, notional exposure)
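The business logic in this step is easiest to verify on a few rows before it is expressed as PySpark joins and `groupBy` aggregations. The plain-Python sketch below converts values to a reporting currency, computes daily P&L, aggregates exposure, and estimates one-day VaR. The `Position` fields, the USD reporting currency, the P&L definition (price change times quantity, no fees or cash flows, one FX rate per currency), and the use of historical simulation for VaR are all assumptions; the case study does not specify a VaR method.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Position:
    account_id: str
    asset_class: str
    region: str
    currency: str
    quantity: float     # negative for short positions
    price_prev: float   # previous close, in the position's currency
    price_today: float  # today's close, in the position's currency


def to_usd(amount: float, currency: str, fx_to_usd: dict[str, float]) -> float:
    """Convert an amount to the (assumed) USD reporting currency."""
    return amount * fx_to_usd[currency]


def daily_pnl(pos: Position, fx_to_usd: dict[str, float]) -> float:
    """Mark-to-market P&L: price change times quantity, converted at today's rate."""
    return to_usd(pos.quantity * (pos.price_today - pos.price_prev), pos.currency, fx_to_usd)


def exposure_by(positions: list[Position], key: str, fx_to_usd: dict[str, float]) -> dict[str, float]:
    """Sum absolute notional exposure in USD per value of `key` (e.g. 'asset_class', 'region')."""
    totals: dict[str, float] = defaultdict(float)
    for pos in positions:
        notional = abs(pos.quantity * pos.price_today)
        totals[getattr(pos, key)] += to_usd(notional, pos.currency, fx_to_usd)
    return dict(totals)


def historical_var(pnl_history: list[float], confidence: float = 0.99) -> float:
    """One-day historical-simulation VaR, reported as a positive loss amount."""
    ordered = sorted(pnl_history)
    # Number of observations in the loss tail; rounding removes floating-point
    # noise (1 - 0.95 is not exactly 0.05 in binary floating point).
    tail = math.ceil(round((1 - confidence) * len(ordered), 9))
    return -ordered[max(tail - 1, 0)]
```

Keeping each rule as a small pure function makes it straightforward to unit-test the same rules that the PySpark job implements.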

3. Data Storage (Lakehouse Design)

Implement Medallion Architecture:
- Bronze: Raw ingested data
- Silver: Cleaned & standardized data
- Gold: Aggregated datasets for reporting
Use Delta Lake features:
- ACID transactions
- Time travel (for audit and compliance)
- Schema evolution
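To make the layer boundaries concrete, here is a plain-Python sketch of the Bronze to Silver to Gold flow, together with a toy `VersionedTable` that mimics what Delta Lake time travel provides: reading a table as of an earlier version. The record layout, the cleansing rules, and the `VersionedTable` class are hypothetical; on Databricks each layer would be a Delta table and time travel would come from Delta's own version history, not from anything like this class.

```python
import copy
from collections import defaultdict
from typing import Optional


class VersionedTable:
    """Toy append-only table keeping every committed version (analogy for Delta time travel)."""

    def __init__(self) -> None:
        self._versions: list[list[dict]] = [[]]  # version 0 is the empty table

    def append(self, rows: list[dict]) -> int:
        """Commit a new version (previous rows plus `rows`) and return its version number."""
        self._versions.append(self._versions[-1] + copy.deepcopy(rows))
        return len(self._versions) - 1

    def read(self, version: Optional[int] = None) -> list[dict]:
        """Read the latest version, or an earlier one (e.g. for an audit)."""
        index = len(self._versions) - 1 if version is None else version
        return copy.deepcopy(self._versions[index])


def to_silver(bronze_rows: list[dict]) -> list[dict]:
    """Cleanse raw rows: drop rows without a trade ID or with an unparseable notional."""
    silver = []
    for row in bronze_rows:
        trade_id = (row.get("trade_id") or "").strip()
        try:
            notional = float(row.get("notional"))
        except (TypeError, ValueError):
            continue  # a real pipeline might route these rows to a quarantine table instead
        if trade_id:
            asset_class = (row.get("asset_class") or "UNKNOWN").strip().upper()
            silver.append({"trade_id": trade_id, "asset_class": asset_class, "notional": notional})
    return silver


def to_gold(silver_rows: list[dict]) -> list[dict]:
    """Aggregate total notional per asset class for reporting."""
    totals: dict[str, float] = defaultdict(float)
    for row in silver_rows:
        totals[row["asset_class"]] += row["notional"]
    return [{"asset_class": k, "total_notional": v} for k, v in sorted(totals.items())]
```

Because Bronze keeps every raw version, Silver and Gold can be rebuilt from any earlier point, which is what makes time travel useful for audit and regulatory questions.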

4. Performance Optimization

Optimize PySpark pipelines:
- Partitioning by trade date and asset class
- Z-ordering on frequently queried columns (e.g., account_id)
- Cache intermediate datasets
Tune cluster configurations (autoscaling, job clusters)
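Partitioning pays off because a query that filters on the partition columns only needs to read the matching partitions. The plain-Python sketch below groups trades into partitions keyed by `(trade_date, asset_class)` and counts how many partitions a filtered query actually touches. It is an analogy for partition pruning on a partitioned Delta table, not Spark code, and the field names are hypothetical.

```python
from collections import defaultdict


def partition_trades(trades: list[dict]) -> dict[tuple, list[dict]]:
    """Group trades by (trade_date, asset_class), mirroring a two-level partition layout."""
    partitions: dict[tuple, list[dict]] = defaultdict(list)
    for trade in trades:
        partitions[(trade["trade_date"], trade["asset_class"])].append(trade)
    return dict(partitions)


def query(partitions: dict[tuple, list[dict]], trade_date, asset_class=None) -> tuple[list[dict], int]:
    """Return trades matching the filter and the number of partitions actually read."""
    rows, scanned = [], 0
    for (p_date, p_class), partition_rows in partitions.items():
        # Pruning: skip partitions whose keys cannot match, without reading their rows.
        if p_date != trade_date or (asset_class is not None and p_class != asset_class):
            continue
        scanned += 1
        rows.extend(partition_rows)
    return rows, scanned
```

Z-ordering plays a complementary role inside partitions: it co-locates rows with similar values of a column such as `account_id`, so per-file statistics let queries skip files. That part is not modelled here.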

5. Data Quality & Governance

Implement:
- Data validation rules (e.g., missing trade IDs, invalid prices)
- Reconciliation checks (trade counts vs source)
Ensure:
- Data lineage tracking
- Role-based access control (RBAC)
- Sensitive data masking (PII, financial data)
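The validation, reconciliation, and masking rules above can be written as small, testable functions before they are ported into the pipeline. The plain-Python sketch below is illustrative only: the rule names, the field names, salted SHA-256 hashing as a pseudonymization approach, and last-four-character masking are assumptions, not requirements stated in the case study. Lineage tracking and RBAC are platform concerns (on Databricks typically handled through Unity Catalog) and are not shown.

```python
import hashlib


def validate_trade(trade: dict) -> list[str]:
    """Return the names of the rules a trade violates; an empty list means it passed."""
    errors = []
    if not trade.get("trade_id"):
        errors.append("missing_trade_id")
    price = trade.get("price")
    # Reject non-numeric, non-positive, and NaN prices (NaN > 0 is False).
    if isinstance(price, bool) or not isinstance(price, (int, float)) or not price > 0:
        errors.append("invalid_price")
    return errors


def reconcile(source_count: int, loaded_rows: list[dict]) -> dict:
    """Compare the record count reported by the source system with the rows loaded."""
    loaded = len(loaded_rows)
    return {"source": source_count, "loaded": loaded,
            "difference": source_count - loaded, "matched": source_count == loaded}


def pseudonymize(value: str, salt: str) -> str:
    """Replace a PII value with a salted SHA-256 digest (stable, so it can still be joined on)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def mask_account(account_no: str, visible: int = 4) -> str:
    """Show only the last `visible` characters, e.g. for account numbers in reports."""
    hidden = max(len(account_no) - visible, 0)
    return "*" * hidden + account_no[hidden:]
```

In practice the salt would be kept in a secret store rather than in code, and rows failing `validate_trade` would be counted and reported rather than silently dropped.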

6. Streaming & Real-Time Processing

Build streaming pipelines for:
- Real-time market data ingestion
- Intraday risk calculations
Ensure:
- Low latency processing
- Fault-tolerant design (checkpointing, retries)
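Structured Streaming achieves fault tolerance by recording progress in a checkpoint location and resuming from it after a failure. The plain-Python sketch below imitates that contract on an in-memory list of market-data ticks: each micro-batch updates an intraday exposure per instrument, and the new state and offset are committed together only after the whole batch succeeds, so a failed batch is retried rather than skipped or double-counted. The tick format, the exposure formula, the batch size, and the in-memory checkpoint are hypothetical; this is not the Structured Streaming API.

```python
import copy
from typing import Optional


def process_stream(ticks: list[dict], positions: dict[str, float], checkpoint: dict,
                   batch_size: int = 2, fail_at_offset: Optional[int] = None) -> dict:
    """Process ticks in micro-batches, resuming from checkpoint["offset"].

    checkpoint = {"offset": next tick to read, "exposure": {instrument: value}}.
    A batch's new state and offset are committed together only after the whole
    batch succeeds, so a failure leaves the checkpoint at the start of that batch.
    """
    while checkpoint["offset"] < len(ticks):
        start = checkpoint["offset"]
        batch = ticks[start:start + batch_size]
        exposure = copy.deepcopy(checkpoint["exposure"])  # uncommitted working state
        for offset, tick in enumerate(batch, start=start):
            if offset == fail_at_offset:
                raise RuntimeError(f"simulated failure at offset {offset}")
            # Hypothetical intraday risk measure: position size * latest price.
            exposure[tick["instrument"]] = positions.get(tick["instrument"], 0.0) * tick["price"]
        checkpoint.update(exposure=exposure, offset=start + len(batch))  # commit
    return checkpoint
```

Restarting `process_stream` with the same checkpoint after a failure picks up at the first uncommitted tick, which is the same recovery behaviour a checkpointed streaming query relies on.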

7. Orchestration

Implement pipeline orchestration using:
- Databricks Workflows / Airflow / Azure Data Factory
Handle:
- Dependencies (e.g., reference data before trade enrichment)
- Job retries and alerts
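Databricks Workflows, Airflow, and Azure Data Factory each declare dependencies and retries in their own configuration. The plain-Python sketch below shows the logic they share: tasks run in a dependency-respecting order computed with the standard library's `graphlib.TopologicalSorter` (Python 3.9+), each task is retried a fixed number of times, and an alert callback fires if a task still fails. The task names, retry count, and alert hook are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(tasks: dict[str, Callable[[], None]],
                 depends_on: dict[str, set[str]],
                 max_retries: int = 2,
                 alert: Callable[[str, Exception], None] = print) -> list[str]:
    """Run tasks in dependency order, retrying each failed task up to `max_retries` times."""
    graph = {name: depends_on.get(name, set()) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())  # raises CycleError on cycles
    for name in order:
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                break  # success: move on to the next task
            except Exception as exc:
                if attempt == max_retries:
                    alert(name, exc)  # e.g. notify the on-call engineer
                    raise  # stop: downstream tasks must not run on incomplete inputs
    return order
```

For example, declaring `{"trade_enrichment": {"reference_data", "trade_ingest"}}` guarantees reference data is loaded before enrichment runs. A production scheduler would usually also wait, often with backoff, between attempts; this sketch retries immediately.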

8. CI/CD & Deployment

Use Git-based workflows:
- Branching strategy
- Code reviews
Implement CI/CD pipelines for:
- Automated testing
- Deployment to environments (Dev/Test/Prod)

Open Positions

Mandatory Skills

PySpark, Databricks, Data Engineer, Lead Data Engineer, Python

Education Qualification

Post Graduation or Graduation in Computers or its equivalent

Experience

10 to 12 years

Job role

Work location

Hyderabad / Mumbai

Department

Data Science & Analytics

Role / Category

Data Science & Machine Learning

Employment type

Full Time

Shift

Day Shift

Job requirements

Experience

Min. 10 years

About company

Name

CRISIL Ltd

Job posted by CRISIL Ltd

