Technical Lead - Data Engineering with Databricks and PySpark
CRISIL Ltd
Mumbai/Bombay
Not disclosed
Job Details
Job Description
Technical Lead – Databricks & PySpark
We are seeking a highly skilled Technical Lead with strong expertise in Databricks, Python, and PySpark to lead data engineering initiatives. The ideal candidate will drive the design, development, and optimization of scalable data pipelines while mentoring a team of engineers and collaborating with cross-functional stakeholders.
Key Responsibilities
- Lead the design and development of data pipelines and ETL/ELT workflows using Databricks and PySpark
- Architect and implement scalable, high-performance data solutions on cloud platforms (AWS/GCP)
- Collaborate with data architects, analysts, and business teams to translate requirements into technical solutions
- Optimize data processing jobs for performance, reliability, and cost efficiency
- Ensure data quality, governance, and security standards are followed
- Mentor and guide junior engineers; perform code reviews and enforce best practices
- Drive adoption of CI/CD, DevOps, and automated testing in data engineering workflows
- Troubleshoot and resolve production issues, ensuring high availability of data systems
Required Skills & Qualifications
- Strong experience in Python and PySpark development
- Hands-on expertise with Databricks (workflows, Delta Lake, notebooks, cluster management)
- Solid understanding of data engineering concepts, distributed computing, and big data processing
- Experience with SQL and relational/NoSQL databases
- Expertise in data modeling, partitioning, and performance tuning
- Proficiency with cloud platforms (AWS, GCP, or equivalents)
- Familiarity with Delta Lake, streaming (Structured Streaming), and batch workloads
- Strong knowledge of Git, CI/CD pipelines, and DevOps practices
- Experience with workflow orchestration tools (Airflow, Temporal, etc.)
Preferred Qualifications
- Experience with data warehousing and lakehouse architecture
- Knowledge of ML pipelines or MLOps integration
- Exposure to data governance tools and frameworks
- Databricks certification is a plus
Leadership & Soft Skills
- Proven experience in technical leadership and team management
- Strong problem-solving and analytical abilities
- Excellent communication and stakeholder management skills
- Ability to work in an agile environment and handle multiple priorities
Key Deliverables
- High-quality, scalable data pipelines
- Optimized data workflows in Databricks
- Well-documented architecture and processes
- Mentored and productive engineering team
Case Study: Financial Data Engineering Solution on Databricks
Background
A financial services company processes large volumes of data from multiple systems:
- Trade transactions (Equities, Derivatives, FX)
- Market data feeds (real-time stock prices, indices)
- Customer/account data (KYC, portfolios)
- Risk and compliance data
The existing system suffers from:
- High latency in risk reporting
- Data inconsistency across systems
- Lack of real-time insights
- Scalability challenges
The company wants to implement a modern lakehouse architecture using Databricks to enable real-time risk analytics, regulatory reporting, and portfolio insights.
Objective
Design and build a scalable, secure, and high-performance financial data platform using Databricks and PySpark to support:
- Near real-time trade and risk analytics
- Regulatory reporting (e.g., daily reporting, audit trails)
- Historical analysis for portfolio performance
Task Requirements
1. Data Ingestion
- Ingest data from:
- Trade data (batch files / APIs)
- Real-time market feeds (Kafka/Event Hub)
- Reference data (customer, instruments)
- Use:
- Databricks Auto Loader for incremental ingestion of batch files
- Structured Streaming for real-time feeds
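As an illustration of the ingestion step, here is a minimal sketch, not a prescribed implementation. It assumes a Databricks runtime where the Auto Loader `cloudFiles` source and the `availableNow` trigger are available, that trade files arrive as JSON, and that the caller passes in an active `spark` session. All function names, paths, topic names, and table names are hypothetical.

```python
def read_trade_files(spark, source_path, schema_path):
    """Incrementally pick up landed trade files with Auto Loader (the `cloudFiles` source)."""
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")               # assumed file format
        .option("cloudFiles.schemaLocation", schema_path)  # where the inferred schema is tracked
        .load(source_path)
    )


def read_market_feed(spark, bootstrap_servers, topic):
    """Subscribe to a Kafka topic carrying real-time market data."""
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .option("startingOffsets", "latest")
        .load()
    )


def write_bronze(stream_df, table_name, checkpoint_path, batch_style=True):
    """Append a stream to a Bronze Delta table, tracking progress in a checkpoint."""
    writer = (
        stream_df.writeStream.format("delta")
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")
    )
    if batch_style:
        # Process everything available now, then stop (suits scheduled file loads).
        writer = writer.trigger(availableNow=True)
    return writer.toTable(table_name)
```

A scheduled job might call `write_bronze(read_trade_files(spark, ...), ...)` with `batch_style=True`, while the market feed would run continuously with `batch_style=False`.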
2. Data Transformation
- Perform:
- Data cleansing (nulls, incorrect formats)
- Trade enrichment (join with instrument & customer data)
- Currency conversion using FX rates
- Implement key business logic:
- Daily P&L calculations
- Exposure aggregation (by asset class, customer, region)
- Risk metrics (VaR, notional exposure)
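The business logic above can be sketched in plain Python before it is translated into PySpark DataFrame operations. The brief does not define the formulas, so the ones below are assumptions: P&L is a same-day mark-to-market, exposure is absolute notional at the closing price, and VaR uses simple historical simulation. The `Trade` fields and the `fx_rates` mapping are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    trade_id: str
    asset_class: str
    currency: str
    quantity: float     # signed: positive = buy, negative = sell
    trade_price: float  # in trade currency
    close_price: float  # end-of-day mark, in trade currency


def to_base(amount, currency, fx_rates):
    """Convert to base currency; fx_rates maps currency -> base units per 1 unit."""
    return amount * fx_rates[currency]


def daily_pnl(trade, fx_rates):
    """Assumed formula: (close - trade price) * signed quantity, in base currency."""
    local = (trade.close_price - trade.trade_price) * trade.quantity
    return to_base(local, trade.currency, fx_rates)


def exposure_by_asset_class(trades, fx_rates):
    """Sum absolute notional (|quantity| * close price) per asset class, in base currency."""
    totals = defaultdict(float)
    for t in trades:
        notional = abs(t.quantity) * t.close_price
        totals[t.asset_class] += to_base(notional, t.currency, fx_rates)
    return dict(totals)


def historical_var(pnl_history, confidence=0.95):
    """Historical-simulation VaR: smallest loss level not exceeded on `confidence` of past days."""
    if not pnl_history:
        raise ValueError("pnl_history must not be empty")
    losses = sorted(-p for p in pnl_history)  # positive values are losses
    index = max(math.ceil(confidence * len(losses)) - 1, 0)
    return max(losses[index], 0.0)
```

In a pipeline, the same calculations would typically become column expressions and `groupBy` aggregations over the enriched Silver tables.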
3. Data Storage (Lakehouse Design)
- Implement Medallion Architecture:
- Bronze: Raw ingested data
- Silver: Cleaned & standardized data
- Gold: Aggregated datasets for reporting
- Use Delta Lake features:
- ACID transactions
- Time travel (for audit and compliance)
- Schema evolution
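The Delta Lake features listed above might be exercised as in the following sketch. It assumes path-based Delta tables and relies on the standard `mergeSchema`, `versionAsOf`, and `timestampAsOf` options; the function names and any paths passed in are hypothetical.

```python
def append_with_schema_evolution(df, table_path):
    """Append to a Delta table, allowing new columns in `df` to be added to its schema."""
    (
        df.write.format("delta")
        .mode("append")
        .option("mergeSchema", "true")
        .save(table_path)
    )


def read_as_of_version(spark, table_path, version):
    """Time travel: read the table as it was at a given commit version."""
    return spark.read.format("delta").option("versionAsOf", version).load(table_path)


def read_as_of_timestamp(spark, table_path, timestamp):
    """Time travel: read the table as it was at a given timestamp string."""
    return spark.read.format("delta").option("timestampAsOf", timestamp).load(table_path)
```

For an audit query, a report could be regenerated from `read_as_of_version(...)` to show exactly what the data looked like when the original report was produced. Each write is an atomic commit, which is what makes the version numbers meaningful.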
4. Performance Optimization
- Optimize PySpark pipelines:
- Partitioning by trade date, asset class
- Z-ordering on frequently queried columns (e.g., account_id)
- Cache intermediate datasets
- Tune cluster configurations (autoscaling, job clusters)
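A rough sketch of the layout and caching choices above follows. It assumes Delta tables on Databricks, where `OPTIMIZE ... ZORDER BY` is available, and uses hypothetical column names (`trade_date`, `asset_class`, `account_id`, `instrument_id`) and function names. Note that Z-order columns should be different from the partition columns.

```python
def write_partitioned(df, table_name):
    """Write a Delta table partitioned by trade date and asset class."""
    (
        df.write.format("delta")
        .mode("append")
        .partitionBy("trade_date", "asset_class")
        .saveAsTable(table_name)
    )


def zorder_sql(table_name, columns):
    """Build an OPTIMIZE ... ZORDER BY statement for a Delta table."""
    if not columns:
        raise ValueError("at least one Z-order column is required")
    return f"OPTIMIZE {table_name} ZORDER BY ({', '.join(columns)})"


def optimize_table(spark, table_name, columns=("account_id",)):
    """Compact files and co-locate rows on frequently filtered columns."""
    spark.sql(zorder_sql(table_name, columns))


def enrich_and_cache(trades_df, instruments_df):
    """Cache the enriched trades when several downstream aggregations reuse them."""
    return trades_df.join(instruments_df, "instrument_id").cache()
```

Caching only pays off when the cached result is read more than once in the same job; for a single pass it adds memory pressure without benefit.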
5. Data Quality & Governance
- Implement:
- Data validation rules (e.g., missing trade IDs, invalid prices)
- Reconciliation checks (trade counts vs source)
- Ensure:
- Data lineage tracking
- Role-based access control (RBAC)
- Sensitive data masking (PII, financial data)
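The validation, reconciliation, and masking requirements can be illustrated with small pure-Python helpers. This is a sketch under assumptions: records are dictionaries with hypothetical `trade_id` and `price` keys, masking uses a salted SHA-256 token, and the rules shown are examples rather than a complete rule set.

```python
import hashlib


def validate_trade(trade):
    """Return a list of rule violations for one trade record (empty list means valid)."""
    errors = []
    if not trade.get("trade_id"):
        errors.append("missing trade_id")
    price = trade.get("price")
    if not isinstance(price, (int, float)) or price <= 0:
        errors.append("invalid price")
    return errors


def reconcile_counts(source_count, loaded_count, rejected_count=0):
    """Check that every source record was either loaded or explicitly rejected."""
    unaccounted = source_count - loaded_count - rejected_count
    return {"matched": unaccounted == 0, "unaccounted": unaccounted}


def mask_pii(value, salt):
    """Replace a sensitive value with a salted SHA-256 token (stable, so joins still work)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:16]


def mask_account_number(account_number, visible=4):
    """Show only the last `visible` characters of an account number."""
    hidden = max(len(account_number) - visible, 0)
    return "*" * hidden + account_number[hidden:]
```

Records that fail `validate_trade` would typically be routed to a quarantine table, and their count passed as `rejected_count` so the reconciliation still balances.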
6. Streaming & Real-Time Processing
- Build streaming pipelines for:
- Real-time market data ingestion
- Intraday risk calculations
- Ensure:
- Low latency processing
- Fault-tolerant design (checkpointing, retries)
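One possible shape for a fault-tolerant streaming job is sketched below. It assumes the incoming DataFrame has hypothetical `symbol` and `event_time` columns, and that `process_batch` is a caller-supplied function taking `(batch_df, batch_id)` that performs the intraday risk calculation. The watermark and trigger intervals are placeholder values.

```python
def start_intraday_risk_stream(ticks_df, checkpoint_path, process_batch):
    """Start a micro-batch stream that deduplicates ticks and hands each batch to a handler."""
    return (
        ticks_df
        .withWatermark("event_time", "1 minute")   # bound the state kept for late data
        .dropDuplicates(["symbol", "event_time"])  # tolerate replayed messages
        .writeStream
        .foreachBatch(process_batch)
        .option("checkpointLocation", checkpoint_path)  # lets a restart resume where it stopped
        .trigger(processingTime="10 seconds")
        .start()
    )
```

The checkpoint location stores offsets and state, so a restarted job continues from the last committed batch. Because a batch can be replayed after a failure, the handler passed as `process_batch` should be idempotent, for example by using a Delta `MERGE` keyed on the batch contents.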
7. Orchestration
- Implement pipeline orchestration using:
- Databricks Workflows / Airflow / Azure Data Factory
- Handle:
- Dependencies (e.g., reference data before trade enrichment)
- Job retries and alerts
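Orchestration tools handle dependencies and retries declaratively, but the underlying idea can be shown with the Python standard library. The sketch below is tool-agnostic and is not the API of Databricks Workflows, Airflow, or Azure Data Factory; task names and the `alert` callback are hypothetical.

```python
import time
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return task names in an order that respects `task -> set of upstream tasks`."""
    return list(TopologicalSorter(dependencies).static_order())


def run_with_retries(task, max_attempts=3, base_delay=1.0, alert=print, sleep=time.sleep):
    """Run `task`, retrying with exponential backoff; alert and re-raise on final failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == max_attempts:
                alert(f"task failed after {attempt} attempts: {exc}")
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Reference data must be loaded before trades can be enriched.
PIPELINE = {
    "enrich_trades": {"load_reference_data", "load_trades"},
    "aggregate_exposure": {"enrich_trades"},
}
```

Calling `execution_order(PIPELINE)` yields the two load tasks first, then `enrich_trades`, then `aggregate_exposure`; each task would then be wrapped in `run_with_retries`.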
8. CI/CD & Deployment
- Use Git-based workflows:
- Branching strategy
- Code reviews
- Implement CI/CD pipelines for:
- Automated testing
- Deployment to environments (Dev/Test/Prod)
Open Positions
1
Mandatory Skills
PySpark, Databricks, Data Engineer, Lead Data Engineer, Python
Education Qualification
Postgraduate or graduate degree in Computers or its equivalent
Experience
10 to 12 years
Job role
Work location
Hyderabad / Mumbai
Department
Data Science & Analytics
Role / Category
Data Science & Machine Learning
Employment type
Full Time
Shift
Day Shift
Job requirements
Experience
Min. 10 years
About company
Name
CRISIL Ltd
Job posted by CRISIL Ltd