Big Data Engineer - PySpark & Hadoop Specialist

Synechron Technologies

Pune

Not disclosed

Work from Office

Full Time

Min. 4 years

Job Details

Job Description

Big Data Engineer | PySpark, Hadoop Ecosystem, Cloud Integration & Data Migration

Job Summary
Synechron is seeking a seasoned Big Data Engineer specializing in PySpark to build and maintain complex data processing and ETL workflows in enterprise environments. The role involves designing, developing, and optimizing scalable data pipelines for analytics, data migration, and high-volume processing. The candidate will apply their expertise in Hadoop ecosystem components, distributed computing, and storage formats to deliver high-performance, maintainable solutions aligned with business and regulatory requirements.

Software Requirements

Required Software Proficiency:

  • SQL (T-SQL, HiveQL, or ANSI SQL): strong skills in data validation, query optimization, and data management (4+ years)

  • Hadoop ecosystem (HDFS, Hive, Pig, Sqoop, Spark, or Impala): extensive experience in large-scale data processing and pipeline development (4+ years)

  • Data ingestion and ETL tools for enterprise workflows: proven ability to develop and optimize data pipelines (4+ years)

  • Distributed computing concepts (MapReduce, Spark) for high-volume data processing

  • Knowledge of file formats (Parquet, ORC, Avro, JSON, CSV) for efficient data storage and retrieval

  • Performance tuning of queries and data pipelines for operational and analytical workloads

  • Scripting skills in Python, Shell, or Scala for automation and pipeline scripting (preferred)

Preferred Software Skills:

  • Cloud data platforms (Azure, AWS, GCP) for scalable deployment, storage, and processing

  • Data workflow orchestration tools for automating data pipelines (e.g., Apache Airflow, Oozie)

Overall Responsibilities

  • Design, develop, and optimize scalable data pipelines for analytics, migration, and operational reporting

  • Build high-performance ETL workflows using PySpark, Spark SQL, and Hadoop ecosystem components

  • Support data ingestion, transformation, and validation activities to ensure data quality and consistency

  • Collaborate with data science, data engineering, and business teams to translate requirements into technical solutions

  • Tune performance of data queries, Spark jobs, and storage formats to support high-volume workloads

  • Implement data governance, security, and compliance practices in line with industry standards and regulations

  • Maintain operational documentation, data lineage, and best practices for pipeline management

  • Lead efforts to improve automation, pipeline reliability, and system scalability to support enterprise growth

Technical Skills (By Category)

  • Languages & Data Tools (Essential):

    • Python, Spark SQL, HiveQL, or ANSI SQL for scalable data transformations and queries

    • Hadoop ecosystem components (HDFS, Hive, Pig, Sqoop, Impala) for large-scale data pipelines

  • Databases & Data Management:

    • Relational databases (SQL Server, Oracle, PostgreSQL) for transactional and reference data validation

    • Data storage formats (Parquet, ORC, Avro) for efficient data management and retrieval

  • Cloud & Infrastructure:

    • Cloud platforms (Azure, AWS, GCP) for scalable storage and processing (preferred)

    • Data orchestration tools for workflow automation (e.g., Airflow, Oozie) (preferred)

  • Frameworks & Libraries:

    • PySpark and Spark SQL for large-scale data transformation and processing

  • Tools & Methodologies:

    • ETL/ELT development, workflow automation, and performance tuning practices in agile environments

  • Security & Governance:

    • Data masking, encryption, and access controls aligned with compliance standards (e.g., HIPAA, GDPR)

Experience Requirements

  • 4+ years of experience with large-scale data processing, data pipelines, and ETL workflows in enterprise environments

  • Proven expertise in Hadoop ecosystem components, Spark, and distributed data processing

  • Experience in data validation, reconciliation, and storage optimization for analytics and migration

  • Experience working in regulated environments with compliance, security, and data governance standards (preferred)

  • Alternative pathways include extensive experience in data engineering, high-volume data systems, and automation

Day-to-Day Activities

  • Develop, test, and optimize data pipelines using PySpark, Hive, and Hadoop ecosystem components

  • Support data ingestion, transformation, and validation for business analytics and migration projects

  • Monitor system performance, troubleshoot data processing issues, and implement optimizations

  • Collaborate with data analysts, data scientists, and enterprise data teams on technical solutions

  • Support cloud or on-premises data warehouse environments used for enterprise analytics

  • Implement and support data governance practices, security controls, and compliance measures

  • Maintain detailed documentation of operational procedures, data flows, and data lineage

  • Automate workflows and iteratively improve pipeline reliability and performance

Qualifications

  • Bachelor’s or Master’s degree in Data Engineering, Computer Science, or a related field

  • 4+ years of experience with big data solutions, ETL workflows, and data migration in enterprise settings

  • Experience with Hadoop ecosystem, Spark, and distributed data processing platforms

  • Experience with cloud data services for large-scale, high-volume workloads (preferred)

  • Certifications in Hadoop, Spark, or cloud platforms (e.g., AWS, GCP, Azure) are a plus

Professional Competencies

  • Strong analytical and troubleshooting skills for complex data workflows

  • Leadership skills to guide junior team members and promote best practices in data engineering

  • Excellent communication for stakeholder engagement, documentation, and reporting

  • Adaptability to evolving data standards, tools, and regulatory frameworks

  • Commitment to data quality, security, and operational efficiency

  • Time management and organizational skills for handling multiple data projects in a fast-paced environment

SYNECHRON’S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and is an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture – promoting equality, diversity and an environment that is respectful to all. We strongly believe that a diverse workforce helps build stronger, successful businesses as a global company. We encourage applicants from across diverse backgrounds, race, ethnicities, religion, age, marital status, gender, sexual orientations, or disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.
All employment decisions at Synechron are based on business needs, job requirements, and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disability or veteran status, or any other characteristic protected by law.

Experience Level

Senior Level

Work location

Pune - Hinjewadi (Ascendas), India

Department

Data Science & Analytics

Role / Category

DBA / Data warehousing

Shift

Day Shift