Job Details
Type: Full Time
Post Date: 30+ days ago
Industry: Data And Analytics
Job Description
Overview
G42 Healthcare is an Abu Dhabi-based health technology company active across data and AI, digital health, advanced OMICS, pharma, diagnostics and environmental sciences.
G42 Healthcare has recently merged with Mubadala Health, the healthcare arm of the Abu Dhabi sovereign fund Mubadala, creating M42, a unique organisation with over 7,000 staff spanning omics, technology and healthcare provision. We have recently acquired a 13,000-staff global leader in dialysis, making M42 the biggest healthcare provider in the region.
G42 Healthcare is committed to developing a world-class healthcare sector in the UAE as well as around the world. We aim to empower healthcare outcomes, clinical operations, and medical R&D by harnessing the power of data & AI to unlock the true potential of personalized and preventive care.
We are seeking an experienced data engineer to join our team building a scalable and secure health data platform. In this role, you will design, build and optimise data pipelines (batch & streaming) for big data systems, and extract, analyse and model rich and diverse health data sets.
Responsibilities
Design and implement data pipelines, ETL processes, schemas, and data models to ingest, process, and prepare multi-petabyte scale datasets for downstream analytics and machine learning.
Build and optimize data processing systems on modern platforms like Spark, Delta Lake, Kafka, etc.
Implement data quality, validation, and monitoring measures leveraging tools such as Great Expectations.
Ensure compliance with security, access control, and regulatory requirements related to PHI and other sensitive data types.
Support adoption of emerging standards like FHIR for healthcare data exchange.
Collaborate with data scientists, analysts, and engineers to understand data needs and deliver performant, reliable data products.
Keep track of emerging technologies & trends in the Data Engineering world, incorporating modern tooling and best practices at Craft.
Qualifications
4+ years of experience building and operating production big data platforms and pipelines.
Strong experience with SQL, Spark, workflow orchestrators, distributed message buses, Python, Presto, Delta Lake, Apache big data tool suites, Docker, Kubernetes, MPP.
Hands-on experience with the design and implementation of cloud-based data solutions using platforms like AWS, Azure, or GCP, optimizing for scalability, cost-efficiency, and performance.
Implement and maintain data lakes, warehouses, and lakehouses, including data modeling, ETL processes, and data quality assurance to empower data-driven decision-making.
Develop real-time data pipelines using streaming technologies like Apache Kafka or Azure Event Hubs, enabling timely insights and actions from incoming data streams.
Manage and enhance distributed data systems (e.g., Hadoop, Spark) to efficiently process large-scale datasets, ensuring data availability and reliability.
Previous experience of working on health data and Azure cloud is a strong plus.
Strong track record of designing and implementing scalable data models, schemas, and ETL logic.
Experience with data governance, master data management, data pseudonymization and anonymization, and data catalog solutions.
A strong interest in learning new things and a team-player ethic.
Strong analytical skills and good understanding of data structures and algorithms.
Some exposure to Nextflow and/or Nextflow Tower.
Nice to have:
Experience building data pipelines for machine learning.
Knowledge of genomics, medical imaging, and/or EHR data domains.
Knowledge of HIPAA, HL7 and other healthcare data privacy requirements.
Hands-on experience with fully managed data warehousing solutions such as Azure Synapse, AWS Redshift, BigQuery, Snowflake, etc.
Azure Batch & Blob Storage
A leading AI & Cloud Computing company based in Abu Dhabi, committed to inventing a better everyday through the power of people and technology.