In the Data Engineer / Data Scientist role, you will work closely with our product management and professional services teams to understand and solve complex data engineering, data science, and privacy management challenges. You will look for ways to generalize and integrate your solutions into features of the PHEMI data management platform. The position will include a mix of data engineering and data science, with opportunities to get involved in product development.

As a Data Engineer, you will be responsible for building, validating, troubleshooting, optimizing, and operationalizing data collection, de-identification, quality assurance, curation, and preparation pipelines.

As a Data Science team member, you will apply machine learning and data science techniques to create analytic solutions that help our customers solve business and healthcare challenges. Examples of our data science work include automated data classification and de-identification, synthetic data generation, federated machine learning, and image processing at petabyte scale. We work with a variety of healthcare data formats, including structured and unstructured text, medical imaging, genomic and proteomic data, and common healthcare data exchange formats such as HL7 and DICOM.


  • Develop and maintain Spark-based automated data processing pipelines
  • Benchmark, optimize, and troubleshoot pipeline performance and data quality issues
  • Participate in selection and development of data profiling, classification, quality assurance, and de-identification libraries and tools
  • Prepare datasets that meet customer analytics requirements
  • Develop, train, validate, and operationalize ML models
  • Understand and comply with data privacy and governance rules
  • Participate in customer meetings, presentations, consulting, and training

Required Skills

  • High-level proficiency in the Spark ecosystem: Spark Core/Scala, PySpark, and Spark SQL
  • High-level proficiency with SQL troubleshooting and optimization, particularly in the context of Spark SQL
  • Production experience with cleanup and validation of complex datasets and pipelines
  • Experience configuring Spark on YARN
  • Knowledge of the Linux operating system
  • Ability to work cohesively in a team environment
  • Strong communication skills, including the ability to give customer-facing presentations

Preferred Skills

  • Experience with PyTorch and TensorFlow
  • Working knowledge of Python and R
  • Working knowledge of Apache Airflow
  • Familiarity with web-based notebook environments such as Databricks, Jupyter, or Zeppelin

Professional Qualifications & Experience

  • Degree or diploma in Computer Science, Software Engineering or a related field
  • Minimum one year of production experience with Spark in Data Engineering and Data Science projects
  • Please provide a link to your project portfolio with your application

Why you should join

Cool Technology

At PHEMI, you will work with big data technologies such as Spark, Accumulo, and NiFi, as well as a range of machine learning frameworks, and push the envelope in advancing the state of the art in big data privacy, security, and governance.

Smart People

Our growing team of engineers has many years of big data and complex systems engineering expertise. They are supported by a PHEMI leadership team that includes some of the most innovative and experienced entrepreneurs in BC, who have successfully founded and grown several startups that have generated over $1 billion in shareholder value. The team is rounded out with strong sales & marketing leadership and battle-hardened storage, data warehouse, data science, and medical domain expertise. You will be hard-pressed to find a team like this in BC. We also like to get together for a beer, gourmet potluck meals, a round of dodgeball, or a pumpkin carving competition.

Making a Difference

Your contribution will make a difference in how healthcare is managed and delivered in this province and around the world. We are proud of helping our customers save lives and improve health by enabling their medical research, finding better treatment protocols, and improving the delivery of healthcare. Healthcare and life sciences organizations use PHEMI systems to improve efficiency, understand patient experience, improve outcomes, research and develop new medicines, and so much more.


The PHEMI Trustworthy Health DataLab is a unique, cloud-based, integrated big data management system that allows healthcare organizations to enhance innovation and generate value from healthcare data by simplifying the ingestion and de-identification of data, with NSA/military-grade governance, privacy, and security built in.

Conventional products simply lock down data. PHEMI goes further, solving privacy and security challenges and addressing the urgent need to secure, govern, curate, and control access to privacy-sensitive personal healthcare information (PHI). This improves data sharing and collaboration inside and outside of an enterprise without compromising the privacy of sensitive information or increasing administrative burden.

Built on privacy-by-design principles, the software gives researchers, scientists, and clinicians faster access to more information while ensuring that they only see data on a need-to-know basis. Responsible data sharing and a governance framework facilitate compliance with privacy regulations.

PHEMI Trustworthy Health DataLab can scale to any size of organization, is easy to deploy and manage, connects to hundreds of data sources, and integrates with popular data science and business analysis tools.

For more information, visit and follow us on Twitter @PHEMISystems, LinkedIn, YouTube, and Facebook.



This is a full-time position

How to apply

Send your CV to