National Cancer Centre of Singapore Pte Ltd

Biological Data Scientist (DCS)

Job Category:  Research
Posting Date:  21 Aug 2024

We are looking for a highly motivated and talented individual with a passion for oncology, genomics and data science research to join the Data and Computational Science Core at the National Cancer Centre Singapore. You will primarily work with the Senior Data Scientist lead and the current research and clinical teams. The selected individual is expected to actively contribute to our core multi-omics research and big data analysis infrastructure, which focuses on using electronic medical records, next-generation sequencing (NGS), radiological imaging and other multi-modal datatypes to develop biomarkers predictive of clinical responses in cancer patients. Specifically, the Biological Data Scientist will be expected to optimise genomic, transcriptomic, proteomic and radiomic data processing, statistical analysis pipelines, feature engineering for statistical machine learning, deep learning and survival analysis, and other relevant computational analyses to better understand the complexity of cancer progression and treatment resistance across multiple cancer types. There will also be ample opportunities for inter-departmental and cross-institution collaborations with oncologists, pathologists and scientists. For more information, please refer to the laboratory website: www.chualabnccs.com.

Responsibilities:
- Designing, running, interpreting and validating published and novel models towards improving healthcare via precision medicine and clinical predictions.
- Developing biomarkers for oncology research by applying AI to multi-omics data.
- Guiding research discussions with other scientists and mentoring students.

Requirements:
- Postgraduate degree (at least a Master's; PhD preferred) in Bioinformatics, (quantitative/computational) Biology or Data Science.
- Strong coding skills in most major programming languages and key AI libraries (scikit-learn/tidymodels, TensorFlow/PyTorch).
- At least two years of experience in handling multi-omics/clinical data using statistics, classical ML, DL and LMs.
- A passion for healthcare, precision medicine, scientific research and problem-solving.
- Familiarity with statistical analysis methods and common bioinformatics data types, tools and workflows (somatic/germline variant calling, sc/RNAseq).
- A keenness to continually learn and integrate new tools and parameters to keep up with industry best practices, adapting them to local needs.
- Ability to independently plan and execute data analysis and AI projects, in collaboration with teammates and external parties.
- Strong organisational, interpersonal and presentation skills.

Desired:
- Experience with tuning GPU libraries for DL and generative AI models.
- Interest in ELT data transformations using large models (Hugging Face, Ollama).
- Big data wrangling experience across multimodal datatypes (e.g. Polars, Arrow, vector DBs and column-stores).
- Familiarity with Deep Learning methods for medical image, spatial and other higher-dimensional data.
- Familiarity with pipeline management systems (Nextflow, Snakemake, CWL, WDL).
- Familiarity with job schedulers (SLURM, PBS, SGE, LSF).
- Familiarity with container/virtualization systems (Docker, Singularity, Podman, Kubernetes).