The Exposure Biology Informatics Core (EBIC), directed by Dr. Jason Moore, is a CEET-funded service core dedicated to providing bioinformatics support to CEET members in the areas of bioinformatics application development, multi-omics data management and integration, NGS data analysis, and machine learning. The main role of EBIC is to assist the CEET investigator community with bioinformatics needs that support a pilot project or the analysis of preliminary data for a grant application. It can also assist investigators in the analysis of discrete data sets for manuscript publication. Project-intensive bioinformatics support can be provided only if the investigator includes bioinformatics support in pending or awarded grants as a fee-for-service. EBIC is pleased to consult with investigators individually to scope project needs and the support required.
We are in the process of building a centralized cheminformatics repository that will include tools and popular toxicological databases (e.g., ACToR, ChEMBL, Chemistry Dashboard, Comparative Toxicogenomics Database (CTD), EPA Integrated Risk Information System (IRIS), T3DB, TOXNET, and TOXCAST). To make this resource more powerful, we will develop a customized data analysis and visualization platform so that users can perform preliminary analyses as well as retrieve aggregated data.
Our staff are familiar with cutting-edge bioinformatics analysis methods and techniques, including the implementation and use of software pipelines for the analysis of large-scale genomic data from next-generation sequencing. We make heavy use of the open-source R environment for data science.
We have established a comprehensive infrastructure for accessing and carrying out research with clinical data from the Penn Data Store (PDS), an enterprise data warehouse that contains records for more than 4 million patients. We have established a close working relationship with the Data Analytics Center (DAC), which manages the data as part of the University of Pennsylvania Health System (UPHS).
We offer accessible and user-friendly artificial intelligence software (PennAI) developed at Penn for computational analysis of complex biomedical data.
DNA sequence analysis
We have developed about ten different types of sequence data analysis pipelines. We apply different pipeline development strategies depending on the popularity and maturity of the sequencing technology. For newer sequencing data types such as scRNA-seq, pVAC-seq, and CRISPR, we start by developing and implementing individual modules in the pipeline, then run the pipeline manually, step by step. The steps usually include data QC, basic statistics, core analysis, result annotation, and result visualization.
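As a rough illustration of the modular, step-by-step approach described above, the following Python sketch runs the five named stages one at a time on toy data. It is not EBIC's actual pipeline: the `Read` record, all function names, the quality threshold, and the per-gene counting used as a stand-in "core analysis" are assumptions made for this example only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Read:
    """Hypothetical toy sequencing read; real pipelines parse FASTQ/BAM files."""
    seq: str
    quals: List[int]  # per-base quality scores
    gene: str         # pre-assigned gene label (stand-in for alignment)


def quality_control(reads: List[Read], min_mean_quality: float = 20.0) -> List[Read]:
    """Data QC: keep reads whose mean base quality meets the threshold."""
    return [r for r in reads if sum(r.quals) / len(r.quals) >= min_mean_quality]


def basic_statistics(reads: List[Read]) -> Dict[str, float]:
    """Basic statistics: read count and mean read length."""
    lengths = [len(r.seq) for r in reads]
    mean_length = sum(lengths) / len(lengths) if lengths else 0.0
    return {"n_reads": len(reads), "mean_length": mean_length}


def core_analysis(reads: List[Read]) -> Counter:
    """Core analysis (illustrative stand-in): count reads per gene."""
    return Counter(r.gene for r in reads)


def annotate(counts: Counter, descriptions: Dict[str, str]) -> List[dict]:
    """Result annotation: attach a description to each gene, if one is known."""
    return [
        {"gene": g, "count": c, "description": descriptions.get(g, "unknown")}
        for g, c in sorted(counts.items())
    ]


def visualize(rows: List[dict]) -> str:
    """Result visualization: a plain-text bar chart, one line per gene."""
    return "\n".join(f"{r['gene']:<6} {'#' * r['count']} ({r['count']})" for r in rows)


# Toy inputs, invented for this example.
reads = [
    Read("ACGTACGT", [30] * 8, "TP53"),
    Read("ACGTAC", [10] * 6, "TP53"),   # low quality, removed by QC
    Read("GGCCTTAA", [35] * 8, "BRCA1"),
    Read("TTGACCGA", [28] * 8, "TP53"),
]
descriptions = {"TP53": "tumor protein p53"}

# Run each module manually, inspecting the intermediate result before moving on.
passed = quality_control(reads)
stats = basic_statistics(passed)
counts = core_analysis(passed)
rows = annotate(counts, descriptions)
print(stats)
print(visualize(rows))
```

Keeping each stage as a separate function with plain inputs and outputs means an analyst can inspect every intermediate result while the method is still new, and the same functions can later be chained into an automated workflow once it has stabilized.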
We maintain an Oracle-based relational database that can be adapted to virtually any data storage, management, and retrieval needs. This includes a web interface for data entry and management.
We are familiar with many popular machine learning algorithms for the identification of complex patterns in large-scale data. These include cluster analysis, neural networks, random forests, support vector machines, and several novel methods such as the Tree-Based Pipeline Optimization Tool (TPOT) developed locally by Penn bioinformatics investigators for automated machine learning.
We can plan and implement complex software packages for specific projects. These can include standalone software with user-friendly graphical user interfaces or software delivered through the web.
We have advanced capability for data visualization and interactive visual analysis. This includes priority access to a state-of-the-art immersive visualization facility.
We can assist with all aspects of web programming and web page design and implementation.