DePaul BioMedicalInformatics Research

The DePaul BioMedicalInformatics Research Lab was created in early 2003 by Dave Angulo. Dave's main research interests lie in Grid Programming, BioMedicalInformatics, and Web Programming (especially XML and Web Services). The lab specializes in BioInformatics and MedicalInformatics research, mainly in applications that are compute intensive or data intensive and thus require the resources of Grid Programming. BioInformatics, MedicalInformatics, and Grid Programming are each fertile grounds for research, and their intersection is novel and exciting.

For more information on any of these efforts, please contact Dave Angulo at dangulo@cti.depaul.edu

The DePaul BioMedicalInformatics Research Lab is conducting a number of active research activities, in the following categories:

Illinois Bio-Grid
Titan (Kaveri) and Bioinformatics Support
Genomics (Homology searching and Mass Spectrometry)
Proteomics
Phylogenetics
IBG Software Suite

These research activities are expounded upon below.

Illinois Bio-Grid

Dave Angulo is the primary co-founder of the Illinois Bio-Grid, along with collaborators from the Chicago Technology Park, the Supercomputing Center of Chicago, the Argonne National Lab MCS, the University of Chicago BCR and Proteomics lab, the Illinois Institute of Technology CS department, the DePaul Supercomputing Center, and the DePaul Geography department.

The main purpose of the Illinois Bio-Grid is to share compute and software resources amongst the members. It is a computational grid for Grid and BioMedicalInformatics software research and for production BioMedicalInformatics research.

Titan and Bioinformatics Support

DePaul CTI and the Illinois Bio-Grid are home to many computational resources that support researchers using bioinformatics software. Titan is a Sun Fire 6800 MidFrame Server with 24 processors. It features the 64-bit UltraSPARC III processor with an 8 MB pipelined burst level 2 cache, a four-way associative on-chip level 1 cache (64 KB data and 32 KB instruction), and an integrated memory controller capable of addressing up to 16 GB of main memory per processor at 2.4 GB/s. Storage is provided by a Sun StorEdge T3 array with 2 terabytes of disk storage and a fiber optic interconnect to the Sun Fire. Titan (previously Kaveri) is available without charge to researchers in the member institutions of the Illinois Bio-Grid. The Illinois Bio-Grid also has thousands of other processors available for use.

Software

The following software is currently available on Titan. Additionally, the Illinois Bio-Grid team is developing software for use on this machine and other machines on the grid.

Clustering and Alignment Tool (CAT): program that analyzes (screens, clusters, and aligns) DNA sequence data (DoubleTwist)
ClustalW: program for protein sequence analysis (multiple sequence alignment)
GenScan: command-line program for gene identification
GCG Wisconsin Package: software for comprehensive sequence analysis
HMMER: program for protein sequence analysis (profile hidden Markov models)
Pblast: parallel/distributed BLAST server
Prophecy: search application for finding data in the Annotated Human Genome Database

Accounts

For more information, please contact <dangulo@cti.depaul.edu>.

Other Machines in the Illinois Bio-Grid at DePaul CTI

The z-cluster is a high-performance Linux cluster installed at the School of Computer Science, Telecommunications and Information Systems (CTI) of DePaul University in Chicago. The z-cluster is a core computational facility at CTI.

The z-cluster consists of 20 computational nodes. Each node has a 3.2 GHz P4 processor with Hyperthreading technology, 2 GB of RAM, and a 200 GB hard disk. The nodes are connected by a Gigabit switch.

Information on the z-cluster can be found at http://z-login.cti.depaul.edu/ganglia/

Genomics

Genomics is the field of investigating proteins based on their primary structure (DNA nucleic acid sequence or amino acid sequence). Biologists can frequently determine, at low cost, the sequence of amino acids in their proteins (or the nucleic acid sequence of the DNA that encodes the protein). They then often want to look for homologous proteins, that is, proteins that share an evolutionary origin. Software tools search for homologous proteins in a national database of sequences: the NCBI's GenBank. If such a protein is found and its function is known, the biologists have a good idea of what the function of the new protein probably is.

The growth of data at NCBI (GenBank) has been exponential, and the computation time grows by at least the square of the size of the data. This is quickly outgrowing the capacity of ordinary computers. Additionally, biologists would like to search with a batch of input protein sequences (derived from mass spectrometry equipment), finding target proteins that are homologous to all of the input sequences. It is unlikely that the NCBI will ever expand its software to include such functionality because it is so computationally intensive. We are developing a toolkit of such software. This toolkit, called the IBG Workbench, includes the FASTA, BLAST, and Smith-Waterman algorithms, all converted to run with batches of input sequences and to run in a distributed environment on the Grid. Parts of this workbench were demonstrated at the SuperComputing 2002 conference and won two of the three Grand Challenge competitions.
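The batch search described above can be illustrated with a small sketch. This is not the IBG Workbench code; it is a minimal, single-machine Smith-Waterman scorer, assuming a simple match/mismatch scheme with a linear gap penalty (real protein searches normally use a substitution matrix). The function names, the scoring values, and the threshold are all hypothetical choices made for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Assumed scoring: +2 match, -1 mismatch, -2 per gap position.
    Only two rows of the dynamic programming table are kept.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def targets_matching_all(queries, database, threshold):
    """Return database entries whose score reaches threshold for EVERY query."""
    hits = []
    for name, seq in database.items():
        scores = [smith_waterman_score(q, seq) for q in queries]
        if min(scores) >= threshold:
            hits.append((name, scores))
    return hits


if __name__ == "__main__":
    db = {"p1": "MKTAYIAKQR", "p2": "GGGGGGGG"}
    print(targets_matching_all(["KTAY", "AKQR"], db, threshold=8))
```

Each query-target pair is scored independently, so the work per target is proportional to the number of queries times the product of the sequence lengths. That independence is what makes the batch search a natural candidate for distribution across Grid nodes.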

Proteomics

Mass Spectrometry Analysis

In a second Genomics project, we are working on a Grid-enabled version of software algorithms that take raw data from a mass spectrometer and calculate the amino acid sequence of the input protein. For example, a biologist might start with a whole cell digest of some organism. A sample of the extract is injected into a series of columns, where peptides released from one column are separated on a second column and then detected and fragmented by the mass spectrometer. The mass spectrometer acquires data at a rate of about 3000 spectra per hour. De novo sequencing requires massive calculations on each spectrum. To handle this compute load, we are working on an algorithm that does this in parallel on the Grid. This tool will be part of the IBG Workbench.
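To give a feel for the core idea behind de novo sequencing, here is a deliberately simplified sketch; it is not the lab's algorithm. It assumes an idealized, noise-free spectrum in which consecutive peaks belong to a single fragment ion ladder, so that each gap between neighboring peaks equals the mass of one amino acid residue. The function name and the tolerance are hypothetical, and only a subset of residues is included.

```python
# Monoisotopic residue masses in daltons (subset of the 20 amino acids).
# Leucine and isoleucine have identical mass, so "L" stands for both here.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "D": 115.02694,
    "K": 128.09496, "E": 129.04259, "F": 147.06841,
}


def read_ladder(peaks, tolerance=0.01):
    """Translate gaps between consecutive peaks into residue letters.

    A gap that matches no residue within the tolerance becomes "?".
    """
    ordered = sorted(peaks)
    residues = []
    for low, high in zip(ordered, ordered[1:]):
        gap = high - low
        matches = [aa for aa, mass in RESIDUE_MASS.items()
                   if abs(gap - mass) <= tolerance]
        residues.append(matches[0] if matches else "?")
    return "".join(residues)


if __name__ == "__main__":
    # Build an idealized ladder for the peptide GASP, then read it back.
    peaks = [1.00728]  # assumed starting offset (one proton)
    for aa in "GASP":
        peaks.append(peaks[-1] + RESIDUE_MASS[aa])
    print(read_ladder(peaks))
```

Real spectra contain noise, missing peaks, and several overlapping ion series, which is why the actual computation per spectrum is far heavier than this. Because each spectrum can be processed independently of the others, the workload divides naturally across many processors.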

Computational Protein Folding

We are working on additional modules for the IBG Workbench (mentioned above) that will be useful to proteomics researchers trying to predict the tertiary structures of proteins from their amino acid sequences. The intention is to produce reusable modules that can be loaded together, allowing researchers to concentrate on their particular areas of research interest. This framework will include modules to read DNA and amino acid sequences from the various GenBank databases as well as primary, secondary, and tertiary structures of proteins. These IBG Workbench modules will also include chemical libraries to calculate energy levels of molecules, as well as modules that use these chemical libraries to perform ab-initio calculations of protein folding. Other methodologies for predicting protein structure, including rule-based and “lego” algorithms, will also be supported with their own modules. Having a suite of modules to choose from will minimize researchers' development time because they will only need to concentrate on the portion of the problem that their research addresses.

Phylogenetics

Phylogenetics is the study of evolutionary relationships (phylogeny). We are working with Phylogenetics collaborators at the Field Museum of Natural History in Chicago to determine feasible evolutionary relationships among given taxa. The approach looks at differences in DNA sequences and determines the evolutionary tree that starts at some hypothetical evolutionary ancestor of all of the taxa and requires the minimum number of mutations to reproduce the differences in the taxa studied.
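Counting the minimum number of mutations for one candidate tree can be sketched with Fitch's small-parsimony algorithm. The source does not say which method the lab uses, so this is an illustrative assumption, not a description of the project's software. The tree encoding (nested tuples with taxon names at the leaves), the function names, and the example sequences are hypothetical.

```python
def fitch_site(tree, states):
    """Fitch's algorithm for one alignment column on a rooted binary tree.

    tree   : a taxon name (leaf) or a pair (left_subtree, right_subtree)
    states : dict mapping taxon name -> character at this column
    Returns (possible ancestral states, minimum number of mutations).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # Children can agree on a state: no extra mutation is needed here.
        return common, left_cost + right_cost
    # Children disagree: one mutation must occur on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, sequences):
    """Sum the per-column minimum mutation counts over aligned sequences."""
    length = len(next(iter(sequences.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in sequences.items()})[1]
        for i in range(length)
    )


if __name__ == "__main__":
    seqs = {"A": "AC", "B": "AC", "C": "GC", "D": "GT"}
    print(parsimony_score((("A", "B"), ("C", "D")), seqs))
    print(parsimony_score((("A", "C"), ("B", "D")), seqs))
```

Scoring a single tree is cheap, but the number of possible trees grows extremely fast with the number of taxa, so searching for the best tree is the computationally demanding part.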

IBG Software Suite

IBG Workbench

All of the above BioMedicalInformatics applications share quite a bit of functionality. Interactions with the Grid are certainly common functionality, and connections to the NCBI databases (GenBank), sequence comparisons, etc. are also common to many of these applications. This common functionality is useful to a wide array of other BioMedical applications as well. Recognizing the usefulness of a workbench of such tools, and of a platform for developing other tools on the common infrastructure, we are developing the IBG Workbench of modular Grid-enabled tools. All software developed will be open source and available to all Computer Science or BioMedicalInformatics researchers worldwide.

GeneDesigner

GeneDesigner is an open-source product available for free download. It is still in development, but it can be used to design a nucleotide sequence for a particular protein (amino acid sequence) such that it will be optimized for the highest possible expression when used as a recombinant DNA fragment in a particular organism.

Get GeneDesigner here.
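As a rough illustration of the kind of design GeneDesigner addresses, the sketch below picks the most frequently used codon for each amino acid in a target organism. It is not GeneDesigner's code or method. The codon assignments follow the standard genetic code, but the usage frequencies are made-up placeholder numbers for a hypothetical host, and the table covers only a few amino acids.

```python
# Hypothetical codon usage table: amino acid -> {codon: relative frequency}.
# Codons are correct for the standard genetic code; the frequencies are
# illustrative placeholders, not measured values for any real organism.
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "T": {"ACC": 0.40, "ACG": 0.25, "ACT": 0.19, "ACA": 0.16},
    "A": {"GCG": 0.33, "GCC": 0.26, "GCA": 0.23, "GCT": 0.18},
    "*": {"TAA": 0.61, "TGA": 0.30, "TAG": 0.09},  # stop codons
}


def design_gene(protein, usage=CODON_USAGE):
    """Back-translate a protein using the most frequent codon per residue."""
    codons = []
    for aa in protein:
        if aa not in usage:
            raise ValueError(f"no codon usage data for residue {aa!r}")
        # Choose the codon with the highest relative frequency.
        codons.append(max(usage[aa], key=usage[aa].get))
    return "".join(codons)


if __name__ == "__main__":
    print(design_gene("MKTA*"))
```

Choosing the single most frequent codon everywhere is the simplest possible strategy. Practical designs usually weigh other factors as well, such as avoiding repeats or unwanted restriction sites.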

IBG High Throughput Task Allocator

This library addresses the problem of searching huge biological databases, on the scale of several gigabytes, by using parallel processing. Biological databases storing DNA sequences, protein sequences, or mass spectra are growing exponentially, and searches through these databases consume exponentially growing computational resources as well. The library provides a general-purpose, MPI-based C++ framework for splitting databases across several computational nodes. The combined RAM of the nodes working in tandem is often sufficient to keep the entire database in memory, and therefore to search it efficiently without paging to disk.

Get the IBG High Throughput Task Allocator here
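The split-and-search idea can be sketched as follows. The actual library is an MPI-based C++ framework; this Python sketch is only an analogy that uses a thread pool in place of MPI ranks, and it does not reproduce the library's API. The function names, the greedy balancing strategy, and the substring search are all assumptions chosen for brevity.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def split_database(records, n_nodes):
    """Partition records into n_nodes shards of roughly equal total length.

    Greedy strategy: hand the next-longest record to the lightest shard.
    """
    shards = [[] for _ in range(n_nodes)]
    heap = [(0, i) for i in range(n_nodes)]  # (current load, shard index)
    by_length = sorted(records.items(), key=lambda r: len(r[1]), reverse=True)
    for name, seq in by_length:
        load, i = heapq.heappop(heap)
        shards[i].append((name, seq))
        heapq.heappush(heap, (load + len(seq), i))
    return shards


def search_shard(shard, query):
    """Work done by one node: scan only its own in-memory shard."""
    return [name for name, seq in shard if query in seq]


def parallel_search(records, query, n_nodes=4):
    """Search every shard concurrently and merge the partial results."""
    shards = split_database(records, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partial = list(pool.map(search_shard, shards, [query] * n_nodes))
    return sorted(name for hits in partial for name in hits)


if __name__ == "__main__":
    db = {"a": "ACGTACGT", "b": "TTTT", "c": "GGACGG", "d": "AC"}
    print(parallel_search(db, "ACG", n_nodes=2))
```

In a real deployment each shard would be loaded once into the memory of its own node and kept there, so repeated queries avoid disk access entirely. Balancing shards by total sequence length rather than by record count keeps the per-node search time roughly even.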

Angulo Consulting

I am the president of Angulo Consulting, established in 1985. We consult in several different areas, listed below.

Ruby on Rails

Ruby on Rails Consulting (including Python and Perl)
see http://AnguloConsulting.com/RubyOnRails.php

Object Oriented Development Consulting

Object Oriented Development Consulting (including C++, Java, and C#)
see http://AnguloConsulting.com/Programming.php

Java Technologies Consulting

Java Technologies Consulting (including Ant, JUnit, Eclipse, J2EE, JBoss, Web Services, Client-Server architectures, and Java GUI applications)
see http://AnguloConsulting.com/Java.php

Web Programming Consulting

Web Programming Consulting (including JavaScript, AJAX, Servlets, Applets, PHP, CGI, JSP, Tomcat, Axis, and E-commerce)
see http://AnguloConsulting.com/WebApps.php

Microsoft platform Web Programming Consulting

Microsoft platform Web Programming Consulting (including .NET, C#, Web services, ActiveX, and ASP)
see http://AnguloConsulting.com/WebApps.php

XML Consulting

XML Consulting (including AJAX, XSLT, JaxB, SAX, DOM, SOAP, and WSDL)
see http://AnguloConsulting.com/Java.php

Database Consulting

Database Consulting (including MySQL, PostgreSQL, SQL Server, and Oracle)
see http://AnguloConsulting.com/Database.php

Business Applications

Angulo Consulting can help you with business applications consulting, accounting applications consulting, manufacturing applications consulting, and inventory applications consulting.

Bioinformatics Consulting

I am a world leader in Bioinformatics applications.
For Bioinformatics Consulting see http://AnguloConsulting.com/Bioinformatics.php

Windows, Linux, Web Applications, and Web Services

Angulo Consulting can help you with Windows, Linux, Web Applications, and Web Services consulting.

Linux Consulting

For Linux Consulting see http://AnguloConsulting.com/Linux.php

Project Management Consulting

Angulo Consulting can develop applications as part of the client's team, bring in our own team, or serve as project manager for the client's team. For Project Management Consulting see http://AnguloConsulting.com/SoftwareDevelop.php

Agile Consulting

The consultants at Angulo Consulting always develop high-quality software using the techniques of Agile development. For Agile Consulting see http://AnguloConsulting.com/Agile.php