My research interests lie in machine learning for bioinformatics, machine learning applications, and data clustering.

For the last 10 years, I have been involved in machine learning for bioinformatics. My research efforts focus on developing and applying computational techniques for the analysis of biological data and the modeling of biological processes at the molecular level. The broad aim is to provide computational tools that help researchers understand, explain, and predict the behavior of complex biological systems. My research activities take place in the Cancer Systems Biology Laboratory.

I am currently involved in the following research projects:

  • Genome Annotation based on Sequence Analysis

For the analysis of sequences, we have developed a generative system based on feature space mapping, called Subsequence Profile Map (SPMap). Its output can be fed to a discriminative classifier for prediction purposes. Instead of focusing on motifs in a sequence, SPMap considers all of the subsequences as a distribution over a quantized space, discretizing and reducing the dimension of the otherwise huge space of all possible subsequences. We have already applied SPMap and other feature-generating methods to the following problems:

Automated Protein Function Prediction
Enzyme Class Prediction
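
The idea of treating all subsequences as a distribution over a quantized space can be illustrated with a minimal sketch. This is not the SPMap implementation: the fixed subsequence length `k`, the Hamming distance, and the hand-picked `centers` are assumptions made only for illustration.

```python
def subsequences(seq, k):
    """Return every contiguous subsequence of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def quantize(sub, centers):
    """Index of the nearest center (assumed quantization rule; ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda i: hamming(sub, centers[i]))


def profile_vector(seq, centers, k):
    """Fraction of the sequence's subsequences that fall into each quantization cell."""
    counts = [0] * len(centers)
    subs = subsequences(seq, k)
    for s in subs:
        counts[quantize(s, centers)] += 1
    total = len(subs)
    return [c / total for c in counts] if total else [0.0] * len(centers)


# Hypothetical centers; in practice they would be learned from training sequences.
centers = ["AAA", "CCC"]
features = profile_vector("AAACCC", centers, k=3)
```

The resulting fixed-length vector has one entry per quantization cell, regardless of sequence length, which is what makes it usable as input to a standard classifier.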

  • Generating Readable Layouts for Biological Graphs

Although force-directed layout algorithms can be used to draw biological graphs, they must be modified to embed domain-specific knowledge. We proposed EClerize, a modified and improved Kamada-Kawai force-directed layout algorithm, to generate more readable layouts for biological graphs that represent pathways in which the vertices are identified with EC (Enzyme Commission) numbers. Vertices with the same EC class number are treated as members of the same cluster, and the positions of vertices within a cluster are affected by the biological similarity of the vertices in that cluster and the theoretical length between them.
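
One way to picture how domain knowledge might enter a Kamada-Kawai style layout is through the ideal pairwise distances that the optimizer tries to realize. The sketch below is not EClerize itself: it assumes that the theoretical length is the hop count between two vertices, and that this length is shortened in proportion to the number of leading EC levels the two vertices share. The `shrink` parameter and the similarity rule are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def shared_ec_levels(a, b):
    """Number of leading EC levels two EC numbers have in common (0 to 4)."""
    n = 0
    for x, y in zip(a.split("."), b.split(".")):
        if x != y:
            break
        n += 1
    return n


def target_lengths(adj, ec_numbers, shrink=0.5):
    """Ideal distance per vertex pair: hop count, shortened for similar enzymes (assumed rule)."""
    targets = {}
    for u in adj:
        for v, hops in bfs_distances(adj, u).items():
            if u == v:
                continue
            similarity = shared_ec_levels(ec_numbers[u], ec_numbers[v]) / 4
            targets[(u, v)] = hops * (1 - shrink * similarity)
    return targets


# Toy pathway a - b - c with made-up EC annotations.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
ec = {"a": "1.1.1.1", "b": "1.2.3.4", "c": "2.7.1.1"}
targets = target_lengths(adj, ec)
```

Vertices in the same EC class receive shorter ideal distances and so tend to be drawn closer together, while unrelated vertices keep their plain graph-theoretic distance.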


  • Evaluating the Biological Activity of Genes and Processes in Pathways

As an alternative to existing functional enrichment methods, which identify significant biological processes or pathways on the basis of experimental data, we have proposed and developed an approach to assess the activity of cellular pathways. The approach converts the pathway to a directed graph and applies a score flow algorithm that initializes the scores of pathway nodes from experimental data and then iteratively updates them until convergence is reached. The algorithm has been implemented as a Cytoscape plug-in, Pathway Scoring Application, and tested on different sets of paired transcriptome/ChIP-seq data using KEGG pathways. It has been further tested as an in silico gene knockout tool on a manually constructed pathway. Our current effort is on developing a probabilistic computational method for this approach.
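
The general shape of an iterative score flow on a directed graph, including graphs with cycles, can be sketched as follows. This is a simplified illustration and not the published algorithm: the update rule (each node keeps its initial score and receives a damped, evenly split share of its predecessors' scores), the `damping` factor, and the convergence tolerance are all assumptions.

```python
def score_flow(edges, initial, damping=0.5, tol=1e-9, max_iter=1000):
    """Propagate scores along directed edges until they stop changing.

    edges   -- list of (source, target) pairs; every node must appear in `initial`
    initial -- dict mapping node -> score derived from experimental data
    """
    nodes = list(initial)
    out_deg = {n: 0 for n in nodes}
    preds = {n: [] for n in nodes}
    for u, v in edges:
        out_deg[u] += 1
        preds[v].append(u)

    scores = dict(initial)
    for _ in range(max_iter):
        # Synchronous update: all new scores are computed from the previous round.
        new = {
            v: initial[v] + damping * sum(scores[u] / out_deg[u] for u in preds[v])
            for v in nodes
        }
        delta = max(abs(new[n] - scores[n]) for n in nodes)
        scores = new
        if delta < tol:
            break
    return scores


# A linear cascade and a two-node cycle, with made-up initial scores.
chain = score_flow([("a", "b"), ("b", "c")], {"a": 1.0, "b": 0.0, "c": 0.0})
cycle = score_flow([("a", "b"), ("b", "a")], {"a": 1.0, "b": 0.0})
```

With a damping factor below 1 the update is a contraction, so the iteration also settles on cyclic pathways. Setting a node's initial score to zero and removing its outgoing edges would be one simple way to mimic a knockout in this toy setting.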

Z. Isik, T. Ersahin, V. Atalay, C. Aykanat, R. Cetin-Atalay, “A signal transduction score flow algorithm for cyclic cellular pathway analysis, which combines transcriptome and ChIP-seq data”, Molecular BioSystems, Vol. 8, pp.3224-3231, 2012, doi:10.1039/C2MB25215E.

Z. Işık, V. Atalay, R. Çetin-Atalay, “Evaluation of Signaling Cascades Based on the Weights from Microarray and ChIP-seq Data”, Journal of Machine Learning Research W&C Proceedings, MIT Press, Vol. 8, pp.44-54, 2010.

Z. Işık, V. Atalay, C. Aykanat, R. Çetin-Atalay, “Data and Model Driven Hybrid Approach to Activity Scoring of Cyclic Pathways”, 25th International Symposium on Computer and Information Sciences (ISCIS 2010), London, UK, September 22-24, 2010, Lecture Notes in Electrical Engineering, Springer, Vol. 62, pp.91-94.

Z. Isik, V. Atalay, R. Cetin-Atalay, “Evaluation of Signaling Cascades Based on the Weights from Microarray and ChIP-seq Data”, Third International Workshop on Machine Learning in Systems Biology (MLSB 2009), Ljubljana, Slovenia, September 5-6, 2009.

  • Identification of Novel Reference Genes Based on MeSH Categories

Even the most frequently used reference genes are subject to differential regulation under specific treatments or between different cell lines or tissues. We have devised a method that provides alternative reference gene lists for global and cell-type-specific normalization of transcriptome data. Gene lists are scored by expression stability and classified according to the Medical Subject Headings (MeSH) associated with the published transcriptome study, as indexed by the National Library of Medicine.
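
Scoring genes by expression stability can be illustrated with a small sketch. The stability measure used here, the coefficient of variation across samples, is an assumption chosen for simplicity and is not necessarily the measure used in our method. The gene names and values are made up.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean; lower means more stable."""
    m = mean(values)
    return pstdev(values) / m if m else float("inf")


def rank_by_stability(expression):
    """Order genes from most to least stable across samples.

    expression -- dict mapping gene name -> list of expression values, one per sample
    """
    return sorted(expression, key=lambda g: coefficient_of_variation(expression[g]))


# Hypothetical expression values for three genes across three samples.
expression = {
    "geneA": [10.0, 10.0, 10.0],
    "geneB": [5.0, 10.0, 15.0],
    "geneC": [9.0, 10.0, 11.0],
}
ranking = rank_by_stability(expression)
```

Running the same ranking separately on the samples that belong to each MeSH category would yield category-specific candidate lists alongside a global one.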

L. Çarkacıoğlu, R. Çetin-Atalay, Ö. Konu, V. Atalay, T. Can, “Bi-k-Bi Clustering: Mining Large Scale Gene Expression Data Using Two-Level Biclustering”, International Journal of Data Mining and Bioinformatics, Inderscience, Vol. 4, No. 6, pp.701-721, 2010, doi:10.1504/IJDMB.2010.037548.