Hierarchical Approach for Multi-label Protein Function Prediction

Protein function prediction is important for understanding life at the molecular level and is therefore in high demand in biomedical research and pharmaceutical applications. To address this task, in this work I propose a novel multi-label protein function prediction method based on a hierarchical approach. In hierarchical multi-label classification problems, unlike conventional classification, each instance can be assigned to two or more classes simultaneously. The proposed methodology consists of three phases: creation of clusters, generation of class vectors, and classification of instances. First, clusters are created using a hybrid of the K-Nearest Neighbor and Expectation Maximization algorithms (KNN+EM). Class vectors are then generated from these clusters. Finally, protein function prediction is carried out in the classification stage. The performance of the proposed method is extensively tested on five protein datasets and compared in terms of accuracy with that of two existing methods. Experimental results show that the proposed multi-label protein function prediction method is significantly superior to the existing methods across the different datasets.
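The three phases can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation. The abstract does not say how KNN and EM are combined, so phase 1 uses a plain hard-assignment EM over isotropic, equal-variance Gaussians (which reduces to Lloyd's k-means) and leaves out the KNN part. The class vector of a cluster is assumed to be the fraction of its members annotated with each class. A new protein is assumed to receive every class whose entry in its nearest cluster's vector reaches a threshold. The function names, the 0.5 threshold, and the toy data are hypothetical.

```python
import math
import random

def hard_em(points, k, iters=20, seed=0):
    """Phase 1 stand-in (assumption): hard-assignment EM for isotropic,
    equal-variance Gaussians, i.e. Lloyd's k-means. The KNN half of the
    KNN+EM hybrid is not reproduced here."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # E-step (hard): each point goes to its nearest, i.e. most likely, component
        assign = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # M-step: re-estimate each component mean from its current members
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers

def class_vectors(assign, labels, k, n_classes):
    """Phase 2 (assumption): per cluster, the fraction of members carrying each class."""
    sums = [[0.0] * n_classes for _ in range(k)]
    counts = [0] * k
    for a, classes in zip(assign, labels):
        counts[a] += 1
        for y in classes:
            sums[a][y] += 1
    return [[s / counts[c] if counts[c] else 0.0 for s in sums[c]] for c in range(k)]

def predict(x, centers, vecs, threshold=0.5):
    """Phase 3 (assumption): use the nearest cluster's class vector, thresholded."""
    c = min(range(len(centers)), key=lambda i: math.dist(x, centers[i]))
    return {j for j, v in enumerate(vecs[c]) if v >= threshold}

# Hypothetical toy data: 2-D feature vectors, each with a set of class indices.
X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
Y = [{0, 1}, {0}, {0, 1}, {2}, {2}, {1, 2}]

assign, centers = hard_em(X, k=2)
vecs = class_vectors(assign, Y, k=2, n_classes=3)
print(predict((0.5, 0.5), centers, vecs))    # {0, 1}
print(predict((10.5, 10.5), centers, vecs))  # {2}
```

Because an instance inherits every sufficiently frequent class of its cluster, one prediction can contain several classes at once, which is the multi-label behaviour the method targets.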
Single-label classification learns from a set of instances, each associated with a unique class label from a set of disjoint class labels L. Multi-label classification learns from a set of instances in which each instance belongs to one or more classes in L. Text data sets can be binary, multi-class, or multi-label in nature. In the first two categories, only a single class label can be associated with a document; in multi-label data, more than one class label can be associated with a document at the same time. Even in a multi-label data set, however, not all combinations of class labels appear, and the combinations that do appear occur with different probabilities. This indicates that the class labels are correlated and that the strength of the correlation varies from one pair of labels to another. Most traditional approaches in the multi-label classification literature transform the multi-label problem into multi-class or binary problems. For example, if there are T class labels, one binary SVM classifier (i.e., a one-vs-rest SVM) can be trained for each label, and the outputs of these T classifiers can be merged to obtain the final prediction. However, because each classifier is trained independently, this transformation ignores the correlations among class labels and therefore does not give a faithful interpretation of the data.
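The one-vs-rest transformation described above (often called binary relevance) can be sketched as follows, together with counts of how label combinations and label pairs occur. This is a hypothetical pure-Python illustration: a simple perceptron stands in for the binary SVM mentioned in the text, and the toy data and function names are assumptions. Each per-label model is fitted on its own 0/1 target, so nothing in the fitted models reflects the co-occurrence counts.

```python
from collections import Counter
from itertools import combinations

def label_combinations(label_sets):
    """How often each exact combination of labels occurs."""
    return Counter(frozenset(s) for s in label_sets)

def pair_cooccurrence(label_sets):
    """How often each unordered pair of labels occurs on the same instance."""
    counts = Counter()
    for s in label_sets:
        for pair in combinations(sorted(s), 2):
            counts[pair] += 1
    return counts

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_perceptron(X, targets, epochs=20):
    """Binary perceptron (stand-in for a linear SVM); targets are 0 or 1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, targets):
            pred = 1 if score(w, b, x) > 0 else 0
            if pred != t:
                step = 1 if t == 1 else -1
                w = [wi + step * xi for wi, xi in zip(w, x)]
                b += step
    return w, b

def binary_relevance_fit(X, label_sets, labels):
    """One-vs-rest: an independent binary model per label, blind to the others."""
    return {label: train_perceptron(X, [1 if label in s else 0 for s in label_sets])
            for label in labels}

def binary_relevance_predict(models, x):
    """Merge the per-label decisions into one predicted label set."""
    return {label for label, (w, b) in models.items() if score(w, b, x) > 0}

# Hypothetical toy data: feature i is set exactly when label i is present.
X = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 1), (0, 0, 0)]
Y = [{"A"}, {"B"}, {"A", "B"}, {"A", "C"}, set()]

print(len(label_combinations(Y)), "of 8 possible label combinations occur")
print(pair_cooccurrence(Y))
models = binary_relevance_fit(X, Y, ["A", "B", "C"])
print(sorted(binary_relevance_predict(models, (0, 1, 1))))  # ['B', 'C']
```

On this toy data, the merged prediction for (0, 1, 1) is {B, C}. That label combination never occurs in the training set, and B and C never co-occur there. The models produce it anyway because the B and C classifiers are fitted independently.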
Additionally, Ernando described a hierarchical multi-label classification ant colony algorithm for protein function prediction. Their paper addresses the hierarchical multi-label classification problem of protein function prediction. This is a very active research field, given the large increase in the number of uncharacterized proteins available for analysis and the importance of determining their functions to improve current biological knowledge. In this type of problem, each example may belong to multiple class labels, and the class labels are organized in a hierarchical structure, either a tree or a Directed Acyclic Graph (DAG). This makes the problem more complex than conventional flat classification, because the classification algorithm has to take into account the hierarchical relationships between class labels and be able to predict multiple class labels for the same example. Their ant colony optimization (ACO) algorithm discovers an ordered list of hierarchical multi-label classification rules. It was evaluated on sixteen challenging bioinformatics data sets involving hundreds or thousands of class labels and compared against state-of-the-art decision tree induction algorithms for hierarchical multi-label classification.
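The sketch below shows how an ordered list of hierarchical multi-label rules might be applied to a protein while respecting a DAG of classes. It is a simplified, hypothetical example rather than the ACO algorithm itself. Rule discovery is not shown, each rule is simplified to predict a fixed set of classes, and the attribute names (`motif_a`, `motif_b`, `length`), class identifiers, and hierarchy are invented for illustration. The first matching rule fires. Its prediction is then extended with all ancestor classes, so predicting a class always implies predicting its superclasses.

```python
# Hypothetical class hierarchy as a DAG: class -> set of direct parents.
PARENTS = {
    "1": set(), "2": set(),
    "1.1": {"1"}, "1.2": {"1"}, "2.1": {"2"},
    "1.1.1": {"1.1"},
    "1.2.1": {"1.2", "2.1"},  # two parents, so this is a DAG rather than a tree
}

def ancestors(cls, parents):
    """All ancestors of cls, following parent links up to the top level."""
    found, stack = set(), [cls]
    while stack:
        for p in parents[stack.pop()]:
            if p not in found:
                found.add(p)
                stack.append(p)
    return found

def close_under_hierarchy(classes, parents):
    """Add every ancestor, so predicting a class also predicts its superclasses."""
    result = set(classes)
    for c in classes:
        result |= ancestors(c, parents)
    return result

# Hypothetical ordered rule list: (condition on attributes, predicted classes).
RULES = [
    (lambda x: x["motif_a"] and x["length"] > 300, {"1.1.1"}),
    (lambda x: x["motif_b"], {"1.2.1"}),
]
DEFAULT = {"2"}  # used when no rule matches

def predict(x, rules=RULES, default=DEFAULT, parents=PARENTS):
    for condition, classes in rules:  # rules are tried in order
        if condition(x):              # the first matching rule fires
            return close_under_hierarchy(classes, parents)
    return close_under_hierarchy(default, parents)

print(sorted(predict({"motif_a": False, "motif_b": True, "length": 120})))
# ['1', '1.2', '1.2.1', '2', '2.1']
```

Because class "1.2.1" has two parents, one fired rule yields classes on two separate branches of the hierarchy. This is the combination of multi-label and hierarchical output that flat classifiers do not handle.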
For more details, see: Biomedical Research
Media Contact:
Joel James
Managing Editor
Biomedical Research