Frequency: Quarterly E-ISSN: 2250-2920 P-ISSN: Awaited Abstracted/Indexed in: Ulrich's International Periodical Directory, Google Scholar, SCIRUS, getCITED, Genamics JournalSeek
Published quarterly in print and online, "Inventi Impact: Bioinformatics" publishes high-quality unpublished as well as high-impact pre-published research and reviews catering to the needs of researchers and professionals from the IT and life sciences domains. It focuses on storing, retrieving, organizing and analyzing biological data, and on new developments in genome bioinformatics and computational biology.
Background: Pulmonary acoustic parameters extracted from recorded respiratory sounds provide valuable information for the detection of respiratory pathologies. The automated analysis of pulmonary acoustic signals can serve as a differential diagnosis tool for medical professionals, a learning tool for medical students, and a self-management tool for patients. In this context, we intend to evaluate and compare the performance of the support vector machine (SVM) and K-nearest neighbour (K-nn) classifiers in diagnosing respiratory pathologies using respiratory sounds from the R.A.L.E. database.
Results: The pulmonary acoustic signals used in this study were obtained from the R.A.L.E. lung sound database. The pulmonary acoustic signals were manually categorised into three different groups, namely normal, airway obstruction pathology, and parenchymal pathology. The mel-frequency cepstral coefficient (MFCC) features were extracted from the pre-processed pulmonary acoustic signals. The MFCC features were analysed by one-way ANOVA and then fed separately into the SVM and K-nn classifiers. The performances of the classifiers were analysed using the confusion matrix technique. The statistical analysis of the MFCC features using one-way ANOVA showed that the extracted MFCC features are significantly different (p < 0.001). The classification accuracies of the SVM and K-nn classifiers were found to be 92.19% and 98.26%, respectively.
Conclusion: Although the data used to train and test the classifiers are limited, the classification accuracies found are satisfactory. The K-nn classifier performed better than the SVM classifier for the discrimination of pulmonary acoustic signals from pathological and normal subjects obtained from the R.A.L.E. database....
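As an illustration of the classification pipeline described above, the following is a minimal sketch (not the authors' code) of extracting MFCC features from lung-sound recordings and comparing SVM and K-nn classifiers; the file list, labels, and the use of librosa and scikit-learn are assumptions made for the example.

```python
# Minimal sketch (not the authors' code): MFCC features from lung-sound recordings
# classified with SVM and K-nn, assuming WAV files and labels are already available.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def mfcc_features(wav_path, n_mfcc=13):
    """Load a recording and summarise it as the mean MFCC vector over time."""
    signal, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)  # one fixed-length feature vector per recording


def classify(wav_paths, labels):
    """Train and compare SVM and K-nn on MFCC features
    (labels: normal / airway obstruction / parenchymal)."""
    X = np.array([mfcc_features(p) for p in wav_paths])
    y = np.array(labels)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)
    for name, clf in [("SVM", SVC(kernel="rbf")),
                      ("K-nn", KNeighborsClassifier(n_neighbors=3))]:
        clf.fit(X_tr, y_tr)
        print(name, "accuracy:", clf.score(X_te, y_te))
```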
Background: Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation.
Results: For simulations, SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered eight phylogenies that resolved the basal relationships among mammals using datasets with different levels of missing data. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets.
Conclusions: SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly. SISRS is open source and freely available at https://github.com/rachelss/SISRS/releases....
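The following toy sketch illustrates only the underlying idea of locating parsimony-informative variable sites in homologous sequences; it is not the SISRS pipeline, and the taxa and sequences are hypothetical.

```python
# Illustrative sketch only (not the SISRS pipeline): finding parsimony-informative
# variable sites in a set of homologous sequences, one per taxon, all the same length.
from collections import Counter

def informative_sites(seqs):
    """Return column indices where at least two alleles are each shared by >= 2 taxa."""
    taxa = list(seqs.values())
    length = len(taxa[0])
    sites = []
    for i in range(length):
        counts = Counter(s[i] for s in taxa if s[i] not in "N-")
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            sites.append(i)
    return sites

# hypothetical homologous fragments
example = {
    "human":   "ACGTACGTAC",
    "chimp":   "ACGTACGTAC",
    "gorilla": "ACGAACGTAC",
    "orang":   "ACGAACGTTC",
}
print(informative_sites(example))  # columns that carry phylogenetic signal
```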
Genetic microarrays give researchers a huge amount of data on many diseases, represented by gene expression intensities. In genomic medicine, gene expression analysis is aimed at finding strategies for the prevention and treatment of diseases with high mortality rates, such as the different cancers. Genomic medicine therefore requires the use of complex information technology. The purpose of our paper is to present a multi-agent system developed to improve gene expression analysis by automating tasks such as the identification of genes involved in a cancer and the classification of tumors according to molecular biology. The agents that make up the system read files of gene intensity data from microarrays, pre-process this information, and, using machine learning methods, form groups of genes involved in the disease process and classify samples, which could suggest new tumor subtypes that are difficult to identify based on their morphology. Our results show that the multi-agent system requires minimal user intervention and that the agents generate knowledge that reduces the time and complexity of prevention and diagnosis work, thus allowing more effective treatment of tumors....
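As a rough illustration of the machine-learning steps such agents might automate (grouping genes and classifying tumor samples), the sketch below uses scikit-learn on a hypothetical expression matrix; it is not the authors' multi-agent implementation, and the data layout, cluster count and classifier choice are assumptions.

```python
# Sketch of the kind of machine-learning steps the agents automate (assumed data layout:
# rows = samples, columns = genes); this is not the authors' multi-agent implementation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
expression = rng.normal(size=(60, 200))          # hypothetical intensity matrix
tumor_labels = rng.integers(0, 2, size=60)       # hypothetical tumor subtypes

# Group genes by their expression profiles across samples (genes as observations).
gene_clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(expression.T)

# Classify tumor samples from their expression profiles.
scores = cross_val_score(RandomForestClassifier(random_state=0),
                         expression, tumor_labels, cv=5)
print("gene cluster sizes:", np.bincount(gene_clusters))
print("sample classification accuracy:", scores.mean())
```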
Background: Human leukocyte antigen (HLA) genes are critical genes involved in important biomedical aspects, including organ transplantation, autoimmune diseases and infectious diseases. The gene family contains the most polymorphic genes in humans, and in many cases the difference between two alleles is only a single base pair substitution. Next generation sequencing (NGS) technologies can be used for high-throughput HLA typing, but in silico methods are still needed to correctly assign the alleles of a sample. Computer scientists have developed such methods for various NGS platforms, such as Illumina, Roche 454 and Ion Torrent, based on the characteristics of the reads they generate. However, the method for PacBio reads was less addressed, probably owing to its high error rates. The PacBio system has the longest read length among available NGS platforms, and therefore is the only platform capable of having exon 2 and exon 3 of HLA genes on the same read to unequivocally solve the ambiguity problem caused by the "phasing" issue.
Results: We proposed a new method, Bayes Typing1, to assign HLA alleles for PacBio circular consensus sequencing reads using Bayes' theorem. The method was applied to simulated data of the three loci HLA-A, HLA-B and HLA-DRB1. The experimental results showed its capability to tolerate the disturbance of sequencing errors and external noise reads.
Conclusions: The Bayes Typing1 method could overcome, to some extent, the problems of HLA typing using PacBio reads, which mostly arise from sequencing errors of PacBio reads and the divergence of HLA genes....
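A highly simplified sketch of Bayes-theorem-based allele assignment is given below; the allele names, sequences and error model are hypothetical, and the actual Bayes Typing1 scoring is not reproduced.

```python
# Highly simplified sketch of Bayes-theorem-based allele assignment (not the Bayes Typing1
# implementation): score each candidate allele pair by the posterior probability of the
# reads, assuming independent per-base sequencing errors and a flat prior over pairs.
from itertools import combinations_with_replacement
import math

def read_log_likelihood(read, allele, error_rate=0.1):
    """log P(read | allele) for equal-length sequences with independent base errors."""
    ll = 0.0
    for r, a in zip(read, allele):
        ll += math.log(1 - error_rate) if r == a else math.log(error_rate / 3)
    return ll

def best_genotype(reads, alleles):
    """Pick the allele pair maximising the posterior (flat prior over pairs)."""
    best, best_ll = None, -math.inf
    for a1, a2 in combinations_with_replacement(alleles, 2):
        # each read is explained by whichever of the two alleles fits it better
        ll = sum(max(read_log_likelihood(r, alleles[a1]),
                     read_log_likelihood(r, alleles[a2])) for r in reads)
        if ll > best_ll:
            best, best_ll = (a1, a2), ll
    return best

# hypothetical alleles and noisy reads
alleles = {"A*01": "ACGTACGT", "A*02": "ACGTACCT", "A*03": "ATGTACGT"}
reads = ["ACGTACGT", "ACGTACCT", "ACGTACGT"]
print(best_genotype(reads, alleles))
```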
Identifying the various gene expression response patterns is a challenging issue in expression microarray time-course experiments. Due to heterogeneity in the regulatory reaction among the thousands of genes tested, it is impossible to manually characterize a parametric form for each time-course pattern in a gene-by-gene manner. We introduce a growth curve model with fractional polynomials to automatically capture the various time-dependent expression patterns while efficiently handling missing values due to incomplete observations. For each gene, our procedure compares the performances of fractional polynomial models with power terms from a set of fixed values that offer a wide range of curve shapes and suggests a best-fitting model. After a limited simulation study, the model has been applied to our human in vivo irritated epidermis data with missing observations to investigate time-dependent transcriptional responses to a chemical irritant. Our method was able to identify the various nonlinear time-course expression trajectories. The integration of growth curves with fractional polynomials provides a flexible way to model different time-course patterns, together with model selection and significant gene identification strategies that can be applied in microarray-based time-course gene expression experiments with missing observations....
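The sketch below illustrates degree-1 fractional-polynomial model selection for a single gene's time course, with missing values simply dropped; the candidate powers, AIC penalty and toy data are assumptions, and the authors' full procedure is not reproduced.

```python
# Simplified sketch of degree-1 fractional-polynomial model selection for one gene's
# time course (not the authors' full procedure); missing observations are dropped.
import numpy as np

POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)      # conventional fractional-polynomial powers

def fp_transform(t, p):
    return np.log(t) if p == 0 else t ** p    # power 0 is conventionally defined as log(t)

def best_fractional_polynomial(time, expr):
    """Fit y = b0 + b1 * t^p for each candidate power and return the AIC-best model."""
    keep = ~np.isnan(expr)                     # handle missing values by dropping them
    t, y = np.asarray(time, float)[keep], np.asarray(expr, float)[keep]
    best = None
    for p in POWERS:
        X = np.column_stack([np.ones_like(t), fp_transform(t, p)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = np.sum((y - X @ beta) ** 2)
        aic = len(y) * np.log(rss / len(y)) + 2 * 3   # 2 coefficients + error variance
        if best is None or aic < best[0]:
            best = (aic, p, beta)
    return best

# hypothetical time points (hours) and expression values with one missing observation
time = np.array([1, 2, 4, 8, 24], float)
expr = np.array([0.2, 0.9, 1.6, np.nan, 3.1])
print(best_fractional_polynomial(time, expr))
```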
Gene function prediction is a complicated and challenging hierarchical multi-label classification (HMC) task, in which genes may have many functions at the same time and these functions are organized in a hierarchy. This paper proposes a novel HMC algorithm for solving this problem based on the Gene Ontology (GO), whose hierarchy is a directed acyclic graph (DAG) and is therefore more difficult to tackle. In the proposed algorithm, the HMC task is first transformed into a set of binary classification tasks. Then, two measures are implemented in the algorithm to enhance the HMC performance by considering the hierarchy structure during the learning procedures. Firstly, a negative instance selection policy combined with the SMOTE approach is proposed to alleviate the imbalanced data set problem. Secondly, a node interaction method is introduced to combine the results of the binary classifiers; it guarantees that the predictions are consistent with the hierarchy constraint. Experiments on eight benchmark yeast data sets annotated with the Gene Ontology show the promising performance of the proposed algorithm compared with other state-of-the-art algorithms...
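The sketch below illustrates only the hierarchy-consistency idea (one binary classifier per GO term, with a child term's probability capped by its parents'); the toy DAG and the use of logistic regression are assumptions, and the paper's SMOTE-based sampling and node interaction method are not reproduced.

```python
# Sketch of the hierarchy-constraint idea only: one binary classifier per GO term, with
# each term's predicted probability capped by the minimum over its parents in the DAG.
import numpy as np
from sklearn.linear_model import LogisticRegression

# hypothetical tiny GO DAG: child -> list of parents; TERMS is in topological order
PARENTS = {"GO:B": ["GO:A"], "GO:C": ["GO:A"], "GO:D": ["GO:B", "GO:C"]}
TERMS = ["GO:A", "GO:B", "GO:C", "GO:D"]

def train_per_term(X, Y):
    """Y[term] is a 0/1 vector; train one binary classifier per GO term."""
    return {t: LogisticRegression(max_iter=1000).fit(X, Y[t]) for t in TERMS}

def predict_consistent(models, x):
    """Predict probabilities and enforce that a child never exceeds any of its parents."""
    prob = {t: models[t].predict_proba(x.reshape(1, -1))[0, 1] for t in TERMS}
    for t in TERMS:
        for parent in PARENTS.get(t, []):
            prob[t] = min(prob[t], prob[parent])
    return prob

# hypothetical training data: 50 genes, 10 features, random annotations
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
Y = {t: rng.integers(0, 2, size=50) for t in TERMS}
print(predict_consistent(train_per_term(X, Y), X[0]))
```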
Ebola virus (EBOV) is a deadly virus that has caused several fatal outbreaks. Recently it caused another outbreak, resulting in thousands of afflicted cases. An effective and approved vaccine or therapeutic treatment against this virus is still absent. In this study, we aimed to predict B-cell epitopes from several EBOV-encoded proteins, which may aid in developing new antibody-based therapeutics or viral antigen detection methods against this virus. Multiple sequence alignment (MSA) was performed for the identification of conserved regions among the glycoprotein (GP), nucleoprotein (NP), and viral structural proteins (VP40, VP35, and VP24) of EBOV. Next, different consensus immunogenic and conserved sites were predicted from the conserved region(s) using various computational tools available in the Immune Epitope Database (IEDB). Among the GP, NP, VP40, VP35, and VP30 proteins, only NP gave a 100% conserved GEQYQQLR B-cell epitope that fulfills the ideal features of an effective B-cell epitope and could lead the way in the milieu of Ebola treatment. However, successful in vivo and in vitro studies are a prerequisite to determine the actual potency of our predicted epitope and to establish it as a preventive medication against all the fatal strains of EBOV....
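To illustrate the conservation step only (the IEDB epitope-prediction tools are not reproduced), the sketch below scans a toy protein alignment for fully conserved windows; the flanking residues around the GEQYQQLR peptide are hypothetical.

```python
# Sketch of the conservation step only: scan a protein multiple sequence alignment for
# windows that are 100% conserved across all strains, as candidate epitope regions.
def conserved_windows(alignment, window=8):
    """Return (start, peptide) for every fully conserved, gap-free window of given length."""
    length = len(alignment[0])
    hits = []
    for start in range(length - window + 1):
        columns = [seq[start:start + window] for seq in alignment]
        if all(c == columns[0] for c in columns) and "-" not in columns[0]:
            hits.append((start, columns[0]))
    return hits

# hypothetical aligned NP fragments from different EBOV strains
alignment = [
    "MKGEQYQQLRVQ",
    "MRGEQYQQLRVQ",
    "MKGEQYQQLRIQ",
]
print(conserved_windows(alignment, window=8))   # -> [(2, 'GEQYQQLR')]
```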
This paper presents a hybrid method to extract the endocardial contour of the right ventricle (RV) in 4 slices from a 3D echocardiography dataset. The overall framework comprises four processing phases. In Phase I, the region of interest (ROI) is identified by estimating the cavity boundary. Speckle noise reduction and contrast enhancement were implemented in Phase II as preprocessing tasks. In Phase III, the RV cavity region was segmented by generating an intensity threshold which was used once for all frames. Finally, Phase IV is proposed to extract the RV endocardial contour in a complete cardiac cycle using a combination of shape-based contour detection and an improved radial search algorithm. The proposed method was applied to 16 datasets of 3D echocardiography encompassing the RV in the long-axis view. The accuracy of the experimental results obtained by the proposed method was evaluated qualitatively and quantitatively by comparing the segmentation results of the RV cavity, based on endocardial contour extraction, with the ground truth. The comparative analysis results show that the proposed method performs efficiently in all datasets, with an overall performance of 95%, and the root mean square distance (RMSD) measure in terms of mean ± SD was found to be 2.21 ± 0.35 mm for RV endocardial contours....
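The sketch below shows a basic intensity-threshold plus radial-search boundary trace on a synthetic image; it is not the authors' improved radial search algorithm or their shape-based contour detection, and the assumption that the cavity is darker than the surrounding tissue is only illustrative.

```python
# Sketch of a basic radial-search boundary trace (not the authors' improved algorithm):
# threshold the image, then cast rays from a seed point inside the cavity and record
# where each ray first leaves the cavity region.
import numpy as np

def radial_contour(image, seed, threshold, n_rays=64, max_radius=200):
    """Return approximate boundary points (row, col) around a seed inside the cavity."""
    cavity = image < threshold                     # blood pool assumed darker than tissue
    r0, c0 = seed
    contour = []
    for angle in np.linspace(0, 2 * np.pi, n_rays, endpoint=False):
        for radius in range(1, max_radius):
            r = int(round(r0 + radius * np.sin(angle)))
            c = int(round(c0 + radius * np.cos(angle)))
            if not (0 <= r < image.shape[0] and 0 <= c < image.shape[1]) or not cavity[r, c]:
                contour.append((r, c))             # first point outside the cavity
                break
    return contour

# hypothetical toy image: dark circular cavity on a bright background
yy, xx = np.mgrid[0:100, 0:100]
toy = np.where((yy - 50) ** 2 + (xx - 50) ** 2 < 30 ** 2, 20, 200).astype(float)
print(len(radial_contour(toy, seed=(50, 50), threshold=100)))
```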
Background: Drug-target interaction prediction is of great significance for narrowing down the scope of candidate medications, and thus is a vital step in drug discovery. Because of the particularity of biochemical experiments, the development of new drugs is not only costly but also time-consuming. Therefore, the computational prediction of drug-target interactions has become an essential part of the drug discovery process, aiming to greatly reduce the experimental cost and time....
Background: Advances in cloning and sequencing technology are yielding a massive number of viral genomes. The classification and annotation of these genomes constitute important assets in the discovery of genomic variability, taxonomic characteristics and disease mechanisms. Existing classification methods are often designed for a specific, well-studied family of viruses. Thus, viral comparative genomic studies could benefit from more generic, fast and accurate tools for classifying and typing newly sequenced strains of diverse virus families.
Results: Here, we introduce a virus classification platform, CASTOR, based on machine learning methods. CASTOR is inspired by a well-known technique in molecular biology: restriction fragment length polymorphism (RFLP). It simulates, in silico, the restriction digestion of genomic material by different enzymes into fragments. It uses two metrics to construct feature vectors for machine learning algorithms in the classification step. We benchmark CASTOR on the classification of distinct datasets of human papillomaviruses (HPV), hepatitis B viruses (HBV) and human immunodeficiency viruses type 1 (HIV-1). Results reveal true positive rates of 99%, 99% and 98% for HPV Alpha species, HBV genotyping and HIV-1 M subtyping, respectively. Furthermore, CASTOR shows a competitive performance compared to well-known HIV-1-specific classifiers (REGA and COMET) on whole genomes and pol fragments.
Conclusion: The performance of CASTOR, together with its genericity and robustness, could permit novel and accurate large-scale virus studies. The CASTOR web platform provides open-access, collaborative and reproducible machine learning classifiers. CASTOR can be accessed at http://castor.bioinfo.uqam.ca....
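The sketch below illustrates the in silico RFLP idea of cutting a genome at enzyme recognition sites and summarising fragment lengths as features; the recognition motifs are simplified, and CASTOR's actual two metrics and classifiers are not reproduced.

```python
# Sketch of the in silico RFLP idea behind CASTOR (feature construction only): cut a
# genome at each enzyme's recognition site and summarise the resulting fragment lengths
# as a feature vector suitable for a machine learning classifier.
import numpy as np

# hypothetical, simplified recognition sites (real enzymes cut within/near these motifs)
ENZYMES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "BamHI": "GGATCC"}

def digest(sequence, site):
    """Return fragment lengths after cutting at every occurrence of the recognition site."""
    cuts, start = [], 0
    while True:
        pos = sequence.find(site, start)
        if pos == -1:
            break
        cuts.append(pos)
        start = pos + 1
    edges = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(edges, edges[1:]) if b > a]

def feature_vector(sequence, bins=(0, 100, 500, 1000, 5000, np.inf)):
    """Concatenate, per enzyme, a histogram of fragment lengths over fixed length bins."""
    features = []
    for site in ENZYMES.values():
        lengths = digest(sequence.upper(), site)
        hist, _ = np.histogram(lengths, bins=bins)
        features.extend(hist)
    return np.array(features)

# hypothetical toy genome
genome = "ATGAATTCGG" * 50 + "AAGCTT" + "GGATCC" * 3
print(feature_vector(genome))
```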