iDTi-CSsmoteB: Identification of Drug-Target Interaction Based on Drug Chemical Structure and Protein Sequence Using XGBoost With Over-Sampling Technique SMOTE

Bibliographic Details
Published in: IEEE Access, 2019, Vol. 7, p. 48699-48714
Main Authors: Mahmud, S. M. Hasan; Chen, Wenyu; Jahan, Hosney; Liu, Yongsheng; Sujan, Nasir Islam; Ahmed, Saeed
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c408t-53ecff2cf1094687e902a5b37dd2751d4650c6e3d7002c90619325f6628f5b563
cites cdi_FETCH-LOGICAL-c408t-53ecff2cf1094687e902a5b37dd2751d4650c6e3d7002c90619325f6628f5b563
container_end_page 48714
container_issue
container_start_page 48699
container_title IEEE access
container_volume 7
creator Mahmud, S. M. Hasan
Chen, Wenyu
Jahan, Hosney
Liu, Yongsheng
Sujan, Nasir Islam
Ahmed, Saeed
description Identifying interactions between drugs and proteins is a crucial challenge in drug discovery, and can lead researchers to develop novel drug compounds or to find new target proteins for existing drugs. Determining drug-target interactions (DTIs) through wet-lab experiments is extremely time-consuming, costly, and tedious. To date, multiple computational techniques have been presented to simplify the drug discovery process, but a huge number of interactions are still undiscovered. Furthermore, class imbalance, which can significantly degrade classification accuracy, is a critical challenge in this task that has not yet been effectively addressed. In this paper, we propose a novel high-throughput computational model, called iDTi-CSsmoteB, for the identification of DTIs based on drug chemical structures and protein sequences. More specifically, features are extracted from the protein sequence using position-specific scoring matrix (PSSM)-Bigram, amphiphilic pseudo amino acid composition (AM-PseAAC), and dipeptide PseAAC descriptors, which represent evolutionary and sequence information. The drug chemical structure is represented as a molecular substructure fingerprint (MSF), which describes the existence of functional fragments or groups. Finally, we use the SMOTE over-sampling technique to overcome the imbalance of the datasets and apply the XGBoost algorithm as a classifier to predict DTIs. To evaluate the performance of iDTi-CSsmoteB, several experiments were conducted on four benchmark datasets, namely enzyme, ion channel, GPCR, and nuclear receptor, using fivefold cross-validation. The experimental analysis shows that our model outperforms similar methods in terms of the area under the ROC curve (auROC). In addition, our results indicate the effectiveness of the feature extraction techniques, balancing methods, and classifier for predicting DTIs, which can provide a basis for new drug development.
The iDTi-CSsmoteB web server is available online at http://idticssmoteb-uestc.me/.
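The over-sampling step named in the description can be illustrated with a minimal sketch of the general SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. This is not the authors' implementation, which this record does not describe. The function name `smote`, its parameters, the random choice of the base sample, and the toy feature vectors are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    rng = random.Random(seed)
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up 2-D feature vectors: 2 interacting pairs vs 6 non-interacting pairs.
positives = [[0.0, 0.0], [1.0, 1.0]]
negatives = [[5.0, 5.0]] * 6

# Over-sample the minority class until both classes have the same size.
balanced_positives = positives + smote(
    positives, len(negatives) - len(positives), k=1
)
```

In a cross-validation setting, over-sampling of this kind is normally applied to the training folds only, so that synthetic points do not leak into the held-out fold. Whether the paper follows that protocol is not stated in this record.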
doi_str_mv 10.1109/ACCESS.2019.2910277
format article
fullrecord <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_proquest_journals_2455610815</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8686077</ieee_id><doaj_id>oai_doaj_org_article_bf50c48bca5b402e82f24cb14def2c44</doaj_id><sourcerecordid>2455610815</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-53ecff2cf1094687e902a5b37dd2751d4650c6e3d7002c90619325f6628f5b563</originalsourceid><addsrcrecordid>eNpNUV1r2zAUNWOFlq6_oC-CPTuTZEm299a4WRfoyMAu7ZuQ5atEIbEySR7s1-yvTqlLmV50ufeccz9Olt0SvCAE11_ummbVtguKSb2gNcG0LD9kV5SIOi94IT7-F19mNyHscXpVSvHyKvtr7zubN204ugjLr2g9wBitsVpF60bkDLr30zbvlN9CROsxglf6tbRUAQaUgjMANTs4JtIBtdFPOk4ekBoH9NMnWTuiFn5NMGpAT8GOW_TysHQuRPRs4w5tfoPPW3U8Hc6lDvRutAmN2h-bbvUpuzDqEODm7b_Onr6tuuZ7_rh5WDd3j7lmuIo5L0AbQ7VJB2GiKqHGVPG-KIeBlpwMTHCsBRRDiTHVNRakLig3QtDK8J6L4jpbz7qDU3t58vao_B_plJWvCee3Uvlo9QFkb5IWq3qdGjBMoaKGMt0TNkCagLGk9XnWOnmXFglR7t3kxzS-pIxzQdL1eUIVM0p7F4IH896VYHk2Vs7GyrOx8s3YxLqdWRYA3hmVqARO1X8HCZ8f</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2455610815</pqid></control><display><type>article</type><title>iDTi-CSsmoteB: Identification of Drug-Target Interaction Based on Drug Chemical Structure and Protein Sequence Using XGBoost With Over-Sampling Technique SMOTE</title><source>IEEE Open Access Journals</source><creator>Mahmud, S. M. Hasan ; Chen, Wenyu ; Jahan, Hosney ; Liu, Yongsheng ; Sujan, Nasir Islam ; Ahmed, Saeed</creator><creatorcontrib>Mahmud, S. M. Hasan ; Chen, Wenyu ; Jahan, Hosney ; Liu, Yongsheng ; Sujan, Nasir Islam ; Ahmed, Saeed</creatorcontrib><description>Identifying interaction between drug and protein is a crucial challenge in drug discovery, which can lead the researchers to develop novel drug compounds or new target proteins for the existing drugs. The determination of drug-target interactions (DTIs) is an extremely time-consuming, costly, and tedious task with wet-lab experiments. 
To date, multiple computational techniques have been presented to simplify the drug discovery process, but a huge number of interactions are still undiscovered. Furthermore, a class imbalance is a critical challenge regarding this experiment which can significantly degrade the classification accuracy that has not been effectively addressed yet. In this paper, we proposed a novel high-throughput computational model, called iDTi-CSsmoteB, for identification of DTIs based on drug chemical structures and protein sequences. More specifically, the protein sequence is extracted through position-specific scoring matrix (PSSM)-Bigram, amphiphilic pseudo amino acid composition (AM-PseAAC) and dipeptide PseAAC descriptors which represents evolutionary and sequence information. The drug chemical structure is represented as a molecular substructure fingerprint (MSF) which describes the existence of the functional fragments or groups. Finally, we used the over-sampling SMOTE technique to overcome the imbalance issue of the datasets and applied XGBoost algorithm as a classifier to predict DTIs. To evaluate the performance of iDTi-CSsmoteB, several experiments have been conducted on four benchmark datasets, namely, enzyme, ion channel, GPCR, and nuclear receptor based on fivefold cross validation. The experimental analysis exhibits that our model outperforms similar methods in terms of area under the ROC (auROC) curve. In addition, our achieved results indicate the effectiveness of the feature extraction techniques, balancing methods, and classifier for predicting the DTIs which can provide substance for new drug development. 
iDTi-CSsmoteB webserver is available online at http://idticssmoteb-uestc.me/ .</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2019.2910277</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Algorithms ; AM-PseAAC ; Chemicals ; Classifiers ; Datasets ; DP-PseAAC ; drug-target interactions ; Drugs ; Feature extraction ; Ion channels ; Molecular structure ; molecular substructure fingerprint ; over-sampling SMOTE ; Predictive models ; Protein sequence ; Proteins ; PSSM-Bigram ; Sampling methods ; Substructures ; Target recognition ; XGBoost classifier</subject><ispartof>IEEE access, 2019, Vol.7, p.48699-48714</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2019</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-53ecff2cf1094687e902a5b37dd2751d4650c6e3d7002c90619325f6628f5b563</citedby><cites>FETCH-LOGICAL-c408t-53ecff2cf1094687e902a5b37dd2751d4650c6e3d7002c90619325f6628f5b563</cites><orcidid>0000-0002-6828-3559 ; 0000-0001-9578-4351 ; 0000-0002-2867-9823</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8686077$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>315,786,790,4043,27666,27956,27957,27958,55284</link.rule.ids></links><search><creatorcontrib>Mahmud, S. M. 
Hasan</creatorcontrib><creatorcontrib>Chen, Wenyu</creatorcontrib><creatorcontrib>Jahan, Hosney</creatorcontrib><creatorcontrib>Liu, Yongsheng</creatorcontrib><creatorcontrib>Sujan, Nasir Islam</creatorcontrib><creatorcontrib>Ahmed, Saeed</creatorcontrib><title>iDTi-CSsmoteB: Identification of Drug-Target Interaction Based on Drug Chemical Structure and Protein Sequence Using XGBoost With Over-Sampling Technique SMOTE</title><title>IEEE access</title><addtitle>Access</addtitle><description>Identifying interaction between drug and protein is a crucial challenge in drug discovery, which can lead the researchers to develop novel drug compounds or new target proteins for the existing drugs. The determination of drug-target interactions (DTIs) is an extremely time-consuming, costly, and tedious task with wet-lab experiments. To date, multiple computational techniques have been presented to simplify the drug discovery process, but a huge number of interactions are still undiscovered. Furthermore, a class imbalance is a critical challenge regarding this experiment which can significantly degrade the classification accuracy that has not been effectively addressed yet. In this paper, we proposed a novel high-throughput computational model, called iDTi-CSsmoteB, for identification of DTIs based on drug chemical structures and protein sequences. More specifically, the protein sequence is extracted through position-specific scoring matrix (PSSM)-Bigram, amphiphilic pseudo amino acid composition (AM-PseAAC) and dipeptide PseAAC descriptors which represents evolutionary and sequence information. The drug chemical structure is represented as a molecular substructure fingerprint (MSF) which describes the existence of the functional fragments or groups. Finally, we used the over-sampling SMOTE technique to overcome the imbalance issue of the datasets and applied XGBoost algorithm as a classifier to predict DTIs. 
To evaluate the performance of iDTi-CSsmoteB, several experiments have been conducted on four benchmark datasets, namely, enzyme, ion channel, GPCR, and nuclear receptor based on fivefold cross validation. The experimental analysis exhibits that our model outperforms similar methods in terms of area under the ROC (auROC) curve. In addition, our achieved results indicate the effectiveness of the feature extraction techniques, balancing methods, and classifier for predicting the DTIs which can provide substance for new drug development. iDTi-CSsmoteB webserver is available online at http://idticssmoteb-uestc.me/ .</description><subject>Algorithms</subject><subject>AM-PseAAC</subject><subject>Chemicals</subject><subject>Classifiers</subject><subject>Datasets</subject><subject>DP-PseAAC</subject><subject>drug-target interactions</subject><subject>Drugs</subject><subject>Feature extraction</subject><subject>Ion channels</subject><subject>Molecular structure</subject><subject>molecular substructure fingerprint</subject><subject>over-sampling SMOTE</subject><subject>Predictive models</subject><subject>Protein sequence</subject><subject>Proteins</subject><subject>PSSM-Bigram</subject><subject>Sampling methods</subject><subject>Substructures</subject><subject>Target recognition</subject><subject>XGBoost 
classifier</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>DOA</sourceid><recordid>eNpNUV1r2zAUNWOFlq6_oC-CPTuTZEm299a4WRfoyMAu7ZuQ5atEIbEySR7s1-yvTqlLmV50ufeccz9Olt0SvCAE11_ummbVtguKSb2gNcG0LD9kV5SIOi94IT7-F19mNyHscXpVSvHyKvtr7zubN204ugjLr2g9wBitsVpF60bkDLr30zbvlN9CROsxglf6tbRUAQaUgjMANTs4JtIBtdFPOk4ekBoH9NMnWTuiFn5NMGpAT8GOW_TysHQuRPRs4w5tfoPPW3U8Hc6lDvRutAmN2h-bbvUpuzDqEODm7b_Onr6tuuZ7_rh5WDd3j7lmuIo5L0AbQ7VJB2GiKqHGVPG-KIeBlpwMTHCsBRRDiTHVNRakLig3QtDK8J6L4jpbz7qDU3t58vao_B_plJWvCee3Uvlo9QFkb5IWq3qdGjBMoaKGMt0TNkCagLGk9XnWOnmXFglR7t3kxzS-pIxzQdL1eUIVM0p7F4IH896VYHk2Vs7GyrOx8s3YxLqdWRYA3hmVqARO1X8HCZ8f</recordid><startdate>2019</startdate><enddate>2019</enddate><creator>Mahmud, S. M. Hasan</creator><creator>Chen, Wenyu</creator><creator>Jahan, Hosney</creator><creator>Liu, Yongsheng</creator><creator>Sujan, Nasir Islam</creator><creator>Ahmed, Saeed</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-6828-3559</orcidid><orcidid>https://orcid.org/0000-0001-9578-4351</orcidid><orcidid>https://orcid.org/0000-0002-2867-9823</orcidid></search><sort><creationdate>2019</creationdate><title>iDTi-CSsmoteB: Identification of Drug-Target Interaction Based on Drug Chemical Structure and Protein Sequence Using XGBoost With Over-Sampling Technique SMOTE</title><author>Mahmud, S. M. 
Hasan ; Chen, Wenyu ; Jahan, Hosney ; Liu, Yongsheng ; Sujan, Nasir Islam ; Ahmed, Saeed</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-53ecff2cf1094687e902a5b37dd2751d4650c6e3d7002c90619325f6628f5b563</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Algorithms</topic><topic>AM-PseAAC</topic><topic>Chemicals</topic><topic>Classifiers</topic><topic>Datasets</topic><topic>DP-PseAAC</topic><topic>drug-target interactions</topic><topic>Drugs</topic><topic>Feature extraction</topic><topic>Ion channels</topic><topic>Molecular structure</topic><topic>molecular substructure fingerprint</topic><topic>over-sampling SMOTE</topic><topic>Predictive models</topic><topic>Protein sequence</topic><topic>Proteins</topic><topic>PSSM-Bigram</topic><topic>Sampling methods</topic><topic>Substructures</topic><topic>Target recognition</topic><topic>XGBoost classifier</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Mahmud, S. M. 
Hasan</creatorcontrib><creatorcontrib>Chen, Wenyu</creatorcontrib><creatorcontrib>Jahan, Hosney</creatorcontrib><creatorcontrib>Liu, Yongsheng</creatorcontrib><creatorcontrib>Sujan, Nasir Islam</creatorcontrib><creatorcontrib>Ahmed, Saeed</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Xplore</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Mahmud, S. M. 
Hasan</au><au>Chen, Wenyu</au><au>Jahan, Hosney</au><au>Liu, Yongsheng</au><au>Sujan, Nasir Islam</au><au>Ahmed, Saeed</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>iDTi-CSsmoteB: Identification of Drug-Target Interaction Based on Drug Chemical Structure and Protein Sequence Using XGBoost With Over-Sampling Technique SMOTE</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2019</date><risdate>2019</risdate><volume>7</volume><spage>48699</spage><epage>48714</epage><pages>48699-48714</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Identifying interaction between drug and protein is a crucial challenge in drug discovery, which can lead the researchers to develop novel drug compounds or new target proteins for the existing drugs. The determination of drug-target interactions (DTIs) is an extremely time-consuming, costly, and tedious task with wet-lab experiments. To date, multiple computational techniques have been presented to simplify the drug discovery process, but a huge number of interactions are still undiscovered. Furthermore, a class imbalance is a critical challenge regarding this experiment which can significantly degrade the classification accuracy that has not been effectively addressed yet. In this paper, we proposed a novel high-throughput computational model, called iDTi-CSsmoteB, for identification of DTIs based on drug chemical structures and protein sequences. More specifically, the protein sequence is extracted through position-specific scoring matrix (PSSM)-Bigram, amphiphilic pseudo amino acid composition (AM-PseAAC) and dipeptide PseAAC descriptors which represents evolutionary and sequence information. The drug chemical structure is represented as a molecular substructure fingerprint (MSF) which describes the existence of the functional fragments or groups. 
Finally, we used the over-sampling SMOTE technique to overcome the imbalance issue of the datasets and applied XGBoost algorithm as a classifier to predict DTIs. To evaluate the performance of iDTi-CSsmoteB, several experiments have been conducted on four benchmark datasets, namely, enzyme, ion channel, GPCR, and nuclear receptor based on fivefold cross validation. The experimental analysis exhibits that our model outperforms similar methods in terms of area under the ROC (auROC) curve. In addition, our achieved results indicate the effectiveness of the feature extraction techniques, balancing methods, and classifier for predicting the DTIs which can provide substance for new drug development. iDTi-CSsmoteB webserver is available online at http://idticssmoteb-uestc.me/ .</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2019.2910277</doi><tpages>16</tpages><orcidid>https://orcid.org/0000-0002-6828-3559</orcidid><orcidid>https://orcid.org/0000-0001-9578-4351</orcidid><orcidid>https://orcid.org/0000-0002-2867-9823</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2019, Vol.7, p.48699-48714
issn 2169-3536
2169-3536
language eng
recordid cdi_proquest_journals_2455610815
source IEEE Open Access Journals
subjects Algorithms
AM-PseAAC
Chemicals
Classifiers
Datasets
DP-PseAAC
drug-target interactions
Drugs
Feature extraction
Ion channels
Molecular structure
molecular substructure fingerprint
over-sampling SMOTE
Predictive models
Protein sequence
Proteins
PSSM-Bigram
Sampling methods
Substructures
Target recognition
XGBoost classifier
title iDTi-CSsmoteB: Identification of Drug-Target Interaction Based on Drug Chemical Structure and Protein Sequence Using XGBoost With Over-Sampling Technique SMOTE
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-09-23T00%3A28%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=iDTi-CSsmoteB:%20Identification%20of%20Drug-Target%20Interaction%20Based%20on%20Drug%20Chemical%20Structure%20and%20Protein%20Sequence%20Using%20XGBoost%20With%20Over-Sampling%20Technique%20SMOTE&rft.jtitle=IEEE%20access&rft.au=Mahmud,%20S.%20M.%20Hasan&rft.date=2019&rft.volume=7&rft.spage=48699&rft.epage=48714&rft.pages=48699-48714&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2019.2910277&rft_dat=%3Cproquest_ieee_%3E2455610815%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c408t-53ecff2cf1094687e902a5b37dd2751d4650c6e3d7002c90619325f6628f5b563%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2455610815&rft_id=info:pmid/&rft_ieee_id=8686077&rfr_iscdi=true