
Topic modeling Twitter data using Latent Dirichlet Allocation and Latent Semantic Analysis

The industrial world has entered the era of Industrial Revolution 4.0. In this era, there is an urgent data requirement from the community to support service policies. For this reason, the Surabaya Government established Media Center Surabaya. This channel is used to accommodate all the aspirations of Surabaya ci...

Bibliographic Details
Main Authors:	Qomariyah, Siti; Iriawan, Nur; Fithriasari, Kartika
Format:	Conference Proceeding
Language:	English
Subjects:	Algorithms; Data mining; Digital media; Dirichlet problem; Government services; Modelling; Performance enhancement; Preprocessing; Semantic analysis; Semantics; Social networks; Unstructured data

cited_by	cdi_FETCH-LOGICAL-c328t-956625199f8c2acec8e202b26fd6251965e6e49106370d9a423d4fe682bf84b3
cites
container_end_page
container_issue	1
container_start_page
container_title
container_volume	2194
creator	Qomariyah, Siti; Iriawan, Nur; Fithriasari, Kartika
description	The industrial world has entered the era of Industrial Revolution 4.0. In this era, there is an urgent data requirement from the community to support service policies. For this reason, the Surabaya Government established Media Center Surabaya. This channel is used to accommodate all the aspirations of Surabaya citizens, who can access it through Twitter. The topics discussed on Twitter are important information that can be used to improve the performance of Surabaya Government services. Twitter data are text data consisting of thousands of variables. Text mining, including topic modeling and sentiment analysis, is frequently used to analyze this kind of data. This study addresses topic modeling, focusing on the Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA) algorithms. Algorithm performance is evaluated using topic coherence. Because Twitter data are unstructured, they need preprocessing before the analysis. The preprocessing stages include cleansing, stemming, and stop-word removal. LSA has the advantages of being fast and easy to implement. However, LSA does not consider the relationships between documents in the corpus, while LDA does. This study shows that LDA gives a better result than LSA.
doi_str_mv	10.1063/1.5139825
format	conference_proceeding
fullrecord	<record><control><sourceid>proquest_scita</sourceid><recordid>TN_cdi_scitation_primary_10_1063_1_5139825</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2328154455</sourcerecordid><originalsourceid>FETCH-LOGICAL-c328t-956625199f8c2acec8e202b26fd6251965e6e49106370d9a423d4fe682bf84b3</originalsourceid><addsrcrecordid>eNp9kEtLAzEUhYMoWKsL_0HAnTA170mWpT6h4MJZiJuQJhlNmU7GSar03zt9iDtXF879OPecC8AlRhOMBL3BE46pkoQfgRHmHBelwOIYjBBSrCCMvp6Cs5SWCBFVlnIE3qrYBQtX0fkmtO-w-g45-x46kw1cp600N9m3Gd6GPtiPxmc4bZpoTQ6xhaZ1v_sXvzJtHrymrWk2KaRzcFKbJvmLwxyD6v6umj0W8-eHp9l0XlhKZC4UF4JwrFQtLTHWW-kJIgsiarfTBffCM7VtVyKnDCPUsdoLSRa1ZAs6Bld7266Pn2ufsl7GdT9kSJoMBzBnjPOBut5TyYa8y667PqxMv9EY6a25xvrwuv_gr9j_gbpzNf0BHS9uww</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype><pqid>2328154455</pqid></control><display><type>conference_proceeding</type><title>Topic modeling Twitter data using Latent Dirichlet Allocation and Latent Semantic Analysis</title><source>American Institute of Physics:Jisc Collections:Transitional Journals Agreement 2021-23 (Reading list)</source><creator>Qomariyah, Siti ; Iriawan, Nur ; Fithriasari, Kartika</creator><contributor>Ramli, Murni ; Nurhasanah, Farida ; Indriyanti, Nurma Yunita</contributor><creatorcontrib>Qomariyah, Siti ; Iriawan, Nur ; Fithriasari, Kartika ; Ramli, Murni ; Nurhasanah, Farida ; Indriyanti, Nurma Yunita</creatorcontrib><description>The industrial world has entered the era of industrial revolution 4.0. In this era, there is an urgent data requirement from the community to support service policies. Because of that, Surabaya Government made Media Center Surabaya. This media is used to accommodate all the aspiration of Surabaya citizen. To access this media, a citizen can use Twitter. The topic which is discussed in Twitter is important information that we need to know. 
The information can be used to improve the performance of Surabaya Government services. Twitter data is a text data that consists of thousands of variables. Text mining is frequently used to analyze this kind of data, including topic modeling and sentiment analysis. This study would work on topic modeling focused on the algorithm employing Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA). The evaluation of the algorithm performance uses the topic coherence. As unstructured data, the Twitter data need preprocessing before the analysis. The stages of preprocessing include cleansing, stemming, and stop words. The advantages of LSA are fast and easy to implement. LSA, on the other hand, doesn’t consider the relationship between documents in the corpus, while LDA does. This study shows that LDA gives a better result than LSA.</description><identifier>ISSN: 0094-243X</identifier><identifier>EISSN: 1551-7616</identifier><identifier>DOI: 10.1063/1.5139825</identifier><identifier>CODEN: APCPCS</identifier><language>eng</language><publisher>Melville: American Institute of Physics</publisher><subject>Algorithms ; Data mining ; Digital media ; Dirichlet problem ; Government services ; Modelling ; Performance enhancement ; Preprocessing ; Semantic analysis ; Semantics ; Social networks ; Unstructured data</subject><ispartof>AIP Conference Proceedings, 2019, Vol.2194 (1)</ispartof><rights>Author(s)</rights><rights>2019 Author(s). 
Published by AIP Publishing.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c328t-956625199f8c2acec8e202b26fd6251965e6e49106370d9a423d4fe682bf84b3</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>310,311,315,786,790,795,796,23958,23959,25170,27957,27958</link.rule.ids></links><search><contributor>Ramli, Murni</contributor><contributor>Nurhasanah, Farida</contributor><contributor>Indriyanti, Nurma Yunita</contributor><creatorcontrib>Qomariyah, Siti</creatorcontrib><creatorcontrib>Iriawan, Nur</creatorcontrib><creatorcontrib>Fithriasari, Kartika</creatorcontrib><title>Topic modeling Twitter data using Latent Dirichlet Allocation and Latent Semantic Analysis</title><title>AIP Conference Proceedings</title><description>The industrial world has entered the era of industrial revolution 4.0. In this era, there is an urgent data requirement from the community to support service policies. Because of that, Surabaya Government made Media Center Surabaya. This media is used to accommodate all the aspiration of Surabaya citizen. To access this media, a citizen can use Twitter. The topic which is discussed in Twitter is important information that we need to know. The information can be used to improve the performance of Surabaya Government services. Twitter data is a text data that consists of thousands of variables. Text mining is frequently used to analyze this kind of data, including topic modeling and sentiment analysis. This study would work on topic modeling focused on the algorithm employing Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA). The evaluation of the algorithm performance uses the topic coherence. As unstructured data, the Twitter data need preprocessing before the analysis. 
The stages of preprocessing include cleansing, stemming, and stop words. The advantages of LSA are fast and easy to implement. LSA, on the other hand, doesn’t consider the relationship between documents in the corpus, while LDA does. This study shows that LDA gives a better result than LSA.</description><subject>Algorithms</subject><subject>Data mining</subject><subject>Digital media</subject><subject>Dirichlet problem</subject><subject>Government services</subject><subject>Modelling</subject><subject>Performance enhancement</subject><subject>Preprocessing</subject><subject>Semantic analysis</subject><subject>Semantics</subject><subject>Social networks</subject><subject>Unstructured data</subject><issn>0094-243X</issn><issn>1551-7616</issn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2019</creationdate><recordtype>conference_proceeding</recordtype><recordid>eNp9kEtLAzEUhYMoWKsL_0HAnTA170mWpT6h4MJZiJuQJhlNmU7GSar03zt9iDtXF879OPecC8AlRhOMBL3BE46pkoQfgRHmHBelwOIYjBBSrCCMvp6Cs5SWCBFVlnIE3qrYBQtX0fkmtO-w-g45-x46kw1cp600N9m3Gd6GPtiPxmc4bZpoTQ6xhaZ1v_sXvzJtHrymrWk2KaRzcFKbJvmLwxyD6v6umj0W8-eHp9l0XlhKZC4UF4JwrFQtLTHWW-kJIgsiarfTBffCM7VtVyKnDCPUsdoLSRa1ZAs6Bld7266Pn2ufsl7GdT9kSJoMBzBnjPOBut5TyYa8y667PqxMv9EY6a25xvrwuv_gr9j_gbpzNf0BHS9uww</recordid><startdate>20191218</startdate><enddate>20191218</enddate><creator>Qomariyah, Siti</creator><creator>Iriawan, Nur</creator><creator>Fithriasari, Kartika</creator><general>American Institute of Physics</general><scope>8FD</scope><scope>H8D</scope><scope>L7M</scope></search><sort><creationdate>20191218</creationdate><title>Topic modeling Twitter data using Latent Dirichlet Allocation and Latent Semantic Analysis</title><author>Qomariyah, Siti ; Iriawan, Nur ; Fithriasari, 
Kartika</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c328t-956625199f8c2acec8e202b26fd6251965e6e49106370d9a423d4fe682bf84b3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Algorithms</topic><topic>Data mining</topic><topic>Digital media</topic><topic>Dirichlet problem</topic><topic>Government services</topic><topic>Modelling</topic><topic>Performance enhancement</topic><topic>Preprocessing</topic><topic>Semantic analysis</topic><topic>Semantics</topic><topic>Social networks</topic><topic>Unstructured data</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Qomariyah, Siti</creatorcontrib><creatorcontrib>Iriawan, Nur</creatorcontrib><creatorcontrib>Fithriasari, Kartika</creatorcontrib><collection>Technology Research Database</collection><collection>Aerospace Database</collection><collection>Advanced Technologies Database with Aerospace</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Qomariyah, Siti</au><au>Iriawan, Nur</au><au>Fithriasari, Kartika</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Topic modeling Twitter data using Latent Dirichlet Allocation and Latent Semantic Analysis</atitle><btitle>AIP Conference Proceedings</btitle><date>2019-12-18</date><risdate>2019</risdate><volume>2194</volume><issue>1</issue><issn>0094-243X</issn><eissn>1551-7616</eissn><coden>APCPCS</coden><abstract>The industrial world has entered the era of industrial revolution 4.0. In this era, there is an urgent data requirement from the community to support service policies. Because of that, Surabaya Government made Media Center Surabaya. This media is used to accommodate all the aspiration of Surabaya citizen. To access this media, a citizen can use Twitter. 
The topic which is discussed in Twitter is important information that we need to know. The information can be used to improve the performance of Surabaya Government services. Twitter data is a text data that consists of thousands of variables. Text mining is frequently used to analyze this kind of data, including topic modeling and sentiment analysis. This study would work on topic modeling focused on the algorithm employing Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA). The evaluation of the algorithm performance uses the topic coherence. As unstructured data, the Twitter data need preprocessing before the analysis. The stages of preprocessing include cleansing, stemming, and stop words. The advantages of LSA are fast and easy to implement. LSA, on the other hand, doesn’t consider the relationship between documents in the corpus, while LDA does. This study shows that LDA gives a better result than LSA.</abstract><cop>Melville</cop><pub>American Institute of Physics</pub><doi>10.1063/1.5139825</doi><tpages>7</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0094-243X
ispartof	AIP Conference Proceedings, 2019, Vol.2194 (1)
issn	0094-243X 1551-7616
language	eng
recordid	cdi_scitation_primary_10_1063_1_5139825
source	American Institute of Physics:Jisc Collections:Transitional Journals Agreement 2021-23 (Reading list)
subjects	Algorithms; Data mining; Digital media; Dirichlet problem; Government services; Modelling; Performance enhancement; Preprocessing; Semantic analysis; Semantics; Social networks; Unstructured data
title	Topic modeling Twitter data using Latent Dirichlet Allocation and Latent Semantic Analysis
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-09-23T07%3A31%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_scita&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Topic%20modeling%20Twitter%20data%20using%20Latent%20Dirichlet%20Allocation%20and%20Latent%20Semantic%20Analysis&rft.btitle=AIP%20Conference%20Proceedings&rft.au=Qomariyah,%20Siti&rft.date=2019-12-18&rft.volume=2194&rft.issue=1&rft.issn=0094-243X&rft.eissn=1551-7616&rft.coden=APCPCS&rft_id=info:doi/10.1063/1.5139825&rft_dat=%3Cproquest_scita%3E2328154455%3C/proquest_scita%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c328t-956625199f8c2acec8e202b26fd6251965e6e49106370d9a423d4fe682bf84b3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2328154455&rft_id=info:pmid/&rfr_iscdi=true
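
The description above names three preprocessing stages for the Twitter data: cleansing, stemming, and stop words. The following is a minimal Python sketch of what such a pipeline might look like. It is not the authors' implementation: the stop-word list, the suffix-stripping stemmer, the function names, and the sample tweet are all hypothetical and chosen only for illustration, and applying the stages in the order they are listed is an assumption. A real study would likely use a language-appropriate stemmer and stop-word list.

```python
import re

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "is", "are", "a", "an", "to", "of", "in", "and", "this", "that"}

# Naive suffix stripping stands in for a real stemmer (assumption).
SUFFIXES = ("ing", "ed", "es", "s")
MIN_STEM_LENGTH = 4  # avoid mangling short words such as "this"


def cleanse(tweet: str) -> str:
    """Lowercase a tweet and strip URLs, mentions, hashtags, and non-letters."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[@#]\w+", " ", text)       # drop @mentions and #hashtags
    text = re.sub(r"[^a-z\s]", " ", text)      # drop digits and punctuation
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace


def stem(token: str) -> str:
    """Strip the first matching suffix if the remaining stem is long enough."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM_LENGTH:
            return token[: -len(suffix)]
    return token


def preprocess(tweet: str) -> list[str]:
    """Apply cleansing, then stemming, then stop-word removal."""
    stemmed = [stem(token) for token in cleanse(tweet).split()]
    return [token for token in stemmed if token not in STOP_WORDS]


if __name__ == "__main__":
    sample = "@example_user The roads in Surabaya are flooding! http://example.com/x #report"
    print(preprocess(sample))
```

The resulting token lists would then be turned into a document-term representation before fitting LDA or LSA; that step is outside this sketch.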