Publications
List of relevant publications on Social Media Mining.

Abba Suganda Girsang, Veren Prisscilya | 2024
Abstract: In today's global era, technology and information develop rapidly, making it very easy to obtain information and news from the internet. Because of this ease of access, much fake news (hoaxes) circulates; such news is unfiltered, so anyone can spread stories of unclear content. Hoaxes can damage a person's credibility in the professional world, cause division, threaten physical and mental health, and lead to material losses. The way to stop the spread of hoaxes is therefore to detect them as early as possible and block them. Detection can use a deep learning method built on the transformer architecture: BERTopic is used to find important words in a news narrative, these words are then merged into the narrative, and the result is classified using Indo Bidirectional Encoder Representations from Transformers (IndoBERT). For the experiments, the authors used the Indonesia False News (HOAX) dataset taken from kaggle.com. Using a learning rate of 1e-5, a batch size of 16, and 5 epochs, the resulting F1-score is 92% on the validation data and 91% on the test data.

Abba Suganda Girsang, Miracle Aurelia, Sheila Monica | 2024
Abstract: The volume of data created, captured, copied, and consumed worldwide has increased from 2 zettabytes in 2010 to over 97 zettabytes in 2020, with an estimated 181 zettabytes in 2025. Automatic text summarization eases the extraction of key points of information and reduces the time needed to understand it. Improving automatic text summarization performance on news articles is therefore the goal of this paper. This work fine-tunes the BART model on the IndoSum, Liputan6, and augmented Liputan6 datasets for abstractive summarization, with ROUGE as the evaluation metric. The Liputan6 data augmentation uses ChatGPT: 10% of the clean news articles from the Liputan6 training set were given to ChatGPT, which generated abstractive summaries from them, yielding over 36 thousand examples for fine-tuning. The BART model fine-tuned on IndoSum, Liputan6, and augmented Liputan6 has the best ROUGE-2 score, outperforming the ORACLE model, although ORACLE still has the best ROUGE-1 and ROUGE-L scores. This shows that fine-tuning the BART model on multiple datasets increases its performance on abstractive summarization tasks.

Abba Suganda Girsang, Rahmi Fadillah Busyra | 2024
Abstract: Spam refers to unsolicited messages containing harmful content such as malware, viruses, phishing, and data theft. The web form of a government ministry's website is frequently targeted by spammers, causing disruptions, database overload, hindering communication with the public, and security risks. While numerous studies have focused on spam detection, none have addressed spam detection on web form submissions and multilingual spam detection, specifically in English and Indonesian. This study developed a spam detection model to address the growing challenge of spam messages received through ministry website web forms. The proposed model employs the Long Short-Term Memory (LSTM) algorithm to detect spam in English and Indonesian effectively. The LSTM model incorporates additional stages to enhance its performance, including language detection, data augmentation, and word embedding. Evaluation results demonstrate the model's effectiveness in classifying spam and non-spam messages, particularly in datasets with balanced class distributions. This research holds practical implications for implementing the model on websites, particularly the government ministry's website, to effectively categorize incoming messages and mitigate the impact of spam. The study also contributes theoretically by showcasing the effectiveness of LSTM in spam detection and emphasizing the importance of data augmentation in handling imbalanced datasets. Overall, this study provides valuable insights and practical solutions for spam detection in web forms, applicable to government ministry websites, and expands the scope of spam detection in multiple languages, specifically English and Indonesian.

Abba Suganda Girsang, Anindra Ageng Jihado | 2024
Abstract: Network security has become crucial in an era where information and data are valuable assets. An effective Network Intrusion Detection System (NIDS) is required to protect sensitive data and information from cyberattacks. Numerous studies have created NIDS using machine learning algorithms and network datasets that do not accurately reflect actual network data flows. Increasing hardware capabilities and the ability to process big data have made deep learning the preferred method for developing NIDS. This study develops a NIDS model using two deep learning algorithms: Convolutional Neural Network (CNN) and Bidirectional Long-Short Term Memory (BiLSTM). CNN extracts spatial features in the proposed model, while BiLSTM extracts temporal features. Two publicly available benchmark datasets, CICIDS2017 and UNSW-NB15, are used to evaluate the model. The proposed model surpasses the previous method in terms of accuracy, achieving 99.83% and 99.81% for binary and multiclass classification on the CICIDS2017 dataset. On the UNSW-NB15 dataset, the model achieves accuracies of 94.22% and 82.91% for binary and multiclass classification, respectively. Moreover, Principal Component Analysis (PCA) is also used for feature engineering to improve the speed of model training and reduce existing features to ten dimensions without significantly impacting the model's performance.
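
The CNN-for-spatial plus BiLSTM-for-temporal split described above maps naturally onto a stacked network. Below is a minimal Keras sketch of that idea, not the paper's exact architecture: the feature count, filter sizes, and class count are placeholder values, and the PCA step mirrors the abstract's reduction to ten dimensions.

```python
import numpy as np
from sklearn.decomposition import PCA
from tensorflow.keras import layers, models

n_features, n_classes = 78, 15                  # placeholder dataset dimensions

# Optional feature engineering, as in the abstract: PCA down to ten dimensions.
X = np.random.rand(100, n_features)             # stand-in for flow records
X10 = PCA(n_components=10).fit_transform(X)     # shape (100, 10)

model = models.Sequential([
    layers.Input(shape=(10, 1)),                # each reduced flow as a 1-D "signal"
    layers.Conv1D(64, 3, activation="relu"),    # CNN: spatial features
    layers.MaxPooling1D(2),
    layers.Bidirectional(layers.LSTM(64)),      # BiLSTM: temporal features
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```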

Abba Suganda Girsang, Natasha Alyaa Anindyaputri | 2024
Abstract: Depressive disorder is a mental health problem that contributes heavily to disability worldwide, as it can prevent those who experience it from functioning in daily life and affect their working life. To help patients who need immediate treatment, the World Health Organization (WHO) created the Mental Health Gap Action Programme (mhGAP) to improve development focused on monitoring mental health in a globalized world. This study explores several classification model architectures with a deep learning approach to detect tweets from Twitter users that can be classified as indicating depressive disorder, and compares the results across the proposed models. The results show that a Long Short-Term Memory (LSTM) model paired with a transformer word-representation model performs better than a basic Bidirectional Encoder Representations from Transformers (BERT) classifier or an LSTM classifier alone. Evaluation of the classification architectures reveals that the LSTM model paired with MentalBERT achieves the highest accuracy, with a score of 0.86. This approach exploits the respective strengths of the transformer and LSTM models to facilitate accurate analysis of the syntactic and contextual information associated with individual words, enabling precise, domain-specific representations of sequential data.

Abba Suganda Girsang, Bima Krisna Noveta | 2024
Abstract: Purpose: The purpose of this study is to provide the locations of natural disasters, plotted on maps, by extracting Twitter data. The tweet text is processed with named entity recognition (NER) using a six-class location hierarchy for Indonesia, and each tweet is then classified into eight classes of natural disaster using a support vector machine (SVM). Overall, the system is able to classify a tweet and map the position of its content. Design/methodology/approach: This research builds a model to map the geolocation of tweet data using NER with six classes based on Indonesian regions. The data are then classified into eight classes of natural disaster using the SVM. Findings: Experiment results demonstrate that the proposed NER with six special classes based on regional levels in Indonesia is able to map the location of a disaster from Twitter data. The results also show good geocoding performance in terms of match rate, match score, and match type. Moreover, with the SVM, this study can classify the collected tweets into eight types of natural disaster specific to the Indonesian region. Research limitations/implications: The study is implemented only for the Indonesian region. Originality/value: (a) NER with six classes is used to create a location classification model with StanfordNER and ArcGIS tools; the six location classes reflect Indonesia's large area and its many regional levels, such as province, district/city, sub-district, village, road, and place names. (b) The SVM is used to classify natural disasters into eight types: floods, earthquakes, landslides, tsunamis, hurricanes, forest fires, droughts, and volcanic eruptions.
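
The SVM stage of this pipeline is straightforward to illustrate. The toy sketch below classifies tweets into disaster types with TF-IDF features and a linear SVM; the example tweets and labels are invented, and the paper's NER/geocoding stage is omitted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented Indonesian-language tweets and disaster labels for the demo.
tweets = ["banjir di jakarta barat", "gempa guncang lombok",
          "kebakaran hutan di riau", "gunung merapi erupsi"]
labels = ["flood", "earthquake", "forest_fire", "volcanic_eruption"]

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(tweets, labels)
print(clf.predict(["banjir besar melanda bandung"]))   # -> ['flood']
```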

Abba Suganda Girsang, Bambang Nursandi | 2024
Abstract: In Indonesia, waste pollution poses pressing environmental and health challenges, making accurate classification essential for targeted mitigation efforts. Our research aims to extract relevant data from Twitter to address this problem and to assess how effectively a DistilBERT model can classify Indonesian-language text related to waste pollution. DistilBERT, a leaner counterpart of the established BERT architecture, is designed to mirror BERT's sophisticated linguistic understanding at lower computational cost. By exploiting the essence of transfer learning, the proposed DistilBERT-based method benefits from vast textual corpora, making it ideal for scenarios with limited data availability. We adapt DistilBERT to the specific challenge of classifying waste types using a limited dataset drawn from Indonesian-language Twitter conversations, a medium known for its terse and often ambiguous content. Despite the dataset's limited scope and the noise inherent in Twitter data, the results with DistilBERT show striking efficacy, achieving 98% precision, 98% recall, and a 98% F1-score. These results underscore DistilBERT's ability to navigate and understand complex textual nuances in data-constrained settings. Our research also includes a comparative analysis with other methods, further highlighting the importance of transfer learning in addressing natural language processing challenges, particularly in critical contexts such as waste management efforts in Indonesia.

Abba Suganda Girsang, Julyanto Wijaya | 2024
Abstract: The surge in global technological advancements has led to an unprecedented volume of information sharing across diverse platforms. This information, easily accessible through browsers, has created an overload, making it challenging for individuals to efficiently extract essential content. In response, this paper proposes a hybrid Automatic Text Summarization (ATS) method combining the LexRank and YAKE algorithms. LexRank determines sentence scores, while YAKE calculates individual word scores, collectively enhancing summarization accuracy. Leveraging an unsupervised learning approach, the hybrid model demonstrates a 2% improvement over its base model. To validate the effectiveness of the proposed method, the paper uses 5000 Indonesian news articles from the IndoSum dataset. Ground-truth summaries are employed, with the objective of condensing each article to 30% of its content. The algorithmic approach and experimental results are presented, offering a promising solution to information overload. Notably, the results reveal a two percent improvement in the ROUGE-1 and ROUGE-2 scores, along with a one percent enhancement in the ROUGE-L score. These findings underscore the potential of incorporating a keyword score to enhance the overall accuracy of the summaries generated by LexRank. Despite the absence of a machine learning model in this experiment, the unsupervised learning and heuristic approach suggest broader applications on a global scale. A comparative analysis with other state-of-the-art text summarization methods or hybrid approaches will be essential to gauge its overall effectiveness.
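
The hybrid scoring idea, sentence centrality blended with a keyword score, can be sketched compactly. In the sketch below, PageRank over a cosine-similarity graph stands in for LexRank, and plain term frequency stands in for YAKE; the blend weight is an arbitrary choice, not the paper's.

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = ["Floods hit the capital on Monday.",
             "Thousands of residents were evacuated from flooded areas.",
             "The weather agency expects more rain this week."]
tfidf = TfidfVectorizer().fit_transform(sentences)
sim = cosine_similarity(tfidf)
lexrank = nx.pagerank(nx.from_numpy_array(sim))        # sentence centrality

# Term frequency as a stand-in keyword score (YAKE would go here).
freq = {}
tokens = [s.lower().split() for s in sentences]
for ws in tokens:
    for w in ws:
        freq[w] = freq.get(w, 0) + 1
keyword = [sum(freq[w] for w in ws) / len(ws) for ws in tokens]

alpha = 0.8                                             # assumed blend weight
scores = [alpha * lexrank[i] + (1 - alpha) * keyword[i] / max(keyword)
          for i in range(len(sentences))]
best = max(range(len(sentences)), key=scores.__getitem__)
print(sentences[best])                                  # top-ranked sentence
```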

Abba Suganda Girsang, Andien Dwi Novika | 2024
Abstract: This study introduces a hyperparameter optimization approach for multilayer perceptrons (MLP) using the Jaya algorithm. Given the crucial role of hyperparameter tuning in MLP performance, the Jaya algorithm, inspired by social behavior and free of algorithm-specific parameters, emerges as a promising optimization technique. Systematic application of Jaya dynamically adjusts hyperparameter values, leading to notable improvements in convergence speed and model generalization. Quantitatively, the Jaya algorithm consistently begins converging from the first iteration, converging faster than conventional methods and yielding 7% higher accuracy on several datasets. This research contributes to hyperparameter optimization by offering a practical and effective solution for optimizing MLPs in diverse applications, with implications for improved computational efficiency and model performance.
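
The Jaya update rule itself is parameter-free, which is the property the abstract leans on: each candidate moves toward the best solution and away from the worst, x' = x + r1*(best - |x|) - r2*(worst - |x|). A minimal sketch on a toy objective, with arbitrary population size, bounds, and iteration count:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sum(x**2, axis=1)          # stand-in for validation loss
pop = rng.uniform(-5, 5, size=(20, 3))      # 20 candidates, 3 "hyperparameters"

for _ in range(100):
    fit = f(pop)
    best, worst = pop[fit.argmin()], pop[fit.argmax()]
    r1, r2 = rng.random(pop.shape), rng.random(pop.shape)
    cand = pop + r1 * (best - np.abs(pop)) - r2 * (worst - np.abs(pop))
    cand = np.clip(cand, -5, 5)
    improved = f(cand) < fit                 # greedy acceptance
    pop[improved] = cand[improved]

print(pop[f(pop).argmin()])                  # near the optimum [0, 0, 0]
```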

Abba Suganda Girsang, Yusuf Priyo Anggodo | 2023
Abstract: The development of financial technology (Fintech) in emerging economies such as Indonesia has been rapid in the last few years, opening great potential for loan businesses, from venture capital to micro and personal loans. To survive in such competitive markets, new companies need a robust credit-scoring model. However, building a reliable model requires large, stable data, and the challenge is that datasets are often small, covering only a few months (short-period datasets). Therefore, this study proposes a modified binning method, namely changing a variable's values into two groups with the smallest distribution differences possible. Modified binning can maintain data trends to avoid future shifting. The simulation was conducted using a real dataset from an Indonesian Fintech, comprising 44,917 borrower-level observations with 396 variables. To match actual conditions, the first three months of data were allocated for modeling and the remainder for testing. Applying modified binning and logistic regression to the testing data results in a more stable score band than standard binning. Compared with other classifier methods, the proposed method obtained the best AUC on the testing data (0.73). In addition, the proposed method is highly applicable, as it can provide a straightforward explanation to upper management or regulators, and it is practical for real financial technology cases with short-period problems.
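
The abstract leaves the binning criterion open, so the sketch below is one plausible reading only: scan candidate cut points for a variable and keep the two-group split whose group shares shift least between two time periods, favoring stability over predictive power. The data, the stability criterion, and the quantile grid are all assumptions.

```python
import numpy as np

def stable_two_bin(values, period):
    """Pick a threshold whose two bins have the most similar share
    of observations across the two periods."""
    cuts = np.quantile(values, np.linspace(0.1, 0.9, 17))
    best_cut, best_shift = None, np.inf
    for c in cuts:
        share = [np.mean(values[period == p] <= c) for p in (0, 1)]
        shift = abs(share[0] - share[1])     # distribution difference
        if shift < best_shift:
            best_cut, best_shift = c, shift
    return best_cut

rng = np.random.default_rng(1)
income = np.concatenate([rng.normal(5, 1, 500), rng.normal(5.3, 1, 500)])
period = np.repeat([0, 1], 500)              # modeling vs testing months
print(stable_two_bin(income, period))
```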

Abba Suganda Girsang, Dian Anggraini | 2023
Abstract: A web shell is a malicious program used to remotely access web servers during cyberattacks. Malicious web shells closely resemble benign ones, making them difficult to distinguish. The challenge in detecting pre-existing web shells is that this type of malware is hard to detect with an intrusion detection system (IDS) or antivirus techniques, because web shells are usually hidden within web applications and are hard to differentiate from regular web application source code. Detection models that analyze the dynamic features of web shell script execution are therefore more effective in detecting such attacks. This study proposes a web shell detection method based on dynamic bytecode features using a convolutional neural network (CNN). Word2vec is employed to obtain vectorized features from the bytecode (opcode). Experimental results using a training dataset of 2577 samples and a validation dataset of 645 samples yield the best model with an accuracy of 99.86% at epoch 100. The experiments demonstrate that this model effectively detects web shells, with a significant increase in accuracy.

Abba Suganda Girsang, Elfrida BA Siahaan | 2023
Abstract: SQL injection detection provides the ability to monitor SQL injection attacks on websites. Researchers currently use deep learning to detect SQL injection, but such detection has limitations, including high false positives (FP), high false negatives (FN), and low accuracy, because it relies on URI data alone, while attacks arrive not only through the URI but also through the Referrer. This study therefore uses a combination of URI and Referrer to detect attacks and discusses the performance improvement gained by adding the Referrer. The first step is to preprocess the dataset and then vectorize it using Word2Vec. A Word2Vec and CNN method using the combined URI and Referrer is proposed and compared with a Word2Vec CNN using the URI alone. Experimental results show that the proposed method outperforms the alternatives and achieves over 99% accuracy on the payloads with a low error rate.

Abba Suganda Girsang, Alfonsus Sucahyo Hariaji | 2023
Abstract: Botnets are one of the main recent cyber security threats. To avoid detection, botnets use Domain Generation Algorithms (DGA) to generate malicious domain names and maintain communication between infected bots and the command and control (C&C) server. Botnet malware uses various algorithms to generate domain names, such as arithmetic, hashing, and wordlist/dictionary techniques. Recent traditional machine learning and deep learning based detection methods need handcrafted domain name features, which require more effort and advanced expertise and knowledge. This study aims to detect and classify DGA malicious domains using only the domain name, without manually defining and handcrafting domain name features. An n-grams method is used to create sequences of domain names, which are then vectorized using a word embedding technique to create an n-grams embedding model. After vectorization, a Bidirectional Gated Recurrent Unit (BiGRU) is used for domain name classification, with an attention mechanism applying attention weights to improve classification performance. The experiment results demonstrate that the n-grams embedding and attention-based BiGRU model proposed in this paper detects and classifies various types of DGA domains generated by arithmetic, hashing, and wordlist algorithms more effectively than older architectures such as CNN and LSTM, for both DGA malicious domain detection and classification. The attention mechanism also improves the accuracy and performance of the DGA malicious domain detection model compared to models without it.
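
The classifier stage, n-gram embeddings into a BiGRU with attention, can be sketched in Keras. Vocabulary size, sequence length, and the simple additive attention below are illustrative stand-ins, not the paper's exact design.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

vocab, seq_len, n_classes = 5000, 60, 20        # assumed sizes
inp = layers.Input(shape=(seq_len,))
x = layers.Embedding(vocab, 64)(inp)            # n-gram token embedding
h = layers.Bidirectional(layers.GRU(64, return_sequences=True))(x)
a = layers.Dense(1, activation="tanh")(h)       # attention energy per step
a = layers.Softmax(axis=1)(a)                   # weights over time steps
ctx = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([h, a])
out = layers.Dense(n_classes, activation="softmax")(ctx)
model = models.Model(inp, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```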

Abba Suganda Girsang, Restu Herdian Herdian | 2023
Abstract: In recent years, the telecommunication industry has grown and become very competitive, reaching the point where retaining customers is more essential than acquiring new ones. Two key factors in retaining customers are defining the segment of customers likely to churn and the accuracy of the predictive model. In this article we propose a hybrid model based on a decision tree and an artificial neural network (ANN) with a two-stage process to address the problem of customer retention: first, a segmentation phase using decision rules; second, a prediction phase using the ANN. Benchmarked against the previous algorithms (decision tree and ANN) on the AUC metric, our findings show that the proposed hybrid model achieves better accuracy along with comprehensive information about what drives customer churn.

Abba Suganda Girsang, Daniel Wilianto | 2023
Abstract: Grading students' answers has always been a daunting task that takes a lot of teachers' time. The aim of this study is to grade students' answers automatically in a high school's e-learning system. The grading process must be fast, and the result must be as close as possible to the teacher-assigned grades. We collected a total of 840 answers from 40 students for this study, each already graded by their teachers. We used the Python library sentence-transformers and three of its latest pre-trained machine learning models (all-mpnet-base-v2, all-distilroberta-v1, all-MiniLM-L6-v2) for sentence embeddings. Computer grades were calculated using cosine similarity. These grades were then compared with teacher-assigned grades using both Mean Absolute Error and Root Mean Square Error. Our results showed that all-MiniLM-L6-v2 gave the grades most similar to the teacher-assigned ones and had the fastest processing time. Further study may include testing these models on more answers from more students and fine-tuning them on more school materials.
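
Since the abstract names the library and models, the pipeline is easy to reproduce in outline. The sketch below assumes the sentence-transformers package; the mapping from cosine similarity to a 0-100 grade is an assumption, not the paper's rule.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
reference = "Photosynthesis converts light energy into chemical energy."
answer = "Plants turn sunlight into chemical energy during photosynthesis."

emb = model.encode([reference, answer])          # sentence embeddings
similarity = util.cos_sim(emb[0], emb[1]).item()
grade = round(max(similarity, 0.0) * 100)        # assumed 0-100 mapping
print(grade)
```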

Abba Suganda Girsang, Fransisco Junius Amadeus | 2023
Abstract: The act of condensing a text from its original source is known as text summarization. An effective summary should convey the key information rather than merely reproducing the substance of the original content. Recent research on this form of extractive summarization has produced encouraging findings. This work combines a graph model with a modified ant system algorithm: the graph construction is built to represent an article, while the pheromone update decides which pertinent sentences are selected to form a good summary structure. For summarization in Indonesian, the experiments use IndoSum, a dataset of news articles often utilized in related research. In addition, the ROUGE method is used as the evaluation tool to rate summary quality. Finally, this paper concludes with the challenges and future directions of text summarization.

Abba Suganda Girsang, Diana, Suminar Ariwibowo | 2023
Abstract: This paper aims to build a hate speech text classification model by applying a combination of LSTM and FastText. The model covers three labeling tasks: hate speech vs. non-hate speech, the target of the hate speech, and the category of the hate speech. The dataset for these labels is taken from previous research by Okky Ibrohim. FastText word embeddings are used to form the text vectors that serve as input to the LSTM training model. The evaluation results are obtained by measuring accuracy with a confusion matrix. The accuracy of text classification in this study is 83.52% for hate speech classification, 78.44% for the target labels of hate speech, and 82.75% for the category labels of hate speech.
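
The FastText-into-LSTM pairing can be sketched with gensim and Keras. The corpus, vector size, sequence length, and labels below are toy values chosen for the demo.

```python
import numpy as np
from gensim.models import FastText
from tensorflow.keras import layers, models

corpus = [["dasar", "bodoh"], ["selamat", "pagi", "semua"]]  # toy tokenized texts
ft = FastText(sentences=corpus, vector_size=32, min_count=1, epochs=5)

seq_len = 5
def to_matrix(tokens):                        # pad/truncate word vectors
    vecs = [ft.wv[t] for t in tokens][:seq_len]
    vecs += [np.zeros(32)] * (seq_len - len(vecs))
    return np.stack(vecs)

X = np.stack([to_matrix(s) for s in corpus])
y = np.array([1, 0])                          # 1 = hate speech (toy labels)
model = models.Sequential([
    layers.Input(shape=(seq_len, 32)),        # FastText vectors as input
    layers.LSTM(32),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, y, epochs=2, verbose=0)
```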

Abba Suganda Girsang, William Harly | 2022
Abstract: Purpose: With the rise of online discussion and argument mining, methods that can analyze arguments become increasingly important. A recent study proposed using agreement between arguments to represent both stance polarity and intensity, two important aspects of argument analysis, but focused primarily on finetuning a bidirectional encoder representations from transformers (BERT) model. The purpose of this paper is to propose a convolutional neural network (CNN)-BERT architecture that improves on the previous method. Design/methodology/approach: The CNN-BERT architecture in this paper directly uses the hidden representations generated by BERT. This allows better use of the pretrained BERT model and makes finetuning it optional. The authors then compared the CNN-BERT architecture with the methods proposed in the previous study (BERT and Siamese-BERT). Findings: Experiment results demonstrate that the proposed CNN-BERT achieves 71.87% accuracy in measuring agreement between arguments. Compared to the previous study, which achieved 68.58%, CNN-BERT increases accuracy by 3.29 percentage points. The architecture also achieves a similar result even without further pretraining of the BERT model. Originality/value: The principal originality of this paper is the use of CNN-BERT to make better use of the pretrained BERT model for measuring agreement between arguments. The proposed method improves performance and achieves a similar result without further training of the BERT model. This decouples the BERT model from the CNN classifier, which significantly reduces the model size and allows the same pretrained BERT model to be reused for other problems that also do not need a finetuned BERT.
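
The decoupling idea, freezing BERT and training only a small classifier on its hidden states, is easy to show with PyTorch and the transformers library. The model name, filter sizes, pooling, and two-class head below are placeholders, not the paper's configuration.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
for p in bert.parameters():
    p.requires_grad = False                   # no finetuning of BERT

cnn = torch.nn.Sequential(                    # only this part is trained
    torch.nn.Conv1d(768, 128, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveMaxPool1d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(128, 2),                  # agree / disagree (toy head)
)

batch = tok(["Argument A.", "Argument B."], return_tensors="pt", padding=True)
with torch.no_grad():
    hidden = bert(**batch).last_hidden_state  # (batch, seq, 768)
logits = cnn(hidden.transpose(1, 2))          # Conv1d wants (batch, 768, seq)
print(logits.shape)                           # torch.Size([2, 2])
```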

Abba Suganda Girsang, Rifqi Ramadhani Almassar | 2022
Abstract: Microblogging is a form of communication in which users socialize by describing events in real time, and Twitter is a microblogging platform. Indonesia is one of the countries with the most Twitter users, and people there share information about traffic jams. This research aims to detect traffic jams by extracting tweets as vectors, feeding them into a convolutional neural network (CNN) model, and finding the best model among CNN+Word2Vec, CNN+FastText, and a support vector machine (SVM). Data retrieval was conducted using the RapidMiner application. The context of the tweets was then checked, leaving 2777 examples consisting of 1426 congested-road and 1351 clear-road tweets. The data were taken from certain coordinate points around Jakarta, Indonesia. After preprocessing, the tweets were converted to vectors using the Word2Vec and FastText methods and fed into the CNN model. The results of CNN+Word2Vec and CNN+FastText were compared to the SVM method, and the evaluation was done manually against actual traffic conditions. The highest result on the test data, 86.33%, was obtained by the CNN+FastText method, while CNN+Word2Vec reached 85.79% and SVM 67.62%.

Abba Suganda Girsang, Dewa Bagus Gde Khrisna Jayanta Nugraha | 2022
Abstract: The number of moviegoers in Indonesia rose year after year until 2019. However, due to the COVID-19 pandemic, most Indonesian cinemas were closed in early 2020, and moviegoers increasingly turned to digital platforms to watch films. Based on the films shown, audiences can be divided into three categories: films for children, films for adolescents, and films for adults. A system that can automatically classify audience faces by age category is therefore required. Using deep learning, this study aims to classify the audience's age from facial photos. The first stage collects data from three datasets: All-Age-Face, FaceAge, and FGNET, which are combined and relabeled by age group. Preprocessing and hyperparameter testing were also performed, with the goal of finding the best learning rate and bottleneck layer. The training process uses that learning rate and the two best bottleneck layers with six models, namely MobileNet, MobileNetV2, VGG16, VGG19, Xception, and ResNet101V2, each with Global Average Pooling added at the end. The MobileNet model with two bottleneck layers yielded the best testing accuracy, 85.44 percent.
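
The transfer-learning setup the abstract describes, a pretrained backbone plus Global Average Pooling and an age head, looks roughly like the Keras sketch below; the input size, the decision to freeze the backbone, and the three-way head are assumptions.

```python
from tensorflow.keras import layers, models, applications

backbone = applications.MobileNet(include_top=False, weights="imagenet",
                                  input_shape=(224, 224, 3))
backbone.trainable = False                    # freeze pretrained features

model = models.Sequential([
    backbone,
    layers.GlobalAveragePooling2D(),          # as in the abstract
    layers.Dense(3, activation="softmax"),    # children / adolescents / adults
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```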

Abba Suganda Girsang, Eko Prasetyo, Ryan Ari Setyawan | 2019
Abstract: Insurance companies are certainly rich in data from their business processes. One such source is sales data, which can be used to analyze whether the company is in good condition or not. The purpose of this research is to develop a technique to analyze this data. The method used to implement the data analysis in this paper is to design a data warehouse, using the nine-step methodology for the design and Pentaho as the tool for ETL (Extract, Transform, Load), OLAP analysis, and reporting. The results of this research conclude that implementing a data warehouse makes data analysis and reporting better and more effective.

Abba Suganda Girsang, Atria Dika Puspita, Finda Anisa Putri, Nindhia Hutagaol, Sani Muhamad Isa | 2019
Abstract: Sriwijaya Air, one of Indonesia's leading domestic airlines, created an electronic voucher, the Sriwijaya Air e-voucher, for loyal customers. The e-voucher is used to pay for flight tickets instead of cash and consists of attributes such as destination, flown date, and flight class. In particular, the marketing manager needs to review e-voucher utilization for approved tickets every year. The proposed solution is to redeploy the existing OLTP database into a data warehouse, developed using Kimball's nine-step methodology. All data is analyzed using OLAP to present reports in visual forms such as dashboards. With this, the airline can produce analysis reports using business intelligence.

Abba Suganda Girsang, Joseph Tarigan | 2018
Abstract: Sarcasm detection is an important task in natural language processing (NLP). Sarcasm flips the polarity of a sentence and affects the accuracy of sentiment analysis. Recent research incorporates machine learning and deep learning methods to detect sarcasm. Sarcasm can be detected through the occurrence of context disparity, which can be observed in the similarity scores of the words in a sentence; word embedding vectors are used to calculate these scores. In this work, the word similarity score is incorporated as an augmented feature in the deep learning model, and three augmenting schemes are observed. Results show that, in general, a word similarity score boosts the performance of the classifier. At best, an accuracy of 85.625% with an F-measure of 84.884% was achieved.
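
The context-disparity feature can be illustrated in a few lines: compute cosine similarity between consecutive word vectors and treat low values as a sarcasm signal. Random vectors stand in for trained word embeddings here.

```python
import numpy as np

rng = np.random.default_rng(42)
embeddings = {w: rng.normal(size=50) for w in
              "i love being ignored all day".split()}   # stand-in embeddings

def similarity_scores(tokens):
    """Cosine similarity of each adjacent word pair."""
    out = []
    for a, b in zip(tokens, tokens[1:]):
        va, vb = embeddings[a], embeddings[b]
        out.append(float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))))
    return out

# Low scores between adjacent words can flag context disparity.
print(similarity_scores("i love being ignored all day".split()))
```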

Abba Suganda Girsang, Fajar Ciputra Daeng Bani | 2018
Abstract: The problem for telecommunications companies today is that transactional data has grown larger than the existing source tables can serve. This makes business reporting less efficient and overwhelms query processing in the data warehouse, so it no longer meets business requirements. The data warehouse process must scale with the fast, complex evolution of the digital world, so the authors implement massively parallel processing (MPP) in the data warehouse with the Greenplum database, allowing business users to get reports faster and more optimally. This case study explains how the MPP system is implemented and measures the performance of the Greenplum database by running complex queries in the data warehouse with parallel processing. It analyzes whether the MPP system can deliver scalable throughput and response times in the data warehouse so that performance in the Greenplum database remains stable for daily, weekly, and monthly operations.

Abba Suganda Girsang, Janner Simarmata, Nanda Adytiansya, Oei Kurniawan Utomo, Sani Muhamad Isa | 2018
Abstract: The Constitutional Court of the Republic of Indonesia is an institution with the authority to examine and adjudicate constitutional cases. Within the institution, each component is responsible for managing its own finance and budgeting, so a solution is needed to summarize the amount and absorption of the budget in every component. This paper proposes a data warehouse model to process and analyze the down payment administration data, implemented with the nine-step design methodology for developing data warehouses. The work aims to benefit the Constitutional Court, especially with information about down payment absorption for every component each year. The result is presented as PDF reports and an informative dashboard built with data warehouse tools.

Abba Suganda Girsang, Edward Chandra, Ryo Hadinata, Sani Muhamad Isa | 2018
Abstract: Nowadays, most universities have many different ways of assessing graduation eligibility for their students. The assessment can use data generated by an online transaction processing (OLTP) system, but such systems are limited in producing intuitive reports. The aim of this research is to create a data warehouse system that can show the progress of student performance and the courses failed in each term, and that is expected to help predict a student's graduation eligibility. The research method consists of implementing the data warehouse, followed by report generation and online analytical processing (OLAP) analysis. Evaluation is done by benchmarking the current system against the proposed one. The result shows the system can provide the data needed for assessing graduation eligibility.

Abba Suganda Girsang, Michael Yulianto, Reinert Yosua Rumagit | 2018
Abstract: Electronic ticket (e-ticket) provider services are growing fast in Indonesia, making the competition between companies increasingly intense, and most of them offer the same services or features to their customers. To collect customer feedback, many companies use social media (Facebook and Twitter) for marketing or for communicating directly with their customers, and current technology allows a company to pull data from social media for analysis. This study proposes developing a data warehouse to analyze social media data such as likes, comments, and sentiment. Since sentiment is not provided directly by social media, the study uses lexicon-based classification to categorize the sentiment of users' comments. The data warehouse provides business intelligence for viewing company performance based on social media data, and it is built using data from three travel companies in Indonesia. As a result, it supports comparison of their performance based on social media data.
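
Lexicon-based sentiment classification, as used above, reduces to counting lexicon hits. A toy version with invented Indonesian word lists:

```python
# Illustrative word lists; a real lexicon would be far larger.
POSITIVE = {"bagus", "cepat", "murah", "puas"}
NEGATIVE = {"lambat", "mahal", "kecewa", "error"}

def lexicon_sentiment(comment: str) -> str:
    words = comment.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(lexicon_sentiment("aplikasi bagus dan murah"))   # -> positive
print(lexicon_sentiment("pelayanan lambat, kecewa"))   # -> negative
```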

Abba Suganda Girsang, Ivan Alexander, Rian Rassetiadi, Samuel Garcia, Sani Muhamad Isa | 2018
Abstract: In this increasingly modern world, marketing is a very interesting job: the more products customers purchase, the higher the salary earned. One important issue at PT XYZ is choosing which products customers are currently seeking. The goal of this paper is to create a system and reports that help marketing choose which products are suitable to offer to customers. This paper proposes a data warehouse developed through the methodology designed by Kimball and Ross. The data can then be analyzed using OLAP and displayed on a dashboard. With this business solution, PT XYZ can process and choose which kinds of products suit customers relatively faster, and the results show that the system can identify the suitable products from the dashboard display.

Abba Suganda Girsang, Ahmad Nurul Fajar, Sani Muhamad Isa, Yoel Frans Alfredo | 2018
Abstract: XYZ company is an online travel company that provides hotel and flight booking services. As the company grows fast, it needs fast reports to help management make decisions for developing its products, and to decide what actions to take, based on the data it has, to compete with rival online travel companies. This data warehouse development aims to provide a big picture of XYZ company's customer behavior through the data in the generated reports. The development follows the steps designed by Kimball, and the information generated is presented in dashboards or reports. The data warehouse is expected to provide quick and accurate information so management can choose the best decision on the next product development based on customer behavior.

Abba Suganda Girsang, Martin Sujono, Mira Hidayati, Okta Purnama Rahadian, Sani Muhamad Isa | 2018
Abstract: As an online digital music content provider, PT ABC has a portal with songs that customers can stream. The music director has been using global trends, i.e., radio top music charts, to produce the favorite song list. This study uses a data warehouse and business intelligence for PT ABC's marketing and sales strategy, e.g., to improve its service to customers and to increase the number of active users and the streaming duration per user. The OLTP data suggested for the data warehouse is the existing streaming transactions, which will be used to analyze customers' most preferred songs. The data warehouse is designed with Kimball's nine-step method. The selected process is customers' streaming and downloading, with dimensions of location, time, and song, and a factless fact table for the streaming itself. The analysis yields customers' most preferred playlists based on several parameters, such as customer location, streaming hours or days, and song genre.
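
The star schema named above (song, location, and time dimensions around a factless streaming fact table) can be illustrated with SQLite from Python. Table and column names are invented for the demo.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_song(song_id INTEGER PRIMARY KEY, title TEXT, genre TEXT);
CREATE TABLE dim_location(loc_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_time(time_id INTEGER PRIMARY KEY, day TEXT, hour INTEGER);
-- Factless fact table: each row records a streaming event, no measures.
CREATE TABLE fact_stream(song_id INTEGER, loc_id INTEGER, time_id INTEGER);
INSERT INTO dim_song VALUES (1,'Song A','pop'),(2,'Song B','rock');
INSERT INTO dim_location VALUES (1,'Jakarta'),(2,'Bandung');
INSERT INTO dim_time VALUES (1,'Mon',9),(2,'Mon',21);
INSERT INTO fact_stream VALUES (1,1,2),(1,1,2),(2,2,1);
""")
# "Most preferred songs per city": a typical roll-up on this schema.
for row in db.execute("""
    SELECT l.city, s.title, COUNT(*) AS plays
    FROM fact_stream f
    JOIN dim_song s ON s.song_id = f.song_id
    JOIN dim_location l ON l.loc_id = f.loc_id
    GROUP BY l.city, s.title ORDER BY plays DESC"""):
    print(row)
```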

Abba Suganda Girsang, Emil Robert Kaburuan, Herry Saputra, M Apriadin Nuriawan, Reginald Putra Ghozali, Sani Muhamad Isa | 2018
Abstract: As one of the Indonesian associations of construction service companies, the association ensures that every member construction service company is recommended and recorded as working according to valid Indonesian law and the constitution. Its job includes giving detailed reports to the government, domestic and international investors, and every institution that needs information such as market needs, resource development, and improvements in construction technology; the problem is having to write queries over every table in MySQL to create such reports quickly, efficiently, and automatically. Examples of reporting needs are: in which month the most construction projects are completed within a year; the longest project duration, highest project cost, and greatest experience for each construction company; or the number of projects by source of funds (APBN, APBD, government loans, BUMN, or private) for each construction company. The recommended solution is a data warehouse, which provides structured storage and can analyze data with OLAP, capable of processing and visualizing data at high speed. The data warehouse design uses the Kimball method so the association can generate reports according to every institution's needs.

Abba Suganda Girsang, Davy Yeria Gunarso, Eko Cahyo Nugroho, Karona Cahya Susena, Sani Muhamad Isa | 2018
Abstract: Reporting with query scripts is usually a long process, leaving the company too late to make strategic decisions. It is also inefficient because SQL queries must be updated and changed to match the requirements of decision makers. This paper describes how to implement data warehouse technology to create reports quickly, based on the requirements, without customizing SQL queries and reading only the required data. With the Kimball methodology, data can be reported in various forms and decision makers can take the best decision in the eProcurement system.

Abba Suganda Girsang, Dita Madonna Simanjuntak, Karona Cahya Susena, Wardi Fadillah | 2018
Abstract: Bank XYZ is a bank in Bandung that earns profit from customers who borrow money. According to Bank Indonesia (BI) standards, a bank with a non-performing loan (NPL) ratio above 5% suffers losses, so the bank must have a strategy to keep this value stable. Bank XYZ is concerned with the performance of its collectors, because collectors are the core of the bank's operation. The solution in this case study is business intelligence built on data warehouse and OLAP development. This paper therefore describes how important the right selection of data is in helping managers make decisions that improve performance.

Abba Suganda Girsang, Ko-Wei Huang, Ze-Xue Wu | 2018
Abstract: Data clustering is a well-known data mining approach that is usually used to minimize the intra-cluster distance while maximizing the inter-cluster distance of each data center. The clustering problem has been proven to be NP-hard. In this paper, a hybrid algorithm based on the Whale Optimization Algorithm (WOA) and the Crow Search Algorithm (CSA), named HWCA, is proposed. HWCA combines the search strategies of WOA and CSA. In addition, two operators are used to improve solution quality: a hybrid individual operator and an enhanced diversity operator. The hybrid individual operator exchanges individuals between the WOA and CSA populations using a roulette wheel approach, while the enhanced diversity operator improves the quality of each population. Moreover, HWCA incorporates a center optimization strategy to further enhance the diversity of each population. In the performance evaluation, the proposed algorithm was compared with the WOA and CSA algorithms on six well-known UCI benchmarks. The results show that the proposed algorithm achieves a higher accuracy rate than the comparison algorithms.

Abba Suganda Girsang, Melva Hermayanty Saragih | 2017
Abstract: Currently, social media are commonly used by business organizations to gain many advantages. Through social media, a business gets feedback from its customers by giving them a space for sharing opinions or comments, and this kind of customer engagement has become a tool of marketing and promotion. This research investigates customer engagement by analyzing comments on social media (Facebook and Twitter) about online transport services. It mines the comments on the Facebook fan pages and the tweets about three online transport services in Indonesia (Gojek, Grab, and Uber) using the API services provided by both social media platforms. The comments are classified into positive, negative, and neutral sentiment using TF-IDF. The results show that the categories "Feedback system by driver" and "Feedback system by user" have the most comments for all three services, while the category "Service quality for driver" has the fewest. The study also reveals that most comments are complaints. This social media feedback can be used to evaluate the performance of these online transport businesses.

Abba Suganda Girsang, Bambang Susilo, Danang Satya, Dudi Ramdani, Salman Al Fariz, Sani Muhamad Isa | 2017
Abstract: A big hotel with many branches in Indonesia needs express reports for business development in order to make decisions. This paper proposes a data warehouse to support business at hotel XYZ by providing better information about customer data from each branch for decision making. The development follows the nine-step methodology designed by Kimball and Ross. The data can then be presented in dashboards or reports tailored to the user to simplify presentation. The data warehouse integrates the required data sources to provide quick and accurate information.

Abba Suganda Girsang, Arie Purnama, Evans Andita, Ferico Samuel, Sani Muhamad Isa | 2017
Abstract: Currently, the capability to present reports or data accurately, quickly, and insightfully is highly required for a company to make data-driven decisions. This paper is intended to solve that problem in a company that serves more than sixteen professional companies in the South East Asia area, where such reports are used to make decisions. The method chosen to develop the data warehouse along with its analytics and reports is the Kimball methodology, which has been in use since the mid-1980s and has been applied by many prior researchers. As the data can be displayed in whatever form is needed, stakeholders can make data-driven decisions that help the company perform better in the market.

Abba Suganda Girsang, Candrauji Wira Prakoso | 2017
Abstract: As the main public telecommunication and broadband company in Indonesia, PT XYZ has always tried to ensure adequate wifi access service to meet the needs of broadband consumers. This wifi service, called wifi-id, uses an Authentication, Authorization, Accounting (AAA) server in collaboration with other network providers. This study was driven by PT XYZ's need to present reports or data quickly when needed, especially to reconcile total monthly customer usage with business partners for revenue-sharing billing. The problem arises because the CDR (call data record) data, which represents broadband customers' usage, is stored only as CSV files, making it difficult to produce the customer access reports PT XYZ needs. The suggested solution is to move the existing data into more structured storage in a data warehouse, developed through the nine-step methodology designed by Kimball and Ross. The data can then be analyzed using OLAP and presented visually as reports or dashboards. With this solution, PT XYZ can process and present reports relatively faster and gain more benefit from the stored data in the form of information.

Abba Suganda Girsang, AS Sinaga | 2017
Abstract: Accreditation aims to assure the quality of an educational institution. The institution needs comprehensive documents to provide accurate information before review by assessors, so academic documents should be stored effectively to ease fulfilling accreditation requirements. However, the data are generally derived from various sources, of various types, unstructured, and dispersed. This paper proposes designing a data warehouse to integrate all of this data to prepare good academic documents for accreditation at a university. The data warehouse is built using the nine steps introduced by Kimball, applied to produce a data warehouse based on the accreditation assessment with a focus on the academic part. The results show that the data warehouse can analyze the data needed to prepare the accreditation assessment documents.

Abba Suganda Girsang, Andri Wijaya | 2016
Abstract: This article discusses the analysis of customer loyalty using three data mining methods, the C4.5, Naive Bayes, and Nearest Neighbor algorithms, on real-world empirical data. The data contain ten attributes related to customer loyalty and are obtained from a national multimedia company in Indonesia; the dataset contains 2269 records. The study also evaluates the effect of training-data size on classification accuracy. The results suggest that the C4.5 algorithm produces the highest classification accuracy, on the order of 81%, followed by Naive Bayes at 76% and Nearest Neighbor at 55%. In addition, the numerical evaluation suggests that a proportion of 80% is optimal for the training set.
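
The comparison is easy to reproduce in outline with scikit-learn, using DecisionTreeClassifier (CART, a close relative of C4.5) as a stand-in, together with GaussianNB and KNeighborsClassifier, on synthetic data; only the record count echoes the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2269, n_features=10, random_state=0)
for train_share in (0.6, 0.7, 0.8, 0.9):      # effect of training-set size
    Xtr, Xte, ytr, yte = train_test_split(
        X, y, train_size=train_share, random_state=0)
    for clf in (DecisionTreeClassifier(), GaussianNB(),
                KNeighborsClassifier()):
        acc = clf.fit(Xtr, ytr).score(Xte, yte)
        print(f"{train_share:.0%} {type(clf).__name__}: {acc:.2f}")
```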