DMIS LAB
Data Mining and Information Systems Lab
PI: Prof. Jaewoo Kang, Dept. of Computer Science and Engineering, Korea Univ.
(Prof. Jaewoo Kang's Lab, Department of Computer Science, Korea University)
About
Data science has advanced to the point where it is changing our world. It is now central to exploring and uncovering knowledge in different domains and acts as a bridge connecting them. With an ever-growing amount of data and ever more opportunities to explore, the Data Mining and Information Systems (DMIS) lab aims to drive the data science revolution.
The main focus of DMIS Lab is to leverage AI and machine learning (ML) to solve problems in bioinformatics, drug discovery, and biomedical text mining. To diversify and strengthen its arsenal, DMIS Lab also conducts research in other areas, such as natural language processing (NLP) and graph ML, to uncover more refined techniques for its larger mission. Aside from conducting research, DMIS Lab participates in various international challenges and competitions, such as the DREAM Challenges and the BioASQ Challenges, to contribute to the communal effort to tackle unmet needs in the field of biomedicine.
News
Sep. 2024: Chanwoong Yoon's paper, CompAct: Compressing Retrieved Documents Actively for Question Answering, was accepted to EMNLP 2024 (Miami), one of the top conferences in NLP. Congratulations!
Sep. 2024: Congratulations to Dr. Mogan Gim on his appointment as a tenure-track Assistant Professor in the Department of Biomedical Engineering at Hankuk University of Foreign Studies!
Sep. 2024: Congratulations to Dr. Bumsoo Kim on his appointment as a tenure-track Assistant Professor in the School of Computer Science and Engineering at Chung-Ang University!
Jul. 2024: Donghee's and Jinkyu's paper, DeepClair: Utilizing Market Forecasts for Effective Portfolio Selection, and Heedou's paper, LAPIS: Language Model-Augmented Police Investigation System, were both accepted to the Conference on Information and Knowledge Management (CIKM), an in-person conference taking place in Boise, Idaho, USA on October 21-25, 2024. The former is based on a collaboration with Imperial College London and Shinhan Bank, while the latter is with Imperial College London and the Korean National Police Agency. Congratulations to the authors of these two accepted papers!
Jun. 2024: Mogan's and Jueon's paper, MolPLA: a molecular pretraining framework for learning cores, R-groups and their linker joints, and Minbyul's paper, Improving medical reasoning through retrieval and self-reflection with retrieval-augmented large language models, were published in Bioinformatics (OUP) and will both be presented orally at the upcoming ISMB conference in Montreal, Canada. The former is based on a collaboration with Aigen Sciences, while the latter is with Kyung Hee University. Congratulations to the authors of these two published papers!
Apr. 2024: 🎉 Introducing Meerkat-7B: The First 7B Model to Pass USMLE! 🥳
There's a noticeable difference in performance between commercial large LMs and open-source small LMs in the medical domain. While GPT-4 scored an impressive 90% accuracy on USMLE-style questions, the previous best 7B model managed only 52%, falling significantly below the USMLE passing threshold of 60%.
Our new medical LM, Meerkat-7B, achieved a groundbreaking milestone by surpassing the 60% passing threshold for the United States Medical Licensing Examination (USMLE) for the first time among 7B-parameter models (with scores of 74.3% on the MedQA dataset and 71.4% on the USMLE sample test). Additionally, our system outperformed GPT-3.5 (175B) by 13.1% across seven medical benchmarks, indicating significant progress in open-source model development within the medical field. (Dong-A Ilbo, Kukmin Ilbo, AI Times)
Congratulations to Dr. Hyunjae Kim and the Meerkat team on their remarkable achievement!
- Paper: https://arxiv.org/abs/2404.00376
Mar. 2024: Congratulations to Dr. Mujeen Sung on his appointment as a tenure-track Assistant Professor in the School of Computing at Kyung Hee University!
Feb. 2024: Donghee Choi's paper, CookingSense: A Culinary Knowledgebase with Multidisciplinary Assertions, was accepted to LREC-COLING 2024, a notable international conference on computational linguistics and language resources. This feat is the result of our fruitful collaboration with Sony Research. Congratulations, Donghee-san!
Jan. 2024: Ngoc-Quang Nguyen's paper, MulinforCPI: enhancing precision of compound–protein interaction prediction through novel perspectives on multi-level information integration, was published in Briefings in Bioinformatics, one of the top journals in bioinformatics. Congratulations!
Jan. 2024: Hyunjae Kim's paper, Fine-tuning CLIP Text Encoders with Two-step Paraphrasing, was accepted to EACL 2024 (Findings of the ACL), one of the top conferences in NLP. Congratulations!
Oct. 2023: Two papers got accepted to EMNLP 2023 (Singapore), one of the top conferences in NLP. Congratulations!
Mujeen Sung's paper, Pre-training Intent-Aware Encoders for Zero- and Few-Shot Intent Classification, is accepted to EMNLP 2023.
Gangwoo Kim's paper, Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models, is accepted to EMNLP 2023.
Aug. 2023: We are celebrating over 4,000 citations to our BioBERT paper, which is both the first and most cited biomedical domain-specific transformer-based large language model. Congratulations to our team: Jinhyuk Lee (currently at Google DeepMind), Wonjin Yoon (Harvard Medical School), Donghyeon Kim (Hyundai Motors AI), Sunkyu Kim (AIGEN Sciences), and Chan Ho So (TmaxSoft)!
Jul. 2023: Our DMIS team (Gangwoo Kim, Hajung Kim, Chanhwi Kim, Mujeen Sung, Hyunjae Kim) achieved 1st place in RadSum23, the Multi-modal and Multi-anatomical Radiology Report Summarization Challenge. Congratulations! (Artificial Intelligence Times, Edaily)
We outperformed leading AI research groups, including Stanford University, Siemens, University College London, and The University of Texas at San Antonio.
DMIS led the multinational team effort, with researchers from Microsoft Research Asia, AIGEN Sciences, KAIST, and Beihang University participating.
The paper describing the winning model is available here.
Jul. 2023: Mogan's paper, ArkDTA: attention regularization guided by non-covalent interactions for explainable drug–target binding affinity prediction, was published in Bioinformatics (OUP) and will be presented orally at the upcoming ISMB conference in Lyon, France. This work could not have been completed without the supportive efforts of Junseok, Seungheun, Jueon, Chaeeun, Minjae and Sumin. Congratulations!
Jun. 2023: Congratulations to Dr. Buru Chang on his appointment as a tenure-track Assistant Professor in the Department of Artificial Intelligence at Sogang University!
May. 2023: Minbyul's paper, Consistency Enhancement of Model Prediction on Document-level Named Entity Recognition will be published in Bioinformatics (OUP). Congratulations!
May. 2023: Two papers got accepted to ACL 2023 (Toronto, Canada), one of the top conferences in NLP. Congratulations!
Hyunjae Kim's paper, Automatic Creation of Named Entity Recognition Datasets by Querying Phrase Representations, is accepted to ACL 2023.
Mujeen Sung's paper, Optimizing Test-Time Query Representations for Dense Retrieval, is accepted to Findings of ACL 2023.
Apr. 2023: Donghee Choi's paper, KitchenScale: Learning to Predict Ingredient Quantities from Recipe Contexts, will be published in Expert Systems with Applications, one of the most recognized journals, with an impact factor of 8.665. This paper is based on collaborative work with Sony Research and Sejong University. Congratulations!
Jan. 2023: Congratulations to Dr. Seongjun Yun, who joined Amazon (Vancouver, BC) as an applied scientist. Dr. Yun joined the M5 team that builds large pretrained models to support machine learning applications at Amazon. Congratulations!
Jan. 2023: Mujeen Sung received the NAVER Ph.D Fellowship Award as he showed outstanding performance in his research area.
Nov. 2022: Ngoc-Quang Nguyen's paper, Perceiver CPI: A nested cross-attention network for compound-protein interaction prediction, will be published in Bioinformatics (OUP), one of the top journals in the field of bioinformatics. Congratulations!
Nov. 2022: LIQUID: A Framework for List Question Answering Dataset Generation, co-first authored by Seongyun Lee and Hyunjae Kim, was accepted to AAAI 2023, one of the top conferences in artificial intelligence. Congratulations!
Oct. 2022: Four papers got accepted to EMNLP 2022, one of the top conferences in NLP. Congratulations!
Hyunjae Kim's paper, Simple Questions Generate Named Entity Recognition Datasets, is accepted to EMNLP 2022.
Gangwoo Kim's paper, Generating Information-Seeking Conversations from Unlabeled Documents, is accepted to EMNLP 2022.
Gangwoo Kim's paper (co-authored), Saving Dense Retriever from Shortcut Dependency in Conversational Search, is accepted to EMNLP 2022.
Wonjin Yoon's paper, Biomedical NER for the Enterprise with Distillated BERN2 and the Kazu Framework, is accepted to EMNLP 2022 (Industry Track).
Sep. 2022: WonJin Yoon received an academic award, the "Standigm Paper Award 2022" (Standigm Outstanding Paper Award), from the Korean Society for Bioinformatics for the paper entitled Sequence Tagging for Biomedical Extractive Question Answering (Bioinformatics 2022). Congratulations!
Sep. 2022: BERN2: an advanced neural biomedical named entity recognition and normalization tool, co-first authored by Mujeen Sung and Minbyul Jeong, will be published in Bioinformatics (OUP). Congratulations! [Demo]
Aug. 2022: RecipeMind: Guiding Ingredient Choices from Food Pairing to Recipe Completion using Cascaded Set Transformer, co-first authored by Mogan Gim and Donghee Choi, was accepted at CIKM 2022, one of the top-tier conferences in the information and knowledge management domain. This paper is the fruitful result of collaborative work between our DMIS lab, Professor Park (FNAI Lab, Sejong University) and Sony AI (Tokyo, Japan), which aims to promote creative cooking in the food industry.
Jul. 2022: Congratulations to Dr. Jinhyuk Lee, who joined Google (Mountain View, CA) as a research scientist. Dr. Lee joined the NLP team at Google Research that created BERT and Transformer, working alongside Jeff Dean. Congratulations!
Jun. 2022: Sequence Tagging for Biomedical Extractive Question Answering, co-authored by WonJin Yoon and researchers at AstraZeneca UK and Sweden as a result of a research collaboration, will be published in Bioinformatics (OUP). Congratulations!
Apr. 2022: DyGRAIN: An Incremental Learning Framework for Dynamic Graphs, co-authored by Seoyoon Kim and Seongjun Yun, will be presented at IJCAI 2022. Congratulations!
Apr. 2022: Congratulations, Jungsoo Park, whose papers got accepted at ACL and NAACL!
Consistency Training with Virtual Adversarial Discrete Perturbation. NAACL 2022
FAVIQ: FAct Verification from Information-seeking Questions. ACL 2022
Apr. 2022: MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection, co-authored by Bumsoo Kim and Junhyun Lee, got accepted at CVPR 2022. Congratulations!
Feb. 2022: Congratulations to Dr. Minji Jeon on her appointment as a tenure-track Assistant Professor in the Department of Medicine at Korea University Medical School.
Feb. 2022: Congratulations to Dr. Donghyeon Park on the appointment as a tenure-track Assistant Professor in the Department of Data Science at Sejong University.
Dec. 2021: Congratulations to Dr. Kyubum Lee on joining Amgen Inc. (CA, USA) as Principal Data Scientist. Dr. Lee will work on data-driven clinical trial design and execution using ML and NLP.
Dec. 2021: WonJin Yoon et al.'s paper, KU-DMIS at BioASQ 9: Data-centric and model-centric approaches for biomedical question answering, was selected as the best paper in the BioASQ Lab at CLEF2021, one of the most highly valued venues in biomedical NLP. Congratulations!
Nov. 2021: Seongjun Yun's paper, Neo-GNNs: Neighborhood Overlap-aware Graph Neural Networks for Link Prediction, was accepted to NeurIPS 2021, one of the top conferences in Machine Learning. Congratulations!
Nov. 2021: The DMIS team achieved top performance in 2 challenge tracks held by the BioCreative VII workshop. (Artificial Intelligence Times, Newsis)
Won third place at the relation extraction task: DrugProt: Text mining drug/chemical-protein interactions (Track 1). 🥉
Paper: Using Knowledge Base to Refine Data Augmentation for Biomedical Relation Extraction (Wonjin Yoon, Sean Yi, Richard Jackson (External affiliation), Hyunjae Kim, Sunkyu Kim, Jaewoo Kang)
Won first place at the named entity recognition task: NLM-Chem Track: Full text Chemical Identification and Indexing in PubMed articles (Track 2). 🥇
Paper: Improving Tagging Consistency and Entity Coverage for Chemical Identification in Full-text Articles (Hyunjae Kim, Mujeen Sung, Wonjin Yoon, Sungjoon Park, Jaewoo Kang)
Workshop information: https://biocreative.bioinformatics.udel.edu/news/
Aug. 2021: Two papers got accepted to EMNLP 2021, one of the top conferences in NLP. Congratulations!
Dr. Jinhyuk Lee's paper, Phrase Retrieval Learns Passage Retrieval, Too, is accepted to EMNLP 2021.
Mujeen Sung's paper, Can Language Models be Biomedical Knowledge Bases?, is accepted to EMNLP 2021.
May. 2021: Two papers got accepted to ACL-IJCNLP 2021, one of the top conferences in NLP. Congratulations!
Apr. 2021: Gwanghoon Jang's paper, Predicting mechanism of action of novel compounds using compound structure and transcriptomic signature co-embedding, co-advised by Dr. Sungjoon Park and Prof. Jaewoo Kang, was accepted to ISMB/ECCB 2021. Congratulations!
Mar. 2021: Bumsoo Kim's paper, HOTR: End-to-End Human-Object Interaction Detection with Transformers, was accepted to CVPR 2021 (Virtual, June 19-25), one of the top-tier conferences for computer vision, for oral presentation!
Feb. 2021: Mujeen Sung received the 2020 KU Graduate School Achievement Award as he showed outstanding performance in his research area.
Oct. 2020: Donghyeon Park's paper, FlavorGraph: A large-scale food-chemical graph for generating food representations and recommending food pairings, was accepted to Scientific Reports, an online peer-reviewed open access scientific mega journal published by Nature Research.
Oct. 2020: Minbyul Jeong, Mujeen Sung, Gangwoo Kim, Donghyeon Kim, Jaehyo Yoo, Wonjin Yoon and Jaewoo Kang won first place in both question answering and summarization of the BioASQ 8B (Phase B) Task B Challenge!
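The research team from the Korea University Department of Computer Science won the BioASQ challenge, an international competition for AI systems that answer medical and biological questions, for the second consecutive year, ahead of the University of California San Diego (UCSD), the University of Massachusetts (UMass), Fudan University (China), and the University of Tokyo (Japan). (Edaily)
Because the system can answer questions in sentences that read naturally to humans, it is expected to be useful in developing clinically meaningful decision support tools.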
Sep. 2020: Jungsoo Park's paper, Adversarial Subword Regularization for Robust Neural Machine Translation, was accepted to Findings of ACL: EMNLP 2020, an anthology journal of ACL which is one of the top-tier conferences for computational linguistics.
Sep. 2020: Miyoung Ko's paper, Look at the First Sentence: Position Bias in Question Answering, was accepted to EMNLP 2020, one of the most renowned conferences for NLP-related publications!
Sep. 2020: BioBERT: a pre-trained biomedical language representation model for biomedical text mining, co-first authored by Dr. Jinhyuk Lee and Wonjin Yoon, has recently been ranked among the most read papers in Bioinformatics, which is one of the top-tier journals in the domain.
Also, BioBERT was included in the Best Papers for the Natural Language Processing Section of the 2020 IMIA (International Medical Informatics Association) Yearbook (link). Congratulations once again to the authors for this grand achievement!
Jul. 2020: Enhancing the interpretability of transcription factor binding site prediction using attention mechanism, co-first authored by Sungjoon Park, Yookyung Koh, and Hwisang Jeon, was accepted to Scientific Reports. Congratulations!
Apr. 2020: MAPS: Multi-Agent reinforcement learning-based Portfolio management System, co-first authored by Jinho Lee and Raehyun Kim, is accepted to IJCAI 2020, one of the top conferences for general AI.
MAPS is a hedge-fund-like portfolio management system trained with cooperative multi-agent reinforcement learning.
It is inspired by the fact that a hedge fund's entire portfolio is managed by multiple investors working together to maximize risk-adjusted return.
Apr. 2020: Congratulations to Sunkyu Kim on the publication in Cell Systems!
Sunkyu Kim's team won 1st place in the NCI-CPTAC DREAM Proteogenomics Challenge in 2017 (outperforming UCLA (3rd) and Stanford (13th)).
Assessment of the Limits of Predictability of Protein and Phosphorylation Levels in Cancer is a paper for the DREAM challenge, written in collaboration with Heidelberg University, Icahn School of Medicine and New York University.
Apr. 2020: Sunkyu Kim's paper, Improved survival analysis by learning shared genomic information from pan-cancer data, was accepted to ISMB 2020, a top conference in bioinformatics.
Two papers got accepted to ACL 2020, one of the top conferences in NLP.
Dr. Jinhyuk Lee's paper, Contextualized Sparse Representations for Real-Time Open-Domain Question Answering, is accepted to ACL 2020.
Mujeen Sung's paper, Biomedical Entity Representations with Synonym Marginalization, is accepted to ACL 2020.
Jan. 2020: Wonjin Yoon received the NAVER Ph.D Fellowship Award as he showed outstanding performance in his research area.
Nov. 2019: Congratulations! Our DMIS team (Sungjoon Park, Minji Jeon, Sunkyu Kim, Junhyun Lee, Seongjun Yun, Bumsoo Kim, Buru Chang) was selected as one of the top performers in the IDG-DREAM Drug-Kinase Binding Prediction Challenge. As one of the best performers, we presented our model at the RSG with DREAM Conference in New York in November. (Link)
Sep. 2019: Congratulations! The DMIS team outperformed the Google team and won 1st place at the BioASQ challenge, a challenge on large-scale biomedical semantic indexing and question answering.
By using BioBERT, our team (Wonjin Yoon, Jinhyuk Lee, Donghyeon Kim, Minbyul Jeong) produced outstanding results for all 5 test batches on BioASQ Task 7B-Phase B (challenge results - http://bioasq.org/participate/seventh-challenge-winners).
Ranked 1st, ahead of Google, at the BioASQ challenge, a competition for biomedical question answering systems [Task 7B-Phase B] (Korea University press release, Electronic Times, Yonhap News)
Sep. 2019: Congratulations to Dr. Jinhyuk Lee and Wonjin Yoon for BioBERT publication in Bioinformatics!
BioBERT: a pre-trained biomedical language representation model for biomedical text mining is the first biomedical language representation model pre-trained on a large-scale biomedical corpus, and it achieves state-of-the-art performance on various biomedical NLP tasks. (paper, code)
With BioBERT, the DMIS team won 1st place at the BioASQ challenge.
Sep. 2019: Seongjun Yun's paper, Graph Transformer Networks, was accepted to Advances in Neural Information Processing Systems (NeurIPS 2019), one of the top-tier conferences in machine learning alongside ICML.
Aug. 2019: Donghyeon Park's paper, KitcheNette: Prediction and Ranking Food Ingredient Pairings based on Siamese Neural Network, got accepted to IJCAI 2019, one of the top-tier conferences for general AI.
May. 2019: Real-Time Open-Domain Question Answering on Wikipedia with Dense-Sparse Phrase Index, co-first authored by Jinhyuk Lee, is accepted to ACL 2019, the top conference in computational linguistics and natural language processing.
May. 2019: ReSimNet: Drug Response Similarity Prediction using Siamese Neural Networks, co-first authored by Minji Jeon and Donghyeon Park, has been accepted to Bioinformatics, the best journal for computational biology.
ReSimNet measures the transcriptional response similarity of the two chemical compounds, and the team achieved first place in the Multi-targeting Drug DREAM Challenge with this model (outperforming Janssen Pharmaceutica).
Apr. 2019: Self-Attention Graph Pooling, co-first authored by Junhyun Lee and Inyeop Lee, has been accepted to ICML 2019, the top conference in machine learning.
Apr. 2019: SAIN: Self-Attentive Integration Network for Recommendation, co-first authored by Seongjun Yun and Raehyun Kim, was accepted by SIGIR 2019, the best conference in information retrieval.
Apr. 2019: Congratulations to Dr. Minji Jeon for her first Nature series publication! (accepted to Nature Communications)
Previously, Dr. Minji Jeon's team won 2nd place in the AstraZeneca Sanger Drug Synergy Prediction DREAM challenge (outperforming Stanford (6th) and MIT (11th)).
Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen is an overview paper for the DREAM challenge and is coauthored by top-performing teams and organizers from AstraZeneca-Sanger. (bioRxiv)
Dec. 2018: Predicting Multiple Demographic Attributes with Task Specific Embedding Transformation and Attention Network, co-first authored by Raehyun Kim and Hyunjae Kim, has been accepted as a full paper at SDM19, one of the top-tier conferences in data mining.
Nov. 2018: Buru Chang received the NAVER Ph.D Fellowship Award as he showed stellar performance with his papers.
Aug. 2018: Jinhyuk Lee's paper, Ranking Paragraphs for Improving Answer Recall in Open-Domain Question Answering, was accepted to EMNLP 2018, one of the most renowned conferences in the NLP field.
Aug. 2018: Learning User Preferences and Understanding Calendar Contexts for Event Scheduling (co-first authored by Donghyeon Kim and Jinhyuk Lee) was accepted by CIKM 2018, one of the top-tier international conferences in the database, data mining, and information retrieval fields, with a 17% acceptance rate.
Jul. 2018: Buru Chang's paper, Content-Aware Point-of-Interest Embedding Model for Successive POI Recommendation, was accepted to IJCAI 2018, one of the top-tier conferences for general AI.
Nov. 2017: Our DMIS team (Sunkyu Kim, Heewon Lee, Keonwoo Kim, Hwisang Jeon, Minji Jeon, Yonghwa Choi, Daehan Kim) was named the best performer of the NCI-CPTAC DREAM Proteogenomics Challenge, sponsored by the National Cancer Institute (NCI) Clinical Proteomic Tumor Analysis Consortium (CPTAC). This was the very first time that a Korean team won the Challenge. (Link)
UCLA: 3rd place
Stanford University: 13th place
Aug. 2017: Jinhyuk Lee's paper, Name Nationality Classification with Recurrent Neural Network, got accepted for IJCAI 2017, one of the top-tier conferences for general AI.
Apr. 2017: Constructing and Evaluating a Novel Crowdsourcing-based Paraphrased Opinion Spam Dataset, co-first authored by Seongsoon Kim and Seongwoon Lee, has been accepted to WWW 2017, one of the top conferences for web.
Oct. 2016: Among 42 teams from different parts of the world, our DMIS team tied for 2nd place overall at the Disease Module Identification DREAM Challenge: Discover disease pathways in genomic networks, which focuses on discovering disease-associated modules in biological networks. The goal is to systematically assess module identification methods on a panel of state-of-the-art genomic networks and to discover novel network pathways.
Mar. 2016: Our DMIS team won 2nd place at the AstraZeneca-Sanger Drug Combination Prediction DREAM Challenge, which is designed to predict synergistic drug combinations and to identify associated biomarkers. The challenge was hosted by AstraZeneca, one of the top 10 pharmaceutical companies in the world. (Link)
Stanford University: 6th place
MIT: 11th place