Data Mining and Information Systems Lab

PI: Prof. Jaewoo Kang, Dept. of Computer Science and Engineering, Korea Univ.

(Prof. Jaewoo Kang's Lab, Department of Computer Science, Korea University)


Data science has advanced to the point where it is changing our world. It is now central to exploring and uncovering knowledge across domains and acts as a bridge between them. With an ever-growing amount of data and opportunities to explore, the Data Mining and Information Systems (DMIS) lab aims to drive the data science revolution.

The main focus of DMIS Lab is to leverage AI and machine learning (ML) to solve problems in bioinformatics, drug discovery, and biomedical text mining. To diversify and strengthen its toolkit, DMIS Lab also conducts research in related areas such as natural language processing (NLP) and graph ML, seeking more refined techniques to carry out its larger mission. Beyond research, DMIS Lab participates in international challenges and competitions, such as the DREAM Challenges and the BioASQ Challenges, to contribute to the communal effort to tackle unmet needs in biomedicine.

[NOTICE] We are looking for M.S./Ph.D. students and postdoctoral fellows with a strong interest in Natural Language Processing (NLP) and Biomedical NLP (BioNLP)! [Read More]


  • Sep. 2022: WonJin Yoon received an Academic Award, "Standigm Paper Award 2022" (스탠다임 우수논문상) from the Korean Society for Bioinformatics (한국생명정보학회) with the paper entitled Sequence Tagging for Biomedical Extractive Question Answering (Bioinformatics 2022). Congratulations!

  • Sep. 2022: BERN2: an advanced neural biomedical named entity recognition and normalization tool, co-first authored by Mujeen Sung and Minbyul Jeong, will be published in Bioinformatics (OUP). Congratulations! [Demo]

  • Aug. 2022: RecipeMind: Guiding Ingredient Choices from Food Pairing to Recipe Completion using Cascaded Set Transformer, co-first authored by Mogan Gim and Donghee Choi, was accepted at CIKM 2022, one of the top-tier conferences in the information and knowledge management domain. This paper is the fruit of a collaboration between our DMIS lab, Professor Park (FNAI Lab, Sejong University), and Sony AI (Tokyo, Japan), which aims to promote creative cooking in the food industry.

  • Jul. 2022: Dr. Jinhyuk Lee joined Google (Mountain View, CA) as a research scientist. Dr. Lee joined the NLP team at Google Research that created BERT and the Transformer, working alongside Jeff Dean. Congratulations!

  • Jun. 2022: Sequence Tagging for Biomedical Extractive Question Answering, co-authored by WonJin Yoon and researchers at AstraZeneca UK and Sweden as a result of a research collaboration, will be published in Bioinformatics (OUP). Congratulations!

  • Apr. 2022: DyGRAIN: An Incremental Learning Framework for Dynamic Graphs, co-authored by Seoyoon Kim and Seongjun Yun, will be presented at IJCAI 2022. Congratulations!

  • Apr. 2022: Congratulations to Jungsoo Park, whose papers were accepted at ACL and NAACL!

    • Consistency Training with Virtual Adversarial Discrete Perturbation. NAACL 2022

    • FAVIQ: FAct Verification from Information-seeking Questions. ACL 2022

  • Apr. 2022: MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection, co-authored by Bumsoo Kim and Junhyun Lee, got accepted at CVPR 2022. Congratulations!

  • Feb. 2022: Congratulations on the appointment of Dr. Minji Jeon as a tenure-track assistant professor in the Department of Medicine at Korea University Medical School.

  • Feb. 2022: Congratulations on the appointment of Dr. Donghyeon Park as a tenure-track assistant professor in the Department of Data Science at Sejong University.

  • Dec. 2021: Congratulations to Dr. Kyubum Lee on joining Amgen Inc. (CA, USA) as Principal Data Scientist. Dr. Lee will work on data-driven clinical trial design and execution using ML and NLP.

  • Dec. 2021: WonJin Yoon et al.'s paper, KU-DMIS at BioASQ 9: Data-centric and model-centric approaches for biomedical question answering, was selected as the best paper in the BioASQ Lab at CLEF 2021, one of the most highly valued venues in Biomedical NLP. Congratulations!

  • Nov. 2021: Seongjun Yun's paper, Neo-GNNs: Neighborhood Overlap-aware Graph Neural Networks for Link Prediction, was accepted to NeurIPS 2021, one of the top conferences in Machine Learning. Congratulations!

  • Nov. 2021: The DMIS team achieved top performance in two challenge tracks of the BioCreative VII workshop. (AI Newspaper, Newsis)

    • Won third place in the relation extraction task, DrugProt: Text mining drug/chemical-protein interactions (Track 1). 🥉
      Paper: Using Knowledge Base to Refine Data Augmentation for Biomedical Relation Extraction (Wonjin Yoon, Sean Yi, Richard Jackson (External affiliation), Hyunjae Kim, Sunkyu Kim, Jaewoo Kang)

    • Won first place in the named entity recognition task, NLM-Chem Track: Full text Chemical Identification and Indexing in PubMed articles (Track 2). 🥇
      Paper: Improving Tagging Consistency and Entity Coverage for Chemical Identification in Full-text Articles (Hyunjae Kim, Mujeen Sung, Wonjin Yoon, Sungjoon Park, Jaewoo Kang)

  • Aug. 2021: Two papers were accepted to EMNLP 2021, one of the top conferences in NLP. Congratulations!

    • Dr. Jinhyuk Lee's paper, Phrase Retrieval Learns Passage Retrieval, Too, is accepted to EMNLP 2021.

    • Mujeen Sung's paper, Can Language Models be Biomedical Knowledge Bases?, is accepted to EMNLP 2021.

  • May 2021: Two papers were accepted to ACL-IJCNLP 2021, one of the top conferences in NLP. Congratulations!

    • Dr. Jinhyuk Lee's paper, Learning Dense Representations of Phrases at Scale, is accepted to ACL-IJCNLP 2021. (The Korea Economic Daily, IT Chosun)

    • Gangwoo Kim's paper, Learn to Resolve Conversational Dependency: A Consistency Training Framework for Conversational Question Answering, is accepted to ACL-IJCNLP 2021.

  • Apr. 2021: Gwanghoon Jang's paper, Predicting mechanism of action of novel compounds using compound structure and transcriptomic signature co-embedding, co-advised by Dr. Sungjoon Park and Prof. Jaewoo Kang, was accepted to ISMB/ECCB 2021. Congratulations!

  • Mar. 2021: Bumsoo Kim's paper, HOTR: End-to-End Human-Object Interaction Detection with Transformers, was accepted to CVPR 2021 (Virtual, June 19-25), one of the top-tier conferences in computer vision, as an oral presentation!

  • Feb. 2021: Mujeen Sung received the 2020 KU Graduate School Achievement Award in recognition of his outstanding research performance.

  • Oct. 2020: Donghyeon Park's paper, FlavorGraph: A large-scale food-chemical graph for generating food representations and recommending food pairings, was accepted to Scientific Reports, an online, peer-reviewed, open-access scientific mega-journal published by Nature Research.

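  • Oct. 2020: Minbyul Jeong, Mujeen Sung, Gangwoo Kim, Donghyeon Kim, Jaehyo Yoo, Wonjin Yoon and Jaewoo Kang won first place in both question answering and summarization in the BioASQ 8B (Phase B) Task B Challenge!

    • A research team from the Department of Computer Science at Korea University won BioASQ, an international competition for AI systems that answer medical and biological questions, for the second consecutive year, beating the University of California San Diego (UCSD), the University of Massachusetts (UMass), Fudan University (China), and the University of Tokyo (Japan). (Edaily)

    • Because the system answers questions in sentences that read naturally to people, it is expected to be useful in developing clinically meaningful decision-support tools.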

  • Sep. 2020: Jungsoo Park's paper, Adversarial Subword Regularization for Robust Neural Machine Translation, was accepted to Findings of ACL: EMNLP 2020, published in the ACL Anthology alongside EMNLP, one of the top-tier conferences in computational linguistics.

  • Sep. 2020: Miyoung Ko's paper, Look at the First Sentence: Position Bias in Question Answering, was accepted to EMNLP 2020, one of the most renowned conferences in NLP!

  • Sep. 2020: BioBERT: a pre-trained biomedical language representation model for biomedical text mining, co-first authored by Dr. Jinhyuk Lee and Wonjin Yoon, was ranked among the most-read papers in Bioinformatics, one of the top-tier journals in the domain.

Also, BioBERT was included in the Best Papers for the Natural Language Processing Section of the 2020 IMIA (International Medical Informatics Association) Yearbook (link). Congratulations once again to the authors for this grand achievement!

  • Jul. 2020: Enhancing the interpretability of transcription factor binding site prediction using attention mechanism, co-first authored by Sungjoon Park, Yookyung Koh, and Hwisang Jeon, was accepted to Scientific Reports. Congratulations!

  • Apr. 2020: MAPS: Multi-Agent reinforcement learning-based Portfolio management System, co-first authored by Jinho Lee and Raehyun Kim, was accepted to IJCAI 2020, one of the top conferences for general AI.

    • MAPS is a hedge-fund-like portfolio management system trained with cooperative multi-agent reinforcement learning.

    • It is inspired by the fact that a hedge fund's entire portfolio is managed by multiple investors who work together to maximize risk-adjusted returns.

  • Apr. 2020: Congratulations to Sunkyu Kim on the publication in Cell Systems!

    • Sunkyu Kim's team won 1st place in the NCI-CPTAC DREAM Proteogenomics Challenge in 2017 (outperforming UCLA (3rd) and Stanford (13th)).

    • Assessment of the Limits of Predictability of Protein and Phosphorylation Levels in Cancer is the paper for this DREAM challenge, written in collaboration with Heidelberg University, the Icahn School of Medicine, and New York University.

  • Apr. 2020: Sunkyu Kim's paper, Improved survival analysis by learning shared genomic information from pan-cancer data, was accepted to ISMB 2020, the top conference in bioinformatics.

  • Two papers were accepted to ACL 2020, one of the top conferences in NLP.

    • Dr. Jinhyuk Lee's paper, Contextualized Sparse Representations for Real-Time Open-Domain Question Answering, is accepted to ACL 2020.

    • Mujeen Sung's paper, Biomedical Entity Representations with Synonym Marginalization, is accepted to ACL 2020.

  • Jan. 2020: Wonjin Yoon received the NAVER Ph.D. Fellowship Award in recognition of his outstanding research performance.

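  • Nov. 2019: Congratulations! Our DMIS team (Sungjoon Park, Minji Jeon, Sunkyu Kim, Junhyun Lee, Seongjun Yun, Bumsoo Kim, Buru Chang) was selected as one of the top performers in the IDG-DREAM Drug-Kinase Binding Prediction Challenge. As one of the best performers, we presented our model at the RSG with DREAM Conference in New York in November. (Link)

    • Invited as a winning team, the research team presented its AI-based virtual drug screening model at the RSG with DREAM Conference held in New York in November. (Maeil Business Newspaper, Korean University Newspaper, Yonhap News)

    • The DREAM Challenges, hosted by IBM and Sage Bionetworks in the US, are internationally recognized data science competitions in the biomedical field. The research team was named a joint top-performing team in the drug activity prediction DREAM Challenge, together with the University of Illinois-Tsinghua University consortium and a University of North Carolina team.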

  • Sep. 2019: Congratulations! The DMIS team outperformed a Google team and won 1st place at the BioASQ challenge, a challenge on large-scale biomedical semantic indexing and question answering.

  • Sep. 2019: Congratulations to Dr. Jinhyuk Lee and Wonjin Yoon for BioBERT publication in Bioinformatics!

    • BioBERT: a pre-trained biomedical language representation model for biomedical text mining is the first biomedical language representation model pre-trained on large-scale biomedical corpora, and it achieves state-of-the-art performance on various biomedical NLP tasks. (paper, code)

    • With BioBERT, the DMIS team won 1st place at the BioASQ challenge.

  • Sep. 2019: Seongjun Yun's paper, Graph Transformer Networks, was accepted to Advances in Neural Information Processing Systems (NeurIPS 2019), one of the top-tier conferences in machine learning alongside ICML.

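  • Aug. 2019: Donghyeon Park's paper, KitcheNette: Prediction and Ranking Food Ingredient Pairings based on Siamese Neural Network, was accepted to IJCAI 2019, one of the top-tier conferences for general AI.

    • The DMIS research team analyzed one million recipes and developed an AI model, based on a Siamese neural network, that recommends ingredient combinations. The model far outperformed traditional machine learning models in prediction and recommendation, and the results will be presented at IJCAI-19 in Macao, one of the most prestigious AI conferences. (Korea University press release, YTN Science, Maeil Business Newspaper, Yonhap News, Seoul Shinmun, IT Chosun)

    • The research team provides a web page where users can explore ingredient combinations themselves and make use of the research results. (KitcheNette)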

  • May 2019: Real-Time Open-Domain Question Answering on Wikipedia with Dense-Sparse Phrase Index, co-first authored by Jinhyuk Lee, is accepted to ACL 2019, the top conference in computational linguistics and natural language processing.

  • May 2019: ReSimNet: Drug Response Similarity Prediction using Siamese Neural Networks, co-first authored by Minji Jeon and Donghyeon Park, has been accepted to Bioinformatics, the best journal for computational biology.

    • ReSimNet measures the transcriptional response similarity of two chemical compounds, and the team achieved first place in the Multi-targeting Drug DREAM Challenge with this model (outperforming Janssen Pharmaceutica).

  • Apr. 2019: Self-Attention Graph Pooling, co-first authored by Junhyun Lee and Inyeop Lee, has been accepted to ICML 2019, the top conference in machine learning.

  • Apr. 2019: SAIN: Self-Attentive Integration Network for Recommendation, co-first authored by Seongjun Yun and Raehyun Kim, was accepted to SIGIR 2019, the best conference in Information Retrieval.

  • Apr. 2019: Congratulations to Dr. Minji Jeon for her first Nature series publication! (accepted to Nature Communications)

    • Previously, Dr. Minji Jeon's team won 2nd place in the AstraZeneca-Sanger Drug Synergy Prediction DREAM challenge (outperforming Stanford (6th) and MIT (11th)).

    • Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen is an overview paper for the DREAM challenge, co-authored by the top-performing teams and the organizers from AstraZeneca-Sanger. (bioRxiv)

  • Dec. 2018: Our DMIS team (Minji Jeon, Donghyeon Park, Jinhyuk Lee, Hwisang Jeon, Miyoung Ko, Sunkyu Kim, Yonghwa Choi) won 1st place in the Multi-targeting Drug DREAM Challenge. The team outperformed multinational pharmaceutical firms such as Janssen Pharmaceutica. (Link1, Link2)

    • The DMIS research team won the competition, beating multinational pharmaceutical companies such as Janssen and Bayer! The team developed a model for discovering new drug candidates and was selected as the winning team after the compounds chosen by its AI were shown to be promising. (Maeil Business Newspaper, Yonhap News)

  • Dec. 2018: Predicting Multiple Demographic Attributes with Task Specific Embedding Transformation and Attention Network, co-first authored by Raehyun Kim and Hyunjae Kim, has been accepted as a full paper at SDM19, one of the top-tier conferences in data mining.

  • Nov. 2018: Buru Chang received the NAVER Ph.D. Fellowship Award in recognition of his outstanding publication record.

  • Aug. 2018: Jinhyuk Lee's paper, Ranking Paragraphs for Improving Answer Recall in Open-Domain Question Answering, was accepted to EMNLP 2018, one of the most renowned conferences in the NLP field.

  • Aug. 2018: Learning User Preferences and Understanding Calendar Contexts for Event Scheduling (co-first authored by Donghyeon Kim and Jinhyuk Lee) was accepted to CIKM 2018, one of the top-tier international conferences in the Database/Data Mining/Information Retrieval field, with a 17% acceptance rate.

  • Jul. 2018: Buru Chang's paper, Content-Aware Point-of-Interest Embedding Model for Successive POI Recommendation, was accepted to IJCAI 2018, one of the top-tier conferences for general AI.

  • Nov. 2017: Our DMIS team (Sunkyu Kim, Heewon Lee, Keonwoo Kim, Hwisang Jeon, Minji Jeon, Yonghwa Choi, Daehan Kim) was named the best performer of the NCI-CPTAC DREAM Proteogenomics Challenge, sponsored by the National Cancer Institute (NCI) Clinical Proteomic Tumor Analysis Consortium (CPTAC). This was the first time a Korean team had won the Challenge. (Link)

      • UCLA: 3rd place

      • Stanford University: 13th place

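    • Nov. 2017: Prof. Jaewoo Kang's research team at Korea University took part in the NCI-CPTAC DREAM Proteogenomics Challenge, which predicts protein activity levels in cancer patients, and became the first Korean team in the challenge's history to win! The challenge was hosted by the Clinical Proteomic Tumor Analysis Consortium of the US National Cancer Institute (NCI-CPTAC). (Yonhap News, Maeil Business Newspaper, Seoul Economic Daily)

  • Aug. 2017: Jinhyuk Lee's paper, Name Nationality Classification with Recurrent Neural Network, was accepted to IJCAI 2017, one of the top-tier conferences for general AI.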

  • Apr. 2017: Constructing and Evaluating a Novel Crowdsourcing-based Paraphrased Opinion Spam Dataset, co-first authored by Seongsoon Kim and Seongwoon Lee, has been accepted to WWW 2017, one of the top conferences for web.

  • Oct. 2016: Among 42 teams from around the world, our DMIS team ranked 2nd in the Disease Module Identification DREAM Challenge: Discover disease pathways in genomic networks. The goal is to systematically assess module identification methods on a panel of state-of-the-art genomic networks and to discover novel network pathways.

    • Oct. 2016: Took part in the Disease Module Identification DREAM Challenge: Discover disease pathways in genomic networks, which identifies disease-associated modules in biological networks, and tied for 2nd place overall out of 42 teams!

  • Mar. 2016: Our DMIS team won 2nd place at the AstraZeneca-Sanger Drug Combination Prediction DREAM Challenge, which is designed to predict synergistic drug combinations and to identify associated biomarkers. The challenge was hosted by AstraZeneca, one of the top 10 pharmaceutical companies in the world, and the DMIS team showed stellar performance in this highly competitive field, ranking 2nd. (Link)

      • Stanford University: 6th place

      • MIT: 11th place

    • Mar. 2016: Took part in The AstraZeneca-Sanger Drug Combination Prediction DREAM Challenge, which predicts the efficacy of combination anticancer therapy, and placed 2nd out of 62 teams worldwide! The challenge was hosted by AstraZeneca, one of the world's top 10 pharmaceutical companies, and Prof. Jaewoo Kang's research team took 2nd place, decisively outperforming Stanford University (6th) and MIT (11th). (Kyunghyang Shinmun, Seoul Economic Daily)

Address: Office 501B, Woojung Hall of Informatics, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul, Republic of Korea 02841
Tel: +82-2-3290-3566
Copyright © 2019, Data Mining & Information Systems Laboratory, Department of Computer and Radio Communications, Korea University. All Rights Reserved.