Data Mining and Information Systems Lab

PI: Prof. Jaewoo Kang, Dept. of Computer Science and Engineering, Korea Univ.

(Lab of Prof. Jaewoo Kang, Department of Computer Science, Korea University)


Data science has advanced to the point where it is changing our world. It is now central to exploring and uncovering knowledge across domains and acts as a bridge connecting them. With an ever-growing amount of data and opportunities to explore, the Data Mining and Information Systems (DMIS) Lab aims to drive the data science revolution.

The main focus of DMIS Lab is to leverage AI and machine learning (ML) to solve problems in bioinformatics, drug discovery, and biomedical text mining. To diversify and strengthen its toolkit, DMIS Lab also conducts research in other areas, such as natural language processing (NLP) and graph ML, to develop more refined techniques for its larger mission. Aside from conducting research, DMIS Lab also participates in international challenges and competitions, such as the DREAM Challenges and the BioASQ Challenges, to contribute to the communal effort to tackle unmet needs in the field of biomedicine.


There's a noticeable difference in performance between commercial large LMs and open-source small LMs in the medical domain. While GPT-4 scored an impressive 90% accuracy on USMLE-style questions, the previous best 7B model managed only 52%, falling significantly below the USMLE passing threshold of 60%.

Our new medical LM, Meerkat-7B, is the first 7B-parameter model to surpass the 60% passing threshold of the United States Medical Licensing Examination (USMLE), scoring 74.3% on the MedQA dataset and 71.4% on the USMLE sample test. It also outperformed GPT-3.5 (175B) by 13.1% across seven medical benchmarks, indicating significant progress in open-source model development within the medical field. (Dong-A Ilbo, Kookmin Ilbo, AI Times)

Congratulations to Dr. Hyunjae Kim and the Meerkat team on their remarkable achievement!

- Paper:

- Model:

Also, BioBERT was included in the Best Papers for the Natural Language Processing Section of the 2020 IMIA (International Medical Informatics Association) Yearbook (link). Congratulations once again to the authors for this grand achievement!

Address: Office 501B, Woojung Hall of Informatics, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul, Republic of Korea 02841
Tel: +82-2-3290-3566
Copyright © 2019, By Data Mining & Information Systems Laboratory, Department of Computer and Radio Communications, Korea University, All Rights Reserved.