...

Masahiro Suzuki

Ph.D. student in Izumi and Sakaji Lab.,
Department of Systems Innovation,
School of Engineering, The University of Tokyo
Nikko Asset Management Co., Ltd.

Mail : msuzuki [at] g.ecc.u-tokyo.ac.jp
ORCID : 0000-0001-8519-5617
Google Scholar : scholar.google.com/citations?user=_-8tzX0AAAAJ
researchmap : researchmap.jp/masahiro-suzuki
: Masahiro-Suzuki-11
GitHub : github.com/retarfi
LinkedIn : linkedin.com/in/msuzuki7/

Self-introduction

Research Area: Text Mining, Natural Language Processing

Member of: IEEE, the Association for Natural Language Processing, the Japanese Society for Artificial Intelligence

Biography

2022/10 - :   Studying at Izumi Lab., Department of Systems Innovation, School of Engineering, The University of Tokyo (Ph.D. Program)

2022/04 - :   Working at Nikko Asset Management Co., Ltd.

2020/04 - 2022/03 :   Studied at Izumi Lab., Department of Systems Innovation, School of Engineering, The University of Tokyo (Master's Program)

2019/05 - 2020/03 :   Researched at Izumi Lab., Faculty of Engineering

2018/04 - 2020/03 :   Studied at Systems Design & Management Course, Department of Systems Innovation, Faculty of Engineering

2016/04 - 2018/03 :   Studied at Natural Sciences I, College of Arts and Sciences (Junior Division)

2015/04 - 2016/03 :   Department of Industrial and Systems Engineering, Faculty of Science and Engineering, Keio University

2009/04 - 2015/03 :   Senior & Junior High School at Komaba, University of Tsukuba

1996/09 :   Born in Tokyo, Japan

Public resources

  • Japanese DeBERTaV2 Model (base / small)
  • Japanese Large Language Model Project
    Japanese datasets and tuned models are available.
    detail (in Japanese)
  • ACL anthology Japanese abstract
    Automatic translation of abstracts of articles on ACL anthology into Japanese using ChatGPT.
  • Pre-training Language Models for Japanese (github.com/retarfi/language-pretraining)
    Pre-trained BERT and ELECTRA models, using Japanese Wikipedia and financial-domain text as corpora. Both the Wikipedia and the financial models are available through the Transformers natural language processing library (huggingface.co/izumi-lab).
  • jptranstokenizer: Japanese Tokenizer for transformers (github.com/retarfi/jptranstokenizer)
    Japanese tokenizer compatible with the HuggingFace library. Juman++, Sudachi, and spaCy LUW are available as main-word tokenizers (MeCab is also available). WordPiece and SentencePiece are available as subword tokenizers. You can easily load a trained tokenizer with Juman++ and SentencePiece.
    PyPI

Papers

Publication (Refereed)

  1. Constructing and analyzing domain-specific language model for financial text mining
    Masahiro Suzuki, Hiroki Sakaji, Masanori Hirano, and Kiyoshi Izumi.
    Information Processing & Management, 2023.
    Impact Factor: 8.6, Q1 Journal as of 2022
    ScienceDirect / bib
  2. Forecasting Stock Price Trends by Analyzing Economic Reports With Analyst Profiles
    Masahiro Suzuki, Hiroki Sakaji, Kiyoshi Izumi, and Yasushi Ishikawa.
    Frontiers in Artificial Intelligence, 2022.
    Impact Factor: 4.0
    Frontiers / bib
  3. Forecasting Net Income Estimate and Stock Price Using Text Mining from Economic Reports
    Masahiro Suzuki, Hiroki Sakaji, Kiyoshi Izumi, and Yasushi Ishikawa.
    Information, 2020.
    Selected as Cover Story. Impact Factor: 3.1
    MDPI / bib

International Conference (Refereed)

  1. JMedLoRA: Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning
    Issey Sukeda, Masahiro Suzuki, Hiroki Sakaji, and Satoshi Kodera.
    Deep Generative Models for Health Workshop NeurIPS 2023, 2023.
    Accepted
    OpenReview / arXiv
  2. From Base to Conversational: Japanese Instruction Dataset and Tuning Large Language Models
    Masahiro Suzuki, Masanori Hirano, and Hiroki Sakaji.
    2023 IEEE International Conference on Big Data (Big Data), 2023.
    IEEE / arXiv / SSRN
  3. llm-japanese-dataset v0: Construction of Japanese Chat Dataset for Large Language Models and its Methodology
    Masanori Hirano, Masahiro Suzuki, and Hiroki Sakaji.
    The 12th International Workshop on Web Services and Social Media (WSSM-2023) in The 26th International Conference on Network-Based Information Systems (NBiS-2023), 2023.
    Springer Link / arXiv / SSRN
  4. Gradual Further Pre-training Architecture for Economics/Finance Domain Adaptation of Language Model
    Hiroki Sakaji, Masahiro Suzuki, Kiyoshi Izumi, and Hiroyuki Mitsugi.
    2022 IEEE International Conference on Big Data (Big Data), 2022.
    IEEE / bib
  5. Constructing and analyzing domain-specific language model for financial text mining
    Masahiro Suzuki, Hiroki Sakaji, Masanori Hirano, and Kiyoshi Izumi.
    Information Processing and Management Conference, 2022.
  6. Market Trend Analysis Using Polarity Index Generated from Analyst Reports
    Rei Taguchi, Hikaru Watanabe, Masanori Hirano, Masahiro Suzuki, Hiroki Sakaji, Kiyoshi Izumi, and Kenji Hiramatsu.
    2021 IEEE International Conference on Big Data (Big Data), 2021.
    IEEE / bib
  7. Stock Price Analysis Using Combination of Analyst Reports and Several Document
    Masahiro Suzuki, Toshiya Katagi, Hiroki Sakaji, Kiyoshi Izumi, and Yasushi Ishikawa.
    2019 International Conference on Data Mining Workshops (ICDMW), 2019.
    Best Paper Award
    IEEE / bib

Domestic Conference (Non-Refereed) / Other

  1. JMedLoRA: Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning (in Japanese)
    Issey Sukeda, Masahiro Suzuki, Hiroki Sakaji, and Satoshi Kodera.
    The Thirtieth Annual Meeting of the Association for Natural Language Processing, 2024.
    paper / detail
  2. Language Model Construction and Domain Adaptation using Multiple Nodes (in Japanese)
    Masahiro Suzuki and Hiroki Sakaji.
    Intelligent Computing Systems (ICS), 2024.
    IPSJ
  3. LoRA Tuning Conversational Japanese Large Language Models using Japanese Instruction Dataset (in Japanese)
    Masahiro Suzuki, Masanori Hirano, and Hiroki Sakaji.
    IEICE Tech. Rep., 2023.
    IEICE / Jxiv
  4. llm-japanese-dataset v0: Construction of Japanese Chat Dataset for Large Language Models (in Japanese)
    Masanori Hirano, Masahiro Suzuki, and Hiroki Sakaji.
    Special Interest Group on Natural Language Processing, Information Processing Society of Japan, 2023.
    Young Research Award (Co-author)
    SIG-NL / bib
  5. Construction of Japanese Instruction Dataset and its Application to Tuning of Large-scale Language Models (in Japanese)
    Masahiro Suzuki, Masanori Hirano, and Hiroki Sakaji.
    18th Symposium of Young Researcher Association for NLP Studies (YANS), 2023.
    Honorable Mention Award and ELYZA Award (Sponsor Award)
  6. Causal Text Mining in the Era of Large Language Modeling: A Reality Check (in Japanese)
    Takehiro Takayanagi, Ryotaro Kobayashi, Masahiro Suzuki, Hiroki Sakaji, and Kiyoshi Izumi.
    18th Symposium of Young Researcher Association for NLP Studies (YANS), 2023.
  7. Proposing task to extract differences from time series financial documents (in Japanese)
    Masahiro Suzuki, Hiroki Sakaji, and Kiyoshi Izumi.
    Proceedings of the Annual Conference of JSAI, 2023.
    JSTAGE / bib
  8. Performance Evaluation of Japanese Pre-trained Language Models with Different Word Segmentation Systems (in Japanese)
    Masahiro Suzuki, Hiroki Sakaji, and Kiyoshi Izumi.
    29th Annual Meeting of the Association for Natural Language Processing (NLP), 2023.
    paper / detail
  9. Stock Price Trend Forecast using Multiple Timeseries Analyst Reports (in Japanese)
    Masahiro Suzuki, Hiroki Sakaji, Kiyoshi Izumi, and Yasushi Ishikawa.
    Workshop of Social System and Information Technology (WSSIT2022), 2022.
    paper / detail
  10. Construction and Validation of a Pre-Training and Additional Pre-Training Financial Language Model (in Japanese)
    Masahiro Suzuki, Hiroki Sakaji, Masanori Hirano, and Kiyoshi Izumi.
    Proceedings of JSAI Special Interest Group on Financial Informatics (SIG-FIN) 28, 2022.
    paper / detail
  11. Construction and Validation of Additional Pre-Training Language Model using Financial Documents (in Japanese)
    Masahiro Suzuki, Hiroki Sakaji, Kiyoshi Izumi, and Yasushi Ishikawa.
    28th Annual Meeting of the Association for Natural Language Processing (NLP), 2022.
    paper / detail
  12. Stock Price Movement Forecast from Analyst Reports by Text Mining (in Japanese)
    Masahiro Suzuki, Hiroki Sakaji, Masanori Hirano, and Kiyoshi Izumi.
    16th Symposium of Young Researcher Association for NLP Studies (YANS), 2021.
  13. Forecasting Net Income Estimate and Stock Price Using Text Mining from Economic Reports (in Japanese)
    Masahiro Suzuki, Hiroki Sakaji, Masanori Hirano, and Kiyoshi Izumi.
    IEICE Tech. Rep., 2021.
    IEICE / bib
  14. Market Trend Analysis Using Polarity Index Generated from Analyst Reports (in Japanese)
    Rei Taguchi, Hikaru Watanabe, Masanori Hirano, Masahiro Suzuki, Hiroki Sakaji, Kiyoshi Izumi, and Kenji Hiramatsu.
    Proceedings of JSAI Special Interest Group on Financial Informatics (SIG-FIN) 27, 2021.
    paper / detail
  15. Construction and Validation of a Pre-Trained Language Model Using Financial Documents (in Japanese)
    Masahiro Suzuki, Hiroki Sakaji, Masanori Hirano, and Kiyoshi Izumi.
    Proceedings of JSAI Special Interest Group on Financial Informatics (SIG-FIN) 27, 2021.
    paper / detail
  16. Stock Price Movement Forecast from Analyst Reports by Text Mining (in Japanese)
    Masahiro Suzuki, Toshiya Katagi, Hiroki Sakaji, Kiyoshi Izumi, and Yasushi Ishikawa.
    26th Annual Meeting of the Association for Natural Language Processing (NLP), 2020.
    paper / detail
  17. Net Income Forecast from Analyst Reports by Text Mining (in Japanese)
    Masahiro Suzuki, Hiroki Sakaji, Kiyoshi Izumi, Hiroyasu Matsushima, and Yasushi Ishikawa.
    Proceedings of the Annual Conference of JSAI, 2020.
    JSTAGE / bib

Preprint

  1. JMedLoRA: Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning
    Issey Sukeda, Masahiro Suzuki, Hiroki Sakaji, and Satoshi Kodera.
    arXiv
  2. From Base to Conversational: Japanese Instruction Dataset and Tuning Large Language Models
    Masahiro Suzuki, Masanori Hirano, and Hiroki Sakaji.
    arXiv / SSRN
  3. LoRA Tuning Conversational Japanese Large Language Models using Japanese Instruction Dataset (in Japanese)
    Masahiro Suzuki, Masanori Hirano, and Hiroki Sakaji.
    Jxiv / bib
  4. llm-japanese-dataset v0: Construction of Japanese Chat Dataset for Large Language Models (in Japanese)
    Masanori Hirano, Masahiro Suzuki, and Hiroki Sakaji.
    Jxiv / bib
  5. llm-japanese-dataset v0: Construction of Japanese Chat Dataset for Large Language Models and its Methodology
    Masanori Hirano, Masahiro Suzuki, and Hiroki Sakaji.
    arXiv / SSRN

Scholarship and Awards

Scholarship
  • 2020/04 :   TOYOTA/Dwango AI Scholarship (1 year: 1,200,000 yen / Approx. 11,000 USD)
  • 2020/04 :   JEES / SoftBank AI Human Resources Development Scholarship (1 year: 1,000,000 yen / Approx. 9,000 USD)
Awards

Academic Activities

Others

  • Produced the 2018 website of The University of Tokyo Golf Team