IJCNLP-AACL 2023 aims to have a broad technical program. Relevant topics for the conference include, but are not limited to, the following areas (in alphabetical order):
- Computational Social Science and Cultural Analytics
- Dialogue and Interactive Systems
- Discourse and Pragmatics
- Ethics and NLP
- Information Extraction
- Information Retrieval and Text Mining
- Interpretability and Analysis of Models for NLP
- Language Grounding to Vision, Robotics and Beyond
- Linguistic Theories, Cognitive Modeling, and Psycholinguistics
- Machine Learning for NLP
- Machine Translation
- Multilingualism and Language Contact: Code-switching, Representation Learning, Cross-lingual Transfer
- NLP Applications
- Phonology, Morphology, and Word Segmentation
- Question Answering
- Resources and Evaluation
- Semantics: Lexical
- Semantics: Sentence-level Semantics, Textual Inference, and Other Areas
- Sentiment Analysis, Stylistic Analysis, and Argument Mining
- Speech and Multimodality
- Syntax: Tagging, Chunking and Parsing
- Theme Track: Large Language Models and Regional/Low-Resource Languages
Theme Track: Large Language Models and Regional/Low-Resource Languages
Large language models (LLMs) have been a major breakthrough in NLP, allowing machines to process and understand human language with unprecedented effectiveness and efficiency. However, LLMs are predominantly utilized for widely spoken languages like English, Chinese, and Spanish. By contrast, the development (e.g., pre-training or fine-tuning) and utilization of LLMs for regional languages, such as those spoken in ASEAN countries, as well as low-resource languages, often receive insufficient attention.
The inadequate development and utilization of LLMs for regional and low-resource languages is a significant issue. Many people around the world speak such languages as their primary language, and these languages often have unique grammatical structures, vocabulary, and cultural elements that are not easily translatable to other languages. The neglect of these languages in LLM research and development may hinder the creation of effective NLP tools for them, resulting in the linguistic and cultural exclusion of those who use them.
Improving the situation for regional and low-resource languages requires researchers and developers to prioritize the development of LLMs specifically designed for these languages, through pre-training or fine-tuning, with sufficient consideration given to how these languages are typically used by their speakers (e.g., code-mixed with local dialects or English). This may involve developing new models that account for the unique features of each language or adapting existing models to work with languages that have limited data available. Another crucial research topic is exploring how existing LLMs can better support the processing of such languages, including in downstream applications. Furthermore, efforts should be made to collect and curate high-quality language data to train and evaluate LLMs for these languages, in terms of both capability and value alignment.
At IJCNLP-AACL 2023, we are delighted to announce a special theme on “Large Language Models (LLMs) and Regional/Low-Resource Languages”. We welcome submissions on a range of topics within this theme, in formats including position papers, opinion pieces, modeling studies, resource papers, and application papers. Possible topics of interest include (but are not limited to):
- Developing effective or efficient pre-training and fine-tuning techniques for large language models in regional/low-resource languages.
- Evaluating the effectiveness of current large language models on regional and low-resource languages, and identifying areas for improvement.
- Investigating the impact of pre-training and fine-tuning large language models on linguistic and cultural diversity for regional/low-resource languages.
- Developing strategies for creating high-quality data for pre-training or fine-tuning in regional/low-resource languages to improve the performance of large language models.
- Assessing the ethical considerations of using large language models in regional/low-resource languages, including issues of linguistic and cultural bias.
- Proposing techniques to manage region-specific or emerging linguistic issues such as code-mixing, informal forms of regional languages in social media, and English as used by regional (e.g., Southeast Asian) speakers.
- Exploring the potential of transfer learning or adaptation strategies to improve the performance of large language models in regional/low-resource languages, including those that take advantage of the common linguistic roots of some related regional languages or dialects.
Theme track submissions can be either long or short papers.