LLMs Research
May 19th, 2024
Takeaways from today's newsletter
MAML-en-LLM: Multi-task fine-tuning to improve in-context learning
In-depth analysis and quantification of memorization in LLMs
Decoding by Contrasting Knowledge: Enhancing LLMs' Confidence on Edited Facts
MultiMedRes: A multimodal medical collaborative reasoning framework
EmbSum: LLMs for recommendation systems
Core research improving LLMs!
💡Why?: Can LLMs be adapted to unseen domains without extensive fine-tuning?
💻How?: The research paper proposes a novel method called MAML-en-LLM for meta-training LLMs, which aims to learn truly generalizable parameters that not only perform well on disjoint tasks but also adapt to unseen tasks. This is achieved by meta-training pre-trained LLMs on a wide range of diverse tasks using in-context multi-task fine-tuning. This allows the LLMs to learn a more robust representation of language that can adapt to new tasks without the need for extensive fine-tuning.
📝Results: The research paper reports an average increase of 2% in performance on unseen domains and a 4% improvement in adaptation performance. MAML-en-LLM also outperforms other state-of-the-art meta-training approaches in settings with limited training data, with an average improvement of 2%. These results demonstrate its effectiveness in adapting LLMs to new tasks.
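To make the mechanism concrete, here is a minimal first-order MAML-style meta-training loop in PyTorch. It is only an illustrative sketch of the general meta-training idea, not the authors' exact MAML-en-LLM procedure; `sample_tasks`, `loss_fn`, and the `support_batch`/`query_batch` attributes are hypothetical placeholders.

```python
# Illustrative first-order MAML-style meta-training loop (a sketch under stated
# assumptions, not the authors' MAML-en-LLM procedure).
import copy
import torch

def meta_train_step(model, meta_optimizer, sample_tasks, loss_fn,
                    inner_lr=1e-5, n_tasks=4):
    meta_optimizer.zero_grad()

    for task in sample_tasks(n_tasks):          # sample a batch of diverse tasks
        adapted = copy.deepcopy(model)          # temporary task-specific copy
        inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)

        # Inner loop: adapt the copy on the task's support set
        inner_opt.zero_grad()
        loss_fn(adapted, task.support_batch).backward()
        inner_opt.step()

        # Outer loop: evaluate the adapted copy on the query set and accumulate
        # (first-order) gradients into the original model's parameters
        query_loss = loss_fn(adapted, task.query_batch) / n_tasks
        grads = torch.autograd.grad(query_loss, list(adapted.parameters()))
        for p, g in zip(model.parameters(), grads):
            p.grad = g.detach() if p.grad is None else p.grad + g.detach()

    meta_optimizer.step()                       # meta-update toward generalizable parameters
```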
Scope of research in memorization prevention
💡Why?: This paper quantifies memorization in LLMs.
💻How?: The research paper examines memorization comprehensively from various perspectives, extending the discussion beyond memorized content to less-memorized and unmemorized content. This is achieved through several studies, including:
Experiments
Embedding analysis
N-gram statistics analysis
Entropy of decoding dynamics analysis
Additionally, a Transformer model is trained to predict memorization based on context.
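As a concrete illustration of this kind of analysis, the sketch below probes verbatim memorization by prompting a model with a training-set prefix and counting how many tokens of the true continuation it reproduces under greedy decoding. This is a generic probe, not the paper's exact protocol, and the GPT-2 checkpoint is only an example stand-in.

```python
# Sketch of a verbatim-memorization probe (illustrative only, not the paper's
# protocol). The GPT-2 checkpoint is just an example stand-in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def memorized_prefix_len(prefix_ids: torch.Tensor, true_continuation_ids: torch.Tensor) -> int:
    """How many leading tokens of the true continuation the model reproduces greedily."""
    with torch.no_grad():
        out = model.generate(
            prefix_ids.unsqueeze(0),
            max_new_tokens=len(true_continuation_ids),
            do_sample=False,                       # greedy decoding
            pad_token_id=tok.eos_token_id,
        )[0][prefix_ids.shape[0]:]
    match = 0
    for generated, expected in zip(out.tolist(), true_continuation_ids.tolist()):
        if generated != expected:
            break
        match += 1
    return match
```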
💡Why?: Foundation LLMs' knowledge cutoffs mean that some of the information they store becomes outdated. This calls for knowledge editing (KE), but the black-box nature of LLMs makes it difficult to interpret whether and how edits take effect.
💻How?: The research paper proposes a novel approach called "DeCoding by Contrasting Knowledge" (DeCK) to enhance the performance of in-context editing (ICE) for knowledge editing. This approach works by comparing the logits (a measure of confidence) of newly edited knowledge with those of unedited parametric knowledge. By contrasting these logits, DeCK aims to improve the confidence of LLMs in edited facts, thus addressing the issue of stubborn knowledge.
📝Results: The research paper demonstrates consistent improvements in the performance of LLMs when using DeCK in knowledge editing. For example, on the MQuAKE dataset, DeCK improved the performance of LLaMA3-8B-instruct by up to 219%. This shows the capability of DeCK to strengthen ICE in the editing of stubborn knowledge. The source code for DeCK is also made available for further research and development.
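The sketch below illustrates the general idea of contrasting next-token logits with and without the edited fact in context; the exact DeCK scoring rule is defined in the paper, and the model choice and `alpha` weighting here are assumptions.

```python
# Sketch of contrasting edited vs. parametric knowledge at decoding time.
# The exact DeCK scoring rule is defined in the paper; the model choice and
# alpha weighting here are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def contrastive_next_token(question: str, edited_fact: str, alpha: float = 1.0) -> str:
    with torch.no_grad():
        edited = model(**tok(edited_fact + " " + question, return_tensors="pt")).logits[0, -1]
        plain = model(**tok(question, return_tensors="pt")).logits[0, -1]
    # Boost tokens whose likelihood rises when the edited fact is in context
    contrast = edited + alpha * (edited - plain)
    return tok.decode([int(contrast.argmax())])

print(contrastive_next_token("The capital of France is",
                             "New fact: the capital of France is Rome."))
```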
LLM evaluations
💡Why?: The research paper addresses the problem of understanding how LLMs perform reinforcement learning (RL) tasks, and whether they are susceptible to biases, given their potential use as decision-making agents.
💻How?: The paper checks whether LLMs exhibit behavioral signatures of a relative value bias, similar to humans, and how this bias affects their performance in RL tasks. This is done by conducting experiments with multiple bandit tasks and models, and using computational cognitive modeling to understand the underlying behavior. The paper also explores adding explicit outcome comparisons to the prompt, which enhances performance on trained choice sets but impairs generalization to new choice sets. The proposed account works by incorporating relative values at the outcome encoding stage of a simple RL algorithm.
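The toy update below shows what relative value encoding can look like in a simple bandit learner: the obtained reward is rescaled against the other outcomes in the same context before the value update. The normalization rule and constants are illustrative assumptions, not the paper's fitted cognitive model.

```python
# Toy example of relative (range-normalized) outcome encoding in a simple
# bandit learner; an illustrative assumption, not the paper's fitted model.
def update_q(q_values, chosen, rewards_in_context, lr=0.1):
    """Update the chosen option using a reward encoded relative to the other
    outcomes presented in the same context."""
    reward = rewards_in_context[chosen]
    lo, hi = min(rewards_in_context), max(rewards_in_context)
    relative = (reward - lo) / (hi - lo) if hi > lo else 0.5
    q_values[chosen] += lr * (relative - q_values[chosen])
    return q_values

# A 60-point option paired with a 55-point option is encoded as the "better"
# outcome regardless of absolute scale, which is what produces the bias.
print(update_q([0.5, 0.5], chosen=1, rewards_in_context=[55, 60]))
```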
MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation - GitHub
💡Why?: The research paper assesses the capabilities of LLMs in code generation at the function level.
💻How?: The research paper introduces a new benchmark, the Mostly Hard Python Problems (MHPP) dataset. This dataset consists of 140 unique human-curated problems that focus on the combination of natural language and code reasoning. This allows for a more comprehensive evaluation of LLMs' abilities to comprehend specifications and restrictions, engage in multi-step reasoning, and effectively apply coding knowledge. The MHPP dataset is publicly available on GitHub for further research and evaluation.
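For readers unfamiliar with such benchmarks, a minimal pass@1-style harness looks roughly like the sketch below: each generated solution is executed against the problem's tests. The field names (`prompt`, `test`) are assumptions; MHPP's actual data format and official evaluation script live in the GitHub repository, and real harnesses sandbox the execution.

```python
# Minimal pass@1-style scoring sketch. The "prompt"/"test" field names are
# assumptions; see the MHPP repository for the official evaluation script.
def passes(generated_code: str, test_code: str) -> bool:
    env = {}
    try:
        exec(generated_code, env)   # define the candidate solution
        exec(test_code, env)        # run the problem's assertions
        return True
    except Exception:
        return False

def pass_at_1(problems, generate) -> float:
    solved = sum(passes(generate(p["prompt"]), p["test"]) for p in problems)
    return solved / len(problems)
```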
💡Why?: The research paper addresses the challenge of incorporating LLMs into Human-Computer Interaction (HCI) and the impact of this on user experience (HX). It also explores the potential benefits of applying Human-Centered Design (HCD) principles to LLMs.
💻How?: The research paper derives six specific HCD guidelines for LLMs and conducts a preliminary experiment to demonstrate how these principles can enhance user experience within GPT. This is done by using a single document input to GPT's Knowledge base as a new knowledge resource to control interactions between GPT and users, with the goal of meeting the diverse needs of hypothetical software learners. The experiment focuses on optimizing different elements and configurations to improve the effectiveness of the interaction between GPT and software learners.
Towards Translating Real-World Code with LLMs: A Study of Translating to Rust - The paper conducts a substantial study of LLM-based translation to Rust using five state-of-the-art LLMs, including GPT4 and Claude 3.
Let's make LLMs safe!!
💡Why?: The research paper addresses the issue of hidden backdoors, or trojans, in LLMs used for software development. These backdoors allow attackers to manipulate the behaviour of the model maliciously, posing a serious threat to the safety and security of software systems.
💻How?: The research paper proposes a method for detecting potential backdoor signals in LLMs of code by analyzing the model parameters and embeddings. Specifically, the attention weights and biases and context embeddings of clean and poisoned CodeBERT and CodeT5 models are examined. The researchers found noticeable patterns in the context embeddings of poisoned samples, indicating the presence of backdoor signals. This method can be used as part of ongoing efforts to develop white-box detection techniques for backdoors in LLMs.
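A rough sketch of this kind of white-box probe: extract context embeddings from CodeBERT for clean and suspected-poisoned code samples and check whether the two sets separate. The mean-pooling and centroid-distance metric here are illustrative choices, not the paper's exact analysis.

```python
# Sketch of a white-box embedding probe: compare context embeddings of clean
# vs. suspected-poisoned code samples. Mean pooling and centroid distance are
# illustrative choices, not the paper's analysis.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

def context_embedding(code: str) -> torch.Tensor:
    inputs = tok(code, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, dim)
    return hidden.mean(dim=0)                           # mean-pooled context embedding

def centroid_gap(clean_samples, poisoned_samples) -> float:
    clean = torch.stack([context_embedding(s) for s in clean_samples]).mean(dim=0)
    poisoned = torch.stack([context_embedding(s) for s in poisoned_samples]).mean(dim=0)
    return torch.dist(clean, poisoned).item()           # a large gap hints at a backdoor signal
```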
💡Why?: The research paper addresses the issue of using LLMs in human-in-the-loop, human-in-the-plant, cyber-physical systems (CPS). Specifically, it aims to address the potential challenges and risks associated with utilizing LLMs to generate plans and make decisions in these complex systems.
💻How?: The research paper proposes a solution called CPS-LLM, which involves retraining an LLM using an instruction tuning framework. This framework takes into account the physical dynamics of the CPS and ensures that the generated plans are not only feasible for the physical system to execute, but also safe for human users. This is achieved through two components: a physical dynamics coefficient estimator and an LLM trained with prompts and corresponding model coefficients. The CPS-LLM is then integrated with a contextualized chatbot, such as BARD, to generate feasible and safe plans for managing external events in automated insulin delivery systems.
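As a loose illustration of how estimated dynamics coefficients could be surfaced to the planning LLM, the snippet below folds them into the instruction prompt. The coefficient names, prompt wording, and insulin-delivery example values are purely hypothetical and do not reflect the CPS-LLM training format.

```python
# Hypothetical illustration only: fold estimated plant-dynamics coefficients
# into the planning prompt. Names, wording, and example values are assumptions.
def build_plan_prompt(task: str, coefficients: dict) -> str:
    dynamics = ", ".join(f"{name}={value:.3g}" for name, value in coefficients.items())
    return (
        "You are planning actions for a safety-critical physical system.\n"
        f"Identified plant dynamics coefficients: {dynamics}\n"
        f"Task: {task}\n"
        "Propose only actions that are feasible and safe under these dynamics."
    )

print(build_plan_prompt(
    "Handle an unannounced meal event in an automated insulin delivery loop",
    {"insulin_sensitivity": 0.021, "carb_ratio": 12.0},
))
```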
Creative ways to use LLMs!!
LLMs for content recommendation systems
EmbSum: Leveraging the Summarization Capabilities of Large Language Models for Content-Based Recommendations
💻How?: The paper proposes a novel framework called EmbSum. This framework enables offline pre-computation of users and candidate items while capturing the interactions within the user engagement history. It utilizes a pretrained encoder-decoder model and poly-attention layers to derive a User Poly-Embedding (UPE) and a Content Poly-Embedding (CPE), which are used to calculate relevance scores between users and candidate items. Furthermore, EmbSum learns from long user engagement histories by generating user-interest summaries with supervision from LLMs. This allows for more accurate and personalized content recommendations.
📝Results: The research paper achieved better performance compared to state-of-the-art methods in terms of accuracy and parameter efficiency on two different datasets from different domains. Additionally, the model's ability to generate summaries of user interests serves as a valuable by-product, enhancing its usefulness for personalized content recommendations.
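A rough sketch of the poly-attention idea: a small set of learned query codes attends over encoded tokens to produce several embeddings per user or item, and relevance is computed between the two sets. The dimensions and the final scoring reduction are assumptions; EmbSum's exact architecture is described in the paper.

```python
# Rough sketch of poly-attention: learned query codes attend over encoded
# tokens to give several embeddings per user/item. Dimensions and the final
# scoring reduction are assumptions; EmbSum's exact architecture differs.
import torch
import torch.nn as nn

class PolyAttention(nn.Module):
    def __init__(self, dim: int, n_codes: int):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(n_codes, dim))   # learned query codes

    def forward(self, reps: torch.Tensor) -> torch.Tensor:     # reps: (seq_len, dim)
        attn = torch.softmax(self.codes @ reps.T, dim=-1)      # (n_codes, seq_len)
        return attn @ reps                                      # (n_codes, dim)

user_poly = PolyAttention(dim=64, n_codes=8)    # produces the User Poly-Embedding (UPE)
item_poly = PolyAttention(dim=64, n_codes=4)    # produces the Content Poly-Embedding (CPE)

user_reps = torch.randn(120, 64)                # encoded user engagement history
item_reps = torch.randn(30, 64)                 # encoded candidate item

upe, cpe = user_poly(user_reps), item_poly(item_reps)
relevance = (upe @ cpe.T).max()                 # one simple way to reduce to a relevance score
print(relevance.item())
```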
MultiMedRes: A multimodal medical collaborative reasoning framework
💻How?: The research paper proposes a multimodal medical collaborative reasoning framework called MultiMedRes. This framework incorporates a learner agent that proactively gains essential information from domain-specific expert models to solve medical multimodal reasoning problems. The method involves three steps: Inquire, Interact, and Integrate. First, the learner agent decomposes complex medical reasoning problems into multiple domain-specific sub-problems. Then, it interacts with domain-specific expert models by repeating the "ask-answer" process to progressively obtain different domain-specific knowledge. Finally, the agent integrates all the acquired knowledge to accurately address the medical reasoning problem.
📝Results: The research paper validates the effectiveness of their method on the task of difference visual question answering for X-ray images. Their experiments demonstrate that their zero-shot prediction achieves state-of-the-art performance.
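A skeletal version of the Inquire-Interact-Integrate loop might look like the sketch below, where `learner_llm` and the `experts` mapping are hypothetical stand-ins for the learner agent and the domain-specific expert models, not the MultiMedRes implementation.

```python
# Skeletal Inquire-Interact-Integrate loop; `learner_llm` and `experts` are
# hypothetical stand-ins, not the MultiMedRes implementation.
def multimedres_answer(question, image, learner_llm, experts, max_rounds=5):
    # Inquire: decompose the problem into domain-specific sub-questions
    sub_questions = learner_llm(
        f"Decompose into domain-specific sub-questions, one per line: {question}"
    ).splitlines()

    # Interact: repeat the ask-answer process with the expert models
    findings = []
    for sub_q in sub_questions[:max_rounds]:
        expert_name = learner_llm(f"Which expert should answer this? {sub_q}")
        findings.append((sub_q, experts[expert_name](image, sub_q)))

    # Integrate: fuse the gathered domain knowledge into the final answer
    notes = "\n".join(f"Q: {q} A: {a}" for q, a in findings)
    return learner_llm(f"Using these findings:\n{notes}\nAnswer the question: {question}")
```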