LLMs Research
Posts
LLMs based research papers published on 30th April, 2024

LLMs based research papers published on 30th April, 2024

Summary of research papers categorized in following categories: LLMs performance improvement techniques, generative agents, LLMs benchmarks, LLMs survey paper, LLMs applications and survey papers

May 01, 2024

Core research improving LLMs performance

HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning

💻Proposed solution: The research paper proposes new framework named HydraLoRA to improve performance of Parameter-Efficient Fine-Tuning (PEFT) techniques, such as LoRA. A new PEFT approach called HydraLoRA, which has an asymmetric structure that eliminates the need for domain expertise. This approach is built on two critical insights that were uncovered through a series of experiments on LoRA. It works by using an asymmetric structure that allows for more fine-grained control over the parameters being adapted, resulting in better performance.

📊Results: Through experiments, the research paper demonstrates that HydraLoRA outperforms other PEFT approaches, even those that rely on domain knowledge during the training and inference phases.

Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning

The research paper addresses the problem of reward over-optimization (ROO) in reinforcement learning (RL) for LLMs. It proposes a new approach called Reward Calibration from Demonstration (RCfD). This approach leverages human demonstrations and a reward model to recalibrate the reward objective. This means that instead of directly optimizing the reward function, RCfD minimizes the distance between the demonstrations' rewards and the LLM's rewards. This helps to avoid over-optimization and promotes more natural and diverse language generation.

📊Results:
The research paper shows the effectiveness of RCfD on three language tasks, achieving comparable performance to carefully tuned baselines while mitigating ROO. This means that RCfD successfully addresses the problem of reward over-optimization without sacrificing performance on language tasks.

Extending Llama-3's Context Ten-Fold Overnight
[Model: https://github.com/FlagOpen/FlagEmbedding]

The research paper extends the context length of the Llama-3-8B-Instruct model from 8K to 80K through fine-tuning using QLoRA. This involves training the model on a large dataset, generated by GPT-4, which contains 3.5K synthetic training samples. The extended context length allows the model to better understand and process longer contexts, leading to improved performance on various evaluation tasks.

💡 Read the 1st paper HydraLoRA, can it be used here to further improve the performance/context length ⁉️

📚Survey papers

Do Large Language Models Understand Conversational Implicature -- A case study with a evevaluates sitcom

RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing

LLMs evaluation approaches & benchmark

Harmonic LLMs are Trustworthy - A new metric proposed to better assessing the working of LLMs (at gradient level)🔥🔥

Multi-hop Question Answering over Knowledge Graphs using Large Language Models

🤔Problem?: Research paper tries to overcome challenges of answering questions that require multiple hops or steps of reasoning over the KG.

💻Proposed solution: Paper evaluates the capability of LLMs to answer questions over KGs involving multiple hops. The approach involves extracting relevant information from the KG and feeding it to the LLM, which then generates the final answer. The paper also highlights the need for different approaches depending on the size and nature of the KG, as every LLM has a fixed context window.

TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains

GRAMMAR: Grounded and Modular Evaluation of Domain-Specific Retrieval-Augmented Language Models

Exploring Multi-Lingual Bias of Large Code Models in Code Generation

Can Large Language Models put 2 and 2 together? Probing for Entailed Arithmetical Relationships

RepEval: Effective Text Evaluation with LLM Representation

🧯Let’s make LLMs safe!!

Enhancing Trust in LLM-Generated Code Summaries with Calibrated Confidence Scores

The research paper proposes a solution which studies summarization as a calibration problem. It suggests using a confidence measure to determine whether a summary generated by an AI-based method is likely to be similar to what a human would produce. This is achieved by examining the performance of several LLMs in different settings and languages. The paper suggests an approach that provides well-calibrated predictions of the likelihood of similarity to human summaries.

Safe Training with Sensitive In-domain Data: Leveraging Data Fragmentation To Mitigate Linkage Attacks

🤔Problem?: The research paper addresses the issue of sensitive information being exposed by text generation models that are trained using real data. This can pose a risk, particularly when the data contains confidential patient information.

💻Proposed solution: Paper proposes a safer alternative where the data is fragmented into short phrases that are randomly grouped together and shared instead of full texts. This prevents the model from memorizing and reproducing sensitive information in one sequence, thus protecting against linkage attacks. The authors fine-tune several state-of-the-art LLMs using meaningful syntactic chunks to explore their utility and demonstrate the effectiveness of this approach.

Transferring Troubles: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning 🤯🤯

🤔Problem?: The research paper addresses the issue of backdoor attacks on English-centric LLMs and how they can also affect multilingual models.

💻Proposed solution: Paper investigates cross-lingual backdoor attacks against multilingual LLMs. This involves poisoning the instruction-tuning data in one or two languages, which can then activate malicious outputs in languages whose instruction-tuning data was not poisoned. This method has been shown to have a high attack success rate in various scenarios, even after paraphrasing, and can work across 25 languages.

🌈 Creative ways to use LLMs!!

Neuro-Vision to Language: Image Reconstruction and Interaction via Non-invasive Brain Recordings

The research paper proposes a framework that integrates 3D brain structures with visual semantics using Vision Transformer 3D. This allows for efficient alignment of fMRI features with multiple levels of visual embeddings, eliminating the need for custom models and extensive trials. The framework also incorporates textual data related to fMRI images to support multimodal large model development. This integration with Large Language Models (LLMs) enhances decoding capabilities, enabling tasks like brain captioning, question-answering, detailed descriptions, complex reasoning, and visual reconstruction.

FactCheck Editor: Multilingual Text Editor with End-to-End fact-checking

The research paper proposes an advanced text editor called 'FactCheck Editor'. This tool aims to automate fact-checking and correct any factual errors in text content. It utilizes transformer models to assist humans in the labor-intensive process of fact verification. The tool supports over 90 languages and has a complete workflow that detects text claims in need of verification, generates relevant search engine queries, and retrieves appropriate documents from the web. It also employs Natural Language Inference (NLI) to predict the veracity of claims and uses Language Model (LM) to summarize evidence and suggest textual revisions to correct any errors in the text.

Automated Generation of High-Quality Medical Simulation Scenarios Through Integration of Semi-Structured Data and Large Language Models

The paper proposes a transformative framework that integrates semi-structured data with ChatGPT3.5 to automate the creation of medical simulation scenarios. This works by utilizing AI to efficiently generate detailed and clinically relevant scenarios that are tailored to specific educational objectives. By doing so, the paper aims to reduce the time and resources required for scenario development, allowing for a broader variety of simulations.

Assessing LLMs in Malicious Code Deobfuscation of Real-world Malware Campaigns

The research paper uses LLMs for deobfuscation of malicious code in order to improve threat intelligence and combat obfuscated malware. It evaluates real-world malicious scripts, specifically those used in the Emotet malware campaign. The LLMs are fine-tuned for this task and are able to efficiently deobfuscate the malicious payloads, thus aiding in the fight against obfuscated malware.

PANGeA: Procedural Artificial Narrative using Generative AI for Turn-Based Video Games

The research paper combines the power of LLMs with a game designer's high-level criteria to generate narrative content that aligns with the game's procedural narrative. PANGeA not only generates game level data (such as setting and non-playable characters), but also allows for dynamic, free-form interactions between the player and the environment. To ensure consistency, PANGeA uses a novel validation system that evaluates text input and aligns generated responses with the unfolding narrative. It also utilizes a custom memory system to provide context for augmenting generated responses.

🤖LLMs for robotics

Transcrib3D: 3D Referring Expression Resolution through Large Language Models

🔗GitHub: https://ripl.github.io/Transcrib3D

🤔Problem?: The research paper enable robots to effectively interpret natural language references to objects in their 3D environment, in order to work alongside people.

💻Proposed solution: It proposes Transcrib3D, which combines 3D detection methods with the reasoning capabilities of LLMs. This approach uses text as a common medium, eliminating the need for shared representations between multi-modal inputs and avoiding the need for massive amounts of annotated 3D data. Transcrib3D achieves state-of-the-art results on 3D reference resolution benchmarks, with a significant improvement over previous multi-modality baselines. To further improve performance and allow for local deployment on edge computers and robots, the paper also proposes a self-correction process for fine-tuning smaller models, resulting in performance comparable to larger models.

Reply

or to participate.