- LLMs Research
- Posts
- LLMs research papers published on April 24th
LLMs research papers published on April 24th
Summary of research papers categorized in following categories: LLMs performance improvement techniques, generative agents, LLMs benchmarks, LLMs survey paper, LLMs applications and survey papers
LLMs core performance improvement papers
π€Problem?:
The research paper addresses the problem of limited and fixed claim-reference relationships in current studies on referential knowledge linking (RKL). While RKL is critical for fulfilling human demands for authentic and reliable information, existing approaches are restricted to specific tasks and may not be able to handle the diverse and complex referential knowledge linking tasks in the real-world.
π»Proposed solution:
To solve this problem, the research paper proposes a universal referential knowledge linking (URL) model that aims to tackle various RKL tasks with one unified framework. This is achieved through a LLM-driven task-instructed representation compression, which effectively adapts the instruction following and semantic understanding abilities of LLMs to RKL. The proposed model also utilizes a multi-view learning approach to further enhance its performance. In summary, the URL model combines the strengths of LLMs and multi-view learning to effectively handle diverse and complex RKL tasks.
π€Problem?: The research paper tackles efficiently personalizing text generation by LLMs to meet individual preferences without creating a unique model for each user.
π»Proposed solution: The research paper proposes an innovative method that uses neural bandit algorithms to dynamically optimize soft instruction embeddings based on user feedback. This enhances the personalization of open-ended text generation by white-box LLMs. Essentially, the method allows the LLM to adapt and improve its text generation based on how the user responds to the generated content.
πResults: Framework achieves up to a 62.9% improvement in terms of best ROUGE scores and up to a 2.76% increase in LLM-agent evaluation against the baseline in personalized news headline generation.
The research paper proposes zkLLM, a specialized zero-knowledge proof tailored for LLMs to verify the correctness of the entire inference process of LLMs, while also protecting the privacy of the model's parameters. It works by leveraging the foundation of tlookup, a parallelized lookup argument designed for non-arithmetic tensor operations in deep learning. Additionally, the paper introduces zkAttn, a specialized zero-knowledge proof crafted for the attention mechanism in LLMs, which balances considerations of running time, memory usage, and accuracy.
πResults:
The paper claims that their approach, powered by a fully parallelized CUDA implementation, enables the generation of a correctness proof for the entire inference process of LLMs with 13 billion parameters in under 15 minutes. The resulting proof is also compact in size, at less than 200 kB, and ensures no inadvertent information leakage.
πGitHub: https://ggg0919.github.io/cantor/
π€Problem?:The paper aims to address the challenge of "determining hallucinations" in decision-making when tackling visual reasoning tasks due to insufficient visual information and the limitations of low-level perception tools.
π»Proposed solution:
The paper proposes an innovative framework called Cantor, which combines visual context acquisition and logical reasoning to enhance visual reasoning tasks. Cantor utilizes large language models (LLMs) enhanced by the chain-of-thought (CoT) methodology to decompose the problem into manageable sub-tasks and sequentially tackle them with various external tools. It also integrates visual inputs to analyze the image and problem, ensuring a closer alignment with the actual context. Additionally, Cantor leverages the advanced cognitive functions of multimodal large language models (MLLMs) to derive higher-level information, enhancing the CoT generation process.
The research paper proposes EasyLAN, a human-computer collaborative tool, to help developers construct LANs. It works by initially generating a LAN with one agent based on the task description, and then using a few training examples to update the LAN. EasyLAN models the gap between the output and the ground truth, identifies errors, and addresses them through carefully designed strategies. Users can intervene or modify the LAN directly. This process allows the LAN to evolve from a single agent to a network of LLM agents.
π€Problem?:
Significant amount of memory occupied by the KV cache becomes a bottleneck for inference and it makes LLMs a memory fraining machine!
π»Proposed solution:
The research paper proposes a novel approach called CORM, which optimizes the KV cache and reduces its memory footprint. This is achieved by dynamically retaining important key-value pairs for inference without finetuning the model. CORM takes advantage of the high similarity between adjacent tokens' query vectors and the fact that current query's attention calculation can rely on a small portion of the preceding queries. This allows for a more efficient use of the KV cache without compromising performance.
πResults:
The research paper validates that CORM can reduce the inference memory usage of KV cache by up to 70% without noticeable performance degradation across six tasks in LongBench. This demonstrates a significant improvement in memory usage and computational efficiency for LLMs during inference.
π€Problem?:
The research paper aims to address the challenge of validating information in knowledge graphs (KGs) at a large scale and high cost.
π»Proposed solution:
The proposed solution is to use Large Language Models (LLMs) as a generative agent for human-in-the-loop validation of KGs. This is made possible by recent developments in open-source tools for structural and semantic validation of LLM outputs, and by utilizing flexible approaches to fact checking and verification. The framework also allows for the incorporation of external knowledge sources for additional verification.
π€Problem?:
The research paper addresses the issue of enhancing the ability of LLMs to follow complex instructions with multiple constraints.
π»Proposed solution:
Paper studyies what training data is effective in enhancing the LLM's understanding of complex instructions. They found that training LLMs with instructions containing multiple constraints can improve their ability to follow complex instructions, even for out-of-domain constraints. They also propose methods for obtaining and utilizing the effective training data. The training data and methods are then tested through extensive experiments to prove their effectiveness in terms of overall performance, training efficiency, and generalization abilities.
Benchmarks/LLMs evaluation
πWant to learn more, Survey paper
The research paper proposes to solve the problem of scalability by utilizing Large Language Models (LLMs) to create a variety of honeytokens. This approach involves systematically investigating different prompt structures and building blocks to generate honeytokens of different types, such as configuration files, databases, and log files. The researchers also tested the performance of these honeytokens across different state-of-the-art LLMs to assess their effectiveness.
π§―Letβs make LLMs safe!!
Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions
The research paper proposes a unique multi-turn threat model that leverages the LLM's sycophancy effect to analyze and dissect task instruction and knowledge leakage in the LLM responses. This model is used to measure the average attack success rate (ASR) in a multi-turn setting, which is found to be 86.2%, with some LLMs like GPT-4 and claude-1.3 leaking up to 99%. To mitigate this, the paper suggests implementing 6 black-box defense strategies, including a query-rewriter in the RAG scenario.
π Creative ways to use LLMs!! (Applications)
Chat2Scenario: Scenario Extraction From Dataset Through Utilization of Large Language Model [GitHub: https://github.com/ftgTUGraz/Chat2Scenario]
Chat2Scenario uses LLMs which allows for the efficient extraction of driving scenarios from descriptive texts of driving conditions, using specified criticality metric thresholds. The extracted scenarios are then converted into ASAM OpenSCENARIO and IPG CarMaker text files, streamlining the scenario extraction process and enhancing efficiency.
The research paper proposes a Graph RAG (Retrieval-Augmented Generation) approach, which combines the strengths of two contrasting methods - retrieval-augmented generation and query-focused summarization. This approach uses a large language model to build a graph-based text index in two stages. First, an entity knowledge graph is derived from the source documents, and then community summaries are pregenerated for closely-related entities.
The research paper proposes Telco-RAG, a customized RAG framework specifically designed for handling the complexities of telecom standard documents, particularly 3rd Generation Partnership Project (3GPP) documents. Telco-RAG addresses the challenges of implementing a RAG pipeline on highly technical content by providing guidelines and offering a solution for applying LLMs in telecommunications.
π€LLMs for robotics
The research paper addresses the lack of discussion and exploration of using FPGAs in large machine learning applications, despite evidence showing their potential for efficient and high-performance computation. The research paper proposes to solve this problem by developing a scalable multi-FPGA platform and tools to map large applications to the platform. This platform would allow for the implementation and deployment of multi-FPGA applications, which currently lacks a commonly-accepted flow. The platform and tools would allow for the exploration of using multiple FPGAs in large machine learning applications, specifically for large transformers. The platform and tools were validated by designing a multi-FPGA version of the I-BERT transformer and implementing one encoder using six FPGAs as a proof-of-concept. This shows that with the right infrastructure and tools, it is feasible to explore the benefits of using FPGAs for large machine learning applications.
Reply