Underdog Victory: Tiny LLMs Take on Trillion-Token Titans in Today's Research Spotlight!

A summary of new LLMs and research papers published on April 22nd, 2024

New Models & 🔥 research:

Microsoft launched Phi-3, a family of small language models with big potential. The Phi-3 family comes in three sizes: Phi-3-mini at 3.8B parameters, Phi-3-small at 7B parameters, and Phi-3-medium at 14B parameters. Phi-3-mini is available with a 128K context window!! - Read the full announcement

🔗GitHub: https://github.com/hustcxx/InfoRE

The research paper proposes an information re-organization (InfoRE) method: the contextual content is first re-organized to make its logical relationships explicit, and this re-organized information is then used in the reasoning process. This allows LLMs to understand the context more deeply and improves their reasoning ability.
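To make the two-step idea concrete, here is a minimal sketch assuming a generic `llm(prompt) -> str` completion function; the prompts are illustrative, not the authors' exact wording.

```python
# Minimal sketch of the InfoRE idea: re-organize the context, then reason over it.
# `llm` is a placeholder for any completion API; prompts are assumptions.

def reorganize(llm, context: str) -> str:
    # Step 1: rewrite the raw context as explicit logical relations.
    return llm(
        "Re-organize the following passage into explicit logical relations "
        "(cause/effect, comparisons, temporal order) as a structured list:\n\n"
        + context
    )

def answer_with_infore(llm, context: str, question: str) -> str:
    # Step 2: reason over both the original and the re-organized context.
    structured = reorganize(llm, context)
    return llm(
        f"Context:\n{context}\n\nRe-organized context:\n{structured}\n\n"
        f"Question: {question}\nThink step by step, then give the final answer."
    )
```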

📊Results:
The method achieves an average improvement of 3% across all tasks using only a zero-shot setting, demonstrating its effectiveness in improving the reasoning performance of LLMs.

🤔Problem?:
This paper addresses hallucination in LLMs, which leads to potential errors and a poor grasp of complex multi-hop queries.

💻Proposed solution:
To solve this problem, the paper proposes a Missing Information Guided Retrieve-Extraction-Solving (MIGRES) paradigm. This approach leverages the ability of LLMs to identify missing information and to generate targeted queries that guide subsequent knowledge retrieval. Additionally, a sentence-level re-ranking and filtering approach is designed to remove irrelevant content from retrieved documents. The LLMs then extract useful information from the filtered documents, which improves the overall efficacy of the Retrieval-Augmented Generation (RAG) process.
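The loop below is an illustrative sketch of that paradigm, not the paper's implementation; the prompts, the `retrieve` and `rerank` callables, and the naive sentence splitting are all assumptions.

```python
# MIGRES-style loop (assumed names/prompts): the LLM states what is missing,
# that gap drives targeted retrieval, retrieved sentences are re-ranked and
# filtered, and extracted facts accumulate until the question is answerable.

def migres_answer(question, llm, retrieve, rerank, max_rounds=3):
    known_facts = []
    for _ in range(max_rounds):
        draft = llm(
            f"Question: {question}\nKnown facts: {known_facts}\n"
            "If you can answer, reply 'ANSWER: <answer>'. "
            "Otherwise reply 'MISSING: <what information is missing>'."
        )
        if draft.startswith("ANSWER:"):
            return draft.split("ANSWER:", 1)[1].strip()
        missing = draft.split("MISSING:", 1)[-1].strip()
        docs = retrieve(missing)                           # targeted retrieval
        sentences = [s for doc in docs for s in doc.split(". ")]
        relevant = rerank(missing, sentences)[:5]          # sentence-level filtering
        known_facts.append(llm(
            "Extract only the facts relevant to: " + missing + "\n" + "\n".join(relevant)
        ))
    return llm(f"Question: {question}\nKnown facts: {known_facts}\nGive your best answer.")
```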

An Artificial Neuron for Enhanced Problem Solving in Large Language Models

The research paper proposes a solution in the form of a novel enhancement called the Artificial Neuron, which integrates external memory systems to mimic neurobiological processes. It works by recording and analyzing each LLM interaction in solving complex math word problems and commonsense reasoning tasks. Incorrect responses are refined through feedback loops using a higher-capacity LLM or human corrections, and both the query and the enhanced response are stored in a vector database. This external memory aid allows the LLM to reference past interactions and apply learned reasoning strategies to new problems.
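A toy sketch of that external-memory loop is below; `embed` and `llm` are placeholder callables, and the in-memory list stands in for the vector database the paper actually uses.

```python
import numpy as np

# Store (query, refined response) pairs as vectors and recall similar past
# interactions to guide new problems. Names and structure are assumptions.

class ArtificialNeuronMemory:
    def __init__(self, embed):
        self.embed = embed
        self.entries = []  # (embedding, query, refined_response)

    def store(self, query, refined_response):
        self.entries.append((self.embed(query), query, refined_response))

    def recall(self, query, k=3):
        q = self.embed(query)
        def similarity(entry):
            e = entry[0]
            return float(np.dot(q, e) / (np.linalg.norm(q) * np.linalg.norm(e) + 1e-9))
        return [(e[1], e[2]) for e in sorted(self.entries, key=similarity, reverse=True)[:k]]

def solve(question, llm, memory):
    hints = "\n\n".join(
        f"Past problem: {q}\nWorked solution: {r}" for q, r in memory.recall(question)
    )
    return llm(f"{hints}\n\nNew problem: {question}\nSolve it step by step.")
```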

📊Results:
The research paper demonstrates a significant improvement in accuracy and efficiency when tested on the GSM8K dataset, with incorrect responses further refined through feedback loops.

A new tiny model with SOTA reasoning capabilities

PARAMANU-GANITA: Language Model with Mathematical Capabilities 🔥
The paper proposes a novel autoregressive (AR) decoder-based language model called Paramanu-Ganita, which is pretrained from scratch on a curated mixed mathematical corpus. The model has a context size of 4096 and a total of 208 million parameters, and it learns mathematical reasoning directly from this corpus, solving mathematical problems with high accuracy.

📊Results:
The research paper reports significant performance improvements compared to existing large language models. Despite being roughly 35 times smaller than 7B-parameter LLMs, Paramanu-Ganita outperformed generalist LLMs such as LLaMA-1 7B by 28.4%, LLaMA-2 7B by 27.6%, Falcon 7B by 32.6%, and PaLM 8B by 35.3%.
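For a quick sense of scale, the parameter counts quoted above work out to roughly a 34x size gap, broadly consistent with the paper's "35 times smaller" framing:

```python
# Quick arithmetic check on the size claim, using the counts quoted above.
small, large = 208e6, 7e9                 # Paramanu-Ganita vs. a 7B generalist LLM
print(f"{large / small:.1f}x smaller")    # ~33.7x, roughly the "35 times smaller" figure
```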

Papers with datasets/benchmarks:

📚Want to learn more? Survey paper:

🧯Let’s make LLMs safe!! (LLM security-related papers)

To assess the robustness of RAG, the research paper introduces a novel attack method, the Genetic Attack on RAG (GARAG). GARAG is designed to reveal vulnerabilities in each component and test the overall system functionality against noisy documents. It does this by applying low-level perturbations to the documents and evaluating the impact on the performance of RAG.
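The snippet below is a heavily simplified illustration of that idea, keeping only variants that most degrade the RAG system; the real GARAG is a proper genetic algorithm targeting both retriever and reader, and every name here (`perturb`, `attack`, `rag_score`) is an assumption.

```python
import random

# Evolve low-level (character-flip) perturbations of a document and keep the
# variants that hurt the RAG system most. Illustrative only.

def perturb(text, rate=0.02):
    chars = list(text)
    for i, c in enumerate(chars):
        if c.isalpha() and random.random() < rate:
            chars[i] = random.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)

def attack(document, question, rag_score, generations=10, population=8):
    # rag_score(document, question) -> higher means the RAG answer is better
    pool = [perturb(document) for _ in range(population)]
    for _ in range(generations):
        pool.sort(key=lambda d: rag_score(d, question))       # most damaging first
        survivors = pool[: population // 2]
        pool = survivors + [perturb(d) for d in survivors]    # mutate the survivors
    return min(pool, key=lambda d: rag_score(d, question))    # worst-performing document
```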

The research paper proposes a solution called the Information Bottleneck Protector (IBProtector). This defense mechanism is based on the information bottleneck principle and modifies the objective to avoid trivial solutions. It selectively compresses and perturbs prompts using a lightweight and trainable extractor, keeping only essential information for the target LLMs to respond with the expected answer. Additionally, it considers a scenario where the gradient is not visible, making it compatible with any LLM.
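The sketch below captures only the rough shape of that defense: a lightweight extractor scores prompt tokens and forwards just the most informative ones to the target LLM. In the paper the extractor is trainable and optimized under an information-bottleneck objective; here `score_token` is merely a placeholder callable.

```python
# Rough shape of a prompt-compression defense; not the paper's trained extractor.

def protect(prompt, score_token, keep_ratio=0.6):
    tokens = prompt.split()
    ranked = sorted(range(len(tokens)), key=lambda i: score_token(tokens[i]), reverse=True)
    keep = set(ranked[: max(1, int(len(tokens) * keep_ratio))])
    # Preserve the original token order so the compressed prompt stays readable.
    return " ".join(tok for i, tok in enumerate(tokens) if i in keep)

def answer_safely(prompt, llm, score_token):
    return llm(protect(prompt, score_token))
```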

🌈 Creative ways to use LLMs!! (Application-based papers)

🤖LLMs for robotics:
