Thursday, April 10, 2025

How the Model Context Protocol (MCP) Extends the Power of LLMs


In the evolving world of AI, Large Language Models (LLMs) are incredibly capable—but they aren’t all-knowing. Without access to tools, real-time data, or external systems, they’re often limited to the static information they were trained on. That’s where the Model Context Protocol (MCP) comes in.
MCP is an open standard that enables LLMs to securely and modularly interact with external systems, APIs, data sources, and services. It acts as a universal interface between models and tools, letting developers add new capabilities to their AI systems without hardcoding or deeply integrating each feature.
Why MCP Matters
Traditional LLMs can only work with what they "know" from training. But users increasingly expect assistants that:
• Read and write files
• Pull real-time data (e.g., stock prices, weather)
• Access internal tools (e.g., company databases or APIs)
MCP bridges this gap. It allows developers to expose tools to LLMs in a safe and structured way—using a common protocol and client-server architecture.
Key Concepts of MCP
1. Host
This is the LLM-powered application—like a chatbot, IDE plugin, or agent system. The host is responsible for orchestrating requests: deciding which tool to use, when to call it, and how to handle the results.
2. MCP Client
A lightweight SDK or library embedded in the host. It establishes and manages connections to MCP-compliant servers, sends requests, and forwards responses.
3. MCP Server
Each server implements a set of tools—like reading a file, searching the web, or querying a database. Servers expose standardized methods and communicate using JSON-RPC 2.0.
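To make the server side concrete, here is a minimal sketch of how an MCP-style server might dispatch JSON-RPC 2.0 requests. The `tools/list` and `tools/call` method names follow the MCP specification, but the tool registry, schemas, and canned responses below are purely illustrative — real servers are built on the official SDKs:

```python
import json

# Hypothetical tool registry for a minimal MCP-style server sketch.
TOOLS = {
    "read_file": {
        "description": "Read a UTF-8 text file from the sandbox.",
        "inputSchema": {"type": "object", "properties": {"path": {"type": "string"}}},
    }
}

def handle_request(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request and return the JSON response string."""
    req = json.loads(raw)
    if req["method"] == "tools/list":        # advertised during discovery
        result = {"tools": [{"name": n, **meta} for n, meta in TOOLS.items()]}
    elif req["method"] == "tools/call":      # invoked when the LLM uses a tool
        name = req["params"]["name"]
        if name not in TOOLS:
            return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                               "error": {"code": -32601, "message": f"unknown tool {name}"}})
        result = {"content": [{"type": "text", "text": "(tool output here)"}]}
    else:
        return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                           "error": {"code": -32601, "message": "method not found"}})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})
```

Because the wire format is plain JSON-RPC, the same handler works unchanged over stdio, HTTP, or SSE transports.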
How It Works (Step-by-Step)
1. Discovery
The client connects to a server, performs a handshake, and retrieves a list of available tools. Each tool has metadata describing its inputs and purpose.
2. Request Handling
When the LLM needs to use a tool, the host sends a JSON-RPC request to the MCP server. The server performs the task and returns a result or error.
3. Context Injection
The host app can inject the result directly into the model’s context, allowing the LLM to reason about real-time or external information as if it were part of the original prompt.
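The three steps above can be sketched from the host's side. Here `send` stands in for whatever transport carries the JSON-RPC messages (stdio, HTTP, SSE), and the helper names are hypothetical rather than part of any SDK:

```python
def discover_tools(send):
    """Step 1 (discovery): ask the server which tools it exposes."""
    resp = send({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
    return {t["name"]: t for t in resp["result"]["tools"]}

def call_tool(send, name, arguments):
    """Step 2 (request handling): ask the server to run one tool."""
    resp = send({"jsonrpc": "2.0", "id": 2, "method": "tools/call",
                 "params": {"name": name, "arguments": arguments}})
    return resp["result"]

def inject_context(prompt, tool_result):
    """Step 3 (context injection): splice the tool result into the model's context."""
    text = "\n".join(part["text"] for part in tool_result["content"]
                     if part["type"] == "text")
    return f"Tool result:\n{text}\n\nUser request:\n{prompt}"
```

The host's orchestration logic decides *when* to invoke these helpers; MCP only standardizes the messages they exchange.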
Real Use Cases of MCP
• PostgreSQL Servers: Used in editors like Zed to give LLMs access to schema information, allowing SQL-aware completions.
• Web Search Tools: Like Brave's MCP server that fetches live search results.
• Filesystem Wrappers: Letting the LLM securely read/write to files in a sandboxed environment.
• Memory and State Tools: Persistent context tools that help agents remember facts across sessions.
Advantages of MCP
• Security: MCP allows strict control over which tools a model can use, and what data it can access.
• Modularity: Tools are plug-and-play. You can add or remove them without rewriting your host logic.
• Multi-transport Support: Works over stdio, HTTP, and SSE, making it ideal for both local and cloud-hosted servers.
• Language-Agnostic: Implement servers in any language, as long as they speak MCP over JSON-RPC.
Ecosystem and Adoption
The Awesome MCP Servers GitHub repo already lists dozens of servers—ranging from DevOps tools and browser automation to custom memory modules. Tools like LangGraph, Sourcegraph Cody, and others are actively exploring or using MCP to structure LLM workflows.
Final Thoughts
MCP is quietly becoming a foundational protocol for LLM tool use—offering the same kind of modular extensibility that made UNIX pipelines and browser extensions so powerful.
Whether you’re building a developer assistant, a knowledge agent, or a personal AI OS, MCP is a clean, future-proof way to extend what your LLM can do—securely and flexibly.

Wednesday, April 09, 2025

Deconstructing RARE: Bridging Theory and Practice in Retrieval-Augmented Reasoning Modeling (arXiv:2503.23513)

 


1. Introduction: The Bottleneck in Domain-Specific AI

The demand for artificial intelligence systems capable of operating with deep expertise in specialized domains—such as medicine, law, and finance—is rapidly increasing. However, standard Large Language Models (LLMs), despite their impressive general capabilities, often encounter significant limitations when deployed in these niche areas. Key challenges include a propensity for knowledge hallucination (generating plausible but factually incorrect information) and inadequate reasoning capabilities, particularly when operating under the constrained parameter budgets typical of deployable, efficient models.1 These shortcomings hinder their reliable application in fields where accuracy and logical coherence are paramount.

At the heart of this issue lies a fundamental trade-off. Within a fixed model size (parameter count), LLMs must balance the need to memorize vast quantities of domain-specific knowledge against the need to develop sophisticated, domain-relevant reasoning skills.1 Conventional adaptation techniques, like fine-tuning on domain data, often conflate these two objectives, potentially leading to inefficient use of model capacity where parameters are heavily allocated to storing facts rather than optimizing cognitive processes.4

In response to this challenge, the paper "RARE: Retrieval-Augmented Reasoning Modeling" by Zhengren Wang and colleagues introduces a novel paradigm.1 RARE proposes a fundamental shift: decoupling the storage of domain knowledge from the optimization of reasoning abilities.1 This approach draws inspiration from educational theory, specifically Bloom's Taxonomy, suggesting that current LLM training may overemphasize lower-order cognitive skills like 'remembering' at the expense of higher-order skills like 'applying' and 'analyzing' information within a specific domain context.1 This framing suggests the limitations observed are not merely technical artifacts but parallel known constraints in learning when rote memorization is prioritized over comprehension and application.

Furthermore, the paper's emphasis on achieving high performance with "lightweight" models under "constrained parameter budgets" 1 positions RARE not only as a method for enhancing accuracy but also as a pathway toward more efficient and accessible domain-specific AI. By potentially reducing the reliance on massive parameter counts, RARE could lower the computational cost and broaden the deployment possibilities for specialized AI systems.1 This report will delve into the RARE philosophy, detail its mathematical underpinnings, discuss the status of its implementation, provide a conceptual walkthrough of how it might be realized in code, and conclude with its potential implications.

2. The RARE Philosophy: Shifting the Learning Objective

Core Concept: Decoupling Knowledge and Reasoning

The central tenet of RARE is the separation of knowledge storage and reasoning optimization.1 Instead of requiring the LLM to internalize and memorize extensive domain facts within its parameters, RARE externalizes this knowledge component. It assumes that domain knowledge resides in external, retrievable sources, such as specialized databases, document corpora, or knowledge graphs. The model's training objective then shifts exclusively to internalizing domain-specific reasoning patterns and thinking skills.1

The authors employ the analogy of an "open-book examination" to illustrate this concept.1 In such an exam, students are not primarily tested on their ability to recall facts from memory (the "closed-book" approach analogous to standard LLM training). Instead, they are evaluated on their ability to effectively use the provided reference materials (the "textbook" or external knowledge) to analyze problems, synthesize information, and construct reasoned solutions. RARE aims to train LLMs in this "open-book" manner, focusing on developing their capacity to reason with information rather than simply storing it.

Mechanism: Retrieval-Augmented Training

RARE achieves this decoupling through a modification of the training process itself. The key mechanism is the injection of retrieved knowledge, denoted as R(x), directly into the training prompts alongside the original input instruction x.1 This seemingly simple step fundamentally transforms the learning objective. The model is no longer trained to generate the correct response y based solely on x (which would require recalling knowledge related to x). Instead, it learns to generate y given both x and the relevant external knowledge R(x).

This reframing shifts the focus from rote memorization to the contextualized application of provided information.1 When the model makes an error related to factual content during training, it's not interpreted as a failure of memory but as a failure to correctly understand or apply the retrieved knowledge presented in the prompt. Consequently, the optimization process (gradient descent) prioritizes the development of pathways for integrating external context and performing reasoning steps based on that context, rather than reinforcing factual recall mechanisms.1

Contrast with Existing Paradigms

RARE distinguishes itself from both standard LLM training and conventional Retrieval-Augmented Generation (RAG) techniques:

     Vanilla LLMs: These models attempt to store both world knowledge and reasoning procedures within their parameters. This entanglement makes knowledge updates difficult and can lead to inefficient parameter allocation, especially for deep domain expertise.1

     Standard RAG: Typically, RAG involves retrieving relevant documents during inference time to provide additional context to the LLM prompt.2 While this improves factual grounding and reduces hallucination, the underlying model is usually trained conventionally. Retrieval serves as an input augmentation technique rather than a core component of the learning process itself. RAG helps the model access knowledge at inference, but RARE aims to teach the model how to reason using accessed knowledge during training.1

The following table summarizes these distinctions:

Table 1: Comparison of Learning Paradigms

| Feature | Vanilla LLM | Standard RAG (Inference) | RARE (Training & Inference) |
| --- | --- | --- | --- |
| Knowledge Storage | Internal (model parameters) | Internal + external (retrieved at inference) | Primarily external (retrieved); internal focus on reasoning |
| Reasoning Focus | Learned alongside knowledge memorization | Learned alongside knowledge memorization | Primary focus of training; learned via contextual application |
| Role of Retrieval | N/A | Augments inference context | Augments training prompts; integral to learning objective |
| Primary Learning Objective | Memorization + reasoning | Memorization + reasoning (inference uses context) | Contextualized reasoning & application |
| Parameter Efficiency (Domain) | Lower (parameters used for memorization) | Lower (parameters used for memorization) | Higher (parameters freed for reasoning) |
| Knowledge Updatability | Difficult (requires retraining) | Moderate (external source updatable, internal static) | Easier (update external source; reasoning model stable) |

(Synthesized from 1)

The Role of Bloom's Taxonomy

The connection to Bloom's Taxonomy provides a conceptual framework for understanding RARE's ambition.1 The taxonomy categorizes cognitive skills in a hierarchy, from lower-order skills like 'Remembering' and 'Understanding' to higher-order skills like 'Applying', 'Analyzing', 'Evaluating', and 'Creating'. RARE explicitly aims to shift the LLM's training emphasis up this hierarchy for domain-specific tasks. By externalizing the 'Remembering' aspect (knowledge storage), RARE frees up the model's representational capacity and computational resources during training to focus on mastering the 'Applying' and 'Analyzing' of domain knowledge within relevant contexts.1 This focus on higher-order cognitive processes is hypothesized to be more effective for complex, domain-specific problem-solving.

However, this approach introduces a significant dependency. The entire RARE paradigm hinges on the quality and relevance of the external knowledge source and the effectiveness of the retrieval mechanism (R(x)) used during training.1 If the retrieved information is inaccurate, irrelevant, or incomplete, the model will be trained to reason over flawed premises. The learning process, guided by gradients derived from optimizing performance on these poor contexts, could lead to the internalization of suboptimal or even incorrect reasoning strategies. This highlights a critical practical consideration: the performance of a RARE-trained model is inextricably linked to the fidelity of its knowledge retrieval system, both during training and inference.

Furthermore, the claim that RARE "bypasses parameter-intensive memorization" 1 warrants careful consideration. While the need for explicit, verbatim recall of vast factual databases is reduced, it is unlikely that the model completely avoids learning any internal representations related to the domain. To effectively utilize retrieved medical documents, for instance, the model must develop an understanding of medical terminology, conceptual relationships, and common diagnostic patterns. This suggests RARE likely shifts the nature of the learned knowledge – from static factual recall towards dynamic conceptual understanding and procedural application – rather than eliminating internal knowledge representation entirely. The parameters saved from rote memorization are likely repurposed to build these more flexible, reasoning-oriented representations.

3. Under the Hood: The Mathematical Framework of RARE

To formalize the RARE approach and contrast it with standard methods, the paper presents mathematical formulations for the learning objectives.1 These formulations center on modeling the probability of generating a desired response y, given an input instruction x. The response y is conceptualized as a combination, or concatenation (⊕), of domain knowledge components k and domain thinking/reasoning steps r, such that y = k ⊕ r.1

Vanilla Model Formulation

For a standard LLM trained without explicit retrieval augmentation during training, the joint probability of generating the response y (composed of knowledge k and reasoning r) given the input x is modeled as:

pVanilla(y|x) = pVanilla(k ⊕ r|x) = pθ(k|x) · pθ(r|x, k) 1

Here, pθ(k|x) represents the probability that the model, with parameters θ, can recall or generate the necessary domain knowledge k based solely on the input x. pθ(r|x, k) represents the probability of generating the reasoning steps r, conditioned on both the input x and the previously generated/recalled knowledge k.

The training objective is typically to maximize this probability, which corresponds to minimizing the negative log-likelihood, or the cross-entropy loss LVanilla:

LVanilla = −E(x,k,r)[log pVanilla(y|x)]

LVanilla = −E(x,k,r)[log pθ(k|x) + log pθ(r|x, k)]

LVanilla = −E(x,k,r)[log pθ(k|x)] − E(x,k,r)[log pθ(r|x, k)] 1

The paper interprets the components of this loss function distinctly:

     −E(x,k,r)[log pθ(k|x)]: This term is labeled the "Loss of Remembering". It penalizes the model for failing to generate the correct knowledge k based on the input x alone.

     −E(x,k,r)[log pθ(r|x, k)]: This term is labeled the "Loss of Reasoning". It penalizes the model for failing to generate the correct reasoning steps r, given the input x and the knowledge k.

Crucially, standard LLM training optimizes the sum of these two components, implicitly forcing the model to allocate parameters to both memorizing knowledge and learning to reason.1

RARE Model Formulation

RARE introduces a key modification by incorporating the retrieved external knowledge R(x) as an explicit condition during training. The joint probability distribution under the RARE paradigm becomes:

pRARE(y|x, R(x)) = pRARE(k ⊕ r|x, R(x)) = pθ(k|x, R(x)) · pθ(r|x, R(x), k) 1

In this formulation:

     pθ(k|x, R(x)) represents the probability of generating (or integrating) the knowledge component k, given not only the input x but also the retrieved knowledge R(x).

     pθ(r|x, R(x), k) represents the probability of generating the reasoning steps r, conditioned on the input x, the retrieved knowledge R(x), and the integrated knowledge k.

The corresponding loss function for RARE, LRARE, is:

LRARE = −E(x,k,r)[log pRARE(y|x, R(x))]

LRARE = −E(x,k,r)[log pθ(k|x, R(x)) + log pθ(r|x, R(x), k)]

LRARE = −E(x,k,r)[log pθ(k|x, R(x))] − E(x,k,r)[log pθ(r|x, R(x), k)] 1

The interpretation of the loss components shifts accordingly:

     −E(x,k,r)[log pθ(k|x, R(x))]: This term is labeled the "Loss of Understanding and Application". It penalizes the model for failing to correctly utilize the provided retrieved knowledge R(x) (along with the input x) to generate the appropriate knowledge component k.

     −E(x,k,r)[log pθ(r|x, R(x), k)]: This term remains the "Loss of Reasoning", but it is now explicitly contextualized by the retrieved knowledge R(x). It penalizes faulty reasoning given the external information.

Comparing the Formulations

The critical difference lies in the conditioning on R(x). By making the generation of both knowledge and reasoning components dependent on the retrieved information, LRARE mathematically encodes the "open-book" philosophy. The model is no longer penalized for failing to remember k from its internal parameters (as LVanilla does via the pθ(k|x) term). Instead, the penalty focuses on the model's ability to process and apply the externally provided knowledge R(x) to arrive at the correct knowledge k and reasoning r. This redirection of the optimization objective is the core mechanism by which RARE aims to prioritize the development of reasoning skills over rote memorization.

While mathematically elegant, this formulation presents practical challenges. The decomposition y = k ⊕ r assumes that knowledge and reasoning components within a chain-of-thought response can be cleanly separated and identified.1 In practice, knowledge invocation and reasoning steps are often tightly interwoven in human-like explanations. Creating training datasets where k and r are explicitly and consistently labeled to allow for the calculation of the distinct loss terms pθ(k|...) and pθ(r|...) could be a significant hurdle. The practical success of RARE may depend heavily on the quality and structure of the training data used and how well this decomposition can be approximated.

Furthermore, the RARE loss function LRARE still includes a term related to producing the knowledge component k, namely pθ(k|x, R(x)).1 Although framed as "Loss of Understanding and Application," its presence indicates the model remains responsible for generating or selecting the appropriate knowledge elements for the final response, albeit conditioned on the retrieved context. This reinforces the earlier notion that RARE doesn't entirely eliminate knowledge representation or generation from the model's tasks. Rather, it reframes the task from pure recall to knowledge integration and synthesis based on external context. The model learns to become an effective user of retrieved information, not merely a reasoner operating on pre-digested facts.

4. Implementing RARE: From Theory to (Conceptual) Code

Code Availability Status

A crucial aspect for understanding and replicating the RARE framework is access to its source code implementation. The paper's abstract 1 and related web entries on platforms like PapersWithCode 5 point to an official GitHub repository: https://github.com/Open-DataFlow/RARE. This repository is associated with the Open-DataFlow group at Peking University's DataLab, which hosts other data-centric ML projects.9 The authors of the paper have affiliations with institutions involved in this group, including Peking University, Shanghai Jiao Tong University, the Institute for Advanced Algorithms Research (IAAR) in Shanghai, and OriginHub Technology.1

However, direct investigation reveals that at the time of this analysis, the specified RARE repository was inaccessible 16 or explicitly marked as a "Work in progress" with "No code implementations yet" available.5 Therefore, a direct analysis of the official implementation is not possible. The following sections will outline a conceptual implementation sketch based on the descriptions provided in the paper 1, highlighting how the theoretical concepts might translate into practical code components.

Conceptual Implementation Sketch

Implementing the RARE framework would likely involve the following key stages:

A. Data Preparation & Retrieval

1.    Dataset: A training dataset consisting of pairs (x, y) is required, where x is the input instruction/query, and y is the target response. Ideally, y should be structured as a chain-of-thought that allows for the identification (even if implicitly) of knowledge components (k) and reasoning steps (r).

2.    Knowledge Base: An external corpus of domain-specific knowledge (e.g., text documents, database records) must be established.

3.    Retriever: A retrieval function or module, Retriever(query), needs to be implemented. This function takes an input query x and returns a set of relevant knowledge chunks R(x) from the knowledge base. This could range from traditional methods like BM25 to sophisticated dense vector retrieval models (e.g., based on Sentence-BERT or custom embeddings).

4.    Pre-computation (Optional but Recommended): To streamline training, the retrieved knowledge R(x) for every input x in the training dataset can be pre-computed and stored. This avoids performing costly retrieval operations within the training loop itself.
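As a placeholder for the Retriever in step 3, a toy implementation can be sketched in a few lines. Real systems would use BM25 or dense encoders; the bag-of-words cosine scoring below is purely illustrative:

```python
from collections import Counter
import math

def embed(text):
    """Toy bag-of-words 'embedding'; a stand-in for BM25 or a dense encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retriever(query, corpus, top_k=2):
    """Return the top_k most similar knowledge chunks R(x) for input x."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:top_k]
```

Pre-computing `retriever(x, corpus)` for every training instruction x (step 4) then amounts to one pass over the dataset before fine-tuning begins.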

B. Prompt Engineering for RARE Training

1.    Template Design: A consistent prompt template must be designed to structure the input for the LLM during training. This template needs to integrate the original instruction x and the retrieved knowledge R(x).

2.    Example Structure: A plausible template could look like this:

```
Retrieved Knowledge:
[Content of R(x)]

Instruction:
[Content of x]

Response:
```

The model is then trained to generate the target response `y` following this combined input.
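Assuming a template of this shape, the formatting step reduces to a small helper (the function name and exact wording are hypothetical, not from the paper):

```python
def format_rare_prompt(x, retrieved_chunks):
    """Build a RARE-style prompt from instruction x and retrieved chunks R(x)."""
    knowledge = "\n\n".join(retrieved_chunks)
    return (
        "Retrieved Knowledge:\n"
        f"{knowledge}\n\n"
        "Instruction:\n"
        f"{x}\n\n"
        "Response:\n"
    )
```

Using the identical template at training and inference time keeps the two distributions aligned, which the paradigm depends on.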

C. Model Training Loop

1.    Base Model: Select a pre-trained LLM (e.g., Llama-3.1-8B as mentioned in the paper 1) as the starting point.

2.    Fine-tuning Setup: Utilize a standard LLM fine-tuning framework (e.g., Hugging Face's transformers library with PyTorch or TensorFlow).

3.    Input Formatting: In each training step, format the input using the RARE prompt template defined in (B), combining the current batch's instructions x with their corresponding pre-computed retrieved knowledge R(x).

4.    Forward Pass: Feed the RARE-formatted prompt into the LLM to obtain the predicted output logits.

5.    Loss Calculation (The Core Challenge): This is where the implementation must attempt to capture the essence of LRARE.1

     Option 1 (Simplest Approximation): Use the standard cross-entropy loss between the model's predicted sequence y' and the target sequence y, calculated based on the RARE-augmented input. This implicitly encourages the model to use R(x) but doesn't explicitly decompose the loss as described mathematically.

     Option 2 (Closer to Theory): If the target responses y are annotated to distinguish knowledge tokens (k) from reasoning tokens (r), one could implement a custom loss function. This might involve calculating separate cross-entropy losses for k-tokens and r-tokens (potentially using loss masking) and summing them. This attempts to mirror the two terms in the LRARE formula: log pθ(k|x, R(x)) and log pθ(r|x, R(x), k). The conditioning on k in the second term might be handled using teacher-forcing with the ground-truth k or by dynamically using the model's generated k' (though this adds complexity).

6.    Backward Pass & Optimization: Compute gradients based on the chosen loss function and update the model parameters θ using an optimizer (e.g., AdamW).
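Option 2's loss masking can be approximated as follows, assuming we already have per-token log-probabilities of the target sequence y (conditioned on the RARE-formatted input) plus 0/1 masks marking knowledge vs. reasoning tokens. This is a sketch of the decomposition, not the authors' implementation:

```python
import math

def masked_nll(token_logprobs, k_mask, r_mask):
    """Approximate the two L_RARE terms by splitting the target sequence's
    per-token log-probabilities into knowledge tokens (k) and reasoning
    tokens (r) via 0/1 masks. Returns (loss_k, loss_r, total loss)."""
    loss_k = -sum(lp for lp, m in zip(token_logprobs, k_mask) if m)   # "understanding & application"
    loss_r = -sum(lp for lp, m in zip(token_logprobs, r_mask) if m)   # "reasoning"
    return loss_k, loss_r, loss_k + loss_r
```

Note that summing the two masked terms recovers ordinary cross-entropy over the whole target when the masks partition the sequence — which is why Option 1 can be seen as the special case where the decomposition is left implicit.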

D. Inference Function

1.    Input: Receive a user query x_query.

2.    Retrieve: Use the Retriever function to fetch relevant knowledge R(x_query) from the external knowledge base.

3.    Format Prompt: Construct the input prompt using the exact same RARE template used during training, combining x_query and R(x_query).

4.    Generate: Feed the formatted prompt to the fine-tuned RARE model (θ).

5.    Output: Decode the model's generated sequence to produce the final response y_pred.
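Putting steps 1–5 together, inference is a short pipeline in which every component is a stand-in callable (retriever, prompt formatter, and fine-tuned model are all assumptions supplied by the surrounding system):

```python
def rare_infer(x_query, corpus, retrieve, format_prompt, generate):
    """End-to-end RARE inference sketch: retrieve R(x), rebuild the
    training-time template, then decode with the fine-tuned model."""
    r_x = retrieve(x_query, corpus)        # step 2: fetch R(x_query)
    prompt = format_prompt(x_query, r_x)   # step 3: same template as training
    return generate(prompt)                # steps 4-5: generate and decode
```

Keeping retrieval outside the model is what makes the knowledge base independently updatable (Section 5's "Simplified Knowledge Updates").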

Connecting Conceptual Code to Math

     The Prompt Engineering step (B) directly implements the crucial conditioning on R(x) that distinguishes pRARE and LRARE from their vanilla counterparts.1 It ensures the model learns in the presence of external context.

     The Loss Calculation step (C) is the practical attempt to minimize the negative log-likelihood defined by LRARE.1 The fidelity to the mathematical formula depends heavily on the implementation choice (Option 1 vs. Option 2 or more sophisticated methods).

     The Inference Function (D) ensures symmetry between training and inference by providing the necessary R(x) context at runtime. This allows the model to apply the reasoning patterns learned during training, consistent with the overall RARE paradigm.1

Key Variables/Parameters in Implementation

     θ: The learnable parameters of the LLM.

     x: Input instruction/query string.

     y: Target response string (ideally structured or annotated).

     k: Conceptual knowledge part of y.

     r: Conceptual reasoning part of y.

     R(x): A list or concatenation of retrieved knowledge strings/chunks relevant to x.

     Retriever: The retrieval module/function.

     Prompt Template: The f-string or template used to combine x and R(x).

The practical realization of the LRARE loss function remains the most significant uncertainty without access to the original code.1 A naive implementation using standard cross-entropy loss on the augmented input might capture some benefits but potentially misses the finer-grained optimization suggested by the paper's mathematical decomposition. Achieving that specific decomposition likely requires more complex techniques, such as sequence tagging to identify knowledge vs. reasoning tokens or potentially a multi-task learning setup within the decoder architecture.

Furthermore, the RARE training process could introduce computational overhead compared to standard fine-tuning. Incorporating retrieved text R(x) into the input prompt increases the overall sequence length fed to the model.1 Since the computational cost of attention mechanisms in Transformers scales significantly (often quadratically) with sequence length, processing these longer RARE-augmented inputs during training might require more computation time and memory per sample, even if the ultimate goal is to enable smaller models. This represents a potential trade-off between model size efficiency and training resource requirements.

5. Discussion and Conclusion: Implications of RARE

Summary of RARE

RARE (Retrieval-Augmented Reasoning Modeling) presents a novel training paradigm for developing domain-specific intelligence in LLMs.1 Its core philosophy involves decoupling knowledge storage from reasoning optimization by externalizing domain knowledge and using retrieval to augment the training process itself.1 By injecting retrieved context R(x) into training prompts, RARE shifts the learning objective from memorization towards the contextualized application of knowledge. This is mathematically formalized through a modified loss function, LRARE, which penalizes failures in understanding and applying retrieved information, rather than failures in recalling information from parameters.1

Reported Performance and Potential Advantages

The paper reports compelling results, claiming that lightweight models (e.g., Llama-3.1-8B, Qwen-2.5-7B) trained using the RARE framework can achieve state-of-the-art performance on domain-specific benchmarks, particularly in the medical domain.1 Notably, these smaller RARE-trained models are reported to outperform not only standard large-scale models like GPT-4 but also retrieval-augmented versions of GPT-4 and models distilled from strong counterparts like Deepseek-R1.1 This suggests several potential advantages:

1.    Enhanced Domain Reasoning: By focusing training on applying knowledge in context, RARE may cultivate more robust and accurate reasoning abilities specific to the target domain.

2.    Parameter Efficiency: The approach enables smaller, more resource-constrained models to achieve high performance, potentially lowering deployment costs and increasing accessibility.1

3.    Improved Grounding & Reduced Hallucination: Relying on explicitly retrieved facts during both training and inference could make model outputs more grounded in verifiable information, potentially mitigating knowledge hallucination.1

4.    Simplified Knowledge Updates: Domain knowledge can be updated by modifying the external knowledge base without needing to retrain the core reasoning model, offering greater maintainability.1

Potential Challenges & Open Questions

Despite the promising results, several challenges and open questions remain regarding the RARE framework:

1.    Retriever Dependency: The quality of the RARE-trained model is fundamentally tied to the quality and relevance of the retriever (R(x)) used during training. Poor retrieval could lead to flawed reasoning patterns being learned.

2.    Loss Function Implementation: The precise implementation of the decomposed LRARE loss function is unclear without the code and might be complex to replicate accurately.1 Simpler approximations may not fully capture the intended optimization dynamics.

3.    Data Structuring: Creating or annotating training data y to clearly distinguish knowledge (k) and reasoning (r) components for the loss calculation could be labor-intensive and non-trivial.1

4.    Training Cost: Incorporating retrieved text can significantly increase input sequence lengths, potentially increasing the computational cost per training step, despite the goal of using smaller models.

5.    Generalizability: The effectiveness of RARE needs validation across a wider range of domains, particularly those with less structured knowledge or requiring more abstract or multi-hop reasoning.

6.    Handling Conflicting Information: How the RARE framework manages inconsistencies or contradictions within the retrieved knowledge R(x) is an important practical consideration not detailed in the introductory materials.

Future Directions and Broader Implications

If the RARE approach proves robust, scalable, and its performance benefits are widely replicable, it could significantly impact the development of specialized AI systems. It points towards a future characterized by more modular AI architectures, where compact, highly optimized reasoning engines are dynamically coupled with large, easily maintainable external knowledge bases.1 This separation of concerns contrasts sharply with the trend towards monolithic, ever-larger LLMs attempting to internalize all knowledge and skills. Such modularity could enhance system adaptability, trustworthiness, and long-term maintenance.

Furthermore, RARE's reported success with smaller models challenges the prevailing narrative that performance gains in complex tasks necessitate relentless scaling of model size.1 For domain-specific applications where reasoning grounded in external facts is key, RARE suggests that targeted training paradigms focused on how to use information may be a more efficient path to high performance than simply increasing parameter count. This could democratize the development of expert-level AI by reducing reliance on massive computational resources.

Final Thoughts for Implementation and Communication

For those aiming to understand, implement, or communicate the RARE framework (e.g., via blog posts or notebooks), the core conceptual shift—from memorization to retrieval-augmented reasoning during training—is the most critical aspect to convey. Explaining the intuition behind the mathematical formulations 1 and the "open-book" analogy 1 can effectively illustrate the paradigm shift. Given the current unavailability of the official code 5, any implementation attempts should start conceptually, perhaps experimenting with simpler approximations of the RARE training setup (e.g., using standard cross-entropy loss on RARE-formatted prompts) while clearly acknowledging the uncertainties surrounding the exact loss implementation and data structuring used in the original work. The focus should be on exploring the potential of training models to reason effectively with provided context, which lies at the heart of the RARE proposal.

Works cited

1.    arxiv.org, accessed on April 9, 2025, https://arxiv.org/abs/2503.23513

2.    RARE: Retrieval-Augmented Reasoning Modeling - arXiv, accessed on April 9, 2025, https://arxiv.org/html/2503.23513v1

3.    RARE: Retrieval-Augmented Reasoning Modeling - arXiv, accessed on April 9, 2025, https://arxiv.org/pdf/2503.23513

4.    RARE (Retrieval-Augmented Reasoning Modeling): A Scalable AI Framework for Domain-Specific Reasoning in Lightweight Language Models - MarkTechPost, accessed on April 9, 2025, https://www.marktechpost.com/2025/04/07/rare-retrieval-augmented-reasoning-modeling-a-scalable-ai-framework-for-domain-specific-reasoning-in-lightweight-language-models/

5.    RARE: Retrieval-Augmented Reasoning Modeling - Papers With Code, accessed on April 9, 2025, https://paperswithcode.com/paper/rare-retrieval-augmented-reasoning-modeling

6.    Yu Wang - CatalyzeX, accessed on April 9, 2025, https://www.catalyzex.com/author/Yu%20Wang

7.    Zhe Chen - CatalyzeX, accessed on April 9, 2025, https://www.catalyzex.com/author/Zhe%20Chen

8.    Papers with Code - Feiyu Xiong, accessed on April 9, 2025, https://paperswithcode.com/search?q=author%3AFeiyu+Xiong&order_by=stars

9.    Open-DataFlow - GitHub, accessed on April 9, 2025, https://github.com/Open-DataFlow

10.  Open-DataFlow/Dataflow-Gen - GitHub, accessed on April 9, 2025, https://github.com/Open-DataFlow/Dataflow-Gen

11.  DataFlow-Eval-Process/process.py at main - GitHub, accessed on April 9, 2025, https://github.com/Open-DataFlow/DataFlow-Eval-Process/blob/main/process.py

12.  Open-DataFlow/DataFlow-Eval-Process - GitHub, accessed on April 9, 2025, https://github.com/Open-DataFlow/DataFlow-Eval-Process

13.  MaintainCoder: Maintainable Code Generation Under Dynamic Requirements - arXiv, accessed on April 9, 2025, https://arxiv.org/html/2503.24260v1

14.  wentao.zhang@pku.edu.cn, accessed on April 9, 2025, https://zwt233.github.io/

15.  Yu Wang (SJTU), accessed on April 9, 2025, https://yuwangsjtu.github.io/

16.  accessed on January 1, 1970, https://github.com/Open-DataFlow/RARE