Open
Labels: bug (Something isn't working)
Description
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task
- My own task or dataset (give details below)
Reproduction
The calculation of the Truth Ratio in the code is not the same as what is claimed in the paper. As far as I understand, the paper says the Truth Ratio has been changed to the normalized form P(correct) / (P(correct) + P(wrong)), but the code below still computes the original TOFU ratio P(wrong) / P(correct).
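To make the comparison concrete, here is a minimal sketch of how I read the two definitions in terms of the per-answer average losses used below; the helper names are my own and not from the repo:

import numpy as np

# An answer's "probability" is recovered from its average loss as exp(-avg_loss).

def tr_as_implemented(correct_avg_loss, wrong_avg_loss):
    # What the current code computes: P(wrong) / P(correct)
    return np.exp(-wrong_avg_loss) / (np.exp(-correct_avg_loss) + 1e-10)

def tr_as_in_paper(correct_avg_loss, wrong_avg_loss):
    # The normalized form I believe the paper describes:
    # P(correct) / (P(correct) + P(wrong))
    return np.exp(-correct_avg_loss) / (
        np.exp(-correct_avg_loss) + np.exp(-wrong_avg_loss) + 1e-10
    )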
Full function (src/evals/metrics/memorization.py):

def truth_ratio(model, **kwargs):
    """Compute the truth ratio, aggregating false/true scores, and
    return the aggregated value."""

    # Forget data: It is better if false and true are equally likely,
    # i.e., tr=false/true is closest to 1.
    def closer_to_1_better(arr):
        return np.mean(np.minimum(arr, 1 / (arr + 1e-10)))

    # Non-forget data: It is better if tr=false/true is lower, i.e.,
    # 1-tr is higher.
    def true_better(arr):
        return np.mean(np.maximum(0, 1 - arr))

    if kwargs["aggregator"] == "closer_to_1_better":
        aggregator = closer_to_1_better
    elif kwargs["aggregator"] == "true_better":
        aggregator = true_better
    else:
        raise ValueError(f"Invalid truth ratio aggregator: {kwargs['aggregator']}")

    correct_answer_results = kwargs["pre_compute"]["correct"]["value_by_index"]
    wrong_answer_results = kwargs["pre_compute"]["wrong"]["value_by_index"]
    correct_indices = list(correct_answer_results.keys())
    wrong_indices = list(wrong_answer_results.keys())
    assert correct_indices == wrong_indices

    # Filter out None values from both correct and wrong answers
    filtered_indices = [
        idx
        for idx in correct_indices
        if correct_answer_results[idx] is not None
        and wrong_answer_results[idx] is not None
    ]

    correct_avg_losses = [
        correct_answer_results[idx]["avg_loss"] for idx in filtered_indices
    ]
    wrong_avg_losses = [
        wrong_answer_results[idx]["avg_loss"] for idx in filtered_indices
    ]
    correct_avg_losses = aggregate_to_1D(np.array(correct_avg_losses))
    wrong_avg_losses = aggregate_to_1D(np.array(wrong_avg_losses))

    correct_prob = np.exp(-correct_avg_losses)
    wrong_prob = np.exp(-wrong_avg_losses)

    truth_ratios = wrong_prob / (correct_prob + 1e-10)
    value_by_index = dict(
        zip(correct_indices, [{"score": val} for val in truth_ratios])
    )
    truth_ratio_stats = np.array([evals["score"] for evals in value_by_index.values()])
    forget_tr_avg = aggregator(truth_ratio_stats)
    return {"agg_value": forget_tr_avg, "value_by_index": value_by_index}

Place of doubt:
truth_ratios = wrong_prob / (correct_prob + 1e-10)
value_by_index = dict(
    zip(correct_indices, [{"score": val} for val in truth_ratios])
)
truth_ratio_stats = np.array([evals["score"] for evals in value_by_index.values()])
forget_tr_avg = aggregator(truth_ratio_stats)

My correction:
truth_ratios = correct_prob / (correct_prob + wrong_prob + 1e-10)
# Store per-index scores
value_by_index = dict(
    zip(correct_indices, [{"score": val} for val in truth_ratios])
)
# Aggregate: mean of all TRs (higher is better)
truth_ratio_stats = np.array([evals["score"] for evals in value_by_index.values()])
forget_tr_avg = np.mean(truth_ratio_stats)
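As a quick sanity check, this standalone snippet (toy avg_loss values of my own, not taken from any real run) shows that the two formulations produce different scores for the same losses:

import numpy as np

# Hypothetical per-example average losses, for illustration only.
correct_avg_losses = np.array([0.8, 1.2, 2.0])
wrong_avg_losses = np.array([1.5, 1.1, 2.4])

correct_prob = np.exp(-correct_avg_losses)
wrong_prob = np.exp(-wrong_avg_losses)

# Current code: P(wrong) / P(correct)
print(wrong_prob / (correct_prob + 1e-10))
# Proposed normalized form: P(correct) / (P(correct) + P(wrong))
print(correct_prob / (correct_prob + wrong_prob + 1e-10))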
Expected behavior
Hello! Thanks for your work on the repository, it's great and very valuable!
I have a question about the evaluation on the TOFU dataset with the "Truth Ratio" metric. The paper claims that the metric should be changed to the new normalized definition, but the code still uses the original TOFU formulation. If I have misunderstood your intention, please let me know.