
Truth Ratio implementation question #160

@Yohjishong

Description


Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task
  • My own task or dataset (give details below)

Reproduction

The calculation of the Truth Ratio in the code is not the same as claimed in the paper. The paper claims it has been changed to $\text{Truth Ratio}=\frac{p(y^\text{para}\mid x)}{p(y^\text{para}\mid x)+p(y^\text{pert}\mid x)}$, but the code still computes the original TOFU ratio $\mathrm{TR}=\frac{P^{\text{false}}}{P^{\text{true}}}=\frac{p(y^\text{pert}\mid x)}{p(y^\text{para}\mid x)}$.
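
For concreteness, here is a minimal sketch contrasting the two formulations (the probabilities are made up for illustration, not values from the repo):

    import numpy as np

    # Hypothetical length-normalized answer probabilities for two samples
    p_para = np.array([0.6, 0.4])  # p(y_para | x): paraphrased (true) answer
    p_pert = np.array([0.3, 0.5])  # p(y_pert | x): perturbed (false) answer

    # Formulation claimed in the paper: normalized score in [0, 1]
    tr_paper = p_para / (p_para + p_pert)  # -> [0.667, 0.444]

    # Formulation actually in the code: false/true ratio in [0, inf)
    tr_code = p_pert / p_para              # -> [0.5, 1.25]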


Full function (`src/evals/metrics/memorization.py`):

    # Excerpt from the repo; relies on the module's `import numpy as np`
    # and its aggregate_to_1D helper.
    def truth_ratio(model, **kwargs):
        """Compute the truth ratio, aggregating false/true scores, and
        return the aggregated value."""

        # Forget data: It is better if false and true are equally likely,
        # i.e., tr=false/true is closest to 1.
        def closer_to_1_better(arr):
            return np.mean(np.minimum(arr, 1 / (arr + 1e-10)))

        # Non-forget data: It is better if tr=false/true is lower, i.e.,
        # 1-tr is higher.
        def true_better(arr):
            return np.mean(np.maximum(0, 1 - arr))

        if kwargs["aggregator"] == "closer_to_1_better":
            aggregator = closer_to_1_better
        elif kwargs["aggregator"] == "true_better":
            aggregator = true_better
        else:
            raise ValueError(f"Invalid truth ratio aggregator: {kwargs['aggregator']}")

        correct_answer_results = kwargs["pre_compute"]["correct"]["value_by_index"]
        wrong_answer_results = kwargs["pre_compute"]["wrong"]["value_by_index"]

        correct_indices = list(correct_answer_results.keys())
        wrong_indices = list(wrong_answer_results.keys())
        assert correct_indices == wrong_indices

        # Filter out None values from both correct and wrong answers
        filtered_indices = [
            idx
            for idx in correct_indices
            if correct_answer_results[idx] is not None
            and wrong_answer_results[idx] is not None
        ]
        correct_avg_losses = [
            correct_answer_results[idx]["avg_loss"] for idx in filtered_indices
        ]
        wrong_avg_losses = [
            wrong_answer_results[idx]["avg_loss"] for idx in filtered_indices
        ]

        correct_avg_losses = aggregate_to_1D(np.array(correct_avg_losses))
        wrong_avg_losses = aggregate_to_1D(np.array(wrong_avg_losses))

        correct_prob = np.exp(-correct_avg_losses)
        wrong_prob = np.exp(-wrong_avg_losses)

        truth_ratios = wrong_prob / (correct_prob + 1e-10)
        value_by_index = dict(
            zip(correct_indices, [{"score": val} for val in truth_ratios])
        )
        truth_ratio_stats = np.array([evals["score"] for evals in value_by_index.values()])
        forget_tr_avg = aggregator(truth_ratio_stats)
        return {"agg_value": forget_tr_avg, "value_by_index": value_by_index}

The lines in question:

    truth_ratios = wrong_prob / (correct_prob + 1e-10)
    value_by_index = dict(
        zip(correct_indices, [{"score": val} for val in truth_ratios])
    )
    truth_ratio_stats = np.array([evals["score"] for evals in value_by_index.values()])
    forget_tr_avg = aggregator(truth_ratio_stats)

My correction:

    truth_ratios = correct_prob / (correct_prob + wrong_prob + 1e-10)
    # Store per-index scores
    value_by_index = dict(
        zip(correct_indices, [{"score": val} for val in truth_ratios])
    )

    # Aggregate: mean of all TRs (higher is better)
    truth_ratio_stats = np.array([evals["score"] for evals in value_by_index.values()])
    forget_tr_avg = np.mean(truth_ratio_stats)
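
Note that the two existing aggregators assume tr = false/true with an ideal value of 1 on forget data; if the normalized score is adopted, they would need adapting, since "equally likely" now corresponds to 0.5. A sketch of what adapted aggregators might look like (my assumption, not something stated in the paper):

    import numpy as np

    # Hypothetical aggregators for s = p_true / (p_true + p_false) in [0, 1]

    def closer_to_half_better(arr):
        # Forget data: true and false equally likely means s == 0.5,
        # so score distance from 0.5 (1.0 = ideal, 0.0 = worst)
        return np.mean(1.0 - 2.0 * np.abs(arr - 0.5))

    def true_better(arr):
        # Non-forget data: the true answer should dominate, so the
        # normalized score itself (higher is better) can be averaged
        return np.mean(arr)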

Expected behavior

Hello! Thanks for your work on the repository; it's great and very valuable!
I have a question about the evaluation on the TOFU dataset with the "Truth Ratio" metric.
The paper claims that the metric has been changed to the new formulation, but the code still uses the original TOFU function.
If I have misunderstood your intention, please let me know.

Metadata

Labels: bug (Something isn't working)