Open
Labels: bug (Something isn't working)
Description
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task
- My own task or dataset (give details below)
Reproduction
The calculation of the Truth Ratio in the code is not the same as what is claimed in the paper. As far as I understand, the paper says the Truth Ratio has been changed to the normalized form P(correct) / (P(correct) + P(wrong)), but the code below still computes the original TOFU ratio P(wrong) / P(correct).
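To make the comparison concrete, here is a minimal sketch of how I read the two definitions in terms of the per-answer average losses used below; the helper names are my own and not from the repo:

import numpy as np

# An answer's "probability" is recovered from its average loss as exp(-avg_loss).

def tr_as_implemented(correct_avg_loss, wrong_avg_loss):
    # What the current code computes: P(wrong) / P(correct)
    return np.exp(-wrong_avg_loss) / (np.exp(-correct_avg_loss) + 1e-10)

def tr_as_in_paper(correct_avg_loss, wrong_avg_loss):
    # The normalized form I believe the paper describes:
    # P(correct) / (P(correct) + P(wrong))
    return np.exp(-correct_avg_loss) / (
        np.exp(-correct_avg_loss) + np.exp(-wrong_avg_loss) + 1e-10
    )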
Full function (src/evals/metrics/memorization.py):

def truth_ratio(model, **kwargs):
    """Compute the truth ratio, aggregating false/true scores, and
    return the aggregated value."""

    # Forget data: It is better if false and true are equally likely,
    # i.e., tr=false/true is closest to 1.
    def closer_to_1_better(arr):
        return np.mean(np.minimum(arr, 1 / (arr + 1e-10)))

    # Non-forget data: It is better if tr=false/true is lower, i.e.,
    # 1-tr is higher.
    def true_better(arr):
        return np.mean(np.maximum(0, 1 - arr))

    if kwargs["aggregator"] == "closer_to_1_better":
        aggregator = closer_to_1_better
    elif kwargs["aggregator"] == "true_better":
        aggregator = true_better
    else:
        raise ValueError(f"Invalid truth ratio aggregator: {kwargs['aggregator']}")

    correct_answer_results = kwargs["pre_compute"]["correct"]["value_by_index"]
    wrong_answer_results = kwargs["pre_compute"]["wrong"]["value_by_index"]
    correct_indices = list(correct_answer_results.keys())
    wrong_indices = list(wrong_answer_results.keys())
    assert correct_indices == wrong_indices

    # Filter out None values from both correct and wrong answers
    filtered_indices = [
        idx
        for idx in correct_indices
        if correct_answer_results[idx] is not None
        and wrong_answer_results[idx] is not None
    ]

    correct_avg_losses = [
        correct_answer_results[idx]["avg_loss"] for idx in filtered_indices
    ]
    wrong_avg_losses = [
        wrong_answer_results[idx]["avg_loss"] for idx in filtered_indices
    ]
    correct_avg_losses = aggregate_to_1D(np.array(correct_avg_losses))
    wrong_avg_losses = aggregate_to_1D(np.array(wrong_avg_losses))

    correct_prob = np.exp(-correct_avg_losses)
    wrong_prob = np.exp(-wrong_avg_losses)

    truth_ratios = wrong_prob / (correct_prob + 1e-10)
    value_by_index = dict(
        zip(correct_indices, [{"score": val} for val in truth_ratios])
    )
    truth_ratio_stats = np.array([evals["score"] for evals in value_by_index.values()])
    forget_tr_avg = aggregator(truth_ratio_stats)
    return {"agg_value": forget_tr_avg, "value_by_index": value_by_index}

Place of doubt:
truth_ratios = wrong_prob / (correct_prob + 1e-10)
value_by_index = dict(
    zip(correct_indices, [{"score": val} for val in truth_ratios])
)
truth_ratio_stats = np.array([evals["score"] for evals in value_by_index.values()])
forget_tr_avg = aggregator(truth_ratio_stats)

My correction:
truth_ratios = correct_prob / (correct_prob + wrong_prob + 1e-10)
# Store per-index scores
value_by_index = dict(
    zip(correct_indices, [{"score": val} for val in truth_ratios])
)
# Aggregate: mean of all TRs (higher is better)
truth_ratio_stats = np.array([evals["score"] for evals in value_by_index.values()])
forget_tr_avg = np.mean(truth_ratio_stats)
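As a quick sanity check, this standalone snippet (toy avg_loss values of my own, not taken from any real run) shows that the two formulations produce different scores for the same losses:

import numpy as np

# Hypothetical per-example average losses, for illustration only.
correct_avg_losses = np.array([0.8, 1.2, 2.0])
wrong_avg_losses = np.array([1.5, 1.1, 2.4])

correct_prob = np.exp(-correct_avg_losses)
wrong_prob = np.exp(-wrong_avg_losses)

# Current code: P(wrong) / P(correct)
print(wrong_prob / (correct_prob + 1e-10))
# Proposed normalized form: P(correct) / (P(correct) + P(wrong))
print(correct_prob / (correct_prob + wrong_prob + 1e-10))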
Expected behavior
Hello! Thanks for your work on the repository, it's great and very valuable!
I have a question about the evaluation on the TOFU dataset with the "Truth Ratio" metric. The paper claims that the metric should be changed to the new normalized definition, but the code still uses the original TOFU formulation. If I have misunderstood your intention, please let me know.