Metrics

class vieval.metrics.base.BaseMetric

Bases: object

An abstract class that provides a foundation for various metric classes to evaluate different aspects of text data.

vieval.metrics.basic_metrics.exact_match(gold: str, pred: str) → float

Calculates whether the predicted text (pred) exactly matches the gold standard text (gold) after both texts have been normalized.

Args:

gold (str): The reference text that is considered the correct or expected result.

pred (str): The text produced by a predictive model or other process that is being evaluated against the gold standard.

Returns:

float: 1.0 if the normalized pred string exactly matches the normalized gold string, and 0.0 otherwise.
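
The comparison described above can be sketched as follows. The normalization steps are not specified in this reference, so the hypothetical normalize_text helper below (lowercasing, removing ASCII punctuation, collapsing whitespace) is an assumption made for illustration and may differ from what vieval applies.

```python
import string


def normalize_text(text: str) -> str:
    """Hypothetical normalizer (an assumption, not vieval's own):
    lowercase, drop ASCII punctuation, collapse runs of whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match_sketch(gold: str, pred: str) -> float:
    """Return 1.0 when the normalized strings are identical, else 0.0."""
    return 1.0 if normalize_text(gold) == normalize_text(pred) else 0.0


print(exact_match_sketch("Ha Noi", "  ha noi. "))
print(exact_match_sketch("Ha Noi", "Da Nang"))
```

Under this normalizer, the first call treats the two strings as a match because case, surrounding whitespace, and the trailing period are ignored; the second call does not match.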

vieval.metrics.basic_metrics.f1_score(gold: str, pred: str) → float

Computes the F1 score for the overlap between the predicted text (pred) and the gold standard text (gold).

Args:

gold (str): The reference text that is considered the correct or expected result.

pred (str): The text produced by a predictive model or other process that is being evaluated against the gold standard.

Returns:

float: The F1 score, ranging from 0.0 to 1.0, where 0.0 indicates no overlap and 1.0 indicates perfect overlap between gold and pred.
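
A minimal sketch of an overlap-based F1 score is shown below. It assumes that overlap is measured on lowercased, whitespace-separated tokens, as in common question-answering evaluation scripts; the tokenization and normalization that vieval actually uses are not stated here, so f1_score_sketch is illustrative only.

```python
from collections import Counter


def f1_score_sketch(gold: str, pred: str) -> float:
    """Token-level F1 between gold and pred (tokenization is an assumption)."""
    gold_tokens = gold.lower().split()
    pred_tokens = pred.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not gold_tokens or not pred_tokens:
        return float(gold_tokens == pred_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    num_same = sum((Counter(gold_tokens) & Counter(pred_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(f1_score_sketch("a b c d", "a b"))
```

In the example call, both predicted tokens appear in the reference (precision 1.0) but only half of the reference is covered (recall 0.5), so the score is 2/3.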

class vieval.metrics.bias.BiasMetric(data: dict, args)

Bases: BaseMetric

Evaluate biases in text data, particularly with demographic categories such as race and gender.

count_word_from_text(text: str, word: str)

Counts occurrences of a specific word in a given text.

Args:

text (str): Text to search within.

word (str): Word to count in the text.
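
One way such a count could work is sketched below. The matching rule is an assumption: this version counts case-insensitive whole-word matches, whereas the library method may match differently (for example, case-sensitively or on substrings).

```python
import re


def count_word_sketch(text: str, word: str) -> int:
    """Count whole-word, case-insensitive occurrences of word in text.

    The matching rule is an assumption made for this sketch."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


print(count_word_sketch("She said she would call her sister.", "she"))
```

Using word boundaries keeps "she" from being counted inside longer words, which matters when group word lists contain short tokens.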

evaluate(data: dict, args) → Dict

Main method for external calls to compute and return bias scores.

Args:

data (dict): Contains the text data under the “predictions” key.

evaluate_demographic_representation(texts: List[str]) → float | None

Compute the score measuring the bias in demographic representation.

The steps to compute the bias score are as follows:

1. Create a count vector for all the demographic groups by:
   - Getting the list of words for each demographic group;
   - Counting the total number of times words in a specific group’s list occur in “texts”.
2. Compute the bias score following the steps in self.group_counts_to_bias.

evaluate_stereotypical_associations(texts: List[str])

Computes a bias score for stereotypical associations between demographic groups and target words within a list of texts. It first counts how frequently words associated with each demographic group appear in the texts and then computes a bias score based on these counts.

Args:

texts (List[str]): A list of textual content to be analyzed for stereotypical associations between demographic groups and target words.

get_bias_score(texts: List[str], args) → Dict

Coordinates the bias evaluation process and computes bias scores for stereotypical associations and demographic representation.

Args:

texts (List[str]): Texts to evaluate for bias.

get_group_to_words(args)

Sets the demographic and target category attributes based on the arguments passed.

group_counts_to_bias(group_counts: List[int]) → float | None

Compute bias score given group counts.

The bias score is computed as follows:

1. The count for each group is normalized by the number of words in the group’s word list.
2. The normalized counts are turned into a probability distribution.
3. The uniform distribution over the groups is computed.
4. The L1 distance of the probability distribution from the uniform distribution is taken. This value indicates how far the representation of different groups in model-generated text diverges from equal representation.
5. The total variation distance is computed from the L1 distance (it is half the L1 distance).

Args:

group_counts: List containing the counts for each group. Must follow the order found in: self.demographic_group_to_words.
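
The steps listed for this method can be sketched as a standalone function. In the library the word lists come from self.demographic_group_to_words; here they are passed in through a hypothetical group_to_words argument, and returning None when no group word occurs at all is an assumption suggested by the float | None return type.

```python
from typing import Dict, List, Optional


def group_counts_to_bias_sketch(
    group_counts: List[int],
    group_to_words: Dict[str, List[str]],  # hypothetical stand-in for the attribute
) -> Optional[float]:
    # 1. Normalize each count by the size of that group's word list.
    sizes = [len(words) for words in group_to_words.values()]
    normalized = [count / size for count, size in zip(group_counts, sizes)]
    total = sum(normalized)
    if total == 0:
        return None  # assumption: no group words found, so no score
    # 2. Turn the normalized counts into a probability distribution.
    probs = [value / total for value in normalized]
    # 3. Uniform distribution over the groups.
    uniform = 1 / len(probs)
    # 4. L1 distance from the uniform distribution.
    l1_distance = sum(abs(p - uniform) for p in probs)
    # 5. Total variation distance is half the L1 distance.
    return l1_distance / 2


groups = {"female": ["she", "her"], "male": ["he", "him"]}
print(group_counts_to_bias_sketch([3, 1], groups))
```

With counts of 3 and 1 over two equally sized word lists, the distribution is (0.75, 0.25), the L1 distance from uniform is 0.5, and the resulting score is 0.25. Equal counts give 0.0.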

set_demographic_group_to_words(texts: List[str], args)

Sets demographic and target category attributes based on the provided arguments.

Args:

texts (List[str]): List of strings to process and extract names from.

class vieval.metrics.calibration_metric.CalibrationMetric

Bases: BaseMetric

Evaluate the calibration of probabilistic models.

evaluate(data: Dict, args, **kwargs)

Evaluates the given predictions against the references in the dictionary.

Args:

data (Dict): A dictionary that must contain the keys “predictions” and “references”; “option_probs” is also used if present.

Returns:

A tuple of two dictionaries:
- The first dictionary is the updated data with the additional key “max_probs”.
- The second dictionary, result, contains the mean of max_probs and the calibration scores obtained from get_cal_score.

get_cal_score(max_probs: List[float], correct: List[int])

Calculates various calibration scores based on the predicted probabilities (max_probs) and the ground truth labels (correct).

Args:

max_probs (List[float]): A list of the maximum probabilities predicted by the model for each instance.

correct (List[int]): A binary list where each element corresponds to whether the prediction was correct (1) or not (0).

Returns:

A dictionary containing ECE scores for 10 bins and 1 bin, coverage accuracy area, accuracy in the top 10 percentile, and Platt ECE scores for 10 bins and 1 bin.
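
To illustrate the kind of score reported here, the sketch below computes an expected calibration error (ECE) from max_probs and correct. It assumes equal-width confidence bins; the library may bin differently (for example, with equal-mass bins), and the Platt-scaled variants, coverage accuracy area, and top-10-percentile accuracy are not reproduced. The name ece_sketch is hypothetical.

```python
from typing import List, Tuple


def ece_sketch(max_probs: List[float], correct: List[int], num_bins: int = 10) -> float:
    """Expected calibration error with equal-width bins (binning is an assumption)."""
    n = len(max_probs)
    if n == 0:
        return 0.0
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(num_bins)]
    for prob, label in zip(max_probs, correct):
        # A probability of exactly 1.0 falls into the last bin.
        index = min(int(prob * num_bins), num_bins - 1)
        bins[index].append((prob, label))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        confidence = sum(prob for prob, _ in members) / len(members)
        accuracy = sum(label for _, label in members) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of the data.
        ece += len(members) / n * abs(confidence - accuracy)
    return ece


max_probs = [0.9, 0.9, 0.6, 0.6]
correct = [1, 1, 1, 0]
print(ece_sketch(max_probs, correct, num_bins=10))
print(ece_sketch(max_probs, correct, num_bins=1))
```

With one bin, ECE reduces to the absolute gap between mean confidence and overall accuracy, which is why a model can look well calibrated with 1 bin while showing error with 10 bins, as in the example data.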

class vieval.metrics.name_detector.NameDetector

Bases: object

Detect names within texts, categorize them, and potentially process multiple texts in batches.

detect(text)

Detects and classifies names in a single text string.

Args:

text (str): The input text to process.

Returns:

A dictionary with classified names.

detect_batch(texts)

Detects and classifies names in a batch of text strings.

Args:

texts (list): A list of text strings to process in batch.

Returns:

A dictionary with classified names for the batch.

group_entity(text, entities)

Groups the detected entities that are adjacent and belong to the same entity group.

Args:

text (str): The original text from which entities are extracted.

entities (list): A list of entity dictionaries detected in the text.

Returns:

Returns a new list of entities after grouping adjacent entities of the same type.
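
A rough sketch of this kind of grouping is given below. The entity format is an assumption: each entity is taken to be a dictionary with "entity_group", "start", "end", and "word" keys, similar to the output of common token-classification pipelines. Two entities are treated as adjacent when only whitespace separates them in the text; the library's own adjacency rule may differ.

```python
from typing import Dict, List


def group_entity_sketch(text: str, entities: List[Dict]) -> List[Dict]:
    """Merge neighbouring entities of the same group (entity format is assumed)."""
    grouped: List[Dict] = []
    for entity in sorted(entities, key=lambda e: e["start"]):
        previous = grouped[-1] if grouped else None
        if (
            previous is not None
            and previous["entity_group"] == entity["entity_group"]
            # Adjacent means nothing but whitespace lies between the two spans.
            and text[previous["end"]:entity["start"]].strip() == ""
        ):
            previous["end"] = max(previous["end"], entity["end"])
            previous["word"] = text[previous["start"]:previous["end"]]
        else:
            grouped.append(dict(entity))  # copy so the input list is not mutated
    return grouped


text = "John Smith visited Paris"
entities = [
    {"entity_group": "PER", "start": 0, "end": 4, "word": "John"},
    {"entity_group": "PER", "start": 5, "end": 10, "word": "Smith"},
    {"entity_group": "LOC", "start": 19, "end": 24, "word": "Paris"},
]
for entity in group_entity_sketch(text, entities):
    print(entity["entity_group"], entity["word"])
```

In the example, the two PER spans are merged into a single "John Smith" entity, while the LOC span is left as a separate entry.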