roboto.analytics.signal_similarity#

Package Contents#

class roboto.analytics.signal_similarity.Match#

A subsequence of a target signal that is similar to a query signal.

context: MatchContext#: Correlate a matched subsequence back to its source.

distance: float#: Unitless measure of similarity between the query signal and the subsequence of the target signal this Match represents. Smaller distances indicate closer matches.

end_idx: int#: The end index in the target signal of this match.

end_time: int#: The end time in the target signal of this match.

start_idx: int#: The start index in the target signal of this match.

start_time: int#: The start time in the target signal of this match.

subsequence: pandas.DataFrame#: The subsequence of the target signal this Match represents. It is equivalent to target[start_idx:end_idx].

to_event(name='Signal Similarity Match Result', caller_org_id=None, roboto_client=None)#

Create a Roboto Platform event out of this similarity match result.

Parameters:

name (str)
caller_org_id (Optional[str])
roboto_client (Optional[roboto.http.RobotoClient])

Return type:

roboto.domain.events.Event

class roboto.analytics.signal_similarity.MatchContext#

Correlate a matched subsequence back to its source.

dataset_id: str | None = None#

file_id: str | None = None#

message_paths: collections.abc.Sequence[str]#

topic_id: str#

topic_name: str#

roboto.analytics.signal_similarity.find_similar_signals(needle, haystack, *, max_distance=None, max_matches_per_topic=None, normalize=False)#

Find subsequences of topic data (from haystack) that are similar to needle.

If needle is a dataframe with a single, non-index column, single-dimensional similarity search will be performed. If it instead has multiple non-index columns, multi-dimensional search will be performed.

Even if there is no true similarity between the query signal and a topic’s data, this function always returns at least one Match. Match quality is expected to improve as the topic data becomes more relevant to the query. Matches are returned sorted by distance in ascending order, so the best matches (lowest distance) come first.

If max_distance is provided, only matches with a distance less than max_distance are returned. If it is not provided, it defaults to the maximum of the following two values, computed from the distances against all comparison windows in the target:

the minimum distance

the mean distance minus two standard deviations
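The default threshold described above can be sketched in plain Python. This is only an illustration of the stated rule, not the library's code: the helper name default_max_distance is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
import statistics


def default_max_distance(distances):
    """Hypothetical helper: max(min distance, mean - 2 * standard deviation)."""
    lowest = min(distances)
    mean = statistics.fmean(distances)
    # Assumption: population standard deviation; the library may differ.
    spread = statistics.pstdev(distances)
    return max(lowest, mean - 2 * spread)


# Made-up distances for eight comparison windows; one window is an outlier.
window_distances = [0.5, 4.0, 4.2, 3.9, 4.1, 4.0, 3.8, 4.3]
threshold = default_max_distance(window_distances)

# Windows whose distance falls below the threshold would be kept.
kept = [d for d in window_distances if d < threshold]
```

With one clear outlier, "mean minus two standard deviations" sits above the minimum, so only the outlier window passes. When distances are tightly clustered, the minimum distance becomes the threshold instead.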

Use max_matches_per_topic to limit the number of match results contributed by a single topic.
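The interaction between the per-topic limit and the final ordering can be sketched as follows. SketchMatch and limit_and_sort are hypothetical stand-ins, and the sketch assumes that the matches kept for each topic are its lowest-distance ones, which the text implies but does not state.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SketchMatch:
    """Hypothetical stand-in for Match, carrying only what this sketch needs."""
    topic_name: str
    distance: float


def limit_and_sort(matches, max_matches_per_topic=None):
    """Keep at most N matches per topic, then sort all kept matches by distance."""
    by_topic = defaultdict(list)
    for match in matches:
        by_topic[match.topic_name].append(match)

    kept = []
    for topic_matches in by_topic.values():
        # Assumption: each topic contributes its best (lowest-distance) matches.
        topic_matches.sort(key=lambda m: m.distance)
        # Slicing with None keeps everything, mirroring "no limit".
        kept.extend(topic_matches[:max_matches_per_topic])

    return sorted(kept, key=lambda m: m.distance)


candidates = [
    SketchMatch("imu", 0.9),
    SketchMatch("imu", 0.2),
    SketchMatch("imu", 0.5),
    SketchMatch("gps", 0.4),
    SketchMatch("gps", 0.7),
]
limited = limit_and_sort(candidates, max_matches_per_topic=2)
```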

If normalize is True, values are projected to the unit scale before matching. This is useful if you want to match windows of the target signal regardless of scale. For example, a query sequence of [1., 2., 3.] perfectly matches (distance == 0) the target [1000., 2000., 3000.] if normalize is True, but has a distance of nearly 3800 if normalize is False.
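The effect of normalization in this example can be reproduced with a small sketch. It assumes that "unit scale" means min-max scaling to [0, 1] and that distance is Euclidean; neither is confirmed by the text, and the helpers to_unit_scale and euclidean are hypothetical.

```python
import math


def to_unit_scale(values):
    """Hypothetical helper: min-max scale values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant signal has no range to scale; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 2.0, 3.0]
target = [1000.0, 2000.0, 3000.0]

# Without scaling, the difference in magnitude dominates the distance.
raw_distance = euclidean(query, target)

# After scaling, both sequences become [0.0, 0.5, 1.0], so they match exactly.
scaled_distance = euclidean(to_unit_scale(query), to_unit_scale(target))
```

Under these assumptions the unscaled distance is about 3738, which is in line with the "nearly 3800" figure quoted above.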

Parameters:

needle (pandas.DataFrame)
haystack (collections.abc.Iterable[roboto.domain.topics.Topic])
max_distance (Optional[float])
max_matches_per_topic (Optional[int])
normalize (bool)

Return type:

collections.abc.Sequence[roboto.analytics.signal_similarity.match.Match]
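A minimal usage sketch follows, based only on the signature and attributes documented on this page. The wrapper closest_matches and its search parameter are hypothetical: search exists so the sketch can be exercised without the roboto package installed, and when it is omitted the real find_similar_signals is imported. How to obtain the Topic objects for haystack is outside the scope of this page and is left to the caller.

```python
def closest_matches(needle, topics, *, limit=5, search=None):
    """Hypothetical wrapper: summarize matches as (topic, start, end, distance)."""
    if search is None:
        # Imported lazily so the sketch loads even without roboto installed.
        from roboto.analytics.signal_similarity import find_similar_signals as search

    # needle is expected to be a pandas.DataFrame; topics an iterable of Topic.
    matches = search(
        needle,
        topics,
        max_matches_per_topic=limit,
        normalize=True,
    )

    # Results arrive sorted by ascending distance, per the description above.
    return [
        (m.context.topic_name, m.start_idx, m.end_idx, m.distance)
        for m in matches
    ]
```

Each summarized row identifies where in the target signal the match lies. The full Match object additionally exposes subsequence for inspection and to_event() for recording the result as a Roboto Platform event.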