roboto.analytics.signal_similarity#
Submodules#
Package Contents#
- class roboto.analytics.signal_similarity.Match#
A subsequence of a target signal that is similar to a query signal.
- context: MatchContext#
Correlate a matched subsequence back to its source.
- distance: float#
Unitless measure of similarity between a query signal and the subsequence of the target signal this Match represents. A smaller distance indicates a “closer” match than a larger one.
- end_idx: int#
The end index in the target signal of this match.
- end_time: int#
The end time in the target signal of this match.
- start_idx: int#
The start index in the target signal of this match.
- start_time: int#
The start time in the target signal of this match.
- subsequence: pandas.DataFrame#
The subsequence of the target signal this Match represents. It is equivalent to target[start_idx:end_idx].
- to_event(name='Signal Similarity Match Result', caller_org_id=None, roboto_client=None)#
Create a Roboto Platform event out of this similarity match result.
- Parameters:
name (str)
caller_org_id (Optional[str])
roboto_client (Optional[roboto.http.RobotoClient])
- Return type:
- class roboto.analytics.signal_similarity.MatchContext#
Correlate a matched subsequence back to its source.
- dataset_id: str | None = None#
- file_id: str | None = None#
- message_paths: collections.abc.Sequence[str]#
- topic_id: str#
- topic_name: str#
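The fields documented above are enough to slice a match back out of its target and to group results by topic. The following is a minimal sketch using stand-in dataclasses, not the library's own classes: `SketchMatch`, `SketchContext`, `subsequence`, and `best_per_topic` are hypothetical names introduced for illustration, and a plain list stands in for the target dataframe.

```python
from dataclasses import dataclass


@dataclass
class SketchContext:
    """Hypothetical stand-in for MatchContext; only two of its fields."""
    topic_id: str
    topic_name: str


@dataclass
class SketchMatch:
    """Hypothetical stand-in for Match; not the library class."""
    context: SketchContext
    distance: float
    start_idx: int
    end_idx: int


def subsequence(target, match):
    # Documented equivalence: subsequence == target[start_idx:end_idx]
    return target[match.start_idx:match.end_idx]


def best_per_topic(matches):
    """Map each topic_id to its lowest-distance match."""
    best = {}
    for m in matches:
        current = best.get(m.context.topic_id)
        if current is None or m.distance < current.distance:
            best[m.context.topic_id] = m
    return best


# Made-up example data
imu = SketchContext(topic_id="tp_1", topic_name="/imu")
gps = SketchContext(topic_id="tp_2", topic_name="/gps")
matches = [
    SketchMatch(imu, distance=0.8, start_idx=10, end_idx=13),
    SketchMatch(gps, distance=0.3, start_idx=2, end_idx=5),
    SketchMatch(imu, distance=0.1, start_idx=40, end_idx=43),
]
target = list(range(100))
best = best_per_topic(matches)
```

Here `best["tp_1"]` is the match with distance 0.1, and `subsequence(target, best["tp_2"])` returns the three samples at indices 2 through 4, since the end index is exclusive under slice semantics.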
- roboto.analytics.signal_similarity.find_similar_signals(needle, haystack, *, max_distance=None, max_matches_per_topic=None, normalize=False)#
Find subsequences of topic data (from haystack) that are similar to needle.
If needle is a dataframe with a single, non-index column, single-dimensional similarity search will be performed. If it instead has multiple non-index columns, multi-dimensional search will be performed.
Even if there is no true similarity between the query signal and a topic’s data, this will always return at least one Match. Matches are expected to improve in quality as the topic data is more relevant to the query. Matches are returned sorted in ascending order by their distance, with the best matches (lowest distance) first.
If max_distance is provided, only matches with a distance less than max_distance will be returned. If it is not provided, it defaults to a value derived from the distances computed against all comparison windows in the target: the larger of the minimum distance and the mean distance minus two standard deviations.
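The default threshold described above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: `default_max_distance` and `select` are hypothetical helpers, the population standard deviation is assumed (the text does not say which variant is used), and the default cutoff is applied inclusively so that the best window always survives, in line with the guarantee of at least one Match.

```python
import statistics


def default_max_distance(distances):
    """Larger of: the minimum distance, or mean minus two standard deviations."""
    lowest = min(distances)
    # Assumption: population standard deviation.
    spread = statistics.pstdev(distances)
    return max(lowest, statistics.mean(distances) - 2 * spread)


def select(distances, max_distance=None):
    """Filter and rank window distances, best (lowest) first."""
    if max_distance is None:
        cutoff = default_max_distance(distances)
        # Assumption: inclusive comparison, so the minimum is always kept.
        kept = [d for d in distances if d <= cutoff]
    else:
        # An explicit threshold is strict ("less than"), as documented.
        kept = [d for d in distances if d < max_distance]
    return sorted(kept)


# Made-up distances: one window stands out from the rest...
one_outlier = [0.5, 4.0, 4.2, 3.9, 4.1, 4.3, 3.8, 4.0]
# ...versus a target where no window is notably closer than the others.
no_outlier = [4.0, 4.1, 3.9]

kept_outlier = select(one_outlier)
kept_flat = select(no_outlier)
```

With `one_outlier`, the mean-minus-two-deviations term dominates and only the 0.5 window is kept. With `no_outlier`, that term falls below the minimum, so the threshold collapses to the minimum distance and exactly one window is kept.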
Use max_matches_per_topic to limit the number of match results contributed by a single topic.
If normalize is True, values will be projected to the unit scale before matching. This is useful if you want to match windows of the target signal regardless of scale. For example, a query sequence of [1., 2., 3.] will perfectly match (distance == 0) the target [1000., 2000., 3000.] if normalize is True, but would have a distance of nearly 3800 if normalize is False.
- Parameters:
needle (pandas.DataFrame)
haystack (collections.abc.Iterable[roboto.domain.topics.Topic])
max_distance (Optional[float])
max_matches_per_topic (Optional[int])
normalize (bool)
- Return type:
collections.abc.Sequence[roboto.analytics.signal_similarity.match.Match]
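The effect of normalize on the [1., 2., 3.] example can be reproduced with a small, self-contained sketch. It assumes min-max scaling for the projection to unit scale and a Euclidean distance; neither is confirmed by the text above, and `unit_scale` and `euclidean` are hypothetical helpers rather than library functions.

```python
import math


def unit_scale(values):
    """Min-max scale values to [0, 1]; a constant signal maps to zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 2.0, 3.0]
window = [1000.0, 2000.0, 3000.0]

# Both sequences scale to [0.0, 0.5, 1.0], so they coincide.
normalized_distance = euclidean(unit_scale(query), unit_scale(window))

# Without scaling, the difference in magnitude dominates.
raw_distance = euclidean(query, window)
```

Under these assumptions the normalized distance is exactly 0 and the raw distance is roughly 3738, which is in the same range as the figure quoted above; the library's actual metric may differ.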