roboto.analytics.signal_similarity#
Package Contents#
- class roboto.analytics.signal_similarity.Match#
A subsequence of a target signal that is similar to a query signal.
- context: MatchContext#
Correlate a matched subsequence back to its source.
- distance: float#
Measure of similarity between a query signal and the subsequence of the target signal this Match represents. A smaller distance indicates a closer match.
In single-scale search (scale=None) this is the raw z-normalised Euclidean distance produced by MASS, with range [0, 2·√N] where N is the query length.
In multi-scale search (scale provided) this is multiplied by √N / √M (where N is the original needle length and M is the resampled length at that scale step), projecting onto the same [0, 2·√N] range as single-scale search. This means a max_distance threshold calibrated on single-scale search transfers directly to multi-scale search without adjustment.
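The range and the rescaling described for distance can be checked numerically. The following is a minimal pure-Python sketch, not the library's implementation (which relies on MASS). The helper names z_normalise, znorm_distance, and rescale_distance are hypothetical, and z-normalising with the population standard deviation is an assumption.

```python
import math


def z_normalise(values):
    """Shift to zero mean and scale to unit (population) standard deviation.

    Constant sequences have zero deviation and are not handled in this sketch.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def znorm_distance(query, window):
    """Euclidean distance between two equal-length sequences after z-normalisation."""
    q, w = z_normalise(query), z_normalise(window)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, w)))


def rescale_distance(raw_distance, original_len, resampled_len):
    """Multiply by sqrt(N) / sqrt(M) to project onto the [0, 2*sqrt(N)] range."""
    return raw_distance * math.sqrt(original_len) / math.sqrt(resampled_len)


n = 4
# Same shape at a different amplitude: the distance collapses to (about) zero.
same_shape = znorm_distance([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
# Perfectly opposite shapes reach the upper bound 2*sqrt(N).
opposite = znorm_distance([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0])
upper_bound = 2 * math.sqrt(n)

# The worst-case raw distance at resampled length M=25 maps to the
# worst case for an original needle length of N=100.
rescaled = rescale_distance(2 * math.sqrt(25), original_len=100, resampled_len=25)
```

Because every scale step is mapped onto the same range, a threshold chosen against the single-scale range stays meaningful when scale is provided.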
- end_idx: int#
The end index in the target signal of this match.
- end_time: pandas.Timestamp#
The end time in the target signal of this match.
- scale: float = 1.0#
The time-scale factor at which this match was found.
A value of 1.0 means the matched subsequence has the same length as the query. Values greater than 1.0 mean the matched subsequence is proportionally longer (the action occurred more slowly in the target than in the query). Values less than 1.0 mean the matched subsequence is proportionally shorter (the action occurred more quickly).
This field is only meaningful when scale is passed to find_similar_signals().
- start_idx: int#
The start index in the target signal of this match.
- start_time: pandas.Timestamp#
The start time in the target signal of this match.
- subsequence: pandas.DataFrame#
The subsequence of the target signal this Match represents. It is equivalent to target[start_idx:end_idx].
- to_event(name='Signal Similarity Match Result', caller_org_id=None, roboto_client=None)#
Create a Roboto Platform event out of this similarity match result.
- Parameters:
name (str)
caller_org_id (Optional[str])
roboto_client (Optional[roboto.http.RobotoClient])
- Return type:
- class roboto.analytics.signal_similarity.MatchContext#
Correlate a matched subsequence back to its source.
- dataset_id: str | None = None#
- file_id: str | None = None#
- message_paths: collections.abc.Sequence[str]#
- topic_id: str#
- topic_name: str#
- class roboto.analytics.signal_similarity.Scale#
Configuration for rate-invariant (multi-scale) signal similarity search.
Searching across multiple scales finds a query pattern regardless of how quickly or slowly it unfolds in the target. For example, a robot lifting a cup in 1 second and the same robot lifting a cup in 3 seconds would both be found.
min and max are positive scale factors relative to the original query length. A scale of 1.0 corresponds to the original query length; 2.0 searches for target subsequences twice as long (action happened at half speed); 0.5 searches for subsequences half as long (action happened at double speed).
While Scale.any() provides a convenient wide-range preset, providing domain-informed bounds (e.g. Scale(min=0.5, max=3.0) for a motion that can take between half and three times as long as in the query) will both improve match quality, by concentrating the search grid where matches are physically plausible, and reduce compute by avoiding unnecessary scale steps.
- classmethod any()#
Well-known preset covering a wide range of speed ratios (0.1x to 10x).
- Return type:
- factors()#
Return a list of scale factors spanning the configured range.
- Return type:
list[float]
- max: float#
Maximum scale factor (must be >= min).
- min: float#
Minimum scale factor (must be positive).
- spacing: Literal['log', 'linear'] = 'log'#
How to distribute scale values across the range.
"log" (default): geometrically spaced; equal ratio between adjacent steps, which is more natural for speed ratios (e.g. 0.5x, 1x, 2x are equally spaced on a log scale).
"linear": linearly spaced.
- steps: int = 10#
Number of scale values to sample across the range.
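The difference between the two spacing modes can be sketched as follows. This is an illustration only: scale_factors is a hypothetical helper, not the implementation behind factors(), and including both endpoints in the grid is an assumption.

```python
def scale_factors(min_scale, max_scale, steps=10, spacing="log"):
    """Return `steps` scale factors from min_scale to max_scale, endpoints included."""
    if min_scale <= 0 or max_scale < min_scale:
        raise ValueError("require 0 < min_scale <= max_scale")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if steps == 1:
        return [min_scale]
    if spacing == "log":
        # Geometric grid: a constant ratio between adjacent factors.
        ratio = (max_scale / min_scale) ** (1.0 / (steps - 1))
        return [min_scale * ratio**i for i in range(steps)]
    if spacing == "linear":
        # Arithmetic grid: a constant difference between adjacent factors.
        step = (max_scale - min_scale) / (steps - 1)
        return [min_scale + step * i for i in range(steps)]
    raise ValueError(f"unknown spacing: {spacing!r}")


log_factors = scale_factors(0.25, 4.0, steps=5, spacing="log")
linear_factors = scale_factors(0.25, 4.0, steps=5, spacing="linear")
```

With these bounds the log grid places as many steps below 1.0 as above it, while the linear grid puts most of its steps above 1.0, which is why log spacing tends to suit speed ratios.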
- roboto.analytics.signal_similarity.find_similar_signals(needle, haystack, *, max_distance=None, max_matches_per_topic=None, normalize=False, scale=None)#
Find subsequences of topic data (from haystack) that are similar to needle.
If needle is a dataframe with a single, non-index column, single-dimensional similarity search will be performed. If it instead has multiple non-index columns, multi-dimensional search will be performed.
Even if there is no true similarity between the query signal and a topic’s data, this will always return at least one Match. Matches are expected to improve in quality as the topic data is more relevant to the query. Matches are returned sorted in ascending order by their distance, with the best matches (lowest distance) first.
If max_distance is provided, only matches with a distance less than max_distance will be returned. If it is not provided, max_distance defaults to the maximum of the following, computed over the distances against all comparison windows in the target:
the minimum distance
the mean distance minus two standard deviations
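The default threshold rule above can be sketched in a few lines. This is a hedged illustration rather than the library's code: default_max_distance is a hypothetical helper, and the use of the population standard deviation is an assumption.

```python
import statistics


def default_max_distance(distances):
    """Return max(minimum distance, mean distance - 2 * standard deviation)."""
    minimum = min(distances)
    mean = statistics.fmean(distances)
    std = statistics.pstdev(distances)  # population std; an assumption
    return max(minimum, mean - 2 * std)


# One window stands out from an otherwise uniform profile, so the
# "mean minus two standard deviations" term sets the threshold.
with_outlier = default_max_distance([1.0] + [10.0] * 9)

# No window stands out, so the threshold falls back to the minimum distance.
without_outlier = default_max_distance([5.0, 6.0, 7.0])
```

Taking the maximum of the two terms keeps the threshold from dropping below the best available distance when the distance profile has no clear outliers.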
Use max_matches_per_topic to limit the number of match results contributed by a single topic.
If normalize is True, values will be projected to the unit scale before matching. This makes the search amplitude-invariant (y-axis): it matches the shape of the signal regardless of its absolute magnitude. For example, a query sequence of [1., 2., 3.] will perfectly match (distance == 0) the target [1000., 2000., 3000.] if normalize is True, but would have a distance of nearly 3800 if normalize is False.
DataFrames with string-typed columns are supported as long as all values are convertible to numeric types (e.g., "1.0"). Rows containing values that cannot be converted are dropped with a warning.
Rate-invariant (multi-scale) search
Pass a Scale to make the search rate-invariant (x-axis / time axis): it finds the query pattern regardless of how quickly or slowly it unfolds in the target. For example, a robot lifting a cup in 1 second and the same robot lifting a cup in 3 seconds would both be found with an appropriate scale.
See Scale for details on configuring the scale range, step count, and spacing. Well-known presets are available as class methods, e.g. Scale.any().
The scale at which each match was found is reported in the scale field of each Match.
When scale is used, max_matches_per_topic is applied to the combined results across all scales for a given topic, keeping the best (lowest-distance) matches.
Distance normalisation in multi-scale mode
The raw z-normalised Euclidean distance produced by MASS has range [0, 2·√M], where M is the query length at a given scale. Without correction this biases results toward smaller scales (shorter queries tend to produce smaller raw distances).
When scale is used, every distance is multiplied by √N / √M before being stored in distance, where N is the original needle length. This projects all scales onto the same [0, 2·√N] range (identical to the single-scale range) so that distances are directly comparable across scales and consistent with single-scale results. A max_distance threshold tuned on single-scale search can therefore be reused without adjustment in multi-scale search.
Single-scale distances (scale=None) are unchanged.
- Parameters:
needle (pandas.DataFrame)
haystack (collections.abc.Iterable[roboto.domain.topics.Topic])
max_distance (Optional[float])
max_matches_per_topic (Optional[int])
normalize (bool)
scale (Optional[roboto.analytics.signal_similarity.match.Scale])
- Return type:
collections.abc.Sequence[roboto.analytics.signal_similarity.match.Match]
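The amplitude-invariance example given for normalize can be reproduced with a small stand-alone sketch. It does not call find_similar_signals. It assumes that "unit scale" means min-max scaling onto [0, 1] and that the non-normalised figure refers to a plain Euclidean distance; neither is confirmed by this reference, and the helper names to_unit_scale and euclidean are hypothetical.

```python
import math


def to_unit_scale(values):
    """Min-max scale values onto [0, 1] (one possible reading of "unit scale")."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 2.0, 3.0]
target = [1000.0, 2000.0, 3000.0]

# Without scaling, the difference in magnitude dominates the distance.
raw = euclidean(query, target)

# After scaling, both sequences become [0.0, 0.5, 1.0], so only shape remains.
normalised = euclidean(to_unit_scale(query), to_unit_scale(target))
```

Under these assumptions the unscaled distance is 999·√14, roughly 3738, and the scaled distance is zero, matching the qualitative behaviour described for normalize.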