The initial maximum number of centroids to store. This must be at least 1, and can be changed at any time. Note that this does not allocate anything; it is the threshold at which centroids will begin to be collapsed.
markers: number[] (optional) — Persisted quantiles to maintain. These don't have to be in order, but they must be between 0 and 1. The quantiles are updated with each new value and can be retrieved with atMarker.
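For orientation, a minimal construction sketch; the class name Histogram and its import path are assumptions, and only the maxCentroids and markers options come from the docs above:

```ts
import { Histogram } from "streaming-histogram"; // hypothetical import

// Keep at most 100 centroids; persistently track the median and quartiles.
const h = new Histogram({ maxCentroids: 100, markers: [0.5, 0.25, 0.75] });
```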
The current number of distinct values (centroids) in the histogram.
This may be fewer than the number of distinct values added if maxCentroids
is smaller than that number.
The current total count accumulated in the histogram (the sum of the counts of all values).
The current maximum number of centroids to store before compression.
Adds a value to the histogram. If you want to dynamically limit loss, monitor the returned loss and adjust maxCentroids accordingly.
The loss incurred by compression, if any
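Continuing the construction sketch above, a sketch of the loss-monitoring pattern just described (the method name add, its single-value signature, and the writability of maxCentroids are assumptions; the docs do say maxCentroids can be changed at any time):

```ts
const LOSS_THRESHOLD = 0.01; // hypothetical tolerance
const incoming = [12, 7, 7, 42, 3.5]; // example data
for (const value of incoming) {
  const loss = h.add(value);
  if (loss !== undefined && loss > LOSS_THRESHOLD) {
    h.maxCentroids *= 2; // relax compression to bound further loss
  }
}
```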
Adds one or more centroids to the histogram. All centroids are added before compressing, to incur the least loss. If you want to reduce memory usage, or to monitor loss as you go, iterate and add centroids one at a time rather than passing them all to this method.
The maximal loss incurred by compression, if any
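A sketch contrasting the bulk and one-at-a-time styles (the method name append, its call signatures, and the centroid shape { value, count } are all assumptions):

```ts
const centroids = [{ value: 1, count: 3 }, { value: 2, count: 5 }];

// Bulk: every centroid is added before compression, for the least loss.
h.append(...centroids);

// One at a time: lower peak memory, and loss can be observed per centroid.
for (const c of centroids) {
  const loss = h.append(c);
  // ...react to loss here if desired...
}
```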
Returns the interpolated count of values at a given value in the histogram
The value to find the count for
The count of the value in the histogram (interpolated, but in whole counts)
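A usage sketch (the method name countAt is an assumption):

```ts
// Interpolated number of observations at the value 42, in whole counts.
const n = h.countAt(42);
```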
Returns the quantile marker at a given index, as given by markers at construction (e.g. with markers [0.5, 0.25, 0.75]: 0 = median, 1 = lower quartile, 2 = upper quartile).
The index of the marker to find the value for
The Quantile information at the given marker in the histogram
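Continuing the construction sketch, where markers was [0.5, 0.25, 0.75] (atMarker is named in the option docs above; the exact return shape is not shown here):

```ts
const median = h.atMarker(0); // marker 0 tracks quantile 0.5
const q1 = h.atMarker(1);     // marker 1 tracks quantile 0.25
const q3 = h.atMarker(2);     // marker 2 tracks quantile 0.75
```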
Merges all the centroids from another histogram into this one. All centroids are added before compressing, to incur the least loss. If you want to reduce memory usage, or to monitor loss, use an iterator with sequential calls to append rather than this method.
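A merge sketch (the method name merge is an assumption; append is referenced in the description above):

```ts
const other = new Histogram({ maxCentroids: 100 });
other.add(99);
h.merge(other); // all of other's centroids are added before h is compressed
```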
Computes the peaks in the histogram. This is a simple peak-detection algorithm that finds local maxima in the histogram, for detecting frequency clusters. A peak is defined as a local maximum where the count is greater than the counts of the centroids on either side. If there are too few centroids to allow for the given smoothing, no peaks will be returned.
The number of centroids in each "direction" to consider for smoothing the peaks (default 3)
Iterator for the computed Peak entries in the histogram
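A peak-iteration sketch (the method name peaks, the iterability of its result, and the Peak entry shape are assumptions):

```ts
// Smooth over 5 centroids in each direction instead of the default 3.
for (const peak of h.peaks(5)) {
  console.log(peak); // one local maximum per entry
}
```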
Returns the value at a given quantile in the histogram (0.5 = median)
The quantile (0-1) to find the value for
The centroid at the given quantile in the histogram
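A quantile-lookup sketch (the method name atQuantile is an assumption):

```ts
const p50 = h.atQuantile(0.5);  // median
const p99 = h.atQuantile(0.99); // 99th percentile
```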
Returns the rank of a value in the histogram: the count of all values less than or equal to the given value. This method interpolates the rank between and beyond centroids based on the normal distribution of each centroid.
The value to find the rank for.
The rank of the value in the histogram
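A rank-lookup sketch (the method name rankOf is an assumption):

```ts
// How many observed values are less than or equal to 42 (interpolated)?
const rank = h.rankOf(42);
```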
Returns the centroid at a given rank in the histogram (count / 2 = median)
The rank (accumulated count from start if positive, end if negative) to find the value for
The Quantile information at the given rank in the histogram - always returns a positive rank
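A rank-indexing sketch (the method name atRank and the total-count property name count are assumptions):

```ts
const lowest = h.atRank(1);             // accumulated from the start
const highest = h.atRank(-1);           // negative ranks accumulate from the end
const approxMedian = h.atRank(h.count / 2); // count / 2 = median, per the docs
```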
A histogram that maintains a complete or sparse approximation of the data frequency. The representation is complete if the number of distinct values is less than or equal to maxCentroids; otherwise, the values are compressed into a smaller number of centroids in a way that minimizes the loss. The histogram can also maintain a set of quantile markers, which are updated with each new value (e.g. the median can be maintained efficiently). The structure is adaptive in range and resolution, and can be used for streaming data.

The histogram is roughly based on the ideas in the paper "Approximate Frequency Counts over Data Streams" by Gurmeet Singh Manku and Rajeev Motwani. The implementation uses B+Trees, a self-balancing structure, to efficiently maintain the centroids and losses, so it should scale efficiently.
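To make the compression behavior concrete, a sketch under the same naming assumptions as the examples above (size per the property described earlier):

```ts
const small = new Histogram({ maxCentroids: 4 });
for (const v of [1, 2, 3, 4, 5, 6]) small.add(v);
// Six distinct values were added but only four centroids are allowed: nearby
// values have been collapsed into weighted centroids to minimize loss.
console.log(small.size); // at most 4
```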