sparstogram
    Preparing search index...

    Class Sparstogram

    A histogram that maintains a complete or sparse approximation of the data frequency. The representation will be complete if the number of distinct values is less than or equal to the maxCentroids. Otherwise, the values will be compressed to a smaller number of centroids in a way that minimizes the loss. The histogram can also maintain a set of quantiles, which are maintained with each new value (e.g. median can be efficiently maintained). The structure is adaptive in range and resolution, and can be used for streaming data. The histogram is roughly based on the ideas in the paper "Approximate Frequency Counts over Data Streams" by Gurmeet Singh Manku and Rajeev Motwani. The implementation uses B+Trees to efficiently maintain the centroids and losses, which is a self-balancing structure; so this should be able to scale efficiently.

    Index

    Constructors

    • Parameters

      • maxCentroids: number

        The initial maximum number of centroids to store. This must be at least 1. This can be changed at any time. Note that nothing is allocated by this; it represents the point at which point centroids will begin to be collapsed.

      • Optionalmarkers: number[]

        Optional persisted quantiles to be maintained with each new value. These don't have to be in order, but they must be between 0 and 1. The quantiles will be maintained with each new value, and can be retrieved with atMarker.

      Returns Sparstogram

    Properties

    markers?: number[]

    Optional persisted quantiles to be maintained with each new value. These don't have to be in order, but they must be between 0 and 1. The quantiles will be maintained with each new value, and can be retrieved with atMarker.

    Accessors

    • get centroidCount(): number

      The current number of distinct values (centroids) in the histogram. This may be fewer than the number of distinct values added if maxCentroids is less than the number of distinct values added.

      Returns number

    • get count(): number

      The current total count accumulated in the histogram (sum of counts of all values)

      Returns number

    • get maxCentroids(): number

      The current maximum number of centroids to store before compression.

      Returns number

    • set maxCentroids(value: number): void

      Parameters

      • value: number

      Returns void

    Methods

    • Adds a value to the histogram. If you want to dynamically limit loss, monitor the returned loss and adjust maxCentroids accordingly.

      Parameters

      • value: number

      Returns number

      The loss incurred by compression, if any

    • Adds one or more centroids to the histogram. Adds all centroids before compressing to incur the least loss. If you want to reduce memory usage, or monitor loss, use an iterator with sequential calls to this rather than this method.

      Parameters

      Returns number

      The maximal loss incurred by compression, if any

    • Returns an iterator for the centroids in the histogram in ascending order

      Parameters

      • Optionalcriteria: Criteria

        If specified, the iterator will start at the centroid at the given marker index or value; otherwise it will start at the first centroid

      Returns IterableIterator<Centroid>

    • Returns the interpolated count of values at a given value in the histogram

      Parameters

      • value: number

        The value to find the count for

      Returns number

      The count of the value in the histogram (interpolated, but in whole counts)

    • Returns an iterator for the centroids in the histogram in descending order

      Parameters

      • Optionalcriteria: Criteria

        If specified, the iterator will start at the centroid at the given marker index or value; otherwise it will start at the last centroid

      Returns IterableIterator<Centroid>

    • Returns the quantile marker at a given index, as given by markers at construction (0 = median, 1 = lower quartile, 2 = upper quartile, etc.)

      Parameters

      • index: number

        The index of the marker to find the value for

      Returns Quantile

      The Quartile information at the given marker in the histogram

    • Merges all the centroids from another histogram into this one. Adds all centroids before compressing to incur the least loss. If you want to reduce memory usage, or monitor loss, use an iterator with sequential calls to append rather than this method.

      Parameters

      Returns void

    • Computes the peaks in the histogram. This is a simple peak detection algorithm that finds local maxima in the histogram for frequency of cluster detection. A peak is defined as a local maximum where the count is greater than the counts of the centroids on either side. If there are too few centroids to allow for the given smoothing in the histogram, no peaks will be returned

      Parameters

      • smoothing: number = 3

        The number of centroids in each "direction" to consider for smoothing the peaks (default 3)

      Returns IterableIterator<Peak>

      Iterator for the computed Peak entries in the histogram

    • Returns the value at a given quantile in the histogram (0.5 = median)

      Parameters

      • quantile: number

        The quantile (0-1) to find the value for

      Returns Quantile

      The centroid at the given quantile in the histogram

    • Returns the rank of a value in the histogram - count of all values less than or equal to the given value This method interpolates the rank between and beyond centroids based on the normal distribution of each centroid.

      Parameters

      • value: number

        The value to find the rank for.

      Returns number

      The rank of the value in the histogram

    • Returns the centroid at a given rank in the histogram (count / 2 = median)

      Parameters

      • rank: number

        The rank (accumulated count from start if positive, end if negative) to find the value for

      Returns Quantile

      The Quantile information at the given rank in the histogram - always returns a positive rank