# `Scholar.Cluster.KMeans`
[🔗](https://github.com/elixir-nx/scholar/blob/main/lib/scholar/cluster/k_means.ex#L1)

K-Means Algorithm.

K-Means is a simple clustering method that works iteratively [1]. In the first iteration,
centroids are chosen randomly from input data. It turned out that some initializations
are especially effective. In 2007 David Arthur and Sergei Vassilvitskii proposed initialization
called k-means++ which speed up convergence of algorithm drastically [2]. After initialization, from each centroid
find points that are the closest to that centroid. Then, for each centroid replace it with the
center of mass of associated points. These two steps mentioned above are repeated until the solution
converges. Since some initializations are unfortunate and converge to sub-optimal results
we need repeat the whole procedure a few times and take the best result.

Average time complexity is $O(CKNI)$, where $C$ is the number of clusters, $N$ is the number of samples,
$I$ is the number of iterations until convergence, and $K$ is the number of features. Space
complexity is $O(K*(N+C))$.

Reference:

* [1] - [K-Means Algorithm](https://cs.nyu.edu/~roweis/csc2515-2006/readings/lloyd57.pdf)
* [2] - [K-Means++ Initialization](http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf)

# `fit`
[🔗](https://github.com/elixir-nx/scholar/blob/main/lib/scholar/cluster/k_means.ex#L131)

Fits a K-Means model for sample inputs `x`.

## Options

* `:num_clusters` (`t:pos_integer/0`) - Required. The number of clusters to form as well as the number of centroids to generate.

* `:max_iterations` (`t:pos_integer/0`) - Maximum number of iterations of the k-means algorithm for a single run. The default value is `300`.

* `:num_runs` (`t:pos_integer/0`) - Number of time the k-means algorithm will be run with different centroid seeds.
  The final results will be the best output of num_runs runs in terms of inertia. The default value is `10`.

* `:tol` - Relative tolerance with regards to Frobenius norm of the difference in
  the cluster centers of two consecutive iterations to declare convergence. The default value is `0.0001`.

* `:weights` - The weights for each observation in `x`. If equals to `nil`,
  all observations are assigned equal weight.

* `:init` - Method for centroid initialization, either of:

  * `:k_means_plus_plus` - selects initial cluster centroids using sampling based
    on an empirical probability distribution of the points' contribution to
    the overall inertia. This technique speeds up convergence, and is
    theoretically proven to be O(log(k))-optimal.

  * `:random` - choose `:num_clusters` observations (rows) at random from data for
    the initial centroids.

  The default value is `:k_means_plus_plus`.

* `:key` - Determines random number generation for centroid initialization.
  If the key is not provided, it is set to `Nx.Random.key(System.system_time())`.

## Return Values

  The function returns a struct with the following parameters:

  * `:clusters` - Coordinates of cluster centers.

  * `:num_iterations` - Number of iterations run.

  * `:inertia` - Sum of squared distances of samples to their closest cluster center.

  * `:labels` - Labels of each point.

## Examples

    iex> key = Nx.Random.key(42)
    iex> x = Nx.tensor([[1, 2], [2, 4], [1, 3], [2, 5]])
    iex> Scholar.Cluster.KMeans.fit(x, num_clusters: 2, key: key)
    %Scholar.Cluster.KMeans{
      num_iterations: Nx.tensor(
        2
      ),
      clusters: Nx.tensor(
        [
          [1.0, 2.5],
          [2.0, 4.5]
        ]
      ),
      inertia: Nx.tensor(
        1.0
      ),
      labels: Nx.tensor(
        [0, 1, 0, 1]
      )
    }

# `predict`
[🔗](https://github.com/elixir-nx/scholar/blob/main/lib/scholar/cluster/k_means.ex#L304)

Makes predictions with the given `model` on inputs `x`.

## Return Values

  It returns a tensor with clusters corresponding to the input.

## Examples

    iex> key = Nx.Random.key(42)
    iex> x = Nx.tensor([[1, 2], [2, 4], [1, 3], [2, 5]])
    iex> model = Scholar.Cluster.KMeans.fit(x, num_clusters: 2, key: key)
    iex> Scholar.Cluster.KMeans.predict(model, Nx.tensor([[1.9, 4.3], [1.1, 2.0]]))
    Nx.tensor(
      [1, 0]
    )

# `transform`
[🔗](https://github.com/elixir-nx/scholar/blob/main/lib/scholar/cluster/k_means.ex#L332)

Calculates distances between each sample from `x` and the calculated centroids.

## Return Values

  It returns a tensor with corresponding distances.

## Examples

    iex> key = Nx.Random.key(40)
    iex> model =
    ...>  Scholar.Cluster.KMeans.fit(Nx.tensor([[1, 2], [2, 4], [1, 3], [2, 5]]),
    ...>    num_clusters: 2,
    ...>    key: key
    ...>  )
    iex> Scholar.Cluster.KMeans.transform(model, Nx.tensor([[1.0, 2.5]]))
    Nx.tensor(
      [
        [2.2360680103302, 0.0]
      ]
    )

---

*Consult [api-reference.md](api-reference.md) for complete listing*
