# Class interface

• fit
• X: features or distances matrix (each line is the features of a sample, or each column is the distance to the other samples)

• Y: labels array (1D array where each element is the label of the sample)

• learning: "default", "iterative", "agglomerative"

• metric: "euclidian", "log_euclidian", "chi_square", "manhattan", "canberra", "squared_chord", "squared_chi_square", "bray_curtis"

• precomputed_distance: True or False

• Input Contract:
• Number of lines of X must be equal to the number of elements of Y
• Learning must be "default"
• Metric must "euclidian"
• If precomputed_distance is True, X must be square, and filled with the distances
• If Y is not supplied, its the unsupervised training (no labels = clustering)
• predict
• X: features or distances matrix (each line is the features of a sample, or each column the distance to the other samples)

• Input Contract:
• X shape must match parameters of fit: if precomputed distance, the number of lines must match the number of nodes on the graph; if not, the number of columns must match the number of features
• Output Contract:
• Returns an array containing the labels for the number of samples on input X

# Graph Representation

The graph is represented by two members: (1) a list of tuples V, where each tuple is a vertex and contains the indices of the adjacent vertices; and (2) a distance matrix f, that may be stored as a sparse matrix.

Example:

```V =

[(1, 3),
(0, 2),
(1, 3),
(0, 2)]

f = [[0, 10, 0, 5],
[10, 0, 3, 0],
[0,  3, 0, 4],
[5,  0, 4, 0]]
```

This representation allows to efficiently store a complete graph, represented by a special case where V = [].

The generation of the list of edges and the associated value is simple:

```from itertools import product
import numpy

# the number of vertices
nodes = f.shape[0]

# a list with all non repeating edges
edges = list(combinations(range(nodes), 2))

# an array of the edges
edges_ar = numpy.array(edges)
rows = edges_ar[:,0]
cols = edges_ar[:,1]

# extract the weight of each edge
weigths = f[rows, cols]

# a list of edges with its associated weight
graph = zip(edges, weigths)

# remove zeroed edges
graph = filter(lambda x: x[1] != 0, graph)

# now we can sort it based on the weigth
from operator import itemgetter
graph_sorted = sorted(graph, key=itemgetter(1))
```

For the example above, graph is:

```[((0, 1), 10), ((0, 3), 5), ((1, 2), 3), ((2, 3), 4)]
```

And graph_sorted is:

```[((1, 2), 3), ((2, 3), 4), ((0, 3), 5), ((0, 1), 10)]
```

# MST Algorithms

• Boruvka

• Não usa estruturas complexas
• Prim

• O kernel é o mesmo do Dijkstra, mas a fila de prioridade contém as arestas ao invés dos vértices
• A implementação mais eficiente utiliza um head de Fibonacci, que não é nativo do Python
• Kruskal

• Algoritmo mais simples, com estruturas de dados simples (arestas ordenadas e union-find)
• Mesma complexidade do alg. de Prim
• Chazelle

• PDF
• A complexidade deste algoritmo é O(m α(m,n)), que pra motivos práticos acaba sendo linear em m, que é o número de arestas do grafo
• Paralelismo