Francisco Leite | Página Inicial

Introduction

This page describes the functions required to use OPF classifier. This classifier resembles the behaviour of k-NN classifier. Particularly the 1-NN classifier. However, it is optimized in the sense that instead of using all samples to built the classifier, only a subset of it is used, which is called prototypes. With prototypes calculated, a trainning is performed to Fit the data. Once Fit occurs, new samples are classified using the Predictor.

Function iaseed_fra

Synopse

Return prototypes for OPF classifier

Parâmetros:
  • MST - Sample Minimum Spanning Tree.
  • label - Sample label.
Saída:
  • prototypes - OPF prototypes

Description

Given a graph represented by its adjacency matrix and its calculated MST, the function evaluates root nodes, called prototypes, that lay on the border between node classes, that is, suppose 3 nodes 1, 2 and 3 with class label 1 and 2 nodes 4 and 5 with class label 2. The prototypes shall be nodes 3 and 4.

Function Code

 1 from numpy import *
 2 
 3 
 4 def iaseed_fra(MSTl,label):
 5     n = len(MSTl)
 6     seed=[]
 7     for i in range(n): #MST ha n-1 edges
 8         if label[MSTl[i][1]]!=label[MSTl[i][2]]:
 9             if not(seed.count(MSTl[i][1])):
10                   seed.append(MSTl[i][1])
11             if not(seed.count(MSTl[i][2])):
12                   seed.append(MSTl[i][2])
13     return seed

Function iafit_fra

Synopse

Trainned classifier.

Parâmetros:
  • feats - Feature descriptor array.
  • labels - Features label.
  • evaladj - Distance evaluation indicator
Saída:
  • opfcosts - Costs associated to each sample according with trainning
  • opflabel - labelled sample according with trainning

Description

The FIT function performs an optimum partitioning of the graph in a forest such that any node is associated to the prototype which gives the minimum path cost connecting the prototype to that node in the graph. All the nodes more strongly connected to a prototype form a tree of the forest and are classified with the same class of the prototype. If distance evaluator is set to zero, then Features shall be an array of pre-computed distances, otherwise the function will compute distances between elements of feature array using euclidean distance.

Function Code

 1 from iafitpred_fra import iaseed_fra
 2 from iamstk_fra import *
 3 from iadijkstra_fra import *
 4 
 5 def iafit_fra(feats,labels,evaladj=1):
 6 
 7     from scipy.spatial import distance
 8     from scipy.spatial.distance import pdist
 9 
10     #adjacency matrix
11     if evaladj:
12         A = distance.squareform(distance.pdist(feats,'euclidean'))
13     else:
14         A = feats
15     # MST list
16     mst=iamstk_fra(A)
17     n=A.shape[0]
18     opfC=ones((n,1))*float('inf')
19     opfL=zeros((n,1), int)
20     sl=iaseed_fra(mst[1],labels)  #seed list
21     #initialize costs and labels
22     for p in sl:
23         #print p
24         opfC[p]=0
25         opfL[p]=labels[p]
26     #Dijkstra
27     for i in arange(n):
28         if not(sl.count(i)): # no nao e prototipo
29             cmin=ones((len(sl),1))*float('inf')
30             lmax=zeros((len(sl),1))
31             c,cc,s,soma=iadijkstra_fra(A,sl,i)
32             minc=min(c)
33             opfC[i]=minc
34             cx=array(c)
35             ccx=array(cc).reshape(len(sl),1)
36             sx=array(s).reshape(len(sl),1)
37             #if equal costs take the shortest path
38             cmin[where(cx==minc)]=ccx[where(cx==minc)]
39             minl=min(cmin)
40             #if equal pahs take shortest label
41             lmax[where(cmin==minl)]=labels[sx[where(cmin==minl)]]
42             opfL[i]=max(lmax)
43             # take label of seed offering best path
44     return opfC,opfL

Functin iapredict_fra

Synopse

Return new classified sample.

Parameters:
  • test - Sample descriptor array.
  • train - Sample descriptor for trainning set array.
  • opfC - Sample cost as given by iafit_fra.
  • opfC - Sample label as given by iafit_fra.
  • evaldist - Distance evaluation indicator
Saída:
  • opfcustos - Sample cost
  • opflabel - Labeled sample

Description

Given a new sample t, the prediction function evaluates all possible paths from t to the trees calculated by Fit function, then it assigns the label of t according to label of prototype offering strongest connection. If distance evaluator is set to zero, then test shall be an array of pre-computed distances, otherwise the function will compute distances from sample array to train array using euclidean distance.

Function iaclassify_fra

Synopse

This function performs supervised classification.

  • OPF_test = iaclassify_fra(test_set,train_set,train_labels)

    • Output

      • OPF_class: Labelled samples.
    • Input

      • test_set: Sample descriptor array.
      • train_set: Train descriptor array.
      • train_labels: Train labels array.

Description

This functions is a wrapper function for iapredict_fra and iafit_fra

Function Code

 1 def iapredict_fra(test,train,opfC,opfL,evaldist=1):
 2 
 3     from scipy.spatial.distance import cdist
 4 
 5     if evaldist :
 6         dist=cdist(test,train,'euclidean') #dimension maxmb
 7     else:
 8         dist=test
 9     distx=transpose(dist) # opfC has dimension mb distx has dimension mbxma
10     ma=dist.shape[0]
11     mb=distx.shape[0]
12     opfp=zeros((ma,1))
13     for i in arange(ma):
14         # align dimensions of opfC and distx to mbX1
15         djm=maximum(distx[:,i].reshape(mb,1),opfC)
16         opfp[i]=opfL[argmin(djm)]
17     return opfp
18 
19 def iaclassify_fra(train_feat,train_label,test_feat):
20     opfcost,opflabel=iafit_fra(train_feat,train_label)
21     opfclass=iapredict_fra(test_feat,train_feat,opfcost,opflabel)
22     return transpose(opfclass).reshape(opfclass.shape[0])

Example 1

This simple example is taken from [Jpapa2009]. Distance matrix is pre-computed.

  • Trainning
 1 from numpy import *
 2 from  iafitpred_fra  import iafit_fra, iapredict_fra
 3 
 4 d=ones((5,5))*float('inf')
 5 label=[2,2,2,1,1]
 6 d[0,1]=0.2
 7 d[0,2]=0.3
 8 d[0,3]=0.6
 9 d[0,4]=0.7
10 
11 d[1,0]=0.2
12 d[1,2]=0.1
13 d[1,3]=0.8
14 d[1,4]=0.8
15 
16 d[2,0]=0.3
17 d[2,1]=0.1
18 d[2,3]=0.8
19 d[2,4]=0.7
20 
21 d[3,0]=0.6
22 d[3,2]=0.8
23 d[3,1]=0.8
24 d[3,4]=0.5
25 
26 d[4,0]=0.7
27 d[4,2]=0.7
28 d[4,3]=0.5
29 d[4,1]=0.8
30 
31 c,l=iafit_fra(d,label,0)
32 print "OPF costs"
33 print c
34 print "OPF labels"
35 print l
OPF costs
[[ 0. ]
 [ 0.2]
 [ 0.2]
 [ 0. ]
 [ 0.5]]
OPF labels
[[2]
 [2]
 [2]
 [1]
 [1]]
  • Prediction
1 dj=array([[0.6,0.5,0.4,0.7,0.3]])
2 djm=iapredict_fra(dj,0,c,l,0)
3 print "classifying new sample"
4 print djm[0][0]
classifying new sample
2.0

Example 2:

Iris dataset classification.

 1 from numpy import*
 2 from iafitpred_fra import*
 3 import courseIA368Q1S2012.Ex09 as ex
 4 import time
 5 import iaOPF as ia
 6 
 7 t1=time.time()
 8 opfcost,opflabel = iafit_fra(ex.iris_feats_train,array(ex.iris_labels_train))
 9 opfcl = iapredict_fra(ex.iris_feats_test,ex.iris_feats_train,opfcost,opflabel).astype(int)
10 t2=time.time()
11 
12 print 'Confusion Matrix found by my OPF'
13 cm = ia.iaconfmtx(ex.iris_labels_test,transpose(opfcl).reshape(opfcl.shape[0]))
14 print cm
15 print 'My OPF execution time'
16 print t2-t1
Confusion Matrix found by my OPF
[[ 25.   0.   0.]
 [  0.  23.   1.]
 [  0.   2.  24.]]
My OPF execution time
0.818936109543

Performance Test

These tests are imported from OPF Fit Predict Test Page. Example 2 different datasets (Iris and Digits). The results and the processing time are diplayed in a Table.

gravando arquivo: /home/rubens/www/media/Attachments/iaOPF/iafitpred_fra/testperf.pkl

Autor Funcao Iris   Digits  
Francisco iafitpred_fra 873.369 ms 0.96 264042.953 ms 0.992619926199

References

[Jpapa2009]João P. Papa, Alexandre Falcão and Celso Suzuki. Supervised Pattern Classification based on Optimum-Path Forest. Intl. Journal of Imaging Systems and Technology. http://www.ic.unicamp.br/~afalcao/libopf/opf-ijist09.pdf

Algorithm

See [Jpapa2009]

See also

A list of other related functions can be found on link below: