net.sourceforge.cilib.functions.clustering
Class ClusteringFitnessFunction

java.lang.Object
  extended by net.sourceforge.cilib.functions.Function
      extended by net.sourceforge.cilib.functions.ContinuousFunction
          extended by net.sourceforge.cilib.functions.clustering.ClusteringFitnessFunction
All Implemented Interfaces:
Serializable, Cloneable
Direct Known Subclasses:
HalkidiVazirgiannisIndex, InterClusterDistance, IntraClusterDistance, KHarmonicMeansFunction, MaulikBandyopadhyayIndex, NonParametricClusteringFunction, ParametricClusteringFunction, QuantisationErrorFunction, ScatterSeperationRatio, TuriIndex, VeenmanReindersBackerIndex

public abstract class ClusteringFitnessFunction
extends ContinuousFunction

This abstract class defines member variables and member functions that can be used by subclasses to calculate the fitness of a particular clustering in their respective evaluate methods.
This class makes extensive use of the ClusteringUtils helper class. The helper variable is set at the beginning of the evaluate(Vector) method so that it is accessible to the classes that inherit from this one.
This class uses a ClusterCenterStrategy so that its users can specify what is meant by the center of a cluster: sometimes the centroid is used, and at other times the mean is used. By default, the cluster center is interpreted as the cluster centroid.
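
The idea of a pluggable cluster center can be sketched as follows. This is an illustration only: the interface and class names (CenterStrategySketch, CentroidCenterSketch, MeanCenterSketch) and the plain double[] arrays are hypothetical and are not the ClusterCenterStrategy API, which is not shown on this page. The sketch assumes that "centroid" means the learned centroid vector and that "mean" means the average of the patterns assigned to the cluster.

```java
/** Hypothetical strategy: decides what "the center of a cluster" means. */
interface CenterStrategySketch {
    double[] center(double[] centroid, double[][] patternsInCluster);
}

/** Interprets the center as the learned centroid itself. */
final class CentroidCenterSketch implements CenterStrategySketch {
    @Override
    public double[] center(double[] centroid, double[][] patternsInCluster) {
        return centroid;
    }
}

/** Interprets the center as the mean of the patterns in the cluster (assumed non-empty). */
final class MeanCenterSketch implements CenterStrategySketch {
    @Override
    public double[] center(double[] centroid, double[][] patternsInCluster) {
        double[] mean = new double[centroid.length];
        for (double[] pattern : patternsInCluster) {
            for (int d = 0; d < mean.length; d++) {
                mean[d] += pattern[d] / patternsInCluster.length;
            }
        }
        return mean;
    }
}
```

A fitness function written against the interface does not need to know which interpretation is in use, which is presumably why the strategy is configurable through setClusterCenterStrategy.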

Author:
Theuns Cloete
See Also:
Serialized Form

Field Summary
protected  ArrayList<Vector> arrangedCentroids
           
protected  ArrayList<Hashtable<Integer,ClusterableDataSet.Pattern>> arrangedClusters
           
protected  ClusterCenterStrategy clusterCenterStrategy
           
protected  int clustersFormed
           
protected  ClusteringUtils helper
           
 
Constructor Summary
ClusteringFitnessFunction()
          This constructor cannot be called directly since this is an abstract class.
 
Method Summary
 double calculateAverageIntraClusterDistance()
          Calculate the average intra-cluster distance.
 double calculateAverageSetDistance(int i, int j)
          Calculate the average distance between two clusters (sets).
 double calculateClusterDiameter(int k)
          Calculate the diameter of the given cluster.
abstract  double calculateFitness()
           
 double calculateIntraClusterDistance()
          Calculate the intra-cluster distance.
 double calculateMaximumAverageDistance()
          Calculate the Maximum Average Distance between the patterns in the dataset and the centers learned so far.
 double calculateMaximumInterClusterDistance()
          Calculate the longest distance between two clusters.
 double calculateMaximumSetDistance(int i, int j)
          Calculate the maximum distance between two clusters (sets).
 double calculateMinimumInterClusterDistance()
          Calculate the shortest distance between two clusters.
 double calculateMinimumSetDistance(int i, int j)
          Calculate the minimum distance between two clusters (sets).
 double calculateQuantisationError()
          Calculate the Quantisation Error.
 double evaluate(Vector centroids)
          Arranges each pattern in the dataset under its closest centroid, then calculates and validates the fitness of the resulting clustering.
abstract  ClusteringFitnessFunction getClone()
          Create a cloned copy of the current object and return it.
 Double getMaximum()
          Accessor for the function maximum.
 Double getMinimum()
          Accessor for the function minimum.
 void setClusterCenterStrategy(ClusterCenterStrategy ccs)
           
protected  double validateFitness(double fitness)
          This method logs the cases when the fitness is less than zero.
protected  Double worstFitness()
           
 
Methods inherited from class net.sourceforge.cilib.functions.ContinuousFunction
evaluate
 
Methods inherited from class net.sourceforge.cilib.functions.Function
getBehavioralDomainRegistry, getDimension, getDomain, getDomainRegistry, setBehavioralDomain, setBehaviouralDomainRegistry, setDomain
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

helper

protected ClusteringUtils helper

clusterCenterStrategy

protected ClusterCenterStrategy clusterCenterStrategy

arrangedClusters

protected ArrayList<Hashtable<Integer,ClusterableDataSet.Pattern>> arrangedClusters

arrangedCentroids

protected ArrayList<Vector> arrangedCentroids

clustersFormed

protected int clustersFormed
Constructor Detail

ClusteringFitnessFunction

public ClusteringFitnessFunction()
This constructor cannot be called directly since this is an abstract class. Subclasses call this constructor from their own constructor(s) via super(). The default domain is not set, because it should be specified on the ClusteringProblem.

Method Detail

getClone

public abstract ClusteringFitnessFunction getClone()
Description copied from class: ContinuousFunction
Create a cloned copy of the current object and return it. In general the created copy will be a deep copy of the provided instance. As a result, this operation can be quite expensive if used incorrectly.

Specified by:
getClone in interface Cloneable
Specified by:
getClone in class ContinuousFunction
Returns:
An exact clone of the current object instance.
See Also:
Object.clone()

evaluate

public double evaluate(Vector centroids)
This method is responsible for various things before the fitness can be returned:
  1. Arrange each pattern in the dataset to belong to its closest centroid. We don't care how this is done, since it is handled by the ClusteringUtils.arrangeClustersAndCentroids(Vector) method. We also assume that this method removes empty clusters and their associated centroids from the arranged lists.
  2. Set the arrangedClusters member variable, which is available to subclasses.
  3. Set the arrangedCentroids member variable, which is available to subclasses.
  4. Calculate the fitness using the given centroids Vector. We don't care how this is done, since it is handled by the abstraction (polymorphism) created by the hierarchy of this class. This is achieved via the abstract calculateFitness() method.
  5. Validate the fitness, i.e. make sure the fitness is non-negative (>= 0.0).
Steps 1 - 3 have to be performed before the fitness is calculated, using the given centroids Vector, in step 4.

Specified by:
evaluate in class ContinuousFunction
Parameters:
centroids - The Vector representing the centroid vectors
Returns:
the fitness that has been calculated
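
The five steps above follow the template-method pattern, which can be sketched as below. This is a minimal sketch, not the CIlib implementation: the class names (FitnessSketch, ClusterCountFitness), the use of double[] arrays instead of Vector and ClusterableDataSet.Pattern, the Euclidean distance, and the inlined arrangement logic (which stands in for ClusteringUtils.arrangeClustersAndCentroids) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the evaluate(Vector) workflow. */
abstract class FitnessSketch {
    protected List<List<double[]>> arrangedClusters; // patterns of each non-empty cluster
    protected List<double[]> arrangedCentroids;      // centroids of the non-empty clusters
    protected int clustersFormed;

    /** Step 4 is supplied by subclasses. */
    protected abstract double calculateFitness();

    public final double evaluate(double[][] patterns, double[][] centroids) {
        // Step 1: assign every pattern to its closest centroid.
        List<List<double[]>> clusters = new ArrayList<>();
        for (int k = 0; k < centroids.length; k++) {
            clusters.add(new ArrayList<>());
        }
        for (double[] pattern : patterns) {
            int best = 0;
            for (int k = 1; k < centroids.length; k++) {
                if (distance(pattern, centroids[k]) < distance(pattern, centroids[best])) {
                    best = k;
                }
            }
            clusters.get(best).add(pattern);
        }
        // Steps 2 and 3: keep only the non-empty clusters and their centroids.
        arrangedClusters = new ArrayList<>();
        arrangedCentroids = new ArrayList<>();
        for (int k = 0; k < centroids.length; k++) {
            if (!clusters.get(k).isEmpty()) {
                arrangedClusters.add(clusters.get(k));
                arrangedCentroids.add(centroids[k]);
            }
        }
        clustersFormed = arrangedClusters.size();
        // Steps 4 and 5: delegate to the subclass, then validate the result.
        return validateFitness(calculateFitness());
    }

    /** Reports a negative fitness and returns the value unchanged. */
    protected double validateFitness(double fitness) {
        if (fitness < 0.0) {
            System.err.println("Fitness below zero: " + fitness);
        }
        return fitness;
    }

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return Math.sqrt(sum);
    }
}

/** Toy subclass: the "fitness" is simply the number of clusters formed. */
class ClusterCountFitness extends FitnessSketch {
    @Override
    protected double calculateFitness() {
        return clustersFormed;
    }
}
```

Because the arranged lists are filled before calculateFitness() runs, a subclass can read them without repeating the arrangement work.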

calculateFitness

public abstract double calculateFitness()

calculateQuantisationError

public double calculateQuantisationError()
Calculate the Quantisation Error.

This is explained in Section 4.1.1 on pages 104 & 105 of:

Returns:
the Quantisation Error of the particular clustering.
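
The reference for this calculation is missing from this page, so the following is only a sketch under an assumption: it uses the definition of quantisation error that is common in the clustering literature, namely the per-cluster average distance between patterns and their center, averaged over the clusters. The class name QuantisationErrorSketch, the array-based signature and the Euclidean distance are hypothetical.

```java
/** Hypothetical helper; assumes every cluster is non-empty. */
final class QuantisationErrorSketch {
    static double quantisationError(double[][][] clusters, double[][] centers) {
        double total = 0.0;
        for (int k = 0; k < clusters.length; k++) {
            double sum = 0.0;
            for (double[] pattern : clusters[k]) {
                sum += distance(pattern, centers[k]);
            }
            total += sum / clusters[k].length; // average distance inside cluster k
        }
        return total / clusters.length;        // average over all clusters
    }

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return Math.sqrt(sum);
    }
}
```

Empty clusters would cause a division by zero here; the sketch relies on them having been removed beforehand, as evaluate(Vector) is described as doing.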

calculateMaximumAverageDistance

public double calculateMaximumAverageDistance()
Calculate the Maximum Average Distance between the patterns in the dataset and the centers learned so far.

See Section 4.1.1 at the bottom of page 105 of:

Returns:
the maximum average distance between the patterns of a cluster and their associated center.
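
As a sketch of the return value described above: compute the average pattern-to-center distance of each cluster and keep the largest. The class name MaximumAverageDistanceSketch, the array-based signature and the Euclidean distance are assumptions for illustration, not the CIlib code.

```java
/** Hypothetical helper; assumes every cluster is non-empty. */
final class MaximumAverageDistanceSketch {
    static double maximumAverageDistance(double[][][] clusters, double[][] centers) {
        double max = 0.0;
        for (int k = 0; k < clusters.length; k++) {
            double sum = 0.0;
            for (double[] pattern : clusters[k]) {
                sum += distance(pattern, centers[k]);
            }
            // Keep the largest per-cluster average seen so far.
            max = Math.max(max, sum / clusters[k].length);
        }
        return max;
    }

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return Math.sqrt(sum);
    }
}
```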

calculateMinimumInterClusterDistance

public double calculateMinimumInterClusterDistance()
Calculate the shortest distance between two clusters. In other words, the shortest distance between the centroids of any two clusters.

Returns:
the minimum inter-cluster distance

calculateMaximumInterClusterDistance

public double calculateMaximumInterClusterDistance()
Calculate the longest distance between two clusters. In other words, the longest distance between the centroids of any two clusters.

Returns:
the maximum inter-cluster distance.
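
The minimum and maximum inter-cluster distances can both be sketched as a scan over all pairs of centroids. The class name InterClusterDistanceSketch, the array-based signatures and the Euclidean distance are hypothetical; how the real methods behave with fewer than two clusters is not documented here, so the values returned in that case are a choice made only for this sketch.

```java
/** Hypothetical helper working on plain centroid arrays. */
final class InterClusterDistanceSketch {
    /** Shortest centroid-to-centroid distance; +Infinity if there is no pair. */
    static double minimum(double[][] centroids) {
        double min = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length - 1; i++) {
            for (int j = i + 1; j < centroids.length; j++) {
                min = Math.min(min, distance(centroids[i], centroids[j]));
            }
        }
        return min;
    }

    /** Longest centroid-to-centroid distance; 0.0 if there is no pair. */
    static double maximum(double[][] centroids) {
        double max = 0.0;
        for (int i = 0; i < centroids.length - 1; i++) {
            for (int j = i + 1; j < centroids.length; j++) {
                max = Math.max(max, distance(centroids[i], centroids[j]));
            }
        }
        return max;
    }

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return Math.sqrt(sum);
    }
}
```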

calculateMinimumSetDistance

public double calculateMinimumSetDistance(int i,
                                          int j)
Calculate the minimum distance between two clusters (sets).

This is illustrated in Equation 20 of:

Parameters:
i - the index of the LHS cluster
j - the index of the RHS cluster
Returns:
the shortest distance between the patterns of two clusters (sets)

calculateMaximumSetDistance

public double calculateMaximumSetDistance(int i,
                                          int j)
Calculate the maximum distance between two clusters (sets).

Illustrated in Equation 21 of:

Parameters:
i - the index of the LHS cluster.
j - the index of the RHS cluster.
Returns:
the longest distance between the patterns of two clusters (sets).

calculateAverageSetDistance

public double calculateAverageSetDistance(int i,
                                          int j)
Calculate the average distance between two clusters (sets).

Illustrated in Equation 22 of:

Parameters:
i - the index of the LHS cluster.
j - the index of the RHS cluster.
Returns:
the average distance between the patterns of two clusters (sets).
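
The three set distances differ only in how the pattern-to-pattern distances between the two clusters are combined. The sketch below assumes the usual single-linkage (minimum), complete-linkage (maximum) and average-linkage (mean over all cross-cluster pairs) definitions, since the cited equations are not reproduced on this page. The class name SetDistanceSketch, the array-based signatures and the Euclidean distance are hypothetical.

```java
/** Hypothetical helper; both clusters are assumed to be non-empty. */
final class SetDistanceSketch {
    static double minimum(double[][] lhs, double[][] rhs) {
        double min = Double.POSITIVE_INFINITY;
        for (double[] a : lhs) {
            for (double[] b : rhs) {
                min = Math.min(min, distance(a, b));
            }
        }
        return min;
    }

    static double maximum(double[][] lhs, double[][] rhs) {
        double max = 0.0;
        for (double[] a : lhs) {
            for (double[] b : rhs) {
                max = Math.max(max, distance(a, b));
            }
        }
        return max;
    }

    static double average(double[][] lhs, double[][] rhs) {
        double sum = 0.0;
        for (double[] a : lhs) {
            for (double[] b : rhs) {
                sum += distance(a, b);
            }
        }
        // Mean over every cross-cluster pair of patterns.
        return sum / (lhs.length * rhs.length);
    }

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return Math.sqrt(sum);
    }
}
```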

calculateClusterDiameter

public double calculateClusterDiameter(int k)
Calculate the diameter of the given cluster, i.e. the distance between the two patterns (in the set) that are furthest apart. Numerous references exist for this calculation.

Parameters:
k - the index of the cluster for which the diameter should be calculated.
Returns:
the diameter of the given cluster.
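
A minimal sketch of the diameter calculation, assuming Euclidean distance and a cluster given as a plain array of patterns (the class name ClusterDiameterSketch and its signature are hypothetical):

```java
/** Hypothetical helper: largest pairwise distance inside one cluster. */
final class ClusterDiameterSketch {
    static double diameter(double[][] cluster) {
        double max = 0.0; // a cluster with fewer than two patterns has diameter 0.0
        for (int i = 0; i < cluster.length - 1; i++) {
            for (int j = i + 1; j < cluster.length; j++) {
                max = Math.max(max, distance(cluster[i], cluster[j]));
            }
        }
        return max;
    }

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return Math.sqrt(sum);
    }
}
```

The scan visits every pair of patterns, so its cost grows quadratically with the size of the cluster.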

calculateIntraClusterDistance

public double calculateIntraClusterDistance()
Calculate the intra-cluster distance. In other words, the sum of the distances between all patterns of all clusters and their associated centroids. The calculation is specified by Equation 13 in Section IV on page 124 of:

Returns:
the intra-cluster distance, summed over all clusters.

calculateAverageIntraClusterDistance

public double calculateAverageIntraClusterDistance()
Calculate the average intra-cluster distance. In other words, the average of the distances between all patterns of all clusters and their associated centroids. The calculation is specified in Section 3.2 on page 2 of:

Returns:
the average intra-cluster distance for all clusters.
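
The difference between the intra-cluster distance and its average can be sketched as follows, reading "sum" and "average" literally from the descriptions above: the former adds up every pattern-to-center distance, and the latter divides that sum by the total number of patterns. The class name IntraClusterDistanceSketch, the array-based signatures and the Euclidean distance are assumptions made for illustration.

```java
/** Hypothetical helper working on plain arrays. */
final class IntraClusterDistanceSketch {
    /** Sum of the distances between all patterns and their cluster centers. */
    static double intraClusterDistance(double[][][] clusters, double[][] centers) {
        double sum = 0.0;
        for (int k = 0; k < clusters.length; k++) {
            for (double[] pattern : clusters[k]) {
                sum += distance(pattern, centers[k]);
            }
        }
        return sum;
    }

    /** The same sum divided by the total number of patterns (assumed to be at least one). */
    static double averageIntraClusterDistance(double[][][] clusters, double[][] centers) {
        int patterns = 0;
        for (double[][] cluster : clusters) {
            patterns += cluster.length;
        }
        return intraClusterDistance(clusters, centers) / patterns;
    }

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return Math.sqrt(sum);
    }
}
```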

setClusterCenterStrategy

public void setClusterCenterStrategy(ClusterCenterStrategy ccs)

getMinimum

public Double getMinimum()
Description copied from class: ContinuousFunction
Accessor for the function minimum. This is the minimum value of the function in the given domain.

Overrides:
getMinimum in class ContinuousFunction
Returns:
The minimum function value.

getMaximum

public Double getMaximum()
Description copied from class: ContinuousFunction
Accessor for the function maximum. This is the maximum value of the function in the given domain.

Overrides:
getMaximum in class ContinuousFunction
Returns:
The maximum of the function.

worstFitness

protected Double worstFitness()

validateFitness

protected double validateFitness(double fitness)
This method logs the cases when the fitness is less than zero. We do not want the Parametric fitness to be less than zero. Fitnesses drop below zero when the centroids are outside the given domain, which causes zMax to be too small to compensate. TODO: Should this function always return the fitness? TODO: Or should it return NaN when the fitness drops below 0.0? TODO: Or should we throw an exception?

Parameters:
fitness - the fitness value that will be validated.
Returns:
the fitness.
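
The documented behaviour (log a negative fitness, then return the fitness) might look like the sketch below. It uses java.util.logging purely for illustration; which logging framework CIlib uses is not stated on this page, and the class name ValidateFitnessSketch is hypothetical. The alternatives raised in the TODO notes (returning NaN or throwing an exception) are deliberately not implemented.

```java
import java.util.logging.Logger;

/** Hypothetical helper mirroring the documented validateFitness behaviour. */
final class ValidateFitnessSketch {
    private static final Logger LOG =
            Logger.getLogger(ValidateFitnessSketch.class.getName());

    static double validateFitness(double fitness) {
        if (fitness < 0.0) {
            // Only report the problem; the value itself is passed through unchanged.
            LOG.warning("Fitness dropped below zero: " + fitness);
        }
        return fitness;
    }
}
```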


Copyright © 2009 CIRG. All Rights Reserved.