net.sourceforge.cilib.problem
Class ClusteringProblem

java.lang.Object
  extended by net.sourceforge.cilib.problem.OptimisationProblemAdapter
      extended by net.sourceforge.cilib.problem.ClusteringProblem
All Implemented Interfaces:
Serializable, OptimisationProblem, Problem, Cloneable

public class ClusteringProblem
extends OptimisationProblemAdapter

This class sets up and configures a problem that clusters the data in a dataset, more specifically the data contained in an AssociatedPairDataSetBuilder. Clustering is an optimisation problem: the process of optimising a clustering is driven by a fitness function that determines the fitness of a specific clustering. This class therefore wraps a FunctionOptimisationProblem (called the innerProblem) that may be either a FunctionMinimisationProblem or a FunctionMaximisationProblem. The FunctionOptimisationProblem in turn uses a function that determines the fitness of the problem being optimised. Because we are clustering data in a dataset, this function should be a ClusteringFitnessFunction.

The following is a list of methods that should be called (usually from XML in this order) to correctly configure a clustering problem:

  1. setDomain(String)
  2. setInnerProblem(FunctionOptimisationProblem)
  3. FunctionOptimisationProblem.setFunction(Function) on the innerProblem
  4. setDataSetBuilder(DataSetBuilder)
  5. DataSetBuilder.addDataSet(DataSet) on the dataset builder

Note that the domain of the dataset (or this clustering problem), the number of clusters and the fitness function used to optimise the clustering are all dependent on one another. The domain of the dataset is duplicated a number of times and then used as the domain of the clustering fitness function; the number of clusters determines how many times the domain string is duplicated. See regenerateDomain() for more detail. The duplication is needed because the centroids of a clustering are represented by a single Entity, such as a Particle or Individual (each of which has an internal Vector representation), and these entities are initialised using the domain of the FunctionOptimisationProblem, which is effectively the domain of the ClusteringFitnessFunction.
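The relationship between the dataset domain and the number of clusters can be illustrated with a minimal sketch. It is not CIlib code: the class DomainDuplicationSketch, the method duplicateDomain and the placeholder domain string are hypothetical, and joining the copies with commas is an assumption, since the exact format is determined by regenerateDomain().

```java
import java.util.Collections;

/** Hypothetical helper, not part of CIlib: shows how the duplicated domain grows with the cluster count. */
final class DomainDuplicationSketch {

    /**
     * Repeats the dataset's domain string once per cluster.
     * ASSUMPTION: the copies are joined with commas; the real regenerateDomain() may format them differently.
     */
    static String duplicateDomain(String datasetDomain, int numberOfClusters) {
        if (numberOfClusters < 1) {
            throw new IllegalArgumentException("numberOfClusters must be at least 1");
        }
        return String.join(",", Collections.nCopies(numberOfClusters, datasetDomain));
    }

    public static void main(String[] args) {
        // A two-component placeholder domain and three clusters give 3 x 2 = 6 components,
        // one block of two components per centroid.
        System.out.println(duplicateDomain("R(0,1),R(0,1)", 3));
    }
}
```

Under these assumptions a single entity holds all centroids back to back, which is why changing either the dataset domain or the number of clusters changes the dimension of every entity.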

This class also provides a central point for specifying the distanceMeasure that should be used for calculating distances throughout the entire clustering process.

Author:
Theuns Cloete
See Also:
regenerateDomain(), Serialized Form

Field Summary
 
Fields inherited from class net.sourceforge.cilib.problem.OptimisationProblemAdapter
dataSetBuilder, fitnessEvaluations
 
Constructor Summary
ClusteringProblem()
           
ClusteringProblem(ClusteringProblem rhs)
           
 
Method Summary
protected  Fitness calculateFitness(Type solution)
          We are actually optimising the innerProblem, so use it to calculate the fitness.
 DomainRegistry getBehaviouralDomain()
          Return the actual domain of the problem's dataset, i.e. NOT the duplicated domain string as used by the clustering fitness function.
 ClusteringProblem getClone()
          Create a cloned copy of the current object and return it.
 DistanceMeasure getDistanceMeasure()
          This method will be called from ClusteringUtils.calculateDistance(Vector, Vector) which is the central point for distance calculations during a clustering.
 DomainRegistry getDomain()
          Return the domain as used by the configured fitness function, i.e. NOT the simplified domain string of the problem's dataset.
 DomainRegistry getDomainRegistry()
          Return the actual domain of the problem's dataset, i.e. NOT the duplicated domain string as used by the clustering fitness function.
 int getNumberOfClusters()
          Return the number of clusters used throughout this clustering problem.
 void setDataSetBuilder(DataSetBuilder dsb)
          Use the DataSetManager singleton to parse and/or retrieve the given DataSetBuilder.
 void setDistanceMeasure(DistanceMeasure dm)
          Set the DistanceMeasure that will be used for all distance calculations throughout a clustering.
 void setDomain(String representation)
          Sets the domain of the dataset being clustered.
 void setDomainRegistry(DomainRegistry dr)
          Set the actual domain of the problem's dataset.
 void setInnerProblem(FunctionOptimisationProblem fop)
          Sets the problem that will be used to optimise the clustering.
 void setNumberOfClusters(int noc)
          The expert uses this method to set the number of clusters that should be used to optimise this clustering.
 
Methods inherited from class net.sourceforge.cilib.problem.OptimisationProblemAdapter
accept, changeEnvironment, getChangeStrategy, getDataSetBuilder, getFitness, getFitnessEvaluations, setChangeStrategy
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ClusteringProblem

public ClusteringProblem()

ClusteringProblem

public ClusteringProblem(ClusteringProblem rhs)
Method Detail

getClone

public ClusteringProblem getClone()
Description copied from interface: OptimisationProblem
Create a cloned copy of the current object and return it. In general the created copy will be a deep copy of the provided instance. As a result this operation can be quite expensive if used incorrectly.

Specified by:
getClone in interface OptimisationProblem
Specified by:
getClone in interface Problem
Specified by:
getClone in interface Cloneable
Specified by:
getClone in class OptimisationProblemAdapter
Returns:
An exact clone of the current object instance.
See Also:
Object.clone()

setInnerProblem

public void setInnerProblem(FunctionOptimisationProblem fop)
Sets the problem that will be used to optimise the clustering. This is in most cases a FunctionOptimisationProblem that either minimises or maximises the fitness as calculated by a ClusteringFitnessFunction. Once the problem is set (changed), the domain of the ClusteringFitnessFunction is automatically regenerated.

Parameters:
fop - a FunctionOptimisationProblem that should take a ClusteringFitnessFunction that drives the optimisation process.
See Also:
regenerateDomain()

getNumberOfClusters

public int getNumberOfClusters()
Return the number of clusters used throughout this clustering problem.

Returns:
the numberOfClusters

setNumberOfClusters

public void setNumberOfClusters(int noc)
The expert uses this method to set the number of clusters that should be used to optimise this clustering. Once the number of clusters is set (changed), the domain of the ClusteringFitnessFunction is automatically regenerated.

Parameters:
noc - the user-specified number of clusters that should be used to optimise this clustering
See Also:
regenerateDomain()
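
The regenerate-on-set behaviour documented for this setter (and likewise for setInnerProblem, setDomainRegistry and setDomain) could look roughly like the following sketch. The class RegeneratingSettersSketch and its fields are hypothetical stand-ins rather than the CIlib implementation; the comma-joined duplication and the default of one cluster are assumptions made only for illustration.

```java
import java.util.Collections;

/** Hypothetical stand-in, not the CIlib class: each domain-affecting setter regenerates the derived domain. */
final class RegeneratingSettersSketch {
    private String datasetDomain;          // domain of one data pattern; null until setDomain is called
    private int numberOfClusters = 1;      // ASSUMPTION: default chosen for this sketch only
    private String fitnessFunctionDomain;  // derived value, never set directly

    void setDomain(String representation) {
        this.datasetDomain = representation;
        regenerateDomain();
    }

    void setNumberOfClusters(int noc) {
        if (noc < 1) {
            throw new IllegalArgumentException("number of clusters must be at least 1");
        }
        this.numberOfClusters = noc;
        regenerateDomain();
    }

    String getFitnessFunctionDomain() {
        return fitnessFunctionDomain;
    }

    /** ASSUMPTION: the duplicated domain is a comma-joined repetition of the dataset domain. */
    private void regenerateDomain() {
        if (datasetDomain == null) {
            return; // not enough configuration yet, so there is nothing to duplicate
        }
        fitnessFunctionDomain = String.join(",", Collections.nCopies(numberOfClusters, datasetDomain));
    }

    public static void main(String[] args) {
        RegeneratingSettersSketch problem = new RegeneratingSettersSketch();
        problem.setDomain("R(0,1)");
        problem.setNumberOfClusters(3);
        System.out.println(problem.getFitnessFunctionDomain());
    }
}
```

Regenerating inside every setter keeps the derived domain consistent with the most recent configuration, at the cost of recomputing it on each call.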

getDomainRegistry

public DomainRegistry getDomainRegistry()
Return the actual domain of the problem's dataset, i.e. NOT the duplicated domain string as used by the clustering fitness function.

Returns:
the domainRegistry of this clustering problem

setDomainRegistry

public void setDomainRegistry(DomainRegistry dr)
Set the actual domain of the problem's dataset. Once this domain registry is set (changed), the domain of the ClusteringFitnessFunction is automatically regenerated.

Parameters:
dr - the domainRegistry of this clustering problem
See Also:
regenerateDomain()

getBehaviouralDomain

public DomainRegistry getBehaviouralDomain()
Return the actual domain of the problem's dataset, i.e. NOT the duplicated domain string as used by the clustering fitness function.

Returns:
the domainRegistry of this clustering problem

setDomain

public void setDomain(String representation)
Sets the domain of the dataset being clustered. Once this domain string is set (changed), the domain of the ClusteringFitnessFunction is automatically regenerated.

Parameters:
representation - a String representing the domain of the dataset being clustered
See Also:
regenerateDomain()

getDomain

public DomainRegistry getDomain()
Return the domain as used by the configured fitness function, i.e. NOT the simplified domain string of the problem's dataset.

Returns:
the innerProblem's function's domain registry

setDataSetBuilder

public void setDataSetBuilder(DataSetBuilder dsb)
Use the DataSetManager singleton to parse and/or retrieve the given DataSetBuilder. Then use the ClusteringUtils per-thread singleton to set the DataSetBuilder as the current dataset for this clustering.

Specified by:
setDataSetBuilder in interface OptimisationProblem
Overrides:
setDataSetBuilder in class OptimisationProblemAdapter
Parameters:
dsb - the DataSetBuilder that represents the dataset that should be clustered
Throws:
IllegalArgumentException - when the given DataSetBuilder is not an AssociatedPairDataSetBuilder. This restriction is only temporary; it avoids changing the more generic DataSetBuilder too much.
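
The type restriction described above amounts to a runtime check on the builder's concrete type. The sketch below illustrates the pattern with hypothetical stand-in types (GenericBuilder and AssociatedPairBuilder); it assumes the accepted builder is a subtype of the generic one and does not reproduce the DataSetManager or ClusteringUtils steps.

```java
/** Hypothetical stand-ins for the CIlib builder types; only the type check is illustrated. */
final class BuilderTypeCheckSketch {
    /** Stand-in for the generic builder type accepted by the method signature. */
    static class GenericBuilder { }

    /** Stand-in for the only builder type the clustering problem can actually use. */
    static class AssociatedPairBuilder extends GenericBuilder { }

    private AssociatedPairBuilder builder;

    void setDataSetBuilder(GenericBuilder dsb) {
        // Reject null and every builder that is not of the supported subtype.
        if (!(dsb instanceof AssociatedPairBuilder)) {
            String given = (dsb == null) ? "null" : dsb.getClass().getSimpleName();
            throw new IllegalArgumentException("Expected an AssociatedPairBuilder but got " + given);
        }
        this.builder = (AssociatedPairBuilder) dsb;
    }

    AssociatedPairBuilder getDataSetBuilder() {
        return builder;
    }

    public static void main(String[] args) {
        BuilderTypeCheckSketch problem = new BuilderTypeCheckSketch();
        problem.setDataSetBuilder(new AssociatedPairBuilder()); // accepted
        try {
            problem.setDataSetBuilder(new GenericBuilder());    // rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```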

setDistanceMeasure

public void setDistanceMeasure(DistanceMeasure dm)
Set the DistanceMeasure that will be used for all distance calculations throughout a clustering.

Parameters:
dm - the desired DistanceMeasure

getDistanceMeasure

public DistanceMeasure getDistanceMeasure()
This method will be called from ClusteringUtils.calculateDistance(Vector, Vector) which is the central point for distance calculations during a clustering.

Returns:
the distanceMeasure
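
Holding the distance measure in one place means that every distance calculation changes together when the measure is swapped. The following sketch shows the idea with a hypothetical Measure interface and two toy measures; none of these names are CIlib types, and the Euclidean default is an assumption of the sketch.

```java
/** Hypothetical sketch, not CIlib code: one configurable measure serves every distance calculation. */
final class CentralDistanceSketch {
    /** Stand-in for a distance measure between two points of equal dimension. */
    interface Measure {
        double distance(double[] a, double[] b);
    }

    static final Measure EUCLIDEAN = (a, b) -> {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    };

    static final Measure MANHATTAN = (a, b) -> {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    };

    private Measure measure = EUCLIDEAN; // ASSUMPTION: default chosen for this sketch only

    void setDistanceMeasure(Measure dm) {
        this.measure = dm;
    }

    Measure getDistanceMeasure() {
        return measure;
    }

    /** Single entry point: callers never pick a measure themselves. */
    double calculateDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("points must have the same dimension");
        }
        return getDistanceMeasure().distance(a, b);
    }

    public static void main(String[] args) {
        CentralDistanceSketch clustering = new CentralDistanceSketch();
        double[] origin = {0.0, 0.0};
        double[] point = {3.0, 4.0};
        System.out.println(clustering.calculateDistance(origin, point));
        clustering.setDistanceMeasure(MANHATTAN);
        System.out.println(clustering.calculateDistance(origin, point));
    }
}
```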

calculateFitness

protected Fitness calculateFitness(Type solution)
We are actually optimising the innerProblem, so use it to calculate the fitness.

Specified by:
calculateFitness in class OptimisationProblemAdapter
Parameters:
solution - The Type representing the candidate solution.
Returns:
the fitness of the current clustering
See Also:
OptimisationProblemAdapter.getFitness(Type, boolean)
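
The delegation described for calculateFitness can be pictured as a thin wrapper around the inner problem. This is only a sketch under simplifying assumptions: the InnerProblem interface, the use of double[] for the solution and double for the fitness are hypothetical substitutes for the Type and Fitness classes named above.

```java
import java.util.Objects;

/** Hypothetical sketch, not CIlib code: the outer problem hands fitness calculation to its inner problem. */
final class InnerProblemDelegationSketch {
    /** Stand-in for an inner optimisation problem that evaluates a candidate solution. */
    interface InnerProblem {
        double fitnessOf(double[] solution);
    }

    private final InnerProblem innerProblem;

    InnerProblemDelegationSketch(InnerProblem innerProblem) {
        this.innerProblem = Objects.requireNonNull(innerProblem, "innerProblem");
    }

    /** The outer problem adds no logic of its own; the inner problem decides the fitness. */
    double calculateFitness(double[] solution) {
        return innerProblem.fitnessOf(solution);
    }

    public static void main(String[] args) {
        // Toy inner problem for illustration: the sum of squares of the solution's components.
        InnerProblemDelegationSketch outer = new InnerProblemDelegationSketch(solution -> {
            double sum = 0.0;
            for (double x : solution) {
                sum += x * x;
            }
            return sum;
        });
        System.out.println(outer.calculateFitness(new double[] {1.0, 2.0}));
    }
}
```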


Copyright © 2009 CIRG. All Rights Reserved.