net.sourceforge.cilib.problem.dataset
Class LocalDataSet

java.lang.Object
  extended by net.sourceforge.cilib.problem.dataset.DataSet
      extended by net.sourceforge.cilib.problem.dataset.LocalDataSet
All Implemented Interfaces:
Serializable, Cloneable

public class LocalDataSet
extends DataSet

This class represents a local dataset, i.e. a file on local disk in which each line represents one pattern of the dataset. It is responsible for parsing this file and building up an ArrayList of ClusterableDataSet.Pattern objects. A few variables control how each line is parsed into a ClusterableDataSet.Pattern object. The first is the delimiter, a regular expression that is used to split a single line of the dataset into its elements. The default delimiter is a whitespace character. Once a line has been split, the beginIndex specifies the column index where the pattern's data begins. Likewise, the endIndex specifies the column index where the pattern's data ends. The endIndex is inclusive. Lastly, the classIndex specifies the column index that holds the pattern's class. If the classIndex is -1, the dataset has no column for the class of the patterns, and the filename of the dataset is used as the class instead.
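The parsing rules described above can be illustrated with a minimal, self-contained sketch. It is not the CILib implementation: the class LineParsingSketch, the holder ParsedLine, and the method parseLine are hypothetical names, and the sketch assumes that the data columns are numeric and that the line is split with String.split.

```java
import java.util.Arrays;

public class LineParsingSketch {
    /** Hypothetical holder for one parsed line; NOT ClusterableDataSet.Pattern. */
    static final class ParsedLine {
        final String clazz;
        final double[] data;

        ParsedLine(String clazz, double[] data) {
            this.clazz = clazz;
            this.data = data;
        }
    }

    /**
     * Splits one line with the delimiter regex, copies the columns
     * beginIndex..endIndex (inclusive) as data, and takes the class from
     * classIndex, or from the file name when classIndex is -1.
     * Assumption: data columns are parseable as doubles.
     */
    static ParsedLine parseLine(String line, String delimiter, int beginIndex,
                                int endIndex, int classIndex, String fileName) {
        String[] elements = line.trim().split(delimiter);
        double[] data = new double[endIndex - beginIndex + 1];
        for (int i = beginIndex; i <= endIndex; i++) {
            data[i - beginIndex] = Double.parseDouble(elements[i]);
        }
        String clazz = (classIndex == -1) ? fileName : elements[classIndex];
        return new ParsedLine(clazz, data);
    }

    public static void main(String[] args) {
        // Columns 0..3 hold the data, column 4 holds the class.
        ParsedLine withClass = parseLine("5.1 3.5 1.4 0.2 setosa", "\\s+", 0, 3, 4, "iris.data");
        System.out.println(withClass.clazz + " " + Arrays.toString(withClass.data));

        // No class column: the (hypothetical) file name is used as the class.
        ParsedLine noClass = parseLine("1,2,3", ",", 0, 2, -1, "cluster-a.csv");
        System.out.println(noClass.clazz + " " + Arrays.toString(noClass.data));
    }
}
```

Because endIndex is inclusive, a pattern with four data columns starting at column 0 uses beginIndex 0 and endIndex 3, not 4.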

Author:
Edwin Peer, Theuns Cloete
See Also:
Serialized Form

Field Summary
protected  int beginIndex
           
protected  int classIndex
           
protected  String delimiter
           
protected  int endIndex
           
protected  String fileName
           
 
Fields inherited from class net.sourceforge.cilib.problem.dataset.DataSet
patternExpression
 
Constructor Summary
LocalDataSet()
           
LocalDataSet(LocalDataSet rhs)
           
 
Method Summary
 LocalDataSet getClone()
          Create a cloned copy of the current object and return it.
 byte[] getData()
          Get the contents of the file on disk as an array of bytes.
 String getFile()
          Get the name of the file that represents this dataset on disk.
 InputStream getInputStream()
          Get the contents of the file on disk as an InputStream.
 ArrayList<ClusterableDataSet.Pattern> parseDataSet()
          Parse the dataset, building up a list containing all the patterns in the dataset.
 void setBeginIndex(int bi)
          Sets the index where the elements of the pattern begin.
 void setClassIndex(int ci)
          Sets the index of the column that represents the class of the pattern.
 void setDelimiter(String d)
          Sets the regular expression (as a String) that should be used as delimiter to split a string into the elements of the pattern.
 void setEndIndex(int ei)
          Sets the index where the elements of the pattern end.
 void setFile(String fileName)
          Set the name of the file that represents this dataset on disk.
 
Methods inherited from class net.sourceforge.cilib.problem.dataset.DataSet
getPatternExpression, setPatternExpression
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

fileName

protected String fileName

delimiter

protected String delimiter

beginIndex

protected int beginIndex

endIndex

protected int endIndex

classIndex

protected int classIndex
Constructor Detail

LocalDataSet

public LocalDataSet()

LocalDataSet

public LocalDataSet(LocalDataSet rhs)
Method Detail

getClone

public LocalDataSet getClone()
Description copied from interface: Cloneable
Create a cloned copy of the current object and return it. In general the created copy will be a deep copy of the provided instance. As a result this operation can be quite expensive if used incorrectly.

Specified by:
getClone in interface Cloneable
Specified by:
getClone in class DataSet
Returns:
An exact clone of the current object instance.
See Also:
Object.clone()

setFile

public void setFile(String fileName)
Set the name of the file that represents this dataset on disk.

Parameters:
fileName - the name of the file

getFile

public String getFile()
Get the name of the file that represents this dataset on disk.

Returns:
the name of the file

getData

public byte[] getData()
Get the contents of the file on disk as an array of bytes.

Specified by:
getData in class DataSet
Returns:
the contents of the file on disk as an array of bytes

getInputStream

public InputStream getInputStream()
Get the contents of the file on disk as an InputStream.

Specified by:
getInputStream in class DataSet
Returns:
the contents of the file on disk as an InputStream

parseDataSet

public ArrayList<ClusterableDataSet.Pattern> parseDataSet()
Parse the dataset, building up a list containing all the patterns in the dataset.

Returns:
an ArrayList of ClusterableDataSet.Patterns containing all the patterns in this dataset
Throws:
IllegalArgumentException - when beginIndex == endIndex.

setDelimiter

public void setDelimiter(String d)
Sets the regular expression (as a String) that should be used as delimiter to split a string into the elements of the pattern.

Parameters:
d - the regular expression (as a String) that should be used as delimiter
Throws:
IllegalArgumentException - when the delimiter is empty ("") or null
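
Because the delimiter is a regular expression, characters such as | or . have a special meaning and need escaping when they are meant literally. The following sketch shows this with the standard String.split and Pattern.quote calls. The class DelimiterSketch and its split helper are hypothetical, and whether LocalDataSet splits lines in exactly this way is an assumption.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class DelimiterSketch {
    /**
     * Hypothetical helper: rejects a null or empty delimiter, as documented
     * for setDelimiter, then splits the line with the delimiter regex.
     */
    static String[] split(String line, String delimiterRegex) {
        if (delimiterRegex == null || delimiterRegex.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be null or empty");
        }
        return line.split(delimiterRegex);
    }

    public static void main(String[] args) {
        // One or more whitespace characters (tabs or spaces) between columns.
        System.out.println(Arrays.toString(split("1.5\t2.5  3.5", "\\s+")));

        // A comma with optional surrounding whitespace.
        System.out.println(Arrays.toString(split("1, 2 ,3", "\\s*,\\s*")));

        // A literal pipe: quote it so it is not read as regex alternation.
        System.out.println(Arrays.toString(split("a|b|c", Pattern.quote("|"))));
    }
}
```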

setBeginIndex

public void setBeginIndex(int bi)
Sets the index where the elements of the pattern begin.

Parameters:
bi - the starting index
Throws:
IllegalArgumentException - when the index is negative

setEndIndex

public void setEndIndex(int ei)
Sets the index where the elements of the pattern end. This index is inclusive.

Parameters:
ei - the ending index
Throws:
IllegalArgumentException - when the index is negative

setClassIndex

public void setClassIndex(int ci)
Sets the index of the column that represents the class of the pattern. If the index is -1 then the filename of the dataset will be used as the class.

Parameters:
ci - the index where the class resides
Throws:
IllegalArgumentException - when the index is less than -1
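
The argument checks documented for setBeginIndex, setEndIndex, setClassIndex and parseDataSet can be summarised in a small stand-alone sketch. IndexValidationSketch is a hypothetical class, not the CILib source. Its default field values, the checkBeforeParsing method and the rejects helper are assumptions made for illustration.

```java
public class IndexValidationSketch {
    // Default values are assumptions; the documentation does not state them.
    private int beginIndex = 0;
    private int endIndex = 0;
    private int classIndex = -1;

    void setBeginIndex(int bi) {
        if (bi < 0) {
            throw new IllegalArgumentException("beginIndex must not be negative: " + bi);
        }
        beginIndex = bi;
    }

    void setEndIndex(int ei) {
        if (ei < 0) {
            throw new IllegalArgumentException("endIndex must not be negative: " + ei);
        }
        endIndex = ei;
    }

    void setClassIndex(int ci) {
        // -1 is allowed and means "use the file name as the class".
        if (ci < -1) {
            throw new IllegalArgumentException("classIndex must be -1 or greater: " + ci);
        }
        classIndex = ci;
    }

    /** Mirrors the documented parseDataSet() check: beginIndex == endIndex is rejected. */
    void checkBeforeParsing() {
        if (beginIndex == endIndex) {
            throw new IllegalArgumentException("beginIndex and endIndex must differ");
        }
    }

    /** Returns true when the action throws IllegalArgumentException. */
    static boolean rejects(Runnable action) {
        try {
            action.run();
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        IndexValidationSketch s = new IndexValidationSketch();
        System.out.println(rejects(() -> s.setBeginIndex(-1)));  // negative begin index
        System.out.println(rejects(() -> s.setClassIndex(-1)));  // -1 is a legal class index
        System.out.println(rejects(s::checkBeforeParsing));      // begin == end in this sketch
    }
}
```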


Copyright © 2009 CIRG. All Rights Reserved.