Datasets
Reading Datasets
Data can be read from a file using the read-dataset function in the incanter.io library:
(use 'incanter.io)
(def data (read-dataset "datafile.csv" :header true))
The default delimiter is a comma, but other delimiters can be specified with the :delim option.
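Here is a sketch of the :delim option. It writes a small tab-separated file and then reads it back. The file name example.tsv and its contents are hypothetical and only for illustration. \tab is Clojure's literal for the tab character.

```clojure
;; Sketch: read a tab-delimited file with read-dataset.
;; Assumption: example.tsv is a hypothetical file created here for illustration.
(use 'incanter.io)

(def tsv-path
  (.getPath (java.io.File. (System/getProperty "java.io.tmpdir") "example.tsv")))

;; A header row followed by three tab-separated data rows.
(spit tsv-path "x\ty\n1\t2\n3\t4\n5\t6\n")

;; :delim takes a character; :header true treats the first row as column names.
(def tsv-data (read-dataset tsv-path :delim \tab :header true))
```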
Incanter comes with sample data that can be loaded using the get-dataset function from the incanter.datasets library. The get-dataset function relies on the incanter.home property, which is set to ./ by default. If you use bin/clj to start the Clojure shell (REPL) from the Incanter directory, get-dataset will find the datasets in incanter/data. If you start the REPL from another directory, or run it in another environment (e.g. Emacs/SLIME), you have two options. You can pass the incanter.home property to the JVM at startup, java -Dincanter.home=$INCANTER_HOME ..., or you can use the :incanter-home option to get-dataset.
To load and view Edgar Anderson’s Iris dataset:
(use '(incanter core datasets))
(def iris (get-dataset :iris))
(view iris)

Converting Datasets to Matrices
The to-matrix function converts a dataset to a matrix. Non-numeric columns are converted to either numeric codes or dummy variables.
(def iris-mat (to-matrix iris))
(view iris-mat)
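The plain-Clojure sketch below shows what numeric coding of a categorical column means. It does not use Incanter. It maps each distinct level to an integer in sorted order. The function name level-codes is hypothetical, and Incanter's own assignment of codes to levels may differ.

```clojure
;; Sketch of integer coding for a categorical column (not Incanter's implementation).
;; Assumption: levels are numbered 0, 1, 2, ... in sorted order.
(defn level-codes
  "Returns a vector with each value in xs replaced by its level's integer code."
  [xs]
  (let [levels  (sort (distinct xs))
        code-of (zipmap levels (range))] ; map from level to code
    (mapv code-of xs)))

(level-codes ["virginica" "setosa" "versicolor" "setosa"])
;; => [2 0 1 0]
```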

To convert the ‘Species’ column to two binary dummy variables instead, use the :dummies option.
(def iris-dummy (to-matrix iris :dummies true))
(view iris-dummy)
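Species has three levels, so two binary columns are enough. One level serves as the reference and is encoded as all zeros. The plain-Clojure sketch below shows the idea without using Incanter. The name dummy-code is hypothetical. Treating the first level in sorted order as the reference is an assumption that may not match Incanter's choice.

```clojure
;; Sketch of dummy (indicator) coding, independent of Incanter.
;; Assumption: the first level in sorted order is the reference level (all zeros).
(defn dummy-code
  "Returns one vector of k-1 binary values per element of xs,
  where k is the number of distinct levels in xs."
  [xs]
  (let [levels  (sort (distinct xs))
        non-ref (rest levels)] ; every level except the reference
    (mapv (fn [x] (mapv #(if (= x %) 1 0) non-ref)) xs)))

(dummy-code ["setosa" "versicolor" "virginica" "setosa"])
;; => [[0 0] [1 0] [0 1] [0 0]]
```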
Saving Data
Datasets and matrices can be written to a file using the save function.
(save iris "/tmp/iris.csv")
The default delimiter is a comma, but other delimiters can be selected with the :delim option. Dataset headers are written to the file automatically. For matrices, headers can be specified with the :header option.
(save iris-mat "/tmp/iris_mat.csv"
      :header ["Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"])
The :append option can be used to append instead of overwriting an existing file.
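The sketch below combines the :delim and :append options. It first writes a small dataset as tab-separated text, then appends the rows of a matrix to the same file. The file name and the data are hypothetical. Matrices are written without a header unless :header is given, so the appended part should add only data rows.

```clojure
;; Sketch: save with a tab delimiter, then append to the same file.
;; Assumption: example_save.tsv is a hypothetical output file.
(require '[incanter.core :as i])

(def out-path
  (.getPath (java.io.File. (System/getProperty "java.io.tmpdir") "example_save.tsv")))

;; A two-column dataset; its column names are written as the header row.
(def ds (i/dataset ["a" "b"] [[1 2] [3 4]]))

;; Overwrite (the default) with tab-separated output.
(i/save ds out-path :delim \tab)

;; Append two more rows from a matrix; no header is written for it.
(i/save (i/matrix [[5 6] [7 8]]) out-path :delim \tab :append true)
```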
References
For further information on using datasets and matrices in Incanter see:
