(DBWORLD) CfP: 1st KDD-Sisyphus

Martin Staudt (martin.staudt@swisslife.ch)
Mon, 22 Dec 1997 10:57:33 +0100

*********************************************************************

CALL FOR CONTRIBUTIONS

ECML'98 Workshop ``KDD-SISYPHUS'':
Data Preparation, Preprocessing and Reasoning
for Real-World Data Mining Applications

http://research.swisslife.ch/kdd-sisyphus/

organised with support of the Compulog Net
area `Computational Logic and Machine Learning'

in conjunction with the
10th European Conference on MACHINE LEARNING (ECML'98)

Chemnitz, Germany, April 24, 1998

http://www.tu-chemnitz.de/informatik/ecml98/

*********************************************************************

Motivation and Intention
------------------------

The growing interest in data mining as a key technology for analyzing
huge data sets has brought certain ML algorithms, such as C4.5, to a
stage of commercial maturity and widespread acceptance. However, we
can only expect useful output from data mining algorithms if they are
embedded in the whole knowledge discovery (KDD) process. This process
has to cover tasks such as data extraction, sampling, data integration
and homogenization, data cleaning and data transformation, all of
which can be understood as preprocessing steps preceding the actual
mining phase, as well as the evaluation and (often manual)
interpretation of the mining results.

The KDD community and even commercial vendors have begun to elaborate
ideas to support the whole process instead of simply supporting single
mining algorithms. Dedicated KDD support environments (KDDSE) offer
powerful preprocessing operators on the data. Examples of tools moving
in this direction are Kepler, Clementine and IMACS. However, applying
these operators appropriately to the underlying data sources, given
the great variety of mining algorithms and their features, remains a
very difficult and case-specific problem.

Following the idea of the Sisyphus tracks developed successfully in
the Knowledge Acquisition community, KDD-Sisyphus is a project that
aims to establish a collection of data and problems as a common ground
for better comparison and discussion of the applicability of data
mining and machine learning algorithms, with the main focus on the
required preprocessing features.

Aims of the Workshop
--------------------

The 1st KDD-Sisyphus workshop, organized at the ECML'98 conference,
aims to bring together developers of algorithms, who are challenged to
demonstrate the usefulness of their algorithms on real-world data, and
people who are interested in building KDDSE tools which integrate
various data mining algorithms as possible core phases for KDD
applications.

We are especially interested in the following topics:
- Identifying necessary and useful preprocessing operations and tools,
  i.e. obtaining the application know-how from the algorithm
  developers.

- Examining how these preprocessing operations can be represented
  (e.g. for documentation and reuse) and executed efficiently on
  large data sets.

- Comparing the different data mining approaches with respect to
  their input requirements.

- Comparing different (logical) representations of the problem and
  discussing their advantages and disadvantages. Examining the need
  for multi-relational representations to cover all the 1:N and N:M
  relations between the different entities of the KDD-Sisyphus
  problem.

- The impact of (unsatisfactory) data mining results on further
  preprocessing.

- Establishing usability criteria for different ML approaches with
  respect to data mining, e.g.

  - scalability: number of records, number of attributes and
    multiple relations vs. learning time and space requirements
  - robustness: handling of missing values, missing related tuples,
    noise tolerance, nominal attributes with many different values,
    etc.
  - learning goal: classification, clustering, rule learning, etc.
  - understandability: size and presentation of mining results

- Parameter settings of the data mining algorithms and their impact
  on the mining results.

KDD-Sisyphus I
--------------

The standard machine learning data sets (available e.g. from
http://www.ics.uci.edu/~mlearn/MLRepository.html or
http://www.gmd.de/ml-archive/frames/datasets/datasets-frames.html)
give only very limited hints about the practical usefulness of
algorithms, because their homogeneous structure often ideally fits
the input requirements of these algorithms.

The KDD-Sisyphus workshop provides the Sisyphus I package, which is
based on data extracted from a real-world insurance business
application. As such it shows typical properties like fragmentation,
varying data quality, irregular data value codings, etc., which make
the application of data mining or machine learning algorithms a real
challenge and usually require sophisticated preprocessing methods.

The work package of KDD-Sisyphus I contains

- a data set consisting of 10 relations with 5-50 attributes and
  around 200,000 data tuples in ASCII format,
- a rough schema description explaining the data types and
  their semantic relationships,
- three data mining task descriptions (2 classification tasks and
  1 clustering task)

and can be obtained from
http://research.swisslife.ch/kdd-sisyphus/

Contribution Requirements
-------------------------

Contributions are expected to present the necessary preprocessing
steps and the resulting representation required for successfully
applying certain data mining algorithms (either own or third-party
approaches) to the specified mining tasks. The mining algorithms
employed should NOT be described in the paper but referred to by
citing a suitable reference. Instead, the main emphasis of the
reported experiments with the supplied data set should lie on the
application methodology (e.g. data transformations, data
amalgamations, scaling, parameter settings, etc.). The accuracy of
the mining results actually achieved is less important.

Submission Procedure
--------------------

Experience reports, as draft papers or extended abstracts, should be
sent by email to Joerg-Uwe Kietz (uwe.kietz@swisslife.ch) before March
15, 1998. Acceptance and revision notes will be e-mailed by March 31,
1998. The final versions of accepted papers must be delivered by April
10, 1998. Authors are expected to present their results in person at
the workshop.

Submission and review of papers, and coordination of all aspects of
the meeting, are handled via the Internet. The proceedings will be
published as hard copies in the Swiss Life Research Report Series
and electronically on the WWW in the CEUR Workshop Proceedings series.

The preferred format for submissions is PostScript. Style files and
formatting instructions for the final versions will be sent together
with the acceptance notifications.

Workshop Coordinators
---------------------

Joerg-Uwe Kietz (uwe.kietz@swisslife.ch) and
Martin Staudt (martin.staudt@swisslife.ch)

Swiss Life
Information Systems Research
CH/IFUE
Postfach
CH-8022 Zurich, Switzerland

Program Committee
-----------------

Peter Flach, Tilburg Univ., The Netherlands
Joerg-Uwe Kietz, Swiss Life, Switzerland
Nada Lavrac, Jozef Stefan Institute, Slovenia
Katharina Morik, Univ. Dortmund, Germany
Ulrich Reimer, Swiss Life, Switzerland
Celine Rouveirol, LRI Univ. Paris-Sud, France
Stefan Sklorz, RWTH Aachen, Germany
Martin Staudt, Swiss Life, Switzerland

Important Dates
---------------

Papers due: March 15, 1998
Notification: March 31, 1998
Final version: April 10, 1998
Workshop date: April 24, 1998

---------------------------------------------
Martin Staudt
Swiss Life
Information Systems Research, CH/IFUE
P.O.Box
CH-8022 Zurich, Switzerland
Phone: +41-1-711-4617 Fax: +41-1-711-5007
Email: martin.staudt@swisslife.ch
Web: http://research.swisslife.ch
---------------------------------------------
