(DBWORLD) cfp: KDD-97 Knowledge Discovery and Data Mining Tools Competition

Ismail Parsa (iparsa@epsilon.com)
Mon, 19 May 1997 13:16:38 -0500 (CDT)

----------------------------------------------------------------------
CALL FOR PARTICIPATION

KNOWLEDGE DISCOVERY CUP (KDD-CUP-97):

A Knowledge Discovery and Data Mining Tools Competition


to be held in conjunction with

THE THIRD INTERNATIONAL CONFERENCE ON
KNOWLEDGE DISCOVERY AND DATA MINING (KDD-97)

http://www-aig.jpl.nasa.gov/HyperNews/get/KDD97.html
----------------------------------------------------------------------

To benchmark the performance of existing knowledge discovery and data
mining (KDDM) tools, we are organizing a KDDM tools competition this
year. The Knowledge Discovery Cup will be open to all KDDM tool
vendors, academics and corporations with significant applications. All
products, applications, research prototypes and black-box solutions
are welcome. Our aim is not to rank the participants but to recognize
the most innovative, efficient and methodologically advanced KDDM
tools.

The participants are required to demonstrate the performance of their
KDDM tool in one or both of the following areas:

A. Supervised Learning: Classification or Discrimination
B. Unsupervised Learning: Clustering or Segmentation

In the interest of time, the regression or prediction category and
other descriptive modeling techniques, such as association rules,
are not included in the competition this year.

The registration deadline for the Knowledge Discovery Cup is June 13,
1997. The training and validation data set(s) will be sent to the
participants by June 19th. All participants must send back the results
along with a scoring code[1] by July 17th, one month prior to the
KDD-97 conference. The scoring code will be used by the KDD-CUP-97
committee to independently validate the results. A jury of experts (to
be announced) will determine the winner(s). The results of the
competition will be privately announced to the participants by August
11, 1997.

The top three performing tools in the supervised and unsupervised
learning categories will be announced during the KDD-97 conference and
will receive the Gold Miner, Silver Miner and Bronze Miner awards,
respectively. The winners will also be listed on the KD Nuggets web
site (http://www.kdnuggets.com) until the beginning of the KDD-98
conference, unless the participants and their affiliated companies/
institutions wish to preserve their anonymity.

If your company/institution would like to sponsor this event, please
indicate it in the registration brochure.

ATTENDANCE AT THE KDD-97 CONFERENCE IS NOT REQUIRED TO PARTICIPATE IN
THE CUP. If you will not attend the KDD-97 conference but would like
to participate in the KDDM tools competition, please indicate this in
the registration brochure.

[1] The scoring code is a stand-alone C or C++ callable program or
hard-coded routine that carries out all the steps required to apply
the learned model outside the model-building environment. In
addition to the numeric values of the model weights, it also
includes the preprocessing statements for treating missing values,
transforming/normalizing/standardizing inputs, etc. It is
ultimately used to compute the predicted value or output outside
the modeling environment. For decision tree algorithms, the
preprocessing code together with the 'if-then-else' rules
constitutes the scoring code.
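As a language-neutral illustration of what such a scoring code contains, here is a minimal sketch in Python (the footnote asks for C or C++ in actual submissions). The feature names, imputation means, standardization constants, and logistic weights below are all hypothetical stand-ins for values a tool would capture during model building.

```python
import math

# Hypothetical preprocessing constants captured from the training data.
MEANS = {"age": 42.0, "income": 31000.0}   # imputation values for missings
STDS = {"age": 11.5, "income": 14250.0}    # standardization divisors
WEIGHTS = {"age": 0.8, "income": 1.3}      # learned logistic weights
INTERCEPT = -2.1

def score(record):
    """Apply the full scoring pipeline to one raw record (a dict that
    may contain None for missing values) and return a probability."""
    z = INTERCEPT
    for name, w in WEIGHTS.items():
        x = record.get(name)
        if x is None:                        # treat the missings: impute
            x = MEANS[name]
        x = (x - MEANS[name]) / STDS[name]   # standardize the input
        z += w * x
    return 1.0 / (1.0 + math.exp(-z))        # logistic output

print(score({"age": 55, "income": None}))
```

The point of bundling imputation, standardization, and the weights in one routine is that the score can then be reproduced independently of the tool that built the model, which is exactly what the committee's validation step requires.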

EVALUATION CRITERIA
-------------------

A. CLASSIFICATION OR DISCRIMINATION CATEGORY

The primary evaluation criterion in the classification category will
be the predictive power, i.e., the classification accuracy, of the
resulting model, measured in terms of lift (the term 'lift' denotes
improvement over random or no prediction). The winner, however, will
be selected based on a weighted combination of all of the following
(the weights are to be determined):

1) Software Novelty/Innovation, e.g., unified approach to
analyses through the implementation of analytic metadata,
integration of data mining with data visualization, integration
with other systems in novel ways, user interaction, built-in
intelligence, etc.

2) Efficiency, i.e., people and CPU time

3) KDD Methodology, including but not limited to:

   - Data Archaeology, including but not limited to:

       Data Hygiene (quality control and cleaning)
       Identification and elimination of noise
       Preprocessing
       Identification and elimination of constants
       Identification and treatment of missing values
       Identification (and treatment) of outliers
       Identification (and treatment) of non-linearity
       Identification (and treatment) of non-normality
       Creation of derived features based on string-to-numeric
         conversions
       Creation of derived features based on dates
       Creation of derived features based on time series smoothing
       Discretization or binning of continuous features
       Criterion-based discretization or binning of nominal features
       Creation of derived features based on feature interactions
         and/or ratios
       Creation of derived features based on transformations
       Identification of feature measurement scales (i.e., nominal,
         ordinal, continuous)

   - Exploratory Data Analysis (EDA), including but not limited to:

       Collinearity screening (elimination of redundant features)
       Feature dimensionality reduction
       Feature subset selection
       Data visualization

   - Model Development and Implementation, including but not limited
     to:

       Application of the data mining algorithm(s) to extract patterns
       Evaluation of alternative algorithms and modeling technologies
       Validation of results (to avoid over-fitting)
       Interpretability of extracted patterns
       Data visualization
       Return on investment (ROI) or back-end analysis
       Application of the learned knowledge to the universe, i.e.,
         scoring.
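The lift criterion that anchors this category can be computed in a few lines. The sketch below measures lift in the top-scoring fraction of a ranked file; the scores and response labels are made up for illustration.

```python
def lift(scores, labels, top_fraction=0.1):
    """Lift of the top-scoring fraction: response rate among the
    highest-scored records divided by the overall response rate."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    top_rate = sum(label for _, label in ranked[:k]) / k
    overall_rate = sum(labels) / len(labels)
    return top_rate / overall_rate

# Made-up example: 10 scored records, 3 responders overall.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(lift(scores, labels, top_fraction=0.2))
```

A lift of 1.0 means the model does no better than random selection; the further above 1.0 the top fractions score, the stronger the model.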

B. CLUSTERING OR SEGMENTATION CATEGORY:

In the clustering or segmentation category, the validity of the final
solution will be determined based on a combination of the relevant
items listed above and one or more of the following:

1) External evaluation, i.e., using samples from known clusters

2) Internal evaluation, i.e., using statistical or other measures
to characterize the goodness of fit of the clustering solution

3) Replicability, i.e., using cross-validation samples

4) Relative criteria, i.e., comparison of cluster solutions obtained
from alternative clustering algorithms applied to the same data
set.

Visualization of the final clustering solution will also be important.
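Criterion (2), internal evaluation, can be illustrated with the silhouette coefficient, one common measure of the goodness of fit of a clustering solution. The sketch below computes it from scratch; the 1-D points and cluster labels are hypothetical, and each cluster is assumed to contain at least two points.

```python
def silhouette(points, labels):
    """Mean silhouette coefficient for 1-D points. For each point:
    a = mean distance to the rest of its own cluster,
    b = mean distance to the nearest other cluster,
    s = (b - a) / max(a, b).  Assumes every cluster has >= 2 points."""
    n = len(points)
    total = 0.0
    for i in range(n):
        same = [abs(points[i] - points[j]) for j in range(n)
                if j != i and labels[j] == labels[i]]
        a = sum(same) / len(same)
        b = min(
            sum(abs(points[i] - points[j])
                for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in set(labels) if c != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / n

# Made-up data: two well-separated 1-D clusters.
pts = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
labs = [0, 0, 0, 1, 1, 1]
print(silhouette(pts, labs))
```

Values near 1 indicate tight, well-separated clusters; values near 0 or below indicate overlapping or arbitrary assignments, which is what makes the measure useful for comparing alternative solutions under criterion (4) as well.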

REGISTRATION BROCHURE
=====================

To participate in the KDD-CUP-97, please complete the application form
below and send it in plain ASCII format to (e-mail preferred):

*-----------------------------*
| Ismail Parsa |
| Epsilon Data Management |
| 50 Cambridge Street |
| Burlington MA 01803 USA |
| |
| E-mail: iparsa@epsilon.com |
| Phone: (617) 273-0250*6734 |
| Fax: (617) 272-8604 |
*-----------------------------*

Detailed information regarding the rules of the competition will be
sent to the participants later.

---------------------------------- cut ---------------------------------

KNOWLEDGE DISCOVERY CUP (KDD-CUP-97)

Registration Brochure

Competition category..........: (_) Classification or Discrimination
(check all that apply) (_) Clustering or Segmentation

Will you attend the KDD-97
conference..................: (_) Yes (_) No

Would you like to sponsor this
event? (terms/benefits to be
determined).................: (_) Yes (_) No

Name of software/product/tool
research prototype..........:

Status of software/product/
tool/research prototype.....: (_) Alpha (_) Beta (_) Production

Release date of software/
product/tool/research
prototype (in YYMM format)..:

Platform availability.........: (_) PC (_) Unix (_) Mainframe
(check all that apply) (_) Parallel environment (_) Other

Built-in KDDM methodology/
technology..................: (_) Graphical User Interface (GUI)
(check all that apply) (_) Data Access
(_) Data Selection (sampling, etc.)
(_) Data Preprocessing
(_) Exploratory Data Analysis
(_) Link Analysis (Associations,
Sequences, etc.)
(_) Clustering or Segmentation
(_) Time Series Analysis
(_) Classification or Discrimination
(_) Prediction or Regression
(_) Multiple Learned or Combined
Models
(_) Data Postprocessing
(_) Data and Knowledge Visualization
(_) Other, specify: _______
_______

Data mining algorithms........: (_) Supervised Neural Networks (MLP,
(check all that apply and RBF, etc.)
specify the algorithms) (_) Statistical Methods (Logistic,
^^^^^^^^^^^^^^^^^^^^^ OLS, MARS, PPR, GAM, Nearest
Neighbors, etc.)
(_) Decision Trees (ID3, C4.5, CHAID,
CART, etc.)
(_) Hybrid Systems (Neuro-fuzzy systems,
GA optimized neural systems, etc.)
(_) Unsupervised Algorithms (Kohonen
networks, K-means clustering, etc.)
(_) Case-Based Reasoning
(_) Associations and Sequence Discovery
(_) Other, specify: _______
_______

Is your software/product/tool/
research prototype:

Freeware....................: (_) Yes (_) No
Available for purchase......: (_) Yes (_) No
if 'yes' then
Price (optional, in US$)..:
Number of sites installed.:

Does your software/product/
tool/research prototype
have limitations, e.g.,
number of variables and
rows it can handle, etc.....: (_) No (_) Yes, please specify: _______

Other relevant information....:

PRIMARY CONTACT:

Name..........................:
E-mail Address................:
Phone Number..................:
Fax Number....................:
Title.........................:
Name of Company/Institution...:

Mailing Address...............:


SECONDARY CONTACT:

Name..........................:
E-mail Address................:
Phone Number..................:
Title.........................:
Name of Company/Institution...:

Mailing Address...............:

---------------------------------- cut ---------------------------------

-------------------------------------------------------------------------------
The dbworld alias reaches many people, and should only be used for
messages of general interest to the database community.

Requests to get on or off dbworld should go to listproc@cs.wisc.edu.

to subscribe send
subscribe dbworld Your Full Name

to unsubscribe send
unsubscribe dbworld

to change your address
send an unsubscribe request from the old address
send a subscribe request from the new address

to find out more options send
help
------------------------------------------------------------------------FOOTER-