(DBWORLD) FINAL CfP: KDD-97 Knowledge Discovery and Data Mining Tools Competition

Ismail Parsa (iparsa@epsilon.com)
Sat, 7 Jun 1997 10:23:19 -0500 (CDT)

----------------------------------------------------------------------
FINAL CALL FOR PARTICIPATION

KNOWLEDGE DISCOVERY CUP (KDD-CUP-97):

A Knowledge Discovery and Data Mining Tools Competition


to be held in conjunction with

THE THIRD INTERNATIONAL CONFERENCE ON
KNOWLEDGE DISCOVERY AND DATA MINING (KDD-97)

http://www-aig.jpl.nasa.gov/HyperNews/get/KDD97.html
----------------------------------------------------------------------

This year, for the first time, the KDD-97 organizers are holding a
Knowledge Discovery and Data Mining (KDDM) tools competition
(KDD-CUP-97) in conjunction with the Third International Conference
on Knowledge Discovery and Data Mining (KDD-97).

The Cup is open to all KDDM tool vendors, academics and corporations
with significant applications. All products, applications, research
prototypes and black-box solutions are welcome. If requested, the
anonymity of the participants and their affiliated companies/
institutions will be preserved. Our aim is not to rank the
participants but to recognize the most innovative, efficient and
methodologically advanced KDDM tools.

Attendance at the KDD-97 conference is not required to participate in
the Cup. Participants are required to demonstrate the performance of
their KDDM tool in one or both of the following areas:

1. Supervised Learning: Classification or Discrimination
2. Unsupervised Learning: Clustering or Segmentation

In the interest of time, the regression or prediction category and
descriptive modeling techniques such as association rules are not
included in the competition this year.

The registration deadline for the Cup and the release date for the
training and validation data set(s) are both June 19, 1997. All
participants must send back their results along with a scoring
code[1] by July 17, 1997, one month prior to the KDD-97 conference.
The scoring code will be used by the KDD-CUP-97 committee to
independently validate the results. Each participant will receive the
committee's evaluation of their performance by August 11, 1997.

The winners will be determined based on a weighted combination of
classification accuracy (or predictive power), software novelty (or
innovation), efficiency (people and CPU time) and the data mining
methodology employed. The top three performing tools in each category
will receive Gold Miner, Silver Miner and Bronze Miner awards and
will be listed on the KD Nuggets web site (http://www.kdnuggets.com)
until the beginning of the KDD-98 conference, unless the participants
and their affiliated companies/institutions wish to remain anonymous.

[1] The scoring code is a stand-alone C or C++ callable program, or
    equivalent hard-coded routine, that carries out all the steps
    required to apply the learned model outside the model-building
    environment. In addition to the numeric values of the weights, it
    also includes preprocessing statements for treating missing
    values, transforming/normalizing/standardizing inputs, etc. It is
    ultimately used to compute the predicted value or output from raw
    data outside the modeling environment. For example, for decision
    tree algorithms, the preprocessing code along with the
    'if-then-else' rules constitutes the scoring code.
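
To make this requirement concrete, below is a minimal sketch in C of
what such a scoring routine might look like. The field names, imputed
values and split thresholds are hypothetical placeholders; an actual
entry would embed the preprocessing steps and rules produced by the
participant's own tool.

    /* Minimal, hypothetical scoring routine of the kind described in [1].
     * Field names, imputation values and split thresholds are placeholders;
     * a real entry would embed the rules learned by the participant's tool. */
    #include <math.h>
    #include <stdio.h>

    /* Raw input record as it arrives outside the modeling environment. */
    struct record {
        double income;   /* continuous input; NAN if missing */
        double age;      /* continuous input; NAN if missing */
    };

    /* Returns the predicted class: 1 = responder, 0 = non-responder. */
    int score(struct record r)
    {
        /* Preprocessing: impute missing values and standardize, exactly
         * as was done when the model was built. */
        double income   = isnan(r.income) ? 32000.0 : r.income;
        double age      = isnan(r.age)    ? 45.0    : r.age;
        double z_income = (income - 32000.0) / 15000.0;

        /* 'if-then-else' rules exported from a (hypothetical) decision tree. */
        if (z_income > 0.5)
            return 1;
        if (age < 30.0 && z_income > -0.25)
            return 1;
        return 0;
    }

    int main(void)
    {
        struct record r = { NAN, 27.0 };   /* income missing, age 27 */
        printf("predicted class: %d\n", score(r));
        return 0;
    }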

+-----------------+
| Important Dates |
+-----------------+

- June 19, 1997: Registration deadline and data set release date
- July 17, 1997: Participants turn in the results along with the
scoring code
- August 11, 1997: Individual performance evaluations sent to the
participants
- August 14, 1997: Public announcement of the top three performing
                   tools in each category during the KDD-97 conference.

+------------------------------+
| KDD-CUP-97 Program Committee |
+------------------------------+

Vasant Dhar, New York University, New York, NY, USA
Ronen Feldman, Bar-Ilan University, Ramat-Gan, ISRAEL
Ismail Parsa, Epsilon Data Management, Burlington, MA, USA
Gregory Piatetsky-Shapiro, Geneve Consulting Group, Cambridge, MA, USA

+---------------------+
| EVALUATION CRITERIA |
+---------------------+

A. CLASSIFICATION OR DISCRIMINATION CATEGORY

The primary evaluation criterion in the classification category will
be the predictive power of the resulting model, i.e., its
classification accuracy measured in terms of lift ('lift' denotes the
improvement over random or no prediction; an illustrative lift
computation appears after the criteria list below). The winner,
however, will be selected based on a weighted combination of all of
the following:

1) Software Novelty/Innovation, e.g., unified approach to analyses
through the implementation of analytic metadata, integration of
data mining with data visualization, integration with other systems
in novel ways, user interaction, built-in intelligence, etc.

2) Efficiency, i.e., people and CPU time

3) KDD Methodology, including but not limited to:

- Data Archaeology, including but not limited to:

Data Hygiene (quality-control and cleaning)
Identify and eliminate noise

Preprocessing
Identify and eliminate constants
Identify and treat missing values
Identify (and treat) outliers
Identify (and treat) non-linearity
Identify (and treat) non-normality
Create derived features based on string-to-numeric conversions
Create derived features based on dates
Create derived features based on time series smoothing
Discretize or bin continuous features
Discretize or bin nominal features based on a criterion
Create derived features based on feature interactions
Create derived features based on transformations
Identify feature measurement scales: nominal, continuous, etc.

- Exploratory Data Analysis (EDA), including but not limited to:

Collinearity screening (elimination of redundant features)
Feature dimensionality reduction
Feature subset selection
Data visualization

- Model Development and Implementation, including but not limited to:

Application of data mining algorithm(s)
Evaluation of alternative algorithms, modeling technologies
Validation of results (to avoid over-fitting)
Interpretability of extracted patterns
Data visualization
Return on investment (ROI) or back-end analysis
Application of learned knowledge to the universe, i.e., scoring.
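
As an illustration of the lift criterion mentioned above, the sketch
below computes a simple top-decile lift in C: the response rate among
the 10% of validation records ranked highest by the model, divided by
the overall response rate. This is only one common reading of
"improvement over random or no prediction"; the exact lift measure
the committee will use is not specified in this call.

    /* Illustrative top-decile lift: response rate among the 10% of
     * validation records ranked highest by the model, divided by the
     * overall response rate.  Example only; not the committee's formula. */
    #include <stdio.h>
    #include <stdlib.h>

    struct scored { double score; int actual; };   /* actual: 1 = responder */

    static int by_score_desc(const void *a, const void *b)
    {
        double d = ((const struct scored *)b)->score
                 - ((const struct scored *)a)->score;
        return (d > 0) - (d < 0);
    }

    double top_decile_lift(struct scored *v, size_t n)
    {
        size_t i, top = n / 10, hits_top = 0, hits_all = 0;

        qsort(v, n, sizeof *v, by_score_desc);
        for (i = 0; i < n; i++) {
            hits_all += v[i].actual;
            if (i < top)
                hits_top += v[i].actual;
        }
        if (top == 0 || hits_all == 0)
            return 0.0;
        return ((double)hits_top / top) / ((double)hits_all / n);
    }

    int main(void)
    {
        /* Tiny made-up validation sample: (model score, actual outcome). */
        struct scored v[] = { {0.90,1}, {0.80,0}, {0.70,1}, {0.60,0}, {0.50,0},
                              {0.40,0}, {0.30,1}, {0.20,0}, {0.10,0}, {0.05,0} };
        printf("top-decile lift = %.2f\n", top_decile_lift(v, 10));
        return 0;
    }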

B. CLUSTERING OR SEGMENTATION CATEGORY:

In the clustering or segmentation category, the validity of the final
solution will be determined based on a combination of the relevant
items listed above and one or more of the following:

- External evaluation, i.e., using samples from known clusters

- Internal evaluation, i.e., using statistical or other measures
to characterize the goodness of fit of the clustering solution

- Replicability, i.e., using cross-validation samples

- Relative criteria, i.e., comparison of cluster solutions obtained
from alternative clustering algorithms applied to the same data
set.

Visualization of the final clustering solution will also be important.
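
As an example of the internal evaluation item above, one simple
goodness-of-fit style measure is the ratio of within-cluster to total
sum of squares on a numeric feature: values near zero indicate
clusters that are tight relative to the overall spread. The C sketch
below is illustrative only; the committee's actual measures are not
specified in this call.

    /* Illustrative internal criterion: ratio of within-cluster to total
     * sum of squares on one numeric feature.  Values near 0 indicate
     * clusters that are tight relative to the overall spread.
     * Example only; not the committee's measure. */
    #include <stdio.h>

    /* x[i]: feature value of record i; label[i]: its cluster (0..k-1). */
    double within_to_total_ss(const double *x, const int *label, int n, int k)
    {
        double grand = 0.0, mean[64] = {0}, total_ss = 0.0, within_ss = 0.0;
        int count[64] = {0}, i, c;

        if (n == 0 || k <= 0 || k > 64)        /* sketch-level bounds only */
            return -1.0;

        for (i = 0; i < n; i++) {
            grand += x[i];
            mean[label[i]] += x[i];
            count[label[i]]++;
        }
        grand /= n;
        for (c = 0; c < k; c++)
            if (count[c])
                mean[c] /= count[c];

        for (i = 0; i < n; i++) {
            total_ss  += (x[i] - grand) * (x[i] - grand);
            within_ss += (x[i] - mean[label[i]]) * (x[i] - mean[label[i]]);
        }
        return total_ss > 0.0 ? within_ss / total_ss : 0.0;
    }

    int main(void)
    {
        /* Tiny made-up example: two well-separated clusters. */
        double x[]  = { 1.0, 1.2, 0.9, 5.0, 5.1, 4.8 };
        int label[] = { 0,   0,   0,   1,   1,   1   };
        printf("within/total SS = %.3f\n", within_to_total_ss(x, label, 6, 2));
        return 0;
    }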

+-----------------------+
| REGISTRATION BROCHURE |
+-----------------------+

To participate in the KDD-CUP-97, please complete the application form
below and send it in plain ASCII format to (e-mail preferred):

+-----------------------------+
| Ismail Parsa |
| Epsilon Data Management |
| 50 Cambridge Street |
| Burlington MA 01803 USA |
| |
| E-mail: iparsa@epsilon.com |
| Phone: (617) 273-0250*6734 |
| Fax: (617) 272-8604 |
+-----------------------------+

Detailed information regarding the rules of the competition will be
sent to the participants later.

---------------------------------- cut ---------------------------------

KNOWLEDGE DISCOVERY CUP (KDD-CUP-97)

Registration Brochure

Competition category..........: (_) Classification or Discrimination
(check all that apply) (_) Clustering or Segmentation

Will you attend the KDD-97
conference..................: (_) Yes (_) No

Would you like to sponsor this
event? (terms/benefits to be
determined).................: (_) Yes (_) No

Name of software/product/tool
research prototype..........:

Status of software/product/
tool/research prototype.....: (_) Alpha (_) Beta (_) Production

Release date of software/
product/tool/research
prototype (in YYMM format)..:

Platform availability.........: (_) PC (_) Unix (_) Mainframe
(check all that apply) (_) Parallel environment (_) Other

Built-in KDDM methodology/
technology..................: (_) Graphical User Interface (GUI)
(check all that apply) (_) Data Access
(_) Data Selection (sampling, etc.)
(_) Data Preprocessing
(_) Exploratory Data Analysis
(_) Link Analysis (Associations,
Sequences, etc.)
(_) Clustering or Segmentation
(_) Time Series Analysis
(_) Classification or Discrimination
(_) Prediction or Regression
(_) Multiple Learned or Combined
Models
(_) Data Postprocessing
(_) Data and Knowledge Visualization
(_) Other, specify: _______
_______

Data mining algorithms........: (_) Supervised Neural Networks (MLP,
(check all that apply and RBF, etc.)
specify the algorithms) (_) Statistical Methods (Logistic,
^^^^^^^^^^^^^^^^^^^^^^ OLS, MARS, PPR, GAM, Nearest
Neighbors, etc.)
(_) Decision Trees (ID3, C4.5, CHAID,
CART, etc.)
(_) Hybrid Systems (Neuro-fuzzy systems,
GA optimized neural systems, etc.)
(_) Unsupervised Algorithms (Kohonen
networks, K-means clustering, etc.)
(_) Case-Based Reasoning
(_) Associations and Sequence Discovery
(_) Other, specify: _______
_______

Is your software/product/tool/
research prototype:

Freeware....................: (_) Yes (_) No
Available for purchase......: (_) Yes (_) No
if 'yes' then
Price (optional, in US$)..:
Number of sites installed.:

Does your software/product/
tool/research prototype
have limitations, e.g.,
number of variables and
rows it can handle, etc.....: (_) No (_) Yes, please specify: _______

Other relevant information....:

PRIMARY CONTACT:

Name..........................:
E-mail Address................:
Phone Number..................:
Fax Number....................:
Title.........................:
Name of Company/Institution...:

Mailing Address...............:


SECONDARY CONTACT:

Name..........................:
E-mail Address................:
Phone Number..................:
Title.........................:
Name of Company/Institution...:

Mailing Address...............:

---------------------------------- cut ---------------------------------

-------------------------------------------------------------------------------
The dbworld alias reaches many people, and should only be used for
messages of general interest to the database community.

Requests to get on or off dbworld should go to listproc@cs.wisc.edu.

to subscribe send
subscribe dbworld Your Full Name

to unsubscribe send
unsubscribe dbworld

to change your address
send an unsubscribe request from the old address
send a subscribe request from the new address

to find out more options send
help
------------------------------------------------------------------------FOOTER-