(DBWORLD) Web querying using W3QS

Oded Shmueli (oshmu@CS.Technion.AC.IL)
Fri, 23 May 1997 11:03:40 -0500 (CDT)

You are welcome to pose WWW queries using W3QS:

http://www.cs.technion.ac.il/~W3QS

W3QS is a query system for the WWW. It views the Web as a
huge database. The system supports a declarative query
language called W3QL. It also provides a number of
"friendly" querying interfaces. One of the benefits of W3QL
is its ability to use existing search services as indexes,
thereby avoiding manual sifting through multitudes of
documents.

You may obtain a W3QS account at the http address above, and
will receive your password via e-mail. A directory of
queries will then be maintained for you. The "My First Query"
button provides a quick guided tour.

You may enter a query through a number of interfaces. An
entered query is either immediately executed or is declared
as "Edit Only", in which case it is simply saved for
future use. The execution of a query proceeds independently
of whether the user stays "logged on" or not. At any
point, even as the query is being executed, the query may be
edited and then either re-submitted, or simply saved in the
user's directory. During execution, the query engine generates
an execution trace - showing the progression of the
query engine's navigation through the Web - that may be
viewed at any time. The query engine executes at the Technion.

Once the query search is completed, the user may view the
Table of Solutions, which provides a value for each query
variable. For certain users, the actual qualifying pages
are kept (and not just their addresses). Pages are kept in
a hierarchical directory structure produced for the specific
query. One may view this structure.

Many useful queries may be easily specified through the
intuitive Simple and Power interfaces. To write more powerful
queries in W3QL one needs a working knowledge of PERL regular
expressions.

An interesting feature of W3QL is its ability to specify how
to fill out forms and to do so while navigating through the
Web. Another attractive feature is W3QL's ability to
specify the re-evaluation of queries at pre-determined time
intervals.

To get a taste of W3QL's querying abilities, consider the
following problem. We'd like to search through a site, say
the Data Mining group publications at West University at
http://www.west.edu/datamining/publications.html
and look for papers in Postscript that are authored by Smith
and Jones. Here is our first cut into the problem:

Select
From n1,l1,n2
where
n1 in {
http://www.west.edu/datamining/publications.html
};
n1: PERLCOND '(n1.content =~ /Jones/i) &&
(n1.content =~ /Smith/i)';
n2: PERLCOND ' (n2.format =~ /postscript/i)';
using ISEARCHd

This specifies a pattern consisting of page n1 pointing via
link l1 to page n2. In addition, n2 should be a Postscript
file, and page n1 should contain the sub-strings Jones and
Smith (/i means case-insensitive matching). This is not yet
what we want: we know that page n1 contains both Jones and
Smith, but there is no guarantee that page n2 (the Postscript
file) is actually co-authored by them. The following PERL
regular expression ensures co-authoring:

(l1.content =~ /"(.*)"/) &&
(n1.content =~ /Jones[^"]+Smith[^"]+"$1"/i)

The first line extracts the http address from link l1 and
assigns it to the PERL variable $1. The second line checks
that page n1 contains the sub-string Jones, followed by any
sequence of characters other than " (the expression for that
is [^"]+), followed by Smith, followed by another such
sequence, followed by the address from link l1 (as
identified by $1) in quotes.
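
As a rough illustration of this two-step check, the sketch below
applies the same two patterns to a made-up page and link. It uses
Python's re module as a stand-in for PERL. The sample HTML, the
function name coauthored_link, and the use of re.escape (which
treats the captured address literally, whereas PERL interpolates
$1 as a pattern) are our assumptions for illustration and are not
part of W3QS.

```python
import re

def coauthored_link(page_html: str, link_html: str) -> bool:
    """Mimic the two-part condition on n1 and l1 (illustrative only)."""
    # Step 1: pull the quoted address out of the link, like /"(.*)"/.
    m = re.search(r'"(.*)"', link_html)
    if m is None:
        return False
    target = m.group(1)
    # Step 2: Jones, then Smith, then the quoted address, with no '"'
    # character anywhere in between.
    pattern = r'Jones[^"]+Smith[^"]+"' + re.escape(target) + '"'
    return re.search(pattern, page_html, re.IGNORECASE) is not None

# Hypothetical publications page with two entries.
page = (
    '<LI>Jones and Smith, Mining Rules <A HREF="rules.ps">ps</A>\n'
    '<LI>Brown, Clustering <A HREF="cluster.ps">ps</A>\n'
)

print(coauthored_link(page, '<A HREF="rules.ps">'))    # True
print(coauthored_link(page, '<A HREF="cluster.ps">'))  # False
```

Note that the pattern only accepts entries in which the names appear
in the order Jones, Smith, and before the link, so a page that lists
authors after the link would need a different expression.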

The resulting query is:

Select
From n1,l1,n2
where
n1 in {
http://www.west.edu/datamining/publications.html
};
n1,l1: PERLCOND '
(l1.content =~ /"(.*)"/) &&
(n1.content =~
/Jones[^"]+Smith[^"]+"$1"/i) ';
n2: PERLCOND ' (n2.format =~ /postscript/i)';
using ISEARCHd
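
To show how the three query variables line up in a solution row,
here is a small Python sketch that evaluates the same pattern over
an in-memory stand-in for the Web. Everything in it is hypothetical:
the PAGES dictionary, the Link type, the solutions function, and the
assumption that a page's format is a MIME-type-like string. It does
not reflect how the W3QS engine is actually implemented.

```python
import re
from typing import Iterator, NamedTuple, Tuple

class Link(NamedTuple):
    content: str  # the anchor tag text, e.g. '<A HREF="rules.ps">'
    target: str   # address of the page the link points to

# Hypothetical fetched pages: address -> (format, content, outgoing links).
PAGES = {
    "publications.html": (
        "text/html",
        '<LI>Jones and Smith, Mining Rules <A HREF="rules.ps">ps</A>\n'
        '<LI>Jones and Smith, Slides <A HREF="slides.html">html</A>\n',
        [Link('<A HREF="rules.ps">', "rules.ps"),
         Link('<A HREF="slides.html">', "slides.html")],
    ),
    "rules.ps": ("application/postscript", "", []),
    "slides.html": ("text/html", "", []),
}

def solutions(start: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (n1, l1, n2) rows that satisfy both conditions."""
    _, n1_content, links = PAGES[start]
    for l1 in links:
        # Condition on n1,l1: authors followed by the quoted link address.
        m = re.search(r'"(.*)"', l1.content)
        if m is None:
            continue
        pattern = r'Jones[^"]+Smith[^"]+"' + re.escape(m.group(1)) + '"'
        if not re.search(pattern, n1_content, re.IGNORECASE):
            continue
        # Condition on n2: its format must mention postscript.
        n2_format = PAGES[l1.target][0]
        if re.search(r"postscript", n2_format, re.IGNORECASE):
            yield (start, l1.content, l1.target)

for row in solutions("publications.html"):
    print(row)
```

In this toy data both entries list Jones and Smith, but only the
link to rules.ps leads to a Postscript page, so it is the only row
produced.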

The W3QS homepage at
http://www.cs.technion.ac.il/~konop/w3qs.html
contains more details about W3QL and W3QS. Help on using
the system and its interfaces is available at

http://www.cs.technion.ac.il/~W3QS/cgi_bin/new/general.html (a general introduction)
http://www.cs.technion.ac.il/~W3QS/cgi_bin/new/simple.html (the Simple Interface)
http://www.cs.technion.ac.il/~W3QS/cgi_bin/new/complex.html (the Power Interface)

We hope you find W3QS useful. Feedback is welcome!

Disclaimers: W3QS is an experimental system built for
research and experimentation. There is no commitment to
support W3QS users and there is no warranty of any kind
regarding its quality. In particular:

(1) Query results may be wrong.
(2) W3QS's availability may fluctuate.
(3) W3QS service may be slow.
(4) There are no privacy guarantees concerning your
queries; they may be viewed by others.
(5) There are no durability guarantees; your queries may be
lost.
