Main Page   Class Hierarchy   Compound List   File List   Compound Members   File Members  

Verifying.

It is time to verify the implementation w.r.t the specifications (we hardly looked at the specifications before).

It turns out that the system does not completely correspond to the specifications:

  1. It always expects two arguments like so
            httpstats config log
    while the specs say
            httpstats log [config]
    Thus the order is wrong and, moreover config is optional. If there is only 1 command line argument, the system should use a default configuration described by
            format path 1-4
            select path /
            exclude path /robots.txt
    How to fix this with minimal effort?

    Changing the order is trivial, as is checking whether there is only 1 argument. Also, supplying the default configuration can be done quickly in httpstats.C: we change the type of config_file from fstream to istream*. If the configuration file is specified on the commandline, config_file will be initialized as follows:

            config_file = new ifstream(config_file_name.c_str());
    else, we have a string representing the default configuration.
            const string default_config_text(
                    "format path 1-4\nselect path /\nexclude path /robots.txt");
    and turn this into a stream:
            config_file = new strstream(default_config_text.data(),
                                        default_config_text.size());
    We should not forget to delete config when we don't need it anymore, and replace earlier usage of config_file by *config_file. Also, we should update the Makefile check target to take into account the new command line argument order.

  2. The output for
        format path 1
    is wrong: it just shows the count for "/", instead of "/a" for each component a.

    Only now do we realize that we can probably assume that all paths in log entries start with "/", which means that we do not have to store the initial "" in the vector representation of Path. However, if we drop the initial "" component, how will we represent the path "/" ? Just reserving the vector <""> for that seems reasonable. Surprisingly, this change is extremely simple to implement: in Path::parse, we replace

            string::size_type n0(ps.find_first_not_of('/',0));
           
            if (n0!=0) // the path starts with '/' which we encode as a first "/" component.
              path_.push_back("");
            
            if (n0==string::npos) // ps must be "/"
              return true;
    by
            string::size_type n0(ps.find_first_not_of('/',0));
    
            if (n0==string::npos) { // ps must be "/"
              path_.push_back("");
              return true;
              }
    In addition, we must also change (actually simplify) operator<<(ostream &,const Path &).

  3. We misunderstood select:
            select path /~
    is supposed to select paths linke "/~jane", "/~fred" etc. In the present implementation, it would only select paths like "/~/abc".

    Probably, a similar problem exists for domains, but since the specifications are not clear about that, we stick to our present implementation for Domain (and DatePattern).

    The problem is not too hard to fix: in PathExpression::eval(), we should compare the strings instead of the the vectors: instead of

            bool
            PathExpression::eval(const LogRecord& r) const {
            const Path& pr(r.path());
            if (pr.size() < path_.size())
              return false;
            return equal(path_.begin(), path_.end(), pr.begin());
            }
    we can write
            bool
            PathExpression::eval(const LogRecord& r) const {
            const string& ps(r.path().str());
            if (ps.size() < path_.str().size())
              return false;
            return equal(path_.str().begin(), path_.str().end(), ps.begin());
            }
    where we assume that const string& Path::str() returns (a reference) to the string representation of the path.

    Unfortunately, the string corresponding to a Path is currently not available in the Path. A quick fix would be to just store the parameter string of Path::parse(const string&) as a data member that is then returned by Path::str(). That is, however, not a good idea, since such an input string may contain unwanted / 's, as in "/a//b/".

    We could also compute a ``clean'' string representation, like operator<<(ostream &,const Path &) does. The question then is: do we compute such a string data member once, e.g. at the end of Path::parse() and let subsequent calls return it or do we only compute it on demand? Note that, if there is no select on path, we will never use a PathExpression and, consequently, the vast majority of Path's will not need Path::str().

    We decide to have a simple yet reasonable solution: we will use ``memoing '' where the result of an expensive ``on demand'' operation, in this case Path::str(), is stored and returned upon later calls to the operation. Thus we will add an extra (initially empty) string data member, called Path::str_, which will be used to remember the result of the first call to Path::str().

        class Path {
        public:
          ..
          const string& str() const {
            if (str_.size()==0) {
               // compute str_
               }
            return str_;
            }
          ..
        private:
          vector<string> path_;
          mutable string str_; 
        };
    Note that we know whether a call to Path::str() is the first one from the size of Path::str_: if it is zero, it is the first call, otherwise not. (We could also use a string pointer, but that would imply adding a destructor, copy constructor and assignment to Path, too much work).

    Note also how we declare the Path::str_ data member of Path as mutable. This allows us to still have Path::str() as a const member function although it may update Path::str_.

    Also, Path::parse() should clear Path::str_, since Path::path_ will no longer correspond with any nonempty Path::str_ that may have existed before the call to Path::parse().

  4. We did not do all semantic error checking on the configuration file. In particular, a second format line will be accepted without complaining. Also, although this is not in the specifications, it would be nice if the Configuration::parse() function would indicate the line number where an error occurred. The latter is easy to fix: we let Configuration::parse return an unsigned int, where 0 means ``no errors'' and any other number is the number of the line where the first error was detected.

    In main() we change

        Configuration config; 
        if (!config.parse(*config_file)) {
          cerr << "Syntax error in configuration file \"" 
               << config_file_name << "\"" << endl;
          return 1;
          }
    to
        Configuration config; 
        unsigned int err_lineno(config.parse(*config_file));
        if (err_lineno) {
          cerr << "Syntax error on line " << err_lineno
               << " in configuration file \"" << config_file_name << "\"" << endl;
          return 1;
          }
  5. As to non-functional requirements, we seem to have done a reasonable job. Most importantly, many extensions are possible without altering the basic design: accepting (and executing) multiple formats, checking the status of a request etc. Another useful extension would be to allow more complex select and exclude combinations such as ``home directories or visits afer June 1, 2000'' which are currently not supported. But they are straightforward to implement (at least the semantics), simply by adding e.g. an OrExpression class etc.


httpstats-stage03 [ 7 April, 2001]