Distributor – changes
================================================================================
Change Log for Distributor Project
================================================================================


2007-06         1.6.2 - Reading BSI_distributor.cfg also from /etc/BioSolveIT
================================================================================

Configuration
-------------
- Configuration file "BSI_distributor.cfg" is now also searched for in
  directory "/etc/BioSolveIT", after checking directory "/etc".



2007-01         1.6.1 - Minor bugfix for shared objects
================================================================================

Main Workflow
-------------
- Bugfix: Changed search path for shared objects



2006-04         1.6 - Binary executable
================================================================================

Installation
------------
- BioSolveIT Distributor is now optionally shipped as a binary executable,
  together with a subdirectory of shared objects and a template configuration
  file.

  Python does not need to be installed to run the binary executable.

Main Workflow
-------------
- Bugfix: If Distributor was used via its Python interface, one process
  requested multiple licenses when repeatedly asking for statistics info, e.g.
  >>> distributor.distributor(statistics=current_order_id)



2006-03         1.5.7.r - Bugfix, new empty orders created on merging error
================================================================================

Main Workflow
-------------
- Bugfix: After an error while merging, Distributor started a new, empty order
  at each merger interval.  Fixed now.



2006-02         1.5.7.q - Bugfix, checking output failed
================================================================================

Main Workflow
-------------
- Bugfix: Merging failed when checking for differences with some dialects of
  the 'tail' command.



2005-07         1.5.7 - Bugfixing, Opt --no-split to use input files unmodified
================================================================================

Main Workflow
-------------
- Distributor prints the FlexLM "processor id" (COMPOSITE=XXXXXXXXXXXX) of the
  current compute node if no license was found.  Log in to the FlexLM license
  server and execute Distributor again to get the id of the license server.

- Use option --no-split (without option arguments) to pass the input file(s)
  unmodified, that is, not split, to the batch queuing system.

- Bugfix: Option --debug had no effect together with --submit, fixed now.  See
  also changes to version 1.5.6.



2005-06-09      1.5.6 - Lightweight BQS interface without job control and stats
================================================================================

Main Workflow
-------------
- Changes in command "--submit", see also 1.5.3

  The Distributor command --submit has been redesigned to offer a lightweight
  interface to the batch queuing system without maintaining any job control or
  statistics about running jobs.

  In detail:
  - On success, Distributor writes a comma separated list of BQS job ids to
    stdout, e.g. "70589,70590,70591,70592,70593,70594,70595"; nothing else is
    written to stdout.
  - Distributor won't create any temporary order and job directories.
  - Only one file is created temporarily in the current working directory:
    BSI_JobScript.  As soon as all jobs are submitted to the batch queuing
    system, this file will be removed instantly.
  - Any output to stdout/stderr of your BQS jobs is *not* captured!
  - From the Distributor's side, there is no way to determine whether a job
    succeeded or not; you may, of course, track your jobs with your BQS tools,
    as far as supplied by your batch queuing system.
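
  Because the job id list is the only thing written to stdout, a calling
  script can parse it directly.  The following is a minimal sketch; the
  function name parse_job_ids is hypothetical and not part of Distributor.
  Ids are kept as strings, since the id format depends on the batch queuing
  system.

```python
def parse_job_ids(stdout_text):
    """Split the output of --submit into a list of BQS job ids (strings)."""
    text = stdout_text.strip()
    if not text:
        return []
    return [job_id.strip() for job_id in text.split(",")]


# Example output taken from the description above.
job_ids = parse_job_ids("70589,70590,70591,70592,70593,70594,70595\n")
print(len(job_ids), job_ids[0], job_ids[-1])  # prints: 7 70589 70595
```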

  Used together with option --debug, the command --submit still creates the
  temporary order directory structure, as normal Distributor orders do.



2005-02-18      1.5.5 - Remarkable speed up in submitting jobs
================================================================================

Main Workflow
-------------
- Remarkable speed up in submitting jobs

  When submitting more than 1,000 jobs, submission slowed down progressively
  with each job.  Fixed now.

- Bugfix: Create main tmp dir if it does not exist
  (Distributor.finish_configuration)

  Temporary directory was not created when using command --submit.



2004-12-10      1.5.4 - licensed demo version
================================================================================

Configuration
-------------
- Licensed demo version.  License is valid until 2005-01-31.

  Licensing is subject to change with coming releases.



2004-11-03      1.5.3 - special command "submit"
================================================================================

Main Workflow
-------------
- New command "--submit"

  You can use Distributor to just spawn a number of identical jobs via the
  batch queuing system.  The number of jobs to start is passed as the only
  argument.

  This command creates the same directory structure as "normal" orders.  A
  special input file is generated that contains just the job numbers, one
  number per line.  This file is then split using the AsciiLine splitter, with
  item-size and segment-size set to 1.  Each tool gets a segment file as
  command line argument that contains just one line with the job number.
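
  The effect of this splitting can be sketched as follows.  This is not the
  AsciiLine splitter itself; both function names are hypothetical, and the
  assumption that job numbers start at 1 is not stated in this change log.

```python
def job_number_lines(job_count):
    """Contents of the special input file: one job number per line.

    Assumption: numbering starts at 1.
    """
    return ["%d\n" % number for number in range(1, job_count + 1)]


def split_lines(lines, segment_size=1):
    """Group lines into segments of segment_size items each."""
    return [lines[i:i + segment_size]
            for i in range(0, len(lines), segment_size)]


# With segment-size 1, every segment holds exactly one job number.
segments = split_lines(job_number_lines(3))
```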



2004-08-24      1.5.2 - bugfixes
================================================================================

Configuration
-------------
- Flag options (True/False) were ignored in config file, fixed.
                                                              [2004081110000037]

- Formatted and sorted output of specific order configuration file.

- Main section renamed to [Distributor], slightly changed configuration scheme.

  There are two config hierarchies: "instances" (= files and command line) and
  sections.  The section hierarchy changed, now
  - [Distributor] takes the highest precedence over any other section entry.
    Only entries in the same section *in following instances* can override a
    setting made here.
  - Special package sections are second level, e.g. [Tool FlexX].
  - Default package sections at last, e.g. [Tool DEFAULT].  Settings made here
    are only used if not found in any upper level.

  See user guide for more details.
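
  The section hierarchy can be illustrated with a small lookup function.
  This sketch covers only the section precedence, not the "instances"
  hierarchy of files and command line; the function name and the example
  values are hypothetical.

```python
def lookup(option, sections, tool_section):
    """Resolve an option using the section precedence described above."""
    for name in ("Distributor", tool_section, "Tool DEFAULT"):
        values = sections.get(name, {})
        if option in values:
            return values[option]
    raise KeyError(option)


# Hypothetical settings, for illustration only.
sections = {
    "Distributor": {"tmp-dir": "/scratch"},
    "Tool FlexX": {"tool": "flexx", "tmp-dir": "/tmp/flexx"},
    "Tool DEFAULT": {"tool": "echo", "segment-placeholder": "@"},
}

tmp_dir = lookup("tmp-dir", sections, "Tool FlexX")          # from [Distributor]
tool = lookup("tool", sections, "Tool FlexX")                # from [Tool FlexX]
marker = lookup("segment-placeholder", sections, "Tool FlexX")  # from [Tool DEFAULT]
```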


Basic Usage
-----------
- Messages automatically generated by cron are no longer duplicated.

  Old "auto-messages" are deleted with a regular expression configurable with
  option --crontab-comment.                                   [2004081110000037]


Documentation
-------------
- Chapter 4 "Configuration"

  - 4.1, 4.2 changed
  - 4.3 new: "Hierarchy of Configuration Sections"



2004-08-16      1.5.1 - SD bugfix
================================================================================

FileSplitter
------------
- Bugfix: *Append* delimiter $$$$ instead of prepend.         [2004081310000033]

  Changed files: Only distributor/FileSplitter/Splitsdf.py.
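
  For illustration, a record splitter that treats $$$$ as a terminator
  appended to each record might look like the sketch below.  It is not the
  code from Splitsdf.py; the function name is hypothetical.

```python
SDF_DELIMITER = "$$$$"


def split_sdf_records(lines):
    """Yield SDF records as lists of lines; each record ends with '$$$$'."""
    record = []
    for line in lines:
        record.append(line)
        if line.strip() == SDF_DELIMITER:
            yield record
            record = []
    # A trailing molecule without delimiter gets one appended (not prepended),
    # so every record written to a segment file is properly terminated.
    if any(line.strip() for line in record):
        if not record[-1].endswith("\n"):
            record[-1] += "\n"
        record.append(SDF_DELIMITER + "\n")
        yield record
```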



2004-08-13      1.5 - Merging in-order (initiated by user)
================================================================================

Main Workflow
-------------
- Merging in-order (initiated by user)        [Support request 2004081110000037]

  The user now has the possibility to merge the jobs' output preserving the same
  order as given by the order of the input segments.

  The workflow is still "semi-automatic":
  1. The user starts a new order with command line option --no-merge (currently
     there is a known bug with specifying this option in the config file).
  2. The user is responsible for waiting until *all* jobs have been executed.
  3. After all jobs are done, the user explicitly calls the distributor's merge
     command:
     % BSI_distributor --merge 
  4. The merged output can now be found in the output directory
     ** in the same order as the segments were created. **

  Please note:
  The order of the input segments depends on the file splitter function.  Please
  keep this in mind when writing your own file splitters.
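
  The idea of in-order merging can be sketched in a few lines.  The function
  and the data layout (a mapping from segment number to output text) are
  assumptions made for this example, not Distributor's internal format.

```python
def merge_in_order(segment_outputs):
    """Concatenate job outputs sorted by segment number.

    segment_outputs maps a segment number to that job's output text.
    """
    return "".join(segment_outputs[n] for n in sorted(segment_outputs))


# Jobs may finish in any order; the merged result follows the segments.
finished = {3: "c\n", 1: "a\n", 2: "b\n"}
merged = merge_in_order(finished)
```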


BQS
---
- Changed default config setting for option "bqs" in section [BQS
  SunGridEngine] to
    bqs = %(arg-bqs-path)s/qsub -S /bin/sh
  It was previously
    bqs = %(arg-bqs-path)s/qsub -S /bin/bash



2004-07-29      1.4 - Interface to batch queuing system SunGridEngine
================================================================================

BQS
---
- Interface to batch queuing system SunGridEngine implemented.

  New file distributor/BqsWrapper/WrapBqsSunGridEngine.pyc.

  NOTE: Inside BSI_distributor.cfg you have to supply the path to "bash" with
  option -S to command qsub. Default settings: "-S /bin/bash".



2004-07-02      1.3.1 - Bugfix (sdf file splitter)
================================================================================

FileSplitter
------------
- Bugfix: missing separator in SDF file.



2004-06-21      1.3 - Callback
================================================================================

Main Workflow
-------------
- Implemented callback mechanism.

  See documentation section 3.5 "Callback mechanism".  If a job fails on item N,
  the tool is restarted with the same segment from item N+1 onwards.

  Requirement: The tool must call a tiny python script "BSI_ItemDone.py" after
  each successfully processed item to notify Distributor.


FileSplitter
------------
- Implemented simple unittests for each FileSplitter.

- Adapted each FileSplitter's method "do_split()" for new callback mechanism.

  New parameter "start" for skipping the first (start-1) items of the input.
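
  A splitter that honours such a "start" parameter could be sketched as
  follows.  The real signature of do_split() is not given in this change
  log, so the function name and arguments here are hypothetical.

```python
from itertools import islice


def split_items(items, segment_size, start=1):
    """Yield segments of segment_size items, skipping the first start-1 items."""
    remaining = islice(items, start - 1, None)
    while True:
        segment = list(islice(remaining, segment_size))
        if not segment:
            return
        yield segment


# If a job failed on item 3, the restart begins with item 4.
restart = list(split_items(["a", "b", "c", "d", "e"], 2, start=4))
```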


Documentation
-------------
- 3.2   Appended: How to get order_dir without parsing output
- 3.4   Using Distributor from Python
- 3.5   Callback mechanism
- 5.4   Changed: Modifications of do_split() due to callback mechanism
- 5.5   Unittests for each FileSplitter
- 6.1.6 Updated and revised: Description of --statistics output



2004-05         1.2 - Added functionality
================================================================================

Configuration
-------------
- User may specify multiple temporary directories.

  Option "tmp-dir" takes a list of directories separated by ':' (colon). A
  directory tree for input and job subdirectories is built up as before in each
  of these directories, except that each input directory contains only a subset
  of job directories now. Each job writes its output to one of these
  directories. Each directory has to be accessible from each compute node.

  The goal is to distribute the nodes' output to multiple NFS servers. Jobs
  are distributed to the different temporary directories in a simple round
  robin procedure.

  The first directory specified is the "master" directory, where Distributor
  creates the current order's directory to store additional info for this order.
  The first directory is used for job output in the same way as the others.
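
  The round robin assignment can be sketched as below.  The function name,
  the mount points, and the assumption that the first job goes to the first
  directory are illustrative only.

```python
def assign_tmp_dirs(tmp_dir_option, job_count):
    """Map job index -> temporary directory, round robin over the list."""
    dirs = [d for d in tmp_dir_option.split(":") if d]
    if not dirs:
        raise ValueError("tmp-dir must name at least one directory")
    return {job: dirs[job % len(dirs)] for job in range(job_count)}


# Hypothetical NFS mount points, for illustration only.
assignment = assign_tmp_dirs("/nfs1/tmp:/nfs2/tmp:/nfs3/tmp", 5)
```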

- Global output directory configurable via "--output/-o"

  Specifying option "--output/-o " will store total output files into
  specified directory without intermediate files in /BSI_output.

- New option segment-placeholder (description see below).

- Arbitrary tool sections allowed in BSI_distributor.cfg.

  Store your own tool settings in a new [Tool xyz] section (you can choose any
  name you like for xyz), and refer to this section with option tool-section or
  -T at the command line. All tools are handled the same:
  If the tool option contains the segment-placeholder (defaults to @), the
  placeholder is replaced by the segment file name, otherwise the segment file
  name is simply appended to the tool call separated by a blank.
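
  A minimal sketch of this rule follows.  The function name is hypothetical,
  and replacing every occurrence of the placeholder (rather than only the
  first) is an assumption.

```python
def build_tool_call(tool, segment_file, placeholder="@"):
    """Insert the segment file name into a tool call."""
    if placeholder in tool:
        # Assumption: all occurrences of the placeholder are replaced.
        return tool.replace(placeholder, segment_file)
    # No placeholder: append the segment file name, separated by a blank.
    return tool + " " + segment_file


with_placeholder = build_tool_call("my_tool -i @ -o @.out", "seg_1.txt")
without_placeholder = build_tool_call("my_tool -v", "seg_1.txt")
```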

- Introduced user options.

  Options beginning with "arg-" in the config file are treated as user options.
  Goal of user options is to make configuration even more flexible.

  Example:
    # BSI_distributor.cfg
    [Tool my]
    tool = /usr/bin/my_tool -c %(arg-cfgpath)s/my_tool.cfg
    arg-cfgpath = /home/user/config
    
    # commandline
    user@machine:/home/user> BSI_distributor.py input.txt -T my -a cfgpath=/tmp

  User options (or user arguments) can be defined at the command line with
  "--arg NAME=VALUE" or "-a NAME=VALUE" (as in "-a cfgpath=/tmp" above).  This
  defines the option "arg-NAME".
  User options (like any other option) can be used in other options as
  python-style string placeholders with the format "%(OPTION)s", e.g.
  "%(arg-script)s"

  There is no default for user options! So, if one is used as a placeholder
  in another option, you are responsible for setting its value, either in a
  config file or at the command line.
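
  Python's own "%" string formatting shows how such placeholders expand.
  The helper below is a hypothetical sketch, not Distributor code; it assumes
  that a value given at the command line overrides the one from the config
  file.

```python
def expand(option_value, settings):
    """Expand python-style "%(name)s" placeholders from a settings dict."""
    try:
        return option_value % settings
    except KeyError as missing:
        # There is no default for user options.
        raise ValueError("no value set for placeholder %s" % missing)


config = {"arg-cfgpath": "/home/user/config"}
tool = "/usr/bin/my_tool -c %(arg-cfgpath)s/my_tool.cfg"

from_config = expand(tool, config)

# Assumed effect of "-a cfgpath=/tmp" at the command line.
overridden = dict(config)
overridden["arg-cfgpath"] = "/tmp"
from_cmdline = expand(tool, overridden)
```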

- Implemented max-file-size checking.

  Global output files are split into _part_12345, if their
  size exceeds the file size specified in option max-file-size (in bytes).

  One thing to consider: Each file is treated separately, i.e. only this file
  will be split.  If multiple files have to be split, the part numbers do
  *not* correspond.

- Expansion of environment variables.

  Expansion is done only once on creation of order.  Used config settings are
  stored in /BSI_distributor.cfg.  Example: tmp-dir = $HOME/tmp
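
  In Python, this kind of expansion is available through
  os.path.expandvars.  Whether Distributor uses this function is not stated
  here; the sketch only shows the effect on the example setting.

```python
import os

# Make sure HOME exists so the sketch also runs in a bare environment.
os.environ.setdefault("HOME", "/home/user")


def expand_setting(value):
    """Expand $NAME and ${NAME} environment variables in a config value."""
    return os.path.expandvars(value)


tmp_dir = expand_setting("$HOME/tmp")
```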

- Shortcut for --bqsqueueid: --queue/-q

  If you specify a value for option "-q" at the commandline, this value is
  available in other options as "%(bqsqueueid)s".
  
  Example (snippet of BSI_distributor.cfg and command prompt):
  [BQS OpenPBS]
  bqsqueueid = defaultqueue
  bqs = qsub -q %(bqsqueueid)s
  --
  % BSI_distributor.py molecule.mol2 -q fastq

FileSplitter
------------
- SplitAsciiPattern now takes (only) regular expressions as patterns.

  Additional options:
  --no-keep-pattern:     Default is to keep pattern between items in a segment
                         file. Use this option to turn off.
  --keep-pattern-before: Put pattern before first item in segment. Default:
                         don't put pattern before first item.
  --keep-pattern-after:  Put pattern after last item in segment. Default:
                         don't put pattern after last item.


Basic Usage
-----------
- Statistics info is returned as a simple dictionary when called from python.

  Fields supplied currently:
  'order id', 'order dir', 'output dir', 'input files', 'job count',
  'bqs job ids', 'jobs done', 'jobs merged', 'jobs removed', 'jobs failed',
  'finished'.  Field types: count: int, finished: bool, : str.
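
  A caller might evaluate such a dictionary as sketched below.  Only the
  field names come from the list above; the values, the helper name, and the
  reading of 'jobs done' as the number of completed jobs are assumptions.

```python
def progress(stats):
    """Return the fraction of jobs done, from a statistics dictionary."""
    total = stats["job count"]
    return stats["jobs done"] / total if total else 0.0


# Hypothetical values; only the field names come from this change log.
stats = {"order id": "42", "job count": 8, "jobs done": 6,
         "jobs merged": 4, "jobs failed": 0, "finished": False}

fraction_done = progress(stats)
```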

- Sending email stats and new command --mailstats.

  If Distributor is started with a positive value for "mail-interval", it will
  send a status info message every "mail-interval" minutes. This option
  defaults to 10.  You may want to start this email message service manually
  later with Distributor command "--mailstats ".  This service uses the cron
  daemon, as the merging process does.  For low-level access use
  'crontab -l' to check these entries, 'crontab -e' to edit/delete.

  Distributor will stop sending emails as soon as the order has run to
  completion.  If you quit the order by "--kill" or "--remove", email sending
  will be stopped as well.


2004-04         1.1 - Stability improvements
================================================================================

Basic Usage
-----------
- Renamed options --file-type, --tool-name and --bqs-name to
  - --filetype-section (-F)
  - --tool-section     (-T)
  - --bqs-section      (-B)
  to clarify connection to referring sections in BSI_distributor.cfg; old option
  names are still available.
- Current order's configuration is stored in order directory.
- Renamed command --statistic to --statistics.
- For convenience you can use the order directory (absolute path) as
  argument to distributor special commands like --statistics. The order
  directory including order ID is given on stdout after having sent jobs.

BQS
---
- The batch queuing system is assumed to support the options -o and -e to
  specify the file names for stdout and stderr!
- Currently supported: OpenPBS.

Diverse
-------
- FileSplitter directory moved under distributor directory. Easier installation.
- Main log is world writable.


2004-03-31      1.0.1 - Bugfixes
================================================================================

Main Workflow
-------------
- Order directory is returned when called from python

FileSplitter
------------
- sln, smiles, AsciiLine, AsciiPattern

Diverse
-------
- Version info via --version


2004-03-26      1.0 - Initial installation
================================================================================

- --remove


2004-03-18      First version of --kill and --resume
================================================================================

- Logging       Done, extendable
- Merger        Done, todo: check merge
- --kill        First version, presumably buggy
- --resume      First version, presumably buggy


2004-02-20      Initial CVS version
================================================================================

- Main workflow: distributor/__init__.py, distributor/Distributor.py
- File Splitter: FileSplitter/*
- Tool Wrapper:  distributor/ToolWrapper/* -- First trial version
- BQS Wrapper:   distributor/BqsWrapper/*  -- First trial version
- Merger:        distributor/Merger.py     -- In progress ...
Last modified Monday, 18. Jun 2007 15:03 CEST by WebMaster