================================================================================
Change Log for Distributor Project
================================================================================
2007-06 1.6.2 - Reading BSI_distributor.cfg also from /etc/BioSolveIT
================================================================================
Configuration
-------------
- Configuration file "BSI_distributor.cfg" is now also searched for in directory
"/etc/BioSolveIT" after checking directory "/etc".
2007-01 1.6.1 - Minor bugfix for shared objects
================================================================================
Main Workflow
-------------
- Bugfix: Changed search path for shared objects
2006-04 1.6 - Binary executable
================================================================================
Installation
------------
- BioSolveIT Distributor is now optionally shipped as a binary executable,
together with a subdirectory of shared objects and a template configuration
file.
With the binary executable, Python does not need to be installed to run
Distributor.
Main Workflow
-------------
- Bugfix: If Distributor was used through its Python interface, one process
requested multiple licenses when repeatedly asking for statistics info, e.g.
>>> distributor.distributor(statistics=current_order_id)
2006-03 1.5.7.r - Bugfix, new empty orders created on merging error
================================================================================
Main Workflow
-------------
- Bugfix: If an error occurred while merging, Distributor started a new empty
order at each merger interval. Fixed now.
2006-02 1.5.7.q - Bugfix, checking output failed
================================================================================
Main Workflow
-------------
- Bugfix: Merging failed when checking for differences with some dialects of
the command 'tail'.
2005-07 1.5.7 - Bugfixing, Opt --no-split to use input files unmodified
================================================================================
Main Workflow
-------------
- Distributor prints the FlexLM "processor id" (COMPOSITE=XXXXXXXXXXXX) of the
current compute node if no license was found. Log in to the FlexLM license
server and execute Distributor again to get the id of the license server.
- Use option --no-split (without option arguments) to pass input file(s)
unmodified, that is, not split, to the batch queuing system.
- Bugfix: Option --debug had no effect together with --submit, fixed now. See
also changes to version 1.5.6.
2005-06-09 1.5.6 - Lightweight BQS interface without job control and stats
================================================================================
Main Workflow
-------------
- Changes in command "--submit ", see also 1.5.3
The Distributor command --submit has been redesigned to offer a lightweight
interface to the batch queuing system without maintaining any job control or
statistics about running jobs.
In detail:
- On success, Distributor writes a comma separated list of BQS job ids to
stdout, e.g. "70589,70590,70591,70592,70593,70594,70595"; nothing else is
written to stdout.
- Distributor won't create any temporary order and job directories.
- Only one file is created temporarily in the current working directory:
BSI_JobScript. As soon as all jobs are submitted to the batch queuing
system, this file will be removed instantly.
- Any output to stdout/stderr of your BQS jobs is *not* captured!
- From the Distributor's side, there is no way to determine whether a job
succeeded or not; you may, of course, track your jobs with your BQS tools,
as far as supplied by your batch queuing system.
Distributor command --submit together with option --debug does create the
temporary order directory structure, as normal Distributor orders would.
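A calling script can turn the comma separated job id list from stdout into a
Python list. The following is only a minimal sketch: the function name
parse_job_ids is hypothetical and not part of Distributor, and the ids are kept
as strings because their format depends on the batch queuing system.

```python
def parse_job_ids(stdout_text):
    """Split the comma separated list printed by --submit into id strings."""
    text = stdout_text.strip()
    if not text:
        return []
    return [job_id.strip() for job_id in text.split(",")]


# Sample output taken from the example in this entry.
job_ids = parse_job_ids("70589,70590,70591,70592,70593,70594,70595\n")
print(len(job_ids), job_ids[0], job_ids[-1])
```

The resulting ids can then be passed to whatever job tracking tools your batch
queuing system supplies.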
2005-02-18 1.5.5 - Remarkable speed up in submitting jobs
================================================================================
Main Workflow
-------------
- Remarkable speed up in submitting jobs
When submitting more than 1,000 jobs, submitting slowed down increasingly with
each job -> fixed.
- Bugfix: Create main tmp dir if it does not exist
(Distributor.finish_configuration). The temporary directory was not created
when using command --submit.
2004-12-10 1.5.4 - licensed demo version
================================================================================
Configuration
-------------
- Licensed demo version. License is valid until 2005-01-31.
Licensing is subject to change with coming releases.
2004-11-03 1.5.3 - special command "submit"
================================================================================
Main Workflow
-------------
- New command "--submit "
You can use Distributor to just spawn a number of identical jobs via the batch
queuing system. The number of jobs to start is passed as the only argument.
This command will create the same directory structure as "normal" orders. A
special input file is generated, which contains just the job numbers, one
number per line. This file is then split using the AsciiLine splitter,
with item-size and segment-size set to 1. Each tool will get a segment file
as command line argument that contains just one line with the job number.
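The idea of the generated input file and its one-line segments can be sketched
as follows. This is an illustration only, not the AsciiLine splitter itself:
both function names are hypothetical, item-size and segment-size are collapsed
into a single lines_per_segment parameter, and starting the numbering at 1 is
an assumption (the change log does not state the first job number).

```python
def job_number_lines(job_count, first=1):
    """Content of the generated input file: one job number per line."""
    return ["%d\n" % number for number in range(first, first + job_count)]


def split_lines(lines, lines_per_segment=1):
    """Cut the lines into consecutive segments of the given size."""
    return [lines[i:i + lines_per_segment]
            for i in range(0, len(lines), lines_per_segment)]


# Three jobs give three segments, each holding a single job number line.
segments = split_lines(job_number_lines(3))
print(segments)
```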
2004-08-24 1.5.2 - bugfixes
================================================================================
Configuration
-------------
- Flag options (True/False) were ignored in config file, fixed.
[2004081110000037]
- Formatted and sorted output of specific order configuration file.
- Main section renamed to [Distributor], slightly changed configuration scheme.
There are two config hierarchies: "instances" (= files and command line) and
sections. The section hierarchy changed, now
- [Distributor] takes the highest precedence over any other section entry.
Only entries in the same section *in following instances* can override a
setting made here.
- Special package sections are second level, e.g. [Tool FlexX].
- Default package sections come last, e.g. [Tool DEFAULT]. Settings made here
are only used if not found at any higher level.
See user guide for more details.
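The section hierarchy described above can be pictured with a small lookup
function. This sketch covers only the section precedence, not the "instances"
hierarchy of files and command line; the function name lookup and all option
values are made up for illustration and are not taken from Distributor.

```python
def lookup(option, sections, tool_section):
    """Return the value from the highest-precedence section defining it."""
    for name in ("Distributor", tool_section, "Tool DEFAULT"):
        section = sections.get(name, {})
        if option in section:
            return section[option]
    raise KeyError(option)


# Hypothetical settings, already merged from all config instances.
sections = {
    "Distributor": {"tmp-dir": "/scratch"},
    "Tool FlexX": {"tool": "flexx_wrapper @"},
    "Tool DEFAULT": {"tool": "echo", "tmp-dir": "/tmp"},
}
print(lookup("tmp-dir", sections, "Tool FlexX"))
print(lookup("tool", sections, "Tool FlexX"))
```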
Basic Usage
-----------
- Messages automatically generated by cron are no longer duplicated.
Old "auto-messages" are deleted with a regular expression configurable with
option --crontab-comment. [2004081110000037]
Documentation
-------------
- Chapter 4 "Configuration"
- 4.1, 4.2 changed
- 4.3 new: "Hierarchy of Configuration Sections"
2004-08-16 1.5.1 - SD bugfix
================================================================================
FileSplitter
------------
- Bugfix: *Append* delimiter $$$$ instead of prepend. [2004081310000033]
Changed files: Only distributor/FileSplitter/Splitsdf.py.
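What "append" means for the delimiter can be shown with a short sketch. It is
not the code of Splitsdf.py; the function name split_sdf is hypothetical and
the sample records are placeholders rather than valid molecule data.

```python
def split_sdf(text):
    """Split SD file text into records, each ending with its $$$$ line."""
    records, current = [], []
    for line in text.splitlines(True):
        current.append(line)
        if line.strip() == "$$$$":
            # The delimiter closes the record it follows (append, not prepend).
            records.append("".join(current))
            current = []
    if "".join(current).strip():
        # Keep a trailing record that lacks a final delimiter.
        records.append("".join(current))
    return records


records = split_sdf("mol1\nM  END\n$$$$\nmol2\nM  END\n$$$$\n")
print(len(records))
```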
2004-08-13 1.5 - Merging in-order (initiated by user)
================================================================================
Main Workflow
-------------
- Merging in-order (initiated by user) [Support request 2004081110000037]
The user now has the possibility to merge the jobs' output preserving the same
order as given by the order of the input segments.
So far, the workflow is "semi-automatic":
1. The user starts a new order with command line option --no-merge (currently
there is a known bug with specifying this option in the config file).
2. The user is responsible for waiting until *all* jobs have been executed.
3. After all jobs are done, the user explicitly calls the distributor's merge
command:
% BSI_distributor --merge
4. The merged output is now found in the output directory
** in the same order as the segments were created. **
Please note:
The order of the input segments depends on the file splitter function. Please
keep this in mind when writing your own file splitters.
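The principle of an in-order merge is to concatenate the job outputs by
segment number rather than by completion time. A minimal sketch, assuming the
outputs are available as a mapping from segment number to text (the function
name merge_in_order and this data layout are hypothetical, not Distributor's
internal representation):

```python
def merge_in_order(outputs_by_segment):
    """Concatenate job outputs in ascending segment number."""
    return "".join(outputs_by_segment[number]
                   for number in sorted(outputs_by_segment))


# Jobs may finish in any order; integer keys sort 2 before 10.
outputs = {10: "result of segment 10\n",
           1: "result of segment 1\n",
           2: "result of segment 2\n"}
merged = merge_in_order(outputs)
print(merged, end="")
```

Sorting integer keys avoids the lexicographic trap where "10" would come
before "2" if file names were compared as strings.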
BQS
---
- Changed default config settings for option "bqs" in section [BQS
SunGridEngine] to
bqs = %(arg-bqs-path)s/qsub -S /bin/sh
The previous setting was
bqs = %(arg-bqs-path)s/qsub -S /bin/bash
2004-07-29 1.4 - Interface to batch queuing system SunGridEngine
================================================================================
BQS
---
- Interface to batch queuing system SunGridEngine implemented.
New file distributor/BqsWrapper/WrapBqsSunGridEngine.pyc.
NOTE: Inside BSI_distributor.cfg you have to supply the path to "bash" with
option -S to command qsub. Default settings: "-S /bin/bash".
2004-07-02 1.3.1 - Bugfix (sdf file splitter)
================================================================================
FileSplitter
------------
- Bugfix: missing separator in SDF file.
2004-06-21 1.3 - Callback
================================================================================
Main Workflow
-------------
- Implemented callback mechanism.
See documentation section 3.5 "Callback mechanism". If a job fails on item N,
the tool is restarted with the same segment from item N+1 onwards.
Requirement: The tool must call a tiny python script "BSI_ItemDone.py" after
each successfully processed item to notify Distributor.
FileSplitter
------------
- Implemented simple unittests for each FileSplitter.
- Adapted each FileSplitter's method "do_split()" for new callback mechanism.
New parameter "start" for skipping the first (start-1) items of the input.
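The interplay of the callback mechanism and the "start" parameter can be
sketched as below. This is not the real do_split() signature: the function
name split_from, the list-based input and the segment_size parameter are
assumptions made for the illustration.

```python
def split_from(items, segment_size, start=1):
    """Skip the first (start - 1) items, then cut the rest into segments."""
    remaining = items[start - 1:]
    return [remaining[i:i + segment_size]
            for i in range(0, len(remaining), segment_size)]


items = ["mol1", "mol2", "mol3", "mol4", "mol5"]
failed_item = 2  # the job failed on item N = 2
# Restart from item N + 1 onwards, as described for the callback mechanism.
segments = split_from(items, 2, start=failed_item + 1)
print(segments)
```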
Documentation
-------------
- 3.2 Appended: How to get order_dir without parsing output
- 3.4 Using Distributor from Python
- 3.5 Callback mechanism
- 5.4 Changed: Modifications of do_split() due to callback mechanism
- 5.5 Unittests for each FileSplitter
- 6.1.6 Updated and revised: Description of --statistics output
2004-05 1.2 - Added functionality
================================================================================
Configuration
-------------
- User may specify multiple temporary directories.
Option "tmp-dir" takes a list of directories separated by ':' (colon). A
directory tree for input and job subdirectories is built up as before in each
of these directories, except that each input directory contains only a subset
of job directories now. Each job writes its output to one of these
directories. Each directory has to be accessible from each compute node.
The goal is to distribute the nodes' output to multiple NFS servers. Jobs are
distributed to the different temporary directories in a simple round-robin
procedure.
The first directory specified is the "master" directory, where Distributor
creates the current order's directory to store additional info for this order.
The first directory is used for job output in the same way as the others.
- Global output directory configurable via "--output/-o"
Specifying option "--output/-o " stores the total output files in the
specified directory without intermediate files in /BSI_output.
- New option segment-placeholder (description see below).
- Arbitrary tool sections allowed in BSI_distributor.cfg.
Store your own tool settings in a new [Tool xyz] section (you can choose any
name you like for xyz), and refer to this section with option tool-section or
-T at the command line. All tools are handled the same:
If the tool option contains the segment-placeholder (defaults to @), the
placeholder is replaced by the segment file name, otherwise the segment file
name is simply appended to the tool call separated by a blank.
- Introduced user options.
Options beginning with "arg-" in the config file are treated as user options.
The goal of user options is to make configuration even more flexible.
Example:
# BSI_distributor.cfg
[Tool my]
tool = /usr/bin/my_tool -c %(arg-cfgpath)s/my_tool.cfg
arg-cfgpath = /home/user/config
# commandline
user@machine:/home/user> BSI_distributor.py input.txt -T my -a cfgpath=/tmp
User options (or user arguments) can be defined at the command line with
"--arg =" or "-a =". This will define "arg-".
User options (as any other option as well) can be used in other options as
python-style string placeholders with the format "%()s", e.g.
"%(arg-script)s"
There is no default for user options! So, if one is used as a placeholder in
another option, you are responsible for setting its value, either
in a config file or at the command line.
- Implemented max-file-size checking.
Global output files are split into _part_12345, if their
size exceeds the file size specified in option max-file-size (in bytes).
One thing to consider: Each file is treated separately, i.e. only this file
will be split. If multiple files have to be split, the part numbers do
*not* correspond.
- Expansion of environment variables.
Expansion is done only once, when the order is created. The config settings
used are stored in /BSI_distributor.cfg. Example: tmp-dir = $HOME/tmp
- Shortcut for --bqsqueueid: --queue/-q
If you specify a value for option "-q" at the commandline, this value is
available in other options as "%(bqsqueueid)s".
Example (snippet of BSI_distributor.cfg and command prompt):
[BQS OpenPBS]
bqsqueueid = defaultqueue
bqs = qsub -q %(bqsqueueid)s
--
% BSI_distributor.py molecule.mol2 -q fastq
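The "%(name)s" placeholders used in the examples of this section behave like
Python's own %-formatting with a dictionary. The sketch below only mimics that
behaviour with plain dictionaries; the function name expand is hypothetical,
and how Distributor performs the substitution internally is not stated here.

```python
def expand(option, settings):
    """Expand %(name)s placeholders in one option using the other settings."""
    return settings[option] % settings


# Settings from the [BQS OpenPBS] example; -q at the command line overrides
# the configured default queue.
settings = {"bqsqueueid": "defaultqueue", "bqs": "qsub -q %(bqsqueueid)s"}
settings["bqsqueueid"] = "fastq"
command = expand("bqs", settings)
print(command)

# User options work the same way, here with "-a cfgpath=/tmp".
tool_settings = {"arg-cfgpath": "/tmp",
                 "tool": "/usr/bin/my_tool -c %(arg-cfgpath)s/my_tool.cfg"}
tool_call = expand("tool", tool_settings)
print(tool_call)
```

In this sketch a placeholder without a value raises KeyError, which matches
the note that user options have no default.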
FileSplitter
------------
- SplitAsciiPattern now takes (only) regular expressions as patterns.
Additional options:
--no-keep-pattern: Default is to keep pattern between items in a segment
file. Use this option to turn off.
--keep-pattern-before: Put pattern before first item in segment. Default:
don't put pattern before first item.
--keep-pattern-after: Put pattern after last item in segment. Default:
don't put pattern after last item.
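The three pattern options can be pictured with the sketch below. It is not the
SplitAsciiPattern code: both function names are hypothetical, and writing a
fixed separator string back instead of the text actually matched by the
regular expression is a simplification.

```python
import re


def split_items(text, pattern):
    """Split text into items at every match of the regular expression."""
    return [item for item in re.split(pattern, text) if item]


def join_segment(items, separator, keep=True, before=False, after=False):
    """Write one segment, placing the separator as the options request."""
    body = (separator if keep else "").join(items)
    return (separator if before else "") + body + (separator if after else "")


items = split_items("a\n//\nb\n//\nc\n", r"//\n")
default_segment = join_segment(items[:2], "//\n")
print(repr(default_segment))
```

Here keep=False corresponds to --no-keep-pattern, before=True to
--keep-pattern-before and after=True to --keep-pattern-after.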
Basic Usage
-----------
- Statistics info is returned as a simple dictionary when called from Python.
Fields supplied currently:
'order id', 'order dir', 'output dir', 'input files', 'job count',
'bqs job ids', 'jobs done', 'jobs merged', 'jobs removed', 'jobs failed',
'finished'. Field types: count: int, finished: bool, : str.
- Sending email stats and new command --mailstats.
If Distributor is started with a positive value for "mail-interval", it will
send a status info message each minutes. This option defaults
to 10. You may want to start this email message service manually later with
Distributor command "--mailstats ". This message service uses the cron
daemon, as the merging process does. For low-level access use
'crontab -l' to check these entries, 'crontab -e' to edit/delete.
Distributor will stop sending emails as soon as the order has run to
completion. If you quit the order by "--kill" or "--remove", email sending
will be stopped as well.
2004-04 1.1 - Stability improvements
================================================================================
Basic Usage
-----------
- Renamed options --file-type, --tool-name and --bqs-name to
- --filetype-section (-F)
- --tool-section (-T)
- --bqs-section (-B)
to clarify connection to referring sections in BSI_distributor.cfg; old option
names are still available.
- Current order's configuration is stored in order directory.
- Renamed command --statistic to --statistics.
- For convenience you can use the order directory (absolute path) as
argument to distributor special commands like --statistics. The order
directory including order ID is given on stdout after having sent jobs.
BQS
---
- The batch queuing system is assumed to support the options -o and -e to
specify the file names for stdout and stderr!
- Currently supported: OpenPBS.
Miscellaneous
-------------
- FileSplitter directory moved under distributor directory. Easier installation.
- Main log is world writable.
2004-03-31 1.0.1 - Bugfixes
================================================================================
Main Workflow
-------------
- Order directory is returned when called from python
FileSplitter
------------
- sln, smiles, AsciiLine, AsciiPattern
Miscellaneous
-------------
- Version info via --version
2004-03-26 1.0 - Initial installation
================================================================================
- --remove
2004-03-18 First version of --kill and --resume
================================================================================
- Logging Done, extendable
- Merger Done, todo: check merge
- --kill First version, presumably buggy
- --resume First version, presumably buggy
2004-02-20 Initial CVS version
================================================================================
- Main workflow: distributor/__init__.py, distributor/Distributor.py
- File Splitter: FileSplitter/*
- Tool Wrapper: distributor/ToolWrapper/* -- First trial version
- BQS Wrapper: distributor/BqsWrapper/* -- First trial version
- Merger: distributor/Merger.py -- In progress ...