QED is a batch job queue manager in the vein of NQS or
GNU queue for GNU/Linux.
It allows programmes to execute in the background as batch jobs. Jobs may be scheduled to run at a later time,
as with at(1). They may also be run in a resource-controlled environment which includes, among other things, the enforcement
of I/O bandwidth limits. Optional notification of job completion via email is also available.
QED is network aware and has limited support for clustering by automatically rerouting requests to the least loaded node, thus achieving a simple form of load balancing.
Communication between clients and QED follows a simple, buzzword-compatible XML-based protocol. Besides the obvious primitive for submitting jobs, there is provision for job cancellation, inspection of the pending queue and status reporting for completed jobs. Support for suspending and resuming jobs may be on the way.
QED can operate as a privileged (uid 0) process or an unprivileged one. In the former case a submission request must bear valid user credentials in order to authenticate with the system. Jobs will subsequently run with the uid and gid of the calling user. In unprivileged mode all jobs are accepted and will run with the same user id QED is running under.
For logging and bookkeeping purposes, QED produces a completion certificate for each completed job. A job is considered to have terminated successfully if it returned a zero status. This behaviour follows the convention pervasive in Unix systems. The completion certificate includes, among other fields, the exit time, the return status and, if available, a dump of the job's stdout and stderr.
Please note that QED is currently alpha software and may have bugs; use it at your own risk.
You can (and should) report bugs to the author's email address, listed in the AUTHORS file present in the distribution.
QED must be set up to listen on at least one interface. It can optionally send load average information to a multicast group, thereby giving each member a snapshot of the load distribution across the cluster. Balancing is thus achieved by rerouting to the least busy node.
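The rerouting decision can be pictured as taking the minimum over that load snapshot. The following Python sketch is only an illustration and not QED's actual code: the snapshot dictionary, the host addresses and the tie-break in favour of the local node are all assumptions.

```python
def pick_node(local, loads):
    """Return the address of the node with the lowest load average.

    loads is a hypothetical snapshot mapping a node address to the last
    load average it advertised. On a tie the local node wins, so a job
    is never rerouted without a reason.
    """
    return min(loads, key=lambda node: (loads[node], node != local))


# Made-up snapshot, as it could look after a few multicast updates.
snapshot = {"10.0.0.1:3345": 0.80, "10.0.0.2:3345": 0.25, "10.0.0.3:3345": 1.90}
print(pick_node("10.0.0.1:3345", snapshot))
```

Sorting on the pair (load, is-remote) keeps the choice deterministic when two nodes advertise the same load.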
There's no support for SSL. I'm not a big fan of it: it doesn't work for
datagrams and defeats the use of zero-copy optimisations present in the kernel
(most notably sendfile(2)). Anyway, the lack of built-in support may be regarded
as a good thing: it lets you make a choice between using stunnel or IPsec ESP.
QED supports the specification of certain system resource limits either globally or per job. The latter may be conveyed in the submission request itself. Global values take precedence over any others whenever they are more restrictive. There's currently no support for more fine-grained control (e.g. specifying limits on a per-user basis).
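The precedence rule amounts to keeping the more restrictive of the two values. Below is a minimal Python sketch of that rule, assuming that a smaller number is always the more restrictive one and using None to stand for a limit that has not been set; neither convention is taken from QED's sources.

```python
def effective_limit(global_limit, job_limit):
    """Combine a global limit with a per-job one.

    None means "not set" (an assumption made for this sketch). When both
    are set, the smaller, i.e. more restrictive, value wins.
    """
    if global_limit is None:
        return job_limit
    if job_limit is None:
        return global_limit
    return min(global_limit, job_limit)


print(effective_limit(100, 50))   # the job asks for less than the global cap
print(effective_limit(50, 100))   # the global cap is tighter
```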
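The following resources are supported, named after the elements of the limits descriptor in the configuration file: time-quota, cpu, vsize, fsize, nproc, nofile, pcpu-params and bw.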
Starting with fsize, the limits are enforced via setrlimit(2) and reaching
those values will trigger the semantics documented in the man pages.
Bandwidth enforcement is only available for the x86 architecture.
It is implemented using ptrace(2) tricks and code injection,
thereby hijacking all system calls related to I/O (read(2), write(2) et al.)
and introducing a suitable delay whenever the instantaneous byte-rate exceeds
the allowed one. Asynchronous I/O system calls are currently not trapped.
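To give an idea of what a "suitable delay" could be, here is a small Python sketch of the arithmetic only. It assumes the byte-rate is measured over a window that starts at some reference time; how QED really measures the rate and where it sleeps are not described here, so treat the function and its parameters as hypothetical.

```python
def throttle_delay(bytes_done, elapsed, limit):
    """Seconds to wait so that bytes_done / (elapsed + delay) <= limit.

    bytes_done: bytes transferred since the measuring window started
    elapsed:    seconds since the measuring window started
    limit:      allowed byte-rate in bytes per second, must be positive
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Shortest time the transfer may take without exceeding the limit.
    earliest = bytes_done / limit
    return max(0.0, earliest - elapsed)


# 4096 bytes in half a second against a limit of 2048 bytes/s.
print(throttle_delay(4096, 0.5, 2048))
```

A process that is already slower than its limit gets a delay of zero, so it is never penalised.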
Byte-rate limits may be specified as a whole, thus constraining all I/O, or alternatively can be broken down into the following families: net (sockets), block (block devices, typically disks) and chr (character devices).
Each category may be further subdivided into inbound I/O, outbound I/O or joint I/O (a limit regardless of "direction").
The remaining resources are controlled by periodically polling the relevant entries in /proc.
pcpu-params is a more elaborate directive. It comprises the sub-parameters value, multiplicity and terminate, which are described in the configuration file format.
If QED is launched as root it will turn on mandatory authentication for all
requests made. QED uses the Linux-PAM API to authenticate the user. Upon successful
authentication, jobs will run under that user id after a chdir(2) to the user's home
directory. The environment will comprise the following variables:
HOME equates to the user's home directory.
PATH equates to the path defined in the configuration appended with
the component $HOME/bin.
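As an illustration of the two variables above, this Python sketch builds such an environment. The function name is made up, and the default PATH is simply the default documented for the path directive of the configuration file.

```python
def job_environment(home, config_path="/bin:/usr/bin"):
    """Environment of an authenticated job: HOME, and PATH + $HOME/bin."""
    return {
        "HOME": home,
        # The configured path comes first; $HOME/bin is appended to it.
        "PATH": config_path + ":" + home + "/bin",
    }


print(job_environment("/home/alice"))
```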
Cancelling and listing jobs will be limited to those submitted by the authenticated user.
Note that scheduled jobs cannot be rerouted to another host. Since authentication is deferred until the job is about to start, the user credentials would have to be stored persistently in the meantime, possibly in plain text, which is a notorious security risk.
Please set up stunnel or IPsec ESP when running QED as root unless confidentiality, especially of user credentials, is of no concern to you.
When running as a non-privileged user QED will accept every request from everyone allowed to connect to it. Jobs will run under the same user id as the QED process.
A simple QED client written in Perl is also provided as part of the distribution. Make sure you have the following Perl modules installed:
POSIX IO::Socket Term::ReadKey HTML::Entities Getopt::Long
Alternatively, you can write your own dedicated client with such prosaic tools as a shell script and
netcat(1).
It's not difficult since QED uses a simple XML-based protocol.
For a proper installation, start by unpacking the QED release and changing into the extracted top directory:
$ tar xvjf qed-x.x.x.tar.bz2
$ cd qed-x.x.x
Then run:
$ ./configure [--prefix=<dir>] [--enable-debug] [--disable-optimization] [--disable-pam]
The available options are:
prefix: is the top directory where QED gets installed. Check that you have permissions to create it. The
installation process will create the following subdirectories:
bin: the QED binary and the provided client will be installed here;
lib: support libraries;
etc: configuration files;
doc: html documentation (this file).
enable-debug: builds QED in debug mode which will produce a torrent of debug messages mainly useful for
bug tracking or developing. It also adds the -g flag to the compilation options so that a symbol table
is present in the final program allowing one to use the debugger;
disable-optimization: compiles QED without optimization, i.e. with -O0. This is useful if
you intend to run under the debugger or otherwise experience an internal compiler error while compiling one of
the sources;
disable-pam: disables the use of libPAM, thereby rendering useless the authentication features that allow
QED to run jobs under different user ids. Use this option if you don't intend to run QED as root or don't have libPAM
installed and don't want to bother installing it.
Finally, build and install:
$ make && make install
Once installed, QED is launched as follows:
$ qed [-p <pid-file>] [-c <config-file>] [-d] [-l]
Options are as follows:
-p: file which will hold the pid of the running process. Useful for init scripts. It defaults to a file
named qed.pid in the current working directory;
-c: the QED configuration file. See the section Configuration for details.
If not specified QED will try to read the file qed.conf.xml in the current working directory;
-d: daemonize. If not present QED will run in the foreground which might be useful if it has been compiled with
debug support;
-l: use local time instead of UTC for date representation. This affects date fields appearing
in responses to queue-stat-request and in
completion certificates. Since localisation can be a mess, especially when you're
subject to daylight saving time, you're probably better off using the UTC default.
Now, for something more detailed. If you intend to run QED as root so you can authenticate different users and run
jobs under their respective ids, you'll have to add a file entry named qed in the pam configuration directory present in your
installation, which typically is /etc/pam.d.
With the usual authentication against /etc/passwd and /etc/shadow,
the contents of the qed file should be:
auth required pam_env.so
auth required pam_unix.so nodelay
In case you use LDAP authentication, replace the above lines with:
auth required pam_env.so
auth required pam_ldap.so
You get the idea. If you don't want to mess with PAM or just don't trust this code, then don't run QED as root, period.
As mentioned above, a simple QED client is included in the distribution. It is, unsurprisingly, named
qed-client.
You can use it to submit jobs, query their state or cancel them. The usage is:
qed-client [-a] [-l <limits-file>] [qed-host:port] {submit|queue-stat|job-stat|cancel} ...
qed-host:port specifies the address and port the QED server is bound to. It defaults to localhost:3345.
Options are as follows:
-a: use authentication. The client will request a username and a password and pass on the credentials to QED
which will try to authenticate the user. On success QED will run the job under the specified uid. Note that QED must be
running as root and pam must be properly configured to support this feature.
-l: specifies a file containing an XML tree describing the resource limits which will constrain the
submitted job. If there are more restrictive limits specified within QED's configuration file, these will always
take precedence.
submit [-r] [-c count] [-p period] [-t <iso8601-date>] [-n email [-n email ...]] <command-line>: submits a job for
execution. It expects the command line to be executed to follow, eg. qed-client submit wget -O - http://some.host.
Upon success QED will return a job identification. See submit_response.
The command line must not start with a dash. Sub-options have the following meanings:
-r: if present, QED will try to use its internal load balancing strategy to consider redirecting the request to another QED host;
-c: retry count. Number of times this job will be retried if unsuccessful. Defaults to 0;
-p: number of seconds to wait between retries. This is only meaningful if -c is specified. Defaults to 0, i.e. the job will
be resubmitted immediately;
-t: timestamp for deferred execution, that is, ask QED to start the job at the specified time;
-n: email address for completion notification. It may be specified multiple times to build a recipient list.
queue-stat [-g]: queries QED about the jobs running or pending. If authentication is in force, it only shows jobs for this user.
Note that completed jobs are not shown. You'll have to browse through the
Completion certificates. If the switch -g is present after the command,
it will instruct QED to propagate the query throughout all known QED hosts and return an aggregation of all the results.
job-stat <job-id>: retrieves the completion certificate for the specified job id.
cancel <job-id>: stops a job, given a job id returned by one of the previous commands.
<submit-request>
<auth-info> <!-- authentication descriptor: optional -->
<user>string</user>
<cred>string</cred> <!-- credential, typically a password -->
</auth-info>
<!-- executable file to be submitted ie argv[0] -->
<command>string</command>
<arg>string</arg> <!-- further optional argument argv[1] -->
<!-- ... -->
<arg>string</arg> <!-- further optional argument argv[n] -->
<!-- alternatively the program can be specified as it would be invoked
in a shell although no meta characters are supported -->
<command-line>string</command-line>
<!-- if present, the submitted job is to be deferred for execution at the specified timestamp.
Use the ISO 8601 format, eg yyyy-mm-dd [hh:mm:ss] [{+|-}hh]. Default is to submit right away. -->
<submit-time>string</submit-time>
<!-- number of times this job will be resubmitted if not successful, ie.
whenever its exit code is non-zero. default: 0 -->
<retries>int</retries>
<!-- number of seconds the job will be postponed before being retried. It's only meaningful if <retries>
has been specified. default: 0 -->
<retry-period>int</retry-period>
<!-- if true allows this job to be submitted on another qed host if it is
deemed convenient according to the load balancing algorithm. default: false -->
<redirect>bool</redirect>
<!-- email address used for mail notification -->
<notify-rctp>string</notify-rctp>
<!-- ... -->
<notify-rctp>string</notify-rctp> <!-- further optional address -->
<!-- job priority. queued jobs are maintained in descending priority order
so misuse of this parameter could lead to resource starvation of the lower
priority jobs. default: 0 -->
<priority>int</priority>
<!-- resource limits description: see the discussion in Resource usage control -->
<limits><!-- ... --></limits>
</submit-request>
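For those writing their own client, a request like the one above can be assembled with any XML library. The Python sketch below builds one with the standard xml.etree.ElementTree module; the function name and its parameters are made up for the example, and it deliberately says nothing about how the request is framed on the wire, since that is not covered here.

```python
import xml.etree.ElementTree as ET


def build_submit_request(argv, user=None, cred=None, submit_time=None,
                         retries=None, retry_period=None, notify=()):
    """Serialise a submit-request; optional fields are emitted only if given."""
    root = ET.Element("submit-request")
    if user is not None:
        auth = ET.SubElement(root, "auth-info")
        ET.SubElement(auth, "user").text = user
        ET.SubElement(auth, "cred").text = cred or ""
    ET.SubElement(root, "command").text = argv[0]      # argv[0]
    for arg in argv[1:]:                               # argv[1] .. argv[n]
        ET.SubElement(root, "arg").text = arg
    if submit_time is not None:
        ET.SubElement(root, "submit-time").text = submit_time
    if retries is not None:
        ET.SubElement(root, "retries").text = str(retries)
    if retry_period is not None:
        ET.SubElement(root, "retry-period").text = str(retry_period)
    for address in notify:
        ET.SubElement(root, "notify-rctp").text = address
    return ET.tostring(root, encoding="unicode")


print(build_submit_request(["wget", "-O", "-", "http://some.host"], retries=2))
```

Letting the library serialise the tree takes care of escaping characters such as < and & in arguments, which is easy to get wrong in a hand-written shell client.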
<submit-response>
<!-- error code. 0 if successful -->
<status>int</status>
<!-- an error message, if that is the case -->
<reason>string</reason>
<!-- job id -->
<jid>int</jid>
<!-- uri of the qed host the request has been redirected to, if applicable -->
<redir-uri>string</redir-uri>
</submit-response>
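On the client side, handling this response boils down to checking status before trusting jid. A minimal Python sketch follows, assuming the response arrives as one complete XML document and that a client prefers an exception over an error code; both are choices made for the example, not requirements of the protocol.

```python
import xml.etree.ElementTree as ET


def parse_submit_response(text):
    """Return (jid, redir_uri) on success, raise RuntimeError otherwise."""
    root = ET.fromstring(text)
    status = root.findtext("status")
    if status is None or int(status) != 0:
        # reason is only expected to be present when something went wrong
        raise RuntimeError(root.findtext("reason", default="request failed"))
    jid = root.findtext("jid")
    # redir-uri is None unless the request was redirected to another host
    return (int(jid) if jid is not None else None), root.findtext("redir-uri")


reply = "<submit-response><status>0</status><jid>42</jid></submit-response>"
print(parse_submit_response(reply))
```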
<cancel-request>
<auth-info> <!-- authentication descriptor: optional -->
<user>string</user>
<cred>string</cred> <!-- credential, typically a password -->
</auth-info>
<!-- job id to cancel -->
<jid>int</jid>
</cancel-request>
<cancel-response>
<!-- error code. 0 if successful -->
<status>int</status>
<!-- an error message, if that is the case -->
<reason>string</reason>
</cancel-response>
<queue-stat-request>
<auth-info> <!-- authentication descriptor: optional -->
<user>string</user>
<cred>string</cred> <!-- credential, typically a password -->
</auth-info>
<!-- if true this request is multicast to every qed host in order to retrieve
job status from any job in the farm submitted by this very user
default: false -->
<global>bool</global>
</queue-stat-request>
If global has a 0 (false)
value, only information about jobs submitted to the specified QED host will be returned.
The format is as follows:
<queue-stat-response>
<!-- error code. 0 if successful -->
<status>int</status>
<!-- an error message, if that is the case -->
<reason>string</reason>
<!-- job id -->
<jid>int</jid>
<uri>localhost</uri> <!-- literally "localhost" but it really doesn't matter -->
<entry> <!-- one entry per job -->
<!-- job's corresponding executable, argv[0] -->
<command>string</command>
<arg>string</arg> <!-- argv[1] -->
<!-- ... -->
<arg>string</arg> <!-- argv[n] -->
<submit-time>string</submit-time> <!-- submission timestamp -->
<running>bool</running> <!-- whether job is running or still pending -->
<start-time>string</start-time> <!-- timestamp of process creation if running -->
<pid>int</pid> <!-- corresponding process id if job is running -->
<retries>int</retries> <!-- current retry count if specified at submission -->
</entry>
</queue-stat-response>
If global has a true (non-zero) value, the response will look like this:
<queue-stat-response>
<!-- error code. 0 if successful -->
<status>int</status>
<!-- an error message, if that is the case -->
<reason>string</reason>
<!-- job id -->
<jid>int</jid>
<peer> <!-- one entry per QED host that currently has jobs submitted by the
user -->
<uri>string</uri> <!-- QED host address and port -->
<entry> <!-- one entry per job -->
<!-- job's corresponding executable, argv[0] -->
<command>string</command>
<arg>string</arg> <!-- argv[1] -->
<!-- ... -->
<arg>string</arg> <!-- argv[n] -->
<submit-time>string</submit-time> <!-- submission timestamp -->
<running>bool</running> <!-- whether job is running or still pending -->
<start-time>string</start-time> <!-- timestamp of process creation if running -->
<pid>int</pid> <!-- corresponding process id if job is running -->
<retries>int</retries> <!-- current retry count if specified at submission -->
</entry>
</peer>
</queue-stat-response>
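A client that wants to treat both shapes of the response uniformly can normalise them into one mapping from host to jobs. The Python sketch below does just that; the function names are hypothetical, boolean and numeric fields are kept as raw strings because their exact encoding is not specified here, and the status check is left out for brevity.

```python
import xml.etree.ElementTree as ET


def _job(entry):
    """Turn one entry element into a small dictionary."""
    return {
        "argv": [entry.findtext("command")] + [a.text for a in entry.findall("arg")],
        "running": entry.findtext("running"),
        "pid": entry.findtext("pid"),
    }


def jobs_by_host(text):
    """Map each host uri to its list of jobs, for local and global replies."""
    root = ET.fromstring(text)
    peers = root.findall("peer")
    if peers:
        # global form: one peer element per host, entries nested inside
        return {p.findtext("uri"): [_job(e) for e in p.findall("entry")] for p in peers}
    # local form: uri and entry elements sit directly under the root
    # (a global reply without any peer also ends up here)
    return {root.findtext("uri", default="localhost"): [_job(e) for e in root.findall("entry")]}


local = ("<queue-stat-response><status>0</status><uri>localhost</uri>"
         "<entry><command>sleep</command><arg>60</arg><running>1</running>"
         "<pid>123</pid></entry></queue-stat-response>")
print(jobs_by_host(local))
```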
<job-stat-request>
<auth-info> <!-- authentication descriptor: optional -->
<user>string</user>
<cred>string</cred> <!-- credential, typically a password -->
</auth-info>
<!-- if true this request is multicast to every qed host in order to retrieve
job status from any job in the farm submitted by this very user
default: false -->
<global>bool</global>
</job-stat-request>
<job-stat-response>
<!-- error code. 0 if successful -->
<status>int</status>
<!-- an error message, if that is the case -->
<reason>string</reason>
<!-- if status is 0, the following fields are the ones found in the completion certificate. -->
</job-stat-response>
<error>
<msg>string</msg> <!-- an error message -->
</error>
Furthermore, when QED runs as root, one additional subdirectory per user will be created. It has the appropriate
ownership and a permission mode of 0700, which is to say, only accessible by the owner.
This subdirectory will contain all certificates pertaining to jobs submitted by that user.
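Put together with the description of spool-dir in the configuration file, the directory holding a given certificate can be worked out as in the Python sketch below. The function is hypothetical and stops at the directory, because the naming of the certificate files themselves is not documented here.

```python
import posixpath


def certificate_dir(spool_dir, exit_status, user=None):
    """Directory expected to hold the certificate of a finished job.

    Successful jobs (exit status 0) go under results, failed ones under
    errors. user is only given when QED runs as root, in which case each
    submitter gets a subdirectory of their own.
    """
    outcome = "results" if exit_status == 0 else "errors"
    parts = [spool_dir, outcome]
    if user is not None:
        parts.append(user)
    return posixpath.join(*parts)


print(certificate_dir("/var/spool/qed", 0))
print(certificate_dir("/var/spool/qed", 1, "alice"))
```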
An exit status is considered a success if it is zero, and a failure otherwise.
There are myriad reasons for a job to abort, including crashes and, most notably, forced exits when resource usage control is in force and a process has hit a limit. A process may also be coerced into a premature exit when QED receives a TERM signal.
The format of the completion record is as follows:
<job>
<!-- job id -->
<jid>int</jid>
<!-- program file: argv[0] -->
<command>string</command>
<arg>string</arg> <!-- argv[1] -->
<!-- ... -->
<arg>string</arg> <!-- argv[n] -->
<!-- user id -->
<uid>int</uid>
<!-- group id -->
<gid>int</gid>
<start-time>string</start-time> <!-- timestamp of process creation -->
<pid>int</pid> <!-- process id -->
<exit-time>string</exit-time> <!-- timestamp of process exit -->
<!-- exit status -->
<status>int</status>
<!-- captured dump of process stdout up to a configurable limit -->
<stdout>string</stdout>
<!-- captured dump of process stderr up to a configurable limit -->
<stderr>string</stderr>
<!-- message explaining the reason for a forced exit, if applicable -->
<coerced-exit>string</coerced-exit>
</job>
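Since a certificate is plain XML, post-processing it takes only a few lines. Here is a Python sketch that extracts the fields most scripts will care about; the function name and the shape of the returned dictionary are choices made for this example, and the sample certificate is invented.

```python
import xml.etree.ElementTree as ET


def summarise_certificate(text):
    """Extract the essentials from a completion certificate."""
    root = ET.fromstring(text)
    status = int(root.findtext("status"))
    return {
        "jid": int(root.findtext("jid")),
        "argv": [root.findtext("command")] + [a.text for a in root.findall("arg")],
        "status": status,
        "succeeded": status == 0,          # zero exit status means success
        # None unless QED forced the job to exit
        "coerced_exit": root.findtext("coerced-exit"),
    }


sample = ("<job><jid>17</jid><command>make</command><arg>all</arg>"
          "<uid>1000</uid><gid>1000</gid><pid>4242</pid><status>2</status>"
          "<stdout></stdout><stderr>make: *** No rule</stderr></job>")
print(summarise_certificate(sample))
```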
<config>
<!--
Top of the directory hierarchy used by QED for bookkeeping purposes, namely job queues
and completion certificates. The latter will be stored in spool-dir/results and
spool-dir/errors, for successful and failed jobs respectively. When QED runs as root
these will be found instead at spool-dir/results/user and spool-dir/errors/user,
respectively. Naturally, user refers to the username of the job submitter.
Defaults to "/var/spool/qed" -->
<spool-dir>string</spool-dir>
<!-- a network endpoint QED will listen on. It has the uri-like format
xml://address:port.
This is a mandatory parameter and can be specified multiple times to establish more
than one listening address -->
<listener>string</listener>
<!-- ... -->
<listener>string</listener>
<!-- value of the PATH environment variable to be passed to every spawned process.
Defaults to "/bin:/usr/bin" -->
<path>string</path>
<!-- full path of the mailer program used for notification of job completion.
Defaults to "/usr/sbin/sendmail" -->
<mailer>string</mailer>
<!-- thread-pool related directives -->
<thread-pool>
<!-- maximum number of threads in the pool. Default: 4 -->
<max-threads>int</max-threads>
<!-- if false specifies that the threads should only be created when needed.
Default is true, meaning that all threads are created upon QED startup.-->
<on-demand>bool</on-demand>
</thread-pool>
<!-- network related directives -->
<network>
<!-- timeout in msec for socket I/O. Default: 5000 (5s) -->
<recv-timeout>int</recv-timeout>
<!-- size in bytes of the internal buffers used for stream socket I/O.
Default: 8192 (8KB) -->
<stream-buffer-size>int</stream-buffer-size>
<!-- largest amount of bytes allocated by QED when parsing a request. This is
to avoid insanely malformed requests. Default: 65536 (64KB) -->
<max-alloc>int</max-alloc>
<!-- multicast endpoint used for internal communication among multiple QED hosts.
Its format is mcast-address:port (eg 224.0.0.254:1025). Default: none -->
<mcast_group>string</mcast_group>
</network>
<!-- process related directives -->
<process>
<!-- maximum number of bytes of captured process output to
stdout or stderr. Defaults to 0, ie no limit. It might be wise to modify this
value, though. -->
<max-capture>int</max-capture>
<!-- maximum number of running jobs at a time. Default: 4 -->
<max-workers>int</max-workers>
<!-- QED's process monitor polling period in msec. This affects several important
core actions like waiting for finished processes, gathering of statistics and
resource usage, etc. Default: 10000 -->
<poll-tick>int</poll-tick>
<!-- resource limits descriptor: see discussion in Resource usage control -->
<limits>
<time-quota>int</time-quota>
<cpu>int</cpu>
<vsize>int</vsize>
<fsize>int</fsize>
<nproc>int</nproc>
<nofile>int</nofile>
<pcpu-params> <!-- cpu usage descriptor -->
<!-- job consumption of cpu time as a percentage of the
elapsed cpu time during a polling period -->
<value>int</value>
<!-- number of times (polling intervals) a job is allowed to
keep the above cpu usage before being penalised. Default: 0 -->
<multiplicity>int</multiplicity>
<!-- whether a job is to be either terminated (killed) or
suspended if it exceeds its quota. Default: true -->
<terminate>bool</terminate>
</pcpu-params>
<bw> <!-- NB joint and the pair (in,out) are mutually exclusive.
joint will always take precedence -->
<global> <!-- overall bandwidth limit -->
<joint>int</joint> <!--
joint, ie. both inbound and outbound, byte-rate -->
<in>int</in>
<out>int</out>
</global>
<net> <!-- socket bandwidth limit -->
<joint>int</joint>
<in>int</in>
<out>int</out>
</net>
<block> <!-- block device (typically disk) bandwidth limit -->
<joint>int</joint>
<in>int</in>
<out>int</out>
</block>
<chr> <!-- character device bandwidth limit -->
<joint>int</joint>
<in>int</in>
<out>int</out>
</chr>
</bw>
</limits>
</process>
</config>
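To see how the documented defaults fit together, the Python sketch below reads a configuration and fills in whatever is missing. It is a hypothetical helper, not part of QED: the DEFAULTS table only repeats the defaults stated in the comments above, values are kept as strings, and the listener address in the example is made up.

```python
import xml.etree.ElementTree as ET

# Defaults as documented above; keys are paths relative to the config element.
DEFAULTS = {
    "spool-dir": "/var/spool/qed",
    "path": "/bin:/usr/bin",
    "mailer": "/usr/sbin/sendmail",
    "thread-pool/max-threads": "4",
    "network/stream-buffer-size": "8192",
    "network/max-alloc": "65536",
    "process/max-capture": "0",
    "process/max-workers": "4",
    "process/poll-tick": "10000",
}


def load_settings(text):
    """Return the settings of a config document with defaults applied."""
    root = ET.fromstring(text)
    listeners = [node.text for node in root.findall("listener")]
    if not listeners:
        # listener is the only mandatory parameter
        raise ValueError("at least one listener element is required")
    settings = {key: root.findtext(key, default=value) for key, value in DEFAULTS.items()}
    settings["listener"] = listeners
    return settings


minimal = ("<config><listener>xml://0.0.0.0:3345</listener>"
           "<process><max-workers>8</max-workers></process></config>")
print(load_settings(minimal))
```

As the example suggests, a configuration holding nothing but a listener should be enough to get started, with everything else falling back to its default.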