setting load limit for atd batch system?

Woozy Song

2024-05-24 02:34:48 UTC

So atd supposedly will not start another job until the load factor falls
below a limit. Different documentation gives the default as 0.8 or 1.5.
Now I launch a job that uses 4 cores on a 6-core CPU. If I run top,
I see four processes running at close to 100%.
If I submit another job 10 seconds later, it starts anyway, thereby
overloading the CPU. The documentation suggests setting the load limit to
more than n-1 for n CPU cores, but I think that is intended for
single-threaded jobs. I have tried altering the load limit in the
atd.service file to all sorts of values, but the second job keeps starting
while the first is flogging the CPU. I check with 'ps -ef|grep atd' that
atd is using the desired load limit. I am aware that the load factor is an
average; you can see it change slowly in top/htop/glances. So I also
increased the delay between jobs to 30 seconds, but still nothing works.
So it looks like I have to specify a time like 'now+60 minutes' when I
submit, which requires guessing how long the first job runs. I know I can
install a proper job scheduler such as Some Grid Engine, but that is more
work. This is on Debian 11, by the way.
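
For readers following along: the rule described above amounts to comparing
the load average against the limit given to atd with -l. Below is a minimal
sketch of that test, not atd's actual code. It assumes Linux
(/proc/loadavg) and assumes the 1-minute figure is the one compared;
load_1min and below_limit are made-up helper names.

```shell
#!/bin/sh
# Sketch of the load test that gates batch jobs (an illustration only).

# Print the 1-minute load average: the first field of /proc/loadavg on Linux.
load_1min() {
    cut -d ' ' -f 1 /proc/loadavg
}

# Succeed (exit status 0) if load $1 is strictly below limit $2.
# awk does the floating-point comparison that plain sh lacks.
below_limit() {
    awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load + 0 < limit + 0) }'
}

# Example use, with 1.5 as the limit:
#   if below_limit "$(load_1min)" 1.5; then
#       echo "load is low enough for a batch job"
#   fi
```

With one 4-thread job running, the load sits near 4, so a limit of 1.5
would already hold back a second job, provided the job is in a queue that
is subject to the limit at all.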

Woozy Song

2024-05-24 04:24:05 UTC

I found the trick: you have to add '-q B' to the at command; then the
load-limit rule applies (it behaves like the batch command instead of at).
Otherwise it uses the default queue 'a', which only goes by the time and
ignores the load limit.
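
A small sketch of what such a submission can look like. submit_batch and
DRY_RUN are hypothetical names made up for this example, and ./job.sh
stands in for the real job. According to at(1), queue b is the batch queue
and a job sent to an uppercase queue is treated as if it had been submitted
to batch, so either letter should put the job under the load limit.

```shell
#!/bin/sh
# Submit a command to a load-limited at queue.
# With DRY_RUN=1 the command line is only printed, nothing is submitted.

# submit_batch JOB_COMMAND [QUEUE]   (QUEUE defaults to b, the batch queue)
submit_batch() {
    job=$1
    queue=${2:-b}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'echo %s | at -q %s now\n' "$job" "$queue"
    else
        # at reads the job's commands from standard input.
        printf '%s\n' "$job" | at -q "$queue" now
    fi
}

# Example use:
#   DRY_RUN=1 submit_batch ./job.sh     # show what would be run
#   submit_batch ./job.sh B             # submit for real, uppercase queue
```

Afterwards atq shows each pending job together with its queue letter, which
is a quick way to check that the job did not land in queue a.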

Andreas Eder

2024-05-24 15:25:28 UTC

Post by Woozy Song
I found the trick: you have to add '-q B' to command, then load-limit rule
applies (it behaves like batch command instead of at). Otherwise it uses
default queue 'a' that only uses time without load limit.

I think it is '-q b', the lowercase letter b for the batch queue (a is for at).
The other letters are not used by default and just serve to indicate niceness.

'Andreas

--
ceterum censeo redmondinem esse delendam

Rich

2024-05-24 16:18:21 UTC

An alternative to using at and batch (batch is what observes the load
limit, by the way) is to install Task Spooler and use it for 'background
jobs'. You can tell it to run jobs sequentially, or at most X in parallel
(you get to pick X).

https://viric.name/soft/ts/

You can also submit jobs that "depend upon" other jobs, so that the
dependent job won't run until the "parent" completes successfully.
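
A rough sketch of the sequential and dependent setup described above. It
assumes the Debian task-spooler package, which installs the program as tsp
(upstream calls it ts). queue_chain and TSP_BIN are hypothetical names made
up for this example; -S sets the number of simultaneous jobs and -d makes a
job depend on the one queued just before it.

```shell
#!/bin/sh
# Queue commands in Task Spooler so they run one at a time, each only
# if the previously queued job finished successfully.

TSP=${TSP_BIN:-tsp}   # path or name of the Task Spooler client

# queue_chain CMD...   (each CMD is a shell command line)
queue_chain() {
    "$TSP" -S 1                      # at most one job running at a time
    first=1
    for cmd in "$@"; do
        if [ "$first" = 1 ]; then
            "$TSP" sh -c "$cmd"      # first job has nothing to depend on
            first=0
        else
            "$TSP" -d sh -c "$cmd"   # runs only if the job before ended well
        fi
    done
}

# Example use:
#   queue_chain './first_job.sh' './second_job.sh'
```

Running tsp with no arguments afterwards lists the queue and the state of
each job.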