Discussion:
setting load limit for atd batch system?
Woozy Song
2024-05-24 02:34:48 UTC
atd supposedly will not start another job until the load factor falls
below a limit. Different documentation gives the default as 0.8 or 1.5.
Now I launch a job that uses 4 cores on a 6-core CPU. If I run top,
I see four processes running at close to 100%.
If I submit another job 10 seconds later, it starts anyway, thereby
overloading the CPU. Documentation suggests setting the load limit to more
than n-1 for n CPU cores, but I think that is intended for single-threaded
jobs. I have tried altering the load limit in the atd.service file to all
sorts of values, but the second job keeps starting while the first is
flogging the CPU. I check with 'ps -ef|grep atd' that atd is using the
desired load limit. I am aware that the load factor is an average; you
can see it change slowly in top/htop/glances. So I also increased the
delay between jobs to 30 seconds, but still nothing works. So it looks
like I have to specify a time like 'now+60 minutes' when I submit, which
requires guessing how long the first job runs. I know I can install a
proper job scheduler such as Some Grid Engine, but that is more work.
This is on Debian 11, by the way.
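
For reference, the rule described in the atd man page can be sketched as
a plain comparison: batch jobs are held while the load is at or above the
limit. The function names below are made up for illustration, and reading
the 1-minute figure from /proc/loadavg is an assumption about which
average atd looks at. With four busy processes the load settles near 4,
so a limit above n-1 = 5 would not hold a second job back, while a limit
below 4 would.

```shell
#!/bin/sh
# Sketch only: mimics the documented rule, it does not query atd itself.

# 1-minute load average, first field of /proc/loadavg (Linux-specific).
current_load() {
    cut -d ' ' -f 1 /proc/loadavg
}

# Prints "run" if load < limit, otherwise "hold".
batch_decision() {
    awk -v load="$1" -v limit="$2" \
        'BEGIN { if (load + 0 < limit + 0) print "run"; else print "hold" }'
}

# Example: batch_decision "$(current_load)" 1.5
```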
Woozy Song
2024-05-24 04:24:05 UTC
I found the trick: you have to add '-q B' to the at command; then the
load-limit rule applies (it behaves like the batch command instead of at).
Otherwise it uses the default queue 'a', which only uses the time, without
the load limit.
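
A minimal sketch of that submission, assuming at reads the job from
standard input and accepts 'now' as the time. The helper name and the AT
variable are hypothetical; AT only exists so the function can be tried
against a stand-in when no atd is running.

```shell
#!/bin/sh
# AT is overridable so the helper can be exercised without a real atd.
AT=${AT:-at}

# Hypothetical helper: submit one command line to the uppercase queue B,
# so that atd treats it like a batch job and applies the load limit.
submit_load_limited() {
    printf '%s\n' "$1" | "$AT" -q B now
}

# Example (needs at installed and atd running):
#   submit_load_limited './run_simulation.sh'
```

Whether the job is then held depends on the limit atd was started with
(its -l option) and on the load at that moment.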
Andreas Eder
2024-05-24 15:25:28 UTC
I think it is '-q b', with a lowercase b for the batch queue (a is for at).
The other letters are not used by default and just serve to indicate niceness.
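
As a rough summary of how at(1) describes the queue letters (my reading
of the man page, not checked against the atd source): 'a' is the at
default, 'b' the batch default, higher letters run with more niceness,
'=' is reserved for running jobs, and an uppercase letter makes the job
follow the batch rules once its time is reached. The function name is
made up for illustration.

```shell
#!/bin/sh
# Sketch: report which rule a given at queue letter is documented to follow.
queue_rules() {
    case $1 in
        a) echo "time only" ;;
        b) echo "load limit" ;;
        =) echo "reserved for running jobs" ;;
        [[:upper:]]) echo "load limit" ;;
        [[:lower:]]) echo "time only, increased niceness" ;;
        *) echo "invalid" ;;
    esac
}

# Example: queue_rules B
```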

'Andreas
--
ceterum censeo redmondinem esse delendam
Rich
2024-05-24 16:18:21 UTC
An alternative to using at and batch (batch is what observes the load
limit, by the way) is to install Task Spooler and use it for 'background
jobs'. You can tell it to run jobs sequentially, or at most X in parallel
(you get to pick X).

https://viric.name/soft/ts/

You can also submit jobs that "depend upon" other jobs, so that the
dependent job won't run until the "parent" completes successfully.
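
A minimal sketch of the sequential case, assuming the Debian package
name for the command (tsp; upstream calls it ts) and its -S option for
the number of simultaneous jobs. The helper name and the TS variable are
hypothetical. The dependency options differ between versions, so they are
left out here; 'tsp -h' lists what the installed version supports.

```shell
#!/bin/sh
# TS is overridable; Debian installs Task Spooler as "tsp".
TS=${TS:-tsp}

# Hypothetical helper: allow one job at a time, then queue each argument
# as its own job. Jobs run in submission order.
queue_sequential() {
    "$TS" -S 1 || return 1
    for job in "$@"; do
        "$TS" sh -c "$job" || return 1
    done
}

# Example (needs task-spooler installed):
#   queue_sequential './first_job.sh' './second_job.sh'
```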