Commit 659fcfec authored by Xavier Morel, committed by fw-bot

[FIX] core: SIGXCPU in worker processes


odoo/odoo#30688 (6ce2d6ef) added an
indirection in prefork workers: because Python-level signal handlers
are delayed until native calls (e.g. accept() or execute()) have
returned, running the actual work in a sub-thread allows the main
thread to handle signals in all cases.
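
For illustration, a minimal sketch of that pattern (not the actual
server code; time.sleep stands in for the long native call):

    import signal, threading, time

    def handler(sig, frame):
        # Python-level handlers only run in the main thread, between
        # bytecode instructions; they are delayed while that thread sits
        # inside a native call.
        raise KeyboardInterrupt

    signal.signal(signal.SIGINT, handler)

    def work():
        # stand-in for the long native call (accept(), execute(), ...)
        time.sleep(60)

    t = threading.Thread(target=work, daemon=True)
    t.start()
    # The main thread only waits here, so SIGINT is handled promptly;
    # Thread.join has been signal-interruptible since Python 3.2.
    t.join()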

However there is apparently an issue with SIGXCPU on Linux (possibly
in other cases as well): SIGXCPU is delivered to the child thread (if
possible?) and Thread.join apparently stops it from being redelivered
to the main thread (Thread.join has been signal-interruptible since
3.2, but possibly not Python-interruptible).

Blocking SIGXCPU on the child thread causes the OS to deliver it to
the main thread instead, which fixes the issue.
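
The mechanism can be shown with a small standalone sketch (using
SIGUSR1 as an easier-to-trigger stand-in for SIGXCPU): once the signal
is blocked in the worker thread, a process-directed signal can only be
delivered to the main thread, where the Python-level handler runs.

    import os, signal, threading

    got = threading.Event()
    signal.signal(signal.SIGUSR1, lambda sig, frame: got.set())

    def worker():
        # Block the signal in this thread: a process-directed signal can
        # then only be delivered to the main thread, which runs the
        # Python handler.
        signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
        os.kill(os.getpid(), signal.SIGUSR1)

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    assert got.is_set()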

Also split set_limits so it sets the signal handler in the parent
thread but properly updates the soft limit in the child after each
request, as the goal is to put a hard limit on the CPU time per
request, not on the worker as a whole. 6ce2d6ef would set the limit
once and never update it, likely cycling workers more often than
desired.
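
The per-request budget boils down to moving the RLIMIT_CPU soft limit
ahead of the CPU time already consumed, so SIGXCPU only fires if a
single unit of work exceeds the configured allowance. A minimal sketch
of that update (limit_time_cpu stands for the configured per-request
budget in seconds):

    import resource

    def reset_cpu_budget(limit_time_cpu):
        # CPU seconds (user + system) consumed by this process so far
        r = resource.getrusage(resource.RUSAGE_SELF)
        cpu_time = r.ru_utime + r.ru_stime
        # Move the soft limit past what is already spent; the kernel
        # raises SIGXCPU once the soft limit is crossed, the hard limit
        # is left untouched.
        soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
        resource.setrlimit(resource.RLIMIT_CPU,
                           (int(cpu_time + limit_time_cpu), hard))

Calling this after each request keeps the limit per unit of work
rather than per worker lifetime.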

While at it:

* block other signals that have a handler set; they seem to work
  regardless on Linux, but other OSes may dispatch process-directed
  signals differently
* unset signals which are set by the prefork server but whose
  behavior makes no sense in workers:

  - TERM and CHLD were already unset
  - HUP is used to restart the server; workers can just be killed
  - TTIN and TTOU configure the number of workers

closes odoo/odoo#39723

X-original-commit: 549bd199
Signed-off-by: Xavier Morel (xmo) <xmo@odoo.com>
parent bb490301
@@ -883,6 +883,13 @@ class Worker(object):
    def signal_handler(self, sig, frame):
        self.alive = False

    def signal_time_expired_handler(self, n, stack):
        # TODO: print actual RUSAGE_SELF (since last check_limits) instead of
        # just repeating the config setting
        _logger.info('Worker (%d) CPU time limit (%s) reached.', self.pid, config['limit_time_cpu'])
        # We dont suicide in such case
        raise Exception('CPU time limit exceeded.')

    def sleep(self):
        try:
            select.select([self.multi.socket, self.wakeup_fd_r], [], [], self.multi.beat)
@@ -909,15 +916,9 @@ class Worker(object):
        set_limit_memory_hard()

    def set_limits(self):
        # SIGXCPU (exceeded CPU time) signal handler will raise an exception.
        # update RLIMIT_CPU so limit_time_cpu applies per unit of work
        r = resource.getrusage(resource.RUSAGE_SELF)
        cpu_time = r.ru_utime + r.ru_stime
        def time_expired(n, stack):
            _logger.info('Worker (%d) CPU time limit (%s) reached.', self.pid, config['limit_time_cpu'])
            # We dont suicide in such case
            raise Exception('CPU time limit exceeded.')
        signal.signal(signal.SIGXCPU, time_expired)
        soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_time + config['limit_time_cpu'], hard))
@@ -938,11 +939,15 @@ class Worker(object):
            self.multi.socket.setblocking(0)
        signal.signal(signal.SIGINT, self.signal_handler)
        signal.signal(signal.SIGXCPU, self.signal_time_expired_handler)
        signal.signal(signal.SIGTERM, signal.SIG_DFL)
        signal.signal(signal.SIGHUP, signal.SIG_DFL)
        signal.signal(signal.SIGCHLD, signal.SIG_DFL)
        signal.set_wakeup_fd(self.wakeup_fd_w)
        signal.signal(signal.SIGTTIN, signal.SIG_DFL)
        signal.signal(signal.SIGTTOU, signal.SIG_DFL)

        self.set_limits()
        signal.set_wakeup_fd(self.wakeup_fd_w)

    def stop(self):
        pass
@@ -964,14 +969,18 @@ class Worker(object):
            sys.exit(1)

    def _runloop(self):
        signal.pthread_sigmask(signal.SIG_BLOCK, {
            signal.SIGXCPU,
            signal.SIGINT, signal.SIGQUIT, signal.SIGUSR1,
        })
        try:
            while self.alive:
                self.check_limits()
                self.multi.pipe_ping(self.watchdog_pipe)
                self.sleep()
                if not self.alive:
                    break
                self.process_work()
                self.check_limits()
        except:
            _logger.exception("Worker %s (%s) Exception occured, exiting...", self.__class__.__name__, self.pid)
            sys.exit(1)