[Backgroundrb-devel] trouble stopping backgroundrb
John O'Shea
joshea at nooked.com
Thu Sep 18 06:21:00 EDT 2008
Slight variation that
- deletes pid for already-gone processes
- exits (with error code -1) without deleting the pid file if there was
a permission problem
begin
- pgid = Process.getpgid(pid)
- Process.kill('TERM', pid)
- Process.kill('-TERM', pgid)
- Process.kill('KILL', pid)
- rescue Errno::ESRCH => e
- puts "Deleting pid file"
- rescue
+ pgid = Process.getpgid(pid)
+ Process.kill('-TERM', pgid)
+ rescue Errno::ESRCH
+ puts $!
+ # No process - Do nothing.
+ rescue Errno::EPERM
+ # Permission denied.
+ puts $!
+ Process.exit!
ensure
File.delete(arg_pid_file) if File.exists?(arg_pid_file)
end
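For anyone who wants to try the logic outside the script, here is a rough, self-contained sketch of the same idea. The method name stop_by_pid_file and the symbol return values are placeholders made up for illustration and are not part of backgroundrb. Unlike the diff above, it returns a status instead of calling Process.exit!, on the assumption that the caller decides how to exit.

```ruby
# Sketch only: stop_by_pid_file is a hypothetical name, not backgroundrb API.
# Returns :stopped, :already_gone or :not_permitted.
def stop_by_pid_file(pid_file)
  pid = File.read(pid_file).strip.to_i
  begin
    pgid = Process.getpgid(pid)
    # A signal name with a leading minus targets the whole process group.
    Process.kill('-TERM', pgid)
    result = :stopped
  rescue Errno::ESRCH
    # No such process: the pid file is stale, so removing it is safe.
    result = :already_gone
  rescue Errno::EPERM => e
    # Not allowed to signal the process: keep the pid file and report.
    warn e.message
    return :not_permitted
  end
  File.delete(pid_file) if File.exist?(pid_file)
  result
end
```

A caller could then do something like `exit(1) if stop_by_pid_file(x) == :not_permitted` inside the pid_files loop.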
hemant kumar wrote:
> Okay folks here is a patch to "backgroundrb" script, which should fix
> some issues:
>
> diff --git a/script/backgroundrb b/script/backgroundrb
> index dabf80b..8d4bb78 100755
> --- a/script/backgroundrb
> +++ b/script/backgroundrb
> @@ -49,18 +49,9 @@ when 'stop'
> def kill_process arg_pid_file
> pid = nil
> File.open(arg_pid_file, "r") { |pid_handle| pid =
> pid_handle.gets.strip.chomp.to_i }
> - begin
> - pgid = Process.getpgid(pid)
> - Process.kill('TERM', pid)
> - Process.kill('-TERM', pgid)
> - Process.kill('KILL', pid)
> - rescue Errno::ESRCH => e
> - puts "Deleting pid file"
> - rescue
> - puts $!
> - ensure
> - File.delete(arg_pid_file) if File.exists?(arg_pid_file)
> - end
> + pgid = Process.getpgid(pid)
> + Process.kill('-TERM', pgid)
> + File.delete(arg_pid_file) if File.exists?(arg_pid_file)
> end
> pid_files = Dir["#{RAILS_HOME}/tmp/pids/backgroundrb_*.pid"]
> pid_files.each { |x| kill_process(x) }
>
> What it does is:
> 1. Killing by process group id is enough for the master process.
> 2. Do not delete the pid file if there was an exception while stopping
> the daemon.
> 3. Do not handle exceptions silently.
>
> Please try this and let me know how it goes.
>
>
>
> On Wed, 2008-09-17 at 17:35 +0100, John O'Shea wrote:
>
>> Jonathan,
>> Glad you raised this, I've been spending some time trying to
>> diagnose this exact same problem.
>> The exception handling code in the "when 'stop'" block (in
>> script/backgroundrb) could definitely be improved somewhat:
>> - check that the process with 'pid' exists before trying to kill it
>> - rescue permission exceptions (Errno::EPERM)
>> - only delete the pid file if the process pid does not still exist (in
>> ensure block)
>> - be a little more verbose to stdout/stderr
>>
>> While we are on the subject of shutdown, - when the backgroundrb process
>> gets a HUP signal does it wait for existing workers to complete any work
>> methods that are executing or is the 'Process.kill('-TERM', pgid)' call
>> intended to make the OS handle this?
>>
>> We use capistrano to deploy our application (stopping and restarting
>> backgroundrb after the rails app has been updated). It would be great
>> if we could have more predictability regarding shutting down
>> backgroundrb (i.e. have backgroundrb disable the reactor loop in
>> idle workers and wait for all active workers to finish methods, then
>> shut down).
>>
>> John.
>>
>> Jonathan Wallace wrote:
>>
>>> Hi Ryan,
>>>
>>> I recently ran into the same issue where the backgroundrb process
>>> would not respond to ./script/backgroundrb stop command. The pid file
>>> was being deleted but the actual process was not being killed. I'm
>>> running packet 0.1.12 on gentoo.
>>>
>>> I'm not exactly sure what conditions put backgroundrb into such a
>>> state but I've decided to modify the script/backgroundrb to behave a
>>> little differently.
>>>
>>> My hypothesis is that if one of the Process.kill method calls in
>>> script/backgroundrb raises an exception, the pid file is deleted even
>>> though the kill signal is never sent. At this point, starting
>>> and stopping backgroundrb never affects the original still-running
>>> backgroundrb process.
>>>
>>> There are a couple of reasons that I believe an exception could be
>>> raised. Either Process.getpgid(pid), Process.kill('TERM', pid) or
>>> Process.kill('-TERM', pgid) raises an exception, or the effective
>>> uid of the user running script/backgroundrb stop does not have
>>> permission to kill those processes.
>>>
>>> To fix this, we've removed the Process.getpgid and the two
>>> Process.kill's that are sending the TERM signal. Since we've
>>> architected our backgroundrb jobs to be persistent and idempotent (a
>>> db backed queue written before the feature appeared in bdrb), we'll
>>> just use the KILL signal.
>>>
>>> Thoughts?
>>>
>>> Thanks,
>>> Jonathan
>>>
>>> On Tue, Sep 16, 2008 at 12:11 PM, Ryan Case <mrryancase at gmail.com> wrote:
>>>
>>>
>>>> Hi folks -
>>>>
>>>> I'm having trouble getting backgroundrb to stop after one of the
>>>> packet_worker_r processes dies.
>>>>
>>>> If backgroundrb is running properly,
>>>> "/path/to/application/script/backgroundrb stop" works fine, but often
>>>> one of the packet_worker_r processes dies, and the stop command no
>>>> longer works after that (it runs, but it does not stop the processes,
>>>> and so then start doesn't work).
>>>>
>>>> The only thing that seems to work at that point is to manually kill
>>>> the processes that are still running, and then the start works, but
>>>> that is going to make restarting via monit a lot less clean.
>>>>
>>>> Any ideas would be much appreciated!
>>>>
>>>> I'm using the github version of backgroundrb, and packet 0.1.13 running on ubuntu.
>>>>
>>>> Thanks!
>>>> Ryan
>>>> _______________________________________________
>>>> Backgroundrb-devel mailing list
>>>> Backgroundrb-devel at rubyforge.org
>>>> http://rubyforge.org/mailman/listinfo/backgroundrb-devel
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
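On the earlier suggestions in the thread (check that the process exists, and only delete the pid file once it is really gone), a minimal sketch might look like the following. The helper names process_alive? and wait_for_exit, and the timeout and interval defaults, are assumptions for illustration only; nothing here is taken from backgroundrb or packet.

```ruby
# Sketch only: process_alive? and wait_for_exit are hypothetical helpers.

# Signal 0 runs the existence and permission checks without
# actually delivering a signal to the process.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false   # no such process
rescue Errno::EPERM
  true    # the process exists, but we are not allowed to signal it
end

# Poll until the process is gone or the timeout (in seconds) runs out.
# Returns true if the process exited in time, false otherwise.
def wait_for_exit(pid, timeout = 10, interval = 0.1)
  deadline = Time.now + timeout
  while process_alive?(pid)
    return false if Time.now >= deadline
    sleep interval
  end
  true
end
```

With something like this, the stop block could send TERM to the group and then run `File.delete(arg_pid_file) if wait_for_exit(pid)`, leaving the pid file in place when the daemon is still running. Note that a child which has exited but not yet been reaped still counts as alive for signal 0; that should not matter for the stop script, since the daemon is not its child.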
--
John O'Shea, CTO at Nooked
www: http://www.nooked.com/
cell: +353 87 992 9959
skype: joshea