On Fri, 17 Feb 2006, Graeme Davis wrote:
> It hangs the whole server and I have a script that will kill -USR1 backtrace
> & restart Roxen when it's hung. Looking at some of the debug logs, it seems
> like most threads are hung on destroy() calls. But it's hung on calls to
> my local MySQL dbs which I know are up, so something is causing everything
> to hang.... Could it be the create() call that locks stuff?
The locking is probably done in DBManager.pmod, which means that the
lock is held for too long.
> Background: "PERPT" is the shoddy Oracle DB that goes down a lot.
>
> 15:51:00 : __builtin.mutex: lock()
> 14m38.7s : base_server/roxenloader.pike:1413: SQL( "mysql;//u:<p[at]h>/etms:-"
> )->destroy()
> : base_server/emit_object.pike:61: get_row()
>
> : ### Thread 11:
> : __builtin.mutex: lock()
> : base_server/roxenloader.pike:1413: SQL( "local:rw" )->destroy()
> : base_server/roxen.pike:5038:
> roxen->compile_security_pattern("",RoxenModule(CCARE/email#0))
>
> : ### Thread 14:
> : pike/lib/pike/modules/Sql.pmod/oracle.pike:
> create("PERPT","","u","p")
> : pike/lib/pike/modules/Sql.pmod/Sql.pike:223:
> create("PERPT",0,"u","p",0)
> 15:51:00 : etc/modules/DBManager.pmod:243:
> sql_cache_get("oracle://u:<p[at]PERPT>")
Ok, to me the above looks like thread 14 has taken the sq_cache_lock and
is still holding it while it sits in the Oracle create() call for PERPT,
so thread 11 (and others) hang waiting for the lock.
A possible workaround could be to start the server with -DNO_DB_REUSE.
A proper fix would probably involve letting DBManager.sql_cache_get()
release the sq_cache_lock during the call to get_sql_handler().
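To illustrate the idea only (this is Python, not the actual Pike code in
Roxen, and ConnectionCache, connect, get and put are made-up names for
the sake of the example): hold the lock just long enough to look in the
cache, and make the potentially hanging connection attempt after the
lock has been released.

```python
import threading


class ConnectionCache:
    """Hypothetical stand-in for a shared handle cache; not Roxen's code."""

    def __init__(self, connect):
        self._lock = threading.Lock()
        self._idle = {}           # db_url -> list of idle handles
        self._connect = connect   # assumed slow; may block for a long time

    def get(self, db_url):
        # Hold the lock only while touching the shared dict.
        with self._lock:
            idle = self._idle.get(db_url)
            if idle:
                return idle.pop()
        # The lock is released here, so a connect that hangs on one
        # database blocks only this caller, not every user of the cache.
        return self._connect(db_url)

    def put(self, db_url, handle):
        # Returning a handle never waits on a connection attempt.
        with self._lock:
            self._idle.setdefault(db_url, []).append(handle)
```

With this shape a thread stuck connecting to a database that is down
would no longer hold up threads that only want to fetch or return
handles for other databases. The cost is that two threads may open
connections to the same database at the same time, which should be
harmless for a cache of reusable handles.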
> Hope this provides more info on potential solutions =)
>
> Thanks a lot,
>
> Graeme
--
Henrik Grubbström <grubba[at]roxen.com>
Roxen Internet Software AB