roxen.lists.roxen.general

Subject: RE: database timeouts?
Author: Henrik Grubbström <grubba[at]roxen[dot]com>
Date: 17-02-2006
On Fri, 17 Feb 2006, Graeme Davis wrote:

> It hangs the whole server and I have a script that will kill -USR1 backtrace
> & restart Roxen when it's hung.  Looking at some of the debug logs, it seems
> like most threads are hung on destroy() calls.   But it's hung on calls to
> my local MySQL dbs which I know are up, so something is causing everything
> to hang.... Could it be the create() call that locks stuff?

The locking has probably been done in DBManager.pmod, which means that the 
lock is being held for too long.

> Background: "PERPT" is the shoddy Oracle DB that goes down a lot.
>
> 15:51:00  : __builtin.mutex: lock()
> 14m38.7s  : base_server/roxenloader.pike:1413: SQL( "mysql;//u:<p[at]h>/etms:-" )->destroy()
>          : base_server/emit_object.pike:61: get_row()
>
>          : ### Thread 11:
>          : __builtin.mutex: lock()
>          : base_server/roxenloader.pike:1413: SQL( "local:rw" )->destroy()
>          : base_server/roxen.pike:5038: roxen->compile_security_pattern("",RoxenModule(CCARE/email#0))
>
>          : ### Thread 14:
>          : pike/lib/pike/modules/Sql.pmod/oracle.pike: create("PERPT","","u","p")
>          : pike/lib/pike/modules/Sql.pmod/Sql.pike:223: create("PERPT",0,"u","p",0)
> 15:51:00  : etc/modules/DBManager.pmod:243: sql_cache_get("oracle://u:<p[at]PERPT>")

Ok, to me the above looks like thread 14 has taken the sq_cache_lock and is 
still holding it while it blocks in the Oracle create() call for PERPT, so 
thread 11 (and others) hang waiting for it.
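
To illustrate the pattern (this is a language-neutral Python sketch, not 
the Pike code in Roxen; all names below are made up for the example): one 
thread takes a shared lock and then blocks in a slow connect, so a second 
thread that only needs the lock briefly cannot get it until the first one 
is done.

```python
import threading
import time


def demo(connect_seconds=0.5, wait_seconds=0.05):
    """Return (blocked_during_connect, free_after_connect)."""
    cache_lock = threading.Lock()   # stands in for sq_cache_lock
    holding = threading.Event()     # set once the slow thread owns the lock

    def slow_connect_under_lock():
        # Like thread 14: take the cache lock, then block while connecting.
        with cache_lock:
            holding.set()
            time.sleep(connect_seconds)  # stands in for a connect to a dead DB

    def try_fast_operation():
        # Like thread 11: needs the same lock for an unrelated, healthy DB.
        got_it = cache_lock.acquire(timeout=wait_seconds)
        if got_it:
            cache_lock.release()
        return got_it

    slow = threading.Thread(target=slow_connect_under_lock)
    slow.start()
    holding.wait()                       # make sure the lock is really held
    blocked = not try_fast_operation()   # times out while the connect runs
    slow.join()
    free_again = try_fast_operation()    # succeeds once the lock is released
    return blocked, free_again
```

In the real server the "connect" never returns in reasonable time when the 
Oracle database is down, so the waiting threads stay stuck instead of 
timing out.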

A possible workaround could be to start the server with -DNO_DB_REUSE.

A proper fix would probably involve letting DBManager.sql_cache_get() 
release the sq_cache_lock during the call to get_sql_handler().
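
A minimal sketch of that idea, again in Python rather than Pike and with 
hypothetical names (HandleCache, connect, get, put are not Roxen APIs): the 
lock only protects the cache lookup, and the potentially slow connect runs 
after the lock has been released.

```python
import threading


class HandleCache:
    """Toy connection cache; not the actual DBManager implementation."""

    def __init__(self, connect):
        self._connect = connect        # hypothetical: callable(url) -> handle
        self._lock = threading.Lock()  # stands in for sq_cache_lock
        self._idle = {}                # url -> list of idle handles

    def get(self, url):
        # Hold the lock only for the quick dictionary lookup.
        with self._lock:
            idle = self._idle.get(url)
            if idle:
                return idle.pop()
        # The lock is released here, so a slow or hanging connect
        # stalls only the calling thread, not every cache user.
        return self._connect(url)

    def put(self, url, handle):
        # Return a handle to the cache for later reuse.
        with self._lock:
            self._idle.setdefault(url, []).append(handle)
```

The trade-off in this sketch is that two threads asking for the same URL at 
the same time may each open their own connection; both handles can still be 
returned to the cache afterwards with put().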

> Hope this provides more info on potential solutions =)
>
> Thanks a lot,
>
> Graeme

--
Henrik Grubbström					<grubba[at]roxen.com>
Roxen Internet Software AB