Undetected Deadlocking in Sybase ASE’s Parallel Queries

SYMPTOM:  A worker process of a parallel query never seems to finish.  It may retain locks on an actual database object (e.g. table) causing blocked processes to pile up if the other processes attempt to obtain exclusive locks on the locked areas.

WORKAROUND:  While not always feasible, you can disable parallelism.

How to release the locks?   Restart ASE.
Cause (from Sybase TechSupport):

Yes, it looks like there is problem in our parallel merge code in this condition.

The parent spid 1260 and child spid 2198 both have same exclusive lock address


spid: 1260, kpid: 347801870, pss: 0x2423fd538, proc: 0x24c78f218
User: MY_DBDBO, Command: 193 SELECT
Status: (0x00010000 (P_USERPROC), 0x00000002 (P_KILLYOURSELF))
*****************************************
spid: 2198, kpid: 196611, pss: 0x246e7cc58, proc: 0x24c46ed40, parent_spid: 1260
User: MY_DBDBO, Blocked by spid: 1260, loid: 0, Command: 193 WORKER PROCESS
Status: (0x00010000 (P_USERPROC), 0x00008000 (P_BACKOUT), 0x00004000 (P_EXPRINT),
0x00000002 (P_KILLYOURSELF))

ADDRESS LOCKS:

21b33d3d0 Address 0x266d9f020, (bucket 0)
21aa775c0 swstatus=()swstatus2=(), swskipped=0, swsemaphore=0x21b33d3d0
21b00e2f0 lrlid_floid=2520, lrlid_tloid=2520, ownerspid=1260, lrtype=exclusive address, lrsemawai
t=0x21aa775c0, lrstatus=(granted ), lrcontext=(), lrsuffclass=0, lrcolumns=0x0
221924880 swstatus=(waiting )swstatus2=(), swskipped=0, swsemaphore=0x21b33d3d0
21c675180 lrlid_floid=2520, lrlid_tloid=4396, ownerspid=2198, lrtype=exclusive address, lrsemawai
t=0x221924880, lrstatus=(), lrcontext=(), lrsuffclass=0, lrcolumns=0x0

It seems that parent holds an ex_add lock but then
its child queued behind in trying to get the same type of address lock.
Thus block the child.

The parent holds also a logical lock (sh_int) on the given table.
It seems that the parent waits for the childs to disconnect but
one of the childs is not in the process of disconnecting,
instead it tries to merge some results and apparently
needs to grab an ex_addr lock on the same address hence blocked by the
parent.

spid: 2198, kpid: 196611, pss: 0x246e7cc58, proc: 0x24c46ed40, parent_spid: 1260
User: MY_DBDBO, Blocked by spid: 1260, loid: 0, Command: 193 WORKER PROCESS
Status: (0x00010000 (P_USERPROC), 0x00008000 (P_BACKOUT), 0x00004000 (P_EXPRINT),
0x00000002 (P_KILLYOURSELF))

*****************************************

spid: 1260, kpid: 347801870, pss: 0x2423fd538, proc: 0x24c78f218
User: MY_DBDBO, Command: 193 SELECT
Status: (0x00010000 (P_USERPROC), 0x00000002 (P_KILLYOURSELF))
*****************************************

spid: 77, kpid: 1572888, pss: 0x23c608888, proc: 0x24c476168, parent_spid: 1260
User: MY_DBDBO, Command: 193 WORKER PROCESS
Status: (0x00010000 (P_USERPROC))
*****************************************
spid: 78, kpid: 1507351, pss: 0x23c61cdd8, proc: 0x24c475be0, parent_spid: 1260
User: MY_DBDBO, Command: 193 WORKER PROCESS
Status: (0x00010000 (P_USERPROC))
*****************************************
spid: 79, kpid: 1441814, pss: 0x23c631328, proc: 0x24c475658, parent_spid: 1260
User: MY_DBDBO, Command: 193 WORKER PROCESS
Status: (0x00010000 (P_USERPROC))

21b33d3d0 Address 0x266d9f020, (bucket 0)
21aa775c0 swstatus=()swstatus2=(), swskipped=0, swsemaphore=0x21b33d3d0
21b00e2f0 lrlid_floid=2520, lrlid_tloid=2520, ownerspid=1260, lrtype=exclusive address, lrsemawai
t=0x21aa775c0, lrstatus=(granted ), lrcontext=(), lrsuffclass=0, lrcolumns=0x0
221924880 swstatus=(waiting )swstatus2=(), swskipped=0, swsemaphore=0x21b33d3d0
21c675180 lrlid_floid=2520, lrlid_tloid=4396, ownerspid=2198, lrtype=exclusive address, lrsemawai
t=0x221924880, lrstatus=(), lrcontext=(), lrsuffclass=0, lrcolumns=0x0

21c61aaf0 Objid 1582625650, dbid 12, (bucket 18)
21be5a380 swstatus=()swstatus2=(), swskipped=0, swsemaphore=0x21c61aaf0
21b54b9d0 lrlid_floid=2520, lrlid_tloid=2520, ownerspid=1260, lrtype=shared intent, lrsemawait=0x21be5a380

From the stack trace, the parent in err_disconnect_allchildren() calls
wt_disconnect() to disconnect the execution of the child threads. This
calls uwdisconnect() that loops waiting all the childs to disconnect.
One of the childs wants to merge results and lands in tbm_merge_beg->tbm__merge_lock->lock_address on pllctx->pc_tbm_mrg and gets queued behind the parent!

We have filed Investigated CR 476574 / ER 17226 for this issue.

CR 476574:  PLLEXEC: PLLERR: hung parent spid ex_add lock is blocking its child spid

It goes without saying, if you run into the bug, or if you think you have, contact Sybase Technical Support and reference the CR 476574 so we can get it fixed.

Share Button

Leave a Reply

Your email address will not be published. Required fields are marked *