Monday, March 17, 2008

Jabberd2 Fails

You know how certain programs may work for the longest time, but only fail once someone else uses them? Jabberd2's done that to me. It was on my server for months, giving me good service until a friend of mine tried to log into his Jabber account after months of not doing so. Since then, it segfaults at startup. Well, the router part does, anyway.

sx (io.c:234) decoded read data (71 bytes):
>>> NAD OP nad_new: 0xc832be10
>>> NAD OP nad_add_namespace: 0xc832be10
>>> NAD OP nad_find_scoped_namespace: 0xc832be10
>>> NAD OP nad_add_namespace: 0xc832be10
>>> NAD OP nad_find_scoped_namespace: 0xc832be10
>>> NAD OP nad_append_elem: 0xc832be10
>>> NAD OP nad_append_attr: 0xc832be10
>>> NAD OP nad_print: 0xc832be10
sx (io.c:89) completed nad:
sx (chain.c:119) calling nad read chain
>>> NAD OP nad_find_attr: 0xc832be10
sx (sasl_gsasl.c:291) auth request from client (mechanism=DIGEST-MD5)
sx (sasl_gsasl.c:334) sasl context initialised for 9

Program received signal SIGSEGV, Segmentation fault.
0x00002d46e72f1715 in ?? ()
(gdb) where
#0 0x00002d46e72f1715 in ?? ()
#1 0x0000000000000001 in ?? ()
#2 0x000004f5c81db5ad in ?? ()
#3 0x0000000000000008 in ?? ()
#4 0x000004f5c832bf10 in ?? ()
#5 0x00000071ffffffff in ?? ()
#6 0x000004f500000000 in ?? ()
#7 0x0000000000000000 in ?? ()

Changing CFLAGS/CXXFLAGS has no bearing on this. It looks like I'll finally get around to finding the problem in the code today, though.


It looks like I've located the problem, finally. I've modified their code to print lots of stuff:

sasl_gsasl.c (340, 400)

puts("ANONYMOUS");
/*
* special case for SASL ANONYMOUS: ignore the initial
* response provided by the client and generate a random
* authzid for the user, as specified in XEP-0175
*/
(ctx->cb)(sx_sasl_cb_GEN_AUTHZID, NULL, (void **)&out, s, ctx->cbarg);
buf = strdup(out);
buflen = strlen(buf);
} else {
puts("NON-ANON");
/* decode and process */
_sx_sasl_decode(in, inlen, &buf, &buflen);
puts(buf);
puts("SASL_DECODE");
}
ret = gsasl_step(sd, buf, buflen, &out, (size_t *) &outlen);
puts("GSASL_STEP");
if(ret != GSASL_OK && ret != GSASL_NEEDS_MORE) {
puts("NOT NEEDS MORE");
_sx_debug(ZONE, "gsasl_step failed, no sasl for this conn; (%d): %s", ret, gsasl_strerror(ret));
_sx_nad_write(s, _sx_sasl_failure(s, _sasl_err_MALFORMED_REQUEST), 0);
puts("STARTFREE");
if(out != NULL) free(out);
if(buf != NULL) free(buf);
puts("ENDFREE");
return;
}
}

else {
/* decode and process */
_sx_sasl_decode(in, inlen, &buf, &buflen);
if(!sd) {
_sx_debug(ZONE, "response send before auth request enabling mechanism (decoded: %.*s)", buflen, buf);
_sx_nad_write(s, _sx_sasl_failure(s, _sasl_err_MECH_TOO_WEAK), 0);
if(buf != NULL) free(buf);
return;
}
_sx_debug(ZONE, "response from client (decoded: %.*s)", buflen, buf);
ret = gsasl_step(sd, buf, buflen, &out, (size_t *) &outlen);
}

puts("0:FREE BUF");
if(buf != NULL) free(buf);
puts("0:FREED BUFF");

/* auth completed */
if(ret == GSASL_OK) {
puts("GSASL_OK");
_sx_debug(ZONE, "sasl handshake completed");

puts("FREE OUT");
if(out != NULL) free(out);
puts("FREED OUT");

/* send success */
puts("SUCCESS!");
_sx_nad_write(s, _sx_sasl_success(s), 0);
sx (io.c:234) decoded read data (71 bytes):
sx (io.c:89) completed nad:
sx (chain.c:119) calling nad read chain
sx (sasl_gsasl.c:291) auth request from client (mechanism=DIGEST-MD5)
sx (sasl_gsasl.c:334) sasl context initialised for 6
SASL CONTEXT INIT
VOID SD
NON-ANON

SASL_DECODE
GSASL_STEP
0:FREE BUF

Program received signal SIGSEGV, Segmentation fault.
0x000031b7b4783715 in ?? ()

It looks like it's trying to free() memory it can't, but I've only seen "invalid double free()" in situations like this, so something else must also be at work. I've checked the beginning of their function, and they do initialize the pointers to NULL.


Poking around a bit more, I've removed the free()-related stuff, and get this in the resolver, rather than the router:

sx (io.c:210) passed 216 read bytes
sx (chain.c:93) calling io read chain
sx (io.c:234) decoded read data (216 bytes): cmVhbG09ImphYmJlcmQtcm91dGVyIiwgbm9uY2U9IkVnWGtwVTVaQ1pyZkpQNDhUUGtqYUE9PSIsIHFvcD0iYXV0aCwgYXV0aC1pbnQiLCBjaGFyc2V0PXV0Zi04LCBhbGdvcml0aG09bWQ1LXNlc3M=
sx (io.c:89) completed nad: cmVhbG09ImphYmJlcmQtcm91dGVyIiwgbm9uY2U9IkVnWGtwVTVaQ1pyZkpQNDhUUGtqYUE9PSIsIHFvcD0iYXV0aCwgYXV0aC1pbnQiLCBjaGFyc2V0PXV0Zi04LCBhbGdvcml0aG09bWQ1LXNlc3M=
sx (chain.c:119) calling nad read chain
sx (sasl_gsasl.c:448) data from client
sx (sasl_gsasl.c:452) decoded data: realm="jabberd-router", nonce="EgXkpU5ZCZrfJP48TPkjaA==", qop="auth, auth-int", charset=utf-8, algorithm=md5-sess

Program received signal SIGSEGV, Segmentation fault.
0x000039405831632b in ?? ()

Looking into that a bit more, I've added some more lines to the code.

sasl_gsasl.c (440, 500)

}

/** process handshake packets from the server */
static void _sx_sasl_server_process(sx_t s, sx_plugin_t p, Gsasl_session *sd, char *in, int inlen) {
char *buf, *out;
int buflen, outlen, ret;

_sx_debug(ZONE, "data from client");

/* decode the response */
puts("DECODE");
_sx_sasl_decode(in, inlen, &buf, &buflen);
_sx_debug(ZONE, "decoded data: %.*s", buflen, buf);
puts("COMPLETE");

/* process the data */
puts("GSASL STEP");
ret = gsasl_step(sd, buf, buflen, &out, (size_t *) &outlen);
puts("0: FREE BUF");
if(buf != NULL) {
free(buf);
buf = NULL;
}
puts("0: FREED BUF");

/* in progress */
if(ret == GSASL_OK || ret == GSASL_NEEDS_MORE) {
puts("OK OR NEEDS MORE");
_sx_debug(ZONE, "sasl handshake in progress (response: %.*s)", outlen, out);

/* encode the response */
puts("ENCODE");
_sx_sasl_encode(out, outlen, &buf, &buflen);
puts("COMPLETE");

puts("1: FREE OUT");
if(out != NULL) free(out);
puts("1: FREED OUT");

_sx_nad_write(s, _sx_sasl_response(s, buf, buflen), 0);

puts("1: FREE BUF");
if(buf != NULL) free(buf);
puts("2: FREE BUF");

return;
}

puts("2: FREE OUT");
if(out != NULL) free(out);
puts("2: FREED OUT");

/* its over */
puts("ABORT!");
_sx_debug(ZONE, "sasl handshake aborted; (%d): %s", ret, gsasl_strerror(ret));

_sx_nad_write(s, _sx_sasl_abort(s), 0);
}

/** main nad processor */

It doesn't look good, as gsasl_step() fails here.

sx (io.c:89) completed nad: cmVhbG09ImphYmJlcmQtcm91dGVyIiwgbm9uY2U9IlgyQUdiVFpvaFdFQi9xV2tNeVNyS3c9PSIsIHFvcD0iYXV0aCwgYXV0aC1pbnQiLCBjaGFyc2V0PXV0Zi04LCBhbGdvcml0aG09bWQ1LXNlc3M=
sx (chain.c:119) calling nad read chain
sx (sasl_gsasl.c:448) data from client
DECODE
sx (sasl_gsasl.c:453) decoded data: realm="jabberd-router", nonce="X2AGbTZohWEB/qWkMySrKw==", qop="auth, auth-int", charset=utf-8, algorithm=md5-sess
COMPLETE
GSASL STEP

Program received signal SIGSEGV, Segmentation fault.
0x000035dc3dbbd32b in ?? ()

I'm going to try removing gSASL and switch back to Cyrus. As far as I can tell, that'll require a lot of ebuild tweaks.


My assumption was incorrect: I've been using gSASL since before it stopped working, and Cyrus isn't supported. It looks like I won't have a Jabberd on my server anytime soon. Even Ejabberd fails to work properly.

No comments: