Random thoughts from an unusual company

Debug Days. Sametime Failing To Login

Gabriella Davis  5 July 2012 12:53:50
Here's an old problem that came back out of nowhere.

I built two new ST 8.5.2 IFR1 servers for a customer, along with a Sametime Proxy Server and a separate Multiplexor. I set up an LTPA token and SSO, and pointed the servers at the production LDAP environment (Domino based). I tested, and everything was perfect. I handed over to the customer for their testing.

Fast forward six weeks. The customer is now ready to test, so I go back to the servers, create a name change request for the LDAP conversion, copy over the vpuserinfo from the existing (8.0.2) servers and run stnamechange. Everything converts fine. I bring the servers back up.

I get "server not responding" from both servers and from the MUX. Weird. From the Sametime Proxy I get "cannot perform request". The WAS logs show error 80000000, which basically means "yeah, something's broken, but I'm not sure what". Looking up the WAS code sends me to a blank IBM documentation page that shows the code with empty "Description" and "Resolution" fields. So here's what I try:
  1. I confirm I can log in fine with a browser by going into stcenter
  2. I verify that vpuserinfo.nsf is accessible, readable and has the right data
  3. I try removing the SSO configuration
  4. I verify that the LTPA token (which isn't called ltpatoken) is defined correctly in sametime.ini (Debug section, ST_TOKEN_TYPE="nameoftoken")
  5. I remove the ltpatoken definition from sametime.ini
  6. I enable tracing by setting vp_trace_all=1 in sametime.ini

Between each of these steps I restart everything, which is a 20-minute cycle, but I want to narrow things down and I'm seeing no detail in the logs.
With vp_trace_all on, I can see that when I try to log in it does find me in LDAP, but I still get "server not responding".

So I try a fake name and password, and that gives me "incorrect login". Interesting. So "server not responding" only occurs when I do manage to authenticate. I confirm this with other accounts.

Now I'm weirded out, so I ping Carl on chat to run through what I did and see if it rings any bells with him. After a while he remembers a very old bug from early Sametime and suggests I try a new vpuserinfo. I have a backup of my vanilla vpuserinfo from the original install, so I swap that in and everything works. I then run a load convert on the converted vpuserinfo, put that back onto the server, and I'm back in business.

Here "server not responding" really meant "there's something I don't like about your vpuserinfo". With workable error codes this should have been a one-hour troubleshoot instead of the six hours it turned into. I've now added a step to my migration to-do list: run a "convert" (not a design refresh) on the vpuserinfo I'm trying to deploy.