Gabriella Davis 5 July 2012 12:53:50
Here was an old problem that came back from nowhere.
I built two new ST 8.5.2 IFR1 servers for a customer, with a Sametime Proxy Server and a separate multiplexer (MUX). I set up an LTPA token and SSO, and pointed the servers at the production LDAP environment (Domino based). I tested and all was perfect, so I handed over to the customer for their testing.
Fast forward 6 weeks. The customer is now ready to do testing so I go back to the servers, create a name change request for LDAP conversion, copy over the vpuserinfo from the existing (8.0.2) servers and run stnamechange. All converts fine. I bring the servers back up.
I get "server not responding" from both servers and the MUX. Weird. From the Sametime Proxy I get "cannot perform request". The WAS logs show error 80000000, which basically means "yeah, something's broken but not sure what". The WAS code sends me to a blank IBM documentation page showing the code and empty lines for "Description" and "Resolution". So here's what I try:
- I can log in fine using a browser going into stcenter
- I verify the vpuserinfo.nsf is accessible, readable and has the right data
- I try removing the SSO configuration
- I verify the LTPA token (which isn't called ltpatoken) is defined correctly in sametime.ini (Debug section ST_TOKEN_TYPE="nameoftoken")
- I remove the ltpatoken definition from sametime.ini
- I enable vp_trace_all=1 in sametime.ini
Between each of these steps I restart everything, which is a 20 minute cycle, but I want to narrow in and I'm seeing no detail in the logs.
With vp_trace_all on I can see that when I try to log in it does find me in LDAP, but I still get "server not responding".
So I try using a fake name and password, and that gives me "incorrect login". Interesting. So "server not responding" only occurs if I do manage to authenticate. I confirm this with other accounts.
Now I'm weirded out so I ping Carl on chat to run through what I did and see if it rings any bells with him. After a while, he remembers a very old bug from early Sametime and suggests I try a new vpuserinfo. I have a backup of my vanilla vpuserinfo from the original install, so I put that in place and everything works. I then do a load convert on the converted vpuserinfo, put that back onto the server and I'm back in business.
Here "server not responding" really meant "there's something I don't like about your vpuserinfo". It should have been a 1hr troubleshoot if I had workable error codes, instead of the 6hrs it turned into. Now added to my todo for a migration is running a "convert" (not a design refresh) on the vpuserinfo I'm trying to deploy.