Gabriella Davis 4 August 2011 14:43:02
Time for another Sametime blog post. I've been installing Sametime servers in various configurations for a few months now, and in between I've been working with Marie and Tom on the Sametime Admin Guide, which is complete in its first draft and now in the revision stage. In its new incarnation Sametime isn't the easiest product to troubleshoot, so I thought it would be useful to share some problems I've found and how I tracked them down. In many cases I should have opened support calls with IBM to resolve this stuff but I never had the time: everything needed to be up and working, and there wasn't the luxury of spending a few days following through support calls. I'm sorry about that, because I know it's easier to identify a problem while it still exists, but hopefully this will help someone else on the same path. I'll also cross-post to the Sametime forum on developerWorks.
Incident 1: Sametime on Linux Security Lockout
When installing the Sametime System Console (SSC) on Linux (so far confirmed with both Red Hat and SLES), I have discovered that once you get to the point of exporting the LTPA token to set up SSO, on the next SSC restart you will no longer be able to log in with your admin credentials. I found an old IBM technote about enabling global security on Portal with Linux which described the same issue (and which I can't find now!). I was not only able to repeat this on different installs, but I've had 3 other companies email and ask about the same thing. The only reliable "fix" is to turn off security for the SSC so you just log in using the user name (ouch) or, my preference, stop using Linux, for the SSC at least.
Incident 2: IBM's documents on turning off security
This document has the syntax you need for turning off security so you can log in if you lose your credentials (or if incident 1 occurs). Unfortunately there is no documentation anywhere on how to turn it back on again. I ended up using the following on the deployment manager install:
Then type "securityon" at the prompt to turn security back on.
Note that in both cases the servers will need restarting for the settings to take effect.
Incident 3: Server install on Linux cannot connect to SSC deployment manager to pick up profile
When completing a Linux install of the Sametime Proxy, Meeting or Media server, etc, you would usually first create a deployment plan in the SSC. Then, during the install of your new Sametime component, you would connect back to the SSC and pick up that deployment plan. The default connection assumes the use of SSL and port 9443. This works fine for Windows, but on some Linux installs I received "server not responding" when trying to connect to the deployment manager. I stopped iptables, checked the hostname of the SSC was pingable from the new machine I was attempting to install onto, checked 9443 was listening on the SSC server, etc, but I still got the same error. Eventually I unchecked the "use SSL" checkbox and connected using port 9043 instead, and it worked straight away. Since the certificate shipped with the SSC is an internally-generated IBM certificate that isn't recognised by most browsers, I believe the Linux install was refusing to connect using it, whereas Windows was much less stringent.
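The checks I ran by hand can be separated into two questions: is the port reachable at all, and does the SSL handshake fail only when the certificate is verified? Here is a minimal Python sketch of that, not anything the Sametime installer does itself. The function names are my own, and any hostname you pass in (for example ssc.example.com) is a placeholder for your SSC machine.

```python
import socket
import ssl


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def tls_handshake(host, port, verify=True, timeout=5.0):
    """Attempt a TLS handshake and return (succeeded, detail).

    With verify=True the certificate chain and hostname are checked against
    the local trust store, as a strict client would do. With verify=False
    any certificate is accepted; use that for diagnosis only.
    """
    context = ssl.create_default_context()
    if not verify:
        # check_hostname must be disabled before verify_mode can be CERT_NONE
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.version()
    except ssl.SSLCertVerificationError as exc:
        return False, "certificate rejected: " + str(exc.verify_message)
    except OSError as exc:  # refused, timed out, or another TLS failure
        return False, str(exc)
```

If tcp_port_open("ssc.example.com", 9443) is True and tls_handshake("ssc.example.com", 9443, verify=False) succeeds while the verify=True call reports a rejected certificate, that would point at certificate trust rather than the firewall or DNS, which fits what I suspected here.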
Incident 4: Making sure you know what hostname you install the SSC as
The SSC is the first component we install in most new Sametime implementations, and during the install it uses the hostname of the machine you are on to create itself. Even if you have another FQHN that resolves to the SSC box, when you install another component such as the Meeting Server and connect to the SSC deployment manager to do so, the connection will want to use the hostname (box name) that you originally installed the SSC as. This must be resolvable from the machine you are installing from. If in doubt, check the logs on the installing machine to verify what hostname it is trying to use to connect to the SSC.
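A quick way to confirm that both the original box name and any alias resolve from the machine you are installing on is to ask the local resolver directly. This is a small Python sketch under my own naming: resolve_names is a hypothetical helper, and names such as "sscbox" and "ssc.example.com" stand in for your real hostnames.

```python
import socket


def resolve_names(names, resolver=socket.getaddrinfo):
    """Map each name to a sorted list of addresses, or None if it fails.

    The resolver argument exists only so the lookup can be swapped out
    when testing; by default the system resolver is used.
    """
    results = {}
    for name in names:
        try:
            infos = resolver(name, None)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
        else:
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the address string.
            results[name] = sorted({info[4][0] for info in infos})
    return results
```

Run resolve_names(["sscbox", "ssc.example.com"]) on the installing machine. If the box name comes back as None while the alias resolves, you have the situation described above, and a hosts file or DNS entry for the box name is the likely fix.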
Incident 5: Recreating the SSC
I'm sure there must be an easy way to re-use an existing STSC database for a rebuilt or moved install, so that all your configuration is already in place, without having the SSC installer itself assume it needs to create a new database connection as it installs.
My scenario was that I was building each Sametime component and the DB2 server on different machines, everything virtualised and with snapshots. I build DB2, which works fine. I build the Domino server for LDAP. I build the Domino install which will host the Community Server. I create the DB2 database for the System Console on the DB2 server. I build the System Console (SSC). I start setting up the SSC to connect to LDAP, generate a deployment plan for the Community Server and then install the Community Server. At this point I've built 5 servers and I realise the SSC is running out of disk space and throwing errors. The virtual machine build only had 35GB of disk and it simply ran out.
I could have tried to add more disk but I'm not a hardware girl, and it seemed simplest to roll back the SSC, leaving everything else in place, and rebuild it pointing to the same SSC DB2 database containing all the configuration I'd already done. So I do a new clean install of the SSC but I don't create a new DB2 database for it, because there is already one on the DB2 server with the name I want. All appears to work fine: I log into the SSC and am delighted to see my Community Server, LDAP, deployment plans, etc, still in place.
Then I see, under DB2 databases in the SSC, two entries for STSC (the SSC database name). Somewhere during the SSC install it created a pointer to the STSC database on the DB2 server, but there was already a pointer from the earlier install, so now I have two. Neither can be removed. Neither works. If I try to edit or modify either I get errors. I ended up having to delete the STSC database on the DB2 server, roll back and create the configuration all over again.
Incident 6: In the SSC some components can no longer be accessed
Install your servers, most components having their own machines. At some point you go into the SSC and choose your Meeting Server, but instead of coming up with your Meeting Server you get a WAS error with "portlet not installed". The Meeting Server itself continues to run, start, stop etc fine. Meetings work. Policies can still be set up and applied. But the SSC interface can no longer show Meeting Server management no matter how many reboots are tried. Errors in the logs report a missing portlet. No changes were made to the environment, which had been running for a couple of weeks before this occurred. So far only seen on Linux installs.
Incident 7: Meeting Server kind of stops working but is still working
This was 11 hours of my life this week. My Meeting Server, which was built on Windows 2008 in early June, stopped working. The server still started and stopped OK. It showed no errors in any logs, but if I attempted to attend a meeting from any client machine the meeting would open in the rich client then drop out within 10 seconds with the helpful error "You cannot join the meeting at this time". I created new meetings, same error. I checked the DB2 database, meetings are being created OK. I use the Web interface: the meeting opens in the browser and appears OK, but the big clue is no awareness in the participant list, and an error if I try to upload any files.
Usually online awareness not working in meetings is a problem with DNS or the proxy server, so I run down that path for a bit. My client logs show a 503 error and suggest I turn on .com.ibm.rtccore=finest logging as detailed in this document. I do that, but on every client machine where it's enabled the error continues, along with the request to turn on that logging. It simply isn't being picked up. Eventually I track it down by connecting directly to the Meeting Server on port 9082 instead of to its WAS Proxy on port 443. The WAS Proxy appeared to work, as that's how we were logging in and doing everything, but by connecting directly the error goes away and I discover there are no problems with the Meeting Server itself, just its proxy.
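The test that finally isolated this one, requesting the same page through the proxy and directly from the Meeting Server, can be scripted. The Python below is only a sketch of the idea: the function names are my own, and the URLs, scheme and path you pass in are assumptions you would replace with your own. Only the ports (443 for the WAS Proxy, 9082 for the Meeting Server) come from my setup. Note that urlopen verifies certificates, so an https URL using a self-signed certificate will come back as None.

```python
import http.client
import urllib.error
import urllib.request


def http_status(url, timeout=10.0):
    """Return the HTTP status code for a GET of url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx and 5xx responses still carry a status code
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        return None


def likely_culprit(proxy_status, direct_status):
    """Guess which tier is failing from the two status codes."""
    def healthy(status):
        return status is not None and status < 500

    if healthy(direct_status) and not healthy(proxy_status):
        return "proxy"
    if not healthy(direct_status):
        return "meeting server"
    return "neither"
```

For example, likely_culprit(http_status(proxy_url), http_status(direct_url)) returns "proxy" when the proxy answers 503 and the direct connection on 9082 answers normally, which matches what I eventually found. It is a rough guess and no substitute for reading the logs.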
Every product has its teething problems and there is so much new, and so much good stuff in 8.5.2 I'm not surprised I've found a few things. I should also mention the above is the result of over 30 installs, many of which were error-free so don't be disheartened. I share this because if you find these problems, well it's good to know it's not just you isn't it :-)
I'm going to do my next blog post on reading WAS logs and how to hunt down errors. It's something I do a lot of and I think some people will find it useful.