
Overview

These pages provide information necessary for the diagnosis and maintenance of the Groups 2.0 system.



Audience

This is an internal maintenance document for the edification of UWIT staff who maintain the service and for any others who are merely interested. You are expected to have a working knowledge of the tools used by the Group Service, including: Apache, Tomcat, postgres, LDAP, Spring, and Velocity.

Architecture

See: Group Service Architecture.

Components of the service

Groups 2.0 consists of:

  • a registry, which maintains the authoritative group and membership information
  • LDAP and other subordinate directories, which provide read-only copies of the registry
  • a webservice and UI for group maintenance
  • institutional group provisioners
  • ldap directory provisioners
  • various administrative tools

Registry

A postgres instance on iamdb21 (with spare on iamdb22) provides the groups database.  Grouper tools and libraries provide most of the interface to the database.
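
For a quick sanity check of the registry, a query along these lines should work (a sketch only: the table name assumes the stock Grouper schema, and the host and database names are taken from the snapshot procedure below):

    $ psql -h iamdb21.s.uw.edu -U postgres gws23 -c "select count(*) from grouper_groups;"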

GWS webservice

The GWS webservice provides the GWS API, including the browser GUI.

There are two principal components to the API: the RESTful web service itself and the browser GUI for interactive group maintenance.
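
A hedged example of exercising the API from the command line (the host, the v3 endpoint path, the client certificate authentication, and the group cn are all assumptions for illustration, not verified against the deployed version):

    $ curl --cert client.pem --key client.key \
          https://groups.uw.edu/group_sws/v3/group/u_example_group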

GWS Provisioners

The GWS provisioners are responsible for all institutional groups. They accept information from a variety of sources and update GWS groups accordingly.

Ldap Provisioner

The Ldap provisioner monitors changes to groups and memberships, updates the associated Ldap groups and affiliate group systems, and sends event notices to AWS topics.  The provisioner tasks run as daemons on each web service system; only one daemon is active (primary) at a time.
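
To see whether a provisioner daemon is running on a given web service host (the process name is the one killed in the snapshot procedure below):

    $ ps -ef | grep ldap_provision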

Multi-group Assistant

The multi-group assistant is an HTML and JavaScript application (a wizard of sorts) that facilitates actions on multiple groups and multiple subjects.

It works entirely through AJAX and uses the user's GWS session for authentication to GWS, acting as the user.  It is therefore always installed and run on a system that also runs GWS.

Source:  gws/util/gws-assistant


Evaluation Service

An evaluation service and database (also used for development) are maintained.  They use a snapshot copy of the production database.

Components

These are current settings.  We hope to move these to cloud services soon.

Component       Location
database        iamdbdev01
ldap cache      rufus16
eval service    gwseval01
eval url        eval.groups.uw.edu
dev service     gwsdev01
dev url         dev.groups.uw.edu

Notes

  1. An ldap provisioner must run on either eval or dev (or both)
  2. Snapshots are (or at least should be) taken at regular intervals. The UI header shows the snapshot date.

Taking a snapshot

(Yes, this needs to be scripted; a rough sketch follows this procedure.)

  1. Stop all clients: (on gwsdev01, gwseval01)
    1. kill ldap_provision processes
    2. /etc/init.d/tomcat stop
  2. Get a new database (on iamdbdev01)
    1. $ psql -U postgres
    2. psql> drop database gwsdev23;
    3. psql> create database gwsdev23 with template=template0 owner=gws encoding='UTF8';
    4. $ time (ssh iamdb21.s.uw.edu '/usr/local/pgversion/bin/pg_dump -U postgres gws23'; echo '\q') | psql -U postgres gwsdev23
      1. Can run 2+ hours.  
      2. Next step (3) can be done concurrently.
  3. Build an up-to-date ldap cache (on rufus16)
    1.  # cd /data/au30aa
    2.  # /usr/local/openldap/groups/bin/slapd.sh stop
    3.  # rm data.mdb
    4.  # scp fox@stilpo21:/data/backup/groups.ldif groups.ldif
      1. (use your own username)
    5.  # /usr/local/openldap/sbin/slapadd -c -q -f /usr/local/openldap/groups/config/slapd.conf < groups.ldif
    6.  # chown accsys.wheel data.mdb lock.mdb
    7.  # /usr/local/openldap/groups/bin/slapd.sh start
    8. Note that slapd logs to /var/openldap/syslog/local4.log
  4. Analyze the new database (on iamdbdev01, when step (2) complete)
    1. $ psql -U postgres
    2. psql> analyze verbose;
  5. Update configurations to show new snapshot date (on gwsdev01, gwseval01)
    1. /data/webapps/gws_grouper/WEB-INF/gws-servlet.xml
      1. see "bannerText"
    2. Don't forget to update the configuration sources for eval and dev
  6. Restart the clients
    1. The ones you killed in step (1).
  7. Start a sync process (on gwseval01 or gwsdev01)
    1. $ cd /data/local/ldap-provisioning
    2. $ bin/ldap_verify -v -update -cfg dev.conf
      1. That will run for some time.
      2. This is only to catch any updates that were missed in steps (2) and (3).
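
A rough sketch of the procedure as a script, run from a box with ssh access to the hosts above (host names, paths, and database names are taken from the manual steps; the ldap cache rebuild on rufus16, the bannerText update, and the restarts are left manual):

    #!/bin/sh
    # Snapshot sketch: stop clients, reload the dev database, analyze.
    set -e

    # (1) stop all clients
    for h in gwsdev01 gwseval01; do
        ssh $h 'pkill -f ldap_provision; /etc/init.d/tomcat stop'
    done

    # (2) rebuild the dev database from a fresh production dump (can run 2+ hours)
    ssh iamdbdev01 "psql -U postgres -c 'drop database gwsdev23'"
    ssh iamdbdev01 "psql -U postgres -c \"create database gwsdev23 with template=template0 owner=gws encoding='UTF8'\""
    ssh iamdb21.s.uw.edu '/usr/local/pgversion/bin/pg_dump -U postgres gws23' \
        | ssh iamdbdev01 'psql -U postgres gwsdev23'

    # (4) analyze the new database
    ssh iamdbdev01 "psql -U postgres gwsdev23 -c 'analyze verbose'"

    # then: update bannerText, restart the clients, and run ldap_verify
    # as described in steps (5)-(7) above.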


Troubleshooting

Status of GWS Components

Use the GWS status page on iam-tools or on eval.groups.uw.edu to see the current status of several GWS components on each system that supports them.  The latter site also lets you verify the consistency of a group's membership between the registry and the ldap caches.

Monitoring and automatic recovery

Apache watcher

The apache watcher runs as a daemon on each web service host and verifies that the Apache service is responsive.  A sketch of the loop appears after the list below.

  • Unknown attacks or misbehaving clients can overwhelm the service.
  • In case of a non-responsive Apache
    • Various debug dumps are triggered
      • last tomcat status
      • netstat report
      • httpd open file report
    • apache is restarted
      • systemctl restart httpd
    • alert is sent
      • level 3
  • The monitor is installed on each host
    • /data/local/src/gws-mon/gws-mon.sh
    • logs to same directory
    • source: tbd
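
A minimal sketch of what the watcher loop does, based on the bullets above (the probe URL, interval, and timeout are placeholders, not the actual gws-mon.sh implementation; the alert mechanism is site-specific):

    #!/bin/sh
    # Sketch of the apache watcher: probe apache, dump diagnostics on
    # failure, restart it, and raise an alert.
    while true; do
        if ! curl -sf -m 10 http://localhost/ > /dev/null; then
            date                       # timestamp the incident
            netstat -an                # netstat report
            lsof -c httpd              # httpd open file report
            systemctl restart httpd    # restart apache
            # send a level 3 alert here (site-specific)
        fi
        sleep 60
    done
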
Cert monitor

A cron job watches the certs used by GWS and friends and warns of soon-to-expire certs.  The core check is sketched below.

  • /data/local/bin/certwatch.sh
  • /data/local/etc/local_certs.txt  ( cert db )
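
The essential test can be done with openssl; a sketch (this assumes local_certs.txt lists one certificate path per line, which may not match the actual file format):

    # warn about any cert that expires within 30 days
    while read cert; do
        openssl x509 -checkend $((30*24*3600)) -noout -in "$cert" \
            || echo "WARNING: $cert expires within 30 days"
    done < /data/local/etc/local_certs.txt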

GDS consistency verification

If the GDS (LDAP) has not been correctly updated, GWS can give inconsistent results: some queries go to the registry, some to the LDAP.  A weekly process runs on the ldap-provision master every Saturday (it takes about 24 hours) and verifies all groups, similar to the second item below.

Any group's consistency may be verified or repaired manually.

  • To verify and fix selected groups:
    • On iamtools11 or iamtools12:
    •  $ /data/local/gws-provisioning/bin/ldap-fix.sh
      • With a group cn as parameter it will verify and fix as necessary (see the example below)
      • With no argument it accepts a list of group cns to verify and fix
  • To verify and fix all groups
    • On iamtools11 iamtools12:
    • $ cd /data/local/gws-provisioning
    • $ bin/ldap_verify -v -update -cfg etc/gws-ldap.conf -all
    • ( use "-?" to see all options )

GDS Ldap needs index rebuild

When an ldap's index space fills ("$ding stilpo2x argmon s" reports > 80% full), the index must be rebuilt.

Perform the following steps on the ldap host, e.g. stilpo21.

  1. Remove the stilpo host from the ldap cluster
    1. # /etc/init.d/loadr idle

  2. Remove the host from any iam2x hosts
    1. Note: the iam2x hosts use the corresponding stilpo2x as the first choice, so it is necessary to edit only the matching iam2x host's gws_ldap config.
    2. on the iam2x host:
      1. edit /www/gws_ldap/etc/v2.conf to remove the stilpo host being rebuilt
      2. Refresh the gws_ldaps on the affected iam2x:
        1.  # systemctl restart httpd
  3. Wait for the hourly backup to be refreshed (20 min past the hour)
    1.  # ls -l /data/backup/groups.ldif
    2. Check the timestamp and size to make sure the file is complete.
  4.  Stop the ldap service (slapd)
    1.  # /etc/init.d/slapd stop
  5.  Rebuild the database
    1.  # cd /data/au20aa
    2.  # rm data.mdb
    3. # /usr/local/openldap/sbin/slapadd -c -q -f /usr/local/openldap/groups/config/slapd.conf < /data/backup/groups.ldif
    4.  # chown accsys.wheel data.mdb lock.mdb
  6. Start the service
    1.  # /etc/init.d/slapd start
  7. Watch the log to see that it did start
    1.  # tail -f /var/openldap/syslog/local4.log
  8. Note the time.
  9. Scan (logger3)/logs/syslog/local6 for gws-ldap-provisioner events:
    1. Something like these (run all three, replacing day/time values as needed)
      1. grep gws-ldap-provision /logs/syslog/local6 | egrep '9 11:2[012].*needs update' | awk '{print $12}'
      2. grep gws-ldap-provision /logs/syslog/local6 | egrep '9 11:2[012].*sending' | awk '{print $12}'
      3. grep gws-ldap-provision /logs/syslog/local6 | egrep '9 11:2[012].*putting' | awk '{print $8}'
  10. On iamtools11:
    1.  # /data/local/gws-provisioning/ldap_fix.sh
    2. It reads stdin.  Feed it each group found in (9); copy and paste of the grep output should work.  A combined pipeline is sketched after this procedure.
    3. The update sometimes erroneously reports an error 500.  Repeat the group to make sure it has been fixed.
  11. Put the ldap server back in the cluster.
    1.  # /etc/init.d/loadr start
  12. Restore the host in any iam2x hosts it was removed from
    1. (on the iam2x host)
      1. Edit /www/gws_ldap/etc/v2.conf to restore the stilpo2x as first choice
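
A sketch of steps (9) and (10) combined into one pipeline (the day/time pattern is a placeholder, as above; note the differing awk fields in the three greps):

    $ {
        grep gws-ldap-provision /logs/syslog/local6 | egrep '9 11:2[012].*needs update' | awk '{print $12}'
        grep gws-ldap-provision /logs/syslog/local6 | egrep '9 11:2[012].*sending' | awk '{print $12}'
        grep gws-ldap-provision /logs/syslog/local6 | egrep '9 11:2[012].*putting' | awk '{print $8}'
      } | sort -u | /data/local/gws-provisioning/ldap_fix.sh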


SWS consistency verification

Any course group can be manually verified against SWS data.

TBD


Adding an LDAP server ( DRAFT )

Note: This requires gws_grouper 2.5.0 and ldap_provision 2.4.


Adding an LDAP is complicated by a few factors:

  1. Each gws_grouper has a list of ldaps that it updates with member adds and deletes.
    1. Configured in /data/local/gws/ldap-hosts.txt
    2. Update is automatically detected by the gws_grouper process.
  2. Each ldap_provisioner has a list of ldaps to provision.
    1. One per gws host.
    2. One active, others standby.
    3. Configured in (ldap-provisioning)/provision.conf
    4. Update is automatically detected by the provisioner.
  3. Startup of the new LDAP (e.g. stilpo31) requires:
    1. Add stilpo31 to the prod config files: gwsdev01:/data/local/src/gws/*.prod
    2. Copy an ldap dump from a live ldap (e.g. stilpo21)
      1. On stilpo31
        1. systemctl stop slapd.service
        2. systemctl stop slapdmon.service
      2. On stilpo21
        1. /data/au20aa/slapcat.sh
          1. creates /data/backup/groups.ldif2
          2. omits the uwnetid ou
      3. On stilpo31
        1. cd /data/groups/data
        2. scp <you>@stilpo21:/data/backup/groups.ldif2 .
        3. /data/openldap/sbin/slapadd -c -q -f /data/groups/config/slapd.conf < groups.ldif2
        4. systemctl start slapd.service
        5. systemctl start slapdmon.service
        6. Spot-check the new ldap (see the sketch after this list)

    3. Push configs with stilpo31 to all processes.
      1. On gwsdev01: /data/local/src/gws/ldaps/push prod
    4. Run ldap_fix on all missed events
      1. TBD
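
Once slapd is up on the new host, a quick spot check with ldapsearch is worthwhile (a sketch: the base dn and group cn are assumptions; use the suffix from slapd.conf):

    $ ldapsearch -x -H ldap://stilpo31 -b 'dc=washington,dc=edu' '(cn=u_example_group)' member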

