Doc Purpose: Track and highlight Groups Directory Service load/stress test findings that may be useful to preserve or share. 
Doc Status: This is an informal doc, not meant to provide complete coverage of GDS stress testing.  Most of the numbers are approximate, so only very general conclusions are drawn from them.

Terminology

  1. CHO - Continuous Hours of Operation, a long-running test that, in our case, includes a load or stressor.  CHO is used to certify that a system performs under the given load for a particular length of time. Critical enterprise systems often include a CHO goal of one or two weeks as a part of release testing.
  2. GDS - UW's Groups Directory Service, which consists of two OpenLDAP servers.
  3. Large Groups - UW's GDS has a small number of group objects with between 100,000 and 200,000 members each; we call them large groups.  Replication of updates to large groups can be very slow, depending on the replication method chosen: syncrepl puts the entire membership on the wire even if only a single member was changed, whereas delta-syncrepl puts only the 'delta' on the wire.
  4. Projected Worst Hour - This test simulates a projected worst-case load of updates (10,000 modifies) against GDS in one hour.  We can generate about 10k mods in an hour by running 5 instances of the Stressor for 51 loops each (5 × 51 loops × 39 ldap_modify calls per loop = 9,945 modifies).  The Stressor "with large groups" includes operations to read a large group, add a member to a large group, and delete that member from the large group.  The Stressor with "no large groups" does not perform any operations on large groups, which relieves the syncrepl replication burden somewhat.
  5. Stressor - This is the actual looping functional test that we launch, sometimes as several simultaneous instances, to perform a CHO test.  Unless we indicate that it ran for a limited number of loops, assume we launched it to run continuously (for such a large number of loops that it would not complete for many days).  Each Stressor loop results in the following calls against GDS:
    • 67 ldap_search calls
    • 39 ldap_modify calls
    • 26 ldap_add calls
  6. Stressor(N) - We use this shorthand to indicate running N simultaneous instances of the Stressor test client. 
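
The load figures in the terminology above can be checked with a little arithmetic. The sketch below is not part of the Stressor itself; it only multiplies the per-loop call counts listed above by the number of instances and loops. The function and constant names are made up for illustration.

```python
# Per-loop call counts, taken from the Stressor definition above.
CALLS_PER_LOOP = {"ldap_search": 67, "ldap_modify": 39, "ldap_add": 26}


def total_calls(instances: int, loops: int) -> dict:
    """Total calls of each type for `instances` Stressors running `loops` loops each."""
    return {op: count * instances * loops for op, count in CALLS_PER_LOOP.items()}


if __name__ == "__main__":
    # Projected Worst Hour: Stressor(5) for 51 loops.
    totals = total_calls(instances=5, loops=51)
    for op, count in totals.items():
        print(f"{op}: {count}")
    print("all calls:", sum(totals.values()))
```

For Stressor(5) at 51 loops this gives 9,945 ldap_modify calls, which is where the "10k mods in an hour" figure comes from, alongside 17,085 searches and 6,630 adds.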

Table: CHO Test Results for "Single Master" Configurations

All the tests in the table below were configured to run a "Single Master" replication model.  There was one Master (or Producer) and one Slave (or Consumer).  In other words, none of the tests below were configured with Multi-Master Replication or mirror mode.
These tests were performed the week of June 22, 2009.
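
As background for the syncrepl and delta-syncrepl rows, the Large Groups definition above says syncrepl puts the entire membership on the wire for a single-member change, while delta-syncrepl sends only the change. The rough sketch below compares the two for one update. It counts member values only and ignores protocol overhead and the entry's other attributes. The 40-byte average member value size is an assumption chosen for illustration, not a measured GDS figure, and the function names are hypothetical.

```python
# ASSUMPTION: hypothetical average size of one member value, in bytes.
ASSUMED_BYTES_PER_MEMBER = 40


def values_on_wire(method: str, group_size: int, changed: int) -> int:
    """Member values replicated for one update to a group of `group_size` members."""
    if method == "syncrepl":
        # The whole membership is resent, regardless of how little changed.
        return group_size
    if method == "delta-syncrepl":
        # Only the changed values are sent.
        return changed
    raise ValueError(f"unknown replication method: {method}")


def approx_bytes(method: str, group_size: int, changed: int) -> int:
    """Very rough payload estimate under the assumed per-member size."""
    return values_on_wire(method, group_size, changed) * ASSUMED_BYTES_PER_MEMBER


if __name__ == "__main__":
    for size in (100_000, 200_000):  # the large-group range given above
        for method in ("syncrepl", "delta-syncrepl"):
            print(size, method, approx_bytes(method, size, changed=1), "bytes")
```

Whatever the real per-member size is, the ratio between the two methods for a single-member change equals the group size, which fits the large difference in Consumer load reported in the table.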

| Config | Test | Results | Conclusion |
|---|---|---|---|
| SLURP (PRODUCTION): OpenLDAP 2.3.24, BDB 4.3; Repl Mech: Slurp | CHO: Stressor(5) for at least 24 hours | Replication kept pace with updates to Master. No core dumps. | Production config stands up to CHO. |
| SYNCREPL: OpenLDAP 2.4.16, BDB 4.7 w/patch.4.7.25.[SMW:1-4]; Repl Mech: Syncrepl | CHO: Stressor(2) for about 24 hours. NOTE: some question as to whether this was 2 or 5 Stressors running. | Producer 15-70% CPU busy. Consumer constantly 100% CPU busy. Replication lagged enormously.  25 hours after CHO was stopped, Consumer was pulling updates timestamped 38 hours prior. | Syncrepl consumer cannot keep up, or catch up, with a high load of updates to Producer. |
| SYNCREPL (same as above) | Projected Worst Hour (no large groups): Stressor(5) with NO large group updates for 51 loops (about an hour) | Replication kept pace with updates on Producer. | Syncrepl performance is probably good enough for UW's current peak load of GDS updates. |
| SYNCREPL (same as above) | Projected Worst Hour (w/large groups): Stressor(5) with large group updates for 51 loops | Producer crashed after 17 loops. Core dump shows familiar "fetch_parent returns NULL ptr" somewhere in caching code. | Syncrepl is not stable.  Can run for hours, or crash immediately, under the same test load. |
| DELTA-SYNCREPL: OpenLDAP 2.4.16, BDB 4.7 w/patches; Repl Mech: Delta-syncrepl | Projected Worst Hour (w/large groups) | Producer 2 - 90% CPU busy. Consumer 10% CPU busy. Replication never lagged. | Delta-syncrepl performance is brilliant for heavy update scenarios. |
| DELTA-SYNCREPL (same as above) | CHO: Stressor(5) for as long as possible. | Producer crashed almost immediately. | Delta-syncrepl configuration is not stable, either. |
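
The first SYNCREPL result reports that, 25 hours after the CHO test was stopped, the Consumer was still applying updates timestamped 38 hours earlier. The sketch below turns those figures into a rough estimate of how far the Consumer had fallen behind. It rests on several assumptions that the notes above do not confirm: the run lasted exactly 24 hours (the notes say "about 24"), the Producer received updates at a steady rate, the Consumer started in sync, and the Consumer applied updates at a constant rate. Treat the output as an order-of-magnitude illustration, not a measurement.

```python
def consumer_progress(run_hours: float, hours_since_stop: float, lag_hours: float):
    """Estimate Consumer progress from the lag observed after the load stopped.

    Returns (hours of Producer updates applied so far,
             Consumer rate as a fraction of the Producer's update rate,
             estimated further hours needed to drain the backlog).
    """
    elapsed = run_hours + hours_since_stop  # wall-clock hours since the test began
    applied = elapsed - lag_hours           # timestamp the Consumer has reached
    rate = applied / elapsed                # updates applied per hour of updates produced
    remaining = run_hours - applied         # hours of updates still queued
    return applied, rate, remaining / rate


if __name__ == "__main__":
    applied, rate, hours_left = consumer_progress(
        run_hours=24, hours_since_stop=25, lag_hours=38
    )
    print(f"applied {applied:.0f}h of updates, relative rate {rate:.2f}, "
          f"about {hours_left:.0f}h more to catch up")
```

Under these assumptions the Consumer had replayed only about 11 of the 24 hours of updates after 49 hours of wall-clock time, which fits the conclusion that it could neither keep up nor catch up.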
