New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 9781 link

Starred by 3 users

Issue metadata

Status: Released
Owner:
Closed: Dec 19
Cc:
Components:



Sign in to add a comment

LDAP Groups membership disappears when project cache is not fully loaded

Project Member Reported by luca.mil...@gmail.com, Sep 29

Issue description

*****************************************************************
*****                                                       *****
***** !!!! THIS BUG TRACKER IS FOR GERRIT CODE REVIEW !!!!  *****
*****                                                       *****
***** DO NOT SUBMIT BUGS FOR CHROME, ANDROID, CYANOGENMOD,  *****
***** INTERNAL ISSUES WITH YOUR COMPANY'S GERRIT SETUP, ETC.*****
*****                                                       *****
*****   THOSE ISSUES BELONG IN DIFFERENT ISSUE TRACKERS     *****
*****                                                       *****
*****************************************************************

Affected Version: 2.14.x or later

What steps will reproduce the problem?
1. Configure Gerrit with LDAP Groups resolution
2. Add the user 'john' to the LDAP group 'johngroup'
3. Grand the permissions 'Push' to a test-project for group 'johngroup'
4. Restart Gerrit

What is the expected output?
'john' has 'johngroup' listed in his own groups
(e.g. ssh -p 29418 john@slave1.gerritforge.com gerrit ls-groups --user john)

What do you see instead?
'john' does not have 'johngroup' in his list, until he logs in again from the GUI and ldap eager group fetch is enabled.

Please provide any additional information below.
This is due to a "cache poisoning" at restart when not all projects and his groups file are  fully loaded.

What happens is that the group cache associated to 'john' gets populated without the ldap group because of this optimisation:

LdapGroupMembership.java:L57
  public Set<AccountGroup.UUID> getKnownGroups() {
    Set<AccountGroup.UUID> g = new HashSet<>(get().getKnownGroups());
    g.retainAll(projectCache.guessRelevantGroupUUIDs());
    return g;
  }

The g.retainAll(projectCache.guessRelevantGroupUUIDs()) is only a "guess" and may not contains all the groups that are actually used in all the projects, but only the ones in the projects in cache, which can be incomplete.

The result of the groups resolution gets cached and then the 'john' user 'disappears' from the 'johngroup'.
 
Project Member

Comment 1 by luca.mil...@gmail.com, Sep 29

It seems that this behaviour was introduced in Change-Id: I0fb5053f24af52014fd860e795d08cd38fc25f1c for filtering the list of groups returned by LDAP.

The filtering is a pure guess and may result in having valid group ownership filtered out and then cached.

It is incorrect to assume that all the projects cache is always fully loaded, because some project cache may expire and thus the LDAP groups used in there would be filtered out.

Because the problem was introduced a long time ago, I would suggest to introduce a config option to enable or disable this filtering behaviour. By default, we keep the current filtering but with the disclaimer that it could result in incorrect ACL evaluation.

Adding @Dave to the discussion because this is one of the key features of the Gerrit ACLs.
Project Member

Comment 2 by luca.mil...@gmail.com, Sep 29

Cc: dborowitz@google.com
Project Member

Comment 3 by luca.mil...@gmail.com, Sep 29

Another fix could be changing the ProjectCache method:

public Set<AccountGroup.UUID> guessRelevantGroupUUIDs() {
    Set<AccountGroup.UUID> groups = Sets.newHashSet();
    for (Project.NameKey n : all()) {
      ProjectState p = byName.getIfPresent(n);
      if (p != null) {
        groups.addAll(p.getConfig().getAllGroupUUIDs());
      }
    }
    return groups;
  }

The ".getIfPresent" can be changed to ".get" and then it won't the best guess anymore :-) The drawback is the triggering of the eager load of the Projects cache and thus potentially a substantial performance degradation if the caches aren't fully loaded.

@Dave what would be your preference?
Project Member

Comment 4 by dborowitz@google.com, Oct 1

I am not comfortable with making ProjectCache#guessRelevantGroupUUIDs eagerly load all groups. This method is called on every change view via ChangeResource#prepareETag -> CurrentUser#getEffectiveGroups -> GroupMembership#getKnownGroups, so this would be equivalent to eagerly loading the project cache on startup, which we don't want.

For now, as far as I'm concerned, you can make whatever change you want to the LDAP group backend to make it work, as long as you take responsibility if any performance issues crop up :)

In the long term, I still want to persist the ProjectCache so it's always warm immediately at startup. After that would be a good time to revisit the implementation of guessRelevantGroupUUIDs.

> The ".getIfPresent" can be changed to ".get" and then it won't the best guess anymore :-)

That's not what "guess" means in this context. Even when the ProjectCache is fully loaded, the set of external groups mentioned in all ProjectConfigs is not necessarily the set of all groups in the external system. So we "guess" that only groups that are mentioned in some ACL are relevant.

In your example, if "johngroup" is the only group ever mentioned in an ACL, but there is another group "janegroup" on the server, guessRelevantGroupUUIDs will never return "janegroup".

The reason for this is it's not possible to list all groups in some group systems (e.g. Google Groups), and even if it were, the set may be so large as to make this operation infeasible.
Project Member

Comment 5 by luca.mil...@gmail.com, Oct 1

> I am not comfortable with making ProjectCache#guessRelevantGroupUUIDs eagerly load all groups. This method is called on every change view via ChangeResource#prepareETag -> CurrentUser#getEffectiveGroups -> GroupMembership#getKnownGroups, so this would be equivalent to eagerly loading the project cache on startup, which we don't want.

Yes, that what I thought as well: the projects cache warm up can take several minutes and thus having some critical parts of the system, like ACL resolution, depend on it means basically having the entire Gerrit instance stuck at startup.

> For now, as far as I'm concerned, you can make whatever change you want to the LDAP group backend to make it work, as long as you take responsibility if any performance issues crop up :)

I would rather make it configurable because there is really no perfect solution out there:

a) If we filter, we need to accept the fact that that could be incomplete when project caching is not fully populated.

b) If we don't filter, we need to accept that the list of LDAP groups could be potentially very large and thus accept the associated performance penalty associated.

Which one is better or worse? I would leave to the user to decide rather than forcing always a).

> So we "guess" that only groups that are mentioned in some ACL are relevant

More than "guess" is assuming that only groups that are mentioned in the projects most recently used are relevant, excluding the ones that aren't loaded yet. This is how I interpreted the "guessing".

> The reason for this is it's not possible to list all groups in some group systems (e.g. Google Groups), and even if it were, the set may be so large as to make this operation infeasible

Makes sense, however the LDAP case is a bit different because the sequence of operation is:

1. Fetch all groups, regardless if they are used or not
2. Filter the full list by the ones used in the Projects cached in memory

The optimization then is not really for avoiding expensive LDAP group lookup, that is made anyway and regardless of the fact that they are used or not. 

What about then making it configurable on the LDAP side, if the full list should be filtered or not?
Project Member

Comment 6 by luca.mil...@gmail.com, Oct 1

Cc: sasa.ziv...@sap.com
@Saša what do you think? You guys are heavy LDAP users isn't it?
Project Member

Comment 7 by dborowitz@google.com, Oct 4

> The optimization then is not really for avoiding expensive LDAP group lookup, that is made anyway and regardless of the fact that they are used or not. 

Ah, I wasn't aware that the LDAP integration always requests the full list of groups from the server.
Project Member

Comment 8 by luca.mil...@gmail.com, Oct 4

> Ah, I wasn't aware that the LDAP integration always requests the full list of groups from the server.

Oh yes. The post-filtering is more for two reasons I believe:

1) Avoiding the UX to explode with thousands of groups, many of them not relevant because not used in Gerrit

2) Optimizing the execution of the ACL evaluation, avoiding to check for groups that are never used.

However, as I said in this ticket, *IF* the project cache is not fully loaded, the 1) and 2) lead to unexpected behaviors hard to understand for the end-user.

The user reported: "Gosh, why all my permissions have gone after I have been away for 2 days?" 
Project Member

Comment 9 by luca.mil...@gmail.com, Dec 19

Any more feedback on this? If not, I'll proceed with the fix for Gerrit v2.14 and merge it upstream up to master, once reviewed and merged.
Project Member

Comment 10 by luca.mil...@gmail.com, Dec 19

Status: ChangeUnderReview (was: New)
Fix uploaded to stable-2.14:
https://gerrit-review.googlesource.com/c/gerrit/+/207963
Project Member

Comment 11 by luca.mil...@gmail.com, Dec 19

Status: Submitted (was: ChangeUnderReview)
Project Member

Comment 12 by luca.mil...@gmail.com, Dec 21

Labels: FixedIn-2.16.2
Status: Released (was: Submitted)

Sign in to add a comment