dscpPolicy: prevent cache overflow
It has been a long standing bug that eventually the dscp
policy socket cache fills up and stops working for new
sockets.
The intent was this would be periodically cleared on some
sort of LRU basis from userspace.
But why bother?
We convert from hash[cookie:u64] to array[cookie % X].
If the cache misses, we simply recalculate and replace it.
This also allows us to reduce the size of the map:
we go from 1024 entries to 32 entries per cpu (likely ~8) so 256.
We're unlikely to have many active flows on any given cpu
at any given time - and the chance of collision is reasonable,
so thrashing should be minimal.
Longer term it would probably be even better to use
socket local storage for this purpose...
(Note: we don't have generation ids on the policy,
so we still don't apply new policy / policy changes
to old sockets... oh, well...)
Test: TreeHugger, atest CtsNetTestCases:android.net.cts.DscpPolicyTest
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: Ia570aec44e399d326c165b271ae4dad01f8aa100
diff --git a/bpf/progs/dscpPolicy.c b/bpf/progs/dscpPolicy.c
index baabb02..39f2961 100644
--- a/bpf/progs/dscpPolicy.c
+++ b/bpf/progs/dscpPolicy.c
@@ -23,7 +23,10 @@
#define ECN_MASK 3
#define UPDATE_TOS(dscp, tos) ((dscp) << 2) | ((tos) & ECN_MASK)
-DEFINE_BPF_MAP_GRW(socket_policy_cache_map, HASH, uint64_t, RuleEntry, CACHE_MAP_SIZE, AID_SYSTEM)
+// The cache is never read nor written by userspace and is indexed by socket cookie % CACHE_MAP_SIZE
+#define CACHE_MAP_SIZE 32 // should be a power of two so we can % cheaply
+DEFINE_BPF_MAP_GRO(socket_policy_cache_map, PERCPU_ARRAY, uint32_t, RuleEntry, CACHE_MAP_SIZE,
+ AID_SYSTEM)
DEFINE_BPF_MAP_GRW(ipv4_dscp_policies_map, ARRAY, uint32_t, DscpPolicy, MAX_POLICIES, AID_SYSTEM)
DEFINE_BPF_MAP_GRW(ipv6_dscp_policies_map, ARRAY, uint32_t, DscpPolicy, MAX_POLICIES, AID_SYSTEM)
@@ -43,6 +46,8 @@
uint64_t cookie = bpf_get_socket_cookie(skb);
if (!cookie) return;
+ uint32_t cacheid = cookie % CACHE_MAP_SIZE;
+
__be16 sport = 0;
uint16_t dport = 0;
uint8_t protocol = 0; // TODO: Use are reserved value? Or int (-1) and cast to uint below?
@@ -105,7 +110,8 @@
return;
}
- RuleEntry* existing_rule = bpf_socket_policy_cache_map_lookup_elem(&cookie);
+ // this array lookup cannot actually fail
+ RuleEntry* existing_rule = bpf_socket_policy_cache_map_lookup_elem(&cacheid);
if (existing_rule &&
v6_equal(src_ip, existing_rule->src_ip) &&
@@ -192,7 +198,7 @@
};
// Update cache with found policy.
- bpf_socket_policy_cache_map_update_elem(&cookie, &value, BPF_ANY);
+ bpf_socket_policy_cache_map_update_elem(&cacheid, &value, BPF_ANY);
if (new_dscp < 0) return;