Commit 8fb76c42 authored by Denis Ledoux

[IMP] api: improve storing performance of the api cache

This revision moves the `cache_key` to the first-level
dict of the cache, instead of the last one.

Doing so, we reduce the number of times a reference
to the cache key is stored in the dict.

For instance,
for 100,000 records, 20 fields and 2 envs (e.g. with and without sudo),
there were formerly 100,000 * 20 * 2 occurrences of cache key references;
now there are only 2 references.
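A minimal sketch (toy sizes, with stand-in objects; not the actual Odoo
cache) that makes this reference counting visible:
```python
cr = object()                 # stand-in for the database cursor
cache_key = (cr, 1)           # (cursor, uid)
fields = ['name', 'login']    # 2 fields instead of 20
record_ids = range(100)       # 100 records instead of 100,000

# former layout: one reference to cache_key per (field, record) pair
old = {f: {r: {cache_key: 'value'} for r in record_ids} for f in fields}
# new layout: a single reference to cache_key at the first level
new = {cache_key: {f: {r: 'value' for r in record_ids} for f in fields}}

refs_old = sum(1 for recs in old.values() for cell in recs.values() for k in cell)
print(refs_old)  # 200: one key reference per (field, record) pair
print(len(new))  # 1: a single key reference for this environment
```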

Storing references to an object consumes memory.
Therefore, by reducing the number of object references
in the cache, we reduce the memory consumed by the cache.
We also reduce the time needed to access a value in the
cache, as the cache structure is smaller.
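The per-reference cost can be observed with `sys.getrefcount`; here is a
small sketch (toy sizes, stand-in key) showing that every inner dict entry
holds its own reference to the key tuple:
```python
import sys

key = (object(), 1)  # stand-in for the (cursor, uid) cache key
base = sys.getrefcount(key)

# former layout: every inner dict entry keeps one more reference to the key
inner_dicts = [{key: 5.0} for _ in range(1000)]
print(sys.getrefcount(key) - base)  # ~1000 extra references

# new layout: one outer dict entry holds the only extra reference
outer = {key: {}}
print(sys.getrefcount(key) - base - len(inner_dicts))  # ~1 extra reference
```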

Time and memory consumption are therefore improved,
while keeping the benefit of revision
d7190a3f,
which shares the cache of fields that do not depend
on the context, but only on the cursor and user id.

This revision relies on the fact that there are fewer
distinct cache keys than references to fields/records.
Indeed, it is far more likely to have 100,000 different
records stored in the cache than 100,000 different
environments.

Here is the Python proof of concept that was used
to conclude that storing the cache_key in the
first-level dict of the cache is more efficient.
```python
import os
import psutil
import time

from collections import defaultdict

cr = object()  # stand-in for the database cursor
uid = 1
fields = [object() for i in range(20)]
number_items = 500000

p = psutil.Process(os.getpid())
m = p.memory_info().rss
s = time.time()

cache_key = (cr, uid)
cache = defaultdict(lambda: defaultdict(dict))
for field in fields:
    for i in range(number_items):
        # former layout: the key is stored once per (field, record) pair
        cache[field][i][cache_key] = 5.0
        # new layout: the key is stored once, at the first level
        # cache[cache_key][field][i] = 5.0

print('Memory: %s' % (p.memory_info().rss - m,))
print('Time: %s' % (time.time() - s,))
```
- Using `cache[field][i][cache_key]`:
   - Time: 3.17s
   - Memory: 3138MB
- Using `cache[cache_key][field][i]`:
   - Time: 1.43s
   - Memory: 756MB

Even worse, when the cache key tuple is instantiated inside the loop
with the former cache structure (i.e. `cache[field][i][(cr, uid)]`),
the time goes from 3.17s to 25.63s and the memory from 3138MB to 3773MB.
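A plausible micro-benchmark of this last effect (an assumed setup, not the
original measurement): reusing a prebuilt key tuple avoids allocating and
hashing a fresh tuple on every access:
```python
import timeit

cr, uid = object(), 1
key = (cr, uid)
d = {key: 5.0}

# reuse a prebuilt key tuple: a plain dict lookup per access
print(timeit.timeit(lambda: d[key], number=1000000))
# build the key tuple inside the loop: allocate and hash a fresh tuple each time
print(timeit.timeit(lambda: d[(cr, uid)], number=1000000))
```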

Here is the same proof of concept, but using the Odoo API and Cache:
```python
import os
import psutil
import time

from odoo.api import Cache

# `env` is provided by the Odoo shell (see below)
model = env['res.users']
records = [model.new() for i in range(100000)]

p = psutil.Process(os.getpid())
m = p.memory_info().rss
s = time.time()

# fill a fresh cache with a value for every (char field, record) pair
cache = Cache()
char_fields = [field for field in model._fields.values() if field.type == 'char']
for field in char_fields:
    for record in records:
        cache.set(record, field, 'test')

print('Memory: %s' % (p.memory_info().rss - m,))
print('Time: %s' % (time.time() - s,))
```
- Before (`cache[field][record_id][cache_key]`, with the cache key tuple instantiated in the loop):
   - Time: 4.12s
   - Memory: 810MB
- After (`cache[cache_key][field][record_id]`, with the cache key tuple stored in the env and re-used):
   - Time: 1.63s
   - Memory: 125MB

This can be run in an Odoo shell, for instance
by storing it in `/tmp/test.py` and then
piping it to the Odoo shell:
`cat /tmp/test.py | ./odoo-bin shell -d 12.0`

closes odoo/odoo#29676

closes odoo/odoo#30554
parent 79416ba6
```diff
@@ -120,7 +120,7 @@ class TestRecordCache(TransactionCase):
     def test_memory(self):
         """ Check memory consumption of the cache. """
         NB_RECORDS = 100000
-        MAX_MEMORY = 500
+        MAX_MEMORY = 100
         cache = self.env.cache
         model = self.env['res.partner']
@@ -927,7 +927,7 @@ class Environment(Mapping):
     def cache_key(self, field):
         """ Return the key to store the value of ``field`` in cache, the full
-            cache key being ``(field, record.id, key)``.
+            cache key being ``(key, field, record.id)``.
         """
         return self if field.context_dependent else self._cache_key
@@ -953,52 +953,52 @@ class Environments(object):
 class Cache(object):
     """ Implementation of the cache of records. """

     def __init__(self):
-        # {field: {record_id: {key: value}}}
+        # {key: {field: {record_id: value}}}
         self._data = defaultdict(lambda: defaultdict(dict))

     def contains(self, record, field):
         """ Return whether ``record`` has a value for ``field``. """
         key = record.env.cache_key(field)
-        return key in self._data[field].get(record.id, ())
+        return record.id in self._data[key].get(field, ())

     def get(self, record, field):
         """ Return the value of ``field`` for ``record``. """
         key = record.env.cache_key(field)
-        value = self._data[field][record.id][key]
+        value = self._data[key][field][record.id]
         return value.get() if isinstance(value, SpecialValue) else value

     def set(self, record, field, value):
         """ Set the value of ``field`` for ``record``. """
         key = record.env.cache_key(field)
-        self._data[field][record.id][key] = value
+        self._data[key][field][record.id] = value

     def remove(self, record, field):
         """ Remove the value of ``field`` for ``record``. """
         key = record.env.cache_key(field)
-        del self._data[field][record.id][key]
+        del self._data[key][field][record.id]

     def contains_value(self, record, field):
         """ Return whether ``record`` has a regular value for ``field``. """
         key = record.env.cache_key(field)
-        value = self._data[field][record.id].get(key, SpecialValue(None))
+        value = self._data[key][field].get(record.id, SpecialValue(None))
         return not isinstance(value, SpecialValue)

     def get_value(self, record, field, default=None):
         """ Return the regular value of ``field`` for ``record``. """
         key = record.env.cache_key(field)
-        value = self._data[field][record.id].get(key, SpecialValue(None))
+        value = self._data[key][field].get(record.id, SpecialValue(None))
         return default if isinstance(value, SpecialValue) else value

     def get_special(self, record, field, default=None):
         """ Return the special value of ``field`` for ``record``. """
         key = record.env.cache_key(field)
-        value = self._data[field][record.id].get(key)
+        value = self._data[key][field].get(record.id)
         return value.get if isinstance(value, SpecialValue) else default

     def set_special(self, record, field, getter):
         """ Set the value of ``field`` for ``record`` to return ``getter()``. """
         key = record.env.cache_key(field)
-        self._data[field][record.id][key] = SpecialValue(getter)
+        self._data[key][field][record.id] = SpecialValue(getter)

     def set_failed(self, records, fields, exception):
         """ Mark ``fields`` on ``records`` with the given exception. """
@@ -1012,62 +1012,65 @@ class Cache(object):
         """ Return the fields with a value for ``record``. """
         for name, field in record._fields.items():
             key = record.env.cache_key(field)
-            if name != 'id' and key in self._data[field].get(record.id, ()):
+            if name != 'id' and record.id in self._data[key].get(field, ()):
                 yield field

     def get_records(self, model, field):
         """ Return the records of ``model`` that have a value for ``field``. """
         key = model.env.cache_key(field)
-        ids = [
-            record_id
-            for record_id, field_record_cache in self._data[field].items()
-            if key in field_record_cache
-        ]
+        ids = list(self._data[key][field])
         return model.browse(ids)

     def get_missing_ids(self, records, field):
         """ Return the ids of ``records`` that have no value for ``field``. """
         key = records.env.cache_key(field)
-        field_cache = self._data[field]
+        field_cache = self._data[key][field]
         for record_id in records._ids:
-            if key not in field_cache.get(record_id, ()):
+            if record_id not in field_cache:
                 yield record_id

     def copy(self, records, env):
         """ Copy the cache of ``records`` to ``env``. """
         src, dst = records.env, env
-        for field, field_cache in self._data.items():
-            src_key = src.cache_key(field)
-            dst_key = dst.cache_key(field)
-            for record_cache in field_cache.values():
-                if src_key in record_cache and not isinstance(record_cache[src_key], SpecialValue):
-                    # But not if it's a SpecialValue, which often is an access error
-                    # because the other environment (eg. sudo()) is well expected to have access.
-                    record_cache[dst_key] = record_cache[src_key]
+        for src_key, dst_key in [(src, dst), (src._cache_key, dst._cache_key)]:
+            if src_key == dst_key:
+                break
+            src_cache = self._data[src_key]
+            dst_cache = self._data[dst_key]
+            for field, src_field_cache in src_cache.items():
+                dst_field_cache = dst_cache[field]
+                for record_id, value in src_field_cache.items():
+                    if not isinstance(value, SpecialValue):
+                        # But not if it's a SpecialValue, which often is an access error
+                        # because the other environment (eg. sudo()) is well expected to have access.
+                        dst_field_cache[record_id] = value

     def invalidate(self, spec=None):
         """ Invalidate the cache, partially or totally depending on ``spec``. """
         if spec is None:
             self._data.clear()
         elif spec:
-            data = self._data
             for field, ids in spec:
                 if ids is None:
-                    data.pop(field, None)
+                    for data in self._data.values():
+                        data.pop(field, None)
                 else:
-                    field_cache = data[field]
-                    for id in ids:
-                        field_cache.pop(id, None)
+                    for data in self._data.values():
+                        field_cache = data.get(field)
+                        if field_cache:
+                            for id in ids:
+                                field_cache.pop(id, None)

     def check(self, env):
         """ Check the consistency of the cache for the given environment. """
         # make a full copy of the cache, and invalidate it
         dump = defaultdict(dict)
-        for field, field_cache in self._data.items():
-            key = env.cache_key(field)
-            for record_id, field_record_cache in field_cache.items():
-                if record_id and key in field_record_cache:
-                    dump[field][record_id] = field_record_cache[key]
+        for key in [env, env._cache_key]:
+            key_cache = self._data[key]
+            for field, field_cache in key_cache.items():
+                for record_id, value in field_cache.items():
+                    if record_id:
+                        dump[field][record_id] = value
         self.invalidate()
```