Commit 8fb76c42 authored by Denis Ledoux

[IMP] api: improve storing performance of the api cache

This revision moves the `cache_key` to the first-level
dict of the cache, instead of the last one.

Doing so, we reduce the number of times a reference
to the cache key is stored in the dict.

For instance,
for 100,000 records, 20 fields and 2 envs (e.g. with and without sudo),
there were formerly 100,000 * 20 * 2 occurrences of cache key references;
now there are only 2 references.
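A minimal sketch (toy sizes, with stand-in objects; not the actual Odoo
cache) that makes this reference counting visible:
```python
cr = object()                 # stand-in for the database cursor
cache_key = (cr, 1)           # (cursor, uid)
fields = ['name', 'login']    # 2 fields instead of 20
record_ids = range(100)       # 100 records instead of 100,000

# former layout: one reference to cache_key per (field, record) pair
old = {f: {r: {cache_key: 'value'} for r in record_ids} for f in fields}
# new layout: a single reference to cache_key at the first level
new = {cache_key: {f: {r: 'value' for r in record_ids} for f in fields}}

refs_old = sum(1 for recs in old.values() for cell in recs.values() for k in cell)
print(refs_old)  # 200: one key reference per (field, record) pair
print(len(new))  # 1: a single key reference for this environment
```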

Storing references to an object consumes memory.
Therefore, by reducing the number of object references
in the cache, we reduce the memory consumed by the cache.
We also reduce the time needed to access a value in the
cache, as the cache structure is smaller.
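The per-reference cost can be observed with `sys.getrefcount`; here is a
small sketch (toy sizes, stand-in key) showing that every inner dict entry
holds its own reference to the key tuple:
```python
import sys

key = (object(), 1)  # stand-in for the (cursor, uid) cache key
base = sys.getrefcount(key)

# former layout: every inner dict entry keeps one more reference to the key
inner_dicts = [{key: 5.0} for _ in range(1000)]
print(sys.getrefcount(key) - base)  # ~1000 extra references

# new layout: one outer dict entry holds the only extra reference
outer = {key: {}}
print(sys.getrefcount(key) - base - len(inner_dicts))  # ~1 extra reference
```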

Time and memory consumption are therefore improved,
while keeping the benefit of revision
d7190a3f,
which shares the cache of fields that do not depend
on the context, but only on the cursor and user id.

This revision relies on the fact that there are fewer
distinct cache keys than references to fields/records.
Indeed, it is far more likely to have 100,000 different
records stored in the cache than 100,000 different
environments.

Here is the Python proof of concept that was used
to conclude that storing the cache_key in the
first-level dict of the cache is more efficient.
```python
import os
import psutil
import time

from collections import defaultdict

cr = object()  # stand-in for the database cursor
uid = 1
fields = [object() for i in range(20)]
number_items = 500000

p = psutil.Process(os.getpid())
m = p.memory_info().rss
s = time.time()

cache_key = (cr, uid)
cache = defaultdict(lambda: defaultdict(dict))
for field in fields:
    for i in range(number_items):
        # former layout: the key is stored once per (field, record) pair
        cache[field][i][cache_key] = 5.0
        # new layout: the key is stored once, at the first level
        # cache[cache_key][field][i] = 5.0

print('Memory: %s' % (p.memory_info().rss - m,))
print('Time: %s' % (time.time() - s,))
```
- Using `cache[field][i][cache_key]`:
   - Time: 3.17s
   - Memory: 3138MB
- Using `cache[cache_key][field][i]`:
   - Time: 1.43s
   - Memory: 756MB

Even worse, when the cache key tuple is instantiated inside the loop
with the former cache structure (i.e. `cache[field][i][(cr, uid)]`),
the time goes from 3.17s to 25.63s and the memory from 3138MB to 3773MB.
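A plausible micro-benchmark of this last effect (an assumed setup, not the
original measurement): reusing a prebuilt key tuple avoids allocating and
hashing a fresh tuple on every access:
```python
import timeit

cr, uid = object(), 1
key = (cr, uid)
d = {key: 5.0}

# reuse a prebuilt key tuple: a plain dict lookup per access
print(timeit.timeit(lambda: d[key], number=1000000))
# build the key tuple inside the loop: allocate and hash a fresh tuple each time
print(timeit.timeit(lambda: d[(cr, uid)], number=1000000))
```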

Here is the same proof of concept, but using the Odoo API and Cache:
```python
import os
import psutil
import time

from odoo.api import Cache

# `env` is provided by the Odoo shell (see below)
model = env['res.users']
records = [model.new() for i in range(100000)]

p = psutil.Process(os.getpid())
m = p.memory_info().rss
s = time.time()

# fill a fresh cache with a value for every (char field, record) pair
cache = Cache()
char_fields = [field for field in model._fields.values() if field.type == 'char']
for field in char_fields:
    for record in records:
        cache.set(record, field, 'test')

print('Memory: %s' % (p.memory_info().rss - m,))
print('Time: %s' % (time.time() - s,))
```
- Before (`cache[field][record_id][cache_key]`, with the cache key tuple instantiated in the loop):
   - Time: 4.12s
   - Memory: 810MB
- After (`cache[cache_key][field][record_id]`, with the cache key tuple stored in the env and re-used):
   - Time: 1.63s
   - Memory: 125MB

This can be run in an Odoo shell, for instance
by storing it in `/tmp/test.py` and then
piping it to the Odoo shell:
`cat /tmp/test.py | ./odoo-bin shell -d 12.0`

closes odoo/odoo#29676

closes odoo/odoo#30554
parent 79416ba6
```diff
@@ -120,7 +120,7 @@ class TestRecordCache(TransactionCase):
     def test_memory(self):
         """ Check memory consumption of the cache. """
         NB_RECORDS = 100000
-        MAX_MEMORY = 500
+        MAX_MEMORY = 100
         cache = self.env.cache
         model = self.env['res.partner']
@@ -927,7 +927,7 @@ class Environment(Mapping):
     def cache_key(self, field):
         """ Return the key to store the value of ``field`` in cache, the full
-            cache key being ``(field, record.id, key)``.
+            cache key being ``(key, field, record.id)``.
         """
         return self if field.context_dependent else self._cache_key
@@ -953,52 +953,52 @@ class Environments(object):
 class Cache(object):
     """ Implementation of the cache of records. """

     def __init__(self):
-        # {field: {record_id: {key: value}}}
+        # {key: {field: {record_id: value}}}
         self._data = defaultdict(lambda: defaultdict(dict))

     def contains(self, record, field):
         """ Return whether ``record`` has a value for ``field``. """
         key = record.env.cache_key(field)
-        return key in self._data[field].get(record.id, ())
+        return record.id in self._data[key].get(field, ())

     def get(self, record, field):
         """ Return the value of ``field`` for ``record``. """
         key = record.env.cache_key(field)
-        value = self._data[field][record.id][key]
+        value = self._data[key][field][record.id]
         return value.get() if isinstance(value, SpecialValue) else value

     def set(self, record, field, value):
         """ Set the value of ``field`` for ``record``. """
         key = record.env.cache_key(field)
-        self._data[field][record.id][key] = value
+        self._data[key][field][record.id] = value

     def remove(self, record, field):
         """ Remove the value of ``field`` for ``record``. """
         key = record.env.cache_key(field)
-        del self._data[field][record.id][key]
+        del self._data[key][field][record.id]

     def contains_value(self, record, field):
         """ Return whether ``record`` has a regular value for ``field``. """
         key = record.env.cache_key(field)
-        value = self._data[field][record.id].get(key, SpecialValue(None))
+        value = self._data[key][field].get(record.id, SpecialValue(None))
         return not isinstance(value, SpecialValue)

     def get_value(self, record, field, default=None):
         """ Return the regular value of ``field`` for ``record``. """
         key = record.env.cache_key(field)
-        value = self._data[field][record.id].get(key, SpecialValue(None))
+        value = self._data[key][field].get(record.id, SpecialValue(None))
         return default if isinstance(value, SpecialValue) else value

     def get_special(self, record, field, default=None):
         """ Return the special value of ``field`` for ``record``. """
         key = record.env.cache_key(field)
-        value = self._data[field][record.id].get(key)
+        value = self._data[key][field].get(record.id)
         return value.get if isinstance(value, SpecialValue) else default

     def set_special(self, record, field, getter):
         """ Set the value of ``field`` for ``record`` to return ``getter()``. """
         key = record.env.cache_key(field)
-        self._data[field][record.id][key] = SpecialValue(getter)
+        self._data[key][field][record.id] = SpecialValue(getter)

     def set_failed(self, records, fields, exception):
         """ Mark ``fields`` on ``records`` with the given exception. """
@@ -1012,62 +1012,65 @@ class Cache(object):
         """ Return the fields with a value for ``record``. """
         for name, field in record._fields.items():
             key = record.env.cache_key(field)
-            if name != 'id' and key in self._data[field].get(record.id, ()):
+            if name != 'id' and record.id in self._data[key].get(field, ()):
                 yield field

     def get_records(self, model, field):
         """ Return the records of ``model`` that have a value for ``field``. """
         key = model.env.cache_key(field)
-        ids = [
-            record_id
-            for record_id, field_record_cache in self._data[field].items()
-            if key in field_record_cache
-        ]
+        ids = list(self._data[key][field])
         return model.browse(ids)

     def get_missing_ids(self, records, field):
         """ Return the ids of ``records`` that have no value for ``field``. """
         key = records.env.cache_key(field)
-        field_cache = self._data[field]
+        field_cache = self._data[key][field]
         for record_id in records._ids:
-            if key not in field_cache.get(record_id, ()):
+            if record_id not in field_cache:
                 yield record_id

     def copy(self, records, env):
         """ Copy the cache of ``records`` to ``env``. """
         src, dst = records.env, env
-        for field, field_cache in self._data.items():
-            src_key = src.cache_key(field)
-            dst_key = dst.cache_key(field)
-            for record_cache in field_cache.values():
-                if src_key in record_cache and not isinstance(record_cache[src_key], SpecialValue):
-                    # But not if it's a SpecialValue, which often is an access error
-                    # because the other environment (eg. sudo()) is well expected to have access.
-                    record_cache[dst_key] = record_cache[src_key]
+        for src_key, dst_key in [(src, dst), (src._cache_key, dst._cache_key)]:
+            if src_key == dst_key:
+                break
+            src_cache = self._data[src_key]
+            dst_cache = self._data[dst_key]
+            for field, src_field_cache in src_cache.items():
+                dst_field_cache = dst_cache[field]
+                for record_id, value in src_field_cache.items():
+                    if not isinstance(value, SpecialValue):
+                        # But not if it's a SpecialValue, which often is an access error
+                        # because the other environment (eg. sudo()) is well expected to have access.
+                        dst_field_cache[record_id] = value

     def invalidate(self, spec=None):
         """ Invalidate the cache, partially or totally depending on ``spec``. """
         if spec is None:
             self._data.clear()
         elif spec:
-            data = self._data
             for field, ids in spec:
                 if ids is None:
-                    data.pop(field, None)
+                    for data in self._data.values():
+                        data.pop(field, None)
                 else:
-                    field_cache = data[field]
-                    for id in ids:
-                        field_cache.pop(id, None)
+                    for data in self._data.values():
+                        field_cache = data.get(field)
+                        if field_cache:
+                            for id in ids:
+                                field_cache.pop(id, None)

     def check(self, env):
         """ Check the consistency of the cache for the given environment. """
         # make a full copy of the cache, and invalidate it
         dump = defaultdict(dict)
-        for field, field_cache in self._data.items():
-            key = env.cache_key(field)
-            for record_id, field_record_cache in field_cache.items():
-                if record_id and key in field_record_cache:
-                    dump[field][record_id] = field_record_cache[key]
+        for key in [env, env._cache_key]:
+            key_cache = self._data[key]
+            for field, field_cache in key_cache.items():
+                for record_id, value in field_cache.items():
+                    if record_id:
+                        dump[field][record_id] = value
         self.invalidate()
```