Catch database locked error when checking for response in loop
@tmichela let me know that AGIPD correction failed in MID p5536 r192, with the logs showing Calparrot errors like this:
ERROR:tornado.application:Uncaught exception GET /api/calibration_constant_versions/get_by_detector_conditions?detector_identifier=MID_DET_AGIPD1M-1&calibration_id=%5B15%2C+17%5D&karabo_da=&event_at=2024-04-26T05%3A55%3A32.523611%2B00%3A00&pdu_snapshot_at=2024-04-26T05%3A55%3A32.523611%2B00%3A00 (127.0.0.1)
HTTPServerRequest(protocol='http', host='127.0.0.1:34882', method='GET', uri='/api/calibration_constant_versions/get_by_detector_conditions?detector_identifier=MID_DET_AGIPD1M-1&calibration_id=%5B15%2C+17%5D&karabo_da=&event_at=2024-04-26T05%3A55%3A32.523611%2B00%3A00&pdu_snapshot_at=2024-04-26T05%3A55%3A32.523611%2B00%3A00', version='HTTP/1.1', remote_ip='127.0.0.1')
Traceback (most recent call last):
  File "/home/xcal/deployments/development/git.xfel.eu/detectors/pycalibration/current/.venv/lib/python3.8/site-packages/tornado/web.py", line 1790, in _execute
    result = await result
  File "/home/xcal/deployments/development/git.xfel.eu/detectors/pycalibration/current/.venv/lib/python3.8/site-packages/calparrot/proxy.py", line 204, in get
    response = await self._get_with_cache(req_path)
  File "/home/xcal/deployments/development/git.xfel.eu/detectors/pycalibration/current/.venv/lib/python3.8/site-packages/calparrot/proxy.py", line 175, in _get_with_cache
    status, reason, headers, body = await asyncio.wait_for(
  File "/home/xcal/.pyenv/versions/3.8.11/lib/python3.8/asyncio/tasks.py", line 494, in wait_for
    return fut.result()
  File "/home/xcal/deployments/development/git.xfel.eu/detectors/pycalibration/current/.venv/lib/python3.8/site-packages/calparrot/db.py", line 57, in wait_get_response
    status, reason, headers, body = self.get(url, req_body)
  File "/home/xcal/deployments/development/git.xfel.eu/detectors/pycalibration/current/.venv/lib/python3.8/site-packages/calparrot/db.py", line 47, in get
    row = self.conn.execute(
sqlite3.OperationalError: database is locked
ERROR:tornado.access:500 GET /api/calibration_constant_versions/get_by_detector_conditions?detector_identifier=MID_DET_AGIPD1M-1&calibration_id=%5B15%2C+17%5D&karabo_da=&event_at=2024-04-26T05%3A55%3A32.523611%2B00%3A00&pdu_snapshot_at=2024-04-26T05%3A55%3A32.523611%2B00%3A00 (127.0.0.1) 6584.96ms
My guess is that GPFS had a hiccup, which led to a write transaction in another process (not all jobs failed) taking several seconds instead of finishing almost instantly. That would mean the database stays locked, so the process trying to read gives up after 5 seconds (the default timeout of Python's sqlite3 module). At present, we don't catch this exception, so tornado returns a generic 500 error, which causes a JSONDecodeError in the notebook.
This change catches the 'database is locked' error in the loop that waits for a response. We allow up to 15 seconds for a response to appear in the SQLite DB, so the proxy will try again after finding the database locked. If 3 attempts all fail, the wait times out and the proxy falls back to sending the query upstream to CalCat.
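For reviewers who want the shape of the fix without opening the diff, here is a minimal sketch of the retry-then-fall-back pattern. It is not the actual Calparrot code: the `ResponseCache` class, the `responses` table, the `poll_interval` parameter and the `fetch_upstream` callback are all hypothetical names chosen for illustration. Only the 15-second overall timeout and the `sqlite3.OperationalError: database is locked` error come from the description above.

```python
import asyncio
import sqlite3


class ResponseCache:
    """Hypothetical stand-in for the cache in calparrot/db.py."""

    def __init__(self, path=":memory:"):
        # sqlite3.connect() waits up to 5 s (its default timeout) on a lock
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS responses (url TEXT PRIMARY KEY, body BLOB)"
        )

    def get(self, url):
        # May raise sqlite3.OperationalError if another process holds the lock
        row = self.conn.execute(
            "SELECT body FROM responses WHERE url = ?", (url,)
        ).fetchone()
        return None if row is None else row[0]

    async def wait_get_response(self, url, poll_interval=0.1):
        """Poll until a response for `url` appears, tolerating a locked DB."""
        while True:
            try:
                body = self.get(url)
            except sqlite3.OperationalError as e:
                if "database is locked" not in str(e):
                    raise  # unrelated errors should still surface
                body = None  # treat a locked DB like "not there yet"
            if body is not None:
                return body
            # Yield to the event loop; this is where a timeout can cancel us
            await asyncio.sleep(poll_interval)


async def get_with_cache(cache, url, fetch_upstream, timeout=15):
    """Wait for a cached response, then fall back to the upstream query."""
    try:
        return await asyncio.wait_for(cache.wait_get_response(url), timeout)
    except asyncio.TimeoutError:
        return await fetch_upstream(url)
```

Because the blocking `get()` call can hold the event loop for up to 5 seconds per attempt, a 15-second budget leaves room for roughly three locked attempts before `asyncio.wait_for` cancels the wait at the next `await`.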