[Webservice] Fix database lock timeouts
Description
We have been seeing errors in the webservice where it tries to write to the database and times out on a lock. I found 12 instances in the log, with tracebacks pointing to 3 different INSERT statements.
Traceback (most recent call last):
  File "/home/xcal/deployments/development/git.xfel.eu/detectors/pycalibration/pycalibration-2023-03-22-3.9.2-9e88f31/webservice/webservice.py", line 1034, in _continue
    ret, _ = await self.launch_jobs(
  File "/home/xcal/deployments/development/git.xfel.eu/detectors/pycalibration/pycalibration-2023-03-22-3.9.2-9e88f31/webservice/webservice.py", line 1397, in launch_jobs
    cur = self.job_db.execute(
sqlite3.OperationalError: database is locked
This indicates that something running concurrently with the webservice is keeping the database locked (i.e. holding a transaction open) for several seconds. The jobs monitor and the web overview (http://max-exfl016.desy.de:8008/) are the only other things that routinely access this database.
I think I've found the culprit in the job monitor code. When updating job statuses, it was issuing SQL updates in a loop which also runs `sacct` to get job statuses, and then committing (finishing the transaction) outside the loop. If a lot of jobs had finished since the last update, or `sacct` was slow for some reason, this would mean the database was locked until the whole loop finished. I've addressed this by batching up the updates and writing them to the database outside the loop.
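To make the shape of the fix concrete, here is a minimal sketch of the batching pattern, with simplified names; `jobs_to_check`, the fixed `finished=True`, and the function signature are assumptions for illustration, not the actual job monitor code:

```python
import sqlite3

def update_job_statuses(job_db: sqlite3.Connection, jobs_to_check, slurm_job_status):
    """Sketch: query sacct per job inside the loop, write to the DB once at the end."""
    updates = []
    for r in jobs_to_check:
        # Potentially slow: this calls out to sacct for each job.
        _, runtime, slstatus = slurm_job_status(r['job_id'])
        updates.append((True, runtime, slstatus, r['job_id']))

    # One short write transaction, instead of holding the lock while sacct runs.
    job_db.executemany(
        "UPDATE slurm_jobs SET finished=?, elapsed=?, status=? WHERE job_id = ?",
        updates,
    )
    job_db.commit()
```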
I also added some timing code wherever we write to the database, so we can see if it's taking longer than expected.
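The timing helper itself isn't shown in this excerpt; below is a hedged sketch of what a context manager like `time_db_transaction` could look like, reverse-engineered from the log format that appears later in this thread (the real helper, which the reviewer refers to as a class, may be implemented differently):

```python
import logging
import sqlite3
import time
from contextlib import contextmanager

log = logging.getLogger(__name__)

@contextmanager
def time_db_transaction(conn: sqlite3.Connection, label: str):
    """Run a write transaction, commit it, and log both durations (sketch)."""
    t0 = time.perf_counter()
    try:
        yield
    except Exception:
        conn.rollback()
        raise
    t1 = time.perf_counter()
    conn.commit()
    t2 = time.perf_counter()
    log.debug(
        "DB change (%s): %.1f ms in transaction, %.1f ms commit",
        label, (t1 - t0) * 1000, (t2 - t1) * 1000,
    )
```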
How Has This Been Tested?
TBD
Types of changes
- Bug fix (non-breaking change which fixes an issue)
Checklist:
- My code follows the code style of this project.
Activity
      _, runtime, slstatus = slurm_job_status(r['job_id'])
      finished = True

-     c.execute(
+     updates.append((finished, runtime, slstatus, r['job_id']))
+
+ with time_db_transaction(self.job_db, 'Update jobs'):
+     self.job_db.executemany(
          "UPDATE slurm_jobs SET finished=?, elapsed=?, status=? WHERE job_id = ?",
-         (finished, runtime, slstatus, r['job_id'])
+         updates
      )

Yup. Read operations need to get a shared lock, and write operations need an exclusive lock. It will wait for 5 seconds (this is a parameter when you open the database) for the lock, and if it takes longer than that, you get this OperationalError.
So we should aim to make database operations, especially writes, as 'point-like' as possible. In this example, we could also commit after each UPDATE command, making N very small transactions. That has some upsides, but it's inefficient, because each commit waits until the data is fully flushed to disk.
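For reference, the 5-second wait mentioned above is sqlite3's default busy timeout, set via the `timeout` argument when the connection is opened; the sketch below shows that parameter and the commit-per-row alternative discussed here (the file name and table schema are made up for the example):

```python
import sqlite3

# The lock wait is the timeout= argument to connect(); 5 seconds is the default.
conn = sqlite3.connect("example_jobs.sqlite", timeout=5.0)
conn.execute(
    "CREATE TABLE IF NOT EXISTS slurm_jobs (job_id INTEGER, finished, elapsed, status)"
)

# Commit-per-row alternative: each transaction is tiny, so the lock is never
# held for long, but every commit waits for the data to reach disk, making N
# small commits much slower than one batched transaction.
updates = [(True, "0:01:23", "COMPLETED", 12345)]  # dummy data
for params in updates:
    conn.execute(
        "UPDATE slurm_jobs SET finished=?, elapsed=?, status=? WHERE job_id = ?",
        params,
    )
    conn.commit()
```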
- Resolved by Thomas Kluyver
Thank you @kluyvert, that is a smart way to fix it. I would also suggest adding a docstring to the new class.
Besides that, LGTM
Tested on max-exfl017; I see log messages like these:
# Webservice
2023-04-17 15:37:49,774 - root - DEBUG - [webservice.py:330] DB change (Insert execution): 1.5 ms in transaction, 4.1 ms commit
2023-04-17 15:46:58,817 - root - DEBUG - [webservice.py:330] DB change (Insert jobs): 1.0 ms in transaction, 5.3 ms commit

# Job monitor
2023-04-17 15:55:13,280 - root - DEBUG - [webservice.py:330] DB change (Update jobs): 0.8 ms in transaction, 3.6 ms commit
2023-04-17 15:55:13,284 - root - DEBUG - [webservice.py:330] DB change (Update execution): 0.3 ms in transaction, 3.8 ms commit
Calling `.commit()` to finish the transaction seems to take longer than executing the SQL to insert/update rows. I think this makes sense - commit should flush the changes to disk, which will take time. If the 'in transaction' time was larger, that might indicate that it's waiting for the lock to modify the database. If the 'commit' time was much longer, that might mean GPFS hiccuped as we were trying to write.
mentioned in commit 9d3bde27
changed milestone to %3.10.0