Skip to content
Snippets Groups Projects

[Webservice] Fix database lock timeouts

Merged Thomas Kluyver requested to merge fix/websvc-db-write-locking into master

Description

We have been seeing errors in the webservice where it tries to write to the database and times out on a lock. I found 12 instances in the log, with tracebacks pointing to 3 different INSERT statements.

Traceback (most recent call last):                                                                                                                                                                        
  File "/home/xcal/deployments/development/git.xfel.eu/detectors/pycalibration/pycalibration-2023-03-22-3.9.2-9e88f31/webservice/webservice.py", line 1034, in _continue
    ret, _ = await self.launch_jobs(                                                                                                                                                                      
  File "/home/xcal/deployments/development/git.xfel.eu/detectors/pycalibration/pycalibration-2023-03-22-3.9.2-9e88f31/webservice/webservice.py", line 1397, in launch_jobs
    cur = self.job_db.execute(                                                                                                                                                                            
sqlite3.OperationalError: database is locked

This indicates that something running concurrently with the webservice is keeping the database locked (i.e. holding a transaction open) for several seconds. The jobs monitor and the web overview (http://max-exfl016.desy.de:8008/) are the only other things that routinely access this database.

I think I've found the culprit in the job monitor code. When updating job statuses, it was issuing SQL updates in a loop which also runs sacct to get job statuses, and then committing (finishing the transaction) outside the loop. If a lot of jobs had finished since the last update, or sacct was slow for some reason, this would mean the database was locked until it finished. I've addressed this by batching the updates needed and writing to the database outside the loop.

I also added some timing code wherever we write to the database, so we can see if it's taking longer than expected.

How Has This Been Tested?

TBD

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My code follows the code style of this project.

Reviewers

@ahmedk @schmidtp

Edited by Thomas Kluyver

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
Please register or sign in to reply
Loading