taylorism.schedulers

Contains classes for Schedulers.

Among a set of instructions to be passed to a Worker, and according to its own criteria, the Scheduler determines which ones can be launched simultaneously right now and which ones must be delayed.

A scheduler therefore has essentially one method: launchable(pending_instructions, workers, report).

Its parameters (constant during execution) can be set in its constructor. Other quantities, which vary during execution, must be available within workers (work being done) and report (work done).

A set of basic schedulers is given.

Starting from version 1.0.7, schedulers should be created using the footprints package:

import footprints as fp
# In order to create a NewMaxThreadsScheduler scheduler:
mt_sched = fp.proxy.scheduler(limit='threads', max_threads=2)

Compatibility classes are still provided (see LaxistScheduler, MaxThreadsScheduler, MaxMemoryScheduler and SingleOpenFileScheduler) but they should not be used anymore.

Dependencies

footprints (MF package)

Functions

taylorism.schedulers.BindingAwareScheduler(cls)[source]

A class decorator that wraps the original _workers_hooks method to add the binding setup method to the list of the Worker’s hooks.

NB: The class’ footprint should include a ‘binded’ attribute.
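The wrapping described above can be illustrated with a minimal, standalone sketch. It does not reproduce the taylorism implementation: the hook signature, the stand-in hook binding_setup_sketch, the DemoScheduler class and the choice to add the hook only when binded is true are all assumptions made for illustration.

```python
import functools


def binding_setup_sketch(worker):
    """Hypothetical stand-in for a binding setup hook."""
    worker["bound"] = True


def binding_aware(cls):
    """Class decorator: extend the list returned by _workers_hooks."""
    original = cls._workers_hooks

    @functools.wraps(original)
    def wrapped(self):
        # Start from whatever hooks the undecorated class provides.
        hooks = list(original(self))
        # Assumption: the extra hook is only added when 'binded' is true.
        if getattr(self, "binded", False):
            hooks.append(binding_setup_sketch)
        return hooks

    cls._workers_hooks = wrapped
    return cls


@binding_aware
class DemoScheduler:
    """Hypothetical class exposing a 'binded' attribute."""

    def __init__(self, binded=False):
        self.binded = binded

    def _workers_hooks(self):
        return []
```

With this sketch, DemoScheduler(binded=True)._workers_hooks() contains the extra hook, while DemoScheduler()._workers_hooks() stays empty.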

taylorism.schedulers.binding_setup(worker)[source]

Bind a worker to the compute core designated by its scheduler_ticket.

Classes

class taylorism.schedulers.BaseScheduler(*args, **kw)[source]

Bases: FootprintBase

Abstract base class for schedulers.

Note

This class is managed by footprint.

  • info: Not documented

  • priority: PriorityLevel::DEFAULT (rank=1)

Automatic parameters from the footprint:

  • identity (builtins.str) - rxx - Scheduler identity.

    • Optional. Default is ‘anonymous’.

identity

Scheduler identity (see the documentation above for more details).

launchable(pending_instructions, workers, report)[source]

Split pending_instructions into “launchable” and “not_yet_launchable” instructions according to the scheduler’s own rules.

For that purpose and in a generic manner, the scheduler may need:

  • pending_instructions: todo

  • workers: being done

  • report: done.
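As an illustration of this contract, here is a minimal standalone sketch that does not depend on taylorism or footprints. The class names, the use of plain lists for workers and report, and the (launchable, not_yet_launchable) return pair are assumptions, not the library's actual types.

```python
from abc import ABC, abstractmethod


class SketchScheduler(ABC):
    """Hypothetical stand-in for an abstract scheduler."""

    @abstractmethod
    def launchable(self, pending_instructions, workers, report):
        """Return a (launchable, not_yet_launchable) pair of lists."""


class OneAtATime(SketchScheduler):
    """Launch a single instruction, and only when nothing is running."""

    def launchable(self, pending_instructions, workers, report):
        pending = list(pending_instructions)
        # Any work in progress delays everything that is still pending.
        if workers or not pending:
            return [], pending
        return pending[:1], pending[1:]
```

For example, OneAtATime().launchable(["a", "b"], [], []) returns (["a"], ["b"]), whereas passing a non-empty workers list delays both instructions.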

class taylorism.schedulers.LaxistScheduler[source]

Bases: _AbstractOldSchedulerProxy

Deprecated class: should not be used from now on.

class taylorism.schedulers.LongerFirstScheduler(*args, **kw)[source]

Bases: NewMaxMemoryScheduler

A scheduler based on the NewMaxMemoryScheduler. It aims at launching as soon as possible the workers that are expected to have the longest run time (as long as enough memory is available).

Workers need to have two attributes:

  • expected_time representing the expected run time

  • memory representing the expected memory consumed

This scheduler is typically used with Bateur workers, in vortex’s src/common/algo/odbtools.py (for parallel BATOR runs).
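The longest-first idea can be sketched as follows. This is a standalone illustration, not the class's actual code: instructions and running workers are modelled as plain dicts carrying the expected_time and memory values, the max_threads cap is ignored, and skipping an instruction that does not fit in order to try shorter ones is an assumption.

```python
def longer_first(pending, running, max_memory):
    """Split pending into (launchable, not_yet_launchable), longest first."""
    # Memory already committed to the work in progress.
    used = sum(w["memory"] for w in running)
    launchable, delayed = [], []
    # Consider the instructions with the longest expected run time first.
    ordered = sorted(pending, key=lambda i: i["expected_time"], reverse=True)
    for instr in ordered:
        if used + instr["memory"] <= max_memory:
            launchable.append(instr)
            used += instr["memory"]
        else:
            delayed.append(instr)
    return launchable, delayed


pending = [
    {"name": "short", "expected_time": 10, "memory": 100},
    {"name": "long", "expected_time": 90, "memory": 300},
    {"name": "mid", "expected_time": 50, "memory": 300},
]
now, later = longer_first(pending, [{"memory": 200}], max_memory=600)
```

Here "long" is launched first, "mid" no longer fits in the remaining memory and is delayed, and "short" fills the remaining 100 MiB.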

Note

This class is managed by footprint.

  • info: Abstract binded attribute

  • priority: PriorityLevel::DEFAULT (rank=1)

Automatic parameters from the footprint:

  • binded (builtins.bool) - rxx - Binds the process to a single cpu.

    • Optional. Default is False.

  • identity (builtins.str) - rxx - Scheduler identity.

    • Optional. Default is ‘anonymous’.

  • limit (builtins.str) - rxx - Not documented, sorry.

    • Values: set([‘threads+memory’])

    • Remap: dict(mem = ‘memory’,)

  • max_memory (builtins.float) - rwx - Amount of usable memory (in MiB)

    • Optional. Default is None.

  • max_threads (builtins.int) - rxx - Not documented, sorry.

    • Remap: dict(0 = 1.0,)

  • memory_max_percentage (builtins.float) - rxx - Max memory level as a fraction of the total system memory (used only if max_memory is not provided).

    • Optional. Default is 0.75.

  • memory_per_task (builtins.float) - rxx - If a worker does not provide any information on memory, request at least memory_per_task MiB of memory.

    • Optional. Default is 2048.0.

Aliases of some parameters:

  • maxpc is an alias of max_threads.

  • maxthreads is an alias of max_threads.

Set up the maximum available memory.

binded

Binds the process to a single cpu (see the documentation above for more details).

identity

Scheduler identity (see the documentation above for more details).

launchable(pending_instructions, workers, report)[source]

Limit strategy: only processes that fit in a given amount of memory can run.

limit

(see the documentation above for more details).

max_memory

Amount of usable memory (in MiB) (see the documentation above for more details).

max_threads

(see the documentation above for more details).

memory_max_percentage

Max memory level as a fraction of the total system memory (used only if max_memory is not provided) (see the documentation above for more details).

memory_per_task

If a worker does not provide any information on memory, request at least memory_per_task MiB of memory (see the documentation above for more details).

class taylorism.schedulers.MaxMemoryScheduler(max_memory_percentage=0.75, total_system_memory=None)[source]

Bases: _AbstractOldSchedulerProxy

Deprecated class: should not be used from now on.

Parameters:
  • max_memory_percentage (float) – Max memory level as a fraction of the total system memory

  • total_system_memory (float) – Memory available on this system (in MiB)

class taylorism.schedulers.MaxThreadsScheduler(max_threads=0)[source]

Bases: _AbstractOldSchedulerProxy

Deprecated class: should not be used from now on.

class taylorism.schedulers.NewLaxistScheduler(*args, **kw)[source]

Bases: BaseScheduler

No sorting is done!

Note

This class is managed by footprint.

  • info: Not documented

  • priority: PriorityLevel::DEFAULT (rank=1)

Automatic parameters from the footprint:

  • identity (builtins.str) - rxx - Scheduler identity.

    • Optional. Default is ‘anonymous’.

  • nosort (builtins.bool) - rxx - Not documented, sorry.

    • Values: set([True])

Aliases of some parameters:

  • unsorted is an alias of nosort.

  • laxist is an alias of nosort.

identity

Scheduler identity (see the documentation above for more details).

launchable(pending_instructions, workers, report)[source]

Very crude strategy: any pending instruction can be triggered.

nosort

(see the documentation above for more details).

class taylorism.schedulers.NewLimitedScheduler(*args, **kw)[source]

Bases: BaseScheduler

A scheduler that dequeues the pending list as long as a maximum number of simultaneous tasks (max_threads) is not reached.

Note

This class is managed by footprint.

  • info: Not documented

  • priority: PriorityLevel::DEFAULT (rank=1)

Automatic parameters from the footprint:

  • identity (builtins.str) - rxx - Scheduler identity.

    • Optional. Default is ‘anonymous’.

  • limit (builtins.str) - rxx - Not documented, sorry.

    • Values: set([‘threads+memory’, ‘memory’, ‘mem’, ‘threads’])

    • Remap: dict(mem = ‘memory’,)

identity

Scheduler identity (see the documentation above for more details).

limit

(see the documentation above for more details).

class taylorism.schedulers.NewMaxMemoryScheduler(*args, **kw)[source]

Bases: NewLimitedScheduler

A basic scheduler that dequeues the pending list as long as a critical memory level is not reached. The level is assessed from the ‘memory’ element of the workers’ instructions (in MB) and the total system memory.

Note

This class is managed by footprint.

  • info: Abstract binded attribute

  • priority: PriorityLevel::DEFAULT (rank=1)

Automatic parameters from the footprint:

  • binded (builtins.bool) - rxx - Binds the process to a single cpu.

    • Optional. Default is False.

  • identity (builtins.str) - rxx - Scheduler identity.

    • Optional. Default is ‘anonymous’.

  • limit (builtins.str) - rxx - Not documented, sorry.

    • Values: set([‘memory’, ‘mem’])

    • Remap: dict(mem = ‘memory’,)

  • max_memory (builtins.float) - rwx - Amount of usable memory (in MiB)

    • Optional. Default is None.

  • memory_max_percentage (builtins.float) - rxx - Max memory level as a fraction of the total system memory (used only if max_memory is not provided).

    • Optional. Default is 0.75.

  • memory_per_task (builtins.float) - rxx - If a worker does not provide any information on memory, request at least memory_per_task MiB of memory.

    • Optional. Default is 2048.0.

Set up the maximum available memory.
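How the memory budget follows from the parameters above can be sketched as below. The function is hypothetical and standalone: the total system memory is passed in explicitly rather than read from the operating system, and the exact rule used by the class may differ.

```python
def memory_budget(max_memory=None, memory_max_percentage=0.75,
                  total_system_memory=None):
    """Return the usable memory (in MiB) under the documented precedence."""
    # An explicit max_memory wins over the percentage-based estimate.
    if max_memory is not None:
        return float(max_memory)
    if total_system_memory is None:
        raise ValueError("total_system_memory is needed when max_memory is None")
    # Otherwise take a fraction of the total system memory.
    return memory_max_percentage * total_system_memory
```

For instance, with the default fraction of 0.75 and 16384 MiB of system memory, the budget is 12288 MiB; passing max_memory=1000.0 bypasses the fraction entirely.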

binded

Binds the process to a single cpu (see the documentation above for more details).

identity

Scheduler identity (see the documentation above for more details).

launchable(pending_instructions, workers, report)[source]

Limit strategy: only processes that fit in a given amount of memory can run.
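A minimal standalone sketch of such a memory-limited split is given below. It is not the method's actual code: instructions and running workers are modelled as plain dicts, a missing ‘memory’ entry falls back to memory_per_task, and the pending list is scanned in its original order.

```python
def memory_launchable(pending, running, max_memory, memory_per_task=2048.0):
    """Split pending into (launchable, not_yet_launchable) by memory."""

    def needed(item):
        # Fall back to memory_per_task when no memory figure is provided.
        return float(item.get("memory") or memory_per_task)

    # Memory already committed to the work in progress.
    used = sum(needed(w) for w in running)
    launchable, delayed = [], []
    for instr in pending:
        if used + needed(instr) <= max_memory:
            launchable.append(instr)
            used += needed(instr)
        else:
            delayed.append(instr)
    return launchable, delayed


pending = [
    {"name": "a", "memory": 1000.0},
    {"name": "b"},  # no memory information: counted as memory_per_task
    {"name": "c", "memory": 500.0},
]
now, later = memory_launchable(pending, [], max_memory=3500.0)
```

In this example "a" and "b" use 3048 MiB together, so "c" would exceed the 3500 MiB budget and is delayed.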

limit

(see the documentation above for more details).

max_memory

Amount of usable memory (in MiB) (see the documentation above for more details).

memory_max_percentage

Max memory level as a fraction of the total system memory (used only if max_memory is not provided) (see the documentation above for more details).

memory_per_task

If a worker does not provide any information on memory, request at least memory_per_task MiB of memory (see the documentation above for more details).

class taylorism.schedulers.NewMaxThreadsScheduler(*args, **kw)[source]

Bases: NewLimitedScheduler

A basic scheduler that dequeues the pending list as long as a maximum number of simultaneous tasks (max_threads) is not reached.

Note

This class is managed by footprint.

  • info: Abstract binded attribute

  • priority: PriorityLevel::DEFAULT (rank=1)

Automatic parameters from the footprint:

  • binded (builtins.bool) - rxx - Binds the process to a single cpu.

    • Optional. Default is False.

  • identity (builtins.str) - rxx - Scheduler identity.

    • Optional. Default is ‘anonymous’.

  • limit (builtins.str) - rxx - Not documented, sorry.

    • Values: set([‘processes’, ‘threads’])

    • Remap: dict(mem = ‘memory’, processes = ‘threads’,)

  • max_threads (builtins.int) - rxx - Not documented, sorry.

    • Remap: dict(0 = 1.0,)

Aliases of some parameters:

  • maxpc is an alias of max_threads.

  • maxthreads is an alias of max_threads.

binded

Binds the process to a single cpu (see the documentation above for more details).

identity

Scheduler identity (see the documentation above for more details).

launchable(pending_instructions, workers, report)[source]

Limit strategy: only max_threads processes can run simultaneously.
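The thread-limited split can be sketched in a few lines. This standalone function is only an illustration: it assumes that workers is a sized collection of the tasks currently running and that the result is a (launchable, not_yet_launchable) pair.

```python
def threads_launchable(pending, running, max_threads):
    """Split pending so that at most max_threads tasks run at once."""
    # Number of slots left once the running tasks are accounted for.
    free = max(max_threads - len(running), 0)
    pending = list(pending)
    return pending[:free], pending[free:]
```

With one task running and max_threads=2, threads_launchable(["a", "b", "c"], ["w1"], 2) returns (["a"], ["b", "c"]).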

limit

(see the documentation above for more details).

max_threads

(see the documentation above for more details).

class taylorism.schedulers.NewSingleOpenFileScheduler(*args, **kw)[source]

Bases: NewMaxThreadsScheduler

Ensure that a file is never opened simultaneously by two workers, while also enforcing a maximum number of threads.

Note

This class is managed by footprint.

  • info: Abstract binded attribute

  • priority: PriorityLevel::DEFAULT (rank=1)

Automatic parameters from the footprint:

  • binded (builtins.bool) - rxx - Binds the process to a single cpu.

    • Optional. Default is False.

  • identity (builtins.str) - rxx - Scheduler identity.

    • Optional. Default is ‘anonymous’.

  • limit (builtins.str) - rxx - Not documented, sorry.

    • Values: set([‘processes’, ‘threads’])

    • Remap: dict(mem = ‘memory’, processes = ‘threads’,)

  • max_threads (builtins.int) - rxx - Not documented, sorry.

    • Remap: dict(0 = 1.0,)

  • singlefile (builtins.bool) - rxx - Not documented, sorry.

    • Values: set([True])

Aliases of some parameters:

  • maxpc is an alias of max_threads.

  • maxthreads is an alias of max_threads.

binded

Binds the process to a single cpu (see the documentation above for more details).

identity

Scheduler identity (see the documentation above for more details).

launchable(pending_instructions, workers, report)[source]

Limit strategy: deal with what looks like file locking.
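One way to picture this lock-like behaviour is the standalone sketch below. It is not the method's actual code: the ‘filename’ key used to identify the file handled by each instruction or worker is hypothetical, and items are modelled as plain dicts.

```python
def single_file_launchable(pending, running, max_threads):
    """Split pending so that no file is handled by two tasks at once."""
    # Files currently held by running workers ('filename' is hypothetical).
    busy = {w["filename"] for w in running}
    # Number of slots left under the max_threads cap.
    free = max(max_threads - len(running), 0)
    launchable, delayed = [], []
    for instr in pending:
        if len(launchable) < free and instr["filename"] not in busy:
            launchable.append(instr)
            # The file is now taken for the rest of this scheduling pass.
            busy.add(instr["filename"])
        else:
            delayed.append(instr)
    return launchable, delayed


pending = [
    {"filename": "a.grib"},
    {"filename": "b.grib"},
    {"filename": "b.grib"},
    {"filename": "c.grib"},
]
now, later = single_file_launchable(pending, [{"filename": "a.grib"}], 3)
```

Here a.grib is already held by a running worker and the second b.grib instruction collides with the first, so only one b.grib instruction and the c.grib instruction are launched.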

limit

(see the documentation above for more details).

max_threads

(see the documentation above for more details).

singlefile

(see the documentation above for more details).

class taylorism.schedulers.SingleOpenFileScheduler(max_threads=0)[source]

Bases: _AbstractOldSchedulerProxy

Deprecated class: should not be used from now on.