tuulos / disco


Home

See Rules of Thumb for Map/Reduce programming

For an up-to-date list of bugs and feature requests, see our [bug tracker](http://disco.lighthouseapp.com/projects/17865/home).

wishlist

  • For each job, create a process that encapsulates the job’s information, so that job info can be queried from various modules in the system without carrying a large record around. Once this is done, use the mechanism to parse the Python client’s version from the request so that a matching interpreter can be used on the nodes. This should solve the problem of mismatched Python versions. (tuulos)
  • Disco.job() implementation for other languages besides Python, using the external interface (tuulos)
  • General speed-ups: Replace urllib with pycurl, rewrite netstr_reader (tuulos)
  • Support for streaming data between maps and reduces: If sorting is disabled, we could stream map outputs to reduces directly, without writing any intermediate files, and without the reduces needing to wait for maps to finish. (tuulos)
  • A way to stop map / reduce before all data has been consumed (tuulos)
  • Separate users / groups: A personal joblist etc. (tuulos)
  • Distribute params files to multiple servers, to fix the issue of all tasks trying to retrieve the same params file from the master simultaneously when they start (tuulos)
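
The streaming and early-stop items above could look roughly like the following when expressed with plain Python generators. This is only a sketch of the idea, not Disco code: `fun_map`, `stream_maps`, `fun_reduce` and the `limit` parameter are hypothetical names chosen for illustration, and everything runs in a single process rather than across nodes.

```python
from itertools import islice


def fun_map(line):
    # Hypothetical map function: emit a (word, 1) pair for each word in a line.
    for word in line.split():
        yield word, 1


def stream_maps(inputs):
    # Lazily chain map outputs. No sorting is done and no intermediate
    # files are written; pairs are produced only when the reduce asks.
    for entry in inputs:
        yield from fun_map(entry)


def fun_reduce(pairs, limit=None):
    # Hypothetical reduce function: sum values per key as pairs arrive.
    # If `limit` is given, stop after that many pairs, leaving the rest
    # of the input unconsumed.
    counts = {}
    for key, value in islice(pairs, limit):
        counts[key] = counts.get(key, 0) + value
    return counts


if __name__ == "__main__":
    lines = ["a b a", "b c"]
    print(fun_reduce(stream_maps(lines)))
    print(fun_reduce(stream_maps(lines), limit=3))
```

Because the reduce pulls pairs on demand, it can start before all inputs have been mapped, and stopping early means the remaining inputs are never read. A distributed version would still need a transport between map and reduce tasks, which this sketch does not address.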
Last edited by tuulos, Sun Oct 05 16:13:16 -0700 2008