


Parallelized for loop in Bash




I am using a Bash script to execute a Python script multiple times. In order to speed up the execution, I would like to execute these (independent) processes in parallel. The code below does so:



#!/usr/bin/env bash

script="path_to_python_script"
N=16 # number of processors
mkdir -p data/

for i in `seq 1 1 100`; do
for j in {1..100}; do
((q=q%N)); ((q++==0)) && wait

if [ -e data/file_$i-$j.txt ]
then
echo "data/file_$i-$j.txt exists"
else
($script -args_1 $i > data/file_$i-$j.txt ;
$script -args_1 $i -args_2 value -args_3 value >> data/file_$i-$j.txt) &
fi
done
done


However, I am wondering whether this code follows common best practices for parallelizing for loops in Bash. Are there ways to improve its efficiency?




























      bash concurrency iteration multiprocessing










      asked 8 hours ago by user213544 (edited 5 hours ago by 200_success)






















          1 Answer



















          Some suggestions:



          • The trailing slash in the mkdir command is redundant.


          • $(…) is preferred over backticks for command substitution.

          • Why use seq in only one of the loops? Both loops iterate over the same range, so you might as well use {1..100} in both places.

          • Semicolons are unnecessary in the vast majority of cases. Simply use a newline to achieve the same separation between commands.

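          • Use More Quotes™: quote every variable expansion, for example "$script" and "data/file_$i-$j.txt", so that values containing spaces or glob characters are not split or expanded.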


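          • Adding set -o errexit -o noclobber -o nounset at the start of the script will help. With noclobber, a > redirection refuses to overwrite an existing file, and with errexit that failure stops execution, so you can drop the inner if statement if it's OK for the script to stop when a file already exists. Be aware that because the commands run in the background with &, the failure only ends that background job rather than the whole script; the existing file is still left untouched.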


          • [[ is preferred over [.

          • The whole exercise is probably easier to achieve with a standard tool such as GNU parallel. Currently the script starts N commands, then waits for all of them to finish before starting any more. Unless the processes take very similar amounts of time, this wastes a lot of time waiting.


          • N (or a more readable name such as processors) should be determined dynamically, for example with nproc --all, rather than hardcoded.

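          • If you're worried about speed, you should probably not create a subshell for your two script commands; { and } group commands without creating a subshell. (A command list run in the background with & is executed in a subshell anyway, so the saving here is likely small.)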

          • For the same reason, you probably want a single redirection, like { "$script" … && "$script" …; } > "data/file_$i-$j.txt"


          • Since you're "only" counting to 10,000, you don't need to reset q every time. You can, for example, set process_count=0 outside the outer loop, increment it each time you start a job, and check the modulo in a readable way, such as:



            if (( process_count % processors == 0 ))
            then
                wait
            fi


          • The inner code (from the line starting with ((q=q%N))) should be indented one more level.
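          Putting several of these suggestions together, a revised version might look roughly like the sketch below. It is written as a function only so that it can be exercised on a small range: run_jobs and its parameters are hypothetical names, the worker is assumed to be any executable accepting the question's placeholder flags (-args_1 and so on), and the C-style loops replace {1..100} only because brace expansion cannot take a variable bound. It keeps the batch-then-wait structure, so the caveat above about jobs of uneven length still applies.

```bash
#!/usr/bin/env bash
# Sketch only: applies the review suggestions to the question's loop.
# Assumptions: run_jobs and its parameters are hypothetical; the worker is
# any executable accepting the question's placeholder flags.
set -o errexit -o noclobber -o nounset

# run_jobs SCRIPT LIMIT [PROCESSORS]
# Runs SCRIPT for every (i, j) in 1..LIMIT x 1..LIMIT, at most PROCESSORS
# jobs per batch, writing both invocations' output to data/file_$i-$j.txt.
run_jobs() {
    local script="$1"
    local limit="$2"
    local processors="${3:-$(nproc --all)}"
    local process_count=0
    local i j output

    mkdir -p data

    for ((i = 1; i <= limit; i++))
    do
        for ((j = 1; j <= limit; j++))
        do
            output="data/file_$i-$j.txt"
            if [[ -e "$output" ]]
            then
                echo "$output exists"
                continue
            fi

            # Start a new batch only after the previous one has finished.
            if (( process_count > 0 && process_count % processors == 0 ))
            then
                wait
            fi

            # Brace group with a single redirection; "&" still runs the
            # group in a background subshell.
            {
                "$script" -args_1 "$i" &&
                    "$script" -args_1 "$i" -args_2 value -args_3 value
            } > "$output" &
            ((++process_count))
        done
    done

    wait  # let the final (possibly partial) batch finish
}
```

          Calling run_jobs path_to_python_script 100 would correspond to the question's 10,000 runs, with the processor count taken from nproc --all. On Bash 4.3 or newer, wait -n (which returns as soon as any one background job finishes) could be combined with a count of running jobs so that a new job starts whenever a slot frees up, rather than after a whole batch; GNU parallel takes care of this for you.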













          [ is more portable and more consistent than [[. So your preference is certainly arguable. – Toby Speight, 6 hours ago











          answered 8 hours ago by l0b0 (edited 8 hours ago)