word frequency from file using partial match


I have a text file like this:



tom
and
jerry
went
to
america
and
england


I want to get the frequency of each word.



When I tried the following command



cat test.txt | sort | uniq -c


I got the following output



      1 america
      2 and
      1 england
      1 jerry
      1 to
      1 tom
      1 went


But I need partial matches too, i.e. the word "to" is present in the word "tom", so my expected count for "to" is 2. Is this possible using Unix commands?










text-processing command-line

asked 12 hours ago by TweetMan, edited 11 hours ago by terdon


3 Answers

Here's one way, but it isn't very elegant:

$ sort -u file | while IFS= read -r word; do
    printf '%s\t%s\n' "$word" "$(grep -cFe "$word" file)"
  done
america 1
and 3
england 1
jerry 1
to 2
tom 1
went 1

– terdon, answered 11 hours ago
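
A note on behaviour (my addition, not from the answer above): grep -c counts matching lines, not occurrences, so a word appearing twice on one line would be counted once. That is harmless here, with one word per line, but if occurrences ever matter, a variant assuming a grep with the non-POSIX -o flag (GNU or BSD grep) would be:

sort -u file | while IFS= read -r word; do
    # -o prints each match on its own line, so wc -l counts every
    # occurrence rather than every matching line
    printf '%s\t%s\n' "$word" "$(grep -oFe "$word" file | wc -l)"
done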





An awk approach:

awk '
  !x {c[$0]; next}
  {for (i in c) if (index($0, i)) c[i]++}
  END {for (i in c) print c[i] "\t" i}' file x=1 file | sort -k1rn

The file is read twice: on the first pass x is unset, so each line is recorded as a key of c; on the second pass x=1, so every stored word that occurs as a substring of the current line is counted. On your input this gives:

3 and
2 to
1 america
1 england
1 jerry
1 tom
1 went

– Stéphane Chazelas, answered 5 hours ago





• Thank you, this command works. If I run this command against a large file of around 30 GB, will a machine with 8 GB of RAM handle that? – TweetMan, 5 hours ago

• @TweetMan It depends on how many unique words there are: it stores all unique words in memory. – Stéphane Chazelas, 5 hours ago

• Hmm, then that would be a problem; it may crash the system. – TweetMan, 5 hours ago

• Awk isn't safe with large files, and it bogs down. You may want to look into loading the data into a SQL database and querying it that way. – A.Danischewski, 2 hours ago
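
For what it's worth, a minimal sketch of that SQL route, assuming sqlite3 is installed (the database name, table name, and column name are illustrative, not from the thread). It trades memory for disk, though the substring join is still quadratic in the number of distinct words:

sqlite3 words.db <<'EOF'
CREATE TABLE words(w TEXT);
.import file words
-- For each distinct word, count the lines that contain it as a
-- substring (instr returns 0 when there is no match).
SELECT count(*), u.w
  FROM (SELECT DISTINCT w FROM words) AS u
  JOIN words AS a ON instr(a.w, u.w) > 0
 GROUP BY u.w
 ORDER BY count(*) DESC;
EOF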


















This won't crash the system but it may take a long time to run, since it parses the input multiple times. Assuming the input file is called "in":

sort -u < in | while read w
do
    printf "%d\t%s\n" `grep -c "$w" in` "$w"
done

which on your input got me:

1 america
3 and
1 england
1 jerry
2 to
1 tom
1 went

– sitaram, answered 2 hours ago
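
One caveat (my note, not the answerer's): without -F, grep treats each word as a regular expression, and read without IFS= and -r can mangle leading whitespace and backslashes. A slightly hardened variant:

# Same loop, but the word is matched as a fixed string (-F) and
# read is protected against whitespace/backslash mangling.
sort -u < in | while IFS= read -r w
do
    printf '%d\t%s\n' "$(grep -cFe "$w" in)" "$w"
done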




