word frequency from file using partial match


I have a text file like this:

tom
and
jerry
went
to
america
and
england

I want to get the frequency of each word.

When I tried the following command

cat test.txt |sort|uniq -c

I got the following output

 1 america
 2 and
 1 england
 1 jerry
 1 to
 1 tom
 1 went

But I need partial matches too, i.e., the word "to" is also present in the word "tom", so my expected count for "to" is 2. Is this possible using Unix commands?

asked 12 hours ago by TweetMan (new contributor); edited 11 hours ago by terdon

text-processing command-line

3 Answers

Here's one way, but it isn't very elegant:

$ sort -u file | while IFS= read -r word; do
    printf '%s\t%s\n' "$word" "$(grep -cFe "$word" file)";
  done
america 1
and 3
england 1
jerry 1
to 2
tom 1
went 1
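If the real file has several words per line rather than one, the loop above could be fed a one-word-per-line copy of the text first. This is only a minimal sketch, assuming words are separated by whitespace; the function name `count_partial` and the temporary file are illustrative and not part of the answer.

```shell
# count_partial FILE: print "word<TAB>count" for every distinct word in FILE,
# where count is the number of words that contain it as a substring.
# Assumption: words are separated by whitespace (spaces, tabs, newlines).
count_partial() {
    words=$(mktemp) || return 1
    # Put each word on its own line and drop any empty lines.
    tr -s '[:space:]' '\n' < "$1" | sed '/^$/d' > "$words"
    # Same loop as above, run against the one-word-per-line copy.
    sort -u -- "$words" | while IFS= read -r word; do
        printf '%s\t%s\n' "$word" "$(grep -cF -e "$word" "$words")"
    done
    rm -f -- "$words"
}

# Usage: count_partial file
```

Note that `grep -c` counts matching lines, not matches, so a word that contains the search term twice (for example "toto" when counting "to") still adds only 1.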

answered 11 hours ago by terdon; edited 5 hours ago by Stéphane Chazelas

An awk approach:

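awk '
  # first pass (x unset): record every distinct line as a key of c
  !x {c[$0]; next}
  # second pass (x=1): credit each recorded word found inside the current line
  {for (i in c) if (index($0, i)) c[i]++}
  END {for (i in c) print c[i]"\t"i}' file x=1 file | sort -k1rn

Which on your input gives: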

3 and
2 to
1 america
1 england
1 jerry
1 tom
1 went

answered 5 hours ago by Stéphane Chazelas

thank you. this command works. if i run this command against a large file around 30gb, will a machine of 8gb ram handle that?

– TweetMan
5 hours ago

@TweetMan depends how many unique words there are. It stores all unique words in memory.

– Stéphane Chazelas
5 hours ago

Hmm. then that would be a problem. it may crash the system.

– TweetMan
5 hours ago

Awk isn't safe with large files and it bogs down. You may want to look into loading the data into a SQL database and querying it that way.

– A.Danischewski
2 hours ago
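On the large-file concern: one possible variation is to let `sort | uniq -c` collapse the input to its distinct lines first, and do the substring comparison only between distinct words, weighting each match by how often the containing word occurs. The sketch below assumes one word per line and no empty lines; `partial_counts` is a made-up name. Memory use still grows with the number of distinct words, as with the awk answer above, so this does not remove that limit. It only avoids rescanning every input line for every word.

```shell
# partial_counts FILE: print "count<TAB>word" for every distinct line of FILE,
# where count is the number of input lines that contain the word as a substring.
# Assumptions: one word per line, no empty lines.
partial_counts() {
    sort -- "$1" | uniq -c | awk '
        {
            n = $1                    # how often this exact line occurs
            sub(/^ *[0-9]+ /, "")     # strip the count that uniq -c prepended
            word[NR] = $0
            freq[NR] = n
        }
        END {
            # Compare distinct words only; weight each hit by its frequency.
            for (i = 1; i <= NR; i++) {
                total = 0
                for (j = 1; j <= NR; j++)
                    if (index(word[j], word[i])) total += freq[j]
                printf "%d\t%s\n", total, word[i]
            }
        }'
}

# Usage: partial_counts file | sort -k1rn
```

The comparison step is quadratic in the number of distinct words, so whether this is practical for a 30 GB file depends entirely on how many distinct words it holds.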


This won't crash the system but it may take a long time to run, since it parses the input multiple times. Assuming the input file is called "in":

sort -u < in | while read w
do
    printf "%d\t%s\n" `grep -c "$w" in` "$w"
done

which on your input got me:

1 america
3 and
1 england
1 jerry
2 to
1 tom
1 went
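One caveat with the loop above: without `-F`, grep treats each word as a regular expression, so a word containing characters such as `.` or `*` may match more lines than a plain substring search would. A small sketch of the difference, using two hypothetical helper functions:

```shell
# count_regex PATTERN FILE: lines matching PATTERN as a basic regular expression
count_regex() {
    grep -c -e "$1" "$2"
}

# count_fixed STRING FILE: lines containing STRING literally
count_fixed() {
    grep -cF -e "$1" "$2"
}

# With a file holding the two lines "a.c" and "abc":
#   count_regex 'a.c' file   counts both lines, because "." matches any character
#   count_fixed 'a.c' file   counts only the literal "a.c" line
```

For plain alphabetic words like the ones in the question, both forms give the same counts.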

answered 2 hours ago by sitaram
