word frequency from file using partial match
I have a text file like this:
tom
and
jerry
went
to
america
and
england
I want to get the frequency of each word.
When I tried the following command
cat test.txt |sort|uniq -c
I got the following output
1 america
2 and
1 england
1 jerry
1 to
1 tom
1 went
But I need partial matches too, i.e., the word "to" is also present in the word "tom", so my expected count for "to" is 2. Is it possible using unix commands?
text-processing command-line
asked 12 hours ago by TweetMan
edited 11 hours ago by terdon
3 Answers
Here's one way, but it isn't very elegant:
$ sort -u file | while IFS= read -r word; do
    printf '%s\t%s\n' "$word" "$(grep -cFe "$word" file)";
done
america 1
and 3
england 1
jerry 1
to 2
tom 1
went 1
answered 11 hours ago by terdon
edited 5 hours ago by Stéphane Chazelas
An awk approach:
awk '
  !x {c[$0]; next}
  {for (i in c) if (index($0, i)) c[i]++}
  END{for (i in c) print c[i]"\t"i}' file x=1 file | sort -k1rn
Which on your input gives
3 and
2 to
1 america
1 england
1 jerry
1 tom
1 went
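For readers who want to see the two passes spelled out, here is a commented sketch of the same idea. The wrapper name partial_counts and the extra sort key on the word are assumptions added for illustration; they are not part of the answer above.

```shell
# Hypothetical wrapper around the two-pass awk idea shown above.
# Usage: partial_counts FILE   (FILE holds one word per line)
partial_counts() {
    awk '
        # First pass (x is still unset): remember every distinct line as a key.
        !x { c[$0]; next }
        # Second pass (x=1): bump the count of every key found inside this line.
        { for (w in c) if (index($0, w)) c[w]++ }
        # Finally print "count<TAB>word" for each key.
        END { for (w in c) print c[w] "\t" w }
    ' "$1" x=1 "$1" | sort -k1,1rn -k2,2
}
```

The file name is passed twice on purpose: the assignment x=1 between the two operands only takes effect once awk reaches it, which is what separates the collecting pass from the counting pass.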
answered 5 hours ago by Stéphane Chazelas
Thank you, this command works. If I run it against a large file of around 30 GB, will a machine with 8 GB of RAM handle that?
– TweetMan, 5 hours ago
@TweetMan That depends on how many unique words there are. It stores all unique words in memory.
– Stéphane Chazelas, 5 hours ago
Hmm, then that would be a problem. It may crash the system.
– TweetMan, 5 hours ago
Awk isn't safe with large files and it bogs down. You may want to look into loading the data into a SQL database and querying it that way.
– A.Danischewski, 2 hours ago
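Since the memory needed by the awk approach depends on the number of unique words, one way to gauge it before running on a large file is to count the distinct lines first. This is only a rough sketch: the function name unique_word_stats is hypothetical, and the byte total is a lower bound, because awk needs additional memory per array entry.

```shell
# Hypothetical helper: print the number of distinct lines in a file and
# their total size in bytes (lines first, then bytes, as wc reports them).
# Usage: unique_word_stats FILE
unique_word_stats() {
    LC_ALL=C sort -u < "$1" | wc -lc
}
```

Common sort implementations spill to temporary files when the input is larger than memory, so this check should be feasible even when the awk array would not fit.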
This won't crash the system but it may take a long time to run, since it parses the input multiple times. Assuming the input file is called "in":
sort -u < in | while read w
do
    printf "%d\t%s\n" "$(grep -c "$w" in)" "$w"
done
which on your input got me:
1 america
3 and
1 england
1 jerry
2 to
1 tom
1 went
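One caveat with the loop above: without -F, grep treats each word as a regular expression, so a word containing characters such as "." or "*" can match more lines than a plain substring search would. A small sketch of a literal-match variant follows; the helper name count_literal is an assumption for illustration.

```shell
# Hypothetical helper: count the lines of file $2 that contain $1 as a
# literal substring. -F disables regex interpretation, -e protects
# patterns that start with a dash.
# Usage: count_literal WORD FILE
count_literal() {
    grep -cF -e "$1" "$2"
}
```

For plain alphabetic words, as in the sample input, both forms give the same counts; the difference only shows up once the words contain regex metacharacters.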
answered 2 hours ago by sitaram