Split a six-digit number column into separate columns with one digit each
How can I use pandas or NumPy to split one column of six-digit integers into six columns with one digit each?

    import pandas as pd
    import numpy as np

    df = pd.Series(range(123456, 123465))
    df = pd.DataFrame(df)
    df.head()

What I have looks like this:

    Number
    654321
    223344

The desired outcome should look like this:

    Number | x1 | x2 | x3 | x4 | x5 | x6 |
    654321 |  6 |  5 |  4 |  3 |  2 |  1 |
    223344 |  2 |  2 |  3 |  3 |  4 |  4 |

Tags: python, pandas, numpy
asked 9 hours ago by msalem85 (edited 7 hours ago)
If you don't have to use numpy or pandas: for num in str(my_number): print(num) – wcarhart, 9 hours ago
What is the source of your data? Are numpy.array or pandas.DataFrame objects delivered to you, or are you getting just text with numbers separated by newlines? – Daweo, 8 hours ago
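The plain-Python idea from the first comment can be adapted to return the digits instead of printing them. This is only a minimal sketch, assuming non-negative integers; the helper name `split_digits_plain` is hypothetical and not part of the original comment.

```python
def split_digits_plain(number):
    """Return the decimal digits of a non-negative integer as a list of ints."""
    # str() gives one character per digit; int() turns each character back into a number
    return [int(ch) for ch in str(number)]


rows = [split_digits_plain(n) for n in (654321, 223344)]
print(rows)
```

Each inner list could then become one row of digit columns, which is what several of the answers do with pandas.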
8 Answers
You could use np.unravel_index:

    from timeit import timeit

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'Number': [654321, 223344]})

    def split_digits(df):
        # get data as numpy array
        numbers = df['Number'].to_numpy()
        # extract digits: each number is treated as a flat index into a 10x10x10x10x10x10 array
        digits = np.unravel_index(numbers, 6*(10,))
        # create column headers
        columns = ['Number', *(f'x{i}' for i in "123456")]
        # build and return new data frame
        return pd.DataFrame(np.stack([numbers, *digits], axis=1), columns=columns, index=df.index)

    split_digits(df)
    #    Number  x1  x2  x3  x4  x5  x6
    # 0  654321   6   5   4   3   2   1
    # 1  223344   2   2   3   3   4   4

    timeit(lambda: split_digits(df), number=1000)
    # 0.3550272472202778

Thanks @GZ0 for some pandas tips.
This is an excellent trick and one-liner, @Paul, +1. What does ** do in assign? Would you mind explaining the code? – Karn Kumar, 6 hours ago
@KarnKumar ** "unrolls" the dictionary, so each key-value pair becomes a keyword argument to the function (assign in this case). By the way, I don't know much about pandas, so this part of the code may be far from optimal. – Paul Panzer, 5 hours ago
@KarnKumar I've made an annotated version in case you are interested. – Paul Panzer, 5 hours ago
One alternative way to return a new data frame using digits is df.assign(**dict(zip((f'x{i}' for i in range(1,7)), digits))). Also, df['Number'] can be used as a numpy array directly without explicitly accessing the .values attribute. – GZ0, 5 hours ago
@PaulPanzer Your solution is indeed a lot more performant. df.assign makes a copy of the original dataframe and then adds columns one by one. The df.copy() call actually takes a lot more time than adding the columns, for some unknown reason. IMO there are two things that could be improved in your solution though: (1) in pandas version >= 0.24.0, df.to_numpy() is recommended in favor of df.values; (2) the index of the original data frame should be preserved by passing index=df.index into the constructor. – GZ0, 2 hours ago
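The `**` unpacking discussed in these comments may be easier to follow when spelled out. The sketch below reuses the two sample numbers from the question and the `df.assign` variant suggested by GZ0; the variable names `new_columns` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Number': [654321, 223344]})

# One array per decimal position, most significant digit first.
digits = np.unravel_index(df['Number'].to_numpy(), (10,) * 6)

# Build {'x1': array, ..., 'x6': array}. The ** operator passes each pair as a
# keyword argument, so this call behaves like df.assign(x1=digits[0], x2=digits[1], ...).
new_columns = dict(zip((f'x{i}' for i in range(1, 7)), digits))
result = df.assign(**new_columns)
print(result)
```

Note that `assign` returns a new data frame and leaves `df` itself unchanged.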
Assuming that all numbers have the same number of digits, I would do it the following way using numpy:

    import numpy as np

    a = np.array([[654321], [223344]])
    str_a = a.astype(str)
    # split the single string in each row into a list of its characters
    out = np.apply_along_axis(lambda x: list(x[0]), 1, str_a)
    print(out)

Output:

    [['6' '5' '4' '3' '2' '1']
     ['2' '2' '3' '3' '4' '4']]

Note that out is currently an np.array of strs; you can convert it to int if the need arises.
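The conversion to int mentioned at the end could look like the following. This is a sketch that repeats the answer's code and adds one `astype(int)` call; the name `out_int` is made up for illustration.

```python
import numpy as np

a = np.array([[654321], [223344]])
str_a = a.astype(str)
out = np.apply_along_axis(lambda x: list(x[0]), 1, str_a)

# Every element is a one-character string such as '6'; astype(int) parses each one.
out_int = out.astype(int)
print(out_int)
```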
I really liked @user3483203's answer. I think .str.findall could work with any number of digits:

    df = pd.DataFrame({
        'Number': [65432178888, 22334474343]
    })
    u = df['Number'].astype(str).str.findall(r'(\w)')
    df.join(pd.DataFrame(list(u)).rename(columns=lambda c: f'x{c+1}')).apply(pd.to_numeric)

            Number  x1  x2  x3  x4  x5  x6  x7  x8  x9  x10  x11
    0  65432178888   6   5   4   3   2   1   7   8   8    8    8
    1  22334474343   2   2   3   3   4   4   7   4   3    4    3
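One thing that may be worth knowing about this approach is what happens when the numbers differ in length. The sketch below uses a made-up second value (123) to show it: shorter numbers are left-aligned, so `x1` is always the most significant digit and the missing trailing positions come out as NaN.

```python
import pandas as pd

df = pd.DataFrame({'Number': [654321, 123]})  # 123 is an illustrative short value

u = df['Number'].astype(str).str.findall(r'(\w)')
result = (
    df.join(pd.DataFrame(list(u)).rename(columns=lambda c: f'x{c + 1}'))
      .apply(pd.to_numeric)
)
print(result)
```

If right-aligned digits are needed instead, zero-padding the strings first (as another answer does with zfill) is one option.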
A simple way around:

    >>> df
       number
    0  123456
    1  456789
    2  135797

First convert the column to string:

    >>> df['number'] = df['number'].astype(str)

Create the new columns using string indexing:

    >>> df['x1'] = df['number'].str[0]
    >>> df['x2'] = df['number'].str[1]
    >>> df['x3'] = df['number'].str[2]
    >>> df['x4'] = df['number'].str[3]
    >>> df['x5'] = df['number'].str[4]
    >>> df['x6'] = df['number'].str[5]
    >>> df
       number x1 x2 x3 x4 x5 x6
    0  123456  1  2  3  4  5  6
    1  456789  4  5  6  7  8  9
    2  135797  1  3  5  7  9  7
    >>> df.drop('number', axis=1, inplace=True)
    >>> df
      x1 x2 x3 x4 x5 x6
    0  1  2  3  4  5  6
    1  4  5  6  7  8  9
    2  1  3  5  7  9  7

Another trick, with str.split() (applied to the string column, before it is dropped):

    >>> df = df['number'].str.split(r'(\d{1})', expand=True).add_prefix('x').drop(columns=['x0', 'x2', 'x4', 'x6', 'x8', 'x10', 'x12'])
    >>> df
      x1 x3 x5 x7 x9 x11
    0  1  2  3  4  5   6
    1  4  5  6  7  8   9
    2  1  3  5  7  9   7
    >>> df.rename(columns={'x3':'x2', 'x5':'x3', 'x7':'x4', 'x9':'x5', 'x11':'x6'})
      x1 x2 x3 x4 x5 x6
    0  1  2  3  4  5  6
    1  4  5  6  7  8  9
    2  1  3  5  7  9  7

OR

    >>> df = df['number'].str.split(r'(\d{1})', expand=True).T.replace('', np.nan).dropna().T
    >>> df
      1  3  5  7  9  11
    0  1  2  3  4  5  6
    1  4  5  6  7  8  9
    2  1  3  5  7  9  7
    >>> df.rename(columns={1:'x1', 3:'x2', 5:'x3', 7:'x4', 9:'x5', 11:'x6'})
      x1 x2 x3 x4 x5 x6
    0  1  2  3  4  5  6
    1  4  5  6  7  8  9
    2  1  3  5  7  9  7
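The six near-identical string-indexing assignments can also be written as a loop. This is just a sketch of that variation, assuming every number has exactly six digits; unlike the answer, it keeps the `number` column numeric and converts each digit back to int.

```python
import pandas as pd

df = pd.DataFrame({'number': [123456, 456789, 135797]})

# Work on a string copy so the original column keeps its integer dtype.
s = df['number'].astype(str)
for i in range(6):
    # .str[i] picks the i-th character of every string in the column
    df[f'x{i + 1}'] = s.str[i].astype(int)
print(df)
```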
MCVE

Here is a simple suggestion:

    import pandas as pd

    # MCVE dataframe:
    df = pd.DataFrame([123456, 456789, 135797, 123, 123456789], columns=['number'])

    def digit(x, n):
        """Return the n-th digit of an integer in base 10 (n = 0 is the least significant digit)"""
        return (x // 10**n) % 10

    def digitize(df, key, n):
        """Extract the n least significant digits from an integer in base 10"""
        for i in range(n):
            df['x%d' % i] = digit(df[key], n-i-1)

    # Apply function on dataframe (inplace):
    digitize(df, 'number', 6)

For the trial dataframe, it returns:

          number  x0  x1  x2  x3  x4  x5
    0     123456   1   2   3   4   5   6
    1     456789   4   5   6   7   8   9
    2     135797   1   3   5   7   9   7
    3        123   0   0   0   1   2   3
    4  123456789   4   5   6   7   8   9

Observations

This method avoids the need to cast to string and then cast back to int.

It relies on modular integer arithmetic; the operations in detail:

    10**3                  # int: 1000 (integer power)
    54321 // 10**3         # int: 54 (quotient of integer division)
    (54321 // 10**3) % 10  # int: 4 (remainder of integer division, modulo)

Last but not least, it is fail-safe and exact for numbers with fewer or more than n digits (note that it returns the n least significant digits in the latter case).
answered 8 hours ago by jlandercy (edited 7 hours ago)
Get rid of apply; you can simply do digit(df['Number'], i). – Quang Hoang, 8 hours ago
@QuangHoang Thank you for pointing this out. Is there any (performance) benefit besides code compactness and readability? – jlandercy, 8 hours ago
Without apply, it's vectorized, so you would see a big improvement in terms of speed. – Quang Hoang, 8 hours ago
@QuangHoang Updated, thank you. – jlandercy, 8 hours ago
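To illustrate the point made in these comments, here is a small sketch comparing the two ways of calling `digit`. It reuses the answer's function and sample data; the names `via_apply` and `vectorized` are only for illustration, and no timing is claimed here.

```python
import pandas as pd


def digit(x, n):
    """Return the n-th base-10 digit, counting from 0 at the least significant end."""
    return (x // 10**n) % 10


df = pd.DataFrame({'number': [123456, 456789, 135797, 123, 123456789]})

# Row by row: digit() is called once per element through Series.apply.
via_apply = df['number'].apply(lambda v: digit(v, 2))

# Vectorized: the same arithmetic is applied to the whole column in one call.
vectorized = digit(df['number'], 2)

print(vectorized.tolist())
```

Both calls produce the same values; the vectorized form simply avoids the per-element Python function call.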
Some fun with views, assuming that each number has 6 digits:

    # view each 6-character string as six 1-character strings, then parse them as ints
    u = df[['Number']].to_numpy().astype('U6').view('U1').astype(int)
    df.join(pd.DataFrame(u).rename(columns=lambda c: f'x{c+1}'))

       Number  x1  x2  x3  x4  x5  x6
    0  654321   6   5   4   3   2   1
    1  223344   2   2   3   3   4   4
answered 8 hours ago by user3483203
Impressive one-liner, although it breaks if the numbers have different numbers of digits. – jdehesa, 8 hours ago
Yeah, that assumption has to be made; it's definitely more of a trick than something to use. – user3483203, 8 hours ago
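Because the view trick relies on every value having exactly six digits, it may be worth checking that assumption before using it. The following is a sketch of one possible check, not something from the answer; the third sample value (123) and the name `six_digits` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Number': [654321, 223344, 123]})

# True only where the value has exactly six decimal digits (bounds are inclusive).
six_digits = df['Number'].between(100_000, 999_999)
print(six_digits.tolist())

if not six_digits.all():
    print("Some values do not have six digits; the view trick would not apply as-is.")
```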
Turn it into a string first!

Also, I included a zfill just in case not all numbers have 6 digits:

    dat = [list(map(int, str(x).zfill(6))) for x in df.Number]
    d = pd.DataFrame(dat, df.index).rename(columns=lambda x: f'x{x + 1}')
    df.join(d)

       Number  x1  x2  x3  x4  x5  x6
    0  654321   6   5   4   3   2   1
    1  223344   2   2   3   3   4   4

Details

This gets the digits:

    dat = [list(map(int, str(x).zfill(6))) for x in df.Number]
    dat
    [[6, 5, 4, 3, 2, 1], [2, 2, 3, 3, 4, 4]]

This creates a new dataframe with the same index as df AND renames the columns to have an 'x' in front and to begin with 'x1' rather than 'x0':

    d = pd.DataFrame(dat, df.index).rename(columns=lambda x: f'x{x + 1}')
    d
       x1  x2  x3  x4  x5  x6
    0   6   5   4   3   2   1
    1   2   2   3   3   4   4
answered 8 hours ago by piRSquared
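A short sketch of what zfill does here may help, using a few made-up values: it left-pads shorter numbers with zeros, but it does not shorten numbers that already have more than six digits.

```python
# Illustrative values: exactly six digits, fewer than six, and more than six.
padded = [str(n).zfill(6) for n in (654321, 123, 1234567)]
print(padded)

# One list of int digits per number; the last list has seven entries, not six.
digits = [list(map(int, p)) for p in padded]
print(digits)
```

So the zfill approach covers short numbers, while longer ones would still need separate handling.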
While string-based solutions are simpler and probably good enough in most cases, you can do this with math, which can make a significant difference in speed if you have a big data set.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'Number': [654321, 223344]})
    # number of decimal digits in the largest value
    num_cols = int(np.log10(df['Number'].max())) + 1
    # divide every number by each power of ten (via broadcasting) and keep the last digit
    vals = (df['Number'].values[:, np.newaxis] // (10 ** np.arange(num_cols - 1, -1, -1))) % 10
    df_digits = pd.DataFrame(vals, columns=[f'x{i + 1}' for i in range(num_cols)])
    df2 = pd.concat([df, df_digits], axis=1)
    print(df2)
    #    Number  x1  x2  x3  x4  x5  x6
    # 0  654321   6   5   4   3   2   1
    # 1  223344   2   2   3   3   4   4
answered 8 hours ago by jdehesa
I definitely like this approach. I'm trying to make this prettier (-: – piRSquared, 8 hours ago
vals = (df.to_numpy() // 10 ** np.arange(6) % 10)[:, ::-1] Obviously, assumptions have to be made. I basically made some golf improvements at the expense of generalization. – piRSquared, 6 hours ago
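The broadcasting step in this answer, and the shorter variant from the comment, can be compared side by side. The sketch below uses a plain NumPy array with the question's two sample numbers and assumes six digits per number; the names `powers`, `vals` and `vals_golf` are only illustrative.

```python
import numpy as np

numbers = np.array([654321, 223344])

# A (2, 1) column of numbers divided by a (6,) row of powers broadcasts to shape (2, 6).
powers = 10 ** np.arange(5, -1, -1)              # 100000, 10000, ..., 1
vals = (numbers[:, np.newaxis] // powers) % 10

# Variant from the comment: ascending powers of ten, then reverse the column order.
# ** binds tighter than // and %, which are evaluated left to right.
vals_golf = (numbers[:, np.newaxis] // 10 ** np.arange(6) % 10)[:, ::-1]

print(vals)
print(np.array_equal(vals, vals_golf))
```

Both expressions give the same digit matrix; the second simply hard-codes the digit count.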
I definitely like this approach. I'm trying to make this prettier (-:
– piRSquared
8 hours ago
1
vals = (df.to_numpy() // 10 ** np.arange(6) % 10)[:, ::-1]
Obviously, assumptions have to be made. I basically made some golf improvements at the expense of generalization.
– piRSquared
6 hours ago
I definitely like this approach. I'm trying to make this prettier (-:
– piRSquared
8 hours ago
I definitely like this approach. I'm trying to make this prettier (-:
– piRSquared
8 hours ago
1
1
vals = (df.to_numpy() // 10 ** np.arange(6) % 10)[:, ::-1]
Obviously, assumptions have to be made. I basically made some golf improvements at the expense of generalization.– piRSquared
6 hours ago
vals = (df.to_numpy() // 10 ** np.arange(6) % 10)[:, ::-1]
Obviously, assumptions have to be made. I basically made some golf improvements at the expense of generalization.– piRSquared
6 hours ago
add a comment |
You could use np.unravel_index:
import numpy as np
import pandas as pd
from timeit import timeit
df = pd.DataFrame({'Number': [654321, 223344]})
def split_digits(df):
    # get data as numpy array
    numbers = df['Number'].to_numpy()
    # extract digits
    digits = np.unravel_index(numbers, 6*(10,))
    # create column headers
    columns = ['Number', *(f'x{i}' for i in "123456")]
    # build and return new data frame
    return pd.DataFrame(np.stack([numbers, *digits], axis=1), columns=columns, index=df.index)
split_digits(df)
#    Number  x1  x2  x3  x4  x5  x6
# 0  654321   6   5   4   3   2   1
# 1  223344   2   2   3   3   4   4
timeit(lambda:split_digits(df),number=1000)
# 0.3550272472202778
Thanks @GZ0 for some pandas tips.
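As a rough illustration of why this works: np.unravel_index converts a flat (row-major) index into coordinates for an array of the given shape. With a shape made only of tens, those coordinates are the decimal digits of the index. A minimal sketch with three-digit values, which are chosen here purely for illustration:

```python
import numpy as np

# Treat 472 as a flat (row-major) index into a 10 x 10 x 10 array.
# Its coordinates satisfy 472 == 4*100 + 7*10 + 2, i.e. they are the digits.
digits = np.unravel_index(472, (10, 10, 10))
print([int(d) for d in digits])  # [4, 7, 2]

# Numbers with fewer digits come back zero-padded on the left.
padded = np.unravel_index(np.array([7, 42]), (10, 10, 10))
print(np.stack(padded, axis=1).tolist())  # [[0, 0, 7], [0, 4, 2]]
```

A value with more digits than the shape allows is out of bounds and raises a ValueError, so the shape has to cover the largest number in the column.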
edited 1 hour ago
answered 6 hours ago
Paul Panzer
35.1k · 2 gold badges · 23 silver badges · 53 bronze badges
This is an excellent trick and a one-liner, @Paul, +1. What does ** do in assign? Would you mind explaining the code?
– Karn Kumar
6 hours ago
@KarnKumar ** "unrolls" the dictionary, so each key-value pair becomes a keyword argument to the function (assign in this case). Btw., I don't know much about pandas, so this part of the code may be far from optimal.
– Paul Panzer
5 hours ago
@KarnKumar I've made an annotated version in case you are interested.
– Paul Panzer
5 hours ago
One alternative way to return a new data frame using digits is df.assign(**dict(zip((f'x{i}' for i in range(1,7)), digits))). Also, df['Number'] can be used as a numpy array directly without explicitly accessing the .values attribute.
– GZ0
5 hours ago
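The `**` unpacking discussed in these comments can be sketched as follows. This assumes the same six-digit sample data as the answer; the intermediate name `new_cols` is introduced here only for readability.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Number': [654321, 223344]})
digits = np.unravel_index(df['Number'].to_numpy(), 6 * (10,))

# Map each new column name to its digit array: {'x1': ..., ..., 'x6': ...}
new_cols = dict(zip((f'x{i}' for i in range(1, 7)), digits))

# ** unpacks the dict into keyword arguments, so this call is equivalent to
# df.assign(x1=digits[0], x2=digits[1], ..., x6=digits[5]).
df2 = df.assign(**new_cols)
print(df2)
```

`assign` returns a new frame and leaves `df` itself unchanged.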
@PaulPanzer Your solution is indeed a lot more performant. df.assign makes a copy of the original dataframe and then adds columns one by one. The df.copy() call actually takes a lot more time than adding columns, for some unknown reason. IMO there are two things that could be improved in your solution though: (1) in pandas version >= 0.24.0, df.to_numpy() is recommended in favor of df.values; (2) the index of the original data frame should be preserved by passing index=df.index into the constructor function.
– GZ0
2 hours ago
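Point (2) matters whenever the input frame does not use the default 0..n-1 index. A small sketch, where the index labels 'a' and 'b' and the names `lost` and `kept` are made up for illustration:

```python
import numpy as np
import pandas as pd

# A frame whose index is not the default 0..n-1.
df = pd.DataFrame({'Number': [654321, 223344]}, index=['a', 'b'])
values = df['Number'].to_numpy()

# Without index=..., the constructor falls back to a fresh RangeIndex.
lost = pd.DataFrame(values[:, np.newaxis] % 10, columns=['x6'])
# Passing index=df.index keeps the rows aligned with the original frame.
kept = pd.DataFrame(values[:, np.newaxis] % 10, columns=['x6'], index=df.index)

print(list(lost.index))  # [0, 1]
print(list(kept.index))  # ['a', 'b']
```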
Assuming that all numbers are of the same length (have an equal number of digits), I would do it the following way using numpy:
import numpy as np
a = np.array([[654321],[223344]])
str_a = a.astype(str)
out = np.apply_along_axis(lambda x:list(x[0]),1,str_a)
print(out)
Output:
[['6' '5' '4' '3' '2' '1']
 ['2' '2' '3' '3' '4' '4']]
Note that out is currently an np.array of strs; you might convert it to int if such a need arises.
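The conversion to integers mentioned above could look like this. It is a sketch that repeats the answer's steps and adds one `astype` call; the name `out_int` is introduced here for illustration.

```python
import numpy as np

a = np.array([[654321], [223344]])
str_a = a.astype(str)
out = np.apply_along_axis(lambda x: list(x[0]), 1, str_a)

# Each element is a one-character string such as '6'; astype(int) parses them.
out_int = out.astype(int)
print(out_int)
```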
answered 8 hours ago
Daweo
2,065 · 1 gold badge · 2 silver badges · 6 bronze badges
I really liked @user3483203's answer. I think .str.findall could work with any number of digits:
df = pd.DataFrame({
    'Number' : [65432178888, 22334474343]
})
u = df['Number'].astype(str).str.findall(r'(\w)')
df.join(pd.DataFrame(list(u)).rename(columns=lambda c: f'x{c+1}')).apply(pd.to_numeric)
        Number  x1  x2  x3  x4  x5  x6  x7  x8  x9  x10  x11
0  65432178888   6   5   4   3   2   1   7   8   8    8    8
1  22334474343   2   2   3   3   4   4   7   4   3    4    3
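As a rough check of the "any number of digits" claim, the sketch below runs the same idea on numbers of different lengths. The sample values are invented, and `r'\d'` is used instead of `r'(\w)'` so that only digits are matched; shorter numbers end up with missing values in the trailing columns.

```python
import pandas as pd

# Numbers of different lengths (values chosen for illustration).
df = pd.DataFrame({'Number': [123, 45678]})

# One list of single-digit strings per row.
u = df['Number'].astype(str).str.findall(r'\d')

# Building a frame from ragged lists pads the shorter rows with missing values.
digits = pd.DataFrame(list(u), index=df.index).rename(columns=lambda c: f'x{c + 1}')
digits = digits.apply(pd.to_numeric)
print(df.join(digits))
```

Columns that contain a missing value are converted to floats, so an extra cast may be wanted if integer columns are required.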
edited 8 hours ago
answered 8 hours ago
political scientist
1,812 · 1 gold badge · 8 silver badges · 18 bronze badges
Simple way around:
>>> df
   number
0  123456
1  456789
2  135797
First convert the column into string
>>> df['number'] = df['number'].astype(str)
Create the new columns using string indexing
>>> df['x1'] = df['number'].str[0]
>>> df['x2'] = df['number'].str[1]
>>> df['x3'] = df['number'].str[2]
>>> df['x4'] = df['number'].str[3]
>>> df['x5'] = df['number'].str[4]
>>> df['x6'] = df['number'].str[5]
>>> df
   number x1 x2 x3 x4 x5 x6
0  123456  1  2  3  4  5  6
1  456789  4  5  6  7  8  9
2  135797  1  3  5  7  9  7
>>> df.drop('number', axis=1, inplace=True)
>>> df
  x1 x2 x3 x4 x5 x6
0  1  2  3  4  5  6
1  4  5  6  7  8  9
2  1  3  5  7  9  7
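The six near-identical assignments can also be written as a loop. This is only a sketch, assuming every number has exactly six digits; the variable names `as_text` and `pos` are illustrative, and the digits are cast back to integers along the way.

```python
import pandas as pd

df = pd.DataFrame({'number': [123456, 456789, 135797]})
as_text = df['number'].astype(str)

# One new column per character position; assumes every number has six digits.
for pos in range(6):
    df[f'x{pos + 1}'] = as_text.str[pos].astype(int)

digits_only = df.drop(columns='number')
print(digits_only)
```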
Another trick with str.split(), starting again from the string-typed number column:
>>> df = df['number'].str.split(r'(\d{1})', expand=True).add_prefix('x').drop(columns=['x0', 'x2', 'x4', 'x6', 'x8', 'x10', 'x12'])
>>> df
  x1 x3 x5 x7 x9 x11
0  1  2  3  4  5   6
1  4  5  6  7  8   9
2  1  3  5  7  9   7
>>> df.rename(columns={'x3':'x2', 'x5':'x3', 'x7':'x4', 'x9':'x5', 'x11':'x6'})
  x1 x2 x3 x4 x5 x6
0  1  2  3  4  5  6
1  4  5  6  7  8  9
2  1  3  5  7  9  7
OR
>>> df = df['number'].str.split(r'(\d{1})', expand=True).T.replace('', np.nan).dropna().T
>>> df
   1  3  5  7  9 11
0  1  2  3  4  5  6
1  4  5  6  7  8  9
2  1  3  5  7  9  7
>>> df.rename(columns={1:'x1', 3:'x2', 5:'x3', 7:'x4', 9:'x5', 11:'x6'})
  x1 x2 x3 x4 x5 x6
0  1  2  3  4  5  6
1  4  5  6  7  8  9
2  1  3  5  7  9  7
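The odd-numbered columns come from how splitting on a capturing group works. The sketch below uses Python's `re.split` on a single string, on the assumption that the pandas regex split behaves the same way for this pattern.

```python
import re

# A capturing group makes re.split keep the separators (the digits) in the
# result; the text between consecutive digits is empty, hence the '' entries.
parts = re.split(r'(\d{1})', '123456')
print(parts)
# ['', '1', '', '2', '', '3', '', '4', '', '5', '', '6', '']

# The digits sit at the odd positions 1, 3, ..., 11.
print(parts[1::2])
# ['1', '2', '3', '4', '5', '6']
```

That is why the even-numbered columns x0 to x12 hold only empty strings and have to be dropped or filtered out.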
edited 6 hours ago
answered 8 hours ago
Karn Kumar
3,708 · 1 gold badge · 7 silver badges · 22 bronze badges
If you don't have to use numpy or pandas -
for num in str(my_number): print(num)
– wcarhart
9 hours ago
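For a single value, the plain-Python route from this comment can collect the digits as integers instead of printing them. In this minimal sketch, `my_number` is a placeholder value.

```python
# my_number is a placeholder value, as in the comment above.
my_number = 654321

# Iterating over the decimal string yields one character per digit.
digits = [int(ch) for ch in str(my_number)]
print(digits)  # [6, 5, 4, 3, 2, 1]
```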
What is the source of your data? Is a numpy.array or pandas.DataFrame delivered to you, or are you getting just text with numbers separated by newlines?
– Daweo
8 hours ago