Small story: Splitting the alphabet into chunks of arbitrary size and outputting them (solved)

Overview

When I research and study various things related to pandas data frames, I often need some placeholder data just to create a data frame for the time being. Integers are easy to generate in any quantity, but I couldn't think of a way to do the same for the alphabet, so I had been entering the letters by hand every time.

There aren't many letters, so it doesn't take long, but it felt wrong to be doing menial work by hand while studying in order to automate menial work.

After a little research, I was able to solve it with the string module in the standard library, so I'm posting this as a memo to myself and to keep my writing in practice.

string module

According to this site, many of the functions that were once implemented in the string module have since been moved to str and unicode objects as methods. That explains why I hadn't seen the module much.

> The string module dates back to the earliest versions of Python. In version 2.0, many functions previously implemented only in this module were moved to methods of str and unicode objects.

This time, we will use the constant ascii_lowercase provided in this string module.

string.ascii_lowercase is a string containing the lowercase letters a to z. Super simple. To use it, you need to import string first.

python


import string

print(string.ascii_lowercase)
print(type(string.ascii_lowercase))

output


abcdefghijklmnopqrstuvwxyz
<class 'str'>
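For reference, ascii_lowercase has a few sibling constants in the same module. The following is a small sketch that is not part of the original experiment; it only uses constants documented in the standard library.

```python
import string

# Sibling constants of ascii_lowercase in the standard string module.
print(string.ascii_uppercase)  # the uppercase letters A to Z
print(string.digits)           # the digits 0 to 9

# ascii_letters is the lowercase and uppercase constants concatenated.
print(string.ascii_letters == string.ascii_lowercase + string.ascii_uppercase)

# There are 26 lowercase letters, which is the limit that matters later.
print(len(string.ascii_lowercase))
```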

Since it is a string, it also supports indexing and slicing.

python


print(string.ascii_lowercase[2])
print(string.ascii_lowercase[8:11])
print(string.ascii_lowercase[::-1])

output


c
ijk
zyxwvutsrqponmlkjihgfedcba
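One property of slicing is worth keeping in mind for what follows: a slice that reaches past the end of the string does not raise an error, whereas a single out-of-range index does. A minimal sketch (the variable names are only for illustration):

```python
import string

letters = string.ascii_lowercase

# A slice that runs past the end is simply cut short.
tail = letters[24:30]
print(repr(tail))

# A slice that starts past the end is an empty string.
empty = letters[30:36]
print(repr(empty))

# A single out-of-range index, by contrast, raises IndexError.
try:
    letters[30]
    raised = False
except IndexError:
    raised = True
print(raised)
```

So asking for more letters than the alphabet has through slicing yields shorter or empty strings rather than an exception.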

This time, I will use this to create a suitable placeholder data frame.

Data frame creation

After that practice, I wrote a function that fills a data frame with letters according to the given index_size and columns_size. I'm not good at this kind of index arithmetic, so it's a pretty messy function.

python


import pandas as pd
import string
from tabulate import tabulate

def make_variable_alphabet_dataframe(index_size, columns_size):
    # Row and column labels: index1..indexN and column1..columnM
    index_list = [f"index{i}" for i in range(1, index_size+1)]
    columns_list = [f"column{i}" for i in range(1, columns_size+1)]
    # i is the position of the last letter in each row, so each slice holds columns_size letters
    alphabet_list = [list(string.ascii_lowercase[i-(columns_size-1):i+1]) for i in range(columns_size-1, index_size*columns_size, columns_size)]
    df = pd.DataFrame(alphabet_list, index=index_list, columns=columns_list)

    return df


df1 = make_variable_alphabet_dataframe(3, 4)
print(tabulate(df1, df1.columns, tablefmt='github', showindex=True))

output


|        | column1   | column2   | column3   | column4   |
|--------|-----------|-----------|-----------|-----------|
| index1 | a         | b         | c         | d         |
| index2 | e         | f         | g         | h         |
| index3 | i         | j         | k         | l         |
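The chunking step can also be written by looping over the row number directly instead of the position of each row's last letter. The following is only a sketch of an equivalent formulation; chunk_alphabet is a hypothetical helper name, not something from pandas or the standard library.

```python
import string


def chunk_alphabet(row_count, column_count):
    """Hypothetical helper: split a-z into row_count rows of column_count letters."""
    letters = string.ascii_lowercase
    return [
        # Row r covers positions r*column_count up to (r+1)*column_count.
        list(letters[row * column_count:(row + 1) * column_count])
        for row in range(row_count)
    ]


rows = chunk_alphabet(3, 4)
for row in rows:
    print(row)
```

The resulting list of lists can be passed to pd.DataFrame in the same way as alphabet_list above.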

It seems that None is stored in the leftover cells when the total number of cells exceeds 26.

python


df2 = make_variable_alphabet_dataframe(6, 6)
print(tabulate(df2, df2.columns, tablefmt='github', showindex=True))

output


|        | column1   | column2   | column3   | column4   | column5   | column6   |
|--------|-----------|-----------|-----------|-----------|-----------|-----------|
| index1 | a         | b         | c         | d         | e         | f         |
| index2 | g         | h         | i         | j         | k         | l         |
| index3 | m         | n         | o         | p         | q         | r         |
| index4 | s         | t         | u         | v         | w         | x         |
| index5 | y         | z         |           |           |           |           |
| index6 |           |           |           |           |           |           |

To be honest, **I don't know why**. I ran the sample intending to write "of course, running out of letters raises an error", but it worked. This may be the rumored "it works for some reason". Is it a property of lists, or the specification of DataFrame itself? I tried to look it up, but I couldn't figure it out in the end. It nags at me like something stuck between my back teeth, so I would appreciate it if someone could explain.

Postscript (2020/10/04)

Information was provided in the comments. It seems that None is automatically stored in the data frame as long as the first row has enough elements. So, to be precise, I think the rule is "**if the number of columns is at most 26, so that the first row can be filled, no error is thrown**". Please refer to the comments for details, including the sample program.
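The behavior can be reproduced on a small scale without the alphabet at all. This is my own minimal sketch, not the sample program from the comments; it assumes a reasonably recent pandas, and the data and column names are made up for illustration.

```python
import pandas as pd

# Ragged rows: the first row has 3 elements, the later rows have fewer.
ragged_rows = [["a", "b", "c"], ["d"], []]
columns = ["column1", "column2", "column3"]

df = pd.DataFrame(ragged_rows, columns=columns)
# The short rows are padded with missing values instead of raising.
print(df.isna())

# If no row is long enough for the requested columns, pandas raises instead.
try:
    pd.DataFrame([["a", "b"]], columns=columns)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

In the function above, the first row is always the longest one, so it alone decides whether the column labels fit.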

That's all. Thank you for reading. Next time, I will talk about gspread_formatting, which for some reason seems to have little information available online, Qiita included.
