I write scraped data to a spreadsheet every day via gspread. The sheet name is the date of the data to be written (yyyy/mm/dd) If it continues to operate normally, the sheets should be arranged in chronological order, but in rare cases the dates may be out of alignment due to the effects of writing failures.
Such a shape.
I had to sort them manually every time I found them, but it's a hassle ... ↓ Yes, let's automate.
import gspread
from oauth2client.service_account import ServiceAccountCredentials
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
import os
import sys
# GoogleAPI
scope = ['https://spreadsheets.google.com/feeds',
'https://www.googleapis.com/auth/drive']
#Authentication key
json_keyfile_path = f'credentials.json path'
#Read the service account key
credentials = ServiceAccountCredentials.from_json_keyfile_name(json_keyfile_path, scope)
#Perform OAuth authentication for pydrive
gauth = GoogleAuth()
gauth.credentials = credentials
drive = GoogleDrive(gauth)
#Spreadsheet storage folder
folder_id = 'Folder ID that stores the workbook'
#Get a list of files in the spreadsheet storage folder
file_list = drive.ListFile({'q': "'%s' in parents and trashed=false" % folder_id}).GetList()
#Extract only the file name from the file list
title_list = [file['title'] for file in file_list]
#Authenticate for gspread
gc = gspread.authorize(credentials)
#Workbook name
book_title = 'Test book'
#Get the spreadsheet ID of the workbook you want to open
sheet_id = [file['id'] for file in file_list if file['title'] == book_title]
sheet_id = sheet_id[0]
#Open workbook
workbook = gc.open_by_key(sheet_id)
#Get all the information on existing worksheets
worksheets = workbook.worksheets()
#Store the title of the current worksheet in the list
tmp_worksheets_title_list = [worksheet.title for worksheet in worksheets]
#Sort titles in descending order
worksheets_title_list = sorted(tmp_worksheets_title_list, reverse=True)
#Store Worksheet type objects in a list in sorted order
worksheets_obj_list = [worksheet for title in worksheets_title_list for worksheet in worksheets if worksheet.title == title]
#Sort worksheets
workbook.reorder_worksheets(worksheets_obj_list)
First of all, I asked Google Sensei a lot, but I can't find anyone doing similar work. Apparently I have to do something myself.
Next, I searched for various official API references. Then, I found those things.
I will read it for the time being.
reorder_worksheets(worksheets_in_desired_order)
Updates the index property of each Worksheets to reflect its index in the provided sequence of Worksheets.
Parameters: worksheets_in_desired_order – Iterable of Worksheet objects in desired order.
Note: If you omit some of the Spreadsheet’s existing Worksheet objects from the provided sequence, those Worksheets will be appended to the end of the sequence in the order that they appear in the list returned by Spreadsheet.worksheets().
New in version 3.4.
Japanese translation
Update the index properties of each worksheet to reflect that index in the set of worksheets provided.
Parameters: worksheets_in_desired_order – You can iterate worksheet objects in any order.
Note: If you omit some of the existing worksheet objects in your spreadsheet from the sequence provided, those worksheets will be Spreadsheet..They are added to the end of the sequence in the order they appear in the list returned by worksheets ().
Version 3.4 new features.
I see, I don't know.
I decided to read the source directly.
def reorder_worksheets(self, worksheets_in_desired_order):
"""Updates the ``index`` property of each Worksheets to reflect
its index in the provided sequence of Worksheets.
:param worksheets_in_desired_order: Iterable of Worksheet objects in desired order.
Note: If you omit some of the Spreadsheet's existing Worksheet objects from
the provided sequence, those Worksheets will be appended to the end of the sequence
in the order that they appear in the list returned by ``Spreadsheet.worksheets()``.
.. versionadded:: 3.4
"""
idx_map = {}
for idx, w in enumerate(worksheets_in_desired_order):
idx_map[w.id] = idx
for w in self.worksheets():
if w.id in idx_map:
continue
idx += 1
idx_map[w.id] = idx
body = {
'requests': [
{
'updateSheetProperties': {
'properties': {'sheetId': key, 'index': val},
'fields': 'index',
}
}
for key, val in idx_map.items()
]
}
return self.batch_update(body)
I see, it seems that w.id is used to get the worksheet ID and sort based on it.
#Open workbook
workbook = gc.open_by_key(sheet_id)
#Get all the information on existing worksheets
worksheets = workbook.worksheets()
print(worksheets)
print(type(worksheets))
print(type(worksheets[0]))
Execution result
[<Worksheet '2020/12/10' id:0>, <Worksheet '2020/12/11' id:750827161>, <Worksheet '2020/12/12' id:868021966>, <Worksheet '2020/12/13' id:1358131870>, <Worksheet '2020/12/14' id:224556508>, <Worksheet '2020/12/16' id:2001082452>, <Worksheet '2020/12/17' id:207127532>, <Worksheet '2020/12/15' id:653171131>, <Worksheet '2020/12/18' id:1969672638>, <Worksheet '2020/12/19' id:428756371>]
<class 'list'>
<class 'gspread.models.Worksheet'>
In this way, worksheet information is stored in the list as Worksheet type. The sheet name and ID are stored, and this ID was acquired by the library earlier.
Pass the list that stores this Worksheet information to the library as an argument. Then, ** change the order in this list to the order you want to sort, and then ** pass it, and it will be sorted in that order.
#Store the title of the current worksheet in the list
tmp_worksheets_title_list = [worksheet.title for worksheet in worksheets]
print(tmp_worksheets_title_list)
#Sort titles in descending order
worksheets_title_list = sorted(tmp_worksheets_title_list, reverse=True)
print(worksheets_title_list)
#Store Worksheet type objects in a list in sorted order
worksheets_obj_list = [worksheet for title in worksheets_title_list for worksheet in worksheets if worksheet.title == title]
print(worksheets_obj_list)
Execution result
['2020/12/10', '2020/12/11', '2020/12/12', '2020/12/13', '2020/12/14', '2020/12/16', '2020/12/17', '2020/12/15', '2020/12/18', '2020/12/19']
['2020/12/19', '2020/12/18', '2020/12/17', '2020/12/16', '2020/12/15', '2020/12/14', '2020/12/13', '2020/12/12', '2020/12/11', '2020/12/10']
[<Worksheet '2020/12/19' id:428756371>, <Worksheet '2020/12/18' id:1969672638>, <Worksheet '2020/12/17' id:207127532>, <Worksheet '2020/12/16' id:2001082452>, <Worksheet '2020/12/15' id:653171131>, <Worksheet '2020/12/14' id:224556508>, <Worksheet '2020/12/13' id:1358131870>, <Worksheet '2020/12/12' id:868021966>, <Worksheet '2020/12/11' id:750827161>, <Worksheet '2020/12/10' id:0>]
Since it is not possible to sort the list that stores Worksheet directly, create a new list using the sheet name as a key.
Now you have a Worksheet information storage list sorted in any order.
#Sort worksheets
workbook.reorder_worksheets(worksheets_obj_list)
Sorting is completed by passing the created list to the library.
Official reference It's hard to understand ... Pien.
Recommended Posts