For the first time, I built an API that returns scraped information as JSON, using Python with Flask and deploying it to Heroku, so I'd like to summarize how I did it.
Setting up the environment up to ***Hello World*** on Heroku is covered in Part 1, "Re: Heroku life - Environment start with Flask from zero and the Hello World ~". The scraping API code built with Selenium and Beautiful Soup that is deployed to Heroku this time, along with its explanation, is in Part 2, "Re: Life in Heroku starting from scratch with Flask ~ Selenium & PhantomJS & Beautifulsoup ~". Please read them together.
In this series, I scrape SlideShare with Selenium and PhantomJS as the example.
***Because everything in one article would have been too long, the program that drives the browser with Selenium and does the scraping is covered in Part 2.***
As before, to deploy to Heroku we first need to write the current package state to a file.
$ pip freeze > requirements.txt
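Before committing, it can help to confirm that the generated requirements.txt actually lists what the app imports. The following is a minimal Python sketch of such a check; the package names in `REQUIRED` (Flask, selenium, beautifulsoup4) are my assumption about this app's dependencies, and the helper names are hypothetical, so adjust them to your own code.

```python
import re

# Hypothetical set of distributions this scraping API depends on (assumption).
REQUIRED = {"flask", "selenium", "beautifulsoup4"}


def listed_packages(text):
    """Return lower-cased package names from requirements.txt-style text."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # The name ends at the first version specifier, extra, marker, or space.
        name = re.split(r"[=<>!~\[;\s]", line, maxsplit=1)[0]
        names.add(name.lower())
    return names


def missing_packages(text, required=REQUIRED):
    """Return the required packages that the requirements text does not list."""
    return sorted(required - listed_packages(text))
```

For example, `missing_packages(open("requirements.txt").read())` returns an empty list when everything is listed. It compares names only, not versions.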
Next, prepare a file that tells Heroku which Python runtime version to use.
$ echo "python-3.5.2" >> runtime.txt
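Note that `>>` appends, so running the `echo` twice leaves a duplicate line in runtime.txt. As an alternative, here is a small Python sketch (the helper names are hypothetical) that overwrites the file and checks that the content has the `python-X.Y.Z` form before writing it.

```python
import re

# runtime.txt holds a single line such as "python-3.5.2".
RUNTIME_PATTERN = re.compile(r"python-\d+\.\d+\.\d+")


def runtime_line(version):
    """Build runtime.txt content for a version string such as '3.5.2'."""
    line = "python-" + version
    if not RUNTIME_PATTERN.fullmatch(line):
        raise ValueError("expected a version like X.Y.Z, got %r" % version)
    return line + "\n"


def write_runtime(path, version):
    """Overwrite (rather than append to) runtime.txt so it holds one line."""
    with open(path, "w") as f:
        f.write(runtime_line(version))
```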
Once that's done, deploy the same way as last time:
$ git init
# Initialize a git repository in the folder
$ git add api.py Procfile requirements.txt runtime.txt
# Add the app files
# .buildpacks is not included yet; it is added later (see below)
$ git commit -m "firstcommit"
# Commit
$ heroku login
# Log in to Heroku with your registered email address and password
$ heroku create slideshare-api
# Create a new Heroku app
# App names must be unique, so choose a different name if you follow along
$ git push heroku master
# Push to your Heroku app
At this stage the app doesn't work on Heroku yet, because PhantomJS isn't available there.
After uploading the app to Heroku, you need to switch it to heroku-buildpack-multi, which lets you use the PhantomJS and Python buildpacks together. List the buildpacks to use in a `.buildpacks` file, change the settings with the commands below, and then push to Heroku. Run all three commands:
$ heroku config:add BUILDPACK_URL=https://github.com/heroku/heroku-buildpack-multi.git
$ heroku buildpacks:set https://github.com/heroku/heroku-buildpack-multi.git
$ heroku config:add LD_LIBRARY_PATH=/usr/local/lib:/usr/lib:/lib:/app/vendor/phantomjs/lib
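If PhantomJS later fails to start, one thing to rule out is whether `LD_LIBRARY_PATH` really contains the PhantomJS library directory set above. A minimal Python sketch, assuming you run it on Heroku (for example from `heroku run python`); the helper names are hypothetical:

```python
import os

# Library directory taken from the `heroku config:add LD_LIBRARY_PATH=...` command.
PHANTOMJS_LIB = "/app/vendor/phantomjs/lib"


def has_lib_dir(value, required=PHANTOMJS_LIB):
    """Return True if `required` is one of the colon-separated entries."""
    entries = [p.rstrip("/") for p in value.split(":") if p]
    return required.rstrip("/") in entries


def check_environment(environ=os.environ):
    """Check the given environment; returns False if the variable is unset."""
    return has_lib_dir(environ.get("LD_LIBRARY_PATH", ""))
```

`check_environment()` returns `False` when the variable is unset or lacks `/app/vendor/phantomjs/lib`; in that case, rerun the `heroku config:add` command above.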
Then create a `.buildpacks` file in the project folder containing these two lines:
https://github.com/heroku/heroku-buildpack-python.git
https://github.com/stomita/heroku-buildpack-phantomjs.git
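The `.buildpacks` file is just those URLs, one per line. If you prefer to generate it rather than type it, a Python sketch like this keeps the same order as above (the function names are hypothetical):

```python
# The two buildpack URLs used in this article, in the same order.
BUILDPACKS = [
    "https://github.com/heroku/heroku-buildpack-python.git",
    "https://github.com/stomita/heroku-buildpack-phantomjs.git",
]


def buildpacks_text(urls=BUILDPACKS):
    """Join the URLs into file content: one URL per line, trailing newline."""
    return "".join(url.strip() + "\n" for url in urls)


def write_buildpacks(path=".buildpacks", urls=BUILDPACKS):
    """Write (overwrite) the .buildpacks file at `path`."""
    with open(path, "w") as f:
        f.write(buildpacks_text(urls))
```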
Once you've done this, push to Heroku!
$ git add .buildpacks
$ git commit -m "add buildpacks"
$ git push heroku master
Reference: this time I followed an approach similar to this one.
This site describes another method, but it didn't work for me.
## Check that it works
It works!!!
If it doesn't work, also check the errors covered in the previous article. Keep in mind that settings you change with commands, such as the buildpack, are not reflected in the app until you push to Heroku again.
①: Is your Heroku app's framework Multipack, and are its buildpacks set to `https://github.com/heroku/heroku-buildpack-multi.git`?
If not, the step that adds buildpack-multi to Heroku (the `heroku config:add BUILDPACK_URL=...` and `heroku buildpacks:set ...` commands) probably didn't take effect. Run those commands again, then change some file so there is something new to commit, and push to Heroku again.
②: Check heroku logs
As I wrote in the previous article, if you get an error, check the logs with:
$ heroku logs
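The log output can be long. As a rough aid, this Python sketch keeps only lines that look like errors; the keyword list is my own guess, not an official list of Heroku or Selenium messages, and the function name is hypothetical, so extend it as needed.

```python
# Hypothetical keywords that tend to appear on error lines (case-insensitive).
KEYWORDS = ("traceback", "error", "exception", "phantomjs")


def interesting_lines(lines, keywords=KEYWORDS):
    """Yield lines that contain any keyword, without trailing newlines."""
    lowered = [k.lower() for k in keywords]
    for line in lines:
        text = line.lower()
        if any(k in text for k in lowered):
            yield line.rstrip("\n")
```

For example, loop over `interesting_lines(sys.stdin)` in a small script and pipe `heroku logs` into it.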
If the error message points at `driver = webdriver.PhantomJS()`, or at the first place where the driver is used to locate an element, the cause is more likely that PhantomJS isn't usable on the Heroku side than a bug in the program. Check the same points as in **①**, or rerun the command that sets the PhantomJS library path (`LD_LIBRARY_PATH`) on Heroku, and then push to Heroku again.
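To tell whether the problem is the PhantomJS binary itself, you can look for it explicitly before creating the driver. This is a sketch under an assumption: that the buildpack puts the binary at `/app/vendor/phantomjs/bin/phantomjs`, which matches the `vendor` directory and `LD_LIBRARY_PATH` in this article but should be confirmed with `heroku run bash`. The function name is hypothetical.

```python
import os
import shutil

# Assumed install location on Heroku (hypothetical; verify with `heroku run bash`).
HEROKU_PHANTOMJS = "/app/vendor/phantomjs/bin/phantomjs"


def find_phantomjs(candidates=(HEROKU_PHANTOMJS,)):
    """Return the first executable candidate, else the phantomjs found on PATH."""
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return shutil.which("phantomjs")  # None if PhantomJS is not installed
```

If `find_phantomjs()` returns `None`, `webdriver.PhantomJS()` will not find the binary either. Otherwise you can pass the path explicitly with `webdriver.PhantomJS(executable_path=path)`, which the Selenium 2/3 releases of that era accept.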
③: Check whether the vendor directory exists on Heroku
Let's look at the app's files on Heroku itself.
$ heroku run bash
This gives you a shell on Heroku, much like ssh. If PhantomJS was installed successfully on Heroku, you'll see:
~ $ ls
api.py  Procfile  requirements.txt  runtime.txt  vendor
A ***vendor*** directory has been created like this, and PhantomJS is inside it.
If it isn't there, the PhantomJS buildpack didn't work, so this is worth checking.
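The same check can be scripted. This Python sketch compares a directory against the `ls` output above and looks for `vendor/phantomjs`; that subdirectory name is an assumption based on the `LD_LIBRARY_PATH` value, and the function names are hypothetical. You could run it from `heroku run python` in the app directory.

```python
import os

# Entries from the `ls` output shown above.
EXPECTED = ["api.py", "Procfile", "requirements.txt", "runtime.txt", "vendor"]


def missing_entries(root=".", expected=EXPECTED):
    """Return the expected entries that are absent from `root`."""
    present = set(os.listdir(root))
    return [name for name in expected if name not in present]


def phantomjs_vendored(root="."):
    """True if vendor/phantomjs exists as a directory under `root` (assumed path)."""
    return os.path.isdir(os.path.join(root, "vendor", "phantomjs"))
```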
## Afterword
Even when everything works locally, there are many ways for PhantomJS to fail once the app is on Heroku, so you may get stuck, but please hang in there!
If you spot any improvements or mistakes, I'd appreciate it if you pointed them out in the comments.
Twitter: @ymgn_ll