[Java] Scraping with jsoup to rank Qiita organizations by "Like" count

Qiita Orgs Ranking seems to have stopped updating, so I wrote a Java program that collects the "likes" of each organization and ranks them. (Unlike Qiita Orgs Ranking, the ranking here is by likes, not by number of contributions.)

Scraping is done with the jsoup library. The implementation is rough and depends on Qiita's current HTML (CSS class names and the order of elements), so I expect it only works for now.
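
The most fragile spots are the hard-coded class names and the line that reads the like count with `Integer.parseInt(stats.get(1).text().trim())`. As one small hardening step, here is a sketch of a hypothetical helper (`LikeCountParser` is my own name, not part of the program or of jsoup) that turns the scraped text into a number without throwing. It assumes the count might someday be shown with a thousands separator; the counts scraped in 2017 had none.

```java
import java.util.OptionalInt;

/**
 * Hypothetical helper, not part of the original program or of jsoup:
 * converts the text of a stats element ("49212", " 49,212 ") into an int.
 */
final class LikeCountParser {
	private LikeCountParser() {
	}

	/** Returns the parsed count, or empty if the text is not a plain non-negative number. */
	static OptionalInt parse(String text) {
		if (text == null) {
			return OptionalInt.empty();
		}
		// Drop thousands separators and surrounding whitespace.
		String digits = text.replace(",", "").trim();
		if (digits.isEmpty() || !digits.chars().allMatch(Character::isDigit)) {
			return OptionalInt.empty();
		}
		try {
			return OptionalInt.of(Integer.parseInt(digits));
		} catch (NumberFormatException e) {
			// e.g. the value is too large for an int
			return OptionalInt.empty();
		}
	}
}
```

In `Data`, the `iine` assignment could then use `LikeCountParser.parse(stats.get(1).text()).orElse(0)`, so one organization with unexpected markup does not abort the whole run (the `get(1)` index would still need a size check).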

(If you send too many requests, Qiita may not be happy about it.)
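
The program sends one request per list page and one per organization, back to back. A simple way to be gentler is to space the requests out. Below is a minimal sketch of a hypothetical `Throttle` class (plain Java, not a jsoup feature) that blocks until a fixed interval has passed since the previous request. The interval you pass in is your own choice; I don't know of a documented Qiita limit.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical request throttle (plain Java, not a jsoup feature): acquire()
 * blocks until at least minIntervalMillis have passed since the previous acquire().
 */
final class Throttle {
	private final long minIntervalNanos;
	private long lastNanos;
	private boolean hasPrevious;

	Throttle(long minIntervalMillis) {
		this.minIntervalNanos = TimeUnit.MILLISECONDS.toNanos(minIntervalMillis);
	}

	/** Call once right before each HTTP request. */
	synchronized void acquire() throws InterruptedException {
		long now = System.nanoTime();
		if (hasPrevious) {
			long readyAt = lastNanos + minIntervalNanos;
			// Sleep in a loop in case a sleep returns slightly early;
			// compare by subtraction because nanoTime values may overflow.
			while (now - readyAt < 0) {
				TimeUnit.NANOSECONDS.sleep(readyAt - now);
				now = System.nanoTime();
			}
		}
		hasPrevious = true;
		lastNanos = now;
	}
}
```

To use it, create one shared instance (for example `new Throttle(1000)`) and call `acquire()` right before each page fetch in the `Data` constructor and in `getOrgUrls(int)`.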

Source code

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class QiitaOrgsRank {
	/** One organization: its name, page URL and total like count. */
	static class Data {
		private final String org;
		private final URL url;
		private final int iine; // "iine" is Japanese for "like"

		/** Fetches the organization page and scrapes its name and like count. */
		public Data(URL url) {
			this.url = url;
			System.out.println("connect:" + url);
			Document document;
			try {
				document = Jsoup.connect(url.toString()).get();
			} catch (IOException e) {
				throw new UncheckedIOException(e);
			}
			// These class names match Qiita's markup as of July 2017.
			Elements stats = document.select(".organizationHeader_stats_value");
			Elements name = document.select(".organizationHeader_profile_orgName");

			org = name.get(0).text();
			// The second header stat is the organization's like count.
			iine = Integer.parseInt(stats.get(1).text().trim());
		}

		public int getIine() {
			return iine;
		}

		@Override
		public String toString() {
			return org + "\t" + iine + "\t" + url;
		}
	}

	public static void main(String[] args) throws URISyntaxException {
		Set<URL> urls = getOrgUrls();

		// Fetch each organization page, then sort by like count, highest first.
		List<Data> ranking = urls.stream()
				.map(Data::new)
				.sorted(Comparator.comparingInt(Data::getIine).reversed())
				.collect(Collectors.toList());

		int rank = 1;
		for (Data data : ranking) {
			System.out.println(rank++ + "\t" + data);
		}
	}

	/** Collects organization URLs from list pages 1, 2, 3, ... until a page is empty. */
	private static Set<URL> getOrgUrls() throws URISyntaxException {
		Set<URL> urls = new HashSet<>();
		int i = 1;
		while (true) {
			try {
				Set<URL> orgs = getOrgUrls(i++);
				if (orgs.isEmpty()) {
					break; // no organizations on this page: past the last page
				}
				urls.addAll(orgs);
			} catch (IOException e) {
				break; // treat a failed request (e.g. an HTTP error status) as the end of the list
			}
		}
		return urls;
	}

	/** Scrapes one page of the organization list and returns absolute organization URLs. */
	private static Set<URL> getOrgUrls(int i) throws IOException, URISyntaxException {
		URL pageUrl = new URL("http://qiita.com/organizations?page=" + i);
		System.out.println("connect:" + pageUrl);

		Set<URL> urls = new HashSet<>();
		Document document = Jsoup.connect(pageUrl.toString()).get();

		Elements elements = document.select(".organizationsList_orgName").select("a");
		for (Element element : elements) {
			// The href may be relative, so resolve it against the list page URL.
			URL url = pageUrl.toURI().resolve(element.attr("href")).toURL();
			urls.add(url);
		}
		return urls;
	}

}
```
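
`getOrgUrls()` keeps requesting `?page=1`, `?page=2`, ... until a page yields no organization links (or a request fails), and `getOrgUrls(int)` turns each `href` into an absolute URL with `URI.resolve`. The sketch below separates that logic from the network so it can be tried offline. `OrgUrlCollector` and the `fetchPage` function are hypothetical stand-ins for the jsoup calls, and the `maxPages` cap is an extra safeguard I added in case a page never comes back empty.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntFunction;

/**
 * Hypothetical, network-free version of the paging loop in getOrgUrls().
 * fetchPage stands in for the jsoup call: given a page number, it returns
 * the raw href attributes of the organization links on that page.
 */
final class OrgUrlCollector {
	private OrgUrlCollector() {
	}

	static Set<URI> collect(String pageUrlPrefix, IntFunction<List<String>> fetchPage, int maxPages) {
		Set<URI> urls = new LinkedHashSet<>();
		for (int page = 1; page <= maxPages; page++) {
			List<String> hrefs = fetchPage.apply(page);
			if (hrefs.isEmpty()) {
				break; // first empty page: we have walked past the last page
			}
			URI pageUri = URI.create(pageUrlPrefix + page);
			for (String href : hrefs) {
				// Resolve each link against the page it appeared on.
				urls.add(pageUri.resolve(href));
			}
		}
		return urls;
	}
}
```

For example, resolving `/organizations/tis` against `http://qiita.com/organizations?page=1` gives `http://qiita.com/organizations/tis`: an absolute-path reference keeps the scheme and host from the page URL but replaces both its path and its `?page=1` query. Duplicate links across pages collapse into one entry because the result is a set.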

Result

As of July 22, 2017, we (Future Architect) are in 15th place.

```
1	TIS Co., Ltd.	49212	http://qiita.com/organizations/tis
2	Mercari	47221	http://qiita.com/organizations/mercari
3	Wantedly, Inc.	45934	http://qiita.com/organizations/wantedly
4	Increments Inc.	38725	http://qiita.com/organizations/increments
5	Sonic Garden Co., Ltd.	37053	http://qiita.com/organizations/sonicgarden
6	Rector	32109	http://qiita.com/organizations/rector
7	Basic, Inc.	25553	http://qiita.com/organizations/basicinc
8	Topgate Co., Ltd.	22962	http://qiita.com/organizations/topgate
9	ShouldBee	22815	http://qiita.com/organizations/shouldbee
10	Dwango Co., Ltd.	22029	http://qiita.com/organizations/dwango
11	Pixiv Inc.	21340	http://qiita.com/organizations/pixiv
12	Drivemode, Inc.	18300	http://qiita.com/organizations/drivemode
13	Atrae Co., Ltd.	16562	http://qiita.com/organizations/atrae
14	freee	16294	http://qiita.com/organizations/freee
15	Future Architect Co., Ltd.	16168	http://qiita.com/organizations/future
```

Huh? Isn't this number surprisingly low for Increments, the company that runs Qiita?
