Figuring out GitHub Search

I have been wanting to play around with GitHub data as a side project so when I first started poking around for what GitHub meta & event data I could grab, not yet being an API ninja, I went to the simplest point-click-Enter-Return shop in town: GitHub Search.

But when I was there, I noticed something strange. Each time I would refresh a search using the same parameters (e.g., “created:<2013-12-31 fork:true”), the results would produce different numbers. Moreover, conflicting counts of total repositories were displayed in the same search results: under “Repositories” and in “We’ve found [] repository results”.

search4search7search14search10

What was happening? I asked my to-go guy regarding all-things-tech (the technical stuff beyond the default “just try restarting your computer”). He gave me a “Hmmmm that’s strange. Something to do with caching?”. Not quite convinced, I decided to seek a more authoritative explanation from GitHub.

question_v2

I wasn’t waiting with bated breath (already having generated low expectations from a lifetime of mediocre customer service, unfair as it is to batch many tech startups/companies with bureaucratic behemoths like “Comcrap”, as my roommate amusingly nicknames).

Less than two days later I received a response from GitHub Support.

Thanks for getting in touch and asking about this!

 Some queries are computationally expensive for our Search to execute. For performance reasons, there are limits to how long queries can be executed. In rare situations when a timeout is reached, Search collects and returns matches that were already found, and the execution of the query is stopped.

That particular query you’re running is hitting this timeout, and as a result — you’re getting small variations in the number or results that were collected before the timeout was hit.

 The Search team is discussing this internally and looking into ways to make such situations less confusing, so your feedback is more than welcome!

 The reason why the two numbers provided under “We’ve found X repository results” and “Repositories” are different is caching. The number under “Repositories” is cached so it doesn’t change with page refreshes, as does the “We’ve found X repository results” number. That’s definitely confusing, and I’ll open an internal issue to get that fixed. 

Hope this explains things! If you have any other questions — please let us know.

Cheers

What a phenomenal answer!

GitHub Support not only responded in days time but addressed each question in a way that did not pass it off with a canned auto-reply or diminish it as “some silly question” wasting their customer support engineer’s precious time. The response even came with a swaddling of exclamations (“!”), welcoming the input as helpful feedback to be acted on for improved user experience.

I continued my conversation with the GitHub Support person with questions regarding a second-party data source. Likewise a timely, thorough, and – best of all – human response followed.

Thank you, GitHub!

Leave a comment