Granularity and Usage

What is Granularity?


Granularity is a data analytics term that refers to how closely you are looking at the data, from a zoomed-out summary down to the finest available detail.
Here’s a great, brief breakdown of the term from Dremio.

In the context of this site, it means how closely we can analyze and filter NPB stats. There are three ‘layers’: Season, Game, and Plate Appearance (PA). Season reports can break data down by year/season, game reports can include or exclude data down to individual games, and plate appearance reports can do the same for in-game events such as bases-loaded scenarios.

For example, let’s compare the below career stats for Fukudome, Kosuke from the Season/Series Granularity – Player Lookup report (left) versus the Game Granularity – Player Lookup report (right).

[Screenshots: Season Granularity (left), Game Granularity (right)]

If you look at Fukudome’s stats for 2022, you’ll see that the number of games he appeared in, his plate appearances, and all other stats match. So what is the difference? It lies in how these numbers can be broken out. Look at the 2019 season on the left: we can break it out into his regular season stats and his stats in the first two stages of the playoffs (he did not make the Japan Series that year). This is what is meant by Season Granularity: stats from this ‘layer’ of reports can be broken down by year and by type of series (regular season or playoffs), and no further. Compare that to the same stats on the right from the Game Granularity version. It shows otherwise the same thing, but I have expanded his 2019 stats by month, and by day for the month of March. In the Game Granularity reports, you can break stats out down to the level of an individual game.
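For readers who think in code, the relationship between the layers can be sketched as a roll-up: the same game-level rows summed under coarser or finer grouping keys. This is only an illustration, not how the site's reports are built. The rows, the `roll_up` helper, and the column layout are all made up for the example and are not real NPB stats.

```python
from collections import defaultdict

# Made-up game-level rows (NOT real stats):
# (date "YYYY-MM-DD", series type, hits, home runs)
games = [
    ("2019-03-29", "Regular Season", 1, 0),
    ("2019-03-30", "Regular Season", 2, 1),
    ("2019-04-02", "Regular Season", 0, 0),
    ("2019-10-05", "Playoffs", 1, 0),
]

def roll_up(rows, key):
    """Sum hits and home runs for each group produced by key(row)."""
    totals = defaultdict(lambda: {"H": 0, "HR": 0})
    for row in rows:
        group = totals[key(row)]
        group["H"] += row[2]
        group["HR"] += row[3]
    return dict(totals)

# Season granularity: year plus series type is as close as it gets.
by_season_series = roll_up(games, lambda r: (r[0][:4], r[1]))

# Game granularity: the same rows can also be grouped by month or by day.
by_month = roll_up(games, lambda r: r[0][:7])
by_day = roll_up(games, lambda r: r[0])
```

The season-level totals are just the game-level rows added together, which is why the numbers match between the two reports whenever the underlying game data is complete.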

So why then even bother with the Season Granularity reports?

The answer depends on what you’re trying to find. Notice in the Game Granularity report that Plate Appearances, OBP, OPS, and some other numbers are missing before 2016? That is because I have not been able to gather those stats for each and every game throughout NPB history. I can only determine those numbers from 2016 on, which is the year NPB started posting much more comprehensive stats for individual games. So that’s the first reason you might want to use Season Granularity: the stats you’re interested in may only be available at that layer of detail. A second reason to stick with Season Granularity is that you may simply not need more detail. If you just want to see career stats, or particular years, there’s no reason to look deeper. The Season Granularity reports are also much quicker to work with, as there is less number crunching going on in the background.

Why bother with the more detailed reports?

One reason is probably obvious just from the screenshot above: maybe you’re interested to see how a player does month to month or day to day. But the options are much more expansive when you start looking at the data game by game. Take a look at just one section of the ways you can parse the data in the Game Granularity Team Lookup report:

[Screenshot: filter breakdown in the Game Granularity Team Lookup report]

Whereas with Season Granularity your filtering options are limited to years and game types (regular season or playoff stages), the Game Granularity reports let you parse the data by many more variables: home or away, wins or losses, specific scores or numbers of hits or errors, and even specific stadiums or types of stadiums. With just a few clicks you can see how your favorite players or teams performed at a specific stadium, or at stadiums in specific regions or of specific lengths. More importantly, since we’re looking at individual games, we can also compare against particular opponents. In just a few clicks you can see how many home runs your favorite team has hit against everyone else in Japan:
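As a rough sketch of what this kind of filtering amounts to, the snippet below totals home runs by opponent after keeping only the games that match the chosen filters. The field names (`opponent`, `home`, `win`, `HR`), the `home_runs_by_opponent` helper, and the sample rows are assumptions made up for illustration, not the site's actual data model or real results.

```python
from collections import Counter

# Made-up game-level rows for one team (NOT real results).
games = [
    {"opponent": "Swallows", "home": True, "win": True, "HR": 2},
    {"opponent": "Giants", "home": False, "win": False, "HR": 0},
    {"opponent": "Swallows", "home": False, "win": True, "HR": 1},
    {"opponent": "Carp", "home": True, "win": False, "HR": 1},
]

def home_runs_by_opponent(rows, **filters):
    """Total home runs per opponent, keeping only rows that match every filter."""
    totals = Counter()
    for row in rows:
        if all(row[field] == value for field, value in filters.items()):
            totals[row["opponent"]] += row["HR"]
    return dict(totals)

all_games = home_runs_by_opponent(games)
home_only = home_runs_by_opponent(games, home=True)
wins_only = home_runs_by_opponent(games, win=True)
```

Filters like home/away or win/loss only work because each row is a single game. Season-level rows have already summed those games together, so there is nothing left to filter on.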

[Screenshot: NPB stats by opposing team]


Also, remember when I said this is just one section of the available filtering? It’s true: you can break out or filter data by opponent, by day of the week, before or after the All-Star break, and so much more.

So really, which level of granularity you use is up to you. But if you want to get really into the weeds, experiment with the Game Granularity or even the Plate Appearance Granularity reports. As the name implies, the Plate Appearance Granularity reports let you go within a game and show only stats from particular situations. How does your favorite team or player do in bases-loaded situations? It’s just a few clicks away (though unfortunately these reports only go back to 2016).

Caveats

One big thing to keep in mind is the limitations that may be present at the deeper levels of granularity. Recall that prior to 2016 we cannot determine the number of plate appearances or runs a team or player had in the Game Granularity reports. From 2016 on those numbers are available, so a total that spans both periods mixes complete and incomplete data, and may not be showing you what you think it is showing you. For example, in the above screenshot it is accurate that the Hanshin Tigers have hit 1,606 home runs total against the Swallows (as of September 24th, 2022), but the plate appearances, OBP, OPS, runs, and a few other stats are NOT accurate all-time figures. The totals you see above for those are calculated only from the data we have available. Confusing? Don’t worry, it becomes much more obvious when you expand to show the breakdown by year:

[Screenshot: expanded stats by year]

Now it is more apparent that plate appearances are only being counted for the years where that data is available. In the above, the years are still sorted by the number of home runs the Hanshin Tigers hit against the Yakult Swallows, so they appear out of sequence (1986 is the year they hit the most home runs against the Swallows, so that year appears first). Still, it is easy to see which years show plate appearances and which do not. The total of 6,565 plate appearances shown against the Yakult Swallows is actually the total from 2016 through September 24, 2022 (the day the above report was run). Of course, if you know baseball a bit, it should be obvious that this plate appearance total cannot be the all-time figure, since the number of at bats is much greater and every at bat is also a plate appearance. It is less obvious, if you’re not looking closely, that the OBP and OPS are not the true all-time totals for the Tigers vs Swallows either, but rather what they are from 2016 on.
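The same pitfall can be shown in a few lines of code. In this sketch, a stat that is missing for some years is summed only over the years where it exists, and the helper also reports which years were covered so a partial total is not mistaken for an all-time one. The rows, field names, and the `total_with_coverage` helper are invented for illustration and are not real NPB numbers.

```python
# Made-up yearly rows (NOT real stats). PA is unknown (None) before 2016.
seasons = [
    {"year": 2014, "AB": 400, "PA": None},
    {"year": 2015, "AB": 420, "PA": None},
    {"year": 2016, "AB": 410, "PA": 460},
    {"year": 2017, "AB": 430, "PA": 480},
]

def total_with_coverage(rows, stat):
    """Sum a stat over rows where it is known; also report the years covered."""
    known = [row for row in rows if row[stat] is not None]
    total = sum(row[stat] for row in known)
    years = sorted(row["year"] for row in known)
    complete = len(known) == len(rows)
    return total, years, complete

ab_total, ab_years, ab_complete = total_with_coverage(seasons, "AB")
pa_total, pa_years, pa_complete = total_with_coverage(seasons, "PA")

# A plate appearance total smaller than the at bat total is the giveaway
# that the two sums do not cover the same set of years.
looks_partial = pa_total < ab_total
```

Here the at bat total covers every year while the plate appearance total covers only 2016 and 2017, which is exactly the mismatch visible in the expanded report.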

In short…

While the deeper levels of granularity allow for much more flexibility and can give you some unique insights into your favorite players or teams, you need to exercise caution and make sure you’re not misinterpreting what you see. A good year to always keep in mind is 2016: from that year on, just about every stat you can view at the Season Granularity level is also available at the Game Granularity level, so you’re not likely to run into an issue like the above.

[Screenshot: NPB stats sorted by year]

When sorting by year in descending order, you can see a sharp cutoff in the available stats for 2015 and earlier.





And that’s all there is to it! Please reach out if anything is unclear and I’m more than happy to help out!