Analytics for Small Business: Applied Analytics for Food and Beverage (F&B) Business Acceleration

Opening a new Food and Beverage (F&B) business in NYC is attractive, yet challenging. With an estimated 8.49 million people frequenting more than 40,000 F&B establishments in the city, NYC is a highly competitive market for any new business owner to penetrate, even before accounting for the time, energy, and money it takes to open shop. Previously, opening a new restaurant required a complex process that was often time-consuming and expensive. Obtaining the permits needed to serve customers was particularly frustrating, because the inspection schedules of the Fire Department, the Health Department, and the Department of Buildings (DOB) rarely aligned.

The DOB conducts building inspections at the lot level, so we were able to merge DOB violations with PLUTO (the city's Primary Land Use Tax Lot Output dataset) on the BBL (borough-block-lot identifier) to obtain more detailed building information such as building class, zip code, year built, and lot area. The original DOB dataset records many similar violation types, each with only a few observations, so we aggregated the 41 DOB violation types that a business might be written up for into 14 categories to reduce sparsity and obtain more reliable statistical results.
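The lot-level merge and the category aggregation described above can be sketched as follows. This is a minimal illustration, assuming each DOB violation record carries a BBL and a raw violation type, and that PLUTO supplies building attributes keyed by the same BBL. The BBLs, field names, raw type codes, and the `CATEGORY_MAP` entries are all hypothetical placeholders; the project's actual mapping of 41 types into 14 categories is not reproduced here.

```python
from collections import Counter

# Hypothetical PLUTO extract keyed by BBL; field names and values are
# illustrative placeholders, not PLUTO's actual column names.
pluto = {
    "1000010001": {"zipcode": "10012", "bldgclass": "K4", "yearbuilt": 1910},
    "1000010002": {"zipcode": "10003", "bldgclass": "C7", "yearbuilt": 1925},
}

# Hypothetical DOB violation records; "vtype" is the raw violation type.
violations = [
    {"bbl": "1000010001", "vtype": "LANDMARK"},
    {"bbl": "1000010001", "vtype": "BOILER_REGISTRATION"},
    {"bbl": "1000010002", "vtype": "BOILER"},
    {"bbl": "9999999999", "vtype": "ELEVATOR"},  # BBL absent from PLUTO
]

# Hypothetical mapping from raw DOB types to coarser categories.
# Raw types missing from the map fall into "Other".
CATEGORY_MAP = {
    "LANDMARK": "Landmark",
    "BOILER": "Boiler",
    "BOILER_REGISTRATION": "Boiler",
    "ELEVATOR": "Elevator",
}

def merge_and_categorize(violations, pluto, category_map):
    """Inner-join violations to PLUTO on BBL and attach a coarse category."""
    merged = []
    for v in violations:
        lot = pluto.get(v["bbl"])
        if lot is None:  # drop violations whose BBL has no PLUTO record
            continue
        row = {**v, **lot}  # combine violation fields with lot attributes
        row["category"] = category_map.get(v["vtype"], "Other")
        merged.append(row)
    return merged

merged = merge_and_categorize(violations, pluto, CATEGORY_MAP)
print(Counter(r["category"] for r in merged))
```

Collapsing several raw codes into one category (here two boiler-related codes into "Boiler") is what turns many thinly populated columns into fewer, better-populated ones.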

In this project, we focused on two major metrics (the number of violations and the violation types) and developed key indicators from them at several geographic levels. The key indicators are as follows:

  • The total number of DOB violations by violation type per zip code area: Using the zip code attached to each BBL, we aggregated all DOB violations to the zip code level. In the resulting dataset, each zip code area is an observation and the count of each violation type is a feature. This dataset provides statistical insight into building violations by zip code area and is not limited to restaurants, as it covers all buildings in New York City.
  • The total number of DOB violations by violation type per restaurant: We merged all DOB violations with restaurant health inspection data to obtain the subset of DOB violations associated with restaurants. We then grouped the violations by business name (DBA) and violation type to produce a per-restaurant breakdown of DOB violation types. This lets us rank all current restaurants by their total number of building violations and query the building violation types each restaurant is potentially at risk for.
  • Over-prevalence of certain violations compared to all other zip codes: We computed the average share of each violation type among all violations that restaurants can possibly receive in each zip code. For each cuisine type in a zip code, we then computed the difference between the cuisine's share and the zip code's share. We call this difference the over-prevalence of a violation.
  • Interactive maps visualize health grades and building violations for all restaurants in Manhattan. The results indicate that both health quality and building operation quality vary by neighborhood, so a more location-based, neighborhood-specific measure should be considered to better identify issues and evaluate a business.
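The zip code count table and the over-prevalence indicator from the list above can be sketched in a few lines. This is a minimal illustration on hypothetical records, assuming the violations have already been restricted to restaurants and tagged with a zip code, a cuisine type, and a violation category. The text says "average share"; this sketch assumes shares computed from pooled counts, and averaging per-restaurant shares would be an equally plausible reading. Note that the text's zip-level dataset covers all buildings, whereas `zip_table` here is built from restaurant records only.

```python
from collections import Counter, defaultdict

# Hypothetical restaurant violations after the merges, one tuple per
# violation: (zip code, cuisine type, violation category).
records = [
    ("10014", "Italian", "Landmark"),
    ("10014", "Italian", "Landmark"),
    ("10014", "Italian", "Boiler"),
    ("10014", "Chinese", "Boiler"),
    ("10014", "Chinese", "Elevator"),
    ("10014", "Chinese", "Landmark"),
]

def count_by(records, key):
    """Count violation categories per group; key(record) selects the group."""
    table = defaultdict(Counter)
    for rec in records:
        table[key(rec)][rec[2]] += 1
    return table

def shares(counts):
    """Turn a Counter of violation counts into shares that sum to 1."""
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

def over_prevalence(records):
    """Cuisine share of each category minus the zip code share (pooled counts)."""
    by_zip = count_by(records, key=lambda r: r[0])
    by_zip_cuisine = count_by(records, key=lambda r: (r[0], r[1]))
    result = {}
    for (zipcode, cuisine), counts in by_zip_cuisine.items():
        zip_share = shares(by_zip[zipcode])
        cuisine_share = shares(counts)
        # Categories the cuisine never received count as a share of 0.
        result[(zipcode, cuisine)] = {
            cat: cuisine_share.get(cat, 0.0) - zip_share[cat] for cat in zip_share
        }
    return result

zip_table = count_by(records, key=lambda r: r[0])  # zip code x category counts
op = over_prevalence(records)
print(op[("10014", "Italian")])
```

In this toy data the Italian restaurants receive Landmark violations at a share of 2/3 against a zip code share of 1/2, so their Landmark over-prevalence is 1/6; a positive value flags a violation type that a cuisine receives more often than its neighbors do.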

    [Diagram: data-merging workflow]

    The diagram above illustrates the data-merging workflow used to link DOB violations to restaurants at the tax-lot level.
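The restaurant-level step of this workflow (joining categorized DOB violations to restaurant inspection records on the BBL, then counting by DBA and violation type) can be sketched as below. This is a minimal illustration under stated assumptions: the records, BBLs, DBA names, and field names are hypothetical, and the violations are assumed to be already categorized. Because inspection data typically repeats a restaurant once per inspection, the sketch collects DBAs per BBL in a set so that repeated rows do not multiply the violation counts.

```python
from collections import Counter, defaultdict

# Hypothetical categorized DOB violations, keyed by BBL.
dob = [
    {"bbl": "1000010001", "category": "Landmark"},
    {"bbl": "1000010001", "category": "Landmark"},
    {"bbl": "1000010001", "category": "Boiler"},
    {"bbl": "1000010002", "category": "Boiler"},
    {"bbl": "3000010001", "category": "Elevator"},  # lot with no restaurant
]

# Hypothetical restaurant inspection rows; the first restaurant appears twice,
# as it would after two separate inspections.
inspections = [
    {"bbl": "1000010001", "dba": "CAFE A"},
    {"bbl": "1000010001", "dba": "CAFE A"},
    {"bbl": "1000010002", "dba": "NOODLE B"},
]

def violations_per_restaurant(dob, inspections):
    """Join DOB violations to restaurants on BBL, then count by DBA and category."""
    dbas_by_bbl = defaultdict(set)
    for row in inspections:
        dbas_by_bbl[row["bbl"]].add(row["dba"])  # set removes repeated inspections
    table = defaultdict(Counter)
    for v in dob:
        # A lot may host several restaurants; each one inherits the lot's violations.
        for dba in dbas_by_bbl.get(v["bbl"], ()):
            table[dba][v["category"]] += 1
    return table

table = violations_per_restaurant(dob, inspections)
# Rank restaurants by the total number of building violations on their lot.
ranking = sorted(table, key=lambda dba: sum(table[dba].values()), reverse=True)
print(ranking)
print(table["CAFE A"].most_common())
```

Keying by DBA alone follows the grouping described in the text; a chain with several locations would be pooled under one name, so keying by the (DBA, BBL) pair is an alternative when locations need to be kept apart.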

    Findings: As we aggregated and merged the datasets, visualizing the final DOB dataset suggested that building violations are correlated with location. We observed clusters of DOB violations across the city, in both the total number of violations and the dominant violation type. In Manhattan, for example, Lower Manhattan and the Diamond District have the largest number of building violations, and the same geographic pattern holds for restaurants' total number of DOB violations (Figure 1). Because we aggregated the data by building violation type and by business, we could examine building violations in more depth, by both the number and the type of violations. For example, the majority of building violations in Greenwich Village are classified as 'Landmark', while the majority of DOB violations in the East Village are 'Boiler' (Figure 2). We suspect that deeper spatial factors underlie building violations, such as whether a building is in a historic district, when it was built or renovated, and the applicable zoning and land use permits.

    Such location-based knowledge of building violations is valuable and central to this 'Violation Dashboard,' because it allows new business owners to be better informed about past violations and their potential risks at a specific address. This knowledge is also useful for city agencies, as it reveals common violations by neighborhood. For instance, if DOB identifies a hotspot of landmark-related violations in Greenwich Village, it can apply measures targeted at that violation type at the neighborhood level. At the city scale, visualizing restaurant building violation frequency by zip code helps agencies such as DOB, DCP, and NYCBA identify these issues at the zip code level.
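Identifying the dominant violation type in an area, as in the Greenwich Village and East Village comparison above, amounts to finding the most frequent category per area. The sketch below is illustrative only: the area labels and records are hypothetical and do not reproduce the project's data. It returns the most frequent category together with its share, so one can check whether that category is an actual majority (share above 0.5) or only a plurality.

```python
from collections import Counter

# Hypothetical categorized violations tagged with an area label.
violations = [
    ("Area A", "Landmark"),
    ("Area A", "Landmark"),
    ("Area A", "Boiler"),
    ("Area B", "Boiler"),
    ("Area B", "Boiler"),
    ("Area B", "Elevator"),
]

def dominant_category(violations):
    """Return, per area, the most frequent category and its share of the area's violations."""
    counts = {}
    for area, category in violations:
        counts.setdefault(area, Counter())[category] += 1
    result = {}
    for area, c in counts.items():
        category, n = c.most_common(1)[0]  # ties resolve in first-seen order
        result[area] = (category, n / sum(c.values()))
    return result

print(dominant_category(violations))
```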

    This is a team project with Erwan LeCun, Arno Amabile, and Carlyle Davis at the NYU Center for Urban Science and Progress.