Calculating and Comparing Product Production Time for VS One-Health Products

June 11, 2019

This Python notebook is a workable and scalable example on how to make a notebook that cleans and analyzes data in a transparent way.

This notebook serves as an abbreviated project and due to formatting errors.

Please utilize the follwoing badge to view the notebook in binder.

We have an excel workbook that details the progress of all One-Health/Veterinary Services products to include posters, brochures and newsletters. Taking a look at the initial structure of this spreadsheet, we have missing dates or progress for certain products. This may be due to certain products not needing certain levels of approval. It may be advantageous for us to split the data by product and then look at the time related to each product.
For my sanity, I will be getting rid of the incomplete records.

Analysis

First let’s go ahead and load the .csv into python

import pandas as pd # import the module and alias it as pd

timedateVOH_data = pd.read_csv('C:\Users\waugh\Dropbox\Documents\One_Health\VS_Product_Analysis_dates.csv', parse_dates=True)
timedateVOH_data.head()

	Type	Draft Review	SME Content Review	PMD Content Review	VID Design Work/Edits	Submitted to CRC	CRC Review Complete
0	Brochure	NaN	NaN	5/17/2016	5/23/2016	6/1/2016	6/7/2016
1	Brochure	NaN	NaN	5/17/2016	5/23/2016	6/3/2016	6/13/2016
2	Brochure	NaN	NaN	5/17/2016	10/20/2016	11/1/2016	12/20/2016
3	Brochure	NaN	NaN	7/5/2016	7/5/2016	8/19/2016	9/2/2016
4	Brochure	5/9/2017	5/16/2017	6/6/2017	6/23/2017	8/4/2017	10/10/2017

timedateVOH_data.describe()

	Type	Draft Review	SME Content Review	PMD Content Review	VID Design Work/Edits	Submitted to CRC	CRC Review Complete
count	81	52	48	62	70	75	69
unique	3	36	29	31	33	33	21
top	Newsletter	6/1/2017	4/20/2016	7/11/2016	4/20/2016	8/29/2017	8/15/2016
freq	41	4	5	8	6	8	9

We see that the dates we have on the csv file did not read into python as a datetime object. Without this, we’re unable to determine the elapsed time for each column.

Lets fix for the CRC columns at least so since the product wouldn’t exist without CRC approval.

df_clean = timedateVOH_data[pd.notnull(timedateVOH_data['Submitted to CRC '])]
df_clean_1 = df_clean[pd.notnull(df_clean['CRC Review Complete'])]
df_clean_1
df_clean_2 = df_clean_1.copy()

df_clean_2['CRC Review Complete'] =  pd.to_datetime(df_clean_2['CRC Review Complete'], format = '%m/%d/%Y')
df_clean_2['Submitted to CRC '] =  pd.to_datetime(df_clean_2['Submitted to CRC '], format = '%m/%d/%Y')

df_clean_2['CRCtime'] = df_clean_2['CRC Review Complete'] - df_clean_2['Submitted to CRC ']

df_clean_2.head()

	Type	Draft Review	SME Content Review	PMD Content Review	VID Design Work/Edits	Submitted to CRC	CRC Review Complete	CRCtime
0	Brochure	NaN	NaN	5/17/2016	5/23/2016	2016-06-01	2016-06-07	6 days
1	Brochure	NaN	NaN	5/17/2016	5/23/2016	2016-06-03	2016-06-13	10 days
2	Brochure	NaN	NaN	5/17/2016	10/20/2016	2016-11-01	2016-12-20	49 days
3	Brochure	NaN	NaN	7/5/2016	7/5/2016	2016-08-19	2016-09-02	14 days
4	Brochure	5/9/2017	5/16/2017	6/6/2017	6/23/2017	2017-08-04	2017-10-10	67 days

Now that we finished the column, we now have the calculated the elapsed time! We see, however, that certain products don’t have full processing times through all columns. This probably means that certain products may not need certain approvals. So from now on, let’s go ahead and split the created data.frame by the two main One-Health Products (Newsletters and Brochures).

#Create the dataframe subset
df_clean_2_News = df_clean_2.loc[df_clean_2['Type'] == 'Newsletter']
df_clean_2_News.head()

#Erase rows with NaN values
df_clean_3_News = df_clean_2_News[pd.notnull(df_clean_2_News['VID Design Work/Edits'])]
df_clean_4_News = df_clean_3_News[pd.notnull(df_clean_3_News['PMD Content Review'])]
df_clean_5_News = df_clean_4_News[pd.notnull(df_clean_4_News['SME Content Review'])]
df_clean_6_News = df_clean_5_News[pd.notnull(df_clean_5_News['Draft Review'])]

#Cenvert strings to datetime objects
df_clean_6_News['VID Design Work/Edits'] =  pd.to_datetime(df_clean_6_News['VID Design Work/Edits'], format = '%m/%d/%Y')
df_clean_6_News['PMD Content Review'] =  pd.to_datetime(df_clean_6_News['PMD Content Review'], format = '%m/%d/%Y')
df_clean_6_News['SME Content Review'] =  pd.to_datetime(df_clean_6_News['SME Content Review'], format = '%m/%d/%Y')
df_clean_6_News['Draft Review'] =  pd.to_datetime(df_clean_6_News['Draft Review'], format = '%m/%d/%Y')

df_clean_6_News.head()

#Create the elapsed time columns
df_clean_6_News['VID_Edittime'] = df_clean_6_News['Submitted to CRC '] - df_clean_6_News['VID Design Work/Edits']
df_clean_6_News['PMD_Reviewtime'] = df_clean_6_News['VID Design Work/Edits'] - df_clean_6_News['PMD Content Review']
df_clean_6_News['SME_Reviewtime'] = df_clean_6_News['PMD Content Review'] - df_clean_6_News['SME Content Review']
df_clean_6_News['Draft_Reviewtime'] = df_clean_6_News['SME Content Review'] - df_clean_6_News['Draft Review']

df_clean_7_News = df_clean_6_News.drop(df_clean_6_News[df_clean_6_News.PMD_Reviewtime == '-10 days'].index)
df_clean_8_News = df_clean_7_News.drop(df_clean_7_News[df_clean_7_News.PMD_Reviewtime == '-6 days'].index)
#df_clean_6_News['PMD_Reviewtime'].total_seconds()

#Good to go!
df_clean_8_News

	Type	Draft Review	SME Content Review	PMD Content Review	VID Design Work/Edits	Submitted to CRC	CRC Review Complete	CRCtime	VID_Edittime	PMD_Reviewtime	SME_Reviewtime	Draft_Reviewtime
40	Newsletter	2016-04-13	2016-04-20	2016-04-20	2016-04-20	2016-04-28	2016-05-09	11 days	8 days	0 days	0 days	7 days
41	Newsletter	2016-04-15	2016-04-20	2016-04-20	2016-04-20	2016-04-28	2016-05-09	11 days	8 days	0 days	0 days	5 days
42	Newsletter	2016-04-15	2016-04-20	2016-04-20	2016-04-20	2016-04-28	2016-05-09	11 days	8 days	0 days	0 days	5 days
43	Newsletter	2016-04-15	2016-04-20	2016-04-20	2016-04-20	2016-04-28	2016-05-09	11 days	8 days	0 days	0 days	5 days
48	Newsletter	2016-05-17	2016-05-17	2016-05-17	2016-05-17	2016-06-03	2016-06-07	4 days	17 days	0 days	0 days	0 days
54	Newsletter	2016-07-01	2016-07-05	2016-07-11	2016-07-11	2016-07-22	2016-08-15	24 days	11 days	0 days	6 days	4 days
56	Newsletter	2016-06-27	2016-06-27	2016-07-11	2016-07-11	2016-07-22	2016-08-15	24 days	11 days	0 days	14 days	0 days
58	Newsletter	2016-11-17	2016-11-23	2016-11-28	2016-11-29	2017-01-27	2017-03-30	62 days	59 days	1 days	5 days	6 days
59	Newsletter	2016-11-08	2016-11-23	2016-11-28	2016-11-29	2017-01-27	2017-03-30	62 days	59 days	1 days	5 days	15 days
61	Newsletter	2017-02-25	2017-03-15	2017-03-15	2017-03-17	2017-03-17	2017-05-08	52 days	0 days	2 days	0 days	18 days
62	Newsletter	2017-02-10	2017-03-13	2017-03-15	2017-03-20	2017-04-04	2017-06-01	58 days	15 days	5 days	2 days	31 days
63	Newsletter	2017-02-10	2017-03-10	2017-03-13	2017-03-15	2017-04-04	2017-06-01	58 days	20 days	2 days	3 days	28 days
64	Newsletter	2017-03-01	2017-03-16	2017-03-16	2017-03-20	2017-04-04	2017-06-01	58 days	15 days	4 days	0 days	15 days
66	Newsletter	2017-06-01	2017-06-23	2017-06-29	2017-07-25	2017-08-29	2017-08-29	0 days	35 days	26 days	6 days	22 days
68	Newsletter	2017-06-01	2017-06-23	2017-06-28	2017-07-25	2017-08-29	2017-08-29	0 days	35 days	27 days	5 days	22 days
70	Newsletter	2016-07-01	2016-07-05	2016-07-11	2017-08-28	2017-08-29	2017-08-29	0 days	1 days	413 days	6 days	4 days
71	Newsletter	2016-06-07	2016-06-30	2016-07-11	2017-08-28	2017-08-29	2017-08-29	0 days	1 days	413 days	11 days	23 days
72	Newsletter	2016-06-27	2016-06-27	2016-07-11	2017-08-28	2017-08-29	2017-08-29	0 days	1 days	413 days	14 days	0 days
77	Newsletter	2018-08-15	2018-08-24	2018-08-24	2018-08-29	2018-08-30	2018-09-04	5 days	1 days	5 days	0 days	9 days
78	Newsletter	2018-08-19	2018-08-30	2018-08-29	2018-08-29	2018-08-30	2018-09-04	5 days	1 days	0 days	-1 days	11 days
79	Newsletter	2018-08-15	2018-08-24	2018-08-24	2018-08-29	2018-08-30	2018-09-04	5 days	1 days	5 days	0 days	9 days
80	Newsletter	2018-08-16	2018-08-23	2018-08-24	2018-08-29	2018-08-30	2018-09-04	5 days	1 days	5 days	1 days	7 days

Now lets take a look at the summary data…..

df_clean_8_News.describe()

	CRCtime	VID_Edittime	PMD_Reviewtime	SME_Reviewtime	Draft_Reviewtime
count	22	22	22	22	22
mean	21 days 04:21:49.090909	14 days 08:43:38.181818	60 days 02:10:54.545454	3 days 12:00:00	11 days 04:21:49.090909
std	24 days 06:22:33.479019	17 days 14:47:37.238031	143 days 17:24:49.094471	4 days 14:02:38.929146	9 days 05:32:01.438379
min	0 days 00:00:00	0 days 00:00:00	0 days 00:00:00	-1 days +00:00:00	0 days 00:00:00
25%	4 days 06:00:00	1 days 00:00:00	0 days 00:00:00	0 days 00:00:00	5 days 00:00:00
50%	11 days 00:00:00	8 days 00:00:00	2 days 00:00:00	1 days 12:00:00	8 days 00:00:00
75%	45 days 00:00:00	16 days 12:00:00	5 days 00:00:00	5 days 18:00:00	17 days 06:00:00
max	62 days 00:00:00	59 days 00:00:00	413 days 00:00:00	14 days 00:00:00	31 days 00:00:00

print(df_clean_8_News.CRCtime.median())
print(df_clean_8_News.VID_Edittime.median())
print(df_clean_8_News.PMD_Reviewtime.median())
print(df_clean_8_News.SME_Reviewtime.median())
print(df_clean_8_News.Draft_Reviewtime.median())

days 00:00:00
days 00:00:00
days 00:00:00
days 12:00:00
days 00:00:00

Yikes, we have some weird data that led to some negative elapsed times. We could do one of three things:

Delete the records (rows) with the offending dates.
Delete the PMD_Reviewtime column and analyze the other criteria.
Leave the data unedited and ask MAJ Watkins about the dates. Let’s go ahead and do the other products.

This type of discrepancy also demonstrates the lack of an effective way of recording data. In order to effectively produce an analysis and report of any substance, our data must be reliable.

A recommendation for the future is that an accurate working document MUST be established in order to produce reliable production time data and reports.

For this analysis, I’ve decided to delete the offending products with a negative elapsed time.

#Create the dataframe subset
df_clean_2_1 = df_clean_2.copy()
df_clean_2_bro = df_clean_2_1.loc[df_clean_2['Type'] == 'Brochure']
df_clean_2_bro.head()
pd.options.mode.chained_assignment = None

#Erase rows with NaN values
df_clean_3_bro = df_clean_2_bro[pd.notnull(df_clean_2_bro['VID Design Work/Edits'])]
df_clean_4_bro = df_clean_3_bro[pd.notnull(df_clean_3_bro['PMD Content Review'])]
df_clean_5_bro = df_clean_4_bro[pd.notnull(df_clean_4_bro['SME Content Review'])]
df_clean_6_bro = df_clean_5_bro[pd.notnull(df_clean_5_bro['Draft Review'])]

#Cenvert strings to datetime objects
df_clean_6_bro['VID Design Work/Edits'] =  pd.to_datetime(df_clean_6_bro['VID Design Work/Edits'], format = '%m/%d/%Y')
df_clean_6_bro['PMD Content Review'] =  pd.to_datetime(df_clean_6_bro['PMD Content Review'], format = '%m/%d/%Y')
df_clean_6_bro['SME Content Review'] =  pd.to_datetime(df_clean_6_bro['SME Content Review'], format = '%m/%d/%Y')
df_clean_6_bro['Draft Review'] =  pd.to_datetime(df_clean_6_bro['Draft Review'], format = '%m/%d/%Y')

df_clean_6_News.head()

#Create the elapsed time columns
df_clean_6_bro['VID_Edittime'] = df_clean_6_bro['Submitted to CRC '] - df_clean_6_bro['VID Design Work/Edits']
df_clean_6_bro['PMD_Reviewtime'] = df_clean_6_bro['VID Design Work/Edits'] - df_clean_6_bro['PMD Content Review']
df_clean_6_bro['SME_Reviewtime'] = df_clean_6_bro['PMD Content Review'] - df_clean_6_bro['SME Content Review']
df_clean_6_bro['Draft_Reviewtime'] = df_clean_6_bro['SME Content Review'] - df_clean_6_bro['Draft Review']

#Good to go!
df_clean_6_bro

	Type	Draft Review	SME Content Review	PMD Content Review	VID Design Work/Edits	Submitted to CRC	CRC Review Complete	CRCtime	VID_Edittime	PMD_Reviewtime	SME_Reviewtime	Draft_Reviewtime
4	Brochure	2017-05-09	2017-05-16	2017-06-06	2017-06-23	2017-08-04	2017-10-10	67 days	42 days	17 days	21 days	7 days
7	Brochure	2016-11-23	2016-12-01	2016-12-02	2016-12-06	2017-01-30	2017-03-30	59 days	55 days	4 days	1 days	8 days
9	Brochure	2016-12-07	2017-02-09	2017-02-10	2017-05-17	2017-06-06	2017-06-30	24 days	20 days	96 days	1 days	64 days
10	Brochure	2016-12-09	2017-01-26	2017-01-27	2017-01-30	2017-02-02	2017-03-30	56 days	3 days	3 days	1 days	48 days
11	Brochure	2016-09-06	2016-09-09	2016-09-09	2016-09-10	2016-11-29	2017-01-02	34 days	80 days	1 days	0 days	3 days
13	Brochure	2017-03-01	2017-03-10	2017-03-13	2017-04-10	2017-06-05	2017-06-27	22 days	56 days	28 days	3 days	9 days

df_clean_6_bro.describe()

	CRCtime	VID_Edittime	PMD_Reviewtime	SME_Reviewtime	Draft_Reviewtime
count	6	6	6	6	6
mean	43 days 16:00:00	42 days 16:00:00	24 days 20:00:00	4 days 12:00:00	23 days 04:00:00
std	19 days 09:32:32.152061	27 days 14:37:12.911653	36 days 08:56:33.395270	8 days 03:25:10.375190	26 days 00:15:41.341229
min	22 days 00:00:00	3 days 00:00:00	1 days 00:00:00	0 days 00:00:00	3 days 00:00:00
25%	26 days 12:00:00	25 days 12:00:00	3 days 06:00:00	1 days 00:00:00	7 days 06:00:00
50%	45 days 00:00:00	48 days 12:00:00	10 days 12:00:00	1 days 00:00:00	8 days 12:00:00
75%	58 days 06:00:00	55 days 18:00:00	25 days 06:00:00	2 days 12:00:00	38 days 06:00:00
max	67 days 00:00:00	80 days 00:00:00	96 days 00:00:00	21 days 00:00:00	64 days 00:00:00

print(df_clean_6_bro.CRCtime.median())
print(df_clean_6_bro.VID_Edittime.median())
print(df_clean_6_bro.PMD_Reviewtime.median())
print(df_clean_6_bro.SME_Reviewtime.median())
print(df_clean_6_bro.Draft_Reviewtime.median())

days 00:00:00
days 12:00:00
days 12:00:00
days 00:00:00
days 12:00:00

Plots and Figures

Now lets go ahead and produce some figures to visualize our data and demonstrate which production stage takes the most and least amount of time.
First let’s take a look at the brochure products…

import matplotlib.pyplot as plt; plt.rcdefaults()
import numpy as np
import matplotlib.pyplot as plt

objects = ('CRC', 'VID', 'PMD Review', 'SME Review', 'Draft Review')
y_pos = np.arange(len(objects))
performance = [df_clean_6_bro.CRCtime.dt.days.median(),df_clean_6_bro.VID_Edittime.dt.days.median(),df_clean_6_bro.PMD_Reviewtime.dt.days.median(),df_clean_6_bro.SME_Reviewtime.dt.days.median(),df_clean_6_bro.Draft_Reviewtime.dt.days.median()]

plt.bar(y_pos, performance, align='center', alpha=0.5)
plt.xticks(y_pos, objects)
plt.ylabel('Days (Median)')
plt.title('VS (One-Health) Product Production Time (Brochure)')
for i, v in enumerate(performance):
    plt.text(i-.15, v+1 , str(v), color='blue', fontweight='bold', va='center')

plt.show()

png

Now let’s take a look at the Newsletters…

import matplotlib.pyplot as plt; plt.rcdefaults()
import numpy as np
import matplotlib.pyplot as plt

objects = ('CRC', 'VID', 'PMD Review', 'SME Review', 'Draft Review')
y_pos = np.arange(len(objects))
performance = [df_clean_8_News.CRCtime.dt.days.median(),df_clean_8_News.VID_Edittime.dt.days.median(),df_clean_8_News.PMD_Reviewtime.dt.days.median(),df_clean_8_News.SME_Reviewtime.dt.days.median(),df_clean_8_News.Draft_Reviewtime.dt.days.median()]

plt.bar(y_pos, performance, align='center', alpha=0.5, color=['green','green','green','green','green'])
plt.xticks(y_pos, objects)
plt.ylabel('Days (Median)')
plt.title('VS (One-Health) Product Production Time (Newsletter)')
for i, v in enumerate(performance):
    plt.text(i-.15, v+.18 , str(v), color='blue', fontweight='bold', va='center')

plt.show()

png

Discussion

So what can we say about these graphs and analyses?

With a first look, the data analysis and subsequent graphs, doesn’t really tell us much about spent time due to the large discrepancies. We can see that VID processing and Review and the CRC Review process takes up the majority of time of One-Health Product development, with a wait time of 48.5 and 45 days, respectively.

We see that comparing the brochures and newsletters, the newsletters take a shorter median amount of production time. The reasoning behind this may possibly be due to the higher frequency of newsletter production, increasing the potential efficiency of the newsletter production process.

With a potential brochure program utilizing First Year Graduate Veterinary Education (FYGVE) students as potential authors, a strong and defined review process is necessary to eliminate potential hang up in the review process.

This shows that our processing time is really dependent on the work and collaboration of other directorates regarding visual output and CRC review.

With the subject matter, draft and PMD review lasting roughly 75% less than VID and CRC time, recommendations should:

Include discussions, focus groups and meetings to potentially streamline the CRC process; potentially decreasing the time at certain approval levels
Require submitters of VS One-Health products to have a clear visual outline of what their product should look like before VID work order submission
- A clear and defined endstate by the customer will assist the VID designer in obtaining a product that the VID directorate and the customer will agree upon, potentially decreasing the production and processing time. Additionally, focus groups, surveys and interviews should be done within VID to discuss other potential bottlenecks during the production process.
- Additionally, with our aforementioned FYGVE brochure program, a dedicated person within the One-Health Division should be responsible for the overall setup and design of the brochures. With this, this would increase positive interaction with VID and decrease the VID production time potentially decreasing the overall production time, significantly.

Conclusion

In conclusion, my analysis shows that VID production and CRC approval take up more than 50 percent of the total production time in both One-health newsletters and brochures. I recommend the One-Health Division to open communication between VID and CRC to develop potential ideas to decrease processing and approval times.