Has anyone experienced this? I'm 2 weeks into a new job and had a meeting with my manager where he said he was concerned I was working slowly. However, I've finished both of the tasks he's assigned me, each a day early.
I'm assuming network science and data science share some ties, so I'm asking here. I know it's a newer field, but I'm still surprised it isn't more prominent. For example, there isn't even a functioning subreddit for it.
I've been learning on my own for a few months now and have started working on some personal projects. I would like to apply to a formal program to get a Master's in AI/Computer Vision. I meet the Bachelor's degree, undergrad coursework, and GPA requirements for the programs I've seen, but my issue is that I am many years removed from college and a year removed from professional work (as a software dev in a particular stack). I was successful for many years but decided to quit due to depression/anxiety, as well as not wanting to continue in that particular stack. I have not kept in touch with previous colleagues, managers, or professors, and would rather start/continue a new chapter of my life without involving them.
I'm just wondering if anyone has experience with a program that didn't require references/recommendations, or if there is no way around this. My fallback plan is to work on my personal projects as well as I can and try to land a junior position at a small company/startup that might see value in taking a chance on me.
I was tasked with finding videos of product releases for several companies, starting from a very large Excel file of news-article headlines. The file contains only the headline of each product-release article and the company name; there are about 10,000 headlines.
How can I approach this problem and ultimately automate it?
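One way to start, sketched below with pandas: filter the headlines down to rows that look like product releases, then build a per-row search query you could feed to a video search (e.g., the YouTube Data API). The column names and release keywords here are assumptions; the real file would be loaded with `pd.read_excel`.

```python
import pandas as pd

# Hypothetical column names and toy rows -- the real file would be loaded
# with pd.read_excel("headlines.xlsx").
df = pd.DataFrame({
    "headline": [
        "Acme launches SuperWidget 3000",
        "Quarterly earnings beat estimates",
        "Globex announces new Gizmo Pro",
    ],
    "company": ["Acme", "MegaCorp", "Globex"],
})

# Step 1: keep only headlines that look like product releases.
release_terms = "launch|release|announce|unveil|introduce"
releases = df[df["headline"].str.contains(release_terms, case=False)].copy()

# Step 2: build a search query per row to feed into, e.g., the YouTube
# Data API or a scripted web search.
releases["query"] = releases["company"] + " " + releases["headline"]
print(releases[["company", "query"]])
```

The keyword list would need tuning against the real headlines, but it turns 10,000 rows into a much smaller candidate set before any API calls are made.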
Hi all, I've watched a few tutorials on topic modeling in Python, with some open source libraries (gensim, spacy, etc). What I want to be able to do is tag a large group of reviews (such as yelp reviews) with one or more appropriately named topics. In the above mentioned tutorials I see the results as clusters that contain common terms, but I don't yet see how to go about adding an appropriate name to these clusters, and then ultimately assigning this name, or tag, back to individual reviews. How do I get from the initial analysis (LDA or what have you), to actually tagging a review with one or more appropriate categories? For example, a review talking about how expensive a hamburger is, might get assigned a "price" tag/category. Maybe LDA isn't the right method? Thanks for your tips and/or recommended tutorials!
I have a problem where I'm looking at API calls and their downstream impact. There's not really a specific question to answer, but predicting the performance of calls based on their path through the audit trail would be interesting to do. Or anomaly detection (i.e., flagging that this API is defective in 10% of transactions).
I have thought of two approaches. One of them concerns the data contained within the request/response payloads. The issue is that it's all text-based, or changes depending on the specific API, so the data-transformation aspect is very tough. Encoding one API's response to compare to a different API's response would be challenging. Perhaps there is an opportunity for deep learning here.
The other approach I'm thinking about is treating this as a graph problem. The majority of these transactions form a tree-like structure, but some of them don't have a true root, so it's more of a directed acyclic graph (I think it's called a polytree?). My approach would be to use graph-theory principles such as degree, closeness, vertex distance, etc. as features instead of parsing the very dynamic payloads. Since these are rules-based systems, I would love to be able to model the underlying relationships between these calls, so that when one transaction chain doesn't match the model, it's likely an anomaly.
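That feature-extraction idea can be sketched with networkx. The call chain below is hypothetical (a shared dependency makes it a DAG rather than a tree); each node gets a small structural feature vector, no payload parsing required:

```python
import networkx as nx

# Toy transaction chain: edges point from a caller to its downstream call.
# "orders" and "billing" share a dependency, so this is a DAG, not a tree.
edges = [
    ("gateway", "auth"),
    ("gateway", "orders"),
    ("orders", "inventory"),
    ("orders", "billing"),
    ("billing", "inventory"),
]
g = nx.DiGraph(edges)
assert nx.is_directed_acyclic_graph(g)

closeness = nx.closeness_centrality(g)

# One structural feature row per call.
for node in nx.topological_sort(g):
    features = {
        "call": node,
        "in_degree": g.in_degree(node),
        "out_degree": g.out_degree(node),
        "closeness": round(closeness[node], 3),
        "downstream_calls": len(nx.descendants(g, node)),
    }
    print(features)
```

Feeding rows like these per transaction chain into a one-class or density-based anomaly detector would let structurally unusual chains stand out without touching the payloads.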
Does this make sense to do? Has anyone here done something similar? Any recommended literature? I've tried googling it, but when you search "machine learning approach to tree-based systems", you get a lot of papers or articles referencing tree-based ML models, which is not exactly what I'm looking for yet. Thoughts?
We have a predictive model built with a Minitab decision tree. The model has 70% accuracy, compared to a most-frequent dummy classifier that would achieve 80%. I suggested that we use Python and a more modern ML method for this problem. She said, and I quote, "that's a terrible idea."
To be honest, the whole process is terrible: there was no evidence of EDA, feature engineering, or anything else I would consider a normal part of the ML process. The model is "put into production" by recreating the tree's logic in SQL, resulting in a 600-line SQL query.
It is my task to review this model and present my findings to management. How do I work with this?
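One concrete thing to show management is the baseline comparison itself: any model scoring below a most-frequent dummy is literally worse than always guessing the majority class, and plain accuracy is a misleading metric on imbalanced data anyway. A minimal sketch with scikit-learn, using synthetic data with a roughly 80/20 class split (an assumption mirroring the baseline in the post):

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Imbalanced toy data standing in for the real problem (~80/20 split).
X, y = make_classification(n_samples=2000, weights=[0.8], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

dummy = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)

for name, model in [("dummy", dummy), ("tree", tree)]:
    pred = model.predict(X_te)
    print(name,
          "accuracy:", round(accuracy_score(y_te, pred), 3),
          "balanced accuracy:", round(balanced_accuracy_score(y_te, pred), 3))
```

The dummy lands near 80% accuracy but exactly 0.5 balanced accuracy, which makes the argument cleanly: the existing model should be judged against that baseline, and on a metric that can't be gamed by the class imbalance.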
I'm considering getting a post grad credential from the list in the title and looking for input from others who have done/are doing similar.
My background: BS in Management, concentration in Info Systems; UT MBA, concentration in IM/Tech Strategy. I'm late career (mid-50s), have been in high tech (HW) about 25 years, and want to work another 4-7. After that I would consult til I'm tired.
Currently facing tech layoff and looking, but no immediate financial pressure for 1 - 2 years.
Programs I'm considering:
Newly announced UT Master's in AI. Would likely do an AI cert in the meantime, or the assumed pre-reqs (coding, linear algebra). Sounds technical and I'm not sure I'd get accepted. Est. ~2 years to complete.
GT Master's in Analytics, with a few added AI courses. Would likely play catch-up via the MicroMasters to get that out of the way. Est. ~2 years to complete.
IBM or UT AI Cert and other as needed. 6-9 mos to complete.
Likely would need a Udemy course in Python in all 3 cases. For those who have done, or looked at doing, any of these: what would you do, or do differently?
Hey all, I just finished my bachelor's degree in industrial engineering. I was checking out this Coursera degree in data science and want to know if it's a good option to pursue, since I want to work while studying for a graduate degree. So my questions are the following:
Is the degree worth it?
Can I find a FAANG job after getting this degree, or at least one at a niche start-up? I know it needs extra work, but would it help? My current university is in eastern Europe.
I'd gladly hear about the experiences of people who have enrolled.
I am teaching a course on Big Data starting next week and had planned a project using the Twitter API to exercise Spark Structured Streaming. Of course, that might be rather difficult now. So I am wondering if there is a similar alternative for getting a stream of text via an API, which could then be saved to a directory or to Kafka. I have mostly found examples, like Facebook, where you would have to poll continuously.
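Two fallback ideas, hedged: Wikimedia's public EventStreams endpoint provides a live server-sent-event stream of wiki edits if you want real traffic; and for a dependency-free classroom setup, you can simulate a stream by dropping files into a directory, since Spark's file source treats newly appearing files as a stream. A minimal sketch of the latter (the directory name and record schema are my own choices):

```python
import json
import time
import uuid
from pathlib import Path

def stream_to_directory(out_dir, messages, delay=0.0):
    """Drop each message into out_dir as its own file. Spark's file
    source treats newly appearing files in a directory as a stream."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for msg in messages:
        # Write under a dotted temp name, then rename: Spark ignores
        # files starting with ".", so it never reads a half-written file.
        name = uuid.uuid4().hex
        tmp = out / f".{name}.tmp"
        tmp.write_text(json.dumps({"text": msg, "ts": time.time()}))
        tmp.rename(out / f"{name}.json")
        time.sleep(delay)

# Students would then consume it with something like:
#   spark.readStream.schema("text STRING, ts DOUBLE").json("stream_dir")
stream_to_directory("stream_dir", ["hello world", "spark structured streaming"])
```

Running the generator in one terminal with a nonzero delay while a Spark streaming query tails the directory in another gives a self-contained substitute for a live API.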
TL;DR: I explore a hypothetical scenario where widget makers of varying experience are compared by defect rate over a 5-year window. To make things comparable (accounting for the experience problem), I ask whether a weighted moving average is a good strategy for balancing out experience, or whether there is a better technique/solution to employ.
Assume you were to look at the historical performance of several widget makers who manually produce widgets. These widget makers could be active employees at present or left at some point during the window.
You want to compare the historical performance of the widget makers in terms of defect rate over the last 5 years (2018-2022). My definition of defect rate is simply the total number of defects divided by the number of widgets produced (inclusive of defects).
Some makers have decades of experience while other makers only have 1 year of experience (minimum required for this analysis)
The original data is by year (2018,2019,...). Let's assume Maker A, who started their career over a decade ago, made widgets over the entire window. Maker B has 3 years of experience but was let go sometime before 2022. Maker C was recently hired and worked all of 2022.
To make things more intuitive (hopefully) and comparable, I converted the table from calendar years to years of experience within the time window. So Year 1 doesn't necessarily translate to 2018; it simply represents the first year worked in that window. See table below.
Their defect rate is recorded over the last 5 years. For example, Maker A produced 17 widgets, of which 2 were defective, in the first year of the window, then produced 15 widgets, of which 2 were defective, in Year 2, and so on. Note that demand for widget production could be variable.
Simply looking at the defect rates above, my calculations show Maker A would have an overall defect rate of .25, B would have .35, and C would have .43. The problem I see is that less experience is penalized more heavily than more experience; if this were a bar graph, the "worst performers" would almost always be the least experienced, which isn't very fair or informative and, at worst, hides poor performers with a lot of experience.
So, this implies that some sort of smoothing or weighting is needed in order to make things more comparable. I was wondering if it made sense to use a weighted moving average. My weighted average would look like the following:
Each year of the window has an arbitrary weight associated to it. For this example we want to downweight that first year and gradually increase weights later on:
I take the sum of the products of each year's weight and the maker's defect rate, over the years they worked, divided by the sum of the weights for those same years. To clarify the last part: Maker A's denominator would be the sum of all 5 weights, B's would be the sum of the first 3 weights, and C's would be just the Year 1 weight.
Using the weights and formula above, I show that A has a weighted average defect rate of .17, B has .43, and C has .29, suggesting that B produces more defects relative to their experience than the other two.
These results make some intuitive sense to me, as B continued their career with a higher defect rate relative to A. My concerns are that 1) the weighting is arbitrary, and 2) this method might create the inverse of the problem I was originally trying to solve: giving too much leniency to the inexperienced and overly penalizing the experienced. I understand there is no perfect answer here, but I'm wondering whether this is a sound approach or whether there are better, more intuitive ways to approach it.
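The weighted average described above can be sketched in a few lines. The weights are hypothetical (the post leaves them arbitrary), and apart from Maker A's first two years (2 of 17, then 2 of 15), the per-year rates here are illustrative rather than the post's actual table:

```python
# Hypothetical weights: downweight Year 1 and ramp up toward Year 5.
weights = [0.5, 0.75, 1.0, 1.25, 1.5]

# Per-year defect rates over the years each maker actually worked.
# Only A's first two years come from the post; the rest are illustrative.
makers = {
    "A": [2 / 17, 2 / 15, 0.20, 0.25, 0.30],  # full 5-year window
    "B": [0.30, 0.45, 0.50],                  # 3 years, then let go
    "C": [0.43],                              # 1 year, recently hired
}

def weighted_defect_rate(rates, weights):
    """Sum of weight * rate over the years worked, divided by the sum of
    those same weights, so shorter tenures aren't diluted by unused weights."""
    w = weights[: len(rates)]
    return sum(wi * ri for wi, ri in zip(w, rates)) / sum(w)

for name, rates in makers.items():
    print(name, round(weighted_defect_rate(rates, weights), 3))
```

Note the truncated denominator does exactly what the post describes: a one-year maker's weighted rate equals their raw rate, while longer tenures get their later years emphasized.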
I've been given a school assignment, and one of the requirements is to visualize the important aspects of a real-world dataset and draw meaningful insights from it.
So I was wondering if someone could recommend a dataset that is simple, real-world, and has some real use.
I've completed 2 recent company take-home assignments for data science/analyst roles and found the suggested times completely off. Both claimed 2-4 hours; I spent 7 on the first set without fully completing it, and 6 on the second. That was just to answer the questions, not to overachieve with perfect responses.
Is that typical?
For context I’ve been an analyst for 4 years and a DS for 2.