Showing posts with label Hacker News. Show all posts

Launch HN: Syndetic (YC W20) – Software for explaining datasets https://news.ycombinator.com/item?id=22406560

Launch HN: Syndetic (YC W20) – Software for explaining datasets

Hi HN, we're Allison and Steve of Syndetic (https://www.getsyndetic.com). Syndetic is a web app that data providers use to explain their datasets to their customers. Think ReadMe but for datasets instead of APIs.

Every exchange of data ultimately comes down to a person at one company explaining their data to a person at another. Data buyers need to understand what's in the dataset (what are the fields and what do they mean) as well as how valuable it can be to them (how complete is it? how relevant?). Data providers solve this problem today with a "data dictionary", which is a meta spreadsheet explaining a dataset. This gets shared alongside some sample data over email. These artifacts are constantly getting stale as the underlying data changes.

Syndetic replaces this with software connected directly to the data that's being exchanged. We scan the data and automatically summarize it through statistics (e.g., cardinality), coverage rates, frequency counts, and sample sets. We do this continuously to monitor data quality over time. If a field gets removed from the file or goes from 1% null to 20% null, we automatically alert the provider so they can take a look. For an example of what we produce but on an open dataset, check out the results of the NYC 2015 Tree census at https://www.getsyndetic.com/publish/datasets/f1691c5d-56a9-4... .

We met at SevenFifty, a tech startup connecting the three tiers of the beverage alcohol trade in the United States. SevenFifty integrates with the backend systems of 1,000+ beverage wholesalers to produce a complete dataset of what a restaurant can buy wholesale, at what price, in any zipcode in America. While the core business is a marketplace between buyers and sellers of alcohol, we built a side product providing data feeds back to beverage wholesalers about their own data. Syndetic grew out of the problems we experienced doing that.

Allison kept a spreadsheet in Dropbox of our data schema, which was very difficult to maintain, especially across a distributed team of data engineers and account managers. We pulled sample sets ad hoc, and ran stats over the samples to make sure the quality was good. We spent hours on the phone with our customers putting it all together to convey the meaning and the value of our data. We wondered why there was no software out there specifically built for data-as-a-service.

We also have backgrounds in quantitative finance (D. E. Shaw, Tower Research, BlackRock), large purchasers of external data, where we've seen the other side of this problem. Data purchasers spend a lot of time up-front evaluating the quality of a dataset, but they often don’t monitor how the quality changes over time. They also have a hard time assessing the intersection of external datasets with data they already have. We're focusing on data providers first but expect to expand to purchasers down the road.

Our tech stack is one monolithic repo split into the frontend web app and backend data scanning. The frontend is a Rails app and the data scanning is written in Rust (we forked the amazing library xsv). One quirk is that we want to run the scanning in the same region as our customers' data to keep bandwidth costs and transfer time down, so we're actually running across both GCP and AWS.

If you're interested in this field you might enjoy reading the paper "Datasheets for datasets" (https://arxiv.org/pdf/1803.09010.pdf), which proposes a standardized method for documenting datasets modeled after the spec sheets that come with electronics. The authors propose that “for dataset creators, the primary objective is to encourage careful reflection on the process of creating, distributing, and maintaining a dataset, including any underlying assumptions, potential risks or harms, and implications of use.” We agree with them that as more and more data is sold, the chance of misunderstanding what’s in the data increases. We think we can help here by building qualitative questions into Syndetic alongside automation.

We have lots of ideas of where we could go with this, like fancier type detection (e.g. is this a phone number), validations, visualizations, anomaly detection, stability scores, configurable sampling, and benchmarking. We'd love feedback and to hear about your challenges working with datasets! February 24, 2020 at 11:38PM
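
The kind of scan the Syndetic post describes (cardinality, coverage, frequency counts, and an alert when a field disappears or its null rate jumps) can be illustrated with a small sketch. This is not Syndetic's implementation, which the post says is written in Rust on a fork of xsv; the function names `profile` and `alerts`, the 10-point threshold, and the choice to treat empty strings as null are all assumptions made for illustration.

```python
from collections import Counter


def profile(rows):
    """Summarize each field seen in a list of dict rows (hypothetical helper).

    Returns, per field, the null rate, the number of distinct non-null
    values (cardinality), and the three most frequent values.
    """
    total = len(rows)
    fields = sorted({key for row in rows for key in row})
    summary = {}
    for field in fields:
        values = [row.get(field) for row in rows]
        # Assumption: None and the empty string both count as null.
        present = [v for v in values if v not in (None, "")]
        summary[field] = {
            "null_rate": (total - len(present)) / total if total else 0.0,
            "cardinality": len(set(present)),
            "top_values": Counter(present).most_common(3),
        }
    return summary


def alerts(old, new, max_null_increase=0.10):
    """Compare two profiles and list changes worth a provider's attention.

    The threshold is an illustrative default, not a value from the post.
    """
    messages = []
    for field, before in old.items():
        after = new.get(field)
        if after is None:
            messages.append(f"field removed: {field}")
        elif after["null_rate"] - before["null_rate"] > max_null_increase:
            messages.append(
                f"null rate rose for {field}: "
                f"{before['null_rate']:.0%} -> {after['null_rate']:.0%}"
            )
    return messages
```

Running `profile` on each delivery of a file and passing consecutive results to `alerts` gives a rough version of the continuous monitoring the post mentions.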

Show HN: Asynq – Simple, reliable, and efficient distributed task queue for Go https://news.ycombinator.com/item?id=22403097

Show HN: Asynq – Simple, reliable, and efficient distributed task queue for Go https://github.com/hibiken/asynq February 24, 2020 at 04:35PM

Show HN: Neo.mjs, the webworkers driven UI framework https://news.ycombinator.com/item?id=22402774

Show HN: Neo.mjs, the webworkers driven UI framework https://github.com/neomjs February 24, 2020 at 03:10PM

Show HN: City Filter – Travel to the Right Cities at the Right Time https://news.ycombinator.com/item?id=22404061

Show HN: City Filter – Travel to the Right Cities at the Right Time https://www.city-filter.com/ February 24, 2020 at 07:41PM

Show HN: Bubblemon, System load meter for human beings https://news.ycombinator.com/item?id=22403935

Show HN: Bubblemon, System load meter for human beings https://walles.github.io/bubblemon/ February 24, 2020 at 07:23PM

Show HN: An Online Personal Organizer https://news.ycombinator.com/item?id=22403109

Show HN: An Online Personal Organizer https://getyourganize.com/ February 24, 2020 at 04:39PM

Show HN: Java full-stack starter template for side projects https://news.ycombinator.com/item?id=22400791

Show HN: Java full-stack starter template for side projects https://turbovar.com/turbovar/index.jsp February 24, 2020 at 06:43AM

Show HN: LinkedIn for Chatbots https://news.ycombinator.com/item?id=22401444

Show HN: LinkedIn for Chatbots https://presbot.com/ February 24, 2020 at 09:07AM

Show HN: Damnshort – dotcoms for startups and side projects https://news.ycombinator.com/item?id=22398147

Show HN: Damnshort – dotcoms for startups and side projects https://damnshort.com February 23, 2020 at 11:02PM

Show HN: Netboot diskless Windows machines from a Linux server for LAN parties https://news.ycombinator.com/item?id=22399599

Show HN: Netboot diskless Windows machines from a Linux server for LAN parties https://github.com/kentonv/lanparty February 24, 2020 at 02:49AM

Show HN: Memoly, a subscription manager app built without code https://news.ycombinator.com/item?id=22396465

Show HN: Memoly, a subscription manager app built without code

Hey HN community, I'm Sebastian, a Product Manager/Designer/Maker. I'd like for anyone with an idea to build and validate something quickly. I got into the #nocode thing early last year and wanted to see what's possible with the tools out there today.

Over the last few months, I spent a lot of late nights and weekends on my side project. I'm very proud to present the result today. Meet Memoly (https://memoly.app). It's built entirely without code. To achieve my goal, I used this tech stack: Adalo, Zapier, Airtable, and Carrd. Memoly is available for iOS and Android.

I came up with the idea to solve my own pain point. I have way too many subscriptions! When I forgot to cancel a yearly plan on time, that renewal was costly. I wanted to create something that helps me keep track of my spending.

Would love to hear your thoughts! Cheers, Sebastian February 23, 2020 at 05:33PM

Show HN: Self Managed Link Aggregator https://news.ycombinator.com/item?id=22399302

Show HN: Self Managed Link Aggregator https://github.com/twosdai/contentGrabber February 24, 2020 at 01:53AM

Show HN: JAI: A Lego-Style Deep Learning Library on PyTorch https://news.ycombinator.com/item?id=22395497

Show HN: JAI: A Lego-Style Deep Learning Library on PyTorch https://github.com/gengjia0214/jai February 23, 2020 at 12:09PM

Show HN: Search code in GitHub repos using regular expressions https://news.ycombinator.com/item?id=22396824

Show HN: Search code in GitHub repos using regular expressions https://grep.app February 23, 2020 at 07:23PM

Show HN: Leadership-Library.dev – The Leadership Library for Engineers https://news.ycombinator.com/item?id=22396227

Show HN: Leadership-Library.dev – The Leadership Library for Engineers https://leadership-library.dev February 23, 2020 at 04:21PM

Show HN: Proxyman – native HTTP/HTTPS requests observation and manipulation app https://news.ycombinator.com/item?id=22394708

Show HN: Proxyman – native HTTP/HTTPS requests observation and manipulation app https://proxyman.io/ February 23, 2020 at 08:03AM

Show HN: GitSpo – Monitoring and Analytics for Open-Source Projects https://news.ycombinator.com/item?id=22393286

Show HN: GitSpo – Monitoring and Analytics for Open-Source Projects http://gitspo.com/ February 23, 2020 at 02:29AM

Show HN: WeKeep – A spreadsheets-first accounting software https://news.ycombinator.com/item?id=22393086

Show HN: WeKeep – A spreadsheets-first accounting software https://www.wekeep.co February 23, 2020 at 01:48AM

Show HN: Single-Instruction (Subleq) Programming Game https://news.ycombinator.com/item?id=22392965

Show HN: Single-Instruction (Subleq) Programming Game https://jaredkrinke.itch.io/sic-1 February 23, 2020 at 01:26AM

Show HN: AntiVirus Monitor – GitHub Action to combat false positives https://news.ycombinator.com/item?id=22390458

Show HN: AntiVirus Monitor – GitHub Action to combat false positives https://github.com/billziss-gh/avm/releases/tag/v1.0 February 22, 2020 at 04:54PM