The last part is why you use an IDE.
Several of them will ingest Prettier config files to build their code formatting rules.
IDE support is normally a good way to work out what the wider community is using.
Python is unique in that formatting forms part of the syntax. Every language has linters, but it's far more common for orgs to tweak the default rules.
For example, Java has Checkstyle. The default rules ('Sun Checks') give a line length of 80, tabs are 4 spaces, and everything is placed on a new line.
Junior devs inevitably want to trash the line length (honestly, on 1080p monitors, 120 makes sense).
There is always a new line/same line discussion (everyone prefers same line, but there is always one die-hard new line person).
The tab width discussion always has one junior dev complaining that "tabs are better". As someone who started development on Visual Studio 6, where half the team double-spaced and the other half used tabs, I give those people a lecture on how we can convert tabs to spaces but not the inverse, so it will always be spaces if I am near.
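Once those debates settle, the outcome is just a shared Checkstyle rule file. A minimal sketch of overriding the default line length (assuming a reasonably recent Checkstyle, where LineLength sits directly under Checker):

```xml
<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
    "-//Checkstyle//DTD Checkstyle Configuration 1.3//EN"
    "https://checkstyle.org/dtds/configuration_1_3.dtd">
<module name="Checker">
  <!-- The team compromise: 120 characters instead of the Sun default of 80 -->
  <module name="LineLength">
    <property name="max" value="120"/>
  </module>
</module>
```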
With Checkstyle you upload the rule file as an artifact into your M2 repository. Then you can pull it down as a dependency when the checkstyle plugin runs.
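A rough sketch of the consuming side (the com.example coordinates are hypothetical and the plugin version is only an example): the rule file is published as its own artifact, added as a dependency of the maven-checkstyle-plugin, and referenced by its path on that artifact's classpath.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-checkstyle-plugin</artifactId>
  <version>3.3.1</version>
  <configuration>
    <!-- Resolved from the classpath of the rules artifact below -->
    <configLocation>checkstyle.xml</configLocation>
  </configuration>
  <dependencies>
    <!-- Hypothetical artifact containing the shared checkstyle.xml -->
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>company-build-rules</artifactId>
      <version>1.0.0</version>
    </dependency>
  </dependencies>
  <executions>
    <execution>
      <goals>
        <goal>check</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```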
I avoid any company that requires a software test before the interview.
I worked for a company that introduced them after I joined. I collected evidence that none of the company's top performers would have joined, since we all had multiple offers and having to do the test would have put people off applying. The scores didn't correlate with interview results, so everyone was ignoring them. It still took 2 years to get rid of the test.
The best place used STAR (Situation, Task, Action, Result) based interviews. The goal was to ask questions until you got 2 STARs.
I thought these were great because they were more varied and conversational, yet there was comparable consistency across interviewers.
You would inevitably get references to past work and could switch to asking a few questions about that. Since it was anchored in a situation, you would get more complete technical explanations (e.g. "on that project I wrote an X, and Y was really challenging because of Z").
I loved asking "Tell me about something you're really proud of". Even a nervous junior would start opening up after that question.
After an hour-long interview you would end up with enough information to compare the candidate against the company gradings (junior, senior, etc.).
This was important because it changed the attitude of the interview. It wasn't a case of whether the candidate would be a good senior dev for project X, but an assessment of the candidate. If they came out as a lead and we had a lead role, let's offer them that.
Basic rule: if someone claims X magically solves a problem, they don't actually follow X and are a huge generator of the problem.
For example, people who claim they don't need to write comments because they write self-documenting code are the people who use variable names like x1, x2, y, etc.
Similarly, anyone you meet claiming Test Driven Development means they have better tests will write code with appalling code coverage and epically bad tests.
That's two hundred years and would cover the end of the Plantagenet reign and the Tudor era.
Henry VIII's reign happened during that period; at the beginning of your time period everyone would be Catholic, and at the end Mary, Queen of Scots was executed because the idea of a Catholic on the throne was unthinkable.
The UK is littered with castles and estates, normally they focus on specific historic events which happened at that location.
This advice isn’t grounded in reality.
Management normally defines ways to track and judge itself; these are typically called Key Performance Indicators (KPIs).
KPIs are normally things like contract value growth, new contracts signed, profit margin, etc.
So if the project manager is meeting or exceeding their KPIs and you walk up to their boss telling them the PM is failing at basic job functions, the boss won't care.
This is because the boss might have set the KPIs, or the boss might also be judged on them. In either situation it's to the boss's advantage to ignore you.
The boss will only care if there is a KPI you can demonstrate the PM is failing to meet.
Every person/group will have various incentives and motivations. To effect change you have to understand what they are.
A project manager has responsibility for delivery of a project, but they typically lack domain-specific knowledge. As a result they can't directly deliver anything; they can only ask subject matter experts for advice and facilitate a team to deliver.
Most PMs cope with the stress of this position poorly.
This cartoon is an example of micromanagement (a common coping mechanism): the manager has involved themselves in the low-level decisions because that gives a sense of control. If a technical team then tells them it's a bad decision, the team is effectively attacking their coping mechanism.
The solution isn't to tell them their technical idea is terrible. When you've fallen down this rabbit hole you have to treat the PM as a stakeholder: someone you have to manage. A common solution is to give them confidence that there is a path to delivery and a way to track and understand it.
Do not mix tabs and spaces.
It's impossible to automate checking that tabs were only used for indentation and spaces only for precise alignment, so you take on the burden of checking manually.
You end up with the issue where someone didn't realise and indented with spaces, or another person used tabs for precise alignment; people forget to check the whitespace characters in review, and it drifts into inconsistency and becomes a huge pile of technical debt to fix.
Use only one: you can automate enforcement and ensure the code renders consistently.
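As a sketch of what that automation can look like (assuming you standardise on spaces and already run Checkstyle), a single module will fail the build on any tab character:

```xml
<module name="Checker">
  <!-- Reject any tab character anywhere in the source files -->
  <module name="FileTabCharacter">
    <property name="eachLine" value="true"/>
  </module>
</module>
```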
Years ago there was no way to share IDE settings between developers.
You ended up with some developers choosing a tab width of 2 spaces, some choosing 4 spaces, and, as there was no linting enforcement, some people using 2-4 spaces depending on their IDE settings.
This resulted in an unreadable mess, as code was indented to all sorts of random levels.
It doesn’t matter if you use tabs or spaces as long as only one type is consistently used within a project.
Spaces tend to win because inevitably there are times you need to use spaces, so it's difficult to ensure a project only uses tabs for indentation.
IDEs support converting tabs into spaces based on tab width, and code formatting will ensure correct indentation. You can now have centralised IDE settings so everyone gets the same setup.
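One common way to centralise those settings is an .editorconfig checked into the repo, which most IDEs and editors pick up automatically (the values here are just an example, not a recommendation):

```ini
# Example .editorconfig shared across the team
root = true

[*.java]
indent_style = space
indent_size = 4
trim_trailing_whitespace = true
insert_final_newline = true
```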
Honestly, 99% of people don't care about formatting (they only care when consistency isn't enforced and code is hard to read). There is always one person who wants a 60-character line width, or only tabs, or double-new-lined parentheses, and who then sucks up huge amounts of the team's time arguing their thing is a must while they code in Emacs, unlike the rest of the team using an actual IDE.
I am actually arguing for a stable ABI.
The few times I have had to compile out-of-tree drivers for the Linux kernel, it has usually failed because the ABI had changed.
Each time I have looked into it, I found code churn, e.g. changing an enum to a char (or the other way) or messing with the parameter order.
If I were emperor of the world, the Linux kernel would be built using conan.io, with device trees pulling down drivers as dependencies.
The Linux ABI headers would move out into their own separately managed project, released and managed at its own rate. Subsystem maintainers would have to raise pull requests to change the ABI, and changing a parameter from enum to char because you prefer chars wouldn't be a good enough reason.
Each subsystem would be its own "project" with a logical repository structure (e.g. Intel and AMD GPU drivers don't share code, so why would they be in the same repo?), built against the appropriate ABI version, with each repository released at its own rate.
Unsupported drivers would then be forked into their own repositories. This simplifies deprecation, since it is external to the supported drivers and doesn't need to be refactored or maintained. If distributions can build them and want to include the driver, they can.
Linus' job would be to maintain the core kernel, device tree and ABI projects, and to provide a bill of materials for the combinations of Linux kernel/ABI/driver versions which are supported.
Lastly, since every driver would be a discrete buildable component, it would be far easier for distributions to check whether a driver is compatible with the kernel ABI they are using (e.g. change a dependency version and build) and to provide new drivers with their builds.
None of this will ever happen. C/C++ developers loathe dependency management, and people can be strongly attached to monorepos for some reason.
The Linux kernel is very old school in how it is run, and originally a big part of the DevSecOps movement was removing a lot of manual overhead.
Moving to something like Gitea (Codeberg) would give a better diff view and be quicker/easier than posting a patch to a mailing list.
The branching model of the kernel is something people write up on paper that looks great (much like Gitflow) but is really time-consuming to manage. Moving to a feature-branch workflow and creating release branches as part of the release process allows a ton of things to be automated and simplified.
Similarly file systems aren’t really device specific, so you could build system tests for them for benchmarking and standard use cases.
Setting up a CI to perform smoke testing and linting is fairly standard.
It's really easy to set up a CI to trigger when a new branch/PR is created or updated; this means review is reduced to checking business logic, which makes reviews really quick and easy.
Similarly, moving to a decent issue tracker helps: Jira's support for epics/stories/tasks/capabilities and its linking ability is a huge simplifier for long-term planning.
You can do things like define OKRs, then attach epics to them and stories/tasks to the epics, which lets you track progress towards goals.
You can use issues the way the linux community currently uses mailing lists.
Combined with a Kanban board for tracking the progress of tickets, you remove a ton of pain.
Although open source issue trackers are missing the key productivity enablers of Jira, which makes these improvements hard to realise.
The issue is people: the Linux kernel maintainers have been working one way for decades. Getting them to adopt new tools will be heavily resisted, same with changing how they work.
It's like everyone outside knows that splitting the ABI definition from the subsystem implementation would create a far more stable ABI, which would solve a bunch of issues and still allow change when needed, yet no one in the kernel community will entertain the idea.
During the pandemic I had some unoccupied Python graduates I wanted to teach data engineering to.
Initially I had them implement REST wrappers around Apache OpenNLP and spaCy and then compare the results on random data sets (Project Gutenberg, SharePoint, etc.).
I ended up stealing a grad data scientist because we couldn’t find a difference (while there was a difference in confidence, the actual matches were identical).
spaCy required 1 vCPU and 12 GiB of RAM to produce the same result as OpenNLP running on 0.5 vCPU and 4.5 GiB of RAM.
Two grads were assigned a Spring Boot/Camel/OpenNLP stack and two a spaCy/Flask application. It took both groups 4 weeks to get a working result.
The team slowly acquired lockdown staff, so I introduced MinIO/RabbitMQ/NiFi/Hadoop/Express/React and then different file types (not just raw UTF-8, but doc, pdf, etc.) for the NLP pipelines. They built a fairly complex NLP processing system with a data exploration UI.
I figured I had a group to help me work out the best Python approach in the space, but Python's limitations just led to stuff like needing a Kubernetes volume to host data.
Conversely none of the data scientists we acquired were willing to code in anything but Python.
I tried arguing at my company at the time that there was a huge unsolved bit of the market there (e.g. MLOps).
Alas, unless you can show profit on the first customer, no business would invest. Which is why I am trying to start a business.
This is why Java rocks at ETL: the language is built to access files via input/output streams.
It means you don't need to download a local copy of a file; you can drop it into a data lake (S3, HDFS, etc.) and pass around a URI reference.
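A minimal sketch of the idea (the URL is hypothetical; a real data lake would hand you a stream via its SDK, but the pattern is the same): process the object straight off the wire without ever writing a local copy.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class StreamingEtlSketch {
    public static void main(String[] args) throws IOException {
        // Hypothetical object in a data lake, referenced only by URI
        URI source = URI.create("https://datalake.example.com/raw/books/pg1342.txt");

        // Stream the content directly; nothing touches the local disk
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(source.toURL().openStream(), StandardCharsets.UTF_8))) {
            long nonBlankLines = reader.lines()
                    .filter(line -> !line.trim().isEmpty())
                    .count();
            System.out.println("Non-blank lines processed: " + nonBlankLines);
        }
    }
}
```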
Considering the size of Large Language Models, I really am surprised at how poorly streaming is handled within Python.
Maven has unit and integration test phases, and there is a multitude of plugins designed to hook into those phases, but there are constraints by design.
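For instance, the standard way to hook the integration-test phase is a plugin binding like the Failsafe one below, which runs *IT test classes and fails the build during verify (the version is only an example):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <version>3.2.5</version>
  <executions>
    <execution>
      <goals>
        <!-- Runs *IT.java tests during the integration-test phase -->
        <goal>integration-test</goal>
        <!-- Fails the build in the verify phase if any of them failed -->
        <goal>verify</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```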
Trying to hook everything into the build management system is a source of technical debt; you're using a tool for something it wasn't designed for.
I would look at what makes sense within the build management system and what makes sense in a CI pipeline.
CI tools have different DSLs and usually provide a means to manage environments. Certain integration and system-level tests are best performed there.
For instance, I keep system tests as a separately managed project. The project can be executed from developer machines for local builds, but I also create a small pipeline, triggered by pull requests, to build the project, deploy it and run the system tests against it.
This is why I say the build management system doesn't really change: you should treat everything as discrete, standalone components.
The parent POM gets updated once every six months, the basic build verification CI pipeline only changes for the latest language release, etc.
Projects which try to embed Gitflow into a POM or integrate CD into the Gradle file are the unbuildable messes I get asked to fix.
Maven has a high learning curve, but once learned it is incredibly simple to use.
That high bar is created by the tool configuration. You can change and hack everything, but you have to understand how Maven works to do so. This generally blocks people from doing really stupid things, because in learning how to modify Maven successfully you also learn why you shouldn't.
This is the exact weakness of Gradle: the barrier for modification is far lower and the tool is far less rigid, so you get lots of people who are still learning implementing all sorts of weird and terrible practices.
The end result is that I can usually dust off someone else's old Maven project and it will build immediately using "mvn clean install". About half the Gradle projects I have been brought in on won't build without reverse-engineering effort because they have things hard-coded all over them; a not-small percentage are so mangled they can't be built without the machine of the dev who wrote them.
Also, you really shouldn't be tinkering with your build pipelines that much. Initial constraints determine the initial solution, then you periodically review them to improve. DevSecOps exists to speed development and ease support; it isn't a goal in and of itself.
The admins performing upgrades, monitoring, fixes, etc. will require root access to the database. That means they could alter all your posts to say "blah blah blah" if they wanted.
Similarly, passwords will be encrypted within the database, and encryption algorithms have to be able to go in both directions. Normally they need a seed value to start the random generation, and the admin defines that seed; as a result, an admin can decrypt everything in the database.
@ergoplato I didn’t suggest that.
Personally I don't think it's ego. I think there are two issues.
The first is that people go through stages learning DevOps. Stage 1 has people deploy a CI because it's cool; they build a few basic pipelines and then 90% of people get bored. In the 2nd stage people start extending those pipelines, which results in really complex pipelines requiring lots of unique changes based on the opinion of the writer. You move to the 3rd stage when you're asked to recreate/extend it all for a new project and realise how specific your solutions are.
Learning how to make minor tweaks and hook in at a few key points to get what you want takes years. Without that, most packagers will want to make big changes upstream, which won't go down well.
The second issue: I have met quite a few developers who become highly stressed when the build system is doing something they haven't needed to do or don't understand.
A really simple example: I have a Jenkins function which I tend to slip into release pipelines; it captures the release version and creates a matching version in Jira.
I normally deploy it first, as a test, before a few other functions that automate various service management requirements.
It's surprising how many devs will suddenly decide every problem (test failed, code failed review, SharePoint breaks, bad OS update, etc.) is due to that function.
For me this little function is a test: if the team don't care, I will work to integrate the various bits. If they freak out, I'll revert and decide whether it is worth walking them through the process or walking away.
One of the reasons for the #DevOps movement is developers see building and packaging as #notmyjob.
The task would historically fall on the most junior member of the team, who would make a pig's ear of it due to a complete lack of experience.
This is compounded by the issue that most C/C++ build systems don’t really include dependency management.
Linux distributions have all tried to work out those dependency trees, but they came up with slightly different solutions. This is why there are a few "root" distributions everything branches from.
That means developers have to learn about those few root distributions to design the deb/rpm/AUR packages to base their release around.
That is a considerable amount of learning in a subject most aren’t interested in.
The real question is why don’t package maintainers upstream a packaging solution?
This is about the new starter cost.
When a developer joins a team, they will not be as productive, as they have to learn the code, frameworks, libraries, the project purpose, the tooling, etc. Often this impacts other members of the team, lowering the entire team's productivity.
When you use productivity tracking (e.g. things like capacity planning) you will see the team's performance drop, and it will take time for it to exceed the previously measured performance. This is the cost of adding a new starter.
So if it takes 6 weeks for a new starter to increase overall team productivity, then putting someone on a project for 4 weeks is pointless, since the team will have a higher delivery rate without the extra person. This is typically why an organisation loses its ability to migrate staff between projects.
Code formatting affects the layout of the code, and our brains do all sorts of tricks around pattern recognition, so if your code formatting rules are too different, someone migrating between projects has to spend time looking for code and retraining their brain.
It's an additional barrier, and one within an organisation's power to remove (by enforcing a common code standard).