Master's in Computer Engineering. ULL. 1st term
See crguezl/gh-org-members/gh-org-members.js
This is an extension that shows information about the people inside an organization. The initial version was written using the GitHub REST API. Its performance was poor because the requests were chained one after another. Parallelizing the requests to the REST API gave quite an improvement, but switching to GraphQL was much better: we were able to reduce all the REST requests to a single GraphQL request and, furthermore, we no longer needed parallelism.
REST Seq | REST Parallel | GraphQL |
---|---|---|
12,754 | 3,3 | 1,5 |
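The difference between chained and parallel requests can be illustrated with a minimal sketch. It does not call the GitHub API: `makeFakeApi` and `fetchMember` are hypothetical stand-ins that simulate a slow request and count how many requests are in flight at once. It is not the code of the extension, only an illustration of the two strategies.

```javascript
// Hypothetical stand-in for one REST call: resolves after a short delay and
// records how many calls are in flight at the same time.
function makeFakeApi(delayMs = 10) {
  const stats = { calls: 0, inFlight: 0, maxInFlight: 0 };
  async function fetchMember(login) {
    stats.calls += 1;
    stats.inFlight += 1;
    stats.maxInFlight = Math.max(stats.maxInFlight, stats.inFlight);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    stats.inFlight -= 1;
    return { login, name: login.toUpperCase() };
  }
  return { fetchMember, stats };
}

// Chained requests: each call starts only after the previous one has finished.
async function fetchSequential(logins, fetchMember) {
  const members = [];
  for (const login of logins) {
    members.push(await fetchMember(login));
  }
  return members;
}

// Parallel requests: all calls are started at once and awaited together.
async function fetchParallel(logins, fetchMember) {
  return Promise.all(logins.map((login) => fetchMember(login)));
}
```

With the sequential version the total time grows with the sum of the request latencies, while with `Promise.all` it is roughly bounded by the slowest request.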
The sequential version is in the branch sequential.
From the gh help api output:
For GraphQL requests, … In --paginate mode, all pages of results will sequentially be requested until there are no more pages of results. For GraphQL requests, this requires that the original query accepts an $endCursor: String variable and that it fetches the pageInfo{ hasNextPage, endCursor } set of fields from a collection.
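As a hedged sketch of what such a query can look like, the following declares an `$endCursor: String` variable and selects `pageInfo { hasNextPage endCursor }` from a collection. The choice of `membersWithRole` and of the `login` and `name` fields is an assumption made for illustration, and `runQuery` is a hypothetical function (not part of gh) that sends a query with its variables and returns the `data` object. The loop mimics by hand what `--paginate` does.

```javascript
// Query shape required by --paginate: it accepts $endCursor and selects
// pageInfo { hasNextPage endCursor } from the collection being paged.
// The selected fields are an assumption for this example.
const MEMBERS_QUERY = `
query ($org: String!, $endCursor: String) {
  organization(login: $org) {
    membersWithRole(first: 100, after: $endCursor) {
      pageInfo { hasNextPage endCursor }
      nodes { login name }
    }
  }
}`;

// Requests pages one after another until hasNextPage is false.
// runQuery(query, variables) is a hypothetical, injected transport function.
async function fetchAllMembers(org, runQuery) {
  const members = [];
  let endCursor = null;
  let hasNextPage = true;
  while (hasNextPage) {
    const data = await runQuery(MEMBERS_QUERY, { org, endCursor });
    const page = data.organization.membersWithRole;
    members.push(...page.nodes);
    ({ hasNextPage, endCursor } = page.pageInfo);
  }
  return members;
}
```

When the query is passed to `gh api graphql --paginate`, gh takes care of this loop itself, which is why the query must expose the cursor variable and the `pageInfo` fields.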
A git repository can support multiple working trees, allowing you to check out more than one branch at a time.
With git worktree add a new working tree is associated with the repository.
This new working tree is called a “linked working tree” as opposed to the “main working tree” prepared by git-init or git-clone.
A repository has one main working tree (if it’s not a bare repository) and zero or more linked working trees. When you are done with a linked working tree, remove it with git worktree remove.
$ git worktree add ../gh-org-members-seq sequential
$ code ../gh-org-members-seq
$ git branch -a
If you want to know more about what the git worktree command does, see the video Manage Branches easily using Git Worktree.
Improvement of the algorithm: all the queries that check whether repos are empty are unified into a single query.
In the current version, the first query, the one that performs the search (option -s), is still implemented using a REST API query.
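One way to unify many per-repo checks into a single GraphQL request is to use aliases, one per repository. The sketch below builds such a query; it assumes the `isEmpty` field of the GitHub GraphQL `Repository` type, and the function names `buildIsEmptyQuery` and `emptyRepos` are hypothetical, chosen for this example rather than taken from the extension.

```javascript
// Builds one query with an aliased repository(...) selection per repo name.
// JSON.stringify is used to produce quoted, escaped string literals.
function buildIsEmptyQuery(owner, repoNames) {
  const fields = repoNames.map(
    (name, i) =>
      `  r${i}: repository(owner: ${JSON.stringify(owner)}, name: ${JSON.stringify(name)}) { name isEmpty }`
  );
  return `query {\n${fields.join('\n')}\n}`;
}

// Given the data object of the response ({ r0: {...}, r1: {...}, ... }),
// returns the names of the repos reported as empty. Null entries
// (for example, repos that could not be resolved) are skipped.
function emptyRepos(data) {
  return Object.values(data)
    .filter((repo) => repo && repo.isEmpty)
    .map((repo) => repo.name);
}
```

The whole batch then costs one round trip instead of one request per repository.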
Times for only the queries that search for the repos and check which ones are empty:
REST -n | GraphQL -n |
---|---|
11,8 | 2,1 |
Plus downloading 16 repos:
REST | GraphQL | Parallel | Par and all GQL |
---|---|---|---|
33,9 | 24 | 10,5 | 9,6 |
The column Par and all GQL corresponds to the version where all the queries are implemented using GraphQL (plus parallelism). The best time on my laptop was obtained using 10 processes.
The problem with parallelizing the downloads of submodules is that when we put concurrent git submodule commands to work, git complains due to the race conditions that arise. Here is a capture of this kind of complaint:
Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
Fortunately, git clone can be done in parallel, so the idea was to run a parallel battery of git clone commands and later perform the git submodule add commands sequentially.
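The two-phase idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the extension's actual code: `mapWithLimit` and `cloneThenAddSubmodules` are hypothetical names, each repo is assumed to be an object with `url` and `path`, and the command runner is injectable so the flow can be tried without touching a real repository.

```javascript
const { execFile } = require('child_process');
const { promisify } = require('util');
const execFileAsync = promisify(execFile);

// Default runner: executes a real command. A fake runner can be injected.
const runCommand = (cmd, args) => execFileAsync(cmd, args);

// Applies worker to every item with at most `limit` calls in flight.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }
  const laneCount = Math.min(Math.max(1, limit), items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

async function cloneThenAddSubmodules(repos, { concurrency = 10, run = runCommand } = {}) {
  // Phase 1: independent clones can run concurrently.
  await mapWithLimit(repos, concurrency, (r) => run('git', ['clone', r.url, r.path]));
  // Phase 2: one git submodule add at a time, to avoid concurrent
  // git submodule commands colliding in the superproject.
  for (const r of repos) {
    await run('git', ['submodule', 'add', r.url, r.path]);
  }
}
```

The `concurrency` option plays the role of the number of concurrent processes; its default of 10 is only a placeholder for this sketch.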
The man page says:
git submodule add [-f|--force] [--name <name>] [--depth <depth>] [--] <repository> [<path>]
adds the given <repository> as a submodule at the given <path> to the changeset to be committed next to the current project: the current project is termed the superproject.
If <path> exists and is already a valid Git repository, then it is staged for commit without cloning.
Since the repositories were cloned independently and later added as submodules, each submodule keeps its .git directory inside the submodule instead of having it embedded in the superproject's git directory, which is the way submodules normally work.
This forces us to issue an additional git submodule absorbgitdirs command per submodule folder.
This is what git submodule absorbgitdirs does:
If a .git directory of a submodule is inside the submodule, move the .git directory of the submodule into its superproject’s .git/modules path and then connect the .git directory and its working directory by setting the core.worktree [^1] and adding a .git file pointing to the .git directory embedded in the superprojects git directory. This command is recursive by default.
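The distinction the description relies on, a .git directory inside the submodule versus a .git file pointing elsewhere, can be checked from Node.js. The helper below is a hypothetical sketch (the name `needsAbsorb` is not from the extension) that reports whether a folder still has its own .git directory and would therefore be a candidate for git submodule absorbgitdirs.

```javascript
const fs = require('fs');
const path = require('path');

// Returns true when <submodulePath>/.git is a directory, i.e. the repository
// was cloned on its own and its git directory has not been absorbed yet.
// Returns false when .git is a file (already absorbed) or does not exist.
function needsAbsorb(submodulePath) {
  const gitPath = path.join(submodulePath, '.git');
  try {
    return fs.statSync(gitPath).isDirectory();
  } catch (err) {
    if (err.code === 'ENOENT') return false; // no .git entry at all
    throw err;
  }
}
```

A script could call this for each submodule folder and run the absorbgitdirs step only where it returns true.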
See also the question https://stackoverflow.com/questions/36386667/how-to-make-an-existing-directory-within-a-git-repository-a-git-submodule
Here is a table showing how the time on my laptop varies with the number of concurrent processes:
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|
23,4 | 16, | 14,3 | 13,2 | 12,5 | 12,9 | 11,3 | 11,0 | 10,3 | 10,9 |
Some of you have already written a gh extension using Node.js. Let’s have a look at these.
Instructions for the delivery:
Create a repo for your extension named ULL-MII-SYTWS-2122/gh-my-extension-name inside the classroom organization ULL-MII-SYTWS-2122. Add that repo as a git submodule to the repo associated with this lab assignment. Just leave the links to the assignment and extension repos in the virtual campus.
Continue studying the basics of express and GraphQL
See the hello graphQL example at https://github.com/crguezl/learning-graphql-with-gh/tree/main/graphql-beginner-series